DeepSeek Researchers Unveil DeepSeek-V3.2 and DeepSeek-V3.2-Speciale for Enhanced Long Context Reasoning and Agentic Tasks

DeepSeek researchers have recently unveiled two models, DeepSeek-V3.2 and a specialized variant, DeepSeek-V3.2-Speciale. Both are designed to strengthen reasoning, especially on tasks involving long contexts and tool use, while avoiding the heavy computational costs typically associated with such advanced systems.

The new models leverage a technique called DeepSeek Sparse Attention (DSA), which changes how attention is computed so that long sequences can be handled more efficiently. Instead of every token attending to every previous token, which makes cost grow quadratically with sequence length, each query attends only to a small, selected subset of tokens, bringing the dominant cost down to roughly linear. As a result, DeepSeek-V3.2 can process longer inputs with less memory and compute, and the team reports about a 50% cost reduction on long-context tasks.
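A minimal PyTorch sketch of the underlying idea follows. This illustrates generic top-k sparse attention, not DeepSeek's actual kernel; it omits causal masking and the cheap dedicated scoring network the real design uses for token selection.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Single-head top-k sparse attention (illustrative only).

    q, k, v: (L, d) tensors, with top_k <= L. A scoring pass selects
    the top_k most relevant keys per query; softmax attention then
    runs only over that subset, so its cost is O(L * top_k) instead
    of O(L^2).
    """
    d = q.shape[-1]
    scores = (q @ k.T) / d**0.5                     # (L, L) relevance scores
    top_idx = scores.topk(top_k, dim=-1).indices    # (L, top_k) selected keys
    k_sel, v_sel = k[top_idx], v[top_idx]           # (L, top_k, d) subsets
    attn = F.softmax(
        torch.einsum("ld,lkd->lk", q, k_sel) / d**0.5, dim=-1
    )
    return torch.einsum("lk,lkd->ld", attn, v_sel)  # (L, d) outputs
```

Note that the scoring pass above is itself still quadratic; in DSA that job falls to a much cheaper indexing component, which is what makes the overall scheme pay off at long context lengths.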

DeepSeek-V3.2 has about 671 billion total parameters, with roughly 37 billion active per token. The architecture builds on DeepSeek-V3.1 but introduces an attention mechanism that focuses compute on the most relevant parts of the input: the model first scores tokens for relevance and then attends only to the highest-scoring ones, improving both speed and efficiency.
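The 671B-total versus 37B-active split reflects the Mixture-of-Experts design carried over from the V3 line: each token is routed to only a few expert sub-networks, so most weights sit idle on any given forward pass. The sketch below is a generic top-k MoE router for illustration, not DeepSeek's implementation.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Illustrative top-k Mixture-of-Experts layer: only `k` of
    `n_experts` expert networks run per token, so the active
    parameter count is a small fraction of the total. The same
    principle underlies V3.2's 37B-of-671B active split."""

    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # picks experts per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)     # routing probabilities
        weights, idx = gates.topk(self.k, dim=-1)  # top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # only selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Because each token touches only the router plus `k` experts, adding experts grows total capacity without changing per-token compute.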

Furthermore, training with DSA proceeds in two stages: a dense-attention warm-up that establishes a strong baseline, followed by sparse-attention training that adapts the model to the new pattern. This staged approach lets the model stay accurate while becoming far less resource-intensive.
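Conceptually, the schedule can be pictured as in the sketch below, where the attention mode flips partway through training; the flag name and step counts are illustrative placeholders, not values from the paper.

```python
# Hypothetical two-stage schedule; `use_sparse_attention` and the
# warm-up length are placeholders, not DeepSeek's actual values.
def train(model, batches, optimizer, dense_warmup_steps=1_000):
    for step, batch in enumerate(batches):
        # Stage 1: dense attention establishes the baseline.
        # Stage 2: switch to the sparse pattern and keep training.
        model.use_sparse_attention = step >= dense_warmup_steps
        loss = model(batch)        # assume the model returns its loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```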

In addition to its architectural advances, DeepSeek-V3.2 uses Group Relative Policy Optimization (GRPO) for reinforcement learning. GRPO strengthens the model on targeted domains such as mathematics and coding through specialized training runs, and the team reports that more than 10% of the total training compute budget is now devoted to this reinforcement learning phase, lifting performance across a range of benchmarks.
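GRPO's core trick is to drop the learned value network used by PPO-style methods: for each prompt, the policy samples a group of responses, and each response's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each reward is standardized against
    its own group, replacing the learned value baseline of PPO.

    rewards: (groups, group_size) scalar rewards, one row per prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)   # per-group mean reward
    std = rewards.std(dim=-1, keepdim=True)     # per-group spread
    return (rewards - mean) / (std + eps)

# e.g. 8 sampled answers to one math problem; reward 1.0 if correct
rewards = torch.tensor([[1., 0., 0., 1., 0., 0., 0., 1.]])
print(grpo_advantages(rewards))  # correct answers get positive advantage
```

Because the baseline comes from the group itself, responses that beat their siblings are reinforced and weaker ones suppressed, with no extra critic model to train.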

Both models have been evaluated on public benchmarks and competitions, posting results comparable to GPT-5 and, in some cases, approaching Gemini 3.0 Pro. Notably, DeepSeek-V3.2-Speciale achieved gold-medal-level results in competitions such as the International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI).

To train the models on agentic behavior, the research team also built a large synthetic dataset spanning more than 1,800 environments and 85,000 tasks, giving the models broad practice at tackling complex, multi-step problems.

One of the standout features of DeepSeek-V3.2 is how it integrates reasoning into tool use. At inference time the model can switch between thinking and non-thinking modes, preserving its chain of thought across multiple tool interactions instead of discarding it after each call. This matters for applications that require consistent reasoning over long, multi-step workflows.
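In practice, that means the reasoning trace travels with the conversation state between tool calls. The loop below is schematic; the message fields and the `call_model`/`run_tool` helpers are illustrative stand-ins, not DeepSeek's actual API.

```python
def agent_loop(call_model, run_tool, user_request, max_turns=8):
    """Schematic tool-use loop that keeps the model's reasoning
    in context across tool calls, so each step builds on earlier
    reasoning instead of starting fresh."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_turns):
        reply = call_model(messages)           # dict; may request a tool
        messages.append(reply)                 # reasoning stays in transcript
        if "tool_call" not in reply:
            return reply["content"]            # final answer, no tool needed
        result = run_tool(reply["tool_call"])  # execute the requested tool
        messages.append({"role": "tool", "content": result})
    return None  # gave up after max_turns
```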

Overall, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale represent a leap forward in making powerful reasoning models more accessible and efficient. With their open weights and production APIs, they promise to be valuable tools for developers and researchers alike.