DeepSeek AI has launched DeepSeekMath-V2, a model designed to tackle challenging math problems of the kind seen in competitions such as the International Mathematical Olympiad (IMO) and the Putnam Competition. It is built on the larger DeepSeek-V3.2 base model, has 685 billion parameters, and is available for public use on Hugging Face under an open-source license.
In recent evaluations, DeepSeekMath-V2 has posted impressive results: gold-medal-level scores on IMO 2025 and CMO 2024, and 118 of 120 points on Putnam 2024, well above the best human scores.
One of the key innovations of DeepSeekMath-V2 is its focus on proof quality rather than final answers alone. Many existing models are trained with rewards based only on final-answer correctness, so a model can be reinforced for arriving at the right answer through faulty reasoning. DeepSeekMath-V2 instead evaluates whether a proof is logically sound and complete, which is essential for tasks like theorem proving, where the argument itself is the output.
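To make the distinction concrete, here is a minimal sketch contrasting the two reward schemes. This is illustrative only: names like `verifier_score` are assumptions, not DeepSeek's actual API or reward function.

```python
# Hypothetical sketch: final-answer reward vs. proof-quality reward.

def answer_only_reward(predicted: str, reference: str) -> float:
    """Rewards a correct final answer even if the reasoning is flawed."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0

def proof_quality_reward(verifier_score: float, threshold: float = 0.9) -> float:
    """Rewards a proof only when a trained verifier judges it
    logically sound and complete (score assumed to lie in [0, 1])."""
    return 1.0 if verifier_score >= threshold else 0.0

# A proof with the right answer but a gap in the argument gets full
# credit under the first scheme and none under the second.
print(answer_only_reward("42", "42"))  # 1.0
print(proof_quality_reward(0.55))      # 0.0: reasoning judged flawed
```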
The development process of DeepSeekMath-V2 involved training a verifier before the proof generator. This verifier assesses both the problem and the proposed proof, providing a quality score and a natural language analysis. The training data was sourced from various math competitions, ensuring the model learns from a diverse set of proof-style problems.
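As described, the verifier maps a (problem, proof) pair to a quality score plus a natural-language analysis. A minimal sketch of that interface might look like the following, where `call_model` is a hypothetical stand-in for whatever LLM call the real system uses and the scoring format is assumed:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    score: float    # e.g., 0.0 (broken) to 1.0 (sound and complete)
    analysis: str   # natural-language critique of the proof

def call_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns '<score>|<analysis>'."""
    return "0.4|Step 3 assumes n is even without justification."

def verify(problem: str, proof: str) -> Verdict:
    prompt = (
        "Assess whether this proof is logically sound and complete.\n"
        f"Problem: {problem}\nProof: {proof}\n"
        "Reply as: <score>|<analysis>"
    )
    score_str, analysis = call_model(prompt).split("|", 1)
    return Verdict(score=float(score_str), analysis=analysis)

print(verify("Show that n^2 + n is even for all integers n.", "..."))
```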
To make the verifier itself more reliable, a meta-verifier was introduced. This component checks the verifier's analyses for accuracy and consistency, helping to prevent it from generating misleading critiques. This two-tier verification system improves the overall quality of the proofs generated by the model.
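Continuing the `verify` sketch above, the two-tier idea could be expressed as an audit step: a critique is only trusted if the meta-verifier signs off on it. Again, all names and the stub logic are assumptions for illustration.

```python
def meta_verify(problem: str, proof: str, analysis: str) -> bool:
    """Returns True if the critique itself is judged accurate and
    consistent. In the real system this is another trained model;
    here it is a placeholder heuristic."""
    return "assumes" in analysis

def trusted_verdict(problem: str, proof: str):
    verdict = verify(problem, proof)  # from the previous sketch
    if meta_verify(problem, proof, verdict.analysis):
        return verdict                # critique survives the audit
    return None                       # discard a potentially misleading critique
```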
Once the verifier was in place, the team trained the proof generator. This generator not only produces solutions but also includes a self-analysis that aligns with the verifier's criteria. The model learns to improve its proofs through sequential refinement, iteratively revising its output based on the verifier's feedback.
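A sequential-refinement loop of this kind might be sketched as follows, reusing the `verify` interface above. The `generate` stub, the round budget, and the acceptance threshold are all assumptions, not the paper's actual values:

```python
def generate(problem: str, critique: str = "") -> str:
    """Placeholder for the proof generator; a real call would
    condition on the problem and any prior critique."""
    return f"Proof attempt for: {problem} (addressing: {critique or 'n/a'})"

def refine(problem: str, max_rounds: int = 4, threshold: float = 0.9) -> str:
    proof = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, proof)   # verifier sketch above
        if verdict.score >= threshold:
            return proof                   # accepted as sound and complete
        proof = generate(problem, verdict.analysis)  # carry feedback forward
    return proof  # best effort after the iteration budget is spent
```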
As training progresses, the generator produces increasingly sophisticated proofs. To keep the training data current, an automatic labeling system was created: it uses the verifier's analyses to classify new proofs as correct or incorrect, reducing the need for human annotation.
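One plausible reading of this pipeline, sketched below with the same assumed `verify` interface: run the verifier several times and keep a proof as a positive training example only if every pass finds no serious flaw. The vote count and threshold are illustrative choices, not values from the paper.

```python
def auto_label(problem: str, proof: str, passes: int = 3,
               threshold: float = 0.9) -> str:
    """Label a generated proof for training without human review."""
    verdicts = [verify(problem, proof) for _ in range(passes)]
    if all(v.score >= threshold for v in verdicts):
        return "correct"    # usable as a positive training example
    return "incorrect"      # routed to negative examples or discarded
```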
DeepSeekMath-V2 has been benchmarked against other leading models and has consistently outperformed them on competition-style proof problems, demonstrating its effectiveness on complex mathematics. Its results on the IMO and Putnam highlight its potential to assist students and researchers in mathematical reasoning.
In summary, DeepSeekMath-V2 represents a significant advancement in AI’s ability to understand and generate mathematical proofs. By prioritizing logical reasoning and proof quality, it sets a new standard for AI in mathematics, making it a valuable tool for anyone looking to tackle challenging mathematical problems.