DeepSeek-V3.2
Pushing the Frontier of Open Large Language Models
DeepSeek-AI ⢠2025
TL;DR
DeepSeek-V3.2 is an open-source model that matches GPT-5 and approaches Gemini-3.0-Pro through three key innovations: a new sparse attention mechanism (DSA), massive reinforcement learning compute (10%+ of pre-training cost), and synthetic agentic task generation at scale.
The Problem
Open-source models have been falling behind closed-source ones. DeepSeek identified three critical gaps:
Attention Bottleneck
Standard attention is O(L²) - quadratic complexity kills performance on long sequences
Insufficient Post-Training
Open models don't invest enough compute in RL/fine-tuning after pre-training
Weak Agent Capabilities
Open models struggle with tool use, instruction following, and real-world tasks
DeepSeek Sparse Attention (DSA)
Traditional Attention
Every token attends to every other token. O(L²) complexity.
DSA (Sparse)
Lightning indexer selects top-k relevant tokens. O(Lk) complexity.
How DSA Works
Lightning Indexer
Computes relevance scores between query and all keys (cheap, FP8)
Top-k Selection
Selects only the 2048 most relevant tokens
Sparse Attention
Full attention computed only on selected tokens
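A minimal PyTorch sketch of the three steps above for a single query position. The shapes, the indexer's scoring function, and the FP8 details are illustrative simplifications, not DeepSeek's actual kernel.

```python
import torch
import torch.nn.functional as F

def dsa_attention(q, k, v, idx_q, idx_k, top_k=2048):
    """Sketch of DeepSeek Sparse Attention for one query position.

    q:      (d,)        query for the current token
    k, v:   (L, d)      keys/values for all L context tokens
    idx_q:  (d_idx,)    lightweight indexer query (cheap projection)
    idx_k:  (L, d_idx)  lightweight indexer keys
    """
    L = k.shape[0]

    # 1. Lightning indexer: cheap relevance score for every context token.
    #    (In the paper this runs in FP8; here it is a plain dot product.)
    scores = idx_k @ idx_q                          # (L,)

    # 2. Top-k selection: keep only the most relevant positions.
    n_sel = min(top_k, L)
    sel = torch.topk(scores, n_sel).indices         # (n_sel,)

    # 3. Full attention restricted to the selected tokens: O(L*k) overall
    #    across all query positions instead of O(L^2).
    attn = F.softmax((k[sel] @ q) / k.shape[-1] ** 0.5, dim=-1)  # (n_sel,)
    return attn @ v[sel]                            # (d,)
```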
Inference Cost Savings
DSA dramatically reduces costs at longer context lengths while maintaining quality
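For intuition, assume a 128K-token context (an illustrative figure, not one quoted above): with k = 2048 selected tokens, each query position computes full attention over 2,048 tokens instead of 131,072, roughly a 64× reduction in attention work per position, while the lightning indexer's O(L) scoring pass stays cheap because it is lightweight and runs in FP8.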
Scaled Reinforcement Learning
10%+ of pre-training compute spent on RL post-training
GRPO Algorithm Improvements
- Unbiased KL Estimate
Fixes gradient bias when tokens have low probability under current policy
- Off-Policy Sequence Masking
Masks negative samples with high policy divergence to stabilize training
- Keep Routing
Preserves MoE expert routing paths between inference and training
- Keep Sampling Mask
Maintains top-p/top-k truncation masks for consistent action spaces
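A minimal sketch of the first two fixes, assuming per-token log-probabilities are available from the current, rollout, and reference policies. The k3-style KL estimator and the divergence threshold below are illustrative stand-ins for the paper's exact formulation.

```python
import torch

def grpo_kl_and_mask(logp_new, logp_old, logp_ref, advantages,
                     divergence_threshold=0.5):
    """Sketch of two GRPO stabilizers (illustrative, not the paper's exact code).

    logp_new, logp_old, logp_ref: (B, T) per-token log-probs under the current
        policy, the rollout (old) policy, and the frozen reference model.
    advantages: (B,) per-sequence group-relative advantages.
    """
    # Unbiased KL estimate (k3-style): r - log r - 1 with r = p_ref / p_new.
    # Its expectation under the current policy equals KL(new || ref), and it
    # stays well-behaved for tokens the current policy assigns low probability.
    log_ratio = logp_ref - logp_new
    kl_est = torch.exp(log_ratio) - log_ratio - 1.0            # (B, T)

    # Off-policy sequence masking: drop negative-advantage sequences whose
    # rollout policy has drifted too far from the current policy.
    seq_divergence = (logp_old - logp_new).mean(dim=-1).abs()  # (B,)
    keep = ~((advantages < 0) & (seq_divergence > divergence_threshold))
    return kl_est, keep.float()                                # per-sequence mask
```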
Training Pipeline
Specialist Distillation
Mixed RL Training
Large-Scale Agentic Task Synthesis
Code Agent
Search Agent
General Agent
Code Interpreter
Synthesis Pipeline
Environment Construction
Agent creates databases, retrieves data from the web, and builds sandbox environments
Tool Synthesis
Generates task-specific tools as Python functions
Task Generation
Creates verifiable tasks with solutions and verification functions
Difficulty Scaling
Iteratively increases complexity while maintaining verifiability
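A sketch of what a "hard-to-solve, easy-to-verify" task record from this pipeline might look like; the schema, field names, and `scale_difficulty` helper are hypothetical illustrations, not the paper's pipeline code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SyntheticAgentTask:
    """One verifiable synthetic task (illustrative schema)."""
    prompt: str                    # task description shown to the agent
    environment: dict              # e.g. sandbox config, seeded database, tools
    reference_solution: str        # kept for auditing, never shown to the agent
    verify: Callable[[str], bool]  # cheap programmatic check of the agent output

def scale_difficulty(task: SyntheticAgentTask, level: int) -> SyntheticAgentTask:
    """Placeholder for step 4: make the prompt/environment harder while reusing
    the same verification function, so the task stays checkable."""
    harder_prompt = f"{task.prompt}\n(Constraint level {level}: ...)"
    return SyntheticAgentTask(harder_prompt, task.environment,
                              task.reference_solution, task.verify)

# Example: a trivial code-agent task whose verifier inspects the answer.
task = SyntheticAgentTask(
    prompt="Write a Python function `add(a, b)` that returns a + b.",
    environment={"tools": ["python_interpreter"]},
    reference_solution="def add(a, b):\n    return a + b",
    verify=lambda out: "def add" in out,  # real verifiers would execute tests
)
```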
Thinking Context Management
Key insight: Reasoning traces are only discarded when a new user message arrives, not on tool outputs. This prevents redundant re-reasoning.
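A sketch of that pruning rule over a hypothetical message schema (roles plus an optional `reasoning` field); the actual DeepSeek message format may differ.

```python
def prune_history(messages):
    """Drop reasoning traces only once a *new user message* arrives.

    Tool outputs keep the existing trace, so the model does not re-derive its
    plan on every tool call. `messages` is a list of dicts with 'role' in
    {'user', 'assistant', 'tool'} and an optional 'reasoning' field.
    """
    last_user = max((i for i, m in enumerate(messages)
                     if m["role"] == "user"), default=-1)
    pruned = []
    for i, m in enumerate(messages):
        if m["role"] == "assistant" and i < last_user:
            # Reasoning produced before the latest user turn is discarded ...
            m = {k: v for k, v in m.items() if k != "reasoning"}
        # ... but traces after the latest user turn (i.e. across tool calls)
        # are kept verbatim.
        pruned.append(m)
    return pruned
```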
Results
Reasoning Benchmarks
Agentic Benchmarks
DeepSeek-V3.2-Speciale
Extended Thinking: By relaxing length constraints, Speciale achieves gold-medal performance in:
IMO 2025
CMO 2025
IOI 2025
ICPC WF 2025
Key Takeaways
Sparse Attention is Production-Ready
DSA maintains quality while dramatically reducing costs on long contexts. The lightning indexer + top-k selection pattern is elegant and efficient.
RL Compute Matters
Spending 10%+ of pre-training compute on RL post-training unlocks significant capability gains. Most open models underinvest here.
Synthetic Data Works
Automatically synthesized agentic tasks generalize to real-world benchmarks. Hard-to-solve, easy-to-verify tasks enable scalable RL.
Context Management for Agents
Keeping reasoning traces during tool calls (until new user message) prevents wasteful re-reasoning. Simple but high-impact.
Limitations
- Less world knowledge than Gemini-3.0-Pro due to fewer pre-training FLOPs
- Token efficiency is still lower: it needs more tokens to match Gemini's output quality
- Performance on complex tasks still trails frontier closed-source models