šŸ”/papers/25/

DeepSeek-V3.2

Pushing the Frontier of Open Large Language Models

DeepSeek-AI • 2025

TL;DR

DeepSeek-V3.2 is an open-source model that matches GPT-5 and approaches Gemini-3.0-Pro through three key innovations: a new sparse attention mechanism (DSA), massive reinforcement learning compute (10%+ of pre-training cost), and synthetic agentic task generation at scale.

šŸŽÆ The Problem

Open-source models have been falling behind closed-source ones. DeepSeek identified three critical gaps:

⚔ #1 Attention Bottleneck: standard attention is O(L²); quadratic complexity kills performance on long sequences.

šŸ’° #2 Insufficient Post-Training: open models don't invest enough compute in RL and fine-tuning after pre-training.

šŸ¤– #3 Weak Agent Capabilities: open models struggle with tool use, instruction following, and real-world tasks.

1. DeepSeek Sparse Attention (DSA)

[Figure: attention matrices, queries on rows and keys on columns, dense vs. sparse]

Traditional Attention: every token attends to every other token. O(L²) complexity.

DSA (Sparse): a lightning indexer selects the top-k relevant tokens per query. O(Lk) complexity.
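To make the asymptotic gap concrete, here is a back-of-the-envelope comparison at the 128K context length with k = 2048 (both figures from this writeup). It is illustrative only: it counts query-key scores for the attention step and ignores the lightning indexer, which still touches all keys but runs cheaply in FP8.

```python
# Back-of-the-envelope: query-key score computations per forward pass.
L = 128_000  # context length (128K tokens)
k = 2_048    # top-k tokens kept per query

dense = L * L   # O(L^2): every query scores every key
sparse = L * k  # O(Lk): every query scores only the k selected keys

print(f"dense:  {dense:.2e}")
print(f"sparse: {sparse:.2e}")
print(f"ratio:  {dense / sparse:.1f}x fewer scores")  # 62.5x at 128K
```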

How DSA Works

  1. Lightning Indexer: computes relevance scores between the query and all keys (cheap, FP8).
  2. Top-k Selection: keeps only the 2048 most relevant tokens.
  3. Sparse Attention: full attention is computed only over the selected tokens (sketched below).
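A minimal single-head sketch of this select-then-attend pattern in PyTorch. The real DSA uses a separate lightweight FP8 indexer network, multi-head attention, and causal masking, all omitted here; names like `indexer_q` and `indexer_k` are illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_q, indexer_k, top_k=2048):
    """Select-then-attend sketch (single head, no causal mask).

    q, k, v:              [L, d]   full-precision attention inputs
    indexer_q, indexer_k: [L, d_i] cheap low-dim indexer projections
    """
    L = q.shape[0]
    top_k = min(top_k, L)

    # 1) Lightning indexer: cheap relevance score of every key per query.
    scores = indexer_q @ indexer_k.T                   # [L, L]

    # 2) Top-k selection: indices of the k most relevant keys per query.
    idx = scores.topk(top_k, dim=-1).indices           # [L, top_k]

    # 3) Sparse attention: full attention only over the selected keys.
    k_sel = k[idx]                                     # [L, top_k, d]
    v_sel = v[idx]                                     # [L, top_k, d]
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / q.shape[-1] ** 0.5
    weights = F.softmax(attn, dim=-1)                  # [L, top_k]
    return torch.einsum("lk,lkd->ld", weights, v_sel)  # [L, d]
```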

Inference Cost Savings

[Chart: per-token inference cost ($0 to $0.7) vs. context length (0K to 128K), V3.1-Terminus vs. V3.2 (DSA)]

DSA dramatically reduces costs at longer context lengths while maintaining quality.

2. Scaled Reinforcement Learning

>10% of pre-training compute spent on RL post-training

GRPO Algorithm Improvements

  • Unbiased KL Estimate: fixes gradient bias when tokens have low probability under the current policy.
  • Off-Policy Sequence Masking: masks negative samples with high policy divergence to stabilize training.
  • Keep Routing: preserves MoE expert routing paths between inference and training.
  • Keep Sampling Mask: maintains top-p/top-k truncation masks for a consistent action space (see the sketch after this list).
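A heavily simplified sketch of two of these fixes, off-policy sequence masking and the kept sampling mask, as they might appear inside a GRPO-style training step. The threshold name `max_kl_div`, the divergence estimator, and the overall structure are assumptions for illustration; the paper's exact estimators and hyperparameters are not reproduced here.

```python
import torch

def grpo_sequence_mask(logp_new, logp_old, advantages, max_kl_div=0.5):
    """Off-policy sequence masking (illustrative sketch).

    logp_new, logp_old: [B, T] per-token log-probs under the current
                        policy / the policy that sampled the rollouts
    advantages:         [B]    per-sequence group-relative advantages
    """
    # Crude per-sequence divergence between sampling and current policy.
    per_seq_kl = (logp_old - logp_new).sum(dim=-1)          # [B]

    # Drop negative samples that have drifted too far off-policy,
    # which would otherwise destabilize the gradient.
    keep = ~((advantages < 0) & (per_seq_kl > max_kl_div))  # [B] bool
    return keep

def apply_sampling_mask(logits, sampled_mask):
    """Keep sampling mask: re-apply the top-p/top-k truncation used at
    inference so training sees the same action space.

    sampled_mask: bool tensor, True for tokens kept during sampling.
    """
    return logits.masked_fill(~sampled_mask, float("-inf"))
```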

Training Pipeline

  1. Specialist Distillation: distill from specialist models for math, coding, reasoning, agents, and search.
  2. Mixed RL Training: reasoning + agent + alignment in one stage.
3. Large-Scale Agentic Task Synthesis

  • Code Agent: 24,667 tasks (environments: real; tasks: extracted)
  • Search Agent: 50,275 tasks (environments: real; tasks: synthesized)
  • General Agent: 4,417 tasks (environments: synthesized; tasks: synthesized)
  • Code Interpreter: 5,908 tasks (environments: real; tasks: extracted)

Synthesis Pipeline

  1. Environment Construction: an agent creates databases, retrieves data from the web, and builds sandbox environments.
  2. Tool Synthesis: generates task-specific tools as Python functions.
  3. Task Generation: creates verifiable tasks with solutions and verification functions (see the sketch after this list).
  4. Difficulty Scaling: iteratively increases complexity while maintaining verifiability.
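A minimal sketch of what such a "verifiable task" record might look like. The field names, the `verify` signature, and the example values are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerifiableTask:
    """Hard-to-solve, easy-to-verify task record (illustrative schema)."""
    prompt: str                     # task description shown to the agent
    tools: dict[str, Callable]      # synthesized Python tools, by name
    reference_solution: str         # known-good solution trace
    verify: Callable[[str], bool]   # cheap programmatic checker

# Example: the verifier only checks the final answer, so grading stays
# cheap even when solving requires a long tool-use trajectory.
task = VerifiableTask(
    prompt="Using the sales database, report total Q3 revenue.",
    tools={"run_sql": lambda query: ...},   # sandboxed tool stub
    reference_solution="SELECT SUM(amount) ...",
    verify=lambda answer: "1,284,500" in answer,
)
```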

Thinking Context Management

Turn 1.1: user message 1 → thinking 1.1 → tool call 1.1
Turn 1.2: user message 1 → thinking 1.1 (kept) → tool call 1.1 → tool result 1.1 → thinking 1.2
Turn 2.1 (new user): user message 1 → tool call 1.1 → tool result 1.1 → [thinking discarded] → user message 2

Key insight: Reasoning traces are only discarded when a new user message arrives, not on tool outputs. This prevents redundant re-reasoning.
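A minimal sketch of this policy, assuming a simple list-of-messages context. The role names and the function itself are illustrative, not DeepSeek's actual serving logic.

```python
def update_context(context, new_message):
    """Keep reasoning traces across tool calls; drop them only when a
    new user message starts a fresh turn (illustrative policy).

    context:     list of {"role": ..., "content": ...} dicts
    new_message: the incoming message to append
    """
    if new_message["role"] == "user":
        # New user turn: discard prior thinking, keep calls/results.
        context = [m for m in context if m["role"] != "thinking"]
    # Tool results (and everything else) leave thinking intact, so the
    # model never re-derives reasoning it already produced this turn.
    return context + [new_message]
```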

šŸ“Š Results

Reasoning Benchmarks

  • AIME 2025: 93.1% (GPT-5: 94.6)
  • HMMT Feb 2025: 92.5% (Gemini: 97.5)
  • LiveCodeBench: 83.3% (Gemini: 90.7)
  • Codeforces rating: 2386 (Gemini: 2708)

Agentic Benchmarks

  • SWE-bench Verified: 73.1% (Claude: 77.2)
  • Terminal Bench: 46.4% (Gemini: 54.2)
  • τ²-Bench: 80.3% (Gemini: 85.4)
  • Tool-Decathlon: 35.2% (Claude: 38.6)
šŸ†

DeepSeek-V3.2-Speciale

Extended Thinking

By relaxing length constraints, Speciale achieves gold medal performance in:

  šŸ„‡ IMO 2025: 35/42 (Gold)
  šŸ„‡ CMO 2025: 102/126 (Gold)
  šŸ„‡ IOI 2025: 492/600 (Gold)
  šŸ„‡ ICPC World Finals 2025: 10/12 (Gold)
šŸ’” Key Takeaways

1. Sparse Attention Is Production-Ready

DSA maintains quality while dramatically reducing costs on long contexts. The lightning-indexer + top-k selection pattern is elegant and efficient.

2. RL Compute Matters

Spending 10%+ of pre-training compute on RL post-training unlocks significant capability gains. Most open models underinvest here.

3. Synthetic Data Works

Automatically synthesized agentic tasks generalize to real-world benchmarks. Hard-to-solve, easy-to-verify tasks enable scalable RL.

4. Context Management for Agents

Keeping reasoning traces during tool calls (until a new user message arrives) prevents wasteful re-reasoning. Simple but high-impact.

Limitations

  • Less world knowledge than Gemini-3.0-Pro due to fewer pre-training FLOPs
  • Lower token efficiency: the model needs more tokens to match Gemini's output quality
  • Performance on complex tasks still below frontier closed-source models