šŸ”/papers/25/

DeepSeek-V3.2

Pushing the Frontier of Open Large Language Models

DeepSeek-AI • 2025

TL;DR

DeepSeek-V3.2 is an open-source model that matches GPT-5 and approaches Gemini-3.0-Pro through three key innovations: a new sparse attention mechanism (DSA), massive reinforcement learning compute (10%+ of pre-training cost), and synthetic agentic task generation at scale.

šŸŽÆ The Problem

Open-source models have been falling behind closed-source ones. DeepSeek identified three critical gaps:

⚔ #1 Attention Bottleneck: standard attention is O(L²); quadratic complexity kills performance on long sequences.

šŸ’° #2 Insufficient Post-Training: open models don't invest enough compute in RL and fine-tuning after pre-training.

šŸ¤– #3 Weak Agent Capabilities: open models struggle with tool use, instruction following, and real-world tasks.

1. DeepSeek Sparse Attention (DSA)

[Figure: attention matrices, queries on rows and keys on columns, dense vs. sparse]

Traditional Attention: every token attends to every other token. O(L²) complexity.

DSA (Sparse): a lightning indexer selects the top-k relevant tokens per query. O(Lk) complexity.
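To make the asymptotic gap concrete, here is a back-of-the-envelope comparison at the 128K context length with k = 2048 (both figures from this writeup). It is illustrative only: it counts query-key scores for the attention step and ignores the lightning indexer, which still touches all keys but runs cheaply in FP8.

```python
# Back-of-the-envelope: query-key score computations per forward pass.
L = 128_000  # context length (128K tokens)
k = 2_048    # top-k tokens kept per query

dense = L * L   # O(L^2): every query scores every key
sparse = L * k  # O(Lk): every query scores only the k selected keys

print(f"dense:  {dense:.2e}")
print(f"sparse: {sparse:.2e}")
print(f"ratio:  {dense / sparse:.1f}x fewer scores")  # 62.5x at 128K
```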

How DSA Works

  1. Lightning Indexer: computes relevance scores between the query and all keys (cheap, FP8).
  2. Top-k Selection: keeps only the 2048 most relevant tokens.
  3. Sparse Attention: full attention is computed only over the selected tokens (sketched below).
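A minimal single-head sketch of this select-then-attend pattern in PyTorch. The real DSA uses a separate lightweight FP8 indexer network, multi-head attention, and causal masking, all omitted here; names like `indexer_q` and `indexer_k` are illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_q, indexer_k, top_k=2048):
    """Select-then-attend sketch (single head, no causal mask).

    q, k, v:              [L, d]   full-precision attention inputs
    indexer_q, indexer_k: [L, d_i] cheap low-dim indexer projections
    """
    L = q.shape[0]
    top_k = min(top_k, L)

    # 1) Lightning indexer: cheap relevance score of every key per query.
    scores = indexer_q @ indexer_k.T                   # [L, L]

    # 2) Top-k selection: indices of the k most relevant keys per query.
    idx = scores.topk(top_k, dim=-1).indices           # [L, top_k]

    # 3) Sparse attention: full attention only over the selected keys.
    k_sel = k[idx]                                     # [L, top_k, d]
    v_sel = v[idx]                                     # [L, top_k, d]
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / q.shape[-1] ** 0.5
    weights = F.softmax(attn, dim=-1)                  # [L, top_k]
    return torch.einsum("lk,lkd->ld", weights, v_sel)  # [L, d]
```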

Inference Cost Savings

[Chart: per-token inference cost ($0 to $0.7) vs. context length (0K to 128K), V3.1-Terminus vs. V3.2 (DSA)]

DSA dramatically reduces costs at longer context lengths while maintaining quality.

2. Scaled Reinforcement Learning

>10% of pre-training compute spent on RL post-training

GRPO Algorithm Improvements

  • Unbiased KL Estimate: fixes gradient bias when tokens have low probability under the current policy.
  • Off-Policy Sequence Masking: masks negative samples with high policy divergence to stabilize training.
  • Keep Routing: preserves MoE expert routing paths between inference and training.
  • Keep Sampling Mask: maintains top-p/top-k truncation masks for a consistent action space (see the sketch after this list).
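A heavily simplified sketch of two of these fixes, off-policy sequence masking and the kept sampling mask, as they might appear inside a GRPO-style training step. The threshold name `max_kl_div`, the divergence estimator, and the overall structure are assumptions for illustration; the paper's exact estimators and hyperparameters are not reproduced here.

```python
import torch

def grpo_sequence_mask(logp_new, logp_old, advantages, max_kl_div=0.5):
    """Off-policy sequence masking (illustrative sketch).

    logp_new, logp_old: [B, T] per-token log-probs under the current
                        policy / the policy that sampled the rollouts
    advantages:         [B]    per-sequence group-relative advantages
    """
    # Crude per-sequence divergence between sampling and current policy.
    per_seq_kl = (logp_old - logp_new).sum(dim=-1)          # [B]

    # Drop negative samples that have drifted too far off-policy,
    # which would otherwise destabilize the gradient.
    keep = ~((advantages < 0) & (per_seq_kl > max_kl_div))  # [B] bool
    return keep

def apply_sampling_mask(logits, sampled_mask):
    """Keep sampling mask: re-apply the top-p/top-k truncation used at
    inference so training sees the same action space.

    sampled_mask: bool tensor, True for tokens kept during sampling.
    """
    return logits.masked_fill(~sampled_mask, float("-inf"))
```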

Training Pipeline

  1. Specialist Distillation: distill from specialist models for math, coding, reasoning, agents, and search.
  2. Mixed RL Training: reasoning + agent + alignment in one stage.
3. Large-Scale Agentic Task Synthesis

  • Code Agent: 24,667 tasks (environments: real; tasks: extracted)
  • Search Agent: 50,275 tasks (environments: real; tasks: synthesized)
  • General Agent: 4,417 tasks (environments: synthesized; tasks: synthesized)
  • Code Interpreter: 5,908 tasks (environments: real; tasks: extracted)

Synthesis Pipeline

  1. Environment Construction: an agent creates databases, retrieves data from the web, and builds sandbox environments.
  2. Tool Synthesis: generates task-specific tools as Python functions.
  3. Task Generation: creates verifiable tasks with solutions and verification functions (see the sketch after this list).
  4. Difficulty Scaling: iteratively increases complexity while maintaining verifiability.
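A minimal sketch of what such a "verifiable task" record might look like. The field names, the `verify` signature, and the example values are assumptions for illustration, not the paper's schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerifiableTask:
    """Hard-to-solve, easy-to-verify task record (illustrative schema)."""
    prompt: str                     # task description shown to the agent
    tools: dict[str, Callable]      # synthesized Python tools, by name
    reference_solution: str         # known-good solution trace
    verify: Callable[[str], bool]   # cheap programmatic checker

# Example: the verifier only checks the final answer, so grading stays
# cheap even when solving requires a long tool-use trajectory.
task = VerifiableTask(
    prompt="Using the sales database, report total Q3 revenue.",
    tools={"run_sql": lambda query: ...},   # sandboxed tool stub
    reference_solution="SELECT SUM(amount) ...",
    verify=lambda answer: "1,284,500" in answer,
)
```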

Thinking Context Management

Turn 1.1: user message 1 → thinking 1.1 → tool call 1.1
Turn 1.2: user message 1 → thinking 1.1 (kept) → tool call 1.1 → tool result 1.1 → thinking 1.2
Turn 2.1 (new user): user message 1 → tool call 1.1 → tool result 1.1 → [thinking discarded] → user message 2

Key insight: Reasoning traces are only discarded when a new user message arrives, not on tool outputs. This prevents redundant re-reasoning.
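A minimal sketch of this policy, assuming a simple list-of-messages context. The role names and the function itself are illustrative, not DeepSeek's actual serving logic.

```python
def update_context(context, new_message):
    """Keep reasoning traces across tool calls; drop them only when a
    new user message starts a fresh turn (illustrative policy).

    context:     list of {"role": ..., "content": ...} dicts
    new_message: the incoming message to append
    """
    if new_message["role"] == "user":
        # New user turn: discard prior thinking, keep calls/results.
        context = [m for m in context if m["role"] != "thinking"]
    # Tool results (and everything else) leave thinking intact, so the
    # model never re-derives reasoning it already produced this turn.
    return context + [new_message]
```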

šŸ“Š Results

Reasoning Benchmarks

  • AIME 2025: 93.1% (GPT-5: 94.6)
  • HMMT Feb 2025: 92.5% (Gemini: 97.5)
  • LiveCodeBench: 83.3% (Gemini: 90.7)
  • Codeforces rating: 2386 (Gemini: 2708)

Agentic Benchmarks

  • SWE-bench Verified: 73.1% (Claude: 77.2)
  • Terminal Bench: 46.4% (Gemini: 54.2)
  • τ²-Bench: 80.3% (Gemini: 85.4)
  • Tool-Decathlon: 35.2% (Claude: 38.6)
šŸ†

DeepSeek-V3.2-Speciale

Extended Thinking

By relaxing length constraints, Speciale achieves gold medal performance in:

  šŸ„‡ IMO 2025: 35/42 (Gold)
  šŸ„‡ CMO 2025: 102/126 (Gold)
  šŸ„‡ IOI 2025: 492/600 (Gold)
  šŸ„‡ ICPC World Finals 2025: 10/12 (Gold)
šŸ’” Key Takeaways

1. Sparse Attention Is Production-Ready

DSA maintains quality while dramatically reducing costs on long contexts. The lightning-indexer + top-k selection pattern is elegant and efficient.

2. RL Compute Matters

Spending 10%+ of pre-training compute on RL post-training unlocks significant capability gains. Most open models underinvest here.

3. Synthetic Data Works

Automatically synthesized agentic tasks generalize to real-world benchmarks. Hard-to-solve, easy-to-verify tasks enable scalable RL.

4. Context Management for Agents

Keeping reasoning traces during tool calls (until a new user message arrives) prevents wasteful re-reasoning. Simple but high-impact.

Limitations

  • Less world knowledge than Gemini-3.0-Pro due to fewer pre-training FLOPs
  • Lower token efficiency: the model needs more tokens to match Gemini's output quality
  • Performance on complex tasks still below frontier closed-source models