Status — V4 is released. DeepSeek launched the V4 Preview on April 24, 2026, shipping two production-ready open-weights models — V4-Pro and V4-Flash — under the MIT license. This report is being built incrementally; see the changelog for what’s been added.

How to read this report

This page is the executive summary. Pick the path that matches what you came for from the site map below.

Total reading time end-to-end: ~25 minutes. Each subpage is independently navigable.

Cover page of DeepSeek_V4.pdf. The headline chart compares V4-Pro-Max to Claude Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High; the right panels show V4’s FLOPs and KV-cache reductions vs V3.2 at 1M context.

Official DeepSeek V4 specification card. Source: api-docs.deepseek.com.


At a glance

                      DeepSeek V4-Pro                         DeepSeek V4-Flash
Total params          1.6T                                    284B
Active params         49B                                     13B
Context               1M tokens                               1M tokens
Architecture          MoE + DeepSeek Sparse Attention (DSA)   MoE + DeepSeek Sparse Attention (DSA)
Modes                 Thinking / Non-Thinking                 Thinking / Non-Thinking
License               MIT (open weights)                      MIT (open weights)
API model ID          deepseek-v4-pro                         deepseek-v4-flash
Input (cache-miss)    $1.74 / 1M tokens                       $0.14 / 1M tokens
Input (cache-hit)     $0.145 / 1M tokens                      $0.028 / 1M tokens
Output                $3.48 / 1M tokens                       $0.28 / 1M tokens

Sources: DeepSeek API Docs — V4 Preview Release · DeepSeek API Pricing
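The list prices above translate directly into a per-request cost estimate. A minimal sketch, with the rates hardcoded from the table; `request_cost` is an illustrative helper, not part of any DeepSeek SDK:

```python
# Per-request cost estimate from the list prices above (USD per 1M tokens).
PRICES = {
    "deepseek-v4-pro":   {"in_miss": 1.74, "in_hit": 0.145, "out": 3.48},
    "deepseek-v4-flash": {"in_miss": 0.14, "in_hit": 0.028, "out": 0.28},
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """USD cost; cache_hit_ratio is the fraction of input tokens billed
    at the discounted cache-hit rate."""
    p = PRICES[model]
    hit = input_tokens * cache_hit_ratio
    miss = input_tokens - hit
    usd = miss * p["in_miss"] + hit * p["in_hit"] + output_tokens * p["out"]
    return usd / 1_000_000             # prices are quoted per 1M tokens

# A 100K-token prompt, 80% served from cache, with a 4K-token answer:
print(f'{request_cost("deepseek-v4-pro", 100_000, 4_000, 0.8):.4f}')  # → 0.0603
```

At long contexts the cache-hit discount dominates: the same V4-Pro request with a cold cache costs roughly three times as much.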


What’s new vs V3

  • DeepSeek Sparse Attention (DSA) — token-wise compression that, at 1M-token context, brings V4-Pro to roughly 27% of the single-token inference FLOPs and 10% of the KV cache of V3.2.
  • 1M context as the default across all official services.
  • Two-tier lineup (Pro for frontier reasoning, Flash for cost-sensitive deployment).
  • Hybrid Thinking / Non-Thinking modes in a single model — replacing the V3-era split between deepseek-chat and deepseek-reasoner.
  • Aggressive pricing — V4-Flash output at $0.28 / 1M tokens is roughly an order of magnitude below frontier closed models.
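The shape of the DSA savings can be illustrated with back-of-envelope arithmetic: if a narrow indexer scores all L cached tokens cheaply and full attention is then applied only to a selected top-k, per-token attention work drops from O(L·d) to O(L·d_index + k·d). The constants below (`k`, `d_index`, head width `d`) are purely illustrative assumptions, not V4's actual values, and the report's 27% figure is an end-to-end number that this sketch does not reproduce:

```python
# Back-of-envelope: per-token attention cost scales with how many keys the
# query touches. Dense attention touches all L cached tokens at full width d;
# a sparse scheme scores all L with a narrow indexer (width d_index), then
# runs full attention over only the top-k selected tokens.
def dense_cost(L: int, d: int) -> int:
    return L * d                       # full attention over every cached token

def sparse_cost(L: int, d: int, k: int, d_index: int) -> int:
    return L * d_index + k * d         # cheap scoring pass + full attention on k

L, d = 1_000_000, 128                  # 1M-token context, illustrative head width
k, d_index = 2048, 16                  # illustrative top-k and indexer width
print(f"{sparse_cost(L, d, k, d_index) / dense_cost(L, d):.3f}")  # → 0.127
```

The key property is that the expensive term no longer grows with context length, which is also what drives the KV-cache reduction at 1M tokens.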

Site map

Thirteen pages, grouped by intended use. The sidebar mirrors this grouping.

Read first

  • The V4 thesis — an editorial summary of six observations about V4 that will still matter a year from now.
  • API documentation — endpoints, auth, JSON mode, function calling, cost calculator, cross-vendor pricing, alternative providers, “Choosing V4-Pro vs V4-Flash” decision flow with concrete app patterns.
  • V3 → V4 migration — one-line code change, July 24 retirement deadline, cost recalibration with explicit ratios.

Detail

  • News & timeline — release timeline, official announcement quotes, press coverage, community reception (Hacker News + Simon Willison + community quants), industry reactions, geopolitical / IP controversy.
  • Technical details — config-backed architecture (CSA + HCA, mHC, MoE expert counts), V3 → V3.2 → V4 diff, training pipeline, training-stability candor, FP4/FP8 quantisation, architecture-decisions journal.
  • Benchmarks — official Tables 1, 6, and 7 verbatim, formal-reasoning Putnam regimes, MRCR long-context curve, reasoning-effort scaling, win-rate analysis vs Opus.
  • Self-hosting — hardware budgets per variant, serving framework matrix (NIM / SGLang / vLLM / GGUF / MLX), real-world throughput numbers, when the API is the better answer.
  • Independent testing — three reproducible test prompts with a Python harness (coding, reasoning, 750K-token needle).
  • Limitations & safety — DeepSeek’s own Section 6 candor, V3-era red-team findings that likely extrapolate, content-filtering on the hosted API.

Reference

  • References — full bibliography with access dates.
  • Glossary — ~25 acronyms (CSA, HCA, DSA, MLA, mHC, MoE, OPD, GRPO, YaRN, etc.) defined in one place.
  • Errata — material corrections to past claims, dated and traceable.

Cross-site:

  • Changelog — dated log of every iteration’s contribution.

Migration note

DeepSeek’s existing deepseek-chat and deepseek-reasoner endpoints currently route to V4-Flash in Non-Thinking and Thinking modes respectively. Both will be fully retired after 2026-07-24, 15:59 UTC. New code should target deepseek-v4-pro or deepseek-v4-flash directly.

Source: DeepSeek API Docs — V4 Preview Release.
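For V3-era code the change really is the model id; request code is otherwise unchanged on the OpenAI-compatible API. A minimal sketch of the swap, where `migrate` is an illustrative helper rather than part of any DeepSeek SDK:

```python
# V3-era model ids and their V4 replacements, per the retirement note above.
REPLACEMENT = {
    "deepseek-chat":     "deepseek-v4-flash",  # currently routed as Non-Thinking
    "deepseek-reasoner": "deepseek-v4-flash",  # currently routed as Thinking
}

def migrate(model_id: str) -> str:
    """Swap a retired V3-era model id for its V4 replacement; other ids pass through."""
    return REPLACEMENT.get(model_id, model_id)

# An existing call site changes only its model argument, e.g.:
# client.chat.completions.create(model=migrate("deepseek-chat"), messages=...)
```

Note that both retired endpoints map to V4-Flash; code that relied on deepseek-reasoner for frontier reasoning should weigh moving to deepseek-v4-pro instead.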


Citing this report

This is a living document. If you reference a specific claim, link to the section anchor and note the access date — the page may have been re-verified or corrected since.

Suggested citation format:

1011-a. (2026). DeepSeek V4 — research report. Reports.
Retrieved <ACCESS_DATE> from https://1011-a.github.io/reports/deepseek-v4/

BibTeX:

@misc{1011a_deepseekv4_2026,
  author       = {{1011-a}},
  title        = {DeepSeek V4 --- research report},
  howpublished = {Reports},
  year         = {2026},
  url          = {https://1011-a.github.io/reports/deepseek-v4/},
  note         = {Accessed: <ACCESS_DATE>}
}

For specific subpages, append the subpage path (e.g., /deepseek-v4/benchmarks/) to the URL. Every long page has a “Last verified” date stamp under its TL;DR — that’s the most accurate reference point for when its claims were last reconciled with primary sources.

If you spot something materially wrong, the Errata page documents prior corrections; new errata can be opened as issues at github.com/1011-a/reports.

The full source of this report — markdown, configs, and rendered figures — lives at the same GitHub URL under MIT-equivalent terms (the report cites primary sources whose own licenses apply to their content).