This is the report distilled — six observations that will still matter about V4 a year from now, after the launch-week press cycle has faded. Each is sourced to the relevant page; click through if you need the receipts.


1. The market shock has already been absorbed

V4 launched into a market where Chinese open-weight models already command 45% of OpenRouter’s weekly volume. Morningstar’s Ivan Su captured the analyst consensus in one line: “V4’s debut is unlikely to have the same market impact as R1, because traders have already priced in the reality that Chinese AI is competitive and cheaper to use.”

R1’s January 2025 release was the paradigm shift. V4 is the natural execution of the thesis R1 introduced — better, but not surprising. If you’re reasoning about the long-run trajectory of open-weight models, V4 is confirmation, not news.


2. The 1.6T parameter count is risk management, not capability

V4-Pro reuses V3’s exact backbone shape (hidden_size: 7168, 61 layers, 128 attention heads). The 671B → 1.6T parameter growth comes entirely from MoE expansion (256 → 384 routed experts, expert-FFN width 2048 → 3072). Innovation is concentrated in the new components: hybrid CSA + HCA attention, mHC residuals, FP4-QAT routed experts.
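A back-of-envelope check makes the point concrete. The sketch below uses the configuration values quoted above; the 58-MoE-layer count (61 layers minus a few dense lead-in layers) and the three-matrix SwiGLU expert layout are my assumptions, not figures from the report.

```python
# Rough check, under the assumptions stated above, that the 671B -> 1.6T growth
# is explained by the MoE expansion alone.
HIDDEN = 7168            # backbone hidden_size, unchanged from V3
MOE_LAYERS = 58          # assumed: 61 layers total, minus a few dense lead-in layers

def routed_expert_params(n_experts: int, ffn_width: int) -> int:
    # Assumed SwiGLU expert layout: gate, up and down projections = 3 * hidden * width.
    return 3 * HIDDEN * ffn_width * n_experts * MOE_LAYERS

v3 = routed_expert_params(256, 2048)   # ~0.65T routed-expert parameters
v4 = routed_expert_params(384, 3072)   # ~1.47T routed-expert parameters
print(f"V3 routed experts ~{v3/1e12:.2f}T, V4 routed experts ~{v4/1e12:.2f}T")
```

Attention, shared experts and embeddings make up the remainder of the 671B and 1.6T totals, which is consistent with the claim that essentially all of the growth sits in expert count and width.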

DeepSeek themselves explicitly flag this as a deliberate choice in Section 6: they retained “many preliminarily validated components and tricks” to minimise risk, and commit to “more comprehensive and principled investigations” for V5. V4 is a wide-experts iteration on V3’s backbone, not a from-first-principles redesign. V5 is more likely to revisit the backbone shape than V4 was.


3. The cheap-1M-context story is a hardware bet, not an algorithm trick

The headline claim — 1M context at 27% of V3.2’s FLOPs and 10% of its KV cache — is delivered by hybrid CSA + HCA attention, which interleaves m=4 and m′=128 compression ratios layer by layer. The actual compute wins manifest most cleanly on NVIDIA Blackwell’s FP4 tensor cores, which match V4’s QAT’d routed-expert weights directly. B200 delivers ~3× the V4 throughput of H200 — the largest published gap I know of for any model on Hopper-vs-Blackwell.
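To see how the two compression ratios combine, here is a toy calculation. The CSA/HCA layer mix is my assumption (the report gives the ratios, not the per-layer split), so treat the output as illustrative rather than a reproduction of the 10% figure.

```python
# Toy model of the hybrid-attention KV-cache saving: a fraction p_csa of layers
# keeps 1/4 of the KV cache (CSA, m = 4), the rest keep 1/128 (HCA, m' = 128).
def effective_kv_fraction(p_csa: float, m: int = 4, m_prime: int = 128) -> float:
    """Fraction of the uncompressed KV cache that survives compression."""
    return p_csa / m + (1.0 - p_csa) / m_prime

for p in (0.50, 0.33, 0.25):
    print(f"CSA share {p:.2f} -> {effective_kv_fraction(p):.1%} of baseline KV cache")
# 0.50 -> ~12.9%, 0.33 -> ~8.8%, 0.25 -> ~6.8%; the ~10% headline implies a mix
# weighted towards the heavily compressed HCA layers.
```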

V4 is the first frontier-class open model where the architecture and the contemporary GPU hardware generation are co-designed. That makes it future-proof in a way V3 isn’t, and it makes deployment economics on older hardware (A100, H100) noticeably worse than the headline numbers suggest.


4. Training stability is held together by techniques DeepSeek themselves don’t fully understand

Section 4.2.3 of the tech report is unusually candid: V4’s training stability depends on Anticipatory Routing (decouple routing-net updates from backbone updates by Δt steps; ~20% wall-clock overhead) and SwiGLU Clamping (clamp linear component to [−10, 10]; surfaces as swiglu_limit: 10.0 in config.json). DeepSeek explicitly says the theoretical underpinnings of both “remain insufficiently understood” and commits to studying them.
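For readers who want to see what the clamp looks like in code, here is a minimal sketch of a clamped SwiGLU block as described above; the exact placement of the clamp inside DeepSeek’s FFN and the tensor shapes are assumptions on my part.

```python
import torch
import torch.nn.functional as F

def clamped_swiglu(x: torch.Tensor, w_gate: torch.Tensor, w_up: torch.Tensor,
                   limit: float = 10.0) -> torch.Tensor:
    """SwiGLU with the linear branch clamped to [-limit, limit] (cf. swiglu_limit: 10.0)."""
    gate = F.silu(x @ w_gate)                     # SiLU-activated gate branch
    lin = torch.clamp(x @ w_up, -limit, limit)    # clamp the linear branch before the product
    return gate * lin                             # down-projection omitted for brevity
```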

This honesty is unusual for frontier-model tech reports and is a leading indicator: V5’s most interesting research direction is likely “principled training stability for MoE at trillion-parameter scale,” because that’s where DeepSeek themselves have flagged the open question.


5. The economic gap to closed-source frontier is a moat that’s eroding fast

V4-Pro output is 7.2× cheaper than Opus 4.7, 8.6× cheaper than GPT-5.5, and 3.4× cheaper than Gemini 3.1 Pro. V4-Flash output is 89× cheaper than Opus 4.7. A 32M-input + 1M-output monthly heavy-fleet workload costs $4.76/month on Flash versus $185/month on Opus.
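The fleet-cost arithmetic is easy to reproduce. The per-million-token rates below are back-solved from the ratios and totals quoted in this section (V4-Flash roughly $0.14 in / $0.28 out, Opus 4.7 roughly $5 / $25); treat them as reconstructions, not a rate card.

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_rate: float, out_rate: float) -> float:
    """Monthly cost in USD; token volumes in millions, rates in $ per 1M tokens."""
    return input_mtok * in_rate + output_mtok * out_rate

flash = monthly_cost(32, 1, in_rate=0.14, out_rate=0.28)   # -> $4.76
opus  = monthly_cost(32, 1, in_rate=5.00, out_rate=25.00)  # -> $185.00
print(f"V4-Flash ${flash:.2f}/mo vs Opus 4.7 ${opus:.2f}/mo ({opus/flash:.0f}x)")
```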

The standard-tier-vs-extended-tier dynamic adds a sharper edge: Gemini doubles input rates above 200K context, whereas V4-Pro charges a flat $1.74 / 1M for the entire 1M-token window. V4-Pro is the cheapest frontier option once you cross 200K context — a real differentiator that will land before deployers fully internalise it.

This isn’t sustainable for the proprietary frontier. Either Anthropic / OpenAI / Google compress their margins, or they make a capability bet that justifies the gap. The economic pressure on the closed-source frontier is now structural, not cyclical.


6. The V3 → V4 migration is a one-line code change. Treat it accordingly.

Existing V3 / V3.2 / R1 / deepseek-chat / deepseek-reasoner integrations need a single code change to migrate: swap the model parameter. Auth, base URL, request shape, streaming, tools, and JSON mode are all unchanged. The legacy IDs are scheduled for retirement after 2026-07-24, 15:59 UTC; set a calendar reminder for 2026-07-15 if you haven’t migrated.
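In practice the migration looks like this; a minimal sketch, assuming the OpenAI-compatible Python SDK most V3 integrations already use. The V4 model ID shown is a placeholder, check the API docs for the exact string.

```python
from openai import OpenAI

# Same auth, same base URL, same request shape as the V3-era integration.
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",   # was "deepseek-chat" / "deepseek-reasoner"; placeholder ID
    messages=[{"role": "user", "content": "Summarise the V3 -> V4 changes."}],
)
print(resp.choices[0].message.content)
```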

Most teams should land on a routing pattern: V4-Flash + Thinking on the hot path, with V4-Pro escalation for genuinely hard turns (sketched below). The trap is standardising on Pro everywhere (you’ll overpay 12× on the 80% of traffic that doesn’t need Pro-class quality) or on Flash everywhere (the long tail of knowledge-breadth-hard prompts will regress).
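A minimal version of that routing pattern, assuming a crude escalation heuristic; real deployments would gate on a classifier, a confidence signal, or task type, and the model IDs are placeholders.

```python
FLASH, PRO = "deepseek-v4-flash", "deepseek-v4-pro"   # placeholder model IDs

def pick_model(prompt: str, needs_deep_reasoning: bool = False) -> str:
    """Route the bulk of traffic to Flash; escalate genuinely hard turns to Pro."""
    if needs_deep_reasoning or len(prompt) > 50_000:   # crude hard/long-prompt signal
        return PRO
    return FLASH
```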


When V4 is the wrong choice

Most of the report describes when V4 fits. The inverse is shorter but worth being explicit about:

  • You need image, video, or audio input. V4 is text-only; multimodal is a Section 6 future direction, not a V4 feature. Use Gemini 3.1 Pro or Opus 4.7 instead.
  • Your application is politically sensitive on Chinese topics. Both the hosted API and the open weights inherit pre-training-baked geopolitical bias — Enkrypt AI measured ~91% pro-China-government framing on R1’s controversial answers, and the bias persists even in community-uncensored fine-tunes. If your product needs neutral framing on Tiananmen / Taiwan / Xinjiang, V4 is the wrong stack.
  • You need strict, format-rule-driven output and you can’t tolerate occasional noncompliance. DeepSeek’s own tech-report Section 5.4 acknowledges V4-Pro-Max trails Opus on instruction-following. In tool-use systems where a single malformed JSON payload breaks the pipeline, for example, Opus or GPT-5 are safer.
  • You need state-of-the-art knowledge breadth (SimpleQA-class factuality). Gemini 3.1 Pro wins SimpleQA-Verified by ~18 absolute pp (Table 6). For research-assistant or fact-checking workloads, Gemini is still ahead.
  • You’re operating in a procurement context that excludes Chinese-origin AI. Some US federal, defence, and finance contexts will increasingly do so; the White House OSTP April 2026 memo is a warning shot. Check your compliance constraints before committing operational dependencies.
  • You need long-horizon multi-round agentic tasks at the frontier today. V4-Pro is competitive on single-turn agent tasks (Terminal-Bench 67.9, SWE-Pro 55.4) but not yet on long-horizon ones. DeepSeek explicitly flags this as a Section-6 future direction, meaning V5 territory.
  • You need predictable single-tenant latency on US/EU hardware. The first-party DeepSeek API is served from the PRC; alternative providers (NIM, DeepInfra) are US-jurisdiction but each adds its own variance. If predictable sub-300ms TTFT under load matters, benchmark on your specific provider.

If none of these apply, V4 is genuinely the right choice for most workloads.


What we don’t know yet

  • Independent red-team data for V4 specifically (V3-era findings likely extrapolate but haven’t been re-measured).
  • V4-specific OpenRouter / NIM usage figures (only platform-wide aggregates so far).
  • How robust V4 is to sub-FP4 quantisation for consumer hardware — current GGUF and MLX work suggests “not robust,” but more attempts are likely in the next month.
  • What V5 looks like. Section 6 of the tech report commits DeepSeek to architectural simplification, multimodality, sparser embeddings (Engram-line), low-latency architectures, long-horizon agentic tasks, and training-stability theory. The Future-proofing section on the migration page extrapolates pricing (V5-Flash ~$0.07 / $0.14, V5-Pro ~$0.87 / $1.74) but those numbers are explicitly speculative.

What this report doesn’t try to be

  • A rolling news ticker. New community benchmarks, fine-tunes, and provider rates land regularly; we capture material developments in the news page but don’t try to mirror everything published.
  • A substitute for DeepSeek’s own materials. The official tech report, the Hugging Face model cards, and the API docs are the canonical sources. This report cites them, sometimes corrects third-party misreadings against them, and adds a layer of decision-support narrative on top — but if there’s a conflict between this page and the primary sources, the primary sources win and we update.
  • An endorsement. V4 has real failure modes — knowledge breadth still trails Gemini, content filtering and pre-training-baked geopolitical bias are documented limitations, and the Anthropic / OpenAI distillation accusations are unresolved. These aren’t dealbreakers for most deployments but they are real tradeoffs that don’t show up in the benchmark tables.

The point of having a thesis page is to take a position: V4 is not the most capable frontier model, but it’s the model that makes “frontier-class capability at low cost” the default option for most workloads. That’s the takeaway worth remembering once the launch noise fades.