DeepSeek V4
Overview & News
Release timeline, official announcements, and press coverage for DeepSeek V4.
Release timeline
| Date (UTC) | Event | Source |
|---|---|---|
| 2025-12 | DeepSeek-V3.2 paper introduces DeepSeek Sparse Attention (DSA) — the attention mechanism that V4 builds on. | arXiv:2512.02556 |
| 2025-12-31 / 2026-01-05 | DeepSeek posts the mHC: Manifold-Constrained Hyper-Connections paper (v1 then v2) — the residual-stream technique that becomes load-bearing for V4. Wenfeng Liang appears as last author. | arXiv:2512.24880 |
| 2026-04-24 | DeepSeek publishes the V4 Preview announcement; deepseek-v4-pro and deepseek-v4-flash go live in the API; weights published on Hugging Face under MIT; DeepSeek_V4.pdf tech report posted on the Pro model card. | DeepSeek API Docs |
| 2026-04-24 | V4 becomes available in chat.deepseek.com via Expert Mode (V4-Pro, Thinking) and Instant Mode (V4-Flash, Non-Thinking). | DeepSeek API Docs |
| 2026-04-25 | LMSYS publishes a Day-0 deep-dive on V4 deployment with SGLang, including FP4 expert-weight handling and verified-RL training. | LMSYS Blog |
| 2026-07-24 15:59 | Hard retirement of legacy deepseek-chat and deepseek-reasoner endpoints (currently routing to V4-Flash). | DeepSeek API Docs |
Official announcement — key claims
From the DeepSeek V4 Preview Release post (2026-04-24):
- “World-leading long context with drastically reduced compute & memory costs.”
- 1M context is the standard default across all official services.
- V4-Pro reaches its efficiency target through token-wise compression and DeepSeek Sparse Attention (DSA): at 1M tokens, ~27% of V3.2’s single-token inference FLOPs and ~10% of the KV cache.
- The API is OpenAI- and Anthropic-compatible: callers keep the same base_url and only swap the model name.
- Open weights are published under the MIT license on Hugging Face.
Source: api-docs.deepseek.com/news/news260424
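To make the drop-in claim concrete, here is a minimal sketch of the call path through the OpenAI Python client, under the compatibility surface described above. The base URL and environment-variable name are illustrative assumptions; only the model names come from the announcement.

```python
# Minimal migration sketch. Assumes the documented OpenAI-compatible
# surface; endpoint and key handling are illustrative, not verified.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed env-var name
    base_url="https://api.deepseek.com",      # unchanged from V3.x setups
)

# Migrating an existing V3.x call is the model-name swap and nothing else.
response = client.chat.completions.create(
    model="deepseek-v4-pro",  # was e.g. "deepseek-chat" before retirement
    messages=[{"role": "user", "content": "Summarise DSA in two sentences."}],
)
print(response.choices[0].message.content)
```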
Press coverage
| Outlet | Headline | Date | Link |
|---|---|---|---|
| CNBC | “China’s DeepSeek releases preview of long-awaited V4 model as AI race intensifies” | 2026-04-24 | cnbc.com |
| Bloomberg | “DeepSeek Unveils Newest Flagship AI Model a Year after Upending Silicon Valley” | 2026-04-24 | bloomberg.com |
| Al Jazeera | “China’s DeepSeek unveils latest models a year after upending global tech” | 2026-04-24 | aljazeera.com |
| Euronews | “China’s DeepSeek releases new AI model V4. Here’s everything to know as the AI race speeds up” | 2026-04-24 | euronews.com |
| VentureBeat | “DeepSeek-V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7, GPT-5.5” | 2026-04-24 | venturebeat.com |
| Artificial Analysis | “DeepSeek is back among the leading open weights models with V4 Pro and V4 Flash” | 2026-04-24 | artificialanalysis.ai |
Community reception
Captured on 2026-04-26, two days after release.
Simon Willison
In DeepSeek V4 — almost on the frontier, a fraction of the price (simonwillison.net, 2026-04-24), Willison ran his SVG-pelican benchmark against both V4 models via OpenRouter:
“Flash produced a solid result with good bicycle details, while Pro’s output showed anatomical issues — an oversized body and misaligned features.”
His pricing comparison, the cleanest summary in the wild:
“V4-Flash is the cheapest of the small models and V4-Pro is the cheapest of the larger frontier models. Flash undercuts GPT-5.4 Nano ($0.20/$1.25) and Gemini 3.1 Flash-Lite ($0.25/$1.50). Pro at $1.74/$3.48 beats Gemini 3.1 Pro and GPT-5.4.”
He paraphrased DeepSeek’s own framing of where V4 sits relative to the frontier:
“V4-Pro trails state-of-the-art frontier models by approximately 3 to 6 months, though reasoning capabilities narrow this gap.”
Hacker News
Three threads hit the front page within 24 hours: the V4 announcement, the tech report, and a coding-focused breakdown. Top-comment themes:
- Pricing is the headline. “Flash is only $0.28 / 1M and seems quite competent.” (HN 47885014)
- Closes the open-vs-closed gap. “Looks like DeepSeek is just about 2 months behind the leaders now.”
- Practical capability. “The Common Lisp code was very good.” (a non-trivial test; the commenter posted their own V4-Flash transcript).
- Value math for heavy users. A 40M-token-per-month workload with prefix caching lands at $30–70/mo on V4-Pro, “around double the usage compared to GPT-5.5 on the $200 sub”; a back-of-envelope version of this arithmetic follows the list.
- Local deployment is theoretically reachable but slow. “Theoretically with streaming, any model that fits the disk can run on consumer hardware, just terribly slow.”
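As promised above, the value math sketched out. The V4-Pro list prices come from this page; the cache-hit rate, cache discount, and input/output split are invented assumptions, chosen only to show how a 40M-token month lands inside the quoted $30–70 band.

```python
# Back-of-envelope sketch of the HN value math. Prices are the headline
# V4-Pro rates ($1.74 in / $3.48 out per 1M tokens); the cache discount,
# hit rate, and traffic split below are assumptions, not published figures.
MONTHLY_TOKENS = 40_000_000          # 40M tokens/month, per the HN comment

PRO_IN, PRO_OUT = 1.74, 3.48         # USD per 1M tokens
CACHE_DISCOUNT = 0.90                # assume cached prefixes cost ~10% of list
cache_hit_rate = 0.70                # assumed share of input tokens cached
input_share = 0.80                   # assumed 80/20 input/output split

input_mtok = MONTHLY_TOKENS * input_share / 1e6
output_mtok = MONTHLY_TOKENS * (1 - input_share) / 1e6

input_cost = input_mtok * ((1 - cache_hit_rate) * PRO_IN
                           + cache_hit_rate * PRO_IN * (1 - CACHE_DISCOUNT))
output_cost = output_mtok * PRO_OUT
print(f"~${input_cost + output_cost:.0f}/month")  # ~$48, inside the $30-70 band
```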
Hugging Face Forums
A thread titled DeepSeek V4 is live in preview — should your team switch? is gathering migration-cost reports from teams with existing V3.x integrations. Headline finding: the only required change is the model parameter — base_url, auth, and request shape are unchanged.
Adoption signals (within first 4 days)
OpenRouter market share (2026-04)
Independent of V4-specific traffic, OpenRouter’s April 2026 market data shows Chinese-origin models (Xiaomi, Alibaba, MiniMax, DeepSeek, Moonshot, Zhipu) crossing 45% of total platform traffic.
A sharper data point: in the week of March 30 – April 5, 2026, OpenRouter served:
| Origin | Tokens that week | Share of combined CN+US volume |
|---|---|---|
| Chinese models | 12.96T | ~80% |
| US models | 3.03T | ~19% |
Total platform volume that week, across all origins, was 27T tokens; the shares above are of the 15.99T combined Chinese + US volume. The Chinese share grew 31.48% week-on-week into early April. Source: Dataconomy, aicost.org.
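A quick check reproduces the table's shares from the raw token counts, confirming they are relative to the two-origin subtotal rather than the 27T platform total:

```python
# Shares in the table are of the combined CN + US volume (~16T),
# not of the 27T platform total.
cn, us = 12.96, 3.03                  # trillions of tokens that week
subtotal = cn + us                    # 15.99T between the two origins
print(f"Chinese share: {cn / subtotal:.0%}")  # ~81%, the table's ~80%
print(f"US share: {us / subtotal:.0%}")       # ~19%
```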
V4 launches into that traffic mix at headline pricing of $0.14 input / $0.28 output per 1M tokens for Flash and $1.74 / $3.48 for Pro, and is now expected to deepen that share further. The pricing-leadership story is not that DeepSeek will eventually win share; the share is already there, and V4 makes it stickier.
OpenClaw makes V4-Flash its default model (2026-04-27)
TechNode reported on 2026-04-27 — three days after release — that OpenClaw switched its default model to V4-Flash. The article frames the decision as a capability-density bet:
“V4 Flash, designed with 284 billion parameters and 13 billion activated parameters, is now the default model, delivering Max mode inference performance close to the 1.6 trillion parameter V4 Pro.”
This validates the report’s Choosing-the-model recommendation that Flash-Max often gets “within 1 pp of Pro-High” on math/code (Tables 6 and 7) at fractional cost.
Industry reactions (date-stamped)
Neil Shah, VP of Research at Counterpoint Research (via CNN):
“DeepSeek’s V4 preview [is] a serious flex”
He specifically called out inference costs lower than those of DeepSeek's prior models.
Ivan Su, Senior Equity Analyst at Morningstar (via CNN, same article):
V4’s debut is unlikely to have the same market impact as R1, because traders have already priced in the reality that Chinese AI is competitive and cheaper to use.
This reads as: V4 is a strong product, but the thesis it represents (frontier-class Chinese open-weight models) is no longer surprising. Markets digested R1’s January 2025 jolt; V4 is the natural next step, not a paradigm shift.
Lian Jye Su, Chief Analyst at Omdia (via US News):
“Based on the benchmark results, it does appear DeepSeek V4 is going to be very competitive against its U.S. rivals.”
Omdia frames the gap as “3 to 6 months behind state-of-the-art on the hardest coding and reasoning benchmarks, but delivers near-frontier performance at roughly a third of the API cost.” That’s the same lag Simon Willison cited at launch (see Community reception) — independent confirmation from a different analyst house.
Geopolitical / IP controversy (2026-04 ongoing)
The April 24 V4 launch coincided with a sharper Western public stance on Chinese model distillation:
- Anthropic and OpenAI have accused DeepSeek of “illegally extracting capabilities” from their models — i.e., training V3/R1/V4 against Anthropic and OpenAI completions to copy capabilities.
- Michael Kratsios, White House OSTP director, on 2026-04-23 issued a memo accusing “foreign entities primarily based in China” of conducting “industrial-scale” campaigns to “distill” frontier AI models from US companies. The memo did not name DeepSeek, but the context made the implication clear.
- Source: TechCrunch coverage.
For deployers: this is a legal-risk axis to watch but not, at this writing, an obstacle to using V4 commercially. The MIT license on the open weights is unaffected by these accusations; the legal exposure (if any) sits with DeepSeek, not its downstream users. But the political environment around Chinese open-weight models in the US is hardening, and a deployer who commits operational dependencies to V4 should plan for the possibility that future regulation tightens beyond what the current MIT license permits.
Community quantisations (within 48 hours of release)
Four community re-packs had landed by 2026-04-26: two quantised conversions plus Unsloth's fine-tuning re-packs of both models:
| Conversion | Format | Size on disk | Repo |
|---|---|---|---|
| V4-Flash GGUF | GGUF (deepseek2 architecture, registered as 158B params) | ~170 GB | tecaprovn/deepseek-v4-flash-gguf |
| V4-Flash 8-bit MLX | MLX 8-bit (mlx-lm 0.31.3) | 302 GB | mlx-community/deepseek-ai-DeepSeek-V4-Flash-8bit — 11K downloads in the first month |
| Unsloth V4-Flash | Fine-tuning-optimised re-pack of the official weights | (varies) | unsloth/DeepSeek-V4-Flash |
| Unsloth V4-Pro | Fine-tuning-optimised re-pack of the official weights | (varies) | unsloth/DeepSeek-V4-Pro |
Important caveat for further quantisation: V4 ships in mixed FP4 (routed experts + lightning indexer) + FP8 (everything else). Sub-Q4 quantisation is unlikely to produce usable output because the source weights are already at 4-bit precision in the most parameter-heavy modules. The GGUF and MLX-8bit conversions both up-cast and re-quantise.
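A toy numerical sketch of why that matters: weights already snapped to a 4-bit grid lose disproportionately more signal when squeezed onto a coarser 3-bit grid. The grids and scales here are invented for illustration and are not the actual FP4 format V4 ships in.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantise(x, scale, levels):
    """Snap values onto a symmetric uniform grid with `levels` steps."""
    return np.round(x / scale).clip(-levels // 2, levels // 2 - 1) * scale

w = rng.normal(size=100_000).astype(np.float32)   # stand-in weight tensor

w_4bit = quantise(w, scale=0.25, levels=16)       # "shipped" 4-bit weights
w_3bit = quantise(w_4bit, scale=0.55, levels=8)   # sub-Q4 re-quantisation

print(f"error after one 4-bit pass:   {np.abs(w - w_4bit).mean():.4f}")
print(f"error after 4-bit then 3-bit: {np.abs(w - w_3bit).mean():.4f}")
# The second pass compounds on top of error already baked into the source
# weights, which is why sub-Q4 conversions of V4 are unlikely to be usable.
```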
V4-Pro at 1.6T parameters is not yet a realistic local-deploy target: at GGUF 4-bit, roughly 0.5 bytes per parameter, 1.6T parameters land somewhere around 800 GB on disk, far above any consumer hardware. Flash is the realistic local target.
Unsloth’s variants of both V4-Pro and V4-Flash appeared 1–2 days after release. Unsloth specialises in low-VRAM fine-tuning; their re-packs use the same MIT-licensed weights but reorganised to make LoRA / QLoRA training tractable on smaller setups. For teams planning to fine-tune V4 (rather than infer against it), the Unsloth variants are usually the fastest path.
Source: tecaprovn/deepseek-v4-flash-gguf, mlx-community/deepseek-ai-DeepSeek-V4-Flash-8bit, unsloth/DeepSeek-V4-Pro, unsloth/DeepSeek-V4-Flash, allthings.how — DeepSeek V4 GGUF Status.
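For teams going the fine-tuning route, a QLoRA setup against the Unsloth re-pack would look roughly like the sketch below. Treat everything here as assumed: the sequence length, LoRA rank, and target-module names are placeholders, and Unsloth support for V4's architecture is taken on faith from the repo listings above.

```python
# Minimal sketch, assuming Unsloth's published API and that its V4 re-packs
# load like its other MoE re-packs. Values below are not V4-verified.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-V4-Flash",  # re-pack named in the table above
    max_seq_length=4096,      # placeholder; V4 supports far longer contexts
    load_in_4bit=True,        # QLoRA-style 4-bit base weights
)

# Attach LoRA adapters; module names follow the usual attention projections
# and may differ for V4's actual architecture.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```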
Watching for
Threads to monitor as the V4 ecosystem matures:
- r/LocalLLaMA and X / Twitter for substantive technical reception once community threads accumulate beyond launch-week chatter.
- DeepSeek’s own API change log for V4-specific updates (price changes, new endpoints, deprecations beyond the July 24 retirement).
- Hugging Face model card READMEs for V4-Pro and V4-Flash to stabilise — these will eventually carry community usage notes that are currently scattered across third-party blogs.
- Further community fine-tunes, distillations, and quantisations beyond the launch-week snapshot of GGUF + MLX + Unsloth.