deepseek-v4

Highlights across 38+ iterations

Major milestones, in rough order of usefulness for someone skimming this log:

  • Initial scaffold live on GitHub Pages (iter 1, 2026-04-26); V4 tech-report cover added to the landing page (iter 5)
  • Full primary-source ingestion of DeepSeek_V4.pdf and both V4 config.json files; CSA + HCA naming corrected vs third-party shorthand (iter 3)
  • Custom Jekyll design replacing just-the-docs, editorial typography (iter 4)
  • Tables 1, 6, 7 verbatim from tech report; Figures 8, 9, 10, 11–12 rendered (iters 5, 9)
  • V3 → V3.2 → V4 architectural diff table with config-confirmed numbers (iters 6, 9)
  • API page with cost calculator, cross-vendor pricing (V4-Pro 7.2× cheaper than Opus, 8.6× cheaper than GPT-5.5), JSON mode + tool use, alternative providers (iters 7, 11, 12, 14, 16, 22, 26)
  • V3 → V4 migration guide with 4.9×-cheaper-than-reasoner worked example + July 24 retirement deadline (iter 12)
  • Limitations from primary sources + V3-era red-team findings (Cisco 100% jailbreak rate, Enkrypt 91% pro-PRC bias) (iters 6, 15)
  • Self-hosting page with hardware budgets per variant + framework matrix + real-world throughput (iters 24, 25)
  • Four CI lints: unguarded Liquid (iter 23), orphan assets (iter 28), broken anchors (iter 30), page weight (final pass)
  • Editorial synthesis in The V4 thesis — six observations that will still matter a year from now (iter 32)
  • Errata page documenting three substantive corrections + one process-failure correction (iters 19, 21)
  • Glossary page for ~25 acronyms (iter 19)

2026-04-27

  • Iteration 19 — Glossary + Errata pages. Created deepseek-v4/glossary.md with ~25 entries covering architecture (CSA, HCA, DSA, MLA, MoE, DeepSeekMoE, mHC, MTP, hash routing, noaux_tc), training (GRPO, OPD, Muon, Anticipatory Routing, SwiGLU Clamping, YaRN), numerical formats (FP8 e4m3, FP4-QAT, UE8M0), inference (MQA, grouped output projection, sliding window, reasoning-effort modes), and miscellaneous (cache-hit/miss pricing, V4-Pro-Max, legacy deepseek-chat/deepseek-reasoner IDs, Ascend, Engram). Created deepseek-v4/errata.md documenting three substantive corrections from the iteration history: the iter-2 → iter-3 CSA/HCA naming flip (third-party shorthand → official names from tech report Section 2.3), the iter-2 → iter-9 V3-context-window correction (128K → 160K, primary-sourced from V3’s config.json), and the iter-2 → iter-3 quantisation correction (FP8-only → FP8 base + FP4 routed experts + FP4 indexer). Each errata entry names the original claim, the correction, the root cause, and the iteration in which the fix landed. Wired both pages into _config.yml sidebar and added two new reading paths to the topic landing.
  • Iteration 18 — Architecture decisions journal + V5 future-proofing. Added a major “Architecture decisions journal” section to technical.md covering 7 key V4 design choices — hybrid CSA+HCA vs pure DSA; mHC vs vanilla residuals; FP4 mixed-precision; OPD replacing mixed-RL; reusing V3’s backbone shape (7168/61/128); hash routing as load-balance backstop; YaRN extension instead of native 1M training. Each decision documents DeepSeek’s choice, the constraint it solves, the cost paid, and what V5 is likely to do differently per Section 6’s stated future directions. Added a “Future-proofing — what V5 is likely to look like” section to migration.md covering the seven Section-6-committed future directions, an extrapolated V5 pricing projection (Flash ~$0.07/$0.14, Pro ~$0.87/$1.74) marked as speculative, what V4 commitments likely persist into V5 (MIT license, OpenAI-compat API, DSA, DeepSeekMoE, noaux_tc), and what’s at risk (hash routing, the CSA+HCA split, Anticipatory Routing as a knob).
  • Iteration 16 — Alternative API providers. Added an “Alternative API providers — where else to call V4” section to api.md covering DeepSeek direct, OpenRouter (deepseek/deepseek-v4-pro and deepseek-v4-flash both available, with provider-status link), NVIDIA NIM (day-0 launch with both hosted endpoint and deployable NGC container for Blackwell), DeepInfra (US-jurisdiction billing), and the announced-but-not-yet-listed Together / Fireworks / Anyscale. Flagged the OpenRouter advertised rate ($0.435 in / $0.87 out) as variable upstream-provider-dependent pricing, with the direct DeepSeek rate as the contractually stable baseline. Linked NVIDIA’s launch-day Blackwell walkthrough for teams self-hosting via NIM. Added 7 references covering OpenRouter pages, NIM references, NGC catalog, NVIDIA developer blog, and DeepInfra.
  • Iteration 15 — Safety section from V3-era research + stale-TBD audit. Major expansion of limitations.md with a primary-source-cited safety section: Cisco’s reported 100% jailbreak success rate against DeepSeek-R1 (verbatim Adversa AI quote), Enkrypt AI’s measurement that 91.2% of R1’s China-controversy answers leaned pro-government and that the bias persists in community-uncensored fine-tunes (so it’s pre-training-baked, not just SFT-baked), and Promptfoo’s CCP-Sensitive-Prompts dataset (1,360 prompts across 68 topics, ~85% refusal rate). Added a “what this means for V4 deployments” matrix (general chat, politically-sensitive, compliance, open-weight redistribution). Documented concrete observed refusal behaviours on Tiananmen, Xinjiang, Taiwan, Xi Jinping’s human-rights record. Flagged that V4’s tech report does not document any safety-specific change vs V3 — so V3-era findings extrapolate. Added 5 references for safety/bias research. Cleaned stale TBDs across benchmarks.md (the “cross-model comparison TBD” section now points up to the actual Table 6), technical.md (PDF ingestion no longer “in progress”), and limitations.md (next-iterations refreshed to actionable items).
  • Iteration 14 — Choosing-the-model decision section + footer feed link. Added a comprehensive “Choosing V4-Pro vs V4-Flash” section near the top of api.md, immediately after the Quick Start. Includes the headline tradeoff table (cost / reasoning / long-context recall / knowledge / local deploy / throughput), a 7-step decision flow that walks readers from the hardest constraint downward, a “use both” pattern with a sample router function (sketched after this list), and a cost × quality positioning paragraph linking to Artificial Analysis. Verified that feed.xml is generating (empty body, but valid Atom — jekyll-feed only auto-includes posts, not pages). Linked the Atom feed and a GitHub watch-repo URL from the footer with appropriate framing — readers who want change notifications should watch the repo, since the changelog is the primary update vehicle.
  • Iteration 13 — On-this-page TOC + social card. Added a hand-styled “On this page” disclosure component to the design system (page-toc CSS class with editorial typography, eyebrow-style summary, hover states matching the existing palette). Inserted TOC blocks at the top of all five long pages: technical.md, benchmarks.md, api.md, migration.md, limitations.md — each renders a kramdown-generated nested table of contents from the page’s H2/H3 structure, collapsible via native <details>. Configured jekyll-seo-tag with summary_large_image Twitter card, default OG image (/assets/og-default.png, the V4 tech-report cover), and per-directory image override so DeepSeek-V4 pages share the V4 cover specifically when linked.
  • Iteration 12 — Migration guide page + Gemini pricing. Created deepseek-v4/migration.md, a dedicated V3 → V4 migration guide (Step 1: change model parameter; Step 2: pick variant via a routing table; Step 3: recalibrate cost with explicit ratios; Step 4: behavioural deltas including context-window jump, hybrid reasoning, and the format-strictness regression vs Opus; Step 5: July 24 retirement deadline; Step 6: operational checklist). Wired the new page into the topic sidebar via _config.yml and into the topic landing’s “How to read this report” reading-paths block. Added Gemini 3.1 Pro ($2 in / $12 out, doubling above 200K context) to the cross-vendor pricing table in api.md, with the new tipping-point insight: V4-Pro charges the same $1.74/1M for the full 1M-token window, making it the cheapest frontier option once you cross 200K context.
  • Iteration 11 — Pricing math + Hyper-Connections lineage. Added a Cross-vendor pricing table to api.md covering V4-Pro/Flash vs OpenAI GPT-5.4 / GPT-5.4 (>272K) / GPT-5.5 / GPT-5.5-Pro, and Anthropic Claude Opus 4.7, with explicit multipliers — V4-Pro output is 7.2× cheaper than Opus 4.7, 8.6× cheaper than GPT-5.5; V4-Flash output is 89× cheaper than Opus 4.7. Added a Python cost calculator that takes input/output tokens + cache-hit fraction and returns USD costs for both V4 models (its shape is sketched after this list), plus a worked-example table covering single-turn chat, agent steps, full 1M long-context queries, and a heavy-fleet monthly budget. Added the upstream Hyper-Connections paper (arXiv:2409.19606, Defa Zhu et al., ByteDance, Sep 2024) to technical.md’s mHC section so the lineage HC → mHC is explicit, with the verbatim HC abstract on the seesaw between gradient vanishing and representation collapse. Added 1 reference (HC paper).
  • Iteration 10 — Testing page rewritten + topic-landing reading paths. Replaced the stub testing.md with three fully-specified, reproducible test prompts: (1) a Python bug-fix coding task that probes strict instruction-following on minimal-patch output, (2) a reasoning task with deliberate ambiguity to probe Thinking-mode epistemic posture, (3) a 750K-token needle-in-haystack at five depth percentages to verify the 1M-context claim. Each test has a Python harness using the OpenAI SDK against https://api.deepseek.com, JSON-format transcripts written to tests/transcripts/, pass criteria, verdict signals, and a cost estimate ($6.53 total for all three with the long-context sweep dominating). Added a “How to read this report” reading-paths block to the topic landing — five guided paths matching common reader intents, with anchor links into the relevant subpages.
  • Iteration 9 — Figures 9–12 embedded + V3 column hardened. Renamed and embedded the iter-8-rendered Figure 9 (MRCR 1M long-context recall curve) and Figure 10 (HLE + Terminal-Bench 2.0 by reasoning effort) in benchmarks.md with explanatory captions. Rendered Figures 11–12 (V4-Pro-Max vs Opus-4.6-Max win rates across task categories — analysis, generation, and others) from page 43 and embedded them with the verbatim qualitative summary — V4 wins on depth, proactive insight, long-form generation; loses on strict instruction-following, summarisation, and slide aesthetics. Pulled the DeepSeek-V3 paper (arXiv:2412.19437) and config.json from HuggingFace, mirroring the latter at deepseek-v4/configs/v3-config.json. Used it to harden the V3 column of the diff table: V3 was actually 160K context, not 128K (max_position_embeddings: 163840, YaRN factor 40). Confirmed V3 architecture: hidden 7168, 61 layers, 128 attention heads, MLA with kv_lora_rank: 512, 256 routed + 1 shared experts, top-8, moe_intermediate_size: 2048, first 3 layers dense. Added the key observation: V4-Pro reuses V3’s exact backbone shape (7168 · 61 · 128) — the parameter growth from 671B to 1.6T comes entirely from MoE expansion (256→384 routed experts, 2048→3072 expert intermediate size; back-of-envelope sketched after this list). Added 2 references (V3 paper + V3 config.json).
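
A minimal sketch of the iter-11 cost calculator’s shape, for orientation. The per-1M-token rates are the direct-DeepSeek prices cited across this report (V4-Flash $0.14 in / $0.28 out, V4-Pro $1.74 in / $3.48 out); the cache-hit input discount is an assumed placeholder, not the documented rate, so check api.md for the real numbers.

```python
# Sketch of the iter-11 cost calculator. Rates per 1M tokens are the
# direct-DeepSeek prices cited in this report; CACHE_HIT_DISCOUNT is an
# ASSUMED placeholder, not the documented cache-hit rate.
PRICES_PER_1M = {  # model -> (input USD, output USD)
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (1.74, 3.48),
}
CACHE_HIT_DISCOUNT = 0.1  # assumption: cache-hit input billed at 10% of miss rate

def request_cost_usd(model: str, input_tokens: int, output_tokens: int,
                     cache_hit_fraction: float = 0.0) -> float:
    """USD cost of one request; cache_hit_fraction applies to input only."""
    in_rate, out_rate = PRICES_PER_1M[model]
    hit = input_tokens * cache_hit_fraction
    miss = input_tokens - hit
    input_cost = (miss * in_rate + hit * in_rate * CACHE_HIT_DISCOUNT) / 1e6
    return input_cost + output_tokens * out_rate / 1e6

# Worked example: a full 1M-token long-context query on V4-Pro, no cache.
print(f"${request_cost_usd('deepseek-v4-pro', 1_000_000, 4_000):.2f}")  # -> $1.75
```

The iter-9 backbone observation also survives a back-of-envelope check. The sketch assumes SwiGLU experts (three weight matrices each) and that V4 keeps V3’s three dense first layers; the dense-layer count is an assumption, not a config-confirmed V4 number.

```python
# Routed-expert parameter count: 3 matrices x hidden x intermediate,
# per expert, per MoE layer (61 layers minus 3 assumed-dense first layers).
hidden, moe_layers = 7168, 61 - 3

def routed_expert_params(n_experts: int, intermediate: int) -> float:
    return 3 * hidden * intermediate * n_experts * moe_layers

v3 = routed_expert_params(256, 2048)  # ~0.65T of V3's 671B total
v4 = routed_expert_params(384, 3072)  # ~1.47T of V4-Pro's 1.6T total
print(f"V3 experts ~{v3/1e12:.2f}T, V4 experts ~{v4/1e12:.2f}T, "
      f"growth ~{(v4 - v3)/1e12:.2f}T")
```

And the iter-14 “use both” pattern reduces to a few lines. The thresholds are illustrative placeholders, not the report’s recommended cutoffs:

```python
def pick_model(context_tokens: int, needs_deep_reasoning: bool) -> str:
    """Difficulty-heuristic router: cheap Flash hot path, Pro escalation."""
    if context_tokens > 160_000 or needs_deep_reasoning:
        return "deepseek-v4-pro"    # 1M window, stronger reasoning modes
    return "deepseek-v4-flash"      # ~12x cheaper output for easy turns
```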

2026-04-26

  • Iteration 8 — Quick start, community quants, V3.2 column hardening. Added a Quick Start to the top of api.md with a one-shot curl that just works, plus the migration-cost note (existing V3 integrations only need to change the model parameter). Pulled the DeepSeek-V3.2 paper from arXiv (/pdf/2512.02556) and used it to harden the V3.2 column of the V3 → V3.2 → V4 diff table: V3.2 is a continued-pretraining on V3 (only 943.7B + 2.1B ≈ 946B additional tokens), uses specialist distillation + mixed-RL via GRPO with reasoning + agent + alignment merged into a single RL stage, and reports post-training compute at >10% of pre-training cost. Also surfaced V3.2’s Speciale max-compute mode. Added a Community quantisations subsection to news.md covering tecaprovn’s GGUF (~170GB), the MLX 8-bit conversion (302GB, 11K downloads in the first month), and the technical caveat that sub-Q4 won’t work cleanly since V4 ships in FP4+FP8 already. Added 3 references for the community quant ecosystem.
  • Iteration 7 — Figures, API depth, community reception. Rendered Figures 5 (EP scheme, page 15) and 6 (KV cache layout, page 23) from the tech report and embedded them in technical.md alongside Section 3 infrastructure notes. Expanded api.md with full JSON mode (response_format={"type": "json_object"}) and function-calling / tool use (tools parameter, strict-mode beta endpoint, full Python round-trip example); a minimal JSON-mode call is sketched after this list. Added a Community reception section to news.md with Simon Willison’s pricing comparison verbatim, three HN-thread top-comment themes, and the Hugging Face migration-cost finding (only the model parameter changes for existing V3 integrations). Added 8 new entries to references.md covering Simon Willison’s post, three HN threads, the HF forum thread, and the official JSON-mode + function-calling guides.
  • Iteration 6 — Limitations from primary source + V3→V4 lineage. Fully rewrote limitations.md from Section 6 (Conclusion, Limitations, Future Directions) and Section 4.2.3 (Mitigating Training Instability) of the tech report. Documented six self-acknowledged limitations: architectural complexity, training-stability fragility (Anticipatory Routing + SwiGLU Clamping as duct tape DeepSeek doesn’t fully understand), knowledge-breadth gap vs Gemini-3.1-Pro, long-context recall trailing Opus-4.6, agentic gaps vs proprietary, and incomplete benchmark entries due to API rate-limits. Captured DeepSeek’s seven future-directions commitments. Added a V3 → V3.2 → V4 architectural diff table to technical.md covering attention, MoE, routing, residual stream, optimiser, quantisation, training, post-training, and reasoning modes. Added a full “Training Stability” section explaining Anticipatory Routing and SwiGLU Clamping (and how the latter maps to the swiglu_limit: 10.0 config field). Pulled tokenizer_config.json (mirrored at deepseek-v4/configs/) and replaced the TBD tokenizer section with confirmed details (BOS/EOS continuity from V3, model_max_length = 1M, PreTrainedTokenizerFast).
  • Iteration 5 — Benchmarks deep-dive + tech-report figures. Added Tables 6 and 7 verbatim to benchmarks.md — full per-benchmark numbers for V4-Pro-Max vs Opus-4.6 / GPT-5.4 / Gemini-3.1-Pro / K2.6 / GLM-5.1 across knowledge, reasoning, long-context, and agentic suites; plus a separate table showing V4-Flash and V4-Pro across Non-Think / High / Max reasoning-effort modes. Added the formal-reasoning Putnam-200 / Putnam-2025 results from Figure 8 (V4 ties Axiom at 120/120 on Putnam-2025, beats Seed-Prover by ~2.3× on Putnam-200 Pass@8). Documented evaluation methodology: 384K reasoning context in Max mode, Codeforces eval setup (14 contests × 114 problems), Lean v4.28 agentic + hybrid pipeline. Rendered five PDF figures into assets/images/deepseek-v4/: cover, overall architecture (Fig 2), CSA (Fig 3), HCA (Fig 4), formal reasoning (Fig 8). Embedded all of them in the relevant pages. Replaced the stale “Training (PDF ingestion pending)” section in technical.md with the actual Section 4–5 content (32T/33T training tokens, OPD post-training, GRPO specialists, MTP loss weight 0.3, hash-routing balance threshold). Added the cover image to the topic landing page.
  • Iteration 3 — Primary-source breakthrough. HuggingFace was reachable; pulled both V4-Pro and V4-Flash config.json files (mirrored at deepseek-v4/configs/) and the official DeepSeek_V4.pdf tech report (4.5 MB). Major rewrite of technical.md with full config-backed architecture tables (hidden_size, layers, heads, MoE expert counts, MLA q/o LoRA ranks, YaRN scaling, multi-token-prediction layers). Corrected the iter-2 framing: the official names are CSA (Compressed Sparse Attention) and HCA (Heavily Compressed Attention) — the 4 and 128 in compress_ratios are the per-layer compression ratios m and m′, not layer names. Added the verbatim CSA/HCA mechanism description from Section 2.3 of the tech report, the layer-by-layer compression schedule, and the lightning-indexer FP4 detail. Corrected quantisation: base FP8 + FP4-QAT routed experts + FP4 indexer attention. Confirmed Muon optimiser (now primary-source). Added training data: V4-Flash on 32T tokens, V4-Pro on 33T. Documented the post-training shift from RL → On-Policy Distillation (OPD), GRPO specialist training, and the V4-Pro-Max reasoning-effort mode. Added the official Table 1 base-model benchmark comparison (V3.2-Base vs V4-Flash-Base vs V4-Pro-Base) to benchmarks.md, and noted Figure 1 comparators (Claude-Opus-4.6-Max, GPT-5.4-xHigh, Gemini-3.1-Pro-High).
  • Iteration 2 — Architecture deep-dive: rewrote technical.md against verified primary sources (arXiv:2512.02556 for DSA, arXiv:2512.24880 for mHC). Reconciled DSA / Compress-4-Attention / Compress-128-Attention naming and traced V4’s lineage through V3.2’s DSA introduction. Added arXiv abstracts and author lists, including Wenfeng Liang as last author on mHC. Added FP4-on-Blackwell and Huawei Ascend hardware notes. Extended news.md timeline with the December 2025 / January 2026 precursor papers. Added 6 new entries to references.md (LMSYS Day-0 post, Fortune, vLLM Ascend, MODEL1 leak analysis, plus the two arXiv papers and the V4 tech report PDF). Hugging Face was unreachable from this environment (TLS failures), so concrete config.json numbers (num_experts, hidden_size, num_hidden_layers, num_attention_heads, head_dim, vocab_size) remain TBD as the highest-priority gap for iteration 3.
  • Iteration 1 — Initial scaffold: repo created, Jekyll + just-the-docs configured, site live on GitHub Pages. Topic landing page seeded with release status, model variants, context window, pricing summary, and primary-source citations from DeepSeek’s official preview announcement (2026-04-24). Stub pages created for News, Technical Details, API, Benchmarks, Testing, Limitations, and References — each with a “next steps” list to pick up next iteration.
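
For reference, a minimal shape for the iter-7 JSON-mode call: the OpenAI SDK pointed at DeepSeek’s OpenAI-compatible endpoint. The model ID is illustrative (this report’s naming); api.md carries the exact IDs and the full tool-use round trip.

```python
import json
from openai import OpenAI

# DeepSeek's endpoint is OpenAI-compatible: only base_url and model change.
client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # illustrative; see api.md for exact model IDs
    messages=[
        {"role": "system", "content": "Reply as JSON with keys 'answer' and 'why'."},
        {"role": "user", "content": "Is the V4 context window 1M tokens?"},
    ],
    response_format={"type": "json_object"},  # JSON mode per the api.md guide
)
print(json.loads(resp.choices[0].message.content))
```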

site

2026-04-27

2026-04-27 — drift cleanup after final pass

  • Stale “Next iterations” footers cleared across 5 pages. User flagged that technical.md’s footer was still listing items completed in iters 6 / 7 / 29. Audit found the same drift on benchmarks.md, api.md, limitations.md, and news.md. Fixed:
    • technical.md — all 5 items done (V3→V4 diff iter 6, tokenizer iter 6, Figures 5/6 iter 7, OPD detail iters 6+29, Section 6 → limitations iter 6). Section removed.
    • benchmarks.md — all 5 items done (Tables 1/6/7 + Figures 8/9/10/11–12 from iters 5/9). Section removed.
    • api.md — 3 of 5 items done (JSON mode + tools + price screenshot from iters 1/7); kept 2 genuinely-open items as “Still open against primary sources” (Thinking-mode parameter shape, rate limits — DeepSeek hasn’t documented either).
    • limitations.md — all items require running evals with a DEEPSEEK_API_KEY. Retitled to “What would close these gaps” with explicit dataset/tool links — invitation for external contribution rather than self-promise.
    • news.md — 1 item done (PDF deck downloads from iter 1); rest reframed as “Watching for” (r/LocalLLaMA, DeepSeek API change log, HF model card READMEs, future fine-tunes).
  • Lesson: per-page “next iteration” footers are a drift hazard. The iter-33 site-map fix and iter-42 README fix were the same pattern. Going forward, future-work items should land in the changelog or as dated open items, not in the page body.

2026-04-27 — final pass (loop stopped)

  • Loop cancelled at user request. Cron job e806a63f removed.
  • Generic Reports OG card generated. Replaced the iter-13 V4-cover-doing-double-duty placeholder. New assets/og-default.png (1200×630, 53 KB) rendered by scripts/build-og-card.py using the site’s design palette (paper background, burnt-sienna accent rule, Times serif title, sans display eyebrow + footer). Reproducible — re-run the script if the title or accent changes.
  • Page-weight CI lint. New scripts/lint-page-weight.py enforces per-file (default 1.5 MB) and per-directory total (12 MB) asset budgets, with override env vars for intentional bumps. Wired into .github/workflows/lint.yml as a fourth job alongside the Liquid / orphan-asset / anchor lints; its core check is sketched after this list. Site currently 6.8 MB / 12 MB. Updated README accordingly.
  • Backlog audit closed:
    • NIM config.json byte-for-byte verification — page doesn’t expose internals (closed).
    • Editorial-quality lead image for thesis — needs image-editing tools beyond PIL (deferred).
    • Promptfoo CCP dataset run — needs DEEPSEEK_API_KEY (deferred to anyone who has one).
    • HN comments since iter 7 — searched in iters 31, 34, 35, nothing materially new (closed).
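
The page-weight lint’s core check, sketched. The budgets are the documented defaults; the env-var names are illustrative, not necessarily the ones the committed scripts/lint-page-weight.py reads.

```python
import os
import sys
from pathlib import Path

FILE_BUDGET = int(os.environ.get("LINT_FILE_BUDGET", 1_500_000))  # ~1.5 MB/file
DIR_BUDGET = int(os.environ.get("LINT_DIR_BUDGET", 12_000_000))   # 12 MB/dir

failures: list[str] = []
dir_totals: dict[Path, int] = {}
for path in Path("assets").rglob("*"):
    if not path.is_file():
        continue
    size = path.stat().st_size
    if size > FILE_BUDGET:
        failures.append(f"{path}: {size:,} B over per-file budget")
    dir_totals[path.parent] = dir_totals.get(path.parent, 0) + size
failures += [f"{d}: {total:,} B over directory budget"
             for d, total in dir_totals.items() if total > DIR_BUDGET]
if failures:
    sys.exit("\n".join(failures))  # non-zero exit fails the CI job
```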

The DeepSeek V4 report is complete as a comprehensive reference: 13 content pages + cross-site changelog, 4 CI lints, primary-source-cited claims with date stamps, public errata, runnable test harness, MIT license, OG card, BibTeX export. Future updates land if and when DeepSeek announces something new (V4 update, V5 preview) or the legal/distillation situation produces a concrete development.

  • Iteration 43 — LICENSE file + NOTICE for third-party content. The README has been claiming “MIT for site source” since iter 1 but no LICENSE file actually existed in the repo — that’s a credibility miss for a public artifact. Added the standard MIT LICENSE text, plus a NOTICE.md documenting the file-level breakdown of third-party content (model configs and tokenizer files mirrored under their original MIT, tech-report figures rendered from DeepSeek_V4.pdf and the V3.2 paper, announcement images from api-docs.deepseek.com). Added both to _config.yml’s exclude list so they don’t render as Jekyll pages. Updated the README’s License section to point at both files. Closes a real polish gap that mattered: a public research report claiming a license without shipping the LICENSE file isn’t actually MIT-licensed.
  • Iteration 42 — Repo README rewrite. The repo-root README.md was iter-1 vintage — still claimed Jekyll + just-the-docs (theme dropped in iter 4) and listed only the 7 original page categories. Rewrote to reflect the actual current state: vanilla Jekyll 4.x with hand-rolled _layouts/ + design system, the three CI lints (Liquid / orphan-asset / anchor) with a what-they-catch table, the full repo layout including tests/ and scripts/, the local-preview command, and the issues-as-errata flow. This is what GitHub visitors see first; it should match reality.
  • Iteration 41 — Removed inline-harness duplicate from testing.md. With the canonical harness now living at tests/run.py (iter 40), the 60-line inline Python copy in testing.md’s “The harness” section was a drift hazard — any improvement to the runnable file would silently leave the page out of sync. Replaced the inline copy with a pointer to the committed file plus a small JSON shape example showing the per-test transcript record. Closes a known drift risk.
  • Iteration 40 — Tests harness committed to disk. Created tests/run.py (200 lines) that implements the three reproducible test prompts the testing page specifies — Test 1 coding bug-fix, Test 2 reasoning under ambiguity, Test 3 750K-token needle-in-haystack with deterministic random seed. Added tests/README.md with usage, and tests/.gitignore to exclude transcripts by default. Updated _config.yml to exclude tests/ and scripts/ from the published site (they’re repo artifacts, not pages). Rewrote the “Running it” section of testing.md to point at the committed harness rather than asking readers to copy-paste from the page. The harness runs via python tests/run.py once DEEPSEEK_API_KEY is set, and supports running a single test by id (e.g. python tests/run.py test-1-coding).
  • Iteration 39 — Changelog highlights header. Added a “Highlights across 38+ iterations” block at the top of the deepseek-v4 changelog section listing the major milestones in rough order of usefulness — initial scaffold, primary-source ingestion, custom design, benchmark tables/figures, V3→V4 diff, API page, migration guide, limitations, self-hosting, three CI lints, editorial synthesis, errata, glossary. Helps readers scan what got added when without parsing every iteration entry.
  • Iteration 38 — BibTeX citation export. Added a BibTeX block to the topic landing’s “Citing this report” section so researchers and writers can pull a clean @misc entry with a placeholder access date. Pairs with the existing plain-text suggested-citation format.
  • Iteration 37 — “When V4 is the wrong choice” section. Added the inverse to the existing patterns guide as a dedicated section in the thesis page: 7 disqualifying conditions where V4 is genuinely the wrong stack pick (multimodal needs, politically-sensitive Chinese topics, strict format-rule output, SOTA knowledge breadth requirements, Chinese-AI procurement exclusions, long-horizon multi-round agentic tasks, predictable single-tenant latency on US/EU hardware). Each condition links to the supporting evidence elsewhere in the report. Closes a real gap — the report had extensive guidance on when to use V4 but no explicit “and here are the cases where it’s the wrong call.” Pure synthesis from existing data, no new sources needed.
  • Iteration 36 — Third analyst voice. Added Lian Jye Su’s (Omdia, chief analyst) quote on V4 competitiveness vs US rivals — “based on the benchmark results, it does appear DeepSeek V4 is going to be very competitive against its U.S. rivals” — plus the Omdia framing that V4 is “3 to 6 months behind state-of-the-art on the hardest coding and reasoning benchmarks, but delivers near-frontier performance at roughly a third of the API cost.” This independently confirms, from a different analyst house, the same gap Simon Willison cited at launch. Added 1 reference (US News article, 2026-04-24). Process note: three /loop commands stacked into this turn; consolidated to a single iteration’s worth of work rather than padding three.
  • Iteration 35 — Sharper market data + Unsloth fine-tuning variants. Replaced the soft “Chinese models exceed 45%” statement with the stronger weekly data point: in the week of 2026-03-30 to 2026-04-05, OpenRouter served 12.96T tokens to Chinese-origin models vs 3.03T to US models — Chinese share is ~80%, not 45%, on the most recent week with public data. The 45% figure was a softer monthly average; the weekly print shows acceleration. Added Unsloth’s fine-tuning-optimised variants of both V4-Pro and V4-Flash (unsloth/DeepSeek-V4-Pro, unsloth/DeepSeek-V4-Flash) to the community-quantisations section — same MIT weights, reorganised for tractable LoRA / QLoRA training. For teams planning to fine-tune (rather than infer against) V4, the Unsloth variants are typically the fastest path. Added 4 new references covering Dataconomy + AICost (market data) and the two Unsloth model cards.
  • Iteration 34 — Citation block on the topic landing. Added a “Citing this report” section to deepseek-v4/index.md with a suggested citation format, the framing that the report is a living document (reference with an access date and section anchor), instructions for spotting and reporting errata, and a note on source-of-truth ordering (primary sources beat this report on conflict; the per-page “Last verified” stamps are the most accurate reference for when claims were last reconciled). Searched for fresh V4 news and ecosystem updates 4 days post-launch — nothing materially new beyond what the report already covers; community is still settling into adoption mode.
  • Iteration 33 — Site map drift fix on the topic landing. The topic landing (deepseek-v4/index.md) had a stale “Report contents” section listing only 7 pages with .html-style URLs from before the iter-12 pretty-URL migration — it was missing the migration, self-hosting, glossary, errata, thesis, and changelog pages added in iters 12, 19, 24, and 32. Replaced with a current “Site map” using the same Read-first / Detail / Reference grouping as the sidebar, with a one-line purpose under each entry. Bumped the page’s updated frontmatter from 2026-04-26 to 2026-04-27. Process note: the broken-anchor lint (iter 30) didn’t catch this because the .html paths are conventional (just outdated, not broken-as-such — Jekyll will resolve news.html to news.md); a stronger check would be a “permalinks-match-config” lint, but that’s diminishing returns. The audit caught it during the iter-33 read-through.
  • Iteration 32 — Editorial synthesis page. Created deepseek-v4/thesis.md — a curatorial essay distilling 31 iterations of analysis into six observations that will still matter about V4 a year from now. Six points: (1) the market shock is already absorbed (Morningstar’s “already priced in” framing), (2) the 1.6T parameter count is deliberate risk-management not capability — V4-Pro reuses V3’s exact backbone, (3) the cheap-1M-context story is a Blackwell-FP4 hardware co-design bet, not an algorithm trick — B200 delivers ~3× H200’s throughput on V4 specifically, (4) training stability is held together by Anticipatory Routing + SwiGLU Clamping that DeepSeek themselves don’t fully understand, (5) the cost gap to the closed-source frontier (V4-Pro 7.2× cheaper than Opus, V4-Flash 89× cheaper) puts structural pressure on closed-source margins, (6) the V3 → V4 migration is a one-line code change — set a 2026-07-15 calendar reminder. Plus “what we don’t know yet” (V4-specific red-team data, V5 architecture) and “what this report doesn’t try to be” (rolling news ticker, primary-source substitute, endorsement). Wired into the Read-first sidebar group between Overview and Choosing-the-model. Added a top-of-list reading path on the topic landing.
  • Iteration 31 — Fresh post-launch news (3 days in). Restructured news.md’s adoption section into Adoption signals with two subsections (OpenRouter platform-wide market share + the new OpenClaw default-model switch announced 2026-04-27, with verbatim quote). Added Industry reactions with date-stamped quotes from Neil Shah of Counterpoint Research (“a serious flex”) and Ivan Su of Morningstar (the “already priced in” thesis). Added a Geopolitical / IP controversy section documenting Anthropic and OpenAI’s accusation that DeepSeek illegally extracted model capabilities, plus the 2026-04-23 White House OSTP memo accusing China-based entities of “industrial-scale” model distillation. Framed as a legal-risk axis for deployers to watch — not an obstacle to commercial V4 use today (MIT licence stands), but a political environment to plan for. Added 4 new references covering TechNode (OpenClaw), CNN Business, gHacks, and TechCrunch.
  • Iteration 30 — Broken-anchor lint. Added scripts/lint-anchors.py to verify in-site #anchor links resolve to actual headings on the target pages — catches the silent rot when a heading gets renamed or dropped. Walks all .md files, builds a URL → kramdown-id-set map from H2/H3/H4 headings on each page (handling explicit {#id} overrides), then checks every internal anchor link against that map. Initial run found 9 false positives because my first kramdown approximation collapsed non-alphanumeric runs into a single hyphen — kramdown’s actual rule is to remove non-word, non-hyphen, non-whitespace chars (em-dashes, slashes) entirely and convert each whitespace character to a hyphen without collapsing, so “ — ” (space, em-dash, space) becomes “--” (two hyphens); the corrected rule is sketched after this list. After the fix, all anchor links resolve. Wired into .github/workflows/lint.yml as a third parallel job alongside liquid and assets. Three CI lints now run on every push/PR.
  • Iteration 29 — OpenRouter market context + reverse-KL OPD detail + verification date stamps. Added an adoption-signal subsection to news.md noting that Chinese models exceed 45% of OpenRouter’s weekly volume as of April 2026 — V4’s launch into that traffic mix at $0.14–$3.48 is widely expected to push that share higher. Confirmed against the V4 tech report and added one technical detail to technical.md’s OPD section: the post-training distillation explicitly minimises the reverse KL loss D_KL(π_θ || π_E_i) against each specialist teacher (written out after this list), with the tech report noting “the reverse KL loss yields more stable gradient estimates and ensures faithful distillation.” Added “· Last verified 2026-04-27” date stamps to all six per-page TL;DR callouts (technical, benchmarks, api, migration, limitations, self-hosting) — readers now know exactly when each summary was last reconciled with primary sources, a small but high-integrity addition for a report that grows over time.
  • Iteration 28 — Orphaned-asset lint. Added scripts/lint-assets.py to detect images committed without any markdown reference — would have caught the iter-27 bug where the V3.2 DSA figure was rendered, committed, and forgotten until a manual review noticed. Heuristic: walk assets/images/, flag any file whose name doesn’t appear in any .md file under the repo (excludes _site, .git, .github, vendor). Permissive enough to allow relative_url / baseurl / direct-path references; the only false positives are files referenced solely via JS or dynamic Liquid. Updated .github/workflows/lint.yml to add an assets job alongside the existing liquid job; broadened the paths: triggers to fire on changes under assets/** and scripts/** too. Confirmed clean locally and via CI — every asset currently in the repo has at least one inbound markdown reference.
  • Iteration 27 — Per-page exec summaries + V3.2 DSA figure. Added a hand-styled .page-tldr component to the design system (eyebrow-styled “In one paragraph” label, accent-bordered left rule, max-width matching the wider breakout column). Added one to the top of each long page: technical.md (V3-backbone reuse + new architectural pieces + quantisation + training pipeline), benchmarks.md (V4-Pro-Max wins/losses + Flash-Max-vs-Pro-High proximity), api.md (OpenAI-compat + pricing + endpoints + provider matrix), migration.md (one-line code change + July-24 deadline + 4.9× savings on reasoner→Flash), limitations.md (DeepSeek’s own candor + V3-era red-team findings extrapolate + CAC filtering), self-hosting.md (Flash-tractable / Pro-cluster-only + framework matrix + ~199 / ~266 tok/s + 300-500ms TTFT). Rendered V3.2’s Figure 2 (DSA architecture instantiated under MLA) from the V3.2 PDF and embedded it in technical.md’s DSA subsection so readers see the mechanism the V4 hybrid attention is built on.
  • Iteration 26 — Concrete app patterns. Added a “Concrete app patterns” subsection to api.md’s “Choosing V4-Pro vs V4-Flash” section. Synthesises the cost / quality / latency / context-window tradeoffs documented elsewhere in the report into 7 build-X-with-variant-Y recommendations: coding agent (Flash hot path → Pro escalation), long-document Q&A (Pro 1M context), code review bot (Flash — within 1 pp of Pro on SWE Verified at 12× cheaper output), batch translation / summarisation (Flash), customer-support triage (Flash, TTFT < 500ms), RAG replacement (Pro, exploits the flat 1M pricing tier vs Gemini’s >200K doubling), reasoning copilot (Pro Max — only mode matching frontier-proprietary on Putnam-2025 / Codeforces 3206). Each pattern has variant + mode, rationale, $/request range at no-cache pricing, and the explicit tradeoff. Closing recommendation: most production deployments use two model IDs with a difficulty-heuristic router, not one — standardising on Pro everywhere overpays 12× for easy turns; standardising on Flash misses long-tail knowledge-breadth quality.
  • Iteration 25 — Real-world throughput measurements. Added a Real-world throughput section to self-hosting.md with community-published (not DeepSeek-published) numbers from within 72 hours of release: LMSYS Day-0 single-stream decode (V4-Pro on 8× B200 TP=8 = 180–199 tok/s; V4-Flash on 4× H200 TP=4 = 240–266 tok/s), B200 ~3× H200 ratio for V4 specifically, SGLang vs vLLM on H100 (29% SGLang advantage on DeepSeekMoE class workloads), and TTFT figures (V4-Flash Non-Think 300–500ms vs open-weight median 2.12s). Added explicit caveats: reasoning-effort dominates end-to-end latency, provider variance is real (83 vs 150 tok/s for the same model on different hosts), and long-context degradation matters past 200K. Added 4 references (Artificial Analysis V4-Flash page, BSWEN coding metrics, Particula SGLang-vs-vLLM benchmark, WaveSpeedAI Pro-vs-Flash review).
  • Iteration 24 — Self-hosting page. Created deepseek-v4/self-hosting.md consolidating hosting information that was previously scattered across the news, API, and limitations pages: when self-hosting makes sense (compliance, very-high-volume economics, latency floor, fine-tuning, air-gapped) vs when the API is the better answer; hardware budgets per variant (V4-Flash on 2× H100 80GB or 1× H200, V4-Pro as cluster-class needing 8× H100/H200/B200 or Ascend 910C); serving framework matrix (NIM, SGLang, vLLM, llama.cpp/GGUF, MLX) with the day-0 status of each; a worked memory-budget example for an agent workload (32K input, 4K output, batch 8 → ~162 GB total HBM, fits on 2× H100); and “what you don’t get from self-hosting” (prefix caching, auto-failover, pricing transparency). Wired into Detail sidebar group between Benchmarks and Independent testing; added reading-path link on the topic landing.
  • Iteration 23 — Liquid lint + CI workflow. After two consecutive build failures from the same root cause (unguarded Liquid syntax inside markdown inline code), added a Python lint at scripts/lint-liquid.py that scans markdown files for { % or { { tokens inside backticks (spaced here for the same reason the lint exists), ignoring properly-guarded raw blocks and bare-Liquid usage like image-path filters; its core scan is sketched after this list. Wired it into a GitHub Actions workflow at .github/workflows/lint.yml that runs on every push and PR touching markdown — CI will now fail before deploy if the pattern reappears. Added an errata entry recapping both iter-21 and iter-22 incidents and the lessons (a failure that recurs is a process failure; heuristic CI lints earn their cost on the first prevented failure; documenting Liquid bugs in Liquid-rendered prose is its own hazard, reach for raw first).
  • Iteration 22 — NIM hardware support + home image responsive cap. Pulled the official NIM reference page for V4-Pro and added concrete hardware-support details to api.md’s Alternative-providers section: V4-Pro NIM container officially supports A100, H100, H200, and B200 — three GPU generations, not just Blackwell. Documented the licensing combination (NVIDIA Open Model Agreement + MIT for the model itself) and explicitly noted what the page does not publish (per-GPU throughput, HBM requirements, per-request token limits). Capped the home page featured-card image at a 28rem max-width so it doesn’t render gigantic on >1440px screens; explicit image-rendering: auto prevents pixelation on Retina. Audited markdown files for unguarded Liquid literals — confirmed clean (only the iter-21 errata reference, safely inside a raw-tag block).
  • Iteration 21 — Sidebar regression errata + decision-section promotion. Added a candid errata entry for the iter-4 → iter-20 silent sidebar regression: the sidebar template leaned on a just-the-docs plugin Liquid filter (the exact tag is quoted, raw-guarded, in the errata entry); after the theme was dropped, the expression evaluated falsy and the entire <aside> block was silently skipped for 16 iterations across hundreds of pageviews. The errata names the root cause (failure to curl-verify rendered HTML after a theme removal) and the lessons (verify rendered output, theme-removal audits, don’t trust the changelog without checking). Promoted the “Choosing the model” anchor (/deepseek-v4/api/#choosing-v4-pro-vs-v4-flash) into the Read-first sidebar group — readers can now jump directly to the Pro-vs-Flash decision flow without scrolling the API page. Scanned the DeepSeek API change log and external news for V4 updates dated after 2026-04-25; nothing new to report — the V4 preview release remains the only major V4-specific event so far.
  • Iteration 20 — Magazine-style home + grouped sidebar. Rewrote index.md (root home page) to use a featured-report card design surfacing four headline findings from the DeepSeek V4 report (1M context as default-tier, 7.2× cheaper output than Opus 4.7, V3-backbone reuse, training-stability honesty). Added featured: frontmatter schema and matching .featured-card CSS — large editorial layout with cover image on the left and findings list on the right, full-bleed on mobile. Reorganised the per-topic sidebar from a flat 11-entry list into three groups: Read first (Overview, API, Migration), Detail (News, Technical, Benchmarks, Testing, Limitations), Reference (References, Glossary, Errata). Changed _config.yml’s sidebar: schema to support grouped entries; updated _layouts/page.html to render group labels with eyebrow typography; added .sidebar__group-label CSS. Both schema changes preserve the existing per-page CSS; no visual breakage on prose pages.
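
Three of the mechanisms described above are compact enough to sketch. First, the corrected iter-30 kramdown ID rule, as that entry describes it (kramdown’s full algorithm also special-cases leading digits and empty results):

```python
import re

def kramdown_auto_id(heading: str) -> str:
    """Drop non-word, non-hyphen, non-whitespace chars entirely, then map
    EACH whitespace char to a hyphen -- without collapsing runs."""
    text = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"\s", "-", text)

assert kramdown_auto_id("CSA + HCA — naming") == "csa--hca--naming"
```

Second, the iter-29 OPD objective written out: the student π_θ samples its own outputs (on-policy) and is pulled toward each specialist teacher π_{E_i} under reverse KL:

```latex
D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{E_i}\right)
  = \mathbb{E}_{y \sim \pi_\theta}\!\left[\log \pi_\theta(y) - \log \pi_{E_i}(y)\right]
```

Third, the core scan of the iter-23 Liquid lint, deliberately crude; the committed scripts/lint-liquid.py handles raw blocks and bare-Liquid image filters more carefully:

```python
import re
from pathlib import Path

# Liquid tokens are assembled by concatenation so this snippet never contains
# a literal opener itself -- the iter-21/22 lesson applied to its own docs.
BRACE = "{"
LIQUID_TOKEN = re.compile(re.escape(BRACE * 2) + "|" + re.escape(BRACE + "%"))
INLINE_CODE = re.compile(r"`[^`\n]+`")

def unguarded_liquid_lines(md: Path) -> list[int]:
    """1-based line numbers whose inline code carries a Liquid token."""
    hits, in_raw = [], False
    for n, line in enumerate(md.read_text().splitlines(), 1):
        if "raw" in line and LIQUID_TOKEN.search(line):
            in_raw = not in_raw  # naive raw/endraw toggle; the real lint is stricter
            continue
        if not in_raw and any(LIQUID_TOKEN.search(span)
                              for span in INLINE_CODE.findall(line)):
            hits.append(n)
    return hits
```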

2026-04-26

  • Iteration 4 — Design overhaul. Replaced the just-the-docs remote theme with a custom Jekyll layout system: hand-built _layouts/{default,page,home}.html, _includes/{header,footer}.html, and a single assets/css/style.css design system (Source Serif 4 body + Inter display + JetBrains Mono code, ~38rem prose column with wider breakouts for tables/figures, sticky topic sidebar, dark/light theme toggle persisted to localStorage, paper-and-burnt-sienna palette in light, warm-charcoal in dark). Switched permalinks to pretty URLs (/deepseek-v4/technical/ etc.). Stripped just-the-docs frontmatter (parent, nav_order, has_children) across all 10 pages; replaced with a topics + sidebar map declared once in _config.yml so future report directories drop in by adding one block. Removed the duplicate body H1s now that the layout renders title/subtitle/eyebrow in page__header.