Is DeepSeek V4 truly open source?

DeepSeek V4 follows the same open-weight approach as V3, model weights are publicly available under a permissive license that allows commercial use. You can download, fine-tune, and deploy the model without licensing fees. However, the training code and data pipeline remain proprietary, which is standard for open-weight releases. For most production use cases, the distinction between open-weight and fully open-source is immaterial.

How much VRAM do I need to run DeepSeek V4 locally?

DeepSeek V4 has roughly 1 trillion total parameters, but only ~32B activate per token thanks to its MoE architecture. Full BF16 inference of the resident model requires ~1.34TB of VRAM, practically 16-24 H100 GPUs. With Q4 quantization, you can run the active path on 22-26GB of VRAM, fitting on a single RTX 4090 or A100. For production self-hosting, plan for 8 H100s with Q8 quantization as a practical minimum.

Can Qwen 3.5 replace GPT-5 for coding tasks?

Not yet for the hardest problems. Qwen 3.5 scores 76.4 on SWE-bench Verified versus GPT-5.2's 80.0 and Claude Opus 4.6's 80.9. But for routine code generation, refactoring, and review tasks, which represent 80% of daily coding work, the difference is negligible, and Qwen runs at 10-30x lower cost. We recommend Qwen 3.5 for standard coding workflows and reserving proprietary models for complex multi-file reasoning.

When does self-hosting open-source models break even versus API calls?

Based on our client deployments, self-hosting typically breaks even at 15-40 million tokens per month. Below that threshold, the API pricing for DeepSeek ($0.14/M input) and Qwen ($0.10/M input) is already so cheap that infrastructure costs, $150K-$400K upfront for GPU hardware plus $5K-$15K monthly for operations, don't justify the investment. Above 40M tokens/month, self-hosting saves 80-90% on inference costs.

What's the biggest practical difference between DeepSeek V4 and Qwen 3.5?

Context window and licensing. DeepSeek V4 offers 1M native tokens, ideal for processing entire codebases or long documents in a single pass. Qwen 3.5 provides 256K native tokens (extendable to 1M with YaRN) but ships under Apache 2.0, which is the most permissive license in the space. If you need massive context, pick DeepSeek. If you need maximum legal clarity for commercial deployment, pick Qwen.

Are Chinese AI models safe to use in regulated industries?

The models themselves are just weights, they don't phone home or transmit data. When you self-host DeepSeek or Qwen, your data never leaves your infrastructure, which satisfies GDPR, HIPAA, and most regulatory frameworks. The risk lies in using their hosted APIs, which route through Chinese-jurisdiction servers. For regulated industries, we always recommend self-hosting with on-premise or private cloud deployment.

BLOG/LLMS & MODELS

DeepSeek V4 and Qwen 3.5: Open-Source AI Is Rewriting the Rules in 2026

DeepSeek and Qwen now hold 15% of the global AI market, up from 1% a year ago. Here's what V4 and 3.5 actually deliver, what they cost, and when they beat proprietary models.

Sebastian MondragonMARCH 06, 2026 · 11 MIN READ

DeepSeek V4 and Qwen 3.5: Open-Source AI Is Rewriting the Rules in 2026

DeepSeek V4 lists at roughly $0.14 per million input tokens. GPT-5 sits near $2.50. For a workload classifying and summarizing 50,000 financial documents daily, that's the difference between a $4,000+ monthly API bill and one closer to $200, at parity accuracy on the documents we've benchmarked. The math forces a question every AI budget owner is now answering: why are we paying 20x for this?

The market is already answering. DeepSeek and Qwen have gone from a combined 1% of global AI market share in January 2025 to roughly 15% by January 2026, the fastest adoption curve in AI history. DeepSeek V4 and Qwen 3.5, both released in the last three weeks, represent the most capable open-weight models ever built. They're not catching up to proprietary models. In several benchmarks, they've overtaken them.

This isn't a theoretical shift. It's a practical one that changes how you should architect, budget, and deploy AI systems in production. Here's what both models actually deliver, and exactly when they beat the proprietary alternatives.

01 · The Open-Source AI Inflection Point

The numbers tell a story that most industry analysts missed until it was too late. In January 2025, OpenAI controlled 55% of the global AI market. Qwen held 0.5%. DeepSeek held 0.5%. Twelve months later, OpenAI sits at 40% while Qwen and DeepSeek have climbed to 9% and 6% respectively.

Qwen has surpassed 700 million cumulative downloads on Hugging Face, the most downloaded AI model family in the world. That's not just researchers experimenting. Those are production deployments by companies that ran the cost analysis and decided the pricing gap was too large to ignore.

What changed? Two things. First, the capability gap closed. DeepSeek V3, trained for a reported $5.6 million, versus the hundreds of millions spent by OpenAI, Google, and Anthropic per frontier model, proved that Mixture-of-Experts architectures could match dense models at a fraction of the compute cost. Second, the ecosystem matured. You can now run these models through vLLM, Ollama, or TensorRT-LLM with the same tooling and OpenAI-compatible APIs you'd use for any proprietary model. The switching cost dropped to near zero.

For teams evaluating their model stack, this isn't about ideology or geopolitics. It's about unit economics. When an open-weight model delivers 95% of the performance at 5% of the cost, the business case writes itself.

02 · DeepSeek V4: A Trillion Parameters, 32 Billion Active

DeepSeek V4 launched in early March 2026 as the most ambitious open-weight model ever released. The headline number, roughly 1 trillion total parameters, is misleading without context. Thanks to its Mixture-of-Experts architecture, only ~32 billion parameters activate per token. That's a 50% increase in total model size over V3, but the active parameter count actually dropped from 37B to 32B, meaning V4 is simultaneously more capable and more efficient per query.

Multimodal From the Ground Up

V4 is DeepSeek's first natively multimodal model. Unlike earlier approaches that bolted vision capabilities onto a text model, V4's multimodal architecture was built into pre-training. It processes text, images, and video natively, no adapter layers, no quality degradation from stitching separate models together. This matters for production deployments where you're processing mixed-format documents. Financial reports with charts, medical records with imaging, technical documentation with diagrams, V4 handles all of it in a single pass without routing to specialized sub-models.

The 1-Million-Token Context Window

V4 extends context from 128K tokens (V3) to over 1 million, an 8x increase enabled by two key innovations. DeepSeek Sparse Attention (DSA) with Lightning Indexer technology reduces attention complexity from quadratic to linear, making million-token inference practical rather than theoretically possible. Engram Conditional Memory adds hash-based O(1) lookups for efficient retrieval across the full context. In practice, 1M tokens means you can feed V4 an entire medium-sized codebase (50-100 files), a 600-page technical manual, or months of conversation history in a single request. A typical legal document review workflow that previously chunked 200-page contracts into segments for GPT-5, losing cross-reference context in the process, can now load the entire document into V4 in one pass.

Training Efficiency and Cost Structure

DeepSeek V3 was trained for $5.6 million. While V4's training cost hasn't been officially disclosed, the architecture improvements, Manifold-Constrained Hyper-Connections for training stability, 16 expert pathways per token (up from V3's top-2/top-4 selection), suggest the cost remains in the single-digit millions. Compare that to the estimated $100M+ per training run for GPT-5 and Claude Opus 4.6. API pricing reflects this efficiency: approximately $0.14 per million input tokens and $0.28 per million output tokens. That's roughly 1/20th the cost of GPT-5's API. For high-volume workloads, the savings compound fast.

The Hardware Story

Here's where it gets geopolitically interesting. V4 was optimized for Huawei Ascend and Cambricon chips, with DeepSeek withholding early access from Nvidia and AMD. This is a deliberate architectural bet, and for self-hosting customers, it means V4 runs efficiently on a broader range of hardware than most Western models that assume NVIDIA CUDA throughout the stack.

03 · Qwen 3.5: The Agentic Open-Weight Powerhouse

Alibaba released Qwen 3.5 on February 16, 2026, and the model family immediately became the most compelling option for teams that need both capability and legal clarity. The flagship model packs 397 billion total parameters with 17 billion active per forward pass, a leaner MoE architecture than DeepSeek V4 but with aggressive optimization that shows in the benchmarks.

The pattern is clear: Qwen 3.5 leads on vision, instruction following, and multimodal understanding, areas where production workloads live. Proprietary models still edge ahead on pure mathematical reasoning and complex multi-step coding, but the gap is narrowing with each release.

The Apache 2.0 Advantage

Let's start with what differentiates Qwen 3.5 from every other frontier-class model: it ships under Apache 2.0. That's the most permissive open-source license in widespread use. You can deploy it commercially, modify it, fine-tune it on proprietary data, and sell products built on it, with zero licensing concerns. For enterprise legal teams, this eliminates months of license review. DeepSeek's custom license is permissive but includes clauses that require legal analysis. OpenAI and Anthropic's terms change quarterly. Apache 2.0 is a known quantity that every corporate legal department has already approved.

Benchmark Performance That Matters

Qwen 3.5 doesn't win every benchmark, but it wins the ones that correlate with real-world production value:

Built for Agents

Qwen 3.5 was designed with agentic workflows as a first-class use case. Built-in "thinking" and "non-thinking" inference modes let you toggle between extended chain-of-thought reasoning and fast direct responses at the API level, no prompt engineering tricks required. The model supports native tool use and multi-step planning, scoring 86.7 on Tau2-Bench (agentic tasks), second only to Claude Opus 4.6 among all models tested. For teams building complex AI agents, this makes Qwen 3.5 a serious contender as the backbone model, especially when combined with frameworks like LangGraph or CrewAI.

The Speed Factor

Qwen 3.5 delivers 8.6x to 19x faster decoding throughput compared to Qwen3-Max, thanks to its native FP8 training pipeline and hybrid attention architecture combining Gated Delta Networks with standard gated attention. The FP8 pipeline also reduces activation memory by roughly 50%, which translates directly to lower serving costs. The model family spans from 0.8B to 397B parameters, giving teams a practical on-ramp. Start with the 32B variant on a single GPU for development, validate your pipeline, then scale to the full 397B for production.

Benchmark	Qwen 3.5	GPT-5.2	Claude Opus 4.6	Winner
MathVision	88.6	83.0	82.1	Qwen 3.5
MMMU (multimodal understanding)	85.0	83.2	81.7	Qwen 3.5
IFBench (instruction following)	76.5	75.4	74.8	Qwen 3.5
MultiChallenge	67.6	57.9	60.2	Qwen 3.5
BrowseComp (web browsing)	78.6	76.1	72.3	Qwen 3.5
SWE-bench Verified (coding)	76.4	80.0	80.9	Claude
AIME 2026 (math reasoning)	91.3	96.7	93.3	GPT-5.2
Tau2-Bench (agentic tasks)	86.7	85.2	91.6	Claude

04 · Head-to-Head: DeepSeek V4 vs Qwen 3.5

For teams deciding between these two, here's the comparison that actually matters:

Choose DeepSeek V4 when you need massive context windows (full codebases, long documents), strong coding performance, or multimodal processing including video. Its 1M native context is unmatched in the open-weight space.

Choose Qwen 3.5 when you need agentic capabilities, multilingual support (201 languages versus ~50), the legal simplicity of Apache 2.0, or lower self-hosting requirements. Its leaner architecture means less hardware for comparable performance.

Dimension	DeepSeek V4	Qwen 3.5
Total parameters	~1T (32B active)	397B (17B active)
Architecture	MoE, 16 expert pathways	MoE, hybrid attention
Native context	1M+ tokens	256K (1M with YaRN)
Multimodal	Text, image, video, audio	Text, image, video
License	Custom permissive	Apache 2.0
Languages	~50	201
API input cost	~$0.14/M tokens	~$0.10/M tokens
API output cost	~$0.28/M tokens	~$0.40/M tokens
Coding (SWE-bench)	~80%+ (leaked)	76.4%
Vision (MathVision)	Not independently verified	88.6%
Self-host minimum	8× H100 (Q8)	4× H100 (Q8)
Best for	Long-context, coding, multimodal	Multilingual, agents, vision

05 · When Open-Source Beats Proprietary in Production

The "open-source vs proprietary" framing is outdated. The real question is: for which specific workloads does the cost-performance ratio of open-weight models justify the operational overhead?

Workloads Where Open-Weight Models Win

High-volume classification and extraction. If you're processing 10,000+ documents daily for classification, entity extraction, or summarization, the 10-30x cost advantage of open-weight models compounds into six-figure annual savings. For pure classification tasks, we typically deploy Particula-Classify, a purpose-built model that handles sentiment, intent, and document categorization faster and cheaper than any general-purpose LLM. But when classification is just one step in a larger pipeline that includes extraction or summarization, DeepSeek and Qwen hit the sweet spot. These are pattern-matching tasks where smaller specialized models often outperform flagships anyway. Privacy-sensitive deployments. When data cannot leave your infrastructure, healthcare, legal, financial services, self-hosted open-weight models are the only option besides building from scratch. For HIPAA-constrained workloads, deploying Qwen on-premises is often the difference between shipping and a $500K custom model training project. Multilingual applications. Qwen 3.5's 201-language support crushes every proprietary alternative. For Southeast Asia coverage across 12+ languages, GPT-5 handles English and Mandarin well but struggles with Thai, Vietnamese, and Bahasa. Qwen delivers consistent quality across all of them. Latency-critical applications. Self-hosted models on local hardware eliminate network round-trips entirely. For applications where every millisecond matters, autocomplete, real-time translation, interactive coding assistants, the latency advantage of local inference is absolute. Our guide on choosing the right inference server covers the serving stack in detail.

Workloads Where Proprietary Still Wins

Complex multi-step reasoning. For tasks requiring 5+ chained reasoning steps, advanced mathematical proofs, complex legal analysis, novel algorithm design, GPT-5.2 and Claude Opus 4.6 still maintain a measurable edge. The gap is 3-5 percentage points on hard benchmarks, but those points matter when accuracy is non-negotiable. Bleeding-edge coding tasks. Claude Opus 4.6 leads SWE-bench Verified at 80.9%. If your workflow involves complex cross-repository refactoring or novel architectural decisions, the 4-5 point advantage over Qwen 3.5 translates to fewer failed attempts and less human review. Zero operational overhead. If you don't have infrastructure engineers and don't want to manage GPU clusters, proprietary APIs remain the path of least resistance. The cost premium is effectively a managed service fee.

06 · Self-Hosting: The Real Cost Breakdown

Self-hosting open-weight models is where the largest savings live, but only above a certain scale. Here's what the economics actually look like based on the deployments we've managed.

The breakeven against DeepSeek's own API only makes sense at enormous scale (500M+ tokens/month). But the breakeven against proprietary APIs, where you'd pay $1,500/month for 500M tokens on GPT-5, happens much sooner, around 15-40M tokens monthly depending on your infrastructure choices.

The hidden cost is engineering time. Budget $5,000-$15,000 per month for an engineer maintaining the inference stack, handling model updates, monitoring performance, and managing the GPU cluster. For a deeper dive on serving infrastructure, see our comparison of Ollama vs vLLM and the three-way inference server shootout.

Hardware Requirements

The Breakeven Calculation

Monthly API cost at scale with DeepSeek V4 ($0.14/M input):

Setup	Hardware	Cost	Fits
Development	1× RTX 4090 (24GB)	~$2,000	Qwen 3.5 32B (Q4), DeepSeek V4 active path (Q4)
Small production	2× A100 80GB	~$30,000	Qwen 3.5 72B (Q8), DeepSeek V4 active path (BF16)
Full production	8× H100 80GB	~$250,000	Qwen 3.5 397B (Q4), DeepSeek V4 full (Q8)
Maximum scale	16-24× H100	~$400,000+	DeepSeek V4 full (BF16)

Monthly tokens	DeepSeek API	GPT-5 API	Self-host (amortized)
5M	$0.70	$15	~$8,000
15M	$2.10	$45	~$8,000
50M	$7.00	$150	~$8,000
500M	$70	$1,500	~$8,000

07 · Geopolitical Realities and Supply Chain Implications

We'd be naive to ignore the geopolitical dimension. Both DeepSeek and Qwen are built by Chinese companies, DeepSeek by the Hangzhou-based fund High-Flyer, and Qwen by Alibaba Cloud. This creates real considerations for enterprise adoption, not hypothetical ones.

The Hardware Decoupling

DeepSeek V4's optimization for Huawei Ascend and Cambricon chips, and its deliberate withholding of early access from Nvidia and AMD, signals a broader trend. China's AI industry is actively building an alternative hardware ecosystem. For Western enterprises, this actually reduces supply chain risk in an unexpected way: if these models run efficiently on diverse hardware, you're less locked into NVIDIA's pricing and availability cycles.

Data Sovereignty

The models themselves are weights on disk. They don't contain backdoors (the code is auditable), they don't phone home, and when you self-host, your data stays on your infrastructure. But using the hosted APIs from DeepSeek or Alibaba means your data routes through Chinese-jurisdiction servers, a non-starter for many regulated industries and government contracts. Our recommendation for clients in regulated sectors: always self-host. Download the weights, run them on your infrastructure, and treat the model as a software artifact rather than a service. This eliminates jurisdiction concerns entirely while capturing the cost benefits.

Export Controls and Continuity Risk

U.S. export controls restrict the flow of advanced AI chips to China, which is precisely why DeepSeek invested in Huawei chip compatibility. The risk for Western enterprises adopting these models isn't that the models will stop working, once you have the weights, they're yours. The risk is that future versions may diverge architecturally if the hardware ecosystems fully decouple. Mitigate this by maintaining model-agnostic serving infrastructure (vLLM supports both ecosystems) and avoiding tight coupling to model-specific features.

08 · What This Means for Your AI Strategy

The practical takeaway from DeepSeek V4 and Qwen 3.5 isn't "switch everything to open source." It's "stop defaulting to proprietary models without running the numbers."

For most enterprises, the optimal architecture in 2026 is a routing layer: send 80% of requests, classification, extraction, summarization, translation, to open-weight models that cost a fraction of proprietary alternatives. Reserve the remaining 20%, complex reasoning, novel code generation, nuanced analysis, for GPT-5 or Claude where the quality premium justifies the cost.

At Particula Tech, we've been deploying this hybrid approach since late 2025. The typical result: 60-70% reduction in AI infrastructure costs with no measurable degradation in output quality for the routed workloads. The open-source vs custom model decision has shifted permanently, open-weight models are now the default starting point, not the budget fallback.

The era when "open source" meant "second tier" is over. DeepSeek V4 and Qwen 3.5 didn't just close the gap with proprietary models. For the workloads that matter most to production systems, they've moved ahead. And they're not alone, Xiaomi's MiMo-V2-Pro recently proved that competitive models can come from hardware companies too, secretly topping OpenRouter before anyone knew who built it. April 2026 then delivered the next wave with DeepSeek V4-Pro, Kimi K2.6, and GLM-5.1 in an 18-day window, three open-weight coding models that now match Opus 4.6 within a point. The companies that adjust their model strategy accordingly will save millions. The ones that don't will be paying a premium for inertia.

09 · FAQ

Quick answers to the questions this post tends to raise.

BLOG/LLMS & MODELS

DeepSeek V4 and Qwen 3.5: Open-Source AI Is Rewriting the Rules in 2026

DeepSeek and Qwen now hold 15% of the global AI market, up from 1% a year ago. Here's what V4 and 3.5 actually deliver, what they cost, and when they beat proprietary models.

Sebastian MondragonMARCH 06, 2026 · 11 MIN READ

01 · The Open-Source AI Inflection Point

02 · DeepSeek V4: A Trillion Parameters, 32 Billion Active

Multimodal From the Ground Up

The 1-Million-Token Context Window

Training Efficiency and Cost Structure

The Hardware Story

03 · Qwen 3.5: The Agentic Open-Weight Powerhouse

The Apache 2.0 Advantage

Benchmark Performance That Matters

Qwen 3.5 doesn't win every benchmark, but it wins the ones that correlate with real-world production value:

Built for Agents

The Speed Factor

Benchmark	Qwen 3.5	GPT-5.2	Claude Opus 4.6	Winner
MathVision	88.6	83.0	82.1	Qwen 3.5
MMMU (multimodal understanding)	85.0	83.2	81.7	Qwen 3.5
IFBench (instruction following)	76.5	75.4	74.8	Qwen 3.5
MultiChallenge	67.6	57.9	60.2	Qwen 3.5
BrowseComp (web browsing)	78.6	76.1	72.3	Qwen 3.5
SWE-bench Verified (coding)	76.4	80.0	80.9	Claude
AIME 2026 (math reasoning)	91.3	96.7	93.3	GPT-5.2
Tau2-Bench (agentic tasks)	86.7	85.2	91.6	Claude

04 · Head-to-Head: DeepSeek V4 vs Qwen 3.5

For teams deciding between these two, here's the comparison that actually matters:

Dimension	DeepSeek V4	Qwen 3.5
Total parameters	~1T (32B active)	397B (17B active)
Architecture	MoE, 16 expert pathways	MoE, hybrid attention
Native context	1M+ tokens	256K (1M with YaRN)
Multimodal	Text, image, video, audio	Text, image, video
License	Custom permissive	Apache 2.0
Languages	~50	201
API input cost	~$0.14/M tokens	~$0.10/M tokens
API output cost	~$0.28/M tokens	~$0.40/M tokens
Coding (SWE-bench)	~80%+ (leaked)	76.4%
Vision (MathVision)	Not independently verified	88.6%
Self-host minimum	8× H100 (Q8)	4× H100 (Q8)
Best for	Long-context, coding, multimodal	Multilingual, agents, vision

05 · When Open-Source Beats Proprietary in Production

The "open-source vs proprietary" framing is outdated. The real question is: for which specific workloads does the cost-performance ratio of open-weight models justify the operational overhead?

Workloads Where Open-Weight Models Win

Workloads Where Proprietary Still Wins

06 · Self-Hosting: The Real Cost Breakdown

Self-hosting open-weight models is where the largest savings live, but only above a certain scale. Here's what the economics actually look like based on the deployments we've managed.

Hardware Requirements

The Breakeven Calculation

Monthly API cost at scale with DeepSeek V4 ($0.14/M input):

Setup	Hardware	Cost	Fits
Development	1× RTX 4090 (24GB)	~$2,000	Qwen 3.5 32B (Q4), DeepSeek V4 active path (Q4)
Small production	2× A100 80GB	~$30,000	Qwen 3.5 72B (Q8), DeepSeek V4 active path (BF16)
Full production	8× H100 80GB	~$250,000	Qwen 3.5 397B (Q4), DeepSeek V4 full (Q8)
Maximum scale	16-24× H100	~$400,000+	DeepSeek V4 full (BF16)

Monthly tokens	DeepSeek API	GPT-5 API	Self-host (amortized)
5M	$0.70	$15	~$8,000
15M	$2.10	$45	~$8,000
50M	$7.00	$150	~$8,000
500M	$70	$1,500	~$8,000

07 · Geopolitical Realities and Supply Chain Implications

The Hardware Decoupling

Data Sovereignty

Export Controls and Continuity Risk

08 · What This Means for Your AI Strategy

The practical takeaway from DeepSeek V4 and Qwen 3.5 isn't "switch everything to open source." It's "stop defaulting to proprietary models without running the numbers."

09 · FAQ

Quick answers to the questions this post tends to raise.