DeepSeek V3 — DeepSeek | Modeldex

DeepSeek V3

Active | 84.0 | Frontier

Open-weight frontier model competitive with GPT-4o and Claude Sonnet at a fraction of the training cost.

Updated 4 days ago. Structured data from Modeldex catalog.

Math, Agentic, Frontier, Open source, Code

Intelligence Index

84.0 / 100

Frontier, weighted across 5 benchmarks

Math: 97.1
General knowledge: 82.2
Coding: 72.7

Computed as the mean of per-category averages across MMLU, GPQA, SWE-bench, HumanEval, MATH, GSM8K, AIME, Aider Polyglot and more. See each benchmark for methodology.
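As a rough check on that description, the sketch below averages the three category scores listed on this page. Treating those three categories as the full set that feeds the index is an assumption, and the function name `intelligence_index` is illustrative rather than part of any Modeldex API.

```python
from statistics import mean

# Per-category averages as listed on this page.
# Assumption: these three categories are the only ones that feed the index.
category_scores = {
    "Math": 97.1,
    "General knowledge": 82.2,
    "Coding": 72.7,
}


def intelligence_index(scores: dict[str, float]) -> float:
    """Mean of per-category averages, rounded to one decimal place."""
    return round(mean(scores.values()), 1)


print(intelligence_index(category_scores))
```

Under that assumption the result agrees with the 84.0 shown above.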

Overview

DeepSeek V3 is a 671B-parameter mixture-of-experts model (37B active parameters per token) trained for a reported $6M in compute, far less than the reported training cost of comparable frontier models. It matches GPT-4o and Claude Sonnet on most standard benchmarks.

History

DeepSeek V3 was released on 2024-12-26.

Training & availability

Weights are publicly available under the DeepSeek License (permissive), making this an open-weight model suitable for on-prem deployment and fine-tuning.

Capabilities

Context window: 128K tokens.
Max output: 8K tokens.
Input modalities: text.
Intelligence Index: 84.0/100.

Strongest categories: Math (97), General knowledge (82), Coding (73).

Recommended for: math, agentic, frontier, open-source, code.

Limitations

Text-only — cannot process images, audio, or video inputs.

Pricing

Input: $0.2700 per 1M tokens
Output: $1.1000 per 1M tokens

Use the cost calculator below to estimate monthly spend for your workload.

Example interactions

Curated prompts showing the model's response style: chosen to show what typical output looks like, not cherry-picked to impress.

Prompt

Explain how a transformer attention layer works, for a programmer who understands matrices but hasn't touched ML before.

Response

Attention is a weighted lookup. Given a token, the layer computes how much attention to pay to every other token in the sequence, then produces a weighted sum of their values.

Three projections per token:

- **Query (Q):** what this token is looking for
- **Key (K):** what this token offers others
- **Value (V):** the information it contributes if attended to

For each position, compute scores = Q · Kᵀ (dot product against every other token's key). Softmax normalizes these into weights. Output = weights · V.

Multi-head attention runs this in parallel with different Q/K/V projections, so different "heads" learn to track different relationships: syntax, coreference, topic, etc. Concat and project back.

The matrix view: if you have n tokens and d_model dims, Q/K/V are (n × d_k) each. Attention output is (n × d_k). With h heads, final output concatenates to (n × d_model).
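The single-head computation described in the response can be sketched in plain Python. This is a minimal illustration, not the model's implementation: Q, K and V are passed in directly rather than produced by learned projections, the toy values are arbitrary, and the 1/sqrt(d_k) scaling (part of the standard formulation, though not mentioned in the response) is included as an optional flag.

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Normalize a row of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, scale=True):
    """Single-head attention over n tokens.

    Q and K are n x d_k, V is n x d_v, all given as lists of lists.
    """
    d_k = len(Q[0])
    d_v = len(V[0])
    factor = 1.0 / math.sqrt(d_k) if scale else 1.0
    output = []
    for q in Q:
        # Score this token's query against every token's key.
        scores = [factor * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        )
    return output


# Toy input: 3 tokens, d_k = d_v = 2. Values are arbitrary illustration data.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out = attention(Q, K, V)
print(len(out), len(out[0]))  # output has n rows and d_v columns
```

Each output row is a convex combination of the rows of V, which is what "weighted sum of their values" means in practice.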

Quick start

Minimal example using the OpenRouter API. Copy, paste, replace the key.

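from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # replace with your OpenRouter API key
)

# Send a single-turn chat request to DeepSeek V3.
resp = client.chat.completions.create(
    model="deepseek/deepseek-v3",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)

# Print the text of the first returned completion.
print(resp.choices[0].message.content)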

Cost calculator

Estimate your monthly bill. Presets are typical workload sizes.

Input tokens / month: 5.0M @ $0.27/1M
Output tokens / month: 2.0M @ $1.10/1M

Input cost: $1.35 (5.0M × $0.27/1M)
Output cost: $2.20 (2.0M × $1.10/1M)
Total / month: $3.55
Total / year: $42.60
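The same arithmetic can be reproduced offline. The sketch below assumes flat per-token pricing at the rates listed on this page, with no caching discounts or provider markup; the function name `monthly_cost` and its parameters are illustrative.

```python
def monthly_cost(
    input_tokens: float,
    output_tokens: float,
    input_price: float = 0.27,   # USD per 1M input tokens, from this page
    output_price: float = 1.10,  # USD per 1M output tokens, from this page
) -> dict[str, float]:
    """Estimate spend in USD for one month of usage, plus a 12-month total."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    total = input_cost + output_cost
    return {
        "input": round(input_cost, 2),
        "output": round(output_cost, 2),
        "month": round(total, 2),
        "year": round(total * 12, 2),
    }


# The preset shown above: 5.0M input and 2.0M output tokens per month.
print(monthly_cost(5_000_000, 2_000_000))
```

Swapping in your own token counts, or different prices if your provider charges differently, gives a quick estimate without the interactive calculator.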

Benchmarks

Benchmark	Category	Score	Source
Aider Polyglot	Coding	55.1% pass@2	Third-party, Papers With Code
GSM8K	Math	97.1% accuracy	Self-reported, DeepSeek tech report
HumanEval	Coding	90.2% pass@1	Self-reported, DeepSeek tech report

Integrations & tooling support

Tool calling: Supported
Structured outputs: Supported

Price vs quality

Solid value

Competent capability at a low price.

Quality percentile: 65.7%
Effective price: $0.892/1M
Pricing breakdown: $0.27/1M in, $1.10/1M out
