Claude Sonnet 4.6
Status: Active
Intelligence Index
44.8 / 100, weighted across 6 benchmarks
- Coding: 72.7
- Factual grounding: 63.6
- Instruction following: 41.7
- Long context: 28.2
- Medical: 17.7
Computed as the mean of per-category averages across MMLU, GPQA, SWE-bench, HumanEval, MATH, GSM8K, AIME, Aider Polyglot, and more; see each benchmark for methodology.
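The headline figure can be reproduced from the five category scores above. A minimal sketch, assuming a simple unweighted mean over the listed categories (the exact weighting is not specified on this page):

```python
# Category scores as listed above.
scores = {
    "Coding": 72.7,
    "Factual grounding": 63.6,
    "Instruction following": 41.7,
    "Long context": 28.2,
    "Medical": 17.7,
}

# Unweighted mean of per-category averages (an assumption about the weighting).
index = sum(scores.values()) / len(scores)
print(round(index, 1))  # 44.8
```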
History
Claude Sonnet 4.6 became available via the Anthropic API on 2026-02-17.
Training & availability
Anthropic has not released the underlying model weights — access is via their hosted API only.
Capabilities
- Context window: 1.0M tokens.
- Max output: 64K tokens.
- Input modalities: text, image.
Recommended for: vision, agentic, long-context, code.
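Since the model accepts image input, a request can pair an image content block with a text prompt. A minimal sketch of building such a user turn; the content-block shape follows the Anthropic Messages API image format, while the helper name and file path are hypothetical:

```python
import base64


def build_image_message(image_bytes: bytes, media_type: str, question: str) -> dict:
    """Build a Messages API user turn pairing an image with a text prompt.

    The type/source/base64 block shape follows the Anthropic Messages API
    image format; the helper name itself is just for this sketch.
    """
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": question},
        ],
    }
```

Usage: read a local image (e.g. a PNG) into bytes, then pass the result to `client.messages.create(..., messages=[build_image_message(data, "image/png", "Describe this chart.")])`.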
Example interactions
Curated prompts showing the model's response style, chosen to reflect typical output rather than to impress.
Quick start
Minimal example using the Anthropic Python SDK. Copy, paste, and replace the API key.

```python
from anthropic import Anthropic

client = Anthropic(api_key="sk-ant-...")
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
)
print(resp.content[0].text)
```

Benchmarks
| Benchmark | Score | Source |
|---|---|---|
| FACTS Grounding (Factual grounding) | 63.64% | Third-party (llm-stats.com) |
| HealthBench (Medical) | 17.74% | Third-party (llm-stats.com) |
| LongBench v2 (Long context) | 5.56% | Third-party (llm-stats.com) |
Integrations & tooling support
- Tool calling: Supported
- Structured outputs: Supported
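Tool calling works by passing JSON-schema tool definitions to `messages.create`. A minimal sketch, where the `get_weather` tool and the `call_with_tools` helper are hypothetical, illustrative names:

```python
# Hypothetical tool definition: the name "get_weather" and its schema
# are illustrative, not part of the Anthropic API itself.
weather_tool = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def call_with_tools(api_key: str):
    # Imported here so the schema above can be inspected without the SDK.
    from anthropic import Anthropic

    client = Anthropic(api_key=api_key)
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[weather_tool],
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    )
    # The reply may contain a tool_use block naming the tool and its
    # arguments instead of (or alongside) plain text.
    return [(b.name, b.input) for b in resp.content if b.type == "tool_use"]
```

Your code executes the named tool locally and sends the result back in a follow-up `tool_result` message so the model can finish its answer.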
Price vs quality
Solid benchmark performance. Pricing is not publicly available; check with the provider.
- Quality percentile: 53.3%
- Effective price: —
- Pricing breakdown: — in / — out