GPQA Diamond
Reasoning · Measured in % accuracy
Graduate-level, Google-proof science reasoning questions.
Updated 4 days ago · Latest measured Apr 18, 2026 · 10 verified · 6 self-reported
Verified results come from third-party or public leaderboard sources. Self-reported results come from provider papers, blogs, or vendor disclosures and should be compared with extra caution.
At a glance
🏆 Top score
87.88 % accuracy (Claude Opus 4.7, Anthropic)
Total results
16
Models tested
16
Providers
6
Verified · Self-reported
10 · 6
Average
72.22 % accuracy
Median
73 % accuracy
Range
35.35 – 87.88 % accuracy
Latest result
Apr 18, 2026
Score distribution
[Histogram of the 16 results across score bins; bin labels not preserved in this export.]
Methodology
Expert-written multiple-choice questions in biology, chemistry, and physics, designed to be difficult even with web search. The Diamond subset (198 questions) is the hardest.
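As a rough illustration of how a multiple-choice benchmark like this is scored, here is a minimal sketch in Python. The `questions` structure and the `ask_model` callback are hypothetical stand-ins for a dataset loader and a model API, not part of any published GPQA harness.

```python
import random

def score_mcq(questions, ask_model, seed=0):
    """Score a model on four-option multiple-choice questions, returning % accuracy.

    `questions` is a list of dicts with keys "stem", "correct", and "distractors"
    (three strings); `ask_model` takes a prompt string and returns an answer letter.
    Both are hypothetical stand-ins used only for illustration.
    """
    rng = random.Random(seed)
    letters = "ABCD"
    correct = 0
    for q in questions:
        options = [q["correct"]] + list(q["distractors"])
        rng.shuffle(options)  # shuffle so the answer position carries no signal
        prompt = q["stem"] + "\n" + "\n".join(
            f"{letters[i]}. {opt}" for i, opt in enumerate(options)
        )
        choice = ask_model(prompt).strip().upper()[:1]  # expect "A".."D"
        if choice in letters and options[letters.index(choice)] == q["correct"]:
            correct += 1
    return 100.0 * correct / len(questions)
```

With 198 questions in the Diamond split, a single question is worth about half a percentage point, which is why scores land on values such as 87.88 (174/198) or 35.35 (70/198).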
Limitations
The test set is small (198 questions), so scores have high variance. It also measures only a narrow slice of scientific reasoning.
By provider
- Anthropic · 3 models · Best: 87.88 % accuracy (Claude Opus 4.7) · Average: 82.24 % accuracy
- OpenAI · 7 models · Best: 87.7 % accuracy (o3) · Average: 74.07 % accuracy
Full leaderboard
Showing 16 of 16
| # | Model | Provider | Score (% accuracy) |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 87.88 |
| 2 | o3 | OpenAI | 87.7 |
| 3 | Claude Opus 4.6 | Anthropic | 84.85 |
| 4 | Grok 3 | xAI | 84.6 |
| 5 | GPT-5.2 | OpenAI | 82.32 |
| 6 | o1 | OpenAI | 78 |
| 7 | GPT-5.4 | OpenAI | 77.27 |
| 8 | Claude Opus 4 | Anthropic | 74 |
| 9 | GPT-5 | OpenAI | 72 |
| 10 | DeepSeek R1 | DeepSeek | 71.5 |
| 11 | Gemma 4 31B | Google | 69.7 |
| 12 | Gemini 2 Pro | Google | 68.1 |
| 13 | Qwen3.5-27B | Alibaba | 61.11 |
| 14 | GPT-5.4 mini | OpenAI | 60.61 |
| 15 | GPT-5.4 nano | OpenAI | 60.61 |
| 16 | Qwen3.5 397B A17B | Alibaba | 35.35 |
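The figures in the at-a-glance section are plain descriptive statistics over these 16 scores. A quick sketch reproducing them, with the scores copied from the table above:

```python
from statistics import mean, median

# Scores copied from the leaderboard table (percent accuracy).
scores = [87.88, 87.7, 84.85, 84.6, 82.32, 78, 77.27, 74,
          72, 71.5, 69.7, 68.1, 61.11, 60.61, 60.61, 35.35]

print(f"Average: {mean(scores):.2f}")              # ~72.22
print(f"Median:  {median(scores):.0f}")            # 73
print(f"Range:   {min(scores)} - {max(scores)}")   # 35.35 - 87.88
```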