Aider Polyglot
Coding · % pass@2
Real-world coding edits across 6 programming languages. Measures whether the model produces a correct edit by its second attempt.
Updated yesterday · Latest measured Apr 21, 2026 · 22 verified · 0 self-reported
Verified results come from third-party or public leaderboard sources. Self-reported results come from provider papers, blogs, or vendor disclosures and should be compared with extra caution.
At a glance
- Total results: 22
- Models tested: 22
- Providers: 7
- Verified · Self-reported: 22 · 0
- Average: 49.88 % pass@2
- Median: 53.55 % pass@2
- Range: 3.6 – 88 % pass@2
- Latest result: Apr 21, 2026
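The summary figures above can be reproduced from the leaderboard scores. The snippet below is a minimal sketch, assuming the 22 scores listed in the full leaderboard on this page are the complete input; the variable names are illustrative and not part of any published tooling.

```python
import statistics

# Scores (% pass@2) copied from the full leaderboard on this page, in rank order.
scores = [88, 84.9, 72.9, 72, 61.7, 59.6, 56.9, 55.1, 55.1, 53.8, 53.8,
          53.3, 53.3, 52.4, 44.9, 41.8, 40, 34.7, 32.9, 15.6, 11.1, 3.6]

summary = {
    "total": len(scores),
    "average": round(statistics.mean(scores), 2),   # arithmetic mean
    "median": round(statistics.median(scores), 2),  # mean of the two middle values (even count)
    "range": (min(scores), max(scores)),
}
print(summary)
```

With an even number of results, the median is the midpoint of the 11th and 12th scores (53.8 and 53.3), which is why it does not match any single model's score.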
Score distribution
[Histogram: number of models per score bin, % pass@2]
Methodology
225 Exercism coding exercises in C++, Go, Java, JavaScript, Python and Rust. Score = pass_rate_2: the percentage of exercises for which the model has produced a passing solution by its second attempt, the second attempt being made after the first one fails the tests.
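As a rough illustration of how such a score can be computed, the sketch below counts an exercise as solved if its tests pass on the first attempt or, failing that, on the second. The `ExerciseResult` record and `pass_rate` function are hypothetical names introduced here for illustration; they are not Aider's actual data structures or API.

```python
from dataclasses import dataclass


@dataclass
class ExerciseResult:
    """Hypothetical per-exercise record; not Aider's real result format."""
    name: str
    passed_first_try: bool
    passed_second_try: bool  # only consulted when the first try failed


def pass_rate(results: list[ExerciseResult], tries: int) -> float:
    """Percentage of exercises solved within `tries` attempts (1 or 2)."""
    if not results:
        raise ValueError("no results to score")
    solved = 0
    for r in results:
        if r.passed_first_try or (tries >= 2 and r.passed_second_try):
            solved += 1
    return 100.0 * solved / len(results)


# Toy data: one exercise solved immediately, one on retry, two never solved.
demo = [
    ExerciseResult("two-fer", True, False),
    ExerciseResult("bowling", False, True),
    ExerciseResult("zebra-puzzle", False, False),
    ExerciseResult("forth", False, False),
]
print(pass_rate(demo, tries=1), pass_rate(demo, tries=2))
```

Under this assumption the two-attempt rate can never be lower than the one-attempt rate, since every exercise solved on the first try also counts toward the second.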
Limitations
Aider's harness shapes prompts in a specific way, so results may not be directly comparable to those from other coding benchmarks.
By provider
- OpenAI · Average: 53.62 % pass@2 · Best: 88 % pass@2
- Google · 1 model · 72.9 % pass@2 · Gemini 2.5 Pro · Average: 72.9 % pass@2 · Best: 72.9 % pass@2
- Alibaba · 2 models · 59.6 % pass@2 · Qwen3 235B A22B
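The per-provider averages and best scores can be derived by grouping the leaderboard rows. The following is a sketch under the assumption that the full leaderboard on this page is the only input; Gemini 2.5 Pro is assigned to Google based on the provider list above, and `rows` and `provider_stats` are illustrative names.

```python
from collections import defaultdict

# (model, provider, score in % pass@2), copied from the full leaderboard.
# Assumption: Gemini 2.5 Pro belongs to Google, as stated in the provider list.
rows = [
    ("GPT-5", "OpenAI", 88), ("o3-pro", "OpenAI", 84.9),
    ("Gemini 2.5 Pro", "Google", 72.9), ("o4-mini", "OpenAI", 72),
    ("o1", "OpenAI", 61.7), ("Qwen3 235B A22B", "Alibaba", 59.6),
    ("DeepSeek R1", "DeepSeek", 56.9), ("DeepSeek V3", "DeepSeek", 55.1),
    ("DeepSeek V3", "DeepSeek", 55.1), ("o3-mini", "OpenAI", 53.8),
    ("o3", "OpenAI", 53.8), ("Grok 3", "xAI", 53.3),
    ("Grok 3 Beta", "xAI", 53.3), ("GPT-4.1", "OpenAI", 52.4),
    ("GPT-4", "OpenAI", 44.9), ("gpt-oss-120b", "OpenAI", 41.8),
    ("Qwen3 32B", "Alibaba", 40), ("Grok 3 Mini", "xAI", 34.7),
    ("o1-mini", "OpenAI", 32.9), ("Llama 4 Maverick", "Meta", 15.6),
    ("Codestral", "Mistral AI", 11.1), ("GPT-4o", "OpenAI", 3.6),
]

# Collect each provider's scores, then summarize them.
by_provider = defaultdict(list)
for _model, provider, score in rows:
    by_provider[provider].append(score)

provider_stats = {
    provider: {
        "models": len(s),
        "average": round(sum(s) / len(s), 2),
        "best": max(s),
    }
    for provider, s in by_provider.items()
}

# Print providers ordered by their best score, highest first.
for provider, stats in sorted(provider_stats.items(),
                              key=lambda kv: kv[1]["best"], reverse=True):
    print(provider, stats)
```

A provider's average is pulled down by every weaker model it has on the board, so it says more about the breadth of a lineup than about its strongest model.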
Full leaderboard
Showing 22 of 22

| # | Model | Provider | Score (% pass@2) |
|---|---|---|---|
| 1 | GPT-5 | OpenAI | 88 |
| 2 | o3-pro | OpenAI | 84.9 |
| 3 | Gemini 2.5 Pro | Google | 72.9 |
| 4 | o4-mini | OpenAI | 72 |
| 5 | o1 | OpenAI | 61.7 |
| 6 | Qwen3 235B A22B | Alibaba | 59.6 |
| 7 | DeepSeek R1 | DeepSeek | 56.9 |
| 8 | DeepSeek V3 | DeepSeek | 55.1 |
| 9 | DeepSeek V3 | DeepSeek | 55.1 |
| 10 | o3-mini | OpenAI | 53.8 |
| 11 | o3 | OpenAI | 53.8 |
| 12 | Grok 3 | xAI | 53.3 |
| 13 | Grok 3 Beta | xAI | 53.3 |
| 14 | GPT-4.1 | OpenAI | 52.4 |
| 15 | GPT-4 | OpenAI | 44.9 |
| 16 | gpt-oss-120b | OpenAI | 41.8 |
| 17 | Qwen3 32B | Alibaba | 40 |
| 18 | Grok 3 Mini | xAI | 34.7 |
| 19 | o1-mini | OpenAI | 32.9 |
| 20 | Llama 4 Maverick | Meta | 15.6 |
| 21 | Codestral | Mistral AI | 11.1 |
| 22 | GPT-4o | OpenAI | 3.6 |