Model Benchmarks
Every top model on the market — free and paid, cloud and local — ranked by the benchmarks people actually cite. Updated daily.
Updated: 2026-06-16
| # | Model | Pricing | | | | | Best at | Access |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 (Anthropic) | Free + Paid, $75/M out | 74 | 1452 | 82% | 92% | Agentic, Coding, Reasoning | |
| 2 | GPT-5.2 (OpenAI) | Free + Paid, $60/M out | 72 | 1448 | 78% | 94% | Reasoning, Math, Coding | |
| 3 | Gemini 3 Pro (Google DeepMind) | Free + Paid, $40/M out | 71 | 1445 | 74% | 90% | Long context, Vision, Research | |
| 4 | Claude Sonnet 4.6 (Anthropic) | Free + Paid, $15/M out | 69 | 1430 | 77% | 86% | Coding, Agentic, Writing | |
| 5 | Grok 4.1 (xAI) | Free + Paid, $30/M out | 68 | 1425 | 70% | 88% | Reasoning, Research, Analysis | |
| 6 | DeepSeek V4 (DeepSeek · Open) | Free + Paid, $2/M out | 66 | 1418 | 72% | 89% | Coding, Math, Reasoning | |
| 7 | Qwen3.5 Max (Alibaba · Open) | Free + Paid, $3/M out | 65 | 1415 | 69% | 87% | Coding, Multilingual, Reasoning | |
| 8 | Llama 4 Maverick (Meta · Open) | Free + Paid, $2/M out | 61 | 1400 | 62% | 80% | Long context, General, Writing | |
| 9 | Mistral Large 3 (Mistral AI · Open) | Free + Paid, $6/M out | 60 | 1395 | 60% | 78% | Multilingual, Writing, Coding | |
| 10 | GPT-5.2 mini (OpenAI) | Free + Paid, $2/M out | 58 | 1388 | 58% | 82% | General, Writing, Coding | |
| 11 | Gemini 3 Flash (Google DeepMind) | Free + Paid, $1/M out | 57 | 1382 | 55% | 79% | Long context, Data, Vision | |
| 12 | FLUX.2 (Black Forest Labs · Open) | Free + Paid | — | 1180 | — | — | Image gen | |
| 13 | Veo 3 (Google DeepMind) | Paid | — | — | — | — | Video gen | |
| 14 | Whisper large-v3 (OpenAI · Open) | Free | — | — | — | — | Audio | |
Benchmark figures track each model's published scores (Artificial Analysis Intelligence Index, LMArena, MMLU-Pro, GPQA, SWE-bench, AIME) and are refreshed daily. Treat them as a guide, not gospel.