WebDev Leaderboard

Compare the performance of AI models on web development tasks in the Code Arena.
The legacy WebDev leaderboard is still available at web.lmarena.ai.

Last Updated: Dec 16, 2025
Total Votes: 66,698

Total Models: 31

Rank	Rank Spread (Upper-Lower)	Model	Score	95% CI (±)	Votes	Organization	License
1	1◄─►1	claude-opus-4-5-20251101-thinking-32k	1518	+13/-13	3,592	Anthropic	Proprietary
2	2◄─►5	gpt-5.2-high	1485	+17/-17	1,646	OpenAI	Proprietary
3	2◄─►5	claude-opus-4-5-20251101	1484	+12/-12	3,511	Anthropic	Proprietary
4	2◄─►5	gemini-3-pro	1481	+10/-10	8,535	Google	Proprietary
5	2◄─►5	gemini-3-flash	1465	+15/-15	1,725	Google	Proprietary
6	6◄─►12	gpt-5-medium	1399	+12/-12	3,948	OpenAI	Proprietary
7	6◄─►13	gpt-5.2	1399	+15/-15	1,640	OpenAI	Proprietary
8	6◄─►13	gpt-5.1-medium	1393	+11/-11	4,645	OpenAI	Proprietary
9	6◄─►13	claude-sonnet-4-5-20250929-thinking-32k	1393	+9/-9	7,580	Anthropic	Proprietary
10	6◄─►13	claude-opus-4-1-20250805	1392	+10/-10	7,296	Anthropic	Proprietary
11	6◄─►13	claude-sonnet-4-5-20250929	1387	+9/-9	8,626	Anthropic	Proprietary
12	6◄─►15	gemini-3-flash (thinking-minimal)	1376	+15/-15	1,690	Google	Proprietary
13	12◄─►15	glm-4.6	1368	+10/-10	6,981	Z.ai	MIT
14	7◄─►16	deepseek-v3.2-thinking	1367	+20/-19	955	DeepSeek AI	MIT
15	12◄─►16	gpt-5.1	1359	+10/-10	6,540	OpenAI	Proprietary
16	14◄─►17	kimi-k2-thinking-turbo	1345	+10/-10	6,359	Moonshot	Modified MIT
17	16◄─►18	gpt-5.1-codex	1335	+10/-10	4,793	OpenAI	Proprietary
18	17◄─►18	minimax-m2	1317	+10/-10	7,037	MiniMax	Apache 2.0
19	19◄─►22	deepseek-v3.2-exp	1294	+10/-10	5,156	DeepSeek AI	MIT
20	19◄─►22	qwen3-coder-480b-a35b-instruct	1291	+9/-9	7,246	Alibaba	Apache 2.0
21	19◄─►23	claude-haiku-4-5-20251001	1289	+10/-10	7,305	Anthropic	Proprietary
22	19◄─►23	deepseek-v3.2	1286	+17/-17	1,230	DeepSeek AI	MIT
23	21◄─►24	KAT-Coder-Pro-V1	1264	+15/-15	1,945	KwaiKAT	Proprietary
24	23◄─►26	gpt-5.1-codex-mini	1252	+17/-17	1,565	OpenAI	Proprietary
25	24◄─►28	grok-4-1-fast-reasoning	1227	+13/-13	3,715	xAI	Proprietary
26	24◄─►28	mistral-large-3	1226	+20/-20	1,025	Mistral	Apache 2.0
27	25◄─►28	gemini-2.5-pro	1213	+13/-13	3,505	Google	Proprietary
28	25◄─►28	grok-4.1-thinking	1206	+19/-19	1,261	xAI	Proprietary
29	29◄─►30	grok-4-fast-reasoning	1153	+23/-23	945	xAI	Proprietary
30	29◄─►31	grok-code-fast-1	1144	+21/-21	1,014	xAI	Proprietary
31	30◄─►31	devstral-medium-2507	1103	+22/-22	1,033	Mistral	Proprietary
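The rank spread column can be approximated from each model's score and confidence interval. The sketch below assumes a simple overlap rule that this page does not document. A model's best possible rank is 1 plus the number of models whose lower bound lies strictly above its upper bound. Its worst possible rank is the number of models, itself included, whose upper bound reaches its lower bound. The table shows rounded values, and the leaderboard may derive ranks from bootstrap samples instead, so this rule will not necessarily reproduce every row. It does match the top five rows. The `Entry` type and `rank_spread` function are illustrative names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row: model name, score, and the +/- CI half-widths."""
    model: str
    score: float
    ci_plus: float
    ci_minus: float


def rank_spread(entries: list[Entry]) -> dict[str, tuple[int, int]]:
    """Return (best_rank, worst_rank) per model using an assumed CI-overlap rule."""
    bounds = {e.model: (e.score - e.ci_minus, e.score + e.ci_plus) for e in entries}
    spreads = {}
    for model, (lo, hi) in bounds.items():
        # Models whose whole interval sits strictly above this one must outrank it.
        clearly_above = sum(1 for other_lo, _ in bounds.values() if other_lo > hi)
        # Models (including this one) whose upper bound reaches this model's lower
        # bound are not clearly below it, so this model could rank behind all of them.
        not_clearly_below = sum(1 for _, other_hi in bounds.values() if other_hi >= lo)
        spreads[model] = (1 + clearly_above, not_clearly_below)
    return spreads


# Top five rows of the table (score, +CI, -CI).
top5 = [
    Entry("claude-opus-4-5-20251101-thinking-32k", 1518, 13, 13),
    Entry("gpt-5.2-high", 1485, 17, 17),
    Entry("claude-opus-4-5-20251101", 1484, 12, 12),
    Entry("gemini-3-pro", 1481, 10, 10),
    Entry("gemini-3-flash", 1465, 15, 15),
]

for name, (best, worst) in rank_spread(top5).items():
    print(f"{name}: {best}-{worst}")
```

With these rows the sketch yields 1-1 for the first model and 2-5 for the other four, matching the table.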

Leaderboard Plots

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
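This plot summarizes each model's expected win rate against every other model, assuming opponents are drawn uniformly and ties never occur. A minimal sketch follows. It assumes scores sit on an Elo-style scale (base 10, scale 400), so a model with score R_a beats one with score R_b with probability 1 / (1 + 10^((R_b - R_a) / 400)). The three scores come from the table. The function names are illustrative.

```python
def win_probability(score_a: float, score_b: float, base: float = 10.0, scale: float = 400.0) -> float:
    """Predicted probability that A beats B under an Elo-style logistic model (assumed scale)."""
    return 1.0 / (1.0 + base ** ((score_b - score_a) / scale))


def average_win_rates(scores: dict[str, float]) -> dict[str, float]:
    """Mean predicted win rate of each model against all other models, weighted uniformly."""
    result = {}
    for a, score_a in scores.items():
        rates = [win_probability(score_a, score_b) for b, score_b in scores.items() if b != a]
        result[a] = sum(rates) / len(rates)
    return result


scores = {
    "claude-opus-4-5-20251101-thinking-32k": 1518,
    "gpt-5.2-high": 1485,
    "gemini-3-pro": 1481,
}
for model, rate in sorted(average_win_rates(scores).items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rate:.3f}")
```

A 37-point gap (1518 vs 1481) corresponds to only about a 55% predicted win rate under this assumed scale, which is why neighbouring scores in the table are close contests.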

Confidence Intervals on Model Strength (via Bootstrapping)
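Confidence intervals like the ± column are commonly obtained by bootstrapping. This means resampling the recorded battles with replacement, refitting the ratings on each resample, and reading off percentiles. The sketch below makes several assumptions. It fits a Bradley-Terry model with minorization-maximization (MM) updates. Battles are stored as hypothetical (winner, loser) pairs with ties already removed. Scores are shifted to a mean of 1000. The leaderboard's actual fitting procedure and number of bootstrap rounds are not stated on this page. Every model needs at least one win and one loss in each resample for the MM update to stay finite.

```python
import math
import random
from collections import Counter


def fit_bradley_terry(battles, models, iters=100):
    """Fit Bradley-Terry strengths with MM updates; return Elo-style scores (mean 1000).

    battles: list of (winner, loser) pairs with ties removed (assumed format).
    """
    wins = Counter(winner for winner, _ in battles)
    games = Counter(frozenset(pair) for pair in battles)  # battles per unordered pair
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in games.items()
                if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom
        # Strengths are only defined up to a constant factor; fix the geometric mean at 1.
        log_mean = sum(math.log(p) for p in updated.values()) / len(updated)
        strength = {m: p / math.exp(log_mean) for m, p in updated.items()}
    # Geometric mean 1 means log10 strengths average to 0, so scores average 1000.
    return {m: 1000 + 400 * math.log10(p) for m, p in strength.items()}


def bootstrap_ci(battles, models, rounds=200, seed=0):
    """Percentile bootstrap: refit on resampled battles, return 95% (low, high) per model."""
    rng = random.Random(seed)
    samples = {m: [] for m in models}
    for _ in range(rounds):
        resample = rng.choices(battles, k=len(battles))  # sample with replacement
        for m, score in fit_bradley_terry(resample, models).items():
            samples[m].append(score)
    intervals = {}
    for m, values in samples.items():
        values.sort()
        # Nearest-rank approximation of the 2.5th and 97.5th percentiles.
        intervals[m] = (values[int(0.025 * (rounds - 1))], values[int(0.975 * (rounds - 1))])
    return intervals


# Hypothetical toy data: model "A" beats model "B" in 70 of 100 non-tied battles.
battles = [("A", "B")] * 70 + [("B", "A")] * 30
print(fit_bradley_terry(battles, ["A", "B"]))
print(bootstrap_ci(battles, ["A", "B"]))
```

MM is used here only because it needs no external optimizer. A logistic-regression fit of the same model would give the same point estimates.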

Battle Count for Each Combination of Models (without Ties)
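This matrix tallies how many non-tied battles each pair of models fought. A minimal sketch follows. It assumes each vote is a hypothetical (model_a, model_b, winner) record whose winner field is "model_a", "model_b", or a label starting with "tie". Pairs are counted regardless of which side each model appeared on, so the matrix is symmetric. The records below are invented toy data, not leaderboard votes.

```python
from collections import defaultdict


def battle_counts(votes):
    """Symmetric count of non-tied battles for every pair of models.

    votes: iterable of (model_a, model_b, winner) tuples; winner is "model_a",
    "model_b", or a tie label such as "tie" (assumed format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for model_a, model_b, winner in votes:
        if winner.startswith("tie"):
            continue  # ties are excluded from this plot
        counts[model_a][model_b] += 1
        counts[model_b][model_a] += 1
    return counts


votes = [
    ("gemini-3-pro", "glm-4.6", "model_a"),
    ("glm-4.6", "gemini-3-pro", "model_a"),
    ("gemini-3-pro", "glm-4.6", "tie"),
    ("gemini-3-pro", "minimax-m2", "model_b"),
]
counts = battle_counts(votes)
print(counts["gemini-3-pro"]["glm-4.6"])  # 2 (the tie is skipped)
```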

Fraction of Model A Wins for All Non-tied A vs. B Battles
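Each cell of this matrix is the share of non-tied battles between a row model and a column model that the row model won. A minimal sketch follows, using a hypothetical (model_a, model_b, winner) vote format with ties labelled by a string starting with "tie". It credits the winner regardless of which side of the battle that model was shown on. This pooling of both orderings is an assumption about how the plot is built. The records are invented toy data.

```python
from collections import defaultdict


def win_fraction_matrix(votes):
    """frac[x][y] = fraction of non-tied x-vs-y battles won by x (None if they never met)."""
    wins = defaultdict(lambda: defaultdict(int))
    games = defaultdict(lambda: defaultdict(int))
    models = set()
    for model_a, model_b, winner in votes:
        models.update((model_a, model_b))
        if winner.startswith("tie"):
            continue  # only decisive battles count
        w, l = (model_a, model_b) if winner == "model_a" else (model_b, model_a)
        wins[w][l] += 1
        games[w][l] += 1
        games[l][w] += 1
    return {
        x: {y: (wins[x][y] / games[x][y] if games[x][y] else None) for y in models if y != x}
        for x in models
    }


votes = [
    ("gpt-5.1", "minimax-m2", "model_a"),
    ("minimax-m2", "gpt-5.1", "model_b"),
    ("minimax-m2", "gpt-5.1", "model_a"),
    ("gpt-5.1", "minimax-m2", "tie (bothbad)"),
]
frac = win_fraction_matrix(votes)
print(frac["gpt-5.1"]["minimax-m2"])  # 2 of 3 decisive battles
```

Because every decisive battle has exactly one winner, frac[x][y] and frac[y][x] always sum to 1 for any pair that has met.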