Hacker News

GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4:

https://lmarena.ai/leaderboard



This same leaderboard lists a bunch of models, including 4o, beating out Opus 4, which seems off.


In my experience, Opus 4 isn't as good as Sonnet 4 for day-to-day coding tasks. It's better as a planner.


"+100 points" sounds like a lot until you do the Elo math and see that it means roughly 1 out of 3 people still preferred Claude Opus 4's response. Remember, 1 out of 2 would place the models dead even.
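A quick sketch of that math, assuming the leaderboard uses the standard Elo expected-score formula with the conventional 400-point scale (an assumption here; the thread doesn't confirm lmarena.ai's exact parameters):

```python
# Convert an Elo-style rating gap into an expected preference rate.
# Standard Elo formula: E = 1 / (1 + 10^(-delta/400)).

def expected_win_rate(delta: float) -> float:
    """Probability the higher-rated model's response is preferred."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

print(round(expected_win_rate(100), 3))  # ~0.64: the lower-rated model still wins ~36% of votes
print(round(expected_win_rate(75), 3))   # ~0.61 for the +75 gap over Gemini 2.5 Pro
print(round(expected_win_rate(0), 3))    # 0.5: dead even
```

So a +100 gap corresponds to roughly a 64/36 split in head-to-head votes, which matches the "1 out of 3" figure above.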


That eval hasn't been relevant for a while now. Performance there just doesn't seem to correlate well with real-world performance.


What does +75 arbitrary points mean in practice? Can we come up with units that relate to something in the real world?



