Hacker News
z7
7 months ago
on:
GPT-5
GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4:
https://lmarena.ai/leaderboard
virgildotcodes
7 months ago
This same leaderboard lists a bunch of models, including 4o, beating out Opus 4, which seems off.
afro88
7 months ago
In my experience Opus 4 isn't as good for day-to-day coding tasks as Sonnet 4. It's better as a planner.
zamadatix
7 months ago
"+100 points" sounds like a lot until you do the Elo math and see that it means roughly 1 out of 3 people still preferred Claude Opus 4's response. Remember, 1 out of 2 would place the models dead even.
degrews
7 months ago
That eval hasn't been relevant for a while now. Performance there just doesn't seem to correlate well with real-world performance.
Too
7 months ago
|
prev
[–]
What does +75 arbitrary points mean in practice? Can we come up with units that relate to something in the real world?
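For a rough sense of scale, here is a minimal sketch that converts a rating gap into an expected preference rate. It assumes the leaderboard scores behave like standard Elo ratings (logistic curve, base 10, scale 400) and ignores ties. The function name `elo_win_probability` is made up for illustration, and the arena's actual scoring method may differ.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes won by the higher-rated model.

    Assumes the standard Elo logistic formula; `scale=400` is the
    conventional Elo constant, not something confirmed for this leaderboard.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # 0 is the "dead even" baseline; 75 and 100 are the gaps quoted in the thread.
    for gap in (0, 75, 100):
        p = elo_win_probability(gap)
        print(f"+{gap} pts: leader preferred {p:.1%}, other model {1 - p:.1%}")
```

Under those assumptions, +75 works out to the leader being preferred about 61% of the time and +100 about 64%, versus 50% for evenly matched models.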