I usually don’t believe the benchmarks, but first place in web dev arena specifically is crazy. That spot has been Claude's for so long, which tracks with my experience.
For me, Claude 3.7 was a noticeable step down from 3.5 across a wide range of tasks, even with the same prompt. Benchmarks are one thing, but for real-life use, I kept finding myself switching back to 3.5. I wouldn't be surprised if they were trying to figure out what happened there and how to prevent it in the next version.