Hacker News
myrmidon on Jan 23, 2025 | on: Results of "Humanity's Last Exam" benchmark publis...
Is there a text-only evaluation of the non-Deepseek models? From what I can tell, being evaluated on text only might have helped the other models immensely as well.
GaggiX on Jan 23, 2025
>Is there a text-only evaluation of the non-Deepseek models?
Not that I can see, but it would be cool to have. Maybe the paper will have a more complete evaluation.
famouswaffles on Jan 23, 2025
Section C.2 of the paper (pg 24) has text-only evaluations of other models.
GaggiX on Jan 23, 2025
Oh I see, the paper is out. I read "(arXiv coming soon)" and thought it wasn't released yet.