
That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

Isn't SWE-Bench Verified pretty saturated by now?

Depends what you mean by saturated. It's still possible to score substantially higher, but there is a steep difficulty jump that makes climbing above 80%ish pretty hard (for now). If you look under the hood, it's also a surprisingly poor eval in some respects - it only tests Python (a ton of Django) and it can suffer from pretty bad contamination problems because most models, especially the big ones, remember these repos from their training. This is why OpenAI switched to reporting SWE-Bench Pro instead of SWE-bench Verified.

Dumb question: Can inference be done in a reverse pass? Outputs predicting inputs?

Strictly speaking: no. The "forward pass" terminology does not imply that there exists a "reverse pass" that does the same kind of computation. Rather, the two terms describe two different kinds of computation and the directions in which they occur.

The forward pass is propagating from inputs to outputs, computing the thing the model was trained for. The reverse/backwards pass is propagating from outputs back to inputs, but it's calculating the gradients of parameters for training (roughly: how much changing each parameter in isolation affects the output, and whether it makes the output closer to the desired training output). The result of the "reverse pass" isn't a set of inputs, but a set of annotations on the model's parameters that guide their adjustment.
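
A toy PyTorch sketch of that distinction (made-up model and shapes, just to illustrate, not any real system):

    # Forward pass maps inputs to outputs; backward() does not produce
    # inputs, it fills in .grad on the parameters.
    import torch

    model = torch.nn.Linear(4, 2)        # tiny hypothetical model
    x = torch.randn(1, 4)                # input
    target = torch.randn(1, 2)           # desired output

    y = model(x)                         # forward pass: inputs -> outputs
    loss = torch.nn.functional.mse_loss(y, target)
    loss.backward()                      # backward pass: gradients for training

    print(model.weight.grad.shape)       # torch.Size([2, 4]): annotations on parameters
    print(x.grad)                        # None (and even with x.requires_grad=True,
                                         # this would be a gradient, not a recovered input)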

The computations of the forward pass are not trivially reversible (e.g. they include additions, which destroy information about the operand values). As a sibling thread points out, you can still probabilistically explore what inputs _could_ produce a given output, and get some information back that way, but it's a lossy process.

And of course, you could train a "reverse" model, one that predicts the prefix of a sequence given a suffix (trivially: it's the same suffix prediction problem, but you train it on reversed sequences). But that would be a separate model trained from scratch on that task, and in that model the prefix prediction would be its forward pass.
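
Rough sketch of the reversed-sequence idea (hypothetical names, not a real training pipeline):

    # Same next-token objective, just trained on reversed sequences, so the
    # resulting model's *forward* pass predicts earlier tokens from later ones.
    def make_reversed_dataset(sequences):
        # e.g. [1, 2, 3, 4] -> [4, 3, 2, 1]
        return [list(reversed(seq)) for seq in sequences]

    forward_data = [[1, 2, 3, 4], [5, 6, 7]]
    backward_data = make_reversed_dataset(forward_data)
    print(backward_data)  # [[4, 3, 2, 1], [7, 6, 5]]

    # A standard LM trained on backward_data, sampled and then re-reversed,
    # would emit "earlier and earlier" tokens for a given suffix.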


I do want to see ChatGPT running upwards on my screen now, predicting earlier and earlier words in a futile attempt to explain a nonsense conclusion. We could call it ChatJeopardy.

Not as trivially as the forward direction; unsurprisingly, information is lost, but it works better than you might expect. See for example https://arxiv.org/pdf/2405.15012

Sounds like a great premise for a sci-fi short story.

Sci-fi ? You mean historical fiction!

The power is in the tails

Flameshot is the best. I don't know about HD, and maybe if I get an HD screen I'll find out, but right now it's the slickest.

Wow.

Kinda fucked we can't tell the difference anymore

Precision Medicine is the way, and maybe we will get there one day. Too many effective agents are averaged away because the population for whom they are effective is just a subset of the population with the targeted symptom.

I am having a hard time seeing that the install config isn't just basically bash with some aliases...but I still haven't had my second cup of coffee.

This is the same thing I thought when liberal-minded folk talked about giving the Federal government more power over States in order to enact some good work, or to achieve some policy win. Yes, for now, I thought, but you can't assume a good-natured centralized power will persist, and when it doesn't, what is your alternative? I have watched as liberal-minded folks rediscover the value of State sovereignty and power in the face of an autocratic Federal executive, bearing arms when an autocrat sends masked agents to terrorize your city. Lean into it, I say. A winner-take-all Federal system means there is no alternative but to win at all costs, rather than live and let live. We need more, smaller States. We need more Representatives than 1 per 700,000 citizens...by 10x

Honestly, I feel humans are similar. It's the generator <-> executive loop that keeps things right.
