
It seems there is a fundamental information-theoretic aspect to this that would probably save us all a lot of trouble if we would just embrace it.

The #1 canary for me: Why does training an LLM require so much data that we are concerned we might run out of it?

The clear lack of generalization and/or internal world modeling is what is really in the way of a self-bootstrapping AGI/ASI. You can certainly try to emulate a world model with clever prompting (here's what you did last, here's your objective, etc.), but this seems seriously deficient to me based upon my testing so far.
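
A rough sketch of the kind of prompt scaffolding I mean, just to make it concrete (the strings and variable names here are made up for illustration):

    # Emulating a "world model" by re-injecting state into every prompt,
    # since the model itself keeps no state between calls.
    history = ["opened the door", "picked up the key"]   # what it did last
    objective = "unlock the chest"                       # its objective

    prompt = (
        "You are an agent in a simple world.\n"
        f"Objective: {objective}\n"
        "What you did last:\n"
        + "\n".join(f"- {step}" for step in history)
        + "\nWhat do you do next?"
    )
    print(prompt)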



In my experience, LLMs do a very poor job of generalizing. I have also seen self-supervised transformer methods fail to generalize in my domain (which involves a lot of diversity and domain shift). For human language, you can paper over the failure to generalize by shoveling in more data; in other domains, that may not be an option.


It’s exactly what you would expect from what an LLM is: it predicts the next word in a sequence very well. Is that how our brains, or even a bird’s brain for that matter, approach cognition? I don’t think that’s how any animal’s brain works at all, but that’s just my opinion. A lot of this discussion is speculation. We might as well all wait and see if AGI shows up. I’m not holding my breath.
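
To make "predicts the next word" concrete, here is a toy next-word sampler built from bigram counts. It is my own illustrative sketch, nothing like a real LLM in scale, but the loop has the same shape: condition on what came before, sample what comes next.

    import random
    from collections import defaultdict

    # Count which word follows which in a tiny corpus, then sample a continuation.
    corpus = "the cat sat on the mat and the cat ran".split()
    next_words = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        next_words[prev].append(nxt)

    word, out = "the", ["the"]
    for _ in range(6):
        word = random.choice(next_words.get(word, corpus))  # sample the next word given the last
        out.append(word)
    print(" ".join(out))  # locally fluent, with no world model behind it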


Most of this is not speculation. It's informed by current leading theories in neuroscience about how our brain is thought to function.

See predictive coding and the free energy principle, which holds that the brain continually models reality and tries to minimize prediction error.

https://en.m.wikipedia.org/wiki/Predictive_coding
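
The core idea is small enough to sketch in a few lines: keep an internal estimate, compare it against noisy sensory input, and nudge the estimate to shrink the prediction error. This is only an illustrative toy under that framing, not a claim about the actual neural implementation:

    import random

    mu = 0.0        # internal estimate of a hidden cause
    lr = 0.1        # how strongly prediction errors update the estimate
    hidden = 3.0    # the true value, never observed directly

    for _ in range(200):
        observation = hidden + random.gauss(0, 0.5)  # noisy sensory input
        prediction_error = observation - mu          # the mismatch to be minimized
        mu += lr * prediction_error                  # update the estimate toward lower error

    print(round(mu, 2))  # settles near the hidden value (~3.0)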


At a certain high level I’m sure you can model the brain that way. But we know humans are neuroplastic, and through epigenetics it’s possible that learning within an individual’s lifespan gets passed to their offspring. That means human brains have been building internal predictive models for billions of years, over innumerable individual lifespans. The idea that we’re anywhere close to replicating that with a neural net is completely preposterous. And besides, my main point was that our brains don’t think one word at a time; I’m not sure how that relates to predictive processing.


Have you heard of predictive processing?


Couldn’t agree more. For specific applications like drug development, where you have a constrained problem with a fixed set of variables and a well-defined cost function, I’m sure the chess analogy will hold. But I think there are core elements of cognition missing from ChatGPT that aren’t easily built.



