
It seems there is a fundamental information-theoretic aspect to this that would probably save us all a lot of trouble if we would just embrace it.

The #1 canary for me: Why does training an LLM require so much data that we are concerned we might run out of it?

The clear lack of generalization and/or internal world modeling is what is really in the way of a self-bootstrapping AGI/ASI. You can certainly try to emulate a world model with clever prompting (here's what you did last, here's your objective, etc.), but this seems seriously deficient to me based upon my testing so far.
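
A rough sketch of the kind of prompt scaffolding I mean, just to make it concrete (the strings and variable names here are made up for illustration):

    # Emulating a "world model" by re-injecting state into every prompt,
    # since the model itself keeps no state between calls.
    history = ["opened the door", "picked up the key"]   # what it did last
    objective = "unlock the chest"                       # its objective

    prompt = (
        "You are an agent in a simple world.\n"
        f"Objective: {objective}\n"
        "What you did last:\n"
        + "\n".join(f"- {step}" for step in history)
        + "\nWhat do you do next?"
    )
    print(prompt)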



In my experience, LLMs do a very poor job of generalizing. I have also seen self-supervised transformer methods fail to generalize in my domain (which involves a lot of diversity and domain shift). For human language, you can paper over the failure to generalize by shoveling in more data; in other domains, that may not be an option.


It’s exactly what you would expect from what an LLM is: it predicts the next word in a sequence very well. Is that how our brains, or even a bird’s brain for that matter, approach cognition? I don’t think that’s how any animal’s brain works at all, but that’s just my opinion. A lot of this discussion is speculation. We might as well all wait and see if AGI shows up. I’m not holding my breath.
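
To make "predicts the next word" concrete, here is a toy next-word sampler built from bigram counts. It is my own illustrative sketch, nothing like a real LLM in scale, but the loop has the same shape: condition on what came before, sample what comes next.

    import random
    from collections import defaultdict

    # Count which word follows which in a tiny corpus, then sample a continuation.
    corpus = "the cat sat on the mat and the cat ran".split()
    next_words = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        next_words[prev].append(nxt)

    word, out = "the", ["the"]
    for _ in range(6):
        word = random.choice(next_words.get(word, corpus))  # sample the next word given the last
        out.append(word)
    print(" ".join(out))  # locally fluent, with no world model behind it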


Most of this is not speculation. It's informed by current leading theories in neuroscience about how our brain is thought to function.

See predictive coding and the free energy principle, which holds that the brain continually models reality and tries to minimize prediction error.

https://en.m.wikipedia.org/wiki/Predictive_coding
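
The core idea is small enough to sketch in a few lines: keep an internal estimate, compare it against noisy sensory input, and nudge the estimate to shrink the prediction error. This is only an illustrative toy under that framing, not a claim about the actual neural implementation:

    import random

    mu = 0.0        # internal estimate of a hidden cause
    lr = 0.1        # how strongly prediction errors update the estimate
    hidden = 3.0    # the true value, never observed directly

    for _ in range(200):
        observation = hidden + random.gauss(0, 0.5)  # noisy sensory input
        prediction_error = observation - mu          # the mismatch to be minimized
        mu += lr * prediction_error                  # update the estimate toward lower error

    print(round(mu, 2))  # settles near the hidden value (~3.0)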


At a certain high level I’m sure you can model the brain that way. But we know humans are neuroplastic, and through epigenetics it’s possible that learning within an individual’s lifespan gets passed to their offspring. That means human brains have been building internal predictive models for billions of years, over innumerable individual lifespans. The idea that we’re anywhere close to replicating that with a neural net is completely preposterous. And besides, my main point was that our brains don’t think one word at a time; I’m not sure how that relates to predictive processing.


Have you heard of predictive processing?


Couldn’t agree more. For specific applications like drug development, where you have a constrained problem with a fixed set of variables and a well-defined cost function, I’m sure the chess analogy will hold. But I think there are core elements of cognition missing from ChatGPT that aren’t easily built.



