Okay, I read the paper. I see what they're saying, but I strongly disagree that the model is "thinking". They highlight that relationships between words are complicated, which we already knew. They also point out that some words are related to other words, which are in turn related to still other words, which, again, we already knew. Lastly, they used their own model (not Claude itself) to change the weights associated with some words, thereby changing the output to match their predictions, which I agree is genuinely interesting.
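For the curious, here's a toy sketch of that kind of intervention. To be clear, this is not the paper's actual method; GPT-2, Hugging Face transformers, the prompt, and the " Paris" token are all just stand-ins I picked to illustrate the general idea of knocking out a word's weights and watching the output shift:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

    def top_token(m):
        # Most likely next token according to the model
        with torch.no_grad():
            logits = m(ids).logits[0, -1]
        return tokenizer.decode(logits.argmax().item())

    print("before:", top_token(model))  # expect " Paris"

    # Crude intervention: zero the embedding row for " Paris".
    # GPT-2 ties input and output embeddings, so this also kills
    # the model's ability to emit that token with a strong logit.
    paris_id = tokenizer.encode(" Paris")[0]
    with torch.no_grad():
        model.get_input_embeddings().weight[paris_id] = 0.0

    print("after:", top_token(model))  # some other completion

The paper's interventions are far more surgical than this, but the flavor is the same: edit internal values, observe a predicted change in output.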
Interpreting the relationship between words as "multi-hop reasoning" is more about changing the words we use to talk about things than about any fundamental change in how LLMs work. It's still doing the same thing it did two years ago (albeit much faster and better): it's guessing the next token.
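Concretely, "guessing the next token" is just this loop. A minimal sketch, again with GPT-2 as an arbitrary stand-in and greedy decoding for simplicity (real systems sample from the distribution instead of taking the argmax):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

    with torch.no_grad():
        for _ in range(5):
            logits = model(ids).logits             # (1, seq_len, vocab_size)
            next_id = logits[:, -1, :].argmax(-1)  # greedy: single most likely token
            ids = torch.cat([ids, next_id.unsqueeze(-1)], dim=-1)

    print(tokenizer.decode(ids[0]))

Everything interesting happens inside that one forward pass, which is exactly what the paper is probing. But the outer loop hasn't changed.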