Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For tiny models, the SFT data mixture is unbelievably critical to usability. They are unable to generalize in almost any way. If you don't have multi-turn conversations, they will not be able to do multi-turn conversations. If you have multi-turn conversations which are just chatting, and then single turn conversations for math, it will be unable to do math in a multi-turn setting. This is much less true for bigger models.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: