
Claude notably does not use RLHF but RLAIF, using an LLM to generate the preferences based on a "constitution" instead of human preferences. It's remarkable that it can bootstrap itself up to such high quality. See https://arxiv.org/pdf/2212.08073 for more.


I thought Claude used human feedback, since Surge claims Anthropic as a customer:

https://www.surgehq.ai/case-studies/anthropic-claude-surgeai...



