I'd guess the benefit is that it's quicker/easier to experiment with the prompt?... | Hacker News


mcintyre1994 10 months ago | on: Claude's system prompt is over 24k tokens with too...

I'd guess the benefit is that it's quicker and easier to experiment with the prompt. Claude has prompt caching; I'm not sure how efficient it is, but they offer a discount on requests that make use of it, so it might be efficient enough to be worth the tradeoff for them.

Also, I don't think much of this prompt is used in the API, and a bunch of it is enabling specific UI features like Artifacts. So if they reuse the same model for the API (I'm guessing they do, but I don't know), then I guess they're limited in how much of this they can move into fine-tuning.

int_19h 10 months ago

Prompt caching is functionally identical to snapshotting the model after it has processed the prompt. And you need the KV cache for inference in any case, so it doesn't even cost extra memory to keep it around if every single inference task is going to have the same prompt prefix.
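
To make the "snapshot" idea concrete, here is a minimal toy sketch in plain Python. It is not how any real inference stack is implemented: `project`, `prefill`, `attend`, and `run` are hypothetical names, the "model" is a single attention step with made-up key/value vectors, and each token's key/value depends only on that token. The only point is that the key/value entries for a shared prefix can be computed once and reused, with identical results and less work.

```python
import math

DIM = 4  # toy vector width (assumption, purely illustrative)

def project(token):
    """Toy stand-in for a model's key/value projections of one token."""
    seed = sum(ord(ch) for ch in token)
    key = [math.sin(seed + i) for i in range(DIM)]
    value = [math.cos(seed + i) for i in range(DIM)]
    return key, value

def prefill(tokens, cache=()):
    """Return a new KV list: the cached entries plus entries for `tokens`."""
    kv = list(cache)  # copy, so the shared snapshot is never mutated
    kv.extend(project(tok) for tok in tokens)
    return kv

def attend(query, kv):
    """Softmax-weighted average of cached values for a single query vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in kv]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * value[d] for w, (_, value) in zip(weights, kv)) / total
            for d in range(DIM)]

def run(requests, system_prompt, use_cache):
    """Serve every request; return the outputs and the tokens processed."""
    processed = 0
    outputs = []
    snapshot = None
    if use_cache:
        # Process the shared prefix once and keep its KV entries around.
        snapshot = prefill(system_prompt)
        processed += len(system_prompt)
    for user_tokens in requests:
        if use_cache:
            kv = prefill(user_tokens, snapshot)
            processed += len(user_tokens)
        else:
            kv = prefill(system_prompt + user_tokens)
            processed += len(system_prompt) + len(user_tokens)
        query = kv[-1][0]  # toy choice: reuse the last token's key as the query
        outputs.append(attend(query, kv))
    return outputs, processed

SYSTEM_PROMPT = "you are a helpful assistant".split()
REQUESTS = [["hello"], ["what", "is", "caching"]]

plain_outputs, plain_work = run(REQUESTS, SYSTEM_PROMPT, use_cache=False)
cached_outputs, cached_work = run(REQUESTS, SYSTEM_PROMPT, use_cache=True)

print(plain_outputs == cached_outputs)  # same results either way
print(plain_work, cached_work)          # 14 tokens processed vs 9
```

With a 5-token prefix and two requests the saving is small, but the prefix cost is paid once rather than per request, so the gap grows with both prompt length and request count.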
