$200? Does this use reasoning? Does it involve forgetting to use KV caching?
This should cost well under $1. Process the prompt once (so its KV cache gets reused). Then, for each word, input that word followed by the end-of-prompt token, get your one token of output (maybe two if your favorite model wants to start with a start-of-reply token), and that's it.
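Something like this minimal sketch of the idea, assuming a local Hugging Face causal LM; the model name, prompt text, and word list are placeholders, not anything from the original setup:

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever model you actually use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Label each word with a single token.\nWord:"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Run the shared prompt once and keep its KV cache.
with torch.no_grad():
    shared_cache = model(prompt_ids, use_cache=True).past_key_values

for word in ["alpha", "beta", "gamma"]:
    word_ids = tokenizer(f" {word}\nLabel:", return_tensors="pt").input_ids
    with torch.no_grad():
        # Copy the cache so each word starts from the same cached prompt;
        # only the handful of new word tokens get processed here.
        out = model(word_ids, past_key_values=copy.deepcopy(shared_cache))
    next_id = int(out.logits[0, -1].argmax())
    print(word, "->", tokenizer.decode([next_id]))
```

With a hosted API instead of a local model, the same effect comes from the provider's prompt caching plus capping the output at one token.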
Yes, it uses reasoning. I tried without it, and at the time, with OpenAI's API, the answers were not as good. Reasoning improved them a fair amount.