But if you want to use the Google SDK (python-genai, js-genai) rather than the OpenAI SDK (I found the Google API more feature-rich when working with other modalities like audio/images/video), you cannot use OpenRouter. Also, if you're developing an app and need higher rate limits: what's the typical rate limit via OpenRouter?
Also, for some reason, when I tested a simple prompt (a few words, no system prompt) with one image attached, OpenRouter charged me ~1,700 tokens, while going directly via python-genai it's ~400 tokens. Keep in mind they also charge a small markup fee when you top up your account.
You can do this with LLM proxies like LiteLLM. e.g. Cursor -> LiteLLM -> LLM provider API.
I have LiteLLM server running locally with Langfuse to view traces. You configure LiteLLM to connect directly to providers' APIs. This has the added benefit of being able to create LiteLLM API keys per project that proxies to different sets of provider API keys to monitor or cap billing usage.
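For reference, the setup above boils down to a small LiteLLM proxy config. This is a minimal sketch, assuming an OpenAI upstream and the Langfuse callback; the model names and env-var references are illustrative, and you'd add one entry per provider you route to:

```yaml
# config.yaml for the LiteLLM proxy (sketch, not a complete config)
model_list:
  - model_name: gpt-4o                  # name clients request
    litellm_params:
      model: openai/gpt-4o              # actual provider/model
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  success_callback: ["langfuse"]        # ship traces to Langfuse
```

Per-project virtual keys (for the billing caps mentioned above) are then created against the proxy itself, so client apps never see the real provider keys.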
You need LLM Ops. YC happens to have invested in Langfuse; if you're serious about tracking metrics, you'll appreciate the rest of what it offers, too.
And before you ask: yes, you can accommodate both cached-content and batch-completion discounts; it just needs a bit of logic in your completion-layer code.
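That completion-layer logic can be very thin. Here's a hypothetical sketch of a cost estimator that accounts for both discounts; the discount factors and per-token rates are made-up placeholders, not any provider's real pricing:

```python
# Hypothetical completion-layer cost logic (sketch).
# All rates/discounts below are illustrative assumptions, not real pricing.

BATCH_DISCOUNT = 0.5           # assumed: batch endpoint bills at half price
CACHED_INPUT_DISCOUNT = 0.25   # assumed: cached input tokens bill at 25%

def estimate_cost(input_tokens: int, output_tokens: int, *,
                  rate_in: float = 1.0, rate_out: float = 3.0,
                  cached_input_tokens: int = 0,
                  batched: bool = False) -> float:
    """Estimated cost in (placeholder) dollars per million tokens."""
    fresh_in = input_tokens - cached_input_tokens
    cost = (fresh_in * rate_in
            + cached_input_tokens * rate_in * CACHED_INPUT_DISCOUNT
            + output_tokens * rate_out) / 1_000_000
    if batched:
        cost *= BATCH_DISCOUNT  # batch jobs trade latency for price
    return cost
```

A wrapper like this sits between your app and the provider client, so deciding "send this as a batch job" or "reuse the cached prefix" is just a flag at the call site.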