Interesting, I never knew about that! I filled out my details, then went to https://huggingface.co/openai/gpt-oss-120b, but I'm not sure I see any difference. Where is it supposed to show whether I can run it or not?
Maybe I'm spoiled by having a great internet connection, but I usually download the weights and try to run them via various tools (llama.cpp, LM Studio, vLLM and SGLang, typically) and see what works. There seem to be so many variables involved (runners, architectures, implementations, hardware and so on) that none of the calculators I've tried so far have been accurate, in both directions: they've over-estimated and under-estimated what I could run.
So in the end, actually trying to run them seems to be the only foolproof way of knowing for sure :)
While it seems hard to calculate, maybe someone should just make a database website that tracks specific setups (model, exact variant/quantisation, runner, hardware) where users can report which combinations they got running (or not), along with metrics like tokens/s.
Visitors could then specify their runner and hardware and filter for a list of models that would run on that setup.
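If anyone builds this, here's a minimal Python sketch of what a single report record might look like (field names are just my guess at what's needed, not any existing schema):

    from dataclasses import dataclass

    @dataclass
    class RunReport:
        model: str                  # e.g. "openai/gpt-oss-120b"
        quantisation: str           # exact variant, e.g. "Q4_K_M"
        runner: str                 # "llama.cpp", "LM Studio", "vLLM", "SGLang", ...
        runner_version: str
        hardware: str               # e.g. "1x RTX 4090 24GB"
        vram_gb: float
        loaded_ok: bool             # did it load and generate at all?
        tokens_per_sec: float | None = None  # None if it never ran

The visitor-facing filter would then just be a query over these records keyed on runner + hardware.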
Do you guys know a website that clearly shows which open-source LLMs run on / fit into a specific GPU (setup)?
The best heuristic I could find for the necessary VRAM is Number of Parameters × (Precision in bits / 8) × 1.2, from here [0].
[0] https://medium.com/@lmpo/a-guide-to-estimating-vram-for-llms...
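In code that heuristic is just the following (a quick Python sketch; the ×1.2 overhead factor is the one from the article, and note it doesn't account for the KV cache growing with context length):

    def estimate_vram_gb(n_params: float, precision_bits: float, overhead: float = 1.2) -> float:
        # VRAM (GB) ~= parameters * (precision bits / 8 = bytes per param) * overhead
        return n_params * (precision_bits / 8) * overhead / 1e9

    # e.g. gpt-oss-120b (~120B params) quantised to 4 bits:
    print(estimate_vram_gb(120e9, 4))  # -> 72.0 GB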