Interesting, I never knew about that! I filled out my details, then went to https://huggingface.co/openai/gpt-oss-120b, but I'm not sure I see any difference. Where is it supposed to show whether I can run it or not?
Maybe I'm spoiled by having a great internet connection, but I usually download the weights and try to run them via various tools (llama.cpp, LM Studio, vLLM and SGLang, typically) and see what works. There seem to be so many variables involved (runners, architectures, implementations, hardware and so on) that none of the calculators I've tried so far have been accurate, in both directions: they've over-estimated and under-estimated what I could run.
So in the end, actually trying to run them seems to be the only foolproof way of knowing for sure :)
While it seems hard to calculate, maybe someone should just make a database website that tracks specific setups (model, exact variant/quantisation, runner, hardware) where users can report which combinations they got running (or not), along with metrics like tokens/s.
Visitors could then specify their runner and hardware and filter for a list of models that would run on that setup.
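If anyone builds this, here's a minimal Python sketch of what a single report record might look like (field names are just my guess at what's needed, not any existing schema):

    from dataclasses import dataclass

    @dataclass
    class RunReport:
        model: str                  # e.g. "openai/gpt-oss-120b"
        quantisation: str           # exact variant, e.g. "Q4_K_M"
        runner: str                 # "llama.cpp", "LM Studio", "vLLM", "SGLang", ...
        runner_version: str
        hardware: str               # e.g. "1x RTX 4090 24GB"
        vram_gb: float
        loaded_ok: bool             # did it load and generate at all?
        tokens_per_sec: float | None = None  # None if it never ran

The visitor-facing filter would then just be a query over these records keyed on runner + hardware.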
Do you guys know a website that clearly shows which open-source LLMs run on / fit into a specific GPU (setup)?
The best heuristic I could find for the necessary VRAM is Number of Parameters × (Precision in bits / 8) × 1.2, from here [0].
[0] https://medium.com/@lmpo/a-guide-to-estimating-vram-for-llms...
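In code that heuristic is just the following (a quick Python sketch; the ×1.2 overhead factor is the one from the article, and note it doesn't account for the KV cache growing with context length):

    def estimate_vram_gb(n_params: float, precision_bits: float, overhead: float = 1.2) -> float:
        # VRAM (GB) ~= parameters * (precision bits / 8 = bytes per param) * overhead
        return n_params * (precision_bits / 8) * overhead / 1e9

    # e.g. gpt-oss-120b (~120B params) quantised to 4 bits:
    print(estimate_vram_gb(120e9, 4))  # -> 72.0 GB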