You don't need io_uring. For many workloads, being slow & inefficient is acceptable; it isn't awful. But gee, I'd rather start from a modern baseline that has a high level of mechanistic sympathy with the hardware, where things like network & IO work can be done in an efficient async manner.
Why do I need io_uring? Because it sounds awful and unhackerly to suffer living in a much lesser, worse world.
Mechanical sympathy is understanding the system, not using the shiniest thing. If you want low latency processing of one event at a time, you are either going to burn an entire core spinning or you are going to do a syscall for each operation. The io_uring syscalls are not especially fast — they get their awesomeness by doing, potentially, a whole lot of work per operation. And, for some use cases, by having a superior async IO model.
But if you actually just want read(), then call read().
Low latency for a single event is never going to have mechanistic sympathy, will be a colossal waste of most of your system.
Highly concurrent system usage is what it takes. EPOLLEXCLUSIVE (2016) finally sort of gets epoll vaguely capable of what OSes were doing decades ago, but it is still difficult to use & a rat's nest of complexity. Who here feels good reading https://stackoverflow.com/questions/41582560/how-does-epolls... ?
The submission/completion queue model of io_uring makes sense. It lets work be added or resolved without crossing that painful slow kernel barrier. It's been expanded to offer a lot more operations than what could be done in epoll.
The "shiniest thing" is a vast leap in capabilities, systems legibility, and overall (not single-operation) throughput. You cannot remotely get the numbers io_uring was bringing three years ago any other way. And it's only gotten further and further ahead while everyone else has sat still.
> Low latency for a single event is never going to have mechanistic sympathy, will be a colossal waste of most of your system.
Excuse me? I maintain a production system that cares about low latency for single events. Declaring that it doesn’t have “mechanistic sympathy” entirely misses the point. Of course I’m not squeezing the most throughput out of every cycle of my CPU. I have a set of design requirements, I understand what the kernel and CPU and IO system do under the hood, and I designed the system to make the most of the resources at hand to achieve the design requirements. Which, in this case, are minimal latency for single events or small groups of events, and io_uring would have no benefit.
(I can steam in events at a very nice rate as measured in events/sec, but I never tried to optimize that, and I should not try to optimize that because it would make the overall system perform worse.)
You aren't using your chips efficiently. That's basically it. Maybe your use case justifies it, but you are not taking advantage of a massive part of what chips do. That's on you. And it does make you a pretty unusual use case compared with most software development.
Fine, you've talked yourself deeply into a conviction that async doesn't and won't ever matter for you. But man, most people are properly doing the right thing by optimizing for throughput, not single events, and async has altered the game in amazingly, colossally positive ways for computing efficiency.
If you want mechanistic sympathy and low latency, then you can't really do much better than DPDK; io_uring still goes through the very generic and abstracted kernel networking stack.