Hacker News
How Fast can A Single Instance of Redis be? (keydb.dev)
220 points by eblenkarn on June 18, 2019 | hide | past | favorite | 74 comments


IIUC the benchmark did not use pipelines and the numbers show that. Redis itself can outperform this by a factor of 10 with pipelining, and I'm betting the difference from the module won't be that big given such a setup (which is how you should ideally work anyway).
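To illustrate why pipelining matters: RESP commands can be concatenated and sent in a single write, so the round-trip cost is paid once per batch rather than once per command. A minimal sketch of the encoding side (not any official client, just the wire format):

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def encode_pipeline(commands):
    """Concatenate many commands into one buffer for a single send()."""
    return b"".join(encode_command(*cmd) for cmd in commands)

# Three commands, one network write instead of three round trips.
buf = encode_pipeline([("SET", "k1", "v1"), ("SET", "k2", "v2"), ("GET", "k1")])
```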

Still, nice to see the modules API being put to interesting use cases.


Yeah - it'd be interesting to see the real use case here. Pipelining + scripting if needed normally go a long way. I'm curious who needs even more performance beyond this, and whether this module actually provides it.


Anecdotally, my experience running Redis servers on AWS (both standalone EC2 instances and ElastiCache dedicated Redis instances) is that network latency is likely to become a barrier before anything else. We struggled with the same performance problems on both small-footprint small-payload Redis DBs and on large ones, and paying for the next tier of network connectivity (between our applications on EC2 and our Redis servers on EC2/ElastiCache) did the most to alleviate delays.


This is very accurate. We're seeing variations in GET / SET operations with ElastiCache of up to 10000μs, which is way longer than the actual execution time. After moving complex operations into transactions and even Lua scripts, performance is fairly acceptable again.

I assume the AWS network is pretty noisy.


Yeah, this is true in my case as well.

Years ago I built a layer on top of Redis that could also run in the browser, so I could cache data there. Even pull from other browsers with WebRTC, and fallback to server only when needed.

Over time, I eventually made the Redis component a module, so it could be swapped out with other systems. It has always proved to be the fastest, except for an LMDB adapter someone in the community wrote ( https://github.com/notabugio/gun-lmdb ).


You are facing the same problems that I've experienced, and I developed different techniques to get around these network latencies. In our case most of the payloads were large fragments of compiled JSON. So we benchmarked and found LZ4 to be a lightweight compression that cut down on network payload and cache size at the same time. Shameless plug: I discussed these techniques at RedisConf: https://youtu.be/QkUz2_kRV9g
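To sketch the idea, using stdlib zlib as a stand-in (the commenter used LZ4, which needs a third-party package and trades some ratio for much lower CPU cost; the pack/unpack names here are made up):

```python
import json
import zlib

def pack(obj):
    """Serialize and compress a JSON-able object before SETting it."""
    return zlib.compress(json.dumps(obj).encode())

def unpack(blob):
    """Decompress and deserialize a cached value after GETting it."""
    return json.loads(zlib.decompress(blob))

# Large repetitive JSON fragments compress well, shrinking both the
# network payload and the memory footprint in the cache.
doc = {"items": [{"id": i, "name": "x" * 50} for i in range(100)]}
blob = pack(doc)
raw_len = len(json.dumps(doc).encode())
```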


I'd like to see the client code, see how it manages backpressure.

(I'm probably being thick; the benchmark code is probably linked and I'm just not seeing it.)

I recently maintained some nodejs & expressjs stuff. Neither the Redis clients nor the original developers of our stuff had any concept of backpressure (throttling). In our case, each HTTP request received would trigger a Redis request. For whatever reason, the expressjs (or nodejs) event loop preempts other work to process incoming requests. So Redis responses would pile up in Redis itself, eventually causing Redis to ABEND (out of memory).

Trying to explain backpressure, queueing, throttling to a bunch of junior devs who LOVE LOVE LOVE nodejs... Definitely was not my favorite gig.

--

PS- I started using Redis a few gigs ago. I really wanted to hate it. I'm primarily a Java dev and was esthetically offended by the NoSQL fad. But turns out Redis is awesome. And antirez is now a personal hero. I truly wish I was more like him.


> I'd like to see the client code, see how it manages backpressure.

That should (in theory) be handled on layer 7, not really on the network end (beyond regular TCP flow control), whereas this article is mostly about optimization on the network layer.

If you're fiddling with redis to the point where you're optimizing to use DPDK/OpenOnload/Exablaze/etc you've probably already exhausted the typical optimization paths (redis pipelining, tuning your queries, potentially implementing a stripped down client library implementing RESP, etc).
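For reference, the RESP wire format such a stripped-down client would have to handle is quite small. A hedged sketch of a reply parser covering the five RESP2 reply types (not a real client, no socket handling):

```python
def parse_reply(buf):
    """Parse one RESP reply from bytes; returns (value, remaining_bytes)."""
    line, _, rest = buf.partition(b"\r\n")
    kind, payload = line[:1], line[1:]
    if kind == b"+":                      # simple string, e.g. +OK
        return payload.decode(), rest
    if kind == b"-":                      # error reply
        raise Exception(payload.decode())
    if kind == b":":                      # integer
        return int(payload), rest
    if kind == b"$":                      # bulk string ($-1 is a null)
        n = int(payload)
        if n == -1:
            return None, rest
        return rest[:n], rest[n + 2:]     # skip trailing \r\n
    if kind == b"*":                      # array of nested replies
        items = []
        for _ in range(int(payload)):
            item, rest = parse_reply(rest)
            items.append(item)
        return items, rest
    raise ValueError("bad reply type %r" % kind)
```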


Thanks. I will try to understand what you're saying.

But first I'll try to explain what I think I was seeing.

Nodejs & expressjs. A couple high use REST API endpoints. Thundering herds. HTTP server can handle say max 200 rps (while maintaining target P99). Redis server can only handle sustained 100 rps.

Where does the throttle go? How is it implemented?

In my prior Java experience, I got throttling "for free" by tuning the thread pools.

With nodejs & expressjs, being single threaded with an event loop, the only solution I figured out was to throttle our app's Redis client.

What I really wanted is end-to-end backpressure. What I've done in the past is a postfix (mail server) inspired queuing system (work piled up, no blocking). What I would have settled for is an expressjs (or equiv) 'frontdoor' that throttled new socket accepts and new HTTP requests. Extra credit if the app level load balancer was aware of this frontdoor, and therefore more responsive.

If there's some app agnostic throttle buildable at layer 7, I definitely want to learn about it.
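One in-process way to build such a throttle, sketched with asyncio's semaphore (fetch() is a hypothetical stand-in for a real Redis call, and the limit value is illustrative):

```python
import asyncio

MAX_IN_FLIGHT = 100   # tune to what the backend can sustain

async def fetch(key):
    await asyncio.sleep(0)            # stand-in for a real Redis round trip
    return f"value-for-{key}"

async def main():
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def throttled_fetch(key):
        async with sem:               # excess callers wait here, in-process,
            return await fetch(key)   # instead of piling up in Redis buffers

    # 500 requests arrive at once, but at most 100 hit the backend at a time.
    return await asyncio.gather(*(throttled_fetch(i) for i in range(500)))

results = asyncio.run(main())
```

This caps load on the backend but does not by itself propagate backpressure to the HTTP front door; callers still queue in memory, which is the end-to-end problem described above.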


The benchmark client is memtier: https://github.com/RedisLabs/memtier_benchmark


There’s a memcached client that uses connection pools but it has its own set of problems.

And you have to use nginx plus to get the backpressure support on the ingest side. If my levels of capability and capacity were a little higher I’d be tempted to start writing a reverse proxy in Rust.


Can't you just use an atomic counter for requests/responses per connection: increment/decrement it when you receive/send a packet, check that it is positive before you try to send, and accept that it may briefly dip negative because there shouldn't be too many senders fighting to queue a request? And even then, you should be able to find instructions (or so) that do a bounded decrement that never goes below zero, while still ensuring that no counts are lost.
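A sketch of that counter with a bounded decrement, using a lock for clarity (in C this role would be played by a compare-and-swap loop over an atomic; the class name is made up):

```python
import threading

class BoundedCounter:
    """In-flight request credits: decrement never goes below zero."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def increment(self):
        """A response arrived: return one credit."""
        with self._lock:
            self._value += 1

    def try_decrement(self):
        """Take a credit before sending; False means caller must wait/drop."""
        with self._lock:
            if self._value > 0:
                self._value -= 1
                return True
            return False
```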


I think maybe I've seen a similar issue and didn't know it? What's the right answer for that situation, node/express hitting redis which runs out of memory?


Sorry for delay. Had to refresh my memory, dig up that old code.

I used the 'queue' (submodule?) from caolan's terrific 'async' module to limit the number of Redis requests in flight.

https://caolan.github.io/async/v3/docs.html#queue

https://www.npmjs.com/package/async

As said above, this prevents the Redis responses from piling up on the Redis server, waiting to be sent.

To troubleshoot: if the output list length, output buffer memory, or both in CLIENT LIST's output are growing without ever shrinking, then I'd bet one delicious apple fritter your app's Redis client isn't processing responses fast enough. Check out the 'obl', 'oll', and 'omem' stats.

https://redis.io/commands/client-list
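A small sketch of that troubleshooting step: parsing sample CLIENT LIST output and flagging clients whose output buffers are growing (the sample lines and the threshold are made up for illustration):

```python
# Two fabricated CLIENT LIST lines: a healthy client and a slow consumer
# whose output buffer (obl/oll/omem) has backed up.
SAMPLE = (
    "id=3 addr=127.0.0.1:51234 name= obl=0 oll=0 omem=0 cmd=get\n"
    "id=4 addr=127.0.0.1:51235 name= obl=16384 oll=250 omem=4194304 cmd=get\n"
)

def slow_consumers(client_list, omem_limit=1_000_000):
    """Return addresses of clients not draining their replies."""
    bad = []
    for line in client_list.strip().splitlines():
        fields = dict(f.split("=", 1) for f in line.split())
        if int(fields["omem"]) > omem_limit or int(fields["oll"]) > 0:
            bad.append(fields["addr"])
    return bad
```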

--

Fronting Redis calls with async.queue makes the client code gnarly. So I banged out a (very minimal) client with an internal queue.

I had been weighing releasing it. And then if there's any interest, then maybe flesh out the client. There's some other notions I had wanted to try out.

I'm NOT a nodejs developer. So I hadn't fully worked out how to do proper I/O, with proper timeouts, retry, backoff. So for instance, I'd want the async.queue concurrency limit to be adaptive.


Thanks so much!


We've been running into the issue where our redis instance(s) randomly dies. I haven't been able to pin point the problem (nodejs+redis). Would love to hear your thoughts on some gotchas to look out for.


Not OP, but I wonder what the collectd plugin for Redis would tell you.


Please see (sibling?) reply to jessaustin. Thanks.


Well, to be honest, if you are I/O- or network-bound and the kernel TCP stack is the bottleneck, then user-space networking like DPDK can help in any application. It depends on the application, but sometimes the additional complexity of introducing DPDK just isn't worth it, and spinning up another instance/server is a better choice. Also look at Seastar, used in ScyllaDB; they publish numbers both with and without DPDK.

Just remember you need to give DPDK a whole NIC, and it uses polling, so expect 100% CPU usage on the polling cores.


DPDK does poll but you can have it sleep in low traffic situations. At the expense of a little extra latency.


I always figured the right solution here is a queue that sends any time the request queue is over N elements or every M microseconds, whichever happens first. Haven’t seen many implementations though.

Nagle’s algorithm for the oldest, maybe a couple others, possibly some code I’ve written.
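A sketch of such a size-or-time batcher (names and defaults are made up; a real implementation would arm a timer so the time-based flush doesn't have to wait for the next push):

```python
import time

class Batcher:
    """Flush when the queue reaches max_items, or when max_delay seconds
    have passed since the first unflushed item, whichever comes first."""

    def __init__(self, flush, max_items=16, max_delay=0.001):
        self.flush = flush
        self.max_items = max_items
        self.max_delay = max_delay
        self.items = []
        self.first_at = None

    def push(self, item, now=None):
        now = time.monotonic() if now is None else now
        if not self.items:
            self.first_at = now          # clock starts at first queued item
        self.items.append(item)
        if len(self.items) >= self.max_items or now - self.first_at >= self.max_delay:
            self.flush(self.items)
            self.items = []

# Size-triggered flushes: 7 pushes with max_items=3 emit two batches.
sent = []
b = Batcher(sent.append, max_items=3, max_delay=10.0)
for i in range(7):
    b.push(i)
```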


DPDK has had ways to not use 100% of the CPU for polling for many years.

https://doc.dpdk.org/guides-16.04/sample_app_ug/l3_forward_p...


I wonder how the performance compares with pedis [1], which is based on Seastar/ScyllaDB.

https://github.com/fastio/pedis


These numbers seem off. 4ms to store something in memory? Even if that includes network latency, it should be possible to improve further.


Most of the bottleneck is in the TCP stack. That's why user-mode networking, like this module uses, helps a lot.


TCP latency is on the order of tens of microseconds, not milliseconds.

Valid question I'd say.


TCP latency increases when you have high load and buffers in the way. Over 50% of CPU time is spent in the TCP stack for Redis, with the bulk of the rest in query parsing.


How is this different from what Solarflare (https://twitter.com/Solarflare_Comm/status/11134717313798430...) is doing with Cloud Onload? From what I understand they don't require any application changes and they work on any networking application.


OpenOnload transparently replaces the UDP/TCP network stack and epoll calls of an application with highly tunable userspace components. It can work with any application that uses these system calls. Of course, that application could be Redis. If you search around, I've commented on using them together.

This is a Redis module, so will only work with Redis. Although I don't see its implementation (?), it appears to connect Redis' Unix Socket interface to a network stack running on DPDK (a user-space low-level network interface).

In the SolarFlare world, ef_vi is their analogue of DPDK -- a packet buffer interface. They then have OpenOnload (transparent acceleration that cooperates well with the kernel) and TCPDirect, a proprietary userspace TCP/UDP library with its own interface. It's even higher performance than OpenOnload because it doesn't have to coordinate with the kernel, and you manage the sharing of the network resources yourself.

SolarFlare has a DPDK driver too. OpenOnload doesn't accelerate UNIX sockets.

One thing this module doesn't accelerate is epoll... I think a properly tuned SolarFlare solution would be higher performing -- especially on the same machine with TCP-loopback acceleration. But you don't know until you try it...

edit: added note about epoll



DPDK and Solarflare Onload are conceptually similar since they're both OS bypass network stacks. DPDK is open source and portable but lower-level and more complex to use while Onload looks much easier to use but proprietary-ish.



Nice! I am also working on developing an extension for Redis and tried creating flame graphs as well, but am not able to get them to work properly. Could you please share the commands you executed for the flame graphs? It would be greatly appreciated!

Already tried using '-fno-omit-frame-pointer' and '-O0'

  # $CMD is a command starting a redis-server and creating traffic
  perf record --freq=10000 --all-cpus -g -- $CMD
  perf script --input=perf.data | ./stackcollapse-perf.pl > out.perf-folded
  ./flamegraph.pl out.perf-folded > perf-redis.svg


I can recommend hotspot from KDAB. Use

  perf record --call-graph dwarf,32768 -F 999 -- $CMD
if the following does not work or you work on something older than Haswell:

  perf record --call-graph lbr -F 999 -- $CMD
Be careful with the frequency. Use cycles:up as the event (with -e) for general CPU time; other events like LLC-load-misses or cycle_activity.stalls_l3_miss (an example from a Kaby Lake system) are useful too. Use

  perf list
to search for the right event name. On the Broadwell i5/dualcore+HT Laptop I see cycle_activity.stalls_l2_miss as the equivalent, due to it apparently not having an L3 cache. cycle_activity.stalls_mem_any highlights code where the CPU is doing nothing while waiting on memory.

For de-inlining I found simpleperf from the android-ndk to be the only tool not wastefully spawning one addr2line per single address. Yes, that takes ages to process. Yes, I gave up and used simpleperf, which caches this. And yes, I considered patching perf-tools to use the pipe-based interface to addr2line.

Hotspot unfortunately appears unable to attribute time spent in a function to the different inline stacks inside said function, so I had to forego heavy link-time optimization that got 5-10% without much else, because there was no meaningful insight left into which part spent how long computing.

And please, please refrain from -O0 when you want performance, whether to use that performance or to measure it. Instead add -g or -ggdb to force DWARF debug info, which gets you line info and stack-frame unwinding, the latter without relying on the frame pointer. Though for the unwinding itself, lbr does a great job. Just keep in mind that its max depth is limited by the CPU generation and can't be bypassed/increased.


Hackernews hug of death already?


Maybe they should use this module that I can't read about to power their site. :P


Makes me feel a little better about my libµhttp/Objective-C/Objective-Smalltalk server (serving http://objective.st), which held up just fine to the HN hug of death, running on the smallest digital ocean droplet. :-)



Those $5 droplets pack a surprising punch.


I assume the downtime is due to the HTTP server, not persistence.


That would be unusual. Most websites fall over due to a DB bottleneck long before the backends or HTTP servers see significant load.


The bulk of the CPU time is currently spent in node.js, running docusaurus.


Yep. Almost always the web server will not be able to connect to the db because of the limited connections + slowed db.

---

FWIW, the site is up for me.


The AWS instance was sized too low :) touché on the "not fast enough" comments.


Testing to see if we get permabanned for using the term "slash-dotted"


Seems that even that doubled Redis performance wasn't enough.


How does it compare to Redis threaded I/O? (https://twitter.com/antirez/status/1110973404226772995?lang=...)

Using 4 threads it is simple to get 2x the performance of single-threaded Redis (even though the reading part is not yet threaded).


Impressive! Would be great if you could share the SVG of the flamegraphs too.



Well the site is down.


We are back up, had to move to a bigger server haha


I'm always curious what kind of setup people run that can not handle a thousand people connecting into HTTP.


I did a bit of sleuthing in the dev console.

The site is a static HTML site generated by Docusaurus [0]. That seems like it should be very fast and lightweight for serving tons of concurrent requests.

In the page's response header, it shows that the server is NGINX on Ubuntu. Ahhh, and "X-Powered-By: Express". I suspect that's the bottleneck, serving static assets via Node.js+Express, instead of directly with NGINX.

I also see in the document foot that it tries to load a script from http://localhost:35729/livereload.js. That could be a sign that the site is a development build, not optimized for production?

[0] https://docusaurus.io/en/


Guessing it's a CMS without a deep-caching server configured in front of it, like WordPress, that does some DB request(s) and runs a few addon modules on each article request.

That's not to say it's bad, but it's probably seeing higher-than-normal traffic. I've been getting interested in static content generation again.


I had a couple of posts go first page on HN in the last months. I run my blog on Ghost (the hosted SaaS version) and it's crazy fast. I have no idea how they do it, alas.


Go up again!


To maximize reach, have you explored what it would take to build a first-class Redis wasm [1] implementation, maybe pairing it with Terra [2] for Lua scripting?

[1] http://webassembly.github.io/spec/core/exec/index.html

[2] https://github.com/zdevito/terra


"Redis is known as one of the fastest databases out there."

Redis is not a database. Let's begin with that.


Sure it is; it's not a traditional RDBMS, but a key-value store is by broad definition a database.


Depends on how you define database. Wikipedia starts with...

”A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques." [1]

...which makes Redis sound like a database to me. If I am not technically correct, feel free to educate me.

Side note: I try to focus on what problems we're trying to solve, because it's hard enough to get people to sync up on that. I've found focusing on being right is inversely proportional to the health of my relationships.

[1] https://en.wikipedia.org/wiki/Database


Your filesystem then is a database. You can even stretch it to include your text editor. Redis is not a database.


Of course the filesystem is a database. In fact, bank mainframes still use distributed file systems as “databases” where file writes are auditable and fully reversible transactions. (I worked on such systems.)


I think there is much to be learned by comparing databases and filesystems.

One thing I think that filesystems can learn from databases is the notion of a compound primary key. It would be neat if app-files were identified by an (app, type, id) tuple. This would bring the advantages of both the posix and the windows filesystem layouts.

For instance, if we had (app=firefox, type=/usr/bin, id=main), then we could easily find all Firefox files by querying by app. Or we could easily find all binaries in PATH by querying by type.

ps. I think this would work better than the overly general tag-based filesystem people sometimes propose.
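The compound-key idea can be sketched as an in-memory index keyed by (app, type, id) tuples, with lookups by any component (the paths and entries are illustrative):

```python
# Files keyed by a compound primary key: (app, type, id).
files = {
    ("firefox", "/usr/bin", "main"):   "/usr/bin/firefox",
    ("firefox", "/usr/lib", "libxul"): "/usr/lib/firefox/libxul.so",
    ("vim",     "/usr/bin", "main"):   "/usr/bin/vim",
}

def by_app(app):
    """All files belonging to one program (the posix layout makes this hard)."""
    return [path for (a, _, _), path in files.items() if a == app]

def by_type(ftype):
    """All files of one kind, e.g. every binary in PATH
    (the one-folder-per-app layout makes this hard)."""
    return [path for (_, t, _), path in files.items() if t == ftype]
```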


Well, that's basically what directories are.


The posix filesystem layout doesn't provide an easy way to enumerate all files belonging to a given program. The traditional windows (98?) layout of one app per folder conversely doesn't easily allow enumerating all binaries. Or all manual entries. Etc.


Ok, so let's say you arrange your folders to accommodate that: you have /files/$user/$program/$file. That's basically what a primary key in a database looks like. If you want a secondary index, what databases often do is create a second table with a different primary key, with the values being primary keys of the main table.

We can model that in a filesystem as well, of course. So if I want one index by file type and one by month of creation, I can create /files/$program/$file and then ln -s /files/$program/$file /files/month/$month/$file


You could argue this endlessly from different perspectives, but it's not at all relevant to what the article actually discusses. They're comparing redis against itself, not other examples of things that might or might not be databases.


In a few years, you may look back and wonder, incredulously, that you ever felt a filesystem might not be a kind of database. Look into filesystems long enough, and you may start to view an RDBMS as a specialized kind of filesystem.


If you use it as such, of course it is. Filesystem DBs been around forever. What point are you trying to make?


> Your filesystem then is a database.

Absolutely


I very much beg to differ. Just because a database maps to memory and doesn’t commit to disk per transaction doesn’t make it less of a database.

So if not a database, what is it?


Also, redis does commit to disk per transaction if you want it to. IMO it should start doing it by default.


By which definition? Any links?


Why isn't it?



