Hacker News
How Fast can A Single Instance of Redis be? (keydb.dev)
220 points by eblenkarn on June 18, 2019 | hide | past | favorite | 74 comments


IIUC the benchmark did not use pipelines and the numbers show that. Redis itself can outperform this by a factor of 10 with pipelining, and I'm betting the difference from the module won't be that big given such a setup (which is how you should ideally work anyway).
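To illustrate why pipelining matters: RESP commands can be concatenated and sent in a single write, so the round-trip cost is paid once per batch rather than once per command. A minimal sketch of the encoding side (not any official client, just the wire format):

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def encode_pipeline(commands):
    """Concatenate many commands into one buffer for a single send()."""
    return b"".join(encode_command(*cmd) for cmd in commands)

# Three commands, one network write instead of three round trips.
buf = encode_pipeline([("SET", "k1", "v1"), ("SET", "k2", "v2"), ("GET", "k1")])
```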

Still, nice to see the modules API being put to interesting use cases.


Yeah - it'd be interesting to see the real use case here. Pipelining + scripting if needed normally go a long way. I'm curious who needs even more performance beyond this, and whether this module actually provides it.


Anecdotally, my experience running Redis servers on AWS (both standalone EC2 instances and ElastiCache dedicated Redis instances) is that network latency is likely to become a barrier before anything else. We struggled with the same performance problems on both small-footprint small-payload Redis DBs and on large ones, and paying for the next tier of network connectivity (between our applications on EC2 and our Redis servers on EC2/ElastiCache) did the most to alleviate delays.


This is very accurate. We're seeing variations in GET / SET operations with ElastiCache of up to 10000μs, which is way longer than the actual execution time. After moving complex operations into transactions and even Lua scripts, performance is fairly acceptable again.

I assume the AWS network is pretty noisy.


Yeah, this is true in my case as well.

Years ago I built a layer on top of Redis that could also run in the browser, so I could cache data there. Even pull from other browsers with WebRTC, and fallback to server only when needed.

Over time, I eventually made the Redis component a module, so it could be swapped out with other systems. It has always proved to be the fastest, except for an LMDB adapter someone in the community wrote ( https://github.com/notabugio/gun-lmdb ).


You are facing the same problems that I've experienced, and I developed different techniques to get around these network latencies. In our case most of the payloads were large fragments of compiled JSON. So we benchmarked and found LZ4 to be a lightweight compression that cut down on network payload and cache size at the same time. Shameless plug: I discussed these techniques at RedisConf: https://youtu.be/QkUz2_kRV9g
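To sketch the idea, using stdlib zlib as a stand-in (the commenter used LZ4, which needs a third-party package and trades some ratio for much lower CPU cost; the pack/unpack names here are made up):

```python
import json
import zlib

def pack(obj):
    """Serialize and compress a JSON-able object before SETting it."""
    return zlib.compress(json.dumps(obj).encode())

def unpack(blob):
    """Decompress and deserialize a cached value after GETting it."""
    return json.loads(zlib.decompress(blob))

# Large repetitive JSON fragments compress well, shrinking both the
# network payload and the memory footprint in the cache.
doc = {"items": [{"id": i, "name": "x" * 50} for i in range(100)]}
blob = pack(doc)
raw_len = len(json.dumps(doc).encode())
```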


I'd like to see the client code, see how it manages backpressure.

(I'm probably being thick; the benchmark code is probably linked and I'm just not seeing it.)

I recently maintained some nodejs & expressjs stuff. Neither the Redis clients nor the original developers of our stuff had any concept of backpressure (throttling). In our case, each HTTP request received would trigger a Redis request. For whatever reason, the expressjs (or nodejs) event loop preempts other work to process incoming requests. So Redis responses would pile up in Redis itself, eventually causing Redis to ABEND (out of memory).

Trying to explain backpressure, queueing, throttling to a bunch of junior devs who LOVE LOVE LOVE nodejs... Definitely was not my favorite gig.

--

PS- I started using Redis a few gigs ago. I really wanted to hate it. I'm primarily a Java dev and was esthetically offended by the NoSQL fad. But turns out Redis is awesome. And antirez is now a personal hero. I truly wish I was more like him.


> I'd like to see the client code, see how it manages backpressure.

That should (in theory) be handled on layer 7, not really on the network end (beyond regular TCP flow control), whereas this article is mostly about optimization on the network layer.

If you're fiddling with redis to the point where you're optimizing to use DPDK/OpenOnload/Exablaze/etc you've probably already exhausted the typical optimization paths (redis pipelining, tuning your queries, potentially implementing a stripped down client library implementing RESP, etc).
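For reference, the RESP wire format such a stripped-down client would have to handle is quite small. A hedged sketch of a reply parser covering the five RESP2 reply types (not a real client, no socket handling):

```python
def parse_reply(buf):
    """Parse one RESP reply from bytes; returns (value, remaining_bytes)."""
    line, _, rest = buf.partition(b"\r\n")
    kind, payload = line[:1], line[1:]
    if kind == b"+":                      # simple string, e.g. +OK
        return payload.decode(), rest
    if kind == b"-":                      # error reply
        raise Exception(payload.decode())
    if kind == b":":                      # integer
        return int(payload), rest
    if kind == b"$":                      # bulk string ($-1 is a null)
        n = int(payload)
        if n == -1:
            return None, rest
        return rest[:n], rest[n + 2:]     # skip trailing \r\n
    if kind == b"*":                      # array of nested replies
        items = []
        for _ in range(int(payload)):
            item, rest = parse_reply(rest)
            items.append(item)
        return items, rest
    raise ValueError("bad reply type %r" % kind)
```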


Thanks. I will try to understand what you're saying.

But first I'll try to explain what I think I was seeing.

Nodejs & expressjs. A couple high use REST API endpoints. Thundering herds. HTTP server can handle say max 200 rps (while maintaining target P99). Redis server can only handle sustained 100 rps.

Where does the throttle go? How is it implemented?

In my prior Java experience, I got throttling "for free" by tuning the thread pools.

With nodejs & expressjs, being single threaded with an event loop, the only solution I figured out was to throttle our app's Redis client.

What I really wanted is end-to-end backpressure. What I've done in the past is a postfix (mail server) inspired queuing system (work piled up, no blocking). What I would have settled for is an expressjs (or equiv) 'frontdoor' that throttled new socket accepts and new HTTP requests. Extra credit if the app level load balancer was aware of this frontdoor, and therefore more responsive.

If there's some app agnostic throttle buildable at layer 7, I definitely want to learn about it.
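One in-process way to build such a throttle, sketched with asyncio's semaphore (fetch() is a hypothetical stand-in for a real Redis call, and the limit value is illustrative):

```python
import asyncio

MAX_IN_FLIGHT = 100   # tune to what the backend can sustain

async def fetch(key):
    await asyncio.sleep(0)            # stand-in for a real Redis round trip
    return f"value-for-{key}"

async def main():
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def throttled_fetch(key):
        async with sem:               # excess callers wait here, in-process,
            return await fetch(key)   # instead of piling up in Redis buffers

    # 500 requests arrive at once, but at most 100 hit the backend at a time.
    return await asyncio.gather(*(throttled_fetch(i) for i in range(500)))

results = asyncio.run(main())
```

This caps load on the backend but does not by itself propagate backpressure to the HTTP front door; callers still queue in memory, which is the end-to-end problem described above.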


The benchmark client is memtier: https://github.com/RedisLabs/memtier_benchmark


There’s a memcached client that uses connection pools but it has its own set of problems.

And you have to use nginx plus to get the backpressure support on the ingest side. If my levels of capability and capacity were a little higher I’d be tempted to start writing a reverse proxy in Rust.


Can't you just use an atomic counter for requests/responses per connection: increment/decrement it when you receive/send a packet, check that it is positive before you try to send, and accept that it may briefly dip negative because there shouldn't be too many senders fighting to queue a request? And even then, you should be able to find instructions (or so) that do a bounded decrement that never goes below zero, while still ensuring that no counts are lost.
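A sketch of that counter with a bounded decrement, using a lock for clarity (in C this role would be played by a compare-and-swap loop over an atomic; the class name is made up):

```python
import threading

class BoundedCounter:
    """In-flight request credits: decrement never goes below zero."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def increment(self):
        """A response arrived: return one credit."""
        with self._lock:
            self._value += 1

    def try_decrement(self):
        """Take a credit before sending; False means caller must wait/drop."""
        with self._lock:
            if self._value > 0:
                self._value -= 1
                return True
            return False
```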


I think maybe I've seen a similar issue and didn't know it? What's the right answer for that situation, node/express hitting redis which runs out of memory?


Sorry for delay. Had to refresh my memory, dig up that old code.

I used the 'queue' (submodule?) from caolan's terrific 'async' module to limit the number of Redis requests in flight.

https://caolan.github.io/async/v3/docs.html#queue

https://www.npmjs.com/package/async

As said above, this prevents the Redis responses from piling up on the Redis server, waiting to be sent.

To troubleshoot: if the output list length, output buffer memory, or both in CLIENT LIST's output are growing without ever shrinking, then I'd bet one delicious apple fritter your app's Redis client isn't processing responses fast enough. Check out the 'obl', 'oll', and 'omem' stats.

https://redis.io/commands/client-list
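A small sketch of that troubleshooting step: parsing sample CLIENT LIST output and flagging clients whose output buffers are growing (the sample lines and the threshold are made up for illustration):

```python
# Two fabricated CLIENT LIST lines: a healthy client and a slow consumer
# whose output buffer (obl/oll/omem) has backed up.
SAMPLE = (
    "id=3 addr=127.0.0.1:51234 name= obl=0 oll=0 omem=0 cmd=get\n"
    "id=4 addr=127.0.0.1:51235 name= obl=16384 oll=250 omem=4194304 cmd=get\n"
)

def slow_consumers(client_list, omem_limit=1_000_000):
    """Return addresses of clients not draining their replies."""
    bad = []
    for line in client_list.strip().splitlines():
        fields = dict(f.split("=", 1) for f in line.split())
        if int(fields["omem"]) > omem_limit or int(fields["oll"]) > 0:
            bad.append(fields["addr"])
    return bad
```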

--

Fronting Redis calls with async.queue makes the client code gnarly. So I banged out a (very minimal) client with an internal queue.

I had been weighing releasing it. And then if there's any interest, then maybe flesh out the client. There's some other notions I had wanted to try out.

I'm NOT a nodejs developer. So I hadn't fully worked out how to do proper I/O, with proper timeouts, retry, backoff. So for instance, I'd want the async.queue concurrency limit to be adaptive.


Thanks so much!


We've been running into the issue where our redis instance(s) randomly dies. I haven't been able to pin point the problem (nodejs+redis). Would love to hear your thoughts on some gotchas to look out for.


Not OP, but I wonder what the collectd plugin for Redis would tell you.


Please see (sibling?) reply to jessaustin. Thanks.


Well, to be honest, if you are I/O- or network-bound and the kernel TCP stack is the bottleneck, then user-space networking like DPDK can help in any application. It depends on the application, but sometimes the additional complexity of introducing DPDK just isn't worth it, and spinning up another instance/server is a better choice. Also look at Seastar, used in ScyllaDB; they publish numbers both with and without DPDK.

Just remember you need to give DPDK a whole NIC, and it uses polling, so expect 100% CPU usage on the polling cores.


DPDK does poll but you can have it sleep in low traffic situations. At the expense of a little extra latency.


I always figured the right solution here is a queue that sends any time the request queue is over N elements or every M microseconds, whichever happens first. Haven’t seen many implementations though.

Nagle’s algorithm for the oldest, maybe a couple others, possibly some code I’ve written.
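A sketch of such a size-or-time batcher (names and defaults are made up; a real implementation would arm a timer so the time-based flush doesn't have to wait for the next push):

```python
import time

class Batcher:
    """Flush when the queue reaches max_items, or when max_delay seconds
    have passed since the first unflushed item, whichever comes first."""

    def __init__(self, flush, max_items=16, max_delay=0.001):
        self.flush = flush
        self.max_items = max_items
        self.max_delay = max_delay
        self.items = []
        self.first_at = None

    def push(self, item, now=None):
        now = time.monotonic() if now is None else now
        if not self.items:
            self.first_at = now          # clock starts at first queued item
        self.items.append(item)
        if len(self.items) >= self.max_items or now - self.first_at >= self.max_delay:
            self.flush(self.items)
            self.items = []

# Size-triggered flushes: 7 pushes with max_items=3 emit two batches.
sent = []
b = Batcher(sent.append, max_items=3, max_delay=10.0)
for i in range(7):
    b.push(i)
```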


DPDK has had ways to not use 100% of the CPU for polling for many years.

https://doc.dpdk.org/guides-16.04/sample_app_ug/l3_forward_p...


I wonder how the performance compares with pedis [1], which is based on Seastar/ScyllaDB.

https://github.com/fastio/pedis


These numbers seem off. 4ms to store something in memory? Even if that includes network latency, it should be possible to improve further.


Most of the bottleneck is in the TCP stack. That's why user-mode networking, like this module uses, helps a lot.


TCP latency is on the order of tens of microseconds, not milliseconds.

Valid question I'd say.


TCP latency increases when you have high load and buffers in the way. Over 50% of CPU time is spent in the TCP stack for Redis, with the bulk of the rest in query parsing.


How is this different from what Solarflare (https://twitter.com/Solarflare_Comm/status/11134717313798430...) is doing with Cloud Onload? From what I understand they don't require any application changes and they work on any networking application.


OpenOnload transparently replaces the UDP/TCP network stack and epoll calls of an application with highly tunable userspace components. It can work with any application that uses these system calls. Of course, that application could be Redis. If you search around, I've commented on using them together.

This is a Redis module, so will only work with Redis. Although I don't see its implementation (?), it appears to connect Redis' Unix Socket interface to a network stack running on DPDK (a user-space low-level network interface).

In the SolarFlare world, ef_vi is their analogue of DPDK -- a packet buffer interface. They then have OpenOnload (transparent acceleration that cooperates well with the kernel) and TCPDirect, a proprietary userspace TCP/UDP library with its own interface. It's even higher performance than OpenOnload because it doesn't have to coordinate with the kernel, and you manage the sharing of the network resources yourself.

SolarFlare has a DPDK driver too. OpenOnload doesn't accelerate UNIX sockets.

One thing this module doesn't accelerate is epoll... I think a properly tuned SolarFlare solution would be higher performing -- especially on the same machine with TCP-loopback acceleration. But you don't know until you try it...

edit: added note about epoll



DPDK and Solarflare Onload are conceptually similar since they're both OS bypass network stacks. DPDK is open source and portable but lower-level and more complex to use while Onload looks much easier to use but proprietary-ish.



Nice! I am also working on developing an extension for Redis and tried creating flame graphs as well, but am not able to get them to work properly. Could you please share the commands you executed for the flame graphs? It would be greatly appreciated!

Already tried using '-fno-omit-frame-pointer' and '-O0'

  # $CMD is a command starting a redis-server and creating traffic
  perf record --freq=10000 --all-cpus -g -- $CMD
  perf script --input=perf.data | ./stackcollapse-perf.pl > out.perf-folded
  ./flamegraph.pl out.perf-folded > perf-redis.svg


I can recommend hotspot from KDAB. Use

  perf record --call-graph dwarf,32768 -F 999 -- $CMD
if the following does not work or you work on something older than Haswell:

  perf record --call-graph lbr -F 999 -- $CMD
Be careful with the frequency. Use cycles:up as the event (with -e) for general CPU time; other events like LLC-load-misses or cycle_activity.stalls_l3_miss (an example from a Kaby Lake system) are useful too. Use

  perf list
to search for the right event name. On the Broadwell i5/dualcore+HT Laptop I see cycle_activity.stalls_l2_miss as the equivalent, due to it apparently not having an L3 cache. cycle_activity.stalls_mem_any highlights code where the CPU is doing nothing while waiting on memory.

For de-inlining I found simpleperf from the android-ndk to be the only tool not wastefully spawning one addr2line per single address. Yes, that takes ages to process. Yes, I gave up and used simpleperf, which caches this. And yes, I considered patching perf-tools to use the pipe-based interface to addr2line.

Hotspot unfortunately appears unable to attribute time spent in a function to the different inline stacks inside said function, so I had to forego heavy link-time optimization that got 5-10% without much else, because there was no meaningful insight left into which part spent how long computing.

And please, please refrain from -O0 when you want performance, whether to use that performance or to measure it. Instead add -g or -ggdb to force DWARF debug info, which gets you line info and stack-frame unwinding, the latter without relying on the frame pointer. Though for the unwinding itself, lbr does a great job. Just keep in mind that its max depth is limited by the CPU generation and can't be bypassed/increased.


Hackernews hug of death already?


Maybe they should use this module that I can't read about to power their site. :P


Makes me feel a little better about my libµhttp/Objective-C/Objective-Smalltalk server (serving http://objective.st), which held up just fine to the HN hug of death, running on the smallest digital ocean droplet. :-)



Those $5 droplets pack a surprising punch.


I assume the downtime is due to the HTTP server, not persistence.


That would be unusual. Most websites fall over due to a DB bottleneck long before the backends or HTTP servers see significant load.


The bulk of the CPU time is currently spent in node.js, running docusaurus.


Yep. Almost always the web server will not be able to connect to the db because of the limited connections + slowed db.

---

FWIW, the site is up for me.


The AWS instance was sized too low :) touché on the "not fast enough" comments.


Testing to see if we get permabanned for using the term "slash-dotted"


Seems that even that doubled Redis performance wasn't enough.


How does it compare to Redis threaded I/O? (https://twitter.com/antirez/status/1110973404226772995?lang=...)

Using 4 threads it is simple to get 2x the performance of single-threaded Redis (even though the reading part is not yet threaded).


Impressive! Would be great if you could share the SVG of the flamegraphs too.



Well the site is down.


We are back up, had to move to a bigger server haha


I'm always curious what kind of setup people run that can not handle a thousand people connecting into HTTP.


I did a bit of sleuthing in the dev console.

The site is a static HTML site generated by Docusaurus [0]. That seems like it should be very fast and lightweight for serving tons of concurrent requests.

In the page's response header, it shows that the server is NGINX on Ubuntu. Ahhh, and "X-Powered-By: Express". I suspect that's the bottleneck, serving static assets via Node.js+Express, instead of directly with NGINX.

I also see in the document foot that it tries to load a script from http://localhost:35729/livereload.js. That could be a sign that the site is a development build, not optimized for production?

[0] https://docusaurus.io/en/


Guessing it's a CMS without a deep-caching server configured in front of it, like WordPress, that does some DB request(s) and runs a few addon modules on each article request.

That's not to say it's bad, but it's probably seeing higher-than-normal traffic. I've been getting interested in static content generation again.


I had a couple of posts go first page on HN in the last months. I run my blog on Ghost (the hosted SaaS version) and it's crazy fast. I have no idea how they do it, alas.


Go up again!


To maximize reach, have you explored what it would take to build a first-class Redis wasm [1] implementation, maybe pairing it with Terra [2] for Lua scripting?

[1] http://webassembly.github.io/spec/core/exec/index.html

[2] https://github.com/zdevito/terra


"Redis is known as one of the fastest databases out there."

Redis is not a database. Let's begin with that.


Sure it is; it's not a traditional RDBMS, but a key-value store is by broad definition a database.


Depends on how you define database. Wikipedia starts with...

”A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques." [1]

...which makes Redis sound like a database to me. If I am not technically correct, feel free to educate me.

Side note: I try to focus on what problems we're trying to solve, because it's hard enough to get people to sync up on that. I've found focusing on being right is inversely proportional to the health of my relationships.

[1] https://en.wikipedia.org/wiki/Database


Your filesystem then is a database. You can even stretch it to include your text editor. Redis is not a database.


Of course the filesystem is a database. In fact, bank mainframes still use distributed file systems as “databases” where file writes are auditable and fully reversible transactions. (I worked on such systems.)


I think there is much to be learned by comparing databases and filesystems.

One thing I think that filesystems can learn from databases is the notion of a compound primary key. It would be neat if app-files were identified by an (app, type, id) tuple. This would bring the advantages of both the posix and the windows filesystem layouts.

For instance, if we had (app=firefox, type=/usr/bin, id=main), then we could easily find all Firefox files by querying by app. Or we could easily find all binaries in PATH by querying by type.

ps. I think this would work better than the overly general tag-based filesystem people sometimes propose.
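The compound-key idea can be sketched as an in-memory index keyed by (app, type, id) tuples, with lookups by any component (the paths and entries are illustrative):

```python
# Files keyed by a compound primary key: (app, type, id).
files = {
    ("firefox", "/usr/bin", "main"):   "/usr/bin/firefox",
    ("firefox", "/usr/lib", "libxul"): "/usr/lib/firefox/libxul.so",
    ("vim",     "/usr/bin", "main"):   "/usr/bin/vim",
}

def by_app(app):
    """All files belonging to one program (the posix layout makes this hard)."""
    return [path for (a, _, _), path in files.items() if a == app]

def by_type(ftype):
    """All files of one kind, e.g. every binary in PATH
    (the one-folder-per-app layout makes this hard)."""
    return [path for (_, t, _), path in files.items() if t == ftype]
```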


Well, that's basically what directories are.


The posix filesystem layout doesn't provide an easy way to enumerate all files belonging to a given program. The traditional windows (98?) layout of one app per folder conversely doesn't easily allow enumerating all binaries. Or all manual entries. Etc.


Ok, so let's say you arrange your folders to accommodate that: you have /files/$user/$program/$file. That's basically what a primary key in a database looks like. If you want a secondary index, what databases often do is create a second table with a different primary key, with the values being primary keys of the main table.

We can model that in a filesystem as well, of course. So if I want one index by file type and one by month of creation, I can create /files/$program/$file and then ln -s /files/$program/$file /files/month/$month/$file


You could argue this endlessly from different perspectives, but it's not at all relevant to what the article actually discusses. They're comparing redis against itself, not other examples of things that might or might not be databases.


In a few years, you may look back and wonder, incredulously, that you ever felt a filesystem might not be a kind of database. Look into filesystems long enough, and you may start to view an RDBMS as a specialized kind of filesystem.


If you use it as such, of course it is. Filesystem DBs been around forever. What point are you trying to make?


> Your filesystem then is a database.

Absolutely


I very much beg to differ. Just because a database maps to memory and doesn’t commit to disk per transaction doesn’t make it less of a database.

So if not a database, what is it?


Also, redis does commit to disk per transaction if you want it to. IMO it should start doing it by default.


By which definition? Any links?


Why isn't it?



