Hmm. The author's recommendation to use Redis' 'KEYS' command to pattern-match keys for invalidation is a dangerous one. 'KEYS' runs in O(n) time over the entire keyspace. If you're using Redis in a serious production environment, you do not want to be running an O(n) command every time you need to invalidate a group of keys. It's better to group related items in a Redis hash, so that they are all stored under one common key.
Honestly, I think this article is a more impoverished version of antirez's post on the same topic [1]. antirez, being one of the principal authors of Redis, is a much more authoritative source, and he actually describes all the patterns that this author described in greater detail.
Thanks for linking the antirez article; I actually hadn't seen it before I wrote mine. I've added a link to it in the article.
As far as the keys thing goes, it's obviously been a major point of contention with my article. I know about the warning; I just don't think it's actually a problem for cache invalidation.
See below on another comment of someone having similar concerns for my explanation why.
Sorry, I didn't mean to come across as a dick. I'm glad that you've taken the feedback of your readers into consideration.
I'm a big fan of Redis, and it's a key component of our stack. Sorted sets are useful for a lot more than just leader boards, though that is a good use case for them. It's a bit late here, so I'm not feeling up to writing a big post, but I'm considering writing my own blog post on my experiences with Redis.
Now, this is where Redis comes in. You can match keys against wildcards! So you can just query it like so:
keys post/83/*
No no no. This is slow and there is a large warning section in the notes (http://redis.io/commands/keys) about using this in production environments.
[Edit: Why is this bad? Think about if you have millions of keys in your environment. KEYS will need to iterate over a million keys to find the ones that match your pattern.]
As an alternative to using KEYS, Redis provides the hash object. Store everything about an object in a hash (e.g.: "posts:83") and then just delete the hash key. Everything under it will be removed as well. If you need to know what's in the hash before it gets deleted, use the HKEYS command (which carries no such performance warning).
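As a sketch of that pattern (the key and field names here are invented for illustration, and a tiny in-memory stand-in mimics the three Redis commands involved so the snippet runs without a live server; with a client library like redis-py, the equivalent calls would be `r.hset`, `r.hkeys`, and `r.delete`):

```python
# Group an object's cached fields in one Redis hash so invalidation
# is a single DEL -- no KEYS scan. FakeRedis is a minimal stand-in
# for HSET / HKEYS / DEL only.

class FakeRedis:
    def __init__(self):
        self.store = {}

    def hset(self, key, field, value):   # HSET key field value
        self.store.setdefault(key, {})[field] = value

    def hkeys(self, key):                # HKEYS key
        return list(self.store.get(key, {}))

    def delete(self, key):               # DEL key
        self.store.pop(key, None)

r = FakeRedis()

# Cache several pieces of post 83 under a single hash key...
r.hset("posts:83", "title", "Redis patterns")
r.hset("posts:83", "rendered_body", "<p>...</p>")
r.hset("posts:83", "comment_count", "12")

print(sorted(r.hkeys("posts:83")))
# ['comment_count', 'rendered_body', 'title']

# ...and invalidate all of it with one DEL.
r.delete("posts:83")
print(r.hkeys("posts:83"))  # []
```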
Right, I know what the docs say, however, it's fine to use this for cache invalidation (but maybe not other use-cases) in production.
The docs also say:
"While the time complexity for this operation is O(N), the constant times are fairly low. For example, Redis running on an entry level laptop can scan a 1 million key database in 40 milliseconds."
If you have so many keys that this is an issue for cache invalidation, you should be using memcached anyways. (Since it can be distributed, where in Redis distribution is left up to you to figure out)
I wasn't able to dig it up, but I know I read an article about some consulting group making a site for a major shoe company where they did exactly this.
Your hash method isn't the best way either though since it's more efficient to store everything in individual key-value pairs, and hashes cannot be nested.
Really the BEST way to do this in Redis is to use a set containing all the keys related to an object, then clear each of them out when destroying the item.
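For concreteness, that "index set" pattern looks something like this (key names are made up for the example, and a small in-memory stand-in covers the SET/SADD/SMEMBERS/DEL commands so the sketch runs standalone; with redis-py the calls would be `r.set`, `r.sadd`, `r.smembers`, `r.delete`):

```python
# Keep a Redis set of every cache key related to an object, then
# delete each member (plus the index set itself) on invalidation.

class FakeRedis:
    def __init__(self):
        self.kv = {}
        self.sets = {}

    def set(self, key, value):        # SET key value
        self.kv[key] = value

    def sadd(self, key, member):      # SADD key member
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):          # SMEMBERS key
        return set(self.sets.get(key, set()))

    def delete(self, key):            # DEL key
        self.kv.pop(key, None)
        self.sets.pop(key, None)

r = FakeRedis()

def cache(r, post_id, key, value):
    """Cache a value and record its key in the post's index set."""
    r.set(key, value)
    r.sadd("post:%d:keys" % post_id, key)

def invalidate(r, post_id):
    """Delete every cache key recorded for the post, then the index."""
    for key in r.smembers("post:%d:keys" % post_id):
        r.delete(key)
    r.delete("post:%d:keys" % post_id)

cache(r, 83, "post:83:html", "<article>...</article>")
cache(r, 83, "post:83:comments", "[...]")
invalidate(r, 83)
```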
Why is it more efficient to store everything in individual k-v pairs? Hash tables are certainly more memory efficient, and the difference in CPU efficiency is so slight as to be inconsequential.
I'm curious why you would use Redis instead of just using in-memory datastructures in your app servers? It's trivially easy to implement a leaderboard as a priority queue, for example. And it eliminates the need to run yet another server and deal with the associated RPC & command parsing overhead.
That's the approach Hacker News and Viaweb took, along with Mailinator and probably several other startups.
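For what it's worth, an in-process leaderboard of the sort described above really is only a few lines with stdlib structures (this is just a sketch of the general idea, not the actual HN or Viaweb code):

```python
# In-process leaderboard: a plain dict of scores plus heapq for
# top-N queries, no external server involved.
import heapq

scores = {}

def add_score(user, points):
    scores[user] = scores.get(user, 0) + points

def top(n):
    # nlargest is O(len(scores) * log n) -- fine for modest boards.
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])

add_score("alice", 50)
add_score("bob", 30)
add_score("alice", 10)
print(top(2))  # [('alice', 60), ('bob', 30)]
```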
> I'm curious why you would use Redis instead of just using in-memory datastructures in your app servers?
I make fairly pedestrian use of Redis, generally as either a persistent cache, shared memory, or schemaless DB shared by multiple Rails processes. In-memory structures have a lot to anti-recommend them in the Rails world: at any given time I have 4 server processes and 2 worker processes running, and each of them would need a separate copy of everything. There would be consistency problems. Those processes have a lifetime measured in days in the best of cases to minutes in the worst of cases: following a restart, any in-memory structures have to be rebuilt from the underlying data source. Hypothetically assuming demand for my products explodes and I can no longer deal with only a single physical server, Redis plays very well with being accessed from multiple servers, whereas I'd have to write some sort of REST API to reimplement Redis (poorly) on top of my actual people-pay-money-for-this application code to share that state among multiple physical servers, if I were to go down that route.
Redis has been an absolute dream to administrate: the total overhead for me was "apt-get install redis-server", adding three lines of configuration to Rails and tweaking two in Redis, and doing one SCP command when I migrated servers. The RPC/command parsing overhead is, empirically, negligible in my use cases. Don't take this advice if you're Google (I know you're Google, but for the general "you" here), but many people are not Google.
Ah, I was kinda assuming that there's already a separate appserver tier distinct from the webservers. If you don't have that, I can see how something like Redis might be a useful intermediate step so you don't have to go build one until there's actually a substantive need for it.
It's kinda like a complement to memcached then, right? Memcached gives you an off-the-shelf distributed hashtable that you can stick things in. Redis gives you an off-the-shelf list or heap server that you can stick things in. You might eventually want more control of the algorithms that you can run on these, but if it's not yet worth setting up a separate server, you can glue these components together and get a decent approximation.
> Ah, I was kinda assuming that there's already a separate appserver tier distinct from the webservers.
That's kinda an enterprise-y architecture choice in my experience. There are excellent reasons for it (much like Service Oriented Architecture), but I generally see folks evolve into it over time rather than starting from it, unless they come from an enterprise-y background where it's assumed from the beginning. In particular, Rails and some other opinionated frameworks start from the assumption that 99%+ of the business logic is going to get executed in the web tier, and while I'll bet you that some of the more famous Rails deployments eventually move away from that, Rails would fight you every step of the way if you were trying to do it in greenfield development.
Redis makes a great complement (or drop-in replacement depending on use case) for memcached. Relatedly, I love how these (and other OSS tools) let little guys play with big boy solutions without having to have big boy budgets or organizational resources to make use of them. I think Facebook probably has about 10 terabytes more memcached than I do, but it turns out that memcached is really freaking useful way down the scaling/complexity curve, too.
Redis and memcached aren't really that similar. Memcached is a key-value cache; it will evict items that haven't been used recently, and it has no persistence. Redis actually makes a serious effort to not silently lose data. (Unless you tell it to.)
> why you would use Redis instead of just using in-memory datastructures in your app servers?
If you mean actually storing the data (scores on a leaderboard) on the app servers, the problem is it can only scale so far. If data/state is stored on app servers you can't load balance across multiple servers. HN runs on one server, and it has been hitting scalability issues lately. It's also harder to do high-availability, if everything runs on one server there's nothing to fail-over to.
Wouldn't this apply to a Redis deployment as well? At the point where the app server would fall over, the Redis instance is probably getting just about saturated as well. (Well, perhaps a bit later because Redis is written in optimized C instead of Python/Java/Scala, but still no more than a constant factor away.) So you'd need multiple Redis instances. How is coherency between them handled? Does the client library automatically take care of synchronizing writes to multiple instances and failing over reads, or do you have to do all that yourself?
The difference is that relational data is usually disk-based, I/O bound, and requires persistence but not necessarily fast access. There're a bunch of algorithms that are specialized for the access patterns of disks (who wants to implement their own B-trees and transaction logs, other than Google?), so it makes sense to use an off-the-shelf solution for them.
Redis's main selling point is being an in memory datastore, which is great. But virtually every programming language has a rich selection of in-memory data structures in its standard library, along with the ability to write code and implement some more. What is it that Redis gives you over using these? Programmers are generally quite familiar with efficient algorithms for accessing memory - it tends to be taught in intro CS.
Do you know how Hacker News persists changes to its in-memory data structures? Does it snapshot every few seconds or journal every change? Does it keep comments in memory or swap them as needed?
The idea is that Redis coordinates concurrent reads and writes to the same structures for you, giving every process a shared view of the data without you having to implement that coordination yourself. Also, the structures may be large enough that you don't want a full copy in memory for each client.
I was kinda assuming that the system already has a separate appserver tier from the webserver tier (HN doesn't, but many other real-world apps would). Separating application logic from HTML formatting is usually a good idea, if only because HTML templating tends to be CPU-intensive but memory-light while application logic is often CPU-light but memory-intensive. That's an orthogonal issue, though - you can run as many appserver instances as are necessary for your dataset.
I guess I was wondering why, in your app server, you don't just add a big in-process heap and use the normal language mechanisms to access it wherever you'd return your leaderboard info?
The concurrency issue is interesting - how does Redis handle it? Does it have some sort of STM, or is it all because everything executes in a single thread in Redis? If it's the latter, you'd get that for free in a single-threaded appserver (although you probably don't want a single-threaded appserver).
I use redis a lot to store non critical data. For instance I store signup confirmation tokens in redis. The web app sends a message to RabbitMQ when a user signs up, then a background worker catches that message, creates an activation token, stores it in redis, and sends an email to the user. You can set an expiration time on keys too. It's convenient shared memory.
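The expiring-token flow described above could be sketched like so (the key naming is invented for the example, and a tiny stand-in tracks expiry with timestamps so the snippet runs without a server; with redis-py the real call would be `r.setex(key, ttl, value)`):

```python
# Store a signup confirmation token with a TTL; a cache miss later
# means the link expired (or never existed) -- no cleanup job needed.
import time
import uuid

class FakeRedis:
    def __init__(self):
        self.store = {}   # key -> (value, expires_at)

    def setex(self, key, ttl, value):     # SETEX key ttl value
        self.store[key] = (value, time.time() + ttl)

    def get(self, key):                   # GET key (with lazy expiry)
        item = self.store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() >= expires_at:
            del self.store[key]
            return None
        return value

r = FakeRedis()
token = uuid.uuid4().hex
r.setex("signup:confirm:%s" % token, 86400, "user:42")  # 24h TTL

# The confirmation endpoint just looks the token up.
print(r.get("signup:confirm:%s" % token))  # "user:42" while valid
```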
"It has some persistence support, but does not appear to be super durable. If you're thinking of using like that though, you're misunderstanding the tool."
This warrants a much more detailed explanation. The author should have spoken about possible durability options, like the append-only file, which, given the right configuration, makes Redis "fully-durable" at the inevitable sacrifice of some speed.
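For reference, the append-only file is controlled by a pair of redis.conf directives; `appendfsync always` fsyncs on every write at the cost of throughput, while `everysec` is the usual compromise:

```
appendonly yes        # enable the AOF
appendfsync everysec  # fsync once per second (alternatives: always / no)
```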
We also want to work both on communication (most users don't understand that Redis with both AOF and RDB enabled is already very durable, and this is the setup we suggest) and on the implementation, to make sure that Redis AOF can be a very durable solution, as durable as the best SQL databases out there.
If I was greenfielding a new project tomorrow, I'd use a Redis/MySQL combo. MySQL for all the data in perfect first-normal form, and then Redis for storing difficult joins, caches, queues, etc. It would be a perfect marriage.
For some reason, people seem to think it's a key-value store, or some persistent database, but that's totally not it at all.
From what I understand, it is actually a key-value store and is basically a superset of memcached. Therefore, if my understanding is true, you could use it merely as a key-value store as well and use its other features (native sets, lists, etc., pub-sub, and persistence) as needed.
I use Redis as a distributed locking mechanism too, especially with its setex feature which can help reduce deadlocks. We have a UUID string as the value for the key, and the only way to release the lock is that the UUID must be passed and matched against the value in Redis.
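A sketch of that UUID-fenced lock release (the lock name and helpers here are made up for illustration; in Redis the acquisition would be `SET key uuid NX EX ttl`, and the compare-and-delete on release should really be one atomic step, e.g. a Lua EVAL script, rather than the separate GET/DEL shown in this stand-in):

```python
# Only the holder's UUID token can release the lock, so a stale
# client whose lock already expired and was re-acquired by someone
# else cannot delete it out from under them.
import uuid

class FakeRedis:
    def __init__(self):
        self.store = {}

    def set_nx(self, key, value):     # SET key value NX
        if key in self.store:
            return False
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key)

    def delete(self, key):
        self.store.pop(key, None)

r = FakeRedis()

def acquire(r, name):
    token = uuid.uuid4().hex
    return token if r.set_nx("lock:%s" % name, token) else None

def release(r, name, token):
    # Compare-and-delete: note this should be atomic in real Redis.
    if r.get("lock:%s" % name) == token:
        r.delete("lock:%s" % name)
        return True
    return False

t = acquire(r, "reindex")
print(release(r, "reindex", "wrong-token"))  # False
print(release(r, "reindex", t))              # True
```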
A valid article, mostly. And the point still stands, and I feel that most people using Redis don't get it: it is a data structure server. It's not that hard to understand...
In order to argue that redis isn't a database you have to first define what a database is and explain why redis doesn't fit that definition. All the author did was assert that redis isn't a database and give 3 use cases.
This is just bad writing, to be blunt.
And for what it's worth antirez wrote an HN clone using redis as the database.
[1] http://antirez.com/post/take-advantage-of-redis-adding-it-to...