You have an old user table with, say, a login and a user name.
In MongoDB this is a JSON document: {login: 'user', name: 'User Name'}
You want to add 'shoe size'. So:
1 - You add the shoe size field to the signup/user-editing form
2 - The next user created looks like this: {login: 'user', name: 'User Name', shoe_size: 7}
3 - Old users DON'T get a shoe_size added to their documents automatically, but the next time they log in they get asked "What's your shoe size?". It's dynamic: if (!user.shoe_size), done.
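The lazy backfill in step 3 can be sketched in app code. This is a minimal, hypothetical Python version (the `user` dict stands in for the MongoDB document, and `ask` for whatever form prompts the user):

```python
def ensure_shoe_size(user, ask):
    """On login: if the document predates the shoe_size field,
    ask the user once and store the answer."""
    if "shoe_size" not in user:       # old document, field never written
        user["shoe_size"] = ask()     # fill in on first login after deploy
    return user

# An old document, created before the field existed:
old_user = {"login": "user", "name": "User Name"}
ensure_shoe_size(old_user, ask=lambda: 7)
```

Users who already have the field are never re-prompted, and no migration ever runs over the collection.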
You add new fields on demand, and you almost never remove them (you don't need any extra fields for migration bookkeeping — only the ones you actually want to add).
You (almost) never have to "run the migration" that churns away for a while adding shoe_size to existing documents.
And it absolutely does make a huge difference, since you don't have to deal with the mess of relational DB migrations.
I'll never develop a system using a relational DB again if I can avoid it.
How is that different from adding a nullable shoe_size column to the user table in your relational DB, plus an if (user.shoe_size != null) check in your application?
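The relational version of the same trick can be sketched with Python's stdlib sqlite3 (standing in here for Postgres; table and column names are made up). Adding a column with no default touches no existing rows — old rows simply read NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, name TEXT)")
conn.execute("INSERT INTO users VALUES ('user', 'User Name')")

# No default, no rewrite: existing rows are untouched and read NULL.
conn.execute("ALTER TABLE users ADD COLUMN shoe_size INTEGER")

row = conn.execute("SELECT login, shoe_size FROM users").fetchone()
if row[1] is None:
    pass  # application-side check, same as Mongo's `if (!user.shoe_size)`
```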
An incremental approach can make all the difference between zero downtime and hours of it. Don't underestimate the importance of this extra agility in the modern world of almost-daily updates to web apps and backends.
It's the same concept, but it's very different in practice. For example, adding a field with a default value in postgres will lock the table during the migration, which may be killer. If you use postgres for big data sets, what you'll end up implementing for your migrations looks a lot like what schemaless gives you for free.
If you add a default value it locks the table and re-writes it.
However, if you don't add a default value postgres will perform the operation instantly. Old rows will be untouched, and new values will be added with the shoe_size of either NULL or whatever you set it to be... IE exactly the same outcome and performance as the MongoDB case mentioned above.
Adding a field in postgres while setting a default value would be exactly the same as adding a field in MongoDB and updating every existing row with that default value (except for the fact that postgres will only lock that one table, while MongoDB will lock your entire database).
Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational databases do just fine.
Thank you very much for this quick tip. I've unfortunately only used MySQL at scale, and it most definitely grinds away for hours when ALTERing, default value or not.
It looks like MySQL might be giving everyone in the RDBMS world a bad rap.
> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.
No, it can't. Sorry, it absolutely can't.
> If you add a default value it locks the table and re-writes it.
> However, if you don't add a default value postgres will perform the operation instantly.
And in MongoDB you don't have to change anything in the DB (hence, no downtime). It's all in your code.
Yes, I can add a field in PGSQL, and then my ORM quits on me (because I can't have an optional field, of course). Or I can use hand-written SQL, at which point it adds a couple of months to my schedule.
Or I can use migrations + ORM like Ruby migrations or South (Django). They will blow up for anything slightly more complex.
I also can't use the old version of the DB, so my old backups just became more troublesome to recover.
And that's why Facebook and others don't use relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob.
> I also can't use the old version of the DB, so my old backups just became more troublesome to recover.
Tangential to the topic, but this is important: every schema update MUST be considered part of the app / database / backend / system source code, and as such it should be properly and formally applied, versioned and backed up. In the simplest case, you can store a version number in the database itself, every update applies to a certain version # X and advances to another version # Y, and every update is stored in its own source file (updateX.sql for example).
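A minimal version of that roll-your-own scheme might look like this (a sketch with sqlite3; in practice each step would live in its own versioned file, e.g. update0.sql, update1.sql — all names here are hypothetical):

```python
import sqlite3

# Each entry migrates the schema from version N to N+1.
MIGRATIONS = [
    "CREATE TABLE users (login TEXT, name TEXT)",        # 0 -> 1
    "ALTER TABLE users ADD COLUMN shoe_size INTEGER",    # 1 -> 2
]

def migrate(conn):
    """Apply any migrations not yet recorded; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    version = row[0] if row else 0
    for step in MIGRATIONS[version:]:
        conn.execute(step)
        version += 1
    conn.execute("DELETE FROM schema_version")
    conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()
    return version

conn = sqlite3.connect(":memory:")
migrate(conn)   # applies both steps
```

Because the version number lives in the database itself, running the tool twice is harmless, and a restored old backup announces exactly which updates it still needs.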
Would be nice if DBMSs offered primitives specifically for schema version management, but I guess there could be many different use cases, and it's very easy to roll your own.
> And that's why Facebook and others don't use relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob
Is that why? I was under the impression that Facebook doesn't use the relational properties of MySQL because they couldn't get the performance out of it at their scale, a problem a lot of advocates of the non-relational model don't seem to have, in my experience.
> Adding a field in postgres while setting a default value would be exactly the same as adding a field
Not true. Postgres locks the entire table for the duration of the update (which could be hours or days for a large enough dataset). Mongo will lock your entire database for tiny fractions of a second, lots and lots of times. The difference is huge: the Postgres implementation takes down your site, the Mongo implementation doesn't.
At scale, people start using Postgres a lot like a NoSQL service. Check out the Reddit schema, for example, or read IMVU's blog posts on the topic (with MySQL, but the same point), or Facebook's architecture. All those migration strategies look a lot more like Mongo than Postgres, even though they all use SQL DBs.
> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.
Saying "I can do NoSQL migrations just fine in Postgres" is like saying "I can do OO just fine in assembly".
According to the Postgres documentation, the behavior the poster a few levels up described is possible in Postgres with no downtime.
> When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified).
> Adding a column with a non-null default or changing the type of an existing column will require the entire table to be rewritten. This might take a significant amount of time for a large table; and it will temporarily require double the disk space.
As other posts have already mentioned, adding a new column to a table in postgres does so instantly unless you set a default (telling postgres that you want it to update all the old records too).
If you add the default separately from adding the column, you get the default but don't rewrite all the old rows (which remain NULL).
(1) Add the column, and add code that migrates the old data when you access it. Deploy.
(2) Let it run for a while. Run a background job that migrates the rest (this might be done days or months later).
(3) Remove the column and the custom code.
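Step (2)'s background job can be sketched as a batched backfill (again with sqlite3 standing in for the real store; batching keeps each transaction, and therefore each lock, short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, shoe_size INTEGER)")
conn.executemany("INSERT INTO users (id) VALUES (?)",
                 [(i,) for i in range(10)])

def backfill(conn, default=7, batch=3):
    """Migrate remaining rows a few at a time, so no single
    transaction holds locks for long. Returns rows updated."""
    total = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE shoe_size IS NULL LIMIT ?", (batch,))]
        if not ids:
            break
        conn.executemany("UPDATE users SET shoe_size = ? WHERE id = ?",
                         [(default, i) for i in ids])
        conn.commit()   # each batch is its own short transaction
        total += len(ids)
    return total

backfill(conn)
```

Since the on-access code from step (1) is still deployed, the job can run at low priority and be paused or resumed at any point.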
The more I hear about "schemaless" the more I realize that it doesn't make any difference at all.