You have an old user table with, say, a login and a user name.
In MongoDB this is a JSON document: {login: 'user', name: 'User Name'}
You want to add 'shoe size'. So:
1 - You add the shoe size field to the signup/user-editing form
2 - The next user created looks like this: {login: 'user', name: 'User Name', shoe_size: 7}
3 - Old users DON'T get a shoe_size added to their documents automatically, but the next time they log in they get asked "What's your shoe size?". It's dynamic: if (!user.shoe_size), done.
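The lazy backfill in step 3 can be sketched in app code. This is a minimal, hypothetical Python version (the `user` dict stands in for the MongoDB document, and `ask` for whatever form prompts the user):

```python
def ensure_shoe_size(user, ask):
    """On login: if the document predates the shoe_size field,
    ask the user once and store the answer."""
    if "shoe_size" not in user:       # old document, field never written
        user["shoe_size"] = ask()     # fill in on first login after deploy
    return user

# An old document, created before the field existed:
old_user = {"login": "user", "name": "User Name"}
ensure_shoe_size(old_user, ask=lambda: 7)
```

Users who already have the field are never re-prompted, and no migration ever runs over the collection.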
You add new fields on demand, and you almost never remove them (you don't need any extra fields for migration bookkeeping — only the ones you actually want to add).
You (almost) never have to "run the migration" that churns away for a while adding shoe_size to existing documents.
And it absolutely does make a huge difference, since you don't have to deal with the mess of relational DB migrations.
I'll never develop a system using a relational DB again if I can avoid it.
How is that different from adding a nullable shoe_size column to the user table in your relational DB, plus an if (user.shoe_size != null) check in your application?
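The relational version of the same trick can be sketched with Python's stdlib sqlite3 (standing in here for Postgres; table and column names are made up). Adding a column with no default touches no existing rows — old rows simply read NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, name TEXT)")
conn.execute("INSERT INTO users VALUES ('user', 'User Name')")

# No default, no rewrite: existing rows are untouched and read NULL.
conn.execute("ALTER TABLE users ADD COLUMN shoe_size INTEGER")

row = conn.execute("SELECT login, shoe_size FROM users").fetchone()
if row[1] is None:
    pass  # application-side check, same as Mongo's `if (!user.shoe_size)`
```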
An incremental approach can make all the difference between zero downtime and hours of it. Don't underestimate the importance of this extra agility in the modern world of almost-daily updates to web apps and backends.
It's the same concept, but it's very different in practice. For example, adding a field with a default value in postgres will lock the table during the migration, which may be killer. If you use postgres for big data sets, what you'll end up implementing for your migrations looks a lot like what schemaless gives you for free.
If you add a default value it locks the table and re-writes it.
However, if you don't add a default value postgres will perform the operation instantly. Old rows will be untouched, and new values will be added with the shoe_size of either NULL or whatever you set it to be... IE exactly the same outcome and performance as the MongoDB case mentioned above.
Adding a field in postgres while setting a default value would be exactly the same as adding a field in MongoDB and updating every existing row with that default value (except for the fact that postgres will only lock that one table, while MongoDB will lock your entire database).
Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational databases do just fine.
Thank you very much for this quick tip. I've unfortunately only used MySQL at scale, and it most definitely grinds away for hours when ALTERing, default value or not.
It looks like MySQL might be giving everyone in the RDBMS world a bad rap.
> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.
No, it can't. Sorry, it absolutely can't.
> If you add a default value it locks the table and re-writes it.
> However, if you don't add a default value postgres will perform the operation instantly.
And in MongoDB you don't have to change anything in the DB (hence, no downtime). It's all in your code.
Yes, I can add a field in PGSQL, and then my ORM quits on me (because I can't have an optional field, of course). Or I can use hand-written SQL, at which point it adds a couple of months to my schedule.
Or I can use migrations + ORM like Ruby migrations or South (Django). They will blow up for anything slightly more complex.
I also can't use the old version of the DB, so my old backups just became more troublesome to recover.
And that's why Facebook and others don't use relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob.
> I also can't use the old version of the DB, so my old backups just became more troublesome to recover.
Tangential to the topic, but this is important: every schema update MUST be considered part of the app / database / backend / system source code, and as such it should be properly and formally applied, versioned and backed up. In the simplest case, you can store a version number in the database itself, every update applies to a certain version # X and advances to another version # Y, and every update is stored in its own source file (updateX.sql for example).
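A minimal version of that roll-your-own scheme might look like this (a sketch with sqlite3; in practice each step would live in its own versioned file, e.g. update0.sql, update1.sql — all names here are hypothetical):

```python
import sqlite3

# Each entry migrates the schema from version N to N+1.
MIGRATIONS = [
    "CREATE TABLE users (login TEXT, name TEXT)",        # 0 -> 1
    "ALTER TABLE users ADD COLUMN shoe_size INTEGER",    # 1 -> 2
]

def migrate(conn):
    """Apply any migrations not yet recorded; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    version = row[0] if row else 0
    for step in MIGRATIONS[version:]:
        conn.execute(step)
        version += 1
    conn.execute("DELETE FROM schema_version")
    conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()
    return version

conn = sqlite3.connect(":memory:")
migrate(conn)   # applies both steps
```

Because the version number lives in the database itself, running the tool twice is harmless, and a restored old backup announces exactly which updates it still needs.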
Would be nice if DBMSs offered primitives specifically for schema version management, but I guess there could be many different use cases, and it's very easy to roll your own.
> And that's why Facebook and others don't use relational anymore. Sure, you can use MySQL or PG, but the real data is stored in a text or binary blob
Is that why? I was under the impression that Facebook doesn't use the relational properties of MySQL because they couldn't get the performance out of it at their scale, a problem a lot of advocates of the non-relational model don't seem to have, in my experience.
> Adding a field in postgres while setting a default value would be exactly the same as adding a field
Not true. Postgres locks the entire table for the duration of the update (which could be hours or days for a large enough dataset). Mongo will lock your entire database for tiny fractions of a second, lots and lots of times. The difference is huge: the Postgres implementation takes down your site, the Mongo implementation doesn't.
At scale, people start using Postgres a lot like a NoSQL service. Check out the Reddit schema, for example, or read IMVU's blog posts on the topic (with MySQL, but the same point), or Facebook's architecture. All those migration strategies look a lot more like Mongo than Postgres, even though they all use SQL DBs.
> Now I'm not anti-MongoDB, I'm just saying you shouldn't give it credit for something that relational database do just fine.
Saying "I can do NoSQL migrations just fine in Postgres" is like saying "I can do OO just fine in assembly".
According to the Postgres documentation, the behavior the poster a few levels up described is possible in Postgres with no downtime.
> When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified).
> Adding a column with a non-null default or changing the type of an existing column will require the entire table to be rewritten. This might take a significant amount of time for a large table; and it will temporarily require double the disk space.
As other posts have already mentioned, adding a new column to a table in postgres does so instantly unless you set a default (telling postgres that you want it to update all the old records too).
If you add the default separately from adding the column, you get the default but don't rewrite all the old rows (which remain NULL).
(1) Add the column, and add code that migrates the old data when you access it. Deploy.
(2) Let it run for a while. Run a background job that migrates the rest (this might be done days or months later).
(3) Remove the column and the custom code.
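Step (2)'s background job can be sketched as a batched backfill (again with sqlite3 standing in for the real store; batching keeps each transaction, and therefore each lock, short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, shoe_size INTEGER)")
conn.executemany("INSERT INTO users (id) VALUES (?)",
                 [(i,) for i in range(10)])

def backfill(conn, default=7, batch=3):
    """Migrate remaining rows a few at a time, so no single
    transaction holds locks for long. Returns rows updated."""
    total = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE shoe_size IS NULL LIMIT ?", (batch,))]
        if not ids:
            break
        conn.executemany("UPDATE users SET shoe_size = ? WHERE id = ?",
                         [(default, i) for i in ids])
        conn.commit()   # each batch is its own short transaction
        total += len(ids)
    return total

backfill(conn)
```

Since the on-access code from step (1) is still deployed, the job can run at low priority and be paused or resumed at any point.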
The more I hear about "schemaless" the more I realize that it doesn't make any difference at all.