> The problem with standard-ish SQL+RDBMS and their schemas is that types are not well enforced in SQL or in most libraries and language integrations, so we end up with a quagmire of static schemas that cannot be reliably type checked with standard tooling.
This used to be an issue with MySQL (and not really with any other major RDBMS implementation), but with strict mode enabled there, and with the default settings on other implementations, SQL is fantastic at representing and enforcing static types. Furthermore, ORMs and tools like Apache Spark have greatly improved the integration between types in the database and types in programming languages. Practically speaking, an RDBMS is really the only way you can have sane static typing in most applications (especially ones built on dynamic languages).
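As a concrete illustration of the strict-mode point, here is a minimal sketch (table and column names are made up for the example). With MySQL's `sql_mode` including `STRICT_TRANS_TABLES`, which is the default since 5.7, out-of-range values produce an error instead of being silently coerced:

```sql
-- MySQL: strict mode rejects bad data instead of mangling it
SET sql_mode = 'STRICT_TRANS_TABLES';

CREATE TABLE users (
  id  INT PRIMARY KEY,
  age TINYINT UNSIGNED NOT NULL
);

-- With strict mode OFF, this would silently clamp age to 255.
-- With strict mode ON, it fails with an out-of-range error:
INSERT INTO users (id, age) VALUES (1, 1000);
```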
> Also lacking in many DB systems is integrated support for tables with heterogeneous schemas, backed by page/row-level versioning and/or online schema alteration. Having to rewrite a huge table just because you want to narrow a couple of fields for future data gets old quickly.
This is just not true. All of the major commercial RDBMS systems support both online schema alteration and flexible/semi-structured types (JSON, XML). Again, MySQL and Postgres are behind the curve here (although both now support JSON). Commercial systems like SQL Server, Oracle, and MemSQL have both.
I'm not sure I read your comment properly, but you said SQL Server has both? Both JSON and XML?
Yes, SQL Server has JSON, but it's not real JSON support; it's an nvarchar column with some JSON functions. Its XML support is great, but the SQL Server team refuses to support JSON properly because "we invested in XML and no one uses it".
PostgreSQL, on the other hand, has had a real JSON type for a long time, with query and index support, as well as XML. In this regard PostgreSQL is actually ahead of all other RDBMSs...
PostgreSQL may be behind in some other areas but not here. PostgreSQL has some of the best support for online schema changes (transactional DDL for almost everything, lots of work put into reducing lock level of schema changes) and was the first to implement support for JSON.
When doing the rewrite, I was extremely surprised by how good PostgreSQL's JSON support is; there are a large number of functions for querying nested structures, and the indexing is very powerful. I mapped my data from RethinkDB documents to mostly relational data, but kept a couple of columns as JSONB for now, which saved time and massively simplified the rewrite. I remember trying to fully model similar data back in 2007, and it got overly complicated for no real benefit...
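To give a flavor of what the comments above mean by querying nested structures and powerful indexing, here is a hedged sketch in PostgreSQL (the `events` table and `payload` fields are hypothetical, not from the original rewrite). A GIN index on a JSONB column accelerates containment queries, and the arrow operators drill into nested documents:

```sql
-- PostgreSQL: a JSONB column with a GIN index for containment queries
CREATE TABLE events (
  id      bigserial PRIMARY KEY,
  payload jsonb NOT NULL
);

CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Drill into a nested field with -> / ->>, and let the GIN index
-- serve the @> (containment) predicate:
SELECT payload -> 'user' ->> 'name'
FROM events
WHERE payload @> '{"type": "signup"}';
```

This hybrid approach, relational columns for the stable parts of the schema plus a JSONB column for the irregular parts, is exactly the time-saver described above.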
Please try altering the schema of a live multibillion-row table before claiming that it works. Unless they have changed how it is implemented recently, such an operation can take hours.
I love Postgres for many reasons, so I would like to be proven wrong.
Some schema changes can take prohibitively long on large tables, so you do have to be careful. But there are ways of achieving the same results that are safe for a production database. Braintree did a decent writeup on it: https://www.braintreepayments.com/blog/safe-operations-for-h...
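The general pattern from writeups like that one can be sketched as follows (table and column names are illustrative; the exact behavior depends on your PostgreSQL version, e.g. adding a column with a default stopped rewriting the table in PG 11):

```sql
-- Risky on a huge table (may rewrite it under an exclusive lock
-- on older PostgreSQL versions):
--   ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

-- Safer, production-friendly sequence:

-- 1. Add the column nullable and without a default (metadata-only, near-instant):
ALTER TABLE orders ADD COLUMN status text;

-- 2. Backfill in small batches to keep lock times and WAL volume down:
UPDATE orders SET status = 'new'
WHERE id BETWEEN 1 AND 10000 AND status IS NULL;
-- ...repeat for the remaining id ranges, ideally in separate transactions...

-- 3. Only then tighten the constraint (this still scans the table,
-- so schedule it accordingly):
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```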
I have never worked with a database of that size, but I have done major refactoring of tables with about half a billion rows. And while it can take hours for the schema changes to finish, it is hours without any downtime (or just a couple of seconds of downtime when the exclusive locks need to be acquired). It is tedious to have to wait that long (especially with ten times the rows we had), but PostgreSQL can do almost any change online just fine.
What kind of schema changes do you have in mind that would require that much downtime? Even complex cases can be handled with triggers or even rules.
Depends on the schema change. If you have a multibillion-row table, there are plenty of other things you need to pay close attention to as well, as you're probably all too aware.
The bigger the database, the more complex it will be to implement a schema change while keeping track of the transactions that happen while you are transitioning. However, I see no reason why it shouldn't be feasible given a proper amount of preparation. And if short downtime is the goal, I don't see why a proper amount of time spent developing a solution wouldn't achieve it either.
And on top of that, the PostgreSQL community implements new ways to solve this problem with each release. I'm not in a huge live-data use case, so I might be wrong, but the upcoming logical replication looks promising for such scenarios: https://www.depesz.com/2017/02/07/waiting-for-postgresql-10-...
> online schema alteration AND flexible/semi-structured types (JSON, XML).
> Again, MySQL and Postgres are behind the curve on this
You may want to remove PostgreSQL from that list, especially in 2017.
PostgreSQL has supported XML for years (long before it supported JSON). And online schema changes are not only possible, but even transaction-safe, which is especially nice for migrations.
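A minimal sketch of what transaction-safe DDL buys you (table and index names are made up for the example): in PostgreSQL, DDL statements participate in transactions, so a migration that fails halfway rolls back cleanly instead of leaving a half-applied schema.

```sql
-- PostgreSQL: DDL inside a transaction
BEGIN;

ALTER TABLE accounts ADD COLUMN deleted_at timestamptz;
CREATE INDEX accounts_deleted_at_idx ON accounts (deleted_at);

-- If anything above fails, the whole migration is rolled back
-- and the schema is untouched:
COMMIT;
```

In MySQL, by contrast, most DDL statements cause an implicit commit, so a failed multi-step migration can leave the schema in an intermediate state.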