>Everything is an order of magnitude more efficient using PostgreSQL than it was with RethinkDB.
A large part of the sales pitch of "NoSQL" was that traditional RDBMSs couldn't handle "webscale" loads, whatever that meant.
Yet somehow, we continue to see PostgreSQL beating Mongo, Rethink, and other trendy "NoSQL" upstarts at performance, one of the primary advantages they're supposed to have over it.
Let's be frank. The only reason "NoSQL" exists at all is 20-something hipster programmers being too lazy to learn SQL (let alone relational theory), and ageism--not just against older programmers, but against older technology itself, no matter how fast, powerful, stable, and well-tested it may be.
After all, PostgreSQL is "old," having its roots in the Berkeley Ingres project three decades ago. Clearly, something hacked together by a cadre of OSX-using, JSON-slinging hipster programmers MUST be better, right? Never mind that "NoSQL" itself is technically even older, with "NoSQL" systems like IBM's IMS dating back to the 1960s: https://en.wikipedia.org/wiki/IBM_Information_Management_Sys...
"NoSQL" has it's use case just like traditional DBs do.
The problem is that most "NoSQL" provider try to sell you their DB for everything when a good old SQL DB might be better.
For example, I worked on a IoT project for an R&D company and Riak was definitely a better choice than a SQL database for our usecase : Easily distributed, handle large very large amount of small write and small amount of large read. It was also a necessity because we could generate several gigabyte per minutes and being able to add Riak node to add more storage space and bandwidth dynamically was very useful.
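Not from the comment above, just an illustrative sketch: a common way to handle that write pattern -- many small writes, few large reads -- is to buffer readings into time-bucketed keys and flush each bucket as one large value. Here is a toy in-memory stand-in for a key-value store like Riak, with all names invented:

```python
import json
from collections import defaultdict

# Hypothetical in-memory stand-in for a key-value store like Riak: many
# small sensor readings are buffered per (device, minute) bucket and then
# flushed as one large object -- many small writes, few large reads.
store = {}
buffers = defaultdict(list)

def record(device_id, minute, reading):
    """Buffer one small reading under a time-bucketed key."""
    buffers[(device_id, minute)].append(reading)

def flush(device_id, minute):
    """Write the whole bucket out as a single large value."""
    store[f"{device_id}/{minute}"] = json.dumps(buffers.pop((device_id, minute)))

for i in range(1000):
    record("sensor-1", "2016-01-01T12:00", {"t": i, "temp": 20.0 + i % 5})
flush("sensor-1", "2016-01-01T12:00")

print(len(store))  # 1 -- one large object holding 1000 readings
```

The same key scheme also makes it obvious which node owns which bucket once the keyspace is distributed.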
But yeah, SQL DBs are like a good toolbox: they can be used in most situations and will work fine. "NoSQL" DBs are usually more of a dedicated tool for a specific use: better for that specific case, but they might not do well at other tasks.
Thanks... I came here to say this. I actually said something similar yesterday and my comment got downvoted (not that I care), but to me the conclusion is: check your requirements and use the proper tool.
You might have a point about Postgres being faster than whatever NoSQL database, but there's really no need for all the name calling. Especially the "lazy" is very uncalled for in my opinion - there's just too much to learn, and too little time. If you're a 20-something (or whatever age, for that matter) and have to maintain the entire system of a single company, you're simply not going to be able to be a master of every part of the stack, or even know where you're making the wrong choices.
>Especially the "lazy" is very uncalled for in my opinion - there's just too much to learn, and too little time.
How is that a valid justification for reinventing the wheel, and poorly at that? In the end they're having to come around and learn SQL anyway, judging by all the Mongo hate and "why we switched to Postgres" posts I see here, so they gained nothing by avoiding it.
>or even know where you're making the wrong choices.
They could try actually listening to older developers for once and trusting established solutions like PostgreSQL, instead of chasing after everything new and shiny that catches their eye.
> How is that a valid justification for reinventing the wheel, and poorly at that?
I'm not talking about the people reinventing the wheel, I'm talking about the people using the reinvented wheels and not realising that it's a reinvented wheel.
> try actually listening to older developers
This is harder than you make it sound. On the internet, it's hard to know who to listen to, and in real life, it can be hard to find older developers working on something relevant to you; even then, their skill levels differ. And then they still have to actually want to hold your hand while you're setting up your first database, making sure you don't botch the performance by skipping things that come naturally to someone who's been working with PostgreSQL for years and years.
If there's too much to learn and little time, you learn the stuff that computer science got right. Some abstractions are so good that you should prioritize learning them above any technology of the day:
- Relational algebra
- The network stack model (OSI or TCP/IP; which one is irrelevant)
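For what it's worth, the core of relational algebra is small enough to sketch directly. Here is a toy Python version (all names invented for illustration) of selection, projection, and natural join over relations represented as lists of dicts:

```python
# Toy relational-algebra operators over relations represented as lists of
# dicts (assumes non-empty relations; attribute names are invented here).
def select(relation, predicate):
    return [row for row in relation if predicate(row)]

def project(relation, *attrs):
    return [{a: row[a] for a in attrs} for row in relation]

def natural_join(r, s):
    common = set(r[0]) & set(s[0])  # shared attribute names
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
orders = [{"id": 1, "item": "disk"}, {"id": 1, "item": "ram"}]

joined = natural_join(users, orders)
print(project(select(joined, lambda row: row["name"] == "ada"), "item"))
# [{'item': 'disk'}, {'item': 'ram'}]
```

A SQL query like `SELECT item FROM users JOIN orders USING (id) WHERE name = 'ada'` is exactly this composition of operators, which is the point of learning the algebra first.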
so if there isn't even enough time to learn practical but sub-optimal solutions, a dev should instead absorb books about IT, mathematics and programming theory?
great advice...unless any part of your equation includes receiving a paycheck.
Don't make it sound harder than it is. Each of these is 30-40 hours of work for undergraduate students. You don't have to quit your job and join a monastery to study compilers. It's interesting, challenging and fun for those of us into computing; and it pays back the time spent learning.
So now I have a list of things I should learn according to a random person on the internet, that no one in real-life ever gave me before. How am I to know that this list is the one? Why not every similar list that has ever been given to me?
And how am I going to develop the front-end of our new website after learning everything you named? There'll still be a lot left to learn before I can even hope to have produced something.
> The only reason "NoSQL" exists at all is 20-something hipster programmers being too lazy to learn SQL
As a 20-something SWE who claims to be somewhat proficient in both SQL and NoSQL, I find it way more difficult to get the NoSQL data model right. With traditional SQL, you can basically query whatever you want from a model shaped however it happens to be. But if I have, e.g., a Cassandra model and a new requirement pops up, I cannot just do some joining and satisfy it.
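To illustrate that point with a simplified sketch (not real CQL; table and field names invented): wide-column stores like Cassandra push you toward one denormalized table per query, so every write goes to every table that serves a read pattern, and a new requirement means a new table plus a backfill rather than a join:

```python
# Simplified model of query-first design: the same event is written twice,
# once per access pattern, because there is no JOIN to fall back on.
messages_by_user = {}     # partition key: user -> messages
messages_by_channel = {}  # partition key: channel -> messages

def write_message(user, channel, text):
    messages_by_user.setdefault(user, []).append((channel, text))
    messages_by_channel.setdefault(channel, []).append((user, text))

write_message("alice", "db", "postgres is fine")
write_message("bob", "db", "riak though")

# "All messages in channel 'db'" is cheap only because we planned for it:
print(messages_by_channel["db"])
# A brand-new query ("all messages by alice in any channel mentioning X")
# needs a third table and a backfill over existing data -- that's the
# new-requirement pain described above.
```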
While there is some truth in this, I think you are missing the context.
NoSQL first became popular in 2008/09, when web traffic was exploding. In those days 500GB disks were the largest available and were very expensive in server form. CPUs were less powerful, and RAM was much smaller.
All this meant that many sites really were running into the limits of a single-server database. The most common solution back then was master-slave MySQL replication, which had a whole set of problems of its own. Don't forget Postgres replication was pretty rudimentary, and back then MySQL really had a performance edge if you could compromise on things like transactional integrity(!)--which many did.
Things like Hadoop solved the reliability issues with distributed MySQL, MongoDB (and CouchDB) tackled the horrible developer ergonomics and Redis tackled performance.
In that context they all made a lot of sense.
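For reference, the programming model Hadoop popularized was map-reduce; the classic word-count example, as a single-machine Python sketch (the point being that the map and reduce phases parallelize trivially across nodes):

```python
from collections import defaultdict
from itertools import chain

# Word count in explicit map / shuffle / reduce phases. On one machine
# this is overkill; in a cluster, map and reduce fan out across nodes.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["postgres beats mongo", "mongo beats expectations"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, docs))))
print(counts["beats"])  # 2
```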
Now of course buying two big servers with a heap of RAM and storage and putting Postgres on them with replication is pretty easy.
But RAID had its own set of problems. Because RAID relies on multiple physical disks (often SAS, for performance), there were serious limits on the amount of storage you could fit in a chassis. This was prior to SSDs being widespread, of course.
Here's a review of a typical 1U server from 2008:
> The X4150 can handle up to eight 2.5-inch SAS drives (mine has four 10K drives, 72GB each), 64GB of RAM (mine has 16GB), four Gigabit Ethernet interfaces, and three PCIe slots. This puts the X4150 ahead of the mainstream server pack, as the main contenders in this space generally offer a maximum of 32GB of RAM, and between four and six local disks.[1]
Assuming RAID, this had 144GB of storage, and even back then that was problematic. RAM was very small too, so keeping your DB in memory (or even a working set, or even the indexes) was difficult with a single server.
And of course, yes you could go to 2U or bigger. But back then server space was expensive.
Everything is a trade off of course. But my point is that in the context of 2008 hardware looking to do things differently made a lot of sense.
>Now of course buying two big servers with a heap of RAM and storage and putting Postgres on them with replication is pretty easy.
Is it really? I really have no idea, since I haven't played with PG in a couple of years, but setting up clustering used to be a PITA with it, so if it got better, that sounds great. One of the impressive things I saw from RethinkDB is that you can set up sharding and replication in a few steps.
Oh, I don't disagree at all, btw. I've been doing MS SQL Server and frontend work professionally recently, so I haven't been in the loop on OSS DBs; glad to hear things improved.
I think the NoSQL popularity and appeal was largely due to the fit with the startup/rapid-iteration mentality. The lack of rigid schema promised faster product iteration with less friction compared to traditional RDBMS with strict schemas and typing.
Oh indeed, most RDBMSs offer great flexibility. They just tend to require a lot of configuration work. My impression of the NoSQL wave was "we don't know and we don't care; just throw it all in there and let search sort it out."
In fact, that mentality originated with Google (and it's not an entirely bad way of doing things). Managing, categorizing, classifying, etc. in non-trivial cases may be an effort of relative futility, constantly chasing a moving target.
A simple key/value or document store just lets you start storing before you may actually know what you're storing or what the patterns and structures are (or even what you might want to do with the data later). With RDBMS, your designs can change quite a lot based on these requirements or intended uses.
> Yet somehow, we continue to see PostgreSQL beating Mongo, Rethink, and other trendy "NoSQL" upstarts at performance, one of the primary advantages they're supposed to have over it.
Yeah, pg will beat the pants off of pretty much any NOSQL DB on a single node, but what happens when you need master-master replication of sharded data in each datacenter as well as between datacenters?
Sure, denormalize and enforce data integrity at the application layer, shard the data, use queues for data updates to ensure every data-center gets all the data (eventually), and so on.
Now you're essentially treating the RDBMS as an eventually consistent NOSQL data store, horizontal scaling within a DC is still going to be annoying once you can't scale the shards vertically anymore, interruptions of network connectivity (or just increased latency) between DCs create huge headaches, and the (right) NOSQL DBs will beat the pants off of it for an equivalent hardware or hosting budget.
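The shard-routing part of that is usually done with consistent hashing. A minimal illustrative ring (node names invented; real systems add replication and vnode tuning on top), where adding a node only remaps a fraction of the keys:

```python
import hashlib
from bisect import bisect

# Minimal consistent-hash ring: each key maps to the first node point at or
# after its hash, so adding a node only steals a slice of the keyspace.
def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=64):
    # Several virtual points per node smooth out the distribution.
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def lookup(ring, key):
    points = [p for p, _ in ring]
    idx = bisect(points, h(key)) % len(ring)
    return ring[idx][1]

ring = build_ring(["dc1-a", "dc1-b", "dc1-c"])
owner = lookup(ring, "user:42")
print(owner)  # one of the three nodes, stable across runs
```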
That said, NOSQL datastores solve problems that your startup will only encounter once you achieve product-market fit and enter a hypergrowth phase. Your MVP does not need an eventually-consistent distributed NOSQL backend, dammit.
Heck, your MVP may not even need Postgres or MySQL either. Just use SQLite, back up your server, and redeploy on pg only when (if) you start getting some traction.
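A minimal sketch of that MVP setup, using Python's built-in sqlite3 (schema and values invented for illustration):

```python
import sqlite3

# An MVP-sized "database layer": one file, zero configuration, full SQL.
conn = sqlite3.connect(":memory:")  # use a file path in a real app
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
conn.commit()

row = conn.execute("SELECT id, email FROM users").fetchone()
print(row)  # (1, 'a@example.com')
# Backup is copying the single .db file; migrating vanilla SQL like this to
# Postgres later is mostly a connection-string and driver change.
```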
It's fair to point out that young people will lack experience and understanding that older people have but there is such a thing as ageism in the opposite direction -- against younger people -- and your post comes across as such.
I am closer to 30 than 20 but I was closer to 20 than 30 when I was first introduced to NoSQL.
I enjoyed using PostgreSQL prior to NoSQL and I continue to enjoy PostgreSQL.
What got me excited about NoSQL when I first heard about it had nothing to do with SQL being "old". If I thought that "oldness" was a disadvantage--which I don't, because it's not--I wouldn't be using an operating system that has its roots in the '70s, and my workflow would not be command-line-centric.
What got me excited about NoSQL was:
- The prospect of serialisation and deserialisation of data without having to rely on an Object-Relational Mapper. The relational model is great for the right kind of data, but the reason we have ORMs is that we are trying to cram data into a model that doesn't quite fit it, with varying degrees of success. When NoSQL was introduced, PostgreSQL did not yet have native support for JSON, and fields holding stringified JSON couldn't be queried using SQL, which is far from optimal. PostgreSQL has since gained native JSON support, but here I am talking about when NoSQL was introduced.
- Map-Reduce. If it's a good fit for Google, I should investigate it to understand how to use it and to see if it had a use for me. Even if it would turn out to not bring any advantages, I would have learned it and I'd know to re-investigate it in the future if I needed something like it in order to scale a project that got a lot of traffic.
- Document-oriented storage instead of tables, which makes it possible to model data without a strict schema.
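The stringified-JSON limitation from the first point above is easy to demonstrate (plain Python with sqlite3 standing in for a pre-JSON-support relational DB; schema and payloads invented):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO events (payload) VALUES (?)",
             (json.dumps({"type": "click", "x": 10}),))
conn.execute("INSERT INTO events (payload) VALUES (?)",
             (json.dumps({"type": "scroll", "y": 5}),))
conn.commit()

# The database only sees an opaque string, so "WHERE payload.type = 'click'"
# isn't expressible in SQL; filtering means a full scan plus app-side decode:
rows = [json.loads(p) for (p,) in conn.execute("SELECT payload FROM events")]
clicks = [r for r in rows if r["type"] == "click"]
print(clicks)  # [{'type': 'click', 'x': 10}]
```

Native json/jsonb columns (as Postgres later added) move exactly that filter back into the database, where it can be indexed.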
Certainly someone with more experience might have been able to tell why one or more of these were actually not such good reasons, but MY POINT IS that I had actual reasons for being excited about NoSQL, and none of them had anything to do with hipsterism, ageism or anti-oldness. This is why I think your post is ageist: rather than consider that we might have well-thought-out reasons to want to learn about NoSQL, you automatically jumped to the simple, overly broad, and frankly outright offensive, explanation that you did.
A lot of that is true, but to be sure there are use cases for NoSQL--they are simply too often misunderstood or abused, used when a traditional RDBMS would have worked fine. Elastic/horizontal scaling for certain types of data that are amenable to it (e.g. event logs, feeds, append-only type stuff): it's going to be hard to keep up with something like Cassandra there, where you can pretty much add servers on the fly to pick up load. In-memory stuff for sure, for a distributed volatile in-memory key store. And the simple flexibility you get going schema-less: if you have an embedded DB/on-premise application, you can't just get into your customer's environment and perform a migration, so having a flexible schema can make patch updates easier.
One important exception about NoSQL's raison d'être. If all you need is a key-value store without any fancy requirements then NoSQL^W pre-relational databases are perfectly good choices.
(Well, *dbm and memcachedb et al sorta pre-date that NoSQL-is-webscale database boom you're writing about.)
Not every web app is webscale. From all the evidence I saw SMC is tiny in webscale terms - 80GB of data or 600 open connections are not what NoSQL databases were meant to solve.
Using this anecdote to say that Postgres can handle webscale is ludicrous.
My biggest problem with SQL for systems with web interfaces is that it forces you into OLTP locking semantics, when your workflow is just "select data, show it to the user" -> "user saves changes, save them in the DB". Googling for "asp.net deadlock" gives 160k results.
I probably have not worked enough with ASP.NET but I do not get that reference at all. In my experience deadlocks in databases are rare and do not happen on simple databases.
Most ASP.NET workshops rely on SQL Server, which up to the 2012 version did not support MVCC and relied on locking for both writes and reads. The problem is, it had trouble with web-style load. If something was writing into a table X, and there was a parallel query with a join on table X... (especially one involving an index seek and not just lookups), you'd get into heavy-locking territory. Add an ORM to that, which wouldn't let you influence the order of the joined tables, and you've just got yourself a deadlock. This happened on a few occasions in a few different projects I've worked on. The real problem is: there are a lot of technicalities you need to know and understand to use SQL without problems and in a powerful way. Some NoSQL solutions (including MongoDB) try to abstract those technicalities away, and that was the point of my comment.
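The lock-ordering mechanism behind that kind of deadlock can be sketched in a few lines (a toy Python model, not SQL Server itself): two "transactions" touch tables X and Y in opposite orders, and the classic fix is to acquire locks in one global order no matter what order the query names them:

```python
import threading

# Toy model of the ORM-join deadlock: if each transaction grabbed its table
# locks in query order (X,Y vs Y,X), they could deadlock. Acquiring locks
# in one global order (here: sorted by table name) prevents it.
locks = {"X": threading.Lock(), "Y": threading.Lock()}
results = []

def transaction(tables):
    for t in sorted(tables):  # global order, regardless of query order
        locks[t].acquire()
    try:
        results.append(tables)
    finally:
        for t in sorted(tables, reverse=True):
            locks[t].release()

t1 = threading.Thread(target=transaction, args=(["X", "Y"],))
t2 = threading.Thread(target=transaction, args=(["Y", "X"],))
t1.start(); t2.start(); t1.join(); t2.join()
print(len(results))  # 2 -- both "transactions" completed, no deadlock
```

The complaint above is precisely that an ORM hides the join order, taking this kind of control out of your hands.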
lol "lets be frank".... have you ever worked on a large scale project youself? how did running that on a singly homed postgres work out for you?
your comment seriously sounds like "I haven't ever seen anything with non-trivial scale yet, so I'm sure everybody working on it must be lazy/stupid/whatever."
and everbody is upvoting them. gotta love hacker news cargo cult.