I use JSONB columns a lot, and they have their place. They fit certain applications, but they introduce a lot of extra query complexity, and you lose some of the performance levers you'd get from a relational approach.
Which is to say JSONB is useful, but I wouldn’t throw the relational baby out with the bath water.
I think if a field in a JSONB column is used frequently in the most common query patterns, it might be time to "surface" it up among the other, non-JSONB relational columns. There is some additional overhead: the surfaced column should be treated as read-only for consistency's sake, and whenever the field's value changes inside the JSONB document, the surfaced column has to be updated too.
However, a bit of duplication is not a terrible trade-off for significantly improved query performance.
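In Postgres 12+, this kind of surfaced, read-only column can be expressed directly as a generated column, which handles the keep-in-sync requirement automatically. A minimal sketch, assuming a hypothetical `orders` table with the document in a `data` column and a `customer_id` key worth surfacing:

```sql
-- Hypothetical table: "data" holds the JSONB document.
CREATE TABLE orders (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL,
    -- Surfaced copy of data->>'customer_id'. Postgres recomputes it
    -- on every write, and it cannot be assigned directly, so the
    -- read-only and consistency requirements come for free.
    customer_id text GENERATED ALWAYS AS (data->>'customer_id') STORED
);

-- The surfaced column can then be indexed and queried like any other.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
```
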
I’ve been doing some reasonably serious playing with the idea of using JSONB columns as a kind of front end to relational tables. So basically, external interactions with the database are done in JSON, which gives end users some flexibility, but internally we effectively maintain a realtime materialised view of just those properties we need from the JSON.
Anyone else tried this approach? Anything I should know about it?
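One way the idea above might be sketched, assuming a hypothetical `customer_docs` table that holds the raw JSON: a plain view (or a materialised view refreshed on a schedule) exposes only the extracted properties to internal queries.

```sql
-- Raw documents come in through the JSON-facing API.
CREATE TABLE customer_docs (
    id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    doc jsonb NOT NULL
);

-- Internal, relational-looking view over just the fields we need.
CREATE VIEW customers AS
SELECT id,
       doc->>'name'  AS name,
       doc->>'email' AS email
FROM customer_docs;
```
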
My mental model has always been to use JSONB only for columns where the relations within the JSON object are of no importance to the DB. An example might be text editor markup. I imagine that if you start wanting to query within the JSON object, you should consider a more relational model.
My experience has been that users like to include a lot of information that's not relevant to the application I'm working on.
My application couldn't care less about customer names, for example. But the people who buy my software naturally do care, and what's worse, each of my potential customers has some legacy system that stores customer names in a different way. So one problem I want to address is: how do I maintain fidelity with the old system, for example during data migration, while still being able to move forward quickly?
My solution has been to keep non-functional data such as customer names in JSON, and extract only the two or three fields that are relevant to my application, and put them into a regular SQL database table.
So far this has given me the best of both worlds: a simple and highly customisable JSON API for these user-facing objects with mutable shapes, but a compact SQL backend for the actual work.
I always thought JSONB was a great fit for client-specific customizations. For example, a CRM application where you have structured DB tables like "account", "opportunity", etc., but then a client-specific table might be "cohort", linked to "account", with fields like "cohort size" and "cohort name". You just chuck the cohort data into JSONB, and the company using the CRM can change its custom cohort structure without you having to create or migrate RDBMS tables.
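Filtering on such client-specific fields typically uses the containment operator `@>`. A hedged sketch, assuming a hypothetical `accounts` table with the custom data in a `custom` JSONB column:

```sql
-- Find accounts whose client-specific blob contains this cohort name.
SELECT id
FROM accounts
WHERE custom @> '{"cohort_name": "Q3 onboarding"}';

-- A GIN index over the column makes @> lookups efficient.
CREATE INDEX accounts_custom_idx ON accounts USING GIN (custom);
```
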
That sounds like something that would be better tailored to well normalized tables and custom row entries. The question I'd ask is if something you're concerned about being a column could instead be a row in a different table.
I do something similar, building a lightweight search index over very large relational datasets.
The tables are much simpler to manage and much more portable, so I can serve search off scalable hardware without disturbing the underlying source of truth.
The downside is queries are more complex and slower.
I've had extremely disappointing performance with Postgres JSONB columns. I've used them a lot lately for MVPs and betas, where it's much easier to reach for JSONB than to lock in your exact relational schema up front.
I've now decided this path is a mistake because the performance is so bad. Even at low thousands to tens of thousands of rows it becomes a huge problem: queries that would take under 1 ms against relational columns quickly start taking hundreds of milliseconds.
Optimizing these with hand-rolled queries is painful (I do not like the JSONB query syntax), for some queries it doesn't really fix much, and indexes often don't help.
It seems that JSONB is just orders of magnitude slower, but I could be doing something wrong. Take, for example, storing a dictionary of string keys and int (number?) values in a JSONB column. Summing the ints in JSONB takes thousands of times longer than having them as string and int columns in a standard table.
Perhaps I am doing something wrong; and I'd love to know it if I am!
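For reference, the two shapes being compared might look like this (table and key names are made up): the JSONB version has to expand each key/value pair per row and cast the text values, whereas the plain-table version is a straight aggregate over a typed column.

```sql
-- JSONB version: data is e.g. {"clicks": 3, "views": 10, ...}.
-- jsonb_each_text yields (key text, value text) pairs per row,
-- and each value must be cast before it can be summed.
SELECT sum(kv.value::int)
FROM metrics,
     LATERAL jsonb_each_text(metrics.data) AS kv;

-- Relational version: one typed (name, value) row per entry.
SELECT sum(value)
FROM metric_values;
```
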
Did you have arrays in your jsonb data? I don't currently use jsonb for production performance sensitive queries, but in the past what I learned was that it's great at automatically indexing everything except arrays. You have to manually declare indexes on your arrays.
When jsonb works, it's incredible. I've had many... suboptimal experiences with mongo, and jsonb is just superior in my experience (although like I said, I haven't used it for performance critical stuff in production). For a long time, it kinda flew under the radar, and still remains an underappreciated feature of Postgres.
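For what it's worth, the usual starting point when declaring indexes on JSONB is a GIN index over the whole column, which supports the containment (`@>`) and key-existence (`?`) operators, including matches against array elements. Table and column names here are hypothetical:

```sql
-- Index the whole document for @> / ? style queries.
CREATE INDEX events_payload_idx ON events USING GIN (payload);

-- Containment can match inside arrays too: this finds rows
-- where payload->'tags' is an array containing "beta".
SELECT id
FROM events
WHERE payload @> '{"tags": ["beta"]}';
```
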
An "index on expression" should perform the same regardless of the input column types. All that matters is the output of the expression. Were you just indexing the whole jsonb column or were you indexing a specific expression?
For example, an index on `foo(user_id)` vs `foo(data->'user_id')` should perform the same.
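Concretely, an expression index over a JSONB field needs an extra set of parentheses around the expression, and `->>` (text extraction) is usually preferable so the indexed value is plain text rather than a `jsonb` fragment. A sketch with hypothetical names:

```sql
-- Note the doubled parentheses around the expression.
CREATE INDEX foo_user_id_idx ON foo ((data->>'user_id'));

-- This query can use the index, because the WHERE clause
-- matches the indexed expression exactly.
SELECT *
FROM foo
WHERE data->>'user_id' = '42';
```
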