
AWS offers a solution for this: OpenSearch Serverless collections. A dedicated collection can be spun up for each user/tenant, e.g. «thing1-user345», «thing2-user123», instead of co-mingling «user123» and «user345» in the same collection index. It increases the overall overhead, but with repeatable infrastructure, courtesy of IaC, it is easy to roll out and to process/consume «thing1-user123» and «thing2-user123» as discrete datasets.
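
As a sketch only (not our actual setup), the per-tenant provisioning could look something like this with boto3. The collection names, region, and tenant IDs are made up, and it assumes the encryption/network/data-access policies for the collection already exist; in practice this would live in Terraform/CloudFormation rather than an ad-hoc script:

    import boto3

    # Hedged sketch: provision a dedicated OpenSearch Serverless collection per tenant.
    # Assumes encryption/network/data-access policies are already in place.
    client = boto3.client("opensearchserverless", region_name="us-east-1")

    def provision_tenant_collection(thing: str, tenant_id: str) -> str:
        """Create a dedicated collection such as 'thing1-user345' for one tenant."""
        name = f"{thing}-{tenant_id}"  # collection names must be lowercase
        response = client.create_collection(
            name=name,
            type="SEARCH",  # or "TIMESERIES"/"VECTORSEARCH" depending on the workload
            description=f"Dedicated collection for tenant {tenant_id}",
        )
        return response["createCollectionDetail"]["id"]

    collection_id = provision_tenant_collection("thing1", "user345")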

Tangentially related: I have been finding multi-tenancy to be a liability at worst and a nuisance at best, due to increasingly frequent customer demands around data sovereignty, data confidentiality (each tenant wants their own, non-shared cryptography keys, etc.), data privacy, compliance, the right to be forgotten/GDPR, and similar requirements. For anything more complex than an anonymous online flower shop, it is simpler to partition the whole thing off – for each customer/tenant.



Any idea how “OpenSearch serverless collections” are implemented? I’m guessing that a “collection” is basically an Elasticsearch index, and “serverless” refers to some method of serializing/loading it on demand, with some cold start tradeoffs?


Basically, OpenSearch and Elasticsearch both offer serverless modes now. They work differently because all of this was developed post-fork, but they do similar things. The Elastic implementation is in early access right now, so it is not released yet; I saw a demo of it at one of their meetups in June. I think OpenSearch actually moved a bit faster than Elastic on this, after Elastic announced they were working on it a few years ago.

The aim with both is that users no longer have to worry about cluster sizing or cluster management, which from an operational point of view is a huge gain: most companies using this stuff aren't very good at doing it properly, and the consequences are poor performance, outages, and even data loss as data volume grows.

Serverless essentially decouples indexing and querying traffic. All the nodes are transient and use their local disk only as a cache. Data at rest lives in S3, which becomes the single source of truth. So if a new node comes up, it simply loads its state from there and doesn't have to coordinate with other nodes. There's no longer any possibility for a cluster to go red, either; if a node goes down, it just gets replaced. This makes use of the fact that Lucene index segments are immutable once written. A cleanup process runs in the background and merges segments, but that basically just means you get a new file that then needs to be loaded by the query nodes. I'm not 100% sure how write nodes coordinate segment creation and management, but I assume it involves some kind of queueing.
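
Roughly, and only as an illustration of the idea rather than either vendor's actual code, a stateless query node could treat S3 as the source of truth and its disk as a pure cache of immutable segment files (the bucket name and key layout below are invented):

    import os
    import boto3

    # Illustration only: S3 is the source of truth, local disk is just a cache
    # of immutable segment files, so a fresh node needs no cluster coordination.
    s3 = boto3.client("s3")
    BUCKET = "my-index-segments"        # hypothetical bucket
    CACHE_DIR = "/var/cache/segments"
    os.makedirs(CACHE_DIR, exist_ok=True)

    def list_segments(index: str) -> list[str]:
        """The S3 listing is the authoritative view of the index's segments."""
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"{index}/")
        return [obj["Key"] for obj in resp.get("Contents", [])]

    def ensure_cached(key: str) -> str:
        """Segments never change after being written, so a cached copy stays valid."""
        local_path = os.path.join(CACHE_DIR, key.replace("/", "_"))
        if not os.path.exists(local_path):
            s3.download_file(BUCKET, key, local_path)  # cold-start cost is paid here
        return local_path

    def open_index(index: str) -> list[str]:
        """A new node rebuilds its state from S3 alone and then serves queries."""
        return [ensure_cached(key) for key in list_segments(index)]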

So you get horizontal scalability for both reads and writes and you no longer have to worry about managing cluster state.

The tradeoff is a bit of added latency before query nodes can see incoming data, because it has to hit S3 as part of the indexing process before read nodes can pick up new segments. Think multiple seconds before new data becomes visible. Both solutions are best suited to time-series type use cases, but they can also support regular search use cases. For any use case where the same documents get updated regularly, or where reading your own writes matters, things are not going to be great.
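
You can see the lag for yourself with a toy probe like the one below (a sketch with the opensearch-py client; the endpoint and index name are placeholders, auth is omitted for brevity, and a real serverless collection would need SigV4 signing on top of this):

    import time
    from opensearchpy import OpenSearch

    # Toy probe: index a document, then poll until it becomes visible to search.
    client = OpenSearch(hosts=["https://my-collection-endpoint:443"])

    doc_id = "visibility-probe-1"
    client.index(index="events", id=doc_id, body={"probe": True, "ts": time.time()})

    start = time.time()
    while True:
        result = client.search(index="events",
                               body={"query": {"ids": {"values": [doc_id]}}})
        if result["hits"]["total"]["value"] > 0:
            break
        time.sleep(0.5)

    print(f"document became searchable after {time.time() - start:.1f}s")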

Amazon's implementation of course probably leverages a lot of their cloud infrastructure. Elastic's implementation will eventually be available in their cloud offering on all supported cloud providers. Self-hosting this is going to be challenging with either solution, so that's another tradeoff.


Thanks for the detailed response and insight. This is a great example of when meetups and in-person networking/collaboration can help you stay ahead of the curve.

It does sound like the solution glosses over some cold start problems that will surface with increasing regularity for more fragmented indexes. For example, if you have one index per tenant (imagine GitHub search with one public index plus one index per authenticated user containing only the repos they are authorized to read), then each user will experience a cold start on their first authenticated search.

I bet these tradeoffs are not so bad and, in practice, are worth the savings. But I will be curious to follow the developments here and to see the limitations more clearly quantified.

(Also this doesn’t address the writes but I’m sure that’s solvable.)


> This is a great example of when meetups and in-person networking/collaboration can help you stay ahead of the curve.

Did I miss something? This is a comment on a publicly accessible website. How did you infer these benefits?


The context offered in their comment was based on a demo they saw at a meetup:

> I saw a demo of this at one of their meetups in June


> For anything more complex than an anonymous online flower shop, it is simpler to partition the whole thing off – for each customer/tenant.

Is this really a viable approach at the scale of B2B SaaS like Salesforce (or, contextually, Algolia)? They would end up with literally 10s of 1000s of DBs. That is surely cost-prohibitive.



