While I don’t have enough knowledge of the wider implications of this, it does impact something I was experimenting with last year.
The FoundationDB rewrite would introduce a size limit on document attachments; there currently isn't one. Arguably, attachments are a rarely used feature, but I found a useful use case for them.
I combined the Yjs CRDT toolkit with CouchDB (PouchDB on the client) to automatically handle sync conflicts. Each Couch document was an export of the current state of the Yjs doc (for indexing and search), with all changes made via Yjs. The Yjs doc itself was then attached to the Couch document as an attachment. When there was a sync conflict, the Yjs documents would be merged and re-exported to create a new version. The issue is that the FoundationDB rewrite would limit the attachment size, which makes this architecture more difficult. It's partly why I ultimately put the project on hold.
(Slight aside: a CouchDB-like DB with native support for a CRDT toolkit such as Yjs or Automerge would be awesome. When syncing mirrors, you would be able to exchange just the document state vectors - the changes to it - rather than the whole document.)
The (low) attachment size limit at Cloudant is about service quality and guiding folks to good uses of the service more than a technical issue.
As others have noted, the solution to storing attachments in FDB, where keys and values have an enforced maximum length, is to split the attachments over multiple key/values, which is exactly what the CouchDB-FDB code currently does.
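A toy illustration of that chunking, with a `Map` standing in for FoundationDB (whose actual limits are 10 KB per key and 100 KB per value). The key layout here is illustrative, not CouchDB's real encoding:

```javascript
// Split an attachment across multiple key/values, plus a meta entry so it
// can be reassembled. An in-memory Map stands in for FDB.
const CHUNK_SIZE = 100_000; // stay within FDB's 100 KB value limit

function writeAttachment(kv, docId, attName, data) {
  const chunks = Math.ceil(data.length / CHUNK_SIZE);
  for (let i = 0; i < chunks; i++) {
    kv.set(`att/${docId}/${attName}/${i}`,
           data.subarray(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE));
  }
  kv.set(`att/${docId}/${attName}/meta`, { length: data.length, chunks });
}

function readAttachment(kv, docId, attName) {
  const meta = kv.get(`att/${docId}/${attName}/meta`);
  const out = new Uint8Array(meta.length);
  for (let i = 0; i < meta.chunks; i++) {
    out.set(kv.get(`att/${docId}/${attName}/${i}`), i * CHUNK_SIZE);
  }
  return out;
}
```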
The other limit in FDB is the five-second transaction duration, which is a more fundamental constraint on how large attachments can be, as we are keen to complete a document update in a single transaction. The S3 approach of uploading multiple parts of a file and then joining them together in another request would also work for CouchDB-FDB. While it _could_ be done, there's no interest in the CouchDB project in supporting it.
Exactly, almost all the time you would be better off saving the attachment to an object store. However, I think I found that small edge case where the attachment system was perfect. It was essential to save the binary Yjs doc with the Couch document, since it needed to be synced to clients along with the main document. Saving it to an object store is not viable due to the overhead during syncing.
yup. The purpose of CouchDB's original attachment support was "couchapps": the notion that you'd serve your entire application from CouchDB. Attachments were therefore for HTML, JavaScript, image, and font assets, which are all relatively small. The attachment support in CouchDB <= 3.x is a bit more capable than that due to its implementation, but storing large binaries was never strictly a goal of the project.
> a CouchDB like DB with native support for a CRDT toolkit such as Yjs or Automerge would be awesome, when syncing mirrors you would be able to just exchange the document state vectors - the changes to it - rather than the whole document
SyncedStore is a brilliant Yjs reactive store for SPAs, but it's not a database. It's like an automatically distributed, real-time, reactive redux/vuex for collaborative apps.
The y-IndexedDB you linked to is actually part of the Yjs toolkit; it is a way of persisting a Yjs document in the browser for offline editing in a PWA. It doesn't provide a way to sync a whole database (or a subset of one) the way Couch/PouchDB does. It's a very important part of the Yjs toolkit, but it doesn't do what I'm describing (it's just a key-value store).
If all of your data is in Yjs, then you can use SyncedStore to efficiently send updates between clients and y-IndexedDB to store the data when the app is offline. It's not exactly what was asked for, no, but depending on the application that setup can substitute for a central database.
Why don't you open source your work? Otherwise, can you contact me? Maybe I could take over this work on CouchDB; we have to do it anyway, and we would open source it.
I don't see a fundamental reason why there would be an attachment size limit. I guess it would just need to be implemented by breaking the attachment into multiple keys? There may be some overhead, but it seems valuable because it allows large attachments to be split across servers as required.
When you chunk it, you have to handle what happens if that process is interrupted. So it's not trivial (though it is solvable), but it's exactly the kind of atomicity you want the new engine to provide.
I think the person you're replying to is saying that the document should be split across keys inside the implementation, i.e. split across the FDB keyspace, not split by the user at the application level. That's the approach you almost always have to use for 'large' values: FoundationDB has size limits on the k/v pairs it can accept, and splitting documents and writing those chunks in small transactional batches is the recommended workaround (along with some other 'switch over' transactional write which makes the complete document visible all at once).
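The 'switch over' part can be sketched like this: chunks are staged under a new version prefix in several small writes (each of which would be its own short transaction, within FDB's 5-second / 10 MB transaction limits), and one final tiny write flips a version pointer so readers see the complete value all at once. A `Map` stands in for FDB and the key names are illustrative:

```javascript
// Staged chunk writes plus an atomic pointer flip for visibility.
const CHUNK = 100_000;

function stageChunks(kv, docId, version, data) {
  const n = Math.ceil(data.length / CHUNK);
  for (let i = 0; i < n; i++) {
    // In real FDB, each write (or small batch) is its own short transaction;
    // nothing here is visible to readers yet.
    kv.set(`data/${docId}/${version}/${i}`,
           data.subarray(i * CHUNK, (i + 1) * CHUNK));
  }
  return n;
}

function commit(kv, docId, version, chunks, length) {
  // One final, tiny transaction makes the staged version the current one.
  kv.set(`current/${docId}`, { version, chunks, length });
}

function read(kv, docId) {
  // Readers only ever follow the pointer, so they see whole versions.
  const { version, chunks, length } = kv.get(`current/${docId}`);
  const out = new Uint8Array(length);
  for (let i = 0; i < chunks; i++) {
    out.set(kv.get(`data/${docId}/${version}/${i}`), i * CHUNK);
  }
  return out;
}
```

If staging is interrupted, the orphaned chunks are invisible garbage to be cleaned up later; readers never observe a half-written document.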
Reminds me of when a team I worked on had to migrate from one database to another (we were the only team left using the old one, and no one was supporting it internally). The new one had a 22MB (or was it 44MB?) limit on total transaction size, while the previous one did not (AFAIR). Someone worked on splitting writes into several transactions (the bulk was really due to long recorded conversation, forum-like messages tied to specific data), but overall it changed how things worked and had some issues initially... Who would have thought you'd need that, years from the day it was originally designed?
But is the size limit small enough to affect realistic usage? Don't you hit performance issues anyway when using a CRDT implemented in JavaScript, running in the browser, with large files?