While I don’t have enough knowledge of the wider implications of this, it does impact something I was experimenting with last year.
The FoundationDB rewrite would introduce a size limit on document attachments; there currently isn't one. Arguably, attachments are a rarely used feature, but I found a useful use case for them.
I combined the Yjs CRDT toolkit with CouchDB (PouchDB on the client) to automatically handle sync conflicts. Each Couch document was an export of the current state of the Yjs doc (for indexing and search), with all changes made via Yjs. The Yjs doc itself was then attached to the Couch document as an attachment. When there was a sync conflict, the Yjs documents would be merged and re-exported to create a new version. The issue is that the FoundationDB rewrite would limit the attachment size, which makes this architecture more difficult. It's partly why I ultimately put the project on hold.
(Slight aside: a CouchDB-like DB with native support for a CRDT toolkit such as Yjs or Automerge would be awesome. When syncing mirrors, you would be able to exchange just the document state vectors - the changes to it - rather than the whole document.)
The (low) attachment size limit at Cloudant is about service quality and guiding folks to good uses of the service more than a technical issue.
As others have noted, the solution to storing attachments in FDB, where keys and values have an enforced maximum length, is to split the attachments over multiple key/values, which is exactly what the CouchDB-FDB code currently does.
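A toy illustration of that chunking, with a `Map` standing in for FoundationDB (whose actual limits are 10 KB per key and 100 KB per value). The key layout here is illustrative, not CouchDB's real encoding:

```javascript
// Split an attachment across multiple key/values, plus a meta entry so it
// can be reassembled. An in-memory Map stands in for FDB.
const CHUNK_SIZE = 100_000; // stay within FDB's 100 KB value limit

function writeAttachment(kv, docId, attName, data) {
  const chunks = Math.ceil(data.length / CHUNK_SIZE);
  for (let i = 0; i < chunks; i++) {
    kv.set(`att/${docId}/${attName}/${i}`,
           data.subarray(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE));
  }
  kv.set(`att/${docId}/${attName}/meta`, { length: data.length, chunks });
}

function readAttachment(kv, docId, attName) {
  const meta = kv.get(`att/${docId}/${attName}/meta`);
  const out = new Uint8Array(meta.length);
  for (let i = 0; i < meta.chunks; i++) {
    out.set(kv.get(`att/${docId}/${attName}/${i}`), i * CHUNK_SIZE);
  }
  return out;
}
```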
The other limit in FDB is the five-second transaction duration, which is a more fundamental constraint on how large attachments can be, as we are keen to complete a document update in a single transaction. The S3 approach of uploading multiple parts of a file and then joining them together in another request would also work for CouchDB-FDB. While it _could_ be done, there's no interest in the CouchDB project in supporting it.
Exactly, almost all the time you would be better off saving the attachment to an object store. However, I think I found that small edge case where the attachment system was perfect. It was essential to save the binary Yjs doc with the Couch document, since it needed to be synced to clients along with the main document. Saving it to an object store is not viable due to the overhead during syncing.
yup. The purpose of CouchDB's original attachment support was "couchapps": the notion that you'd serve your entire application from CouchDB. Attachments were therefore for HTML, JavaScript, image, and font assets, which are all relatively small. The attachment support in CouchDB <= 3.x is a bit more capable than that due to its implementation, but storing large binaries was never strictly a goal of the project.
> a CouchDB like DB with native support for a CRDT toolkit such as Yjs or Automerge would be awesome, when syncing mirrors you would be able to just exchange the document state vectors - the changes to it - rather than the whole document
SyncedStore is a brilliant Yjs reactive store for SPAs, but it's not a database. It's like an automatically distributed, real-time, reactive redux/vuex for collaborative apps.
The y-IndexedDB you linked to is actually part of the Yjs toolkit; it is a way of persisting a Yjs document in the browser for offline editing in a PWA. It doesn't provide a way to sync a whole database (or a subset of one) the way Couch/PouchDB does. It's a very important part of the Yjs toolkit, but it doesn't do what I'm describing (it's just a key-value store).
If all of your data is in Yjs, then you can use SyncedStore to efficiently send updates between clients and y-IndexedDB to store the data when the app is offline. It's not exactly what was asked for, no, but depending on the application that setup can substitute for a central database.
Why don't you open source your work? Otherwise, can you contact me? Maybe I could take over this work on CouchDB; we have to do it anyway, and we would open source it.
I don't see a fundamental reason why there would be an attachment size limit. I guess it would just need to be implemented by breaking the attachment into multiple keys? There may be some overhead, but it seems valuable because it allows large attachments to be split across servers as required.
When you chunk it, you have to handle what happens if that process is interrupted. So it's not trivial (though it is solvable), but it's exactly the kind of atomicity you want the new engine to provide.
I think the person you're replying to is saying that the document should be split across keys inside the implementation, i.e. split across the FDB keyspace, not split by the user at the application level. That's the approach you almost always have to use for 'large' values: FoundationDB has size limits on the k/v pairs it can accept, and splitting documents and writing those chunks in small transactional batches is the recommended workaround (along with some other 'switch over' transactional write which makes the complete document visible all at once).
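The 'switch over' part can be sketched like this: chunks are staged under a new version prefix in several small writes (each of which would be its own short transaction, within FDB's 5-second / 10 MB transaction limits), and one final tiny write flips a version pointer so readers see the complete value all at once. A `Map` stands in for FDB and the key names are illustrative:

```javascript
// Staged chunk writes plus an atomic pointer flip for visibility.
const CHUNK = 100_000;

function stageChunks(kv, docId, version, data) {
  const n = Math.ceil(data.length / CHUNK);
  for (let i = 0; i < n; i++) {
    // In real FDB, each write (or small batch) is its own short transaction;
    // nothing here is visible to readers yet.
    kv.set(`data/${docId}/${version}/${i}`,
           data.subarray(i * CHUNK, (i + 1) * CHUNK));
  }
  return n;
}

function commit(kv, docId, version, chunks, length) {
  // One final, tiny transaction makes the staged version the current one.
  kv.set(`current/${docId}`, { version, chunks, length });
}

function read(kv, docId) {
  // Readers only ever follow the pointer, so they see whole versions.
  const { version, chunks, length } = kv.get(`current/${docId}`);
  const out = new Uint8Array(length);
  for (let i = 0; i < chunks; i++) {
    out.set(kv.get(`data/${docId}/${version}/${i}`), i * CHUNK);
  }
  return out;
}
```

If staging is interrupted, the orphaned chunks are invisible garbage to be cleaned up later; readers never observe a half-written document.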
Reminds me of when a team I worked on had to migrate from one database to another (we were the only team left using the old one, and no one was supporting it internally). The new one had a 22MB (or was it 44MB?) limit on total transaction size, while the previous one did not (AFAIR). Someone worked on splitting writes into several transactions (the bulk was really due to long recorded conversation, forum-like messages tied to specific data), but overall it changed how things worked and had some issues initially... Who would have thought you'd need that, years from the day it was originally designed?
But is the size limit small enough to affect realistic usage? Don't you hit performance issues anyway when using a CRDT implemented in JavaScript, running in the browser, with large files?