
It looks like a very cool product/service, but there's something... off... about this landing page. I can't quite put my finger on it. Two things I can think of right off the bat:

1. The use of the term "whitepaper". It's very "enterprisey"

2. It took me a bit of perusing to figure out what the product IS. I think the lead paragraph may need some tweaking

In all, the landing page makes the product feel intimidating. Contrast to Parse's landing page (https://www.parse.com/) where it feels like I'm free to jump right in and tinker with it, but I also get the impression that it will scale up if I need it to. (Yes, I know the two services aren't offering the same thing).



I had the same reaction. I think Rich Hickey is a genius, and I'm an absolute Clojure nut. I have every confidence that this is probably a very cool thing.

But judging from that opening description, I gather that Datomic puts the data and analysis in the same application. As a description of what the thing IS, it's about as informative as saying, "This new language allows you to take control of your computer by allowing you to give it coded instructions!" or "Our storage solution allows you to persistently store data!"


I agree with you on the landing page. The introductory paragraph seems rather "fluffy". That combined with the fact that it uses a whitepaper immediately gave me the feeling that it's not really meant as something for regular programmers to check out and hack with. It's surprising, since that's how many Clojure programmers get their start.

On the other hand, it's very new so maybe they'll add more developer-friendly pages soon. Or maybe it's only meant for "enterprise" environments? Time will tell.


Maybe it's so new and unique that it's hard or impossible to explain it in a paragraph.


I think this has a lot to do with it. After an hour of reading, watching and thinking, I can't come up with any way to put it into one paragraph.

Here's the shortest what and why I could come up with:

Questioning Assumptions

Many relational databases today operate based on assumptions that were true in the 1970s but are no longer true. Newer solutions such as key-value stores ("NoSQL") make unnecessary compromises in the ability to perform queries or make consistency guarantees. Datomic reconsiders the database in light of current hardware: disks and RAM that are millions of times larger and faster, and distributed architectures connected over the internet.

Data Model

Instead of using table-based storage with explicit schemas, Datomic uses a simpler model wherein the database is made up of a large collection of "datoms", or facts. Each datom has 4 parts: an entity, an attribute, a value, and a time (denoted by the transaction number that added it to the database). Example:

  John, :street, "23 Swift St.", T27
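
A multi-valued attribute (a set) is simply several datoms sharing the same entity and attribute; for example (the :hobby attribute here is made up for illustration):

  John, :hobby, "chess", T30
  John, :hobby, "climbing", T30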
This simple data model has two main benefits. It makes your data less rigid and hence more agile and easier to change. Additionally, it makes it easy to handle data in non-traditional structures, such as hierarchies, sets or sparse tables. It also enables Datomic's time model...

Time

Like Clojure, Datomic incorporates an explicit model of time. All data is associated with a time and new data does not replace old data, but is added to it. Returning to our previous example, if John later changes his address, a new datom would be added to the database, e.g.

  John, :street, "17 Maple St.", T43
This mirrors the real world, where the fact that John has moved does not erase the fact that John once lived on Swift St. This has multiple benefits: you can view the database as it was at any point in time, not just the present; no data is ever lost; and the immutability of each datom allows for easy and pervasive caching.
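
For example, a peer can ask for the value of the database as of an earlier transaction and run queries against that snapshot. Roughly (the exact API may differ; conn and t27 here just stand for a connection and the earlier transaction):

    (require '[datomic.api :as d])

    ;; the database value as of transaction T27, before John moved
    (def db-then (d/as-of (d/db conn) t27))

    ;; queries against db-then see "23 Swift St.", not "17 Maple St."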

Move Data and Data Processing to Peers

Traditionally, databases use a client-server model where clients send queries and commands to a central database. This database holds all the data, performs all data processing, and manages data storage and synchronization. Clients may only access the data through the interface the server provides - typically SQL strings, which may include a (relatively small) set of functions provided by the database.

Datomic breaks this system apart. The only centralized component is data storage. Peers access the data storage through a new distributed component called a transactor. Finally, the most important part, data processing, now happens in the clients, which, considering their importance, have been renamed "peers".

Queries are made in a declarative language called Datalog, which is similar to but better than SQL. It's better because it more closely matches the model of the data itself (rather than thinking in terms of the implementation of tables in a database). Additionally, it's not restricted like SQL: it allows you to use your full programming language. You can write reusable rules that can then be composed in queries, and you can call any of your own functions. This is a big step up in power, and it's made practical by the distribution. If you ran your query on a central server, you'd have to be concerned about tying up a scarce resource with a long-running query. When processing locally, that's not a concern.
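
As a rough sketch using the example datoms from above (the exact query syntax may differ slightly), a query asking who lives at a given address looks something like:

    [:find ?who
     :where [?who :street "23 Swift St."]]

The :where clause is a pattern over [entity attribute value], and ?who is a logic variable that gets bound to every entity matching the pattern.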

When a query is performed, the needed data is loaded from central storage and placed into RAM (if it will fit). Later queries can then run against this locally cached data, which makes them fast.

----

That's definitely not all it does or all the benefits, but hopefully that's a good start.


I would add the following

Transactions as first-class entities

Transactions are just data like everything else, and you can add facts about them like anything else: for example, who created the transaction, or what the database looked like before and after it.

Additionally, you can subscribe to the queue of transactions if you want to watch for and react to events of a certain nature. This is very difficult in most other systems.
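
In the notation from the overview above, the transaction is itself an entity, so those facts are just more datoms, e.g. (:db/txInstant is the built-in attribute for the commit time; :audit/user is made up for illustration):

  T43, :db/txInstant, "2012-03-09 12:00", T43
  T43, :audit/user, "john", T43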


> a time (denoted by the transaction number that added it the database).

Do transaction numbers have total order or just partial order? Total order is serializing. (And no, using real time as the transaction number doesn't help because it's impossible to keep an interesting number of servers time-synched.) Partial order is "interesting".


It is totally ordered.

The transactor is a single point of failure.

However, since its only job is doing the transactions, the idea is it can be faster than a database server that does both the transactions and the queries.


Hmm... presumably an application can act in read-only mode in the absence of a transactor. That's an interesting thought :-)


The transactor is only used for writes, so if the transactor went down, you could still run queries.


I think their statement about ACID is too bold.

How does somebody do read-"modify" style transactions?

Say I want to bump some counter. So I delete the old fact and establish a new fact. But the new fact needs to be exactly 1 + the old value of the counter. With transactions as a simple "add this and remove that", you seemingly cannot do that. So it's not ACID. Right?


Transactions are not limited to add/retract. There are also things we call data functions, which are arbitrary, user-written, expansion functions that are passed the current value of the db (within the transaction) and any arbitrary args (passed in the call), and that emit a list of adds/retracts and/or other data function calls. This result gets spliced in place of the data function call. This expansion continues until the resulting transaction is only asserts/retracts, then gets applied. With this, increments, CAS and much more are possible.

We are still finalizing the API for installing your own data functions. The :db.fn/retractEntity call in the tutorial is an example of a data function. (retractEntity is built-in).

This call:

    [:db.fn/retractEntity entity-id]
must find all the in- and out-bound attributes relating to that entity-id (and does so via a query) and emit retracts for them. You will be able to write data functions of similar power. Sorry for the confusion, more and better docs are coming.
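
For a concrete feel, here is a rough sketch of the shape an increment data function could take (the name, arglist, and call form below are illustrative only, since that API isn't final):

    (require '[datomic.api :as d])

    ;; A data function receives the current db value plus args and
    ;; returns more transaction data (adds/retracts) to splice in.
    (defn increment [db entity attr amount]
      (if-let [old (get (d/entity db entity) attr)]
        [[:db/retract entity attr old]           ; retract the old value
         [:db/add entity attr (+ old amount)]]   ; assert old + amount
        [[:db/add entity attr amount]]))         ; no old value: just assert

    ;; Once installed under some keyword, a transaction could call it
    ;; the same way the built-in retractEntity is called above, e.g.
    ;;   [:my/increment counter-id :counter/value 1]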


From what I remember, compare-and-swap semantics are in place for that kind of case.

If that were not the case, you could still model such an order-dependent update as the fact that the counter has seen one more hit. Let the final query reduce that to the final count, let the local cache implementation optimize that cost away for all but the first query, and then incrementally update later queries as the count increases.

That said, I'm pretty sure I've seen the simpler CAS semantics support. (The CAS-successful update, if CAS is really supported, is still implemented as an "upsert", which means old counter values remain accessible if you query the past of the DB.)


Forget my last paragraph. Anyways, richhickey answered. :)


> However, since [the transactor's] only job is doing the transactions

Huh? How is that consistent with:

> access the data storage through a new distributed component called a transactor.

If "doing the transactions" consists of more than passing out incrementing transaction tokens, won't the transactor be a bottleneck?


Yeah, it looks like I got that part wrong. (I intentionally skimmed over the transactor, because I was avoiding "how" issues and because my understanding of it wasn't that clear.)

The transactor is involved in just writes, not reads. (So that helps.) It's not distributed and cannot be distributed, in this system, because it ensures consistency, so yes, it is potentially a bottleneck. In blog comments by Rich Hickey[1], he states:

"Writes can’t be distributed, and that is one of the many tradeoffs that preclude the possibility of any universal data solution. The idea is that, by stripping out all the other work normally done by the server (queries, reads, locking, disk sync), many workloads will be supported by this configuration. We don’t target the highest write volumes, as those workloads require different tradeoffs."

Presumably, 1) the creators of Datomic think that performance can be good enough to be useful, and 2) this is a new model that will probably require real-world testing to prove it's practical.

[1] Multiple people have linked to it, but for convenience: http://blog.fogus.me/2012/03/05/datomic/comment-page-1/#comm...


Isn't this the same compromise we would already have had to make if we just used postgres?


It's actually slightly better than a SQL database. If your master SQL database gets fried, there's a chance you could lose some data. Datomic's transactor only handles atomicity, not writes, so if the transactor dies, nothing written to the database will be lost.


Datomic is a UFO filled with advanced alien technology that has landed right on the National Mall.


Probably a UFO from the Land of Lisp.


Any access to source? Looks nice, but I have a bunch of questions about the transactor.


This 'Architecture' page summarises it best.

http://datomic.com/company/resources/architecture


> The use of the term "whitepaper". It's very "enterprisey"

As a tangent, I'm really curious as to why this document is in a PDF file instead of simply being a web page. I can't see that doing much other than making it less convenient to read.


If I got one thing out of this article, it is your comment and link to Parse. That looks pretty nice! The landing page for datomic is horrible and I didn't make it past the small, dense text.



