Do you guarantee referential integrity in TAO? Last I heard the answer is no; complex queries are not supported and clients would have to do their own work with internal versions to achieve a consistent view of the whole graph (if such a consistent view even exists). But it seems to work fine, since global referential integrity doesn't seem to be a big deal: there aren't many cases where dependency chains are long enough or actions quick enough for it to matter (e.g. a group admin adds a new user and makes them an admin, that admin adds another admin, and then everyone tries to remove the admin who added them; each removal locally appears to leave >=1 remaining admin, yet the group ultimately ends up with no admins when the transactions settle). Run into any fun issues like that?
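To make the admin scenario concrete, here's a toy Python sketch of that write-skew shape, assuming (my assumption, not TAO's actual semantics) that each removal validates against its own stale snapshot rather than the committed state:

```python
# Hypothetical write-skew sketch: every removal's local check passes,
# yet the group ends up with zero admins. All names are invented.
admins = {"alice", "bob"}

def try_remove(snapshot, target):
    """Validate against a (possibly stale) snapshot, as a
    snapshot-isolated read would, and allow the removal if
    at least one admin would remain."""
    remaining = snapshot - {target}
    if len(remaining) >= 1:   # local check: >=1 admin left
        return target
    return None

# Both removals read the same snapshot before either commits.
snap = set(admins)
removals = [try_remove(snap, "alice"), try_remove(snap, "bob")]

for r in removals:            # both checks passed, so both apply
    if r is not None:
        admins.discard(r)

print(admins)  # set() -- no admins remain despite every check passing
```

A serializable system would abort one of the two removals; a system validating against stale snapshots lets both through.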
Contrasting that with Spanner where cache coherency is solved by reading from the recent past at consistent snapshots or by issuing global transactions that must complete without conflict within a brief truetime window. I am guessing the cost of Spanner underneath the social graph would be a bit too much for whatever benefits might be gained, but curious if anyone looked into using something similar.
For the fun issues you described, read-modify-write (using either optimistic or pessimistic concurrency control) can work, if I understand your question correctly.
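A minimal sketch of the optimistic flavor, assuming an invented versioned key-value store (not TAO's API): read a value with its version, modify it, and conditionally write back only if the version is unchanged, retrying on conflict.

```python
# Optimistic concurrency control via a version check (invented store).
class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

    def compare_and_set(self, key, new_value, expected_version):
        _, version = self.data.get(key, (None, 0))
        if version != expected_version:
            return False  # a concurrent writer won; caller retries
        self.data[key] = (new_value, version + 1)
        return True

def increment(store, key):
    # Read-modify-write loop: retry until the conditional write lands.
    while True:
        value, version = store.read(key)
        if store.compare_and_set(key, (value or 0) + 1, version):
            return

store = VersionedStore()
increment(store, "counter")
increment(store, "counter")
print(store.read("counter")[0])  # 2
```

The pessimistic variant would instead take a lock on the key before the read, trading retry loops for lock contention.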
Spanner is awesome and one of my favorite systems/papers. I think it would be very expensive, in both compute and power, to run the social graph workload on a Spanner-like system. Do you have an estimate of how many megawatts (if not more) would be needed to support one quadrillion queries a day on Spanner?
I wish I had more solid data. Cloud Spanner claims about 10K read QPS and 2K write QPS on a "node" which costs $1/hour. The Spanner paper reports about 10K read QPS per core. As I understand it, the cloud deployment is 3 replicas at that price, so $0.33/hour should buy me about 8 cores, and I'm not sure what the disparity is (maybe markup? CloudSQL is 2X the cost of compute), but I'll go with 10K QPS/8 cores at the low end. Anyway, figure something like 500-1000W for ~100 cores, so somewhere between 100 QPS/W and 10 QPS/W using the 1000W figure and the high and low performance estimates. One quadrillion queries a day is 10^15 / 86400 ≈ 11.6e9 QPS, for somewhere between 116MW and a little over a GW. Sounds comparable(?) to MySQL+TAO at the low end, to ridiculously expensive at the high end.
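For anyone checking my arithmetic, the back-of-envelope in Python (all figures are my estimates above, not measurements):

```python
# Back-of-envelope: power needed for 1e15 queries/day at two
# assumed efficiency points (100 QPS/W optimistic, 10 QPS/W pessimistic).
queries_per_day = 1e15
qps = queries_per_day / 86_400        # ~1.16e10 QPS (11.6 billion)

for qps_per_watt in (100, 10):
    megawatts = qps / qps_per_watt / 1e6
    print(f"{qps_per_watt} QPS/W -> {megawatts:.0f} MW")
```

That lands at roughly 116 MW on the optimistic end and about 1.16 GW on the pessimistic end.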
EDIT: I have no clue how efficient MySQL+TAO really are but figure that at least tens of thousands of machines go into it.
If that's on x86_64, then the power budget seems about ten times too low to me. A typical Xeon Silver 4208 is about 10 W of TDP per core, and other things would also add up, even taking into account that newer CPUs are more efficient.
I am basing this off of Epyc Milan, which looks closer to 5W/core, and I'm possibly mutilating my language a bit because I think cloud products are sold by thread (vCPU), not by physical core.
I was actually a little surprised how hard it is to get performance/W figures since I don't currently have any physical server hardware to go off of.
Since I observed data inconsistencies on Facebook multiple times, I believe they aren't as concerned about data integrity as a company doing financial transactions would be.