
Just put transactions in the application. With row-level locking, the odds of running into contention problems are low. If you've been careful to always lock tables in the same order, then deadlocks are a non-issue. At that point, having longer-running transactions does not matter. What matters is contention on the database's internal locking mechanisms, and limiting how much extraneous work each transaction does improves that.
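The lock-ordering discipline can be sketched in plain Python, with in-process `threading.Lock` objects standing in for row locks (the account keys and `transfer` helper are illustrative, not from any real schema):

```python
import threading

# In-process locks standing in for per-row locks, keyed by primary key
# (the keys here are illustrative).
row_locks = {pk: threading.Lock() for pk in ("acct:1", "acct:2")}

def transfer(src, dst, do_work):
    # Always acquire locks in one global order (sorted by key), so two
    # concurrent transfers over the same rows can never deadlock.
    first, second = sorted((src, dst))
    with row_locks[first]:
        with row_locks[second]:
            do_work()

results = []
t1 = threading.Thread(target=transfer,
                      args=("acct:1", "acct:2", lambda: results.append("t1")))
t2 = threading.Thread(target=transfer,
                      args=("acct:2", "acct:1", lambda: results.append("t2")))
t1.start(); t2.start()
t1.join(); t2.join()
# Both threads finish. Acquiring in argument order instead (src then dst)
# could leave t1 and t2 each holding one lock and waiting on the other.
print(sorted(results))
```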

Now it is easy to screw up an application. It is easy to screw up a database design. It is easy to screw up queries and query plans. But all of those are fixable in relatively straightforward ways. And once you do that, you will wind up with database throughput as your bottleneck.

As for NoSQL, the problem is this: moving to that architecture requires taking an up-front hit in transactional complexity, usually requires several times the hardware (data must be stored multiple times to survive hardware failures), puts a lot of stress on your network and latency, and is really easy to screw up. Just using a popular out-of-the-box solution is not enough - see https://aphyr.com/tags/jepsen for a list of real failure modes in systems that will look perfectly fine in testing.

It is a necessary challenge to accept if you want to go beyond a certain scale. But you should not accept that challenge unless you have good reason to do so.



The thing you're missing is that unless your validation logic is really complicated or you have a really fast network (think Infiniband, which is still not commodity), it will take less time and fewer resources to perform the validation logic (once the locks are taken, which has to happen regardless) than it does to marshal the data over the network to your application for processing, often by orders of magnitude. That is why more or less everyone trying to win benchmarks at transaction processing uses stored procedures.
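A back-of-the-envelope model makes the round-trip argument concrete. All the figures below are assumptions for illustration, not measurements:

```python
# Assumed figures: ~200 us datacenter round trip, ~5 us of in-database
# validation once the row locks are held.
rtt_us = 200.0
validate_us = 5.0

# App-side: SELECT the rows, validate in the application, then UPDATE --
# two round trips, with the locks held while data crosses the network.
app_total_us = 2 * rtt_us + validate_us   # 405.0
app_lock_hold_us = rtt_us + validate_us   # 205.0

# Stored procedure: one call; validation runs next to the data.
sp_total_us = rtt_us + validate_us        # 205.0
sp_lock_hold_us = validate_us             # 5.0

# Latency roughly halves, but lock hold time shrinks ~41x -- which is
# what actually matters for contention.
print(app_lock_hold_us / sp_lock_hold_us)  # 41.0
```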

FWIW, there's lots of concrete evidence of this beyond the fact that benchmarks almost always use stored procedures. If you look at the very best-performing storage systems, like MICA or FaRM, which achieve 100 million tps or more, you'll see that a huge amount of their optimization comes from circumventing the OS networking stack, taking advantage of built-in queueing mechanisms in NICs, and offloading work to avoid taking up cache lines and cores from the processing CPUs. Many extremely recent database designs also take this approach (separating the transaction processors from the executors) for similar reasons, like Bohm, Deuteronomy, and the SAP HANA Scale-out Extension, as well as deterministic database systems like Calvin. Given that these are literally the best-performing database systems out there, I have a hard time accepting that reducing network latency and doing as much processing work as possible in the database isn't the correct way to achieve high performance.

Using stored procedures also allows (in theory) for analysis of the allowed transactions and the conflicts between them, which can boost performance further: you might guarantee ahead of time that certain transactions can't conflict with each other, reuse cached data because nothing could have changed, order transactions into epochs and guarantee that there are no deadlocks, or periodically check whether a record is particularly highly contended and rebalance or split it if commutativity is possible. All of these options and more have been explored in recent database research. It's often difficult for an application to guarantee any of this, because several web servers may be contacting the database at once without knowing what the others are doing, and this is doubly true if you allow ad-hoc queries. Not knowing whether a transaction will finish quickly (as is the case when an application is allowed to hold locks) also greatly increases the chances of contention and/or disconnection, which can lead to further issues.
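The conflict analysis can be sketched as a read/write-set check over declared transactions. The sets here are hypothetical; a real system would derive them from the stored-procedure bodies:

```python
def conflicts(txn_a, txn_b):
    # Two transactions conflict iff one writes something the other reads
    # or writes (the classic read-write / write-write conflict test).
    reads_a, writes_a = txn_a
    reads_b, writes_b = txn_b
    return bool(writes_a & (reads_b | writes_b)) or \
           bool(writes_b & (reads_a | writes_a))

# Hypothetical transactions as (read set, write set) over row keys.
deposit  = ({"balance:1"}, {"balance:1"})
withdraw = ({"balance:1"}, {"balance:1"})
audit    = ({"balance:2"}, set())

print(conflicts(deposit, audit))     # False: safe to run without coordination
print(conflicts(deposit, withdraw))  # True: must be serialized
```

An application tier can't easily run this check, because no single web server sees the full set of in-flight transactions.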

To be doubly clear--I think stored procedures are a massive pain in the ass and there are lots of good reasons to do exactly what you're proposing above. But performance is not one of them.


I wrote https://news.ycombinator.com/item?id=11989138 while you were writing that, and addressed your point there.

Performance and scalability are very different things. Stored procedures are great for performance, but bad for scalability.


Please see my response there. It's not as simple as you're implying; often your throughput suffers if you can't handle requests fast enough. MICA, for instance, injects packets directly into the L3 cache to avoid cache misses; it's important that MICA handle these requests as fast as possible, because otherwise the L3 cache will fill up, the queues will start to back up, and ultimately they'll put backpressure on incoming packets and slow down the whole system (greatly decreasing throughput). The fact that it has tail latencies measured in microseconds is good on its own, but its real benefit is its effect on throughput!
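The backpressure effect can be illustrated with a toy single-server queue (all numbers are made up for illustration): once per-request service time exceeds the arrival interval, the backlog grows until the queue is full and throughput collapses.

```python
def simulate(service_us, n_packets=1000, arrival_us=1.0, queue_cap=64):
    # One packet arrives every arrival_us; a single server drains the
    # queue FIFO. When the backlog hits queue_cap, new packets are
    # dropped -- backpressure on the sender.
    finish_us = 0.0   # time at which all accepted work is drained
    served = dropped = 0
    for i in range(n_packets):
        now = i * arrival_us
        backlog = max(0.0, finish_us - now) / service_us
        if backlog >= queue_cap:
            dropped += 1
        else:
            finish_us = max(finish_us, now) + service_us
            served += 1
    return served, dropped

print(simulate(0.8))  # server keeps up: (1000, 0)
print(simulate(1.2))  # server falls behind: queue fills, packets dropped
```

Small per-request latency wins compound: below the arrival rate nothing queues; above it, every request pays queueing delay and some are lost outright.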



