
Remember also that "partition" is not "yes or no" but rather a latency threshold. If the network is connected but a call now takes 30 seconds instead of milliseconds, that is probably a partition.




This is likely wrong. The issue with partitions is that we can no longer communicate at all, and thus can't end up in the same state. If we have poor performance, that's certainly something worth putting machinery in place to adapt to, but it's not at all in the same class as 'I can't talk to you and I don't know what you're doing at all' from a correctness standpoint.

edit: yeah ok, since failure detection is by necessity driven by timers, then sure. The tradeoff we're making is between the interval during which we're unable to make progress and the upheaval caused by announcing a failure.
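
To make that tradeoff concrete, here is a minimal sketch of a timer-driven failure detector. The class and method names (`TimeoutFailureDetector`, `heartbeat`, `suspected`) are hypothetical and not taken from any particular system; timestamps are passed in explicitly so the behavior is easy to reason about.

```python
class TimeoutFailureDetector:
    """Suspects a peer once no heartbeat has arrived within timeout_s.

    Hypothetical illustration: the names and the single fixed timeout
    are assumptions, not any real library's API.
    """

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heard = {}  # peer -> time of the most recent heartbeat

    def heartbeat(self, peer, now):
        # Record that we heard from the peer at time `now`.
        self.last_heard[peer] = now

    def suspected(self, peer, now):
        # A peer we have never heard from is treated as suspected.
        last = self.last_heard.get(peer)
        return last is None or now - last > self.timeout_s


# The same slow-but-alive peer, seen through two different timeouts.
impatient = TimeoutFailureDetector(timeout_s=1.0)
patient = TimeoutFailureDetector(timeout_s=5.0)
for detector in (impatient, patient):
    detector.heartbeat("b", now=0.0)

print(impatient.suspected("b", now=2.0))  # short timeout: failure announced early
print(patient.suspected("b", now=2.0))    # long timeout: still waiting on the peer
```

A short timeout announces failures that may turn out to be mere slowness; a long one keeps everyone waiting on a peer that may really be gone. Either way the detector cannot tell the two cases apart, which is the point of the edit above.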


Yeah, I glossed over a few steps. There's likely a latency threshold beyond which you should abort, and at that point it is effectively a partition (after all, that's what TCP does under the hood: if a segment goes unacknowledged, it retransmits for a while and eventually gives up on the connection).

One should be so lucky to have an operation fail immediately, rather than lumber on until it times out (holding resources hostage all the while)!
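
As a rough sketch of "abort past a latency threshold and call it a partition", something like the following would do. `PartitionSuspected`, `call_with_deadline`, and `fake_rpc` are made-up names for illustration, and `fake_rpc` just sleeps to stand in for a slow remote call; only `asyncio.wait_for` and `asyncio.run` are real standard-library calls.

```python
import asyncio


class PartitionSuspected(Exception):
    """Raised when a call exceeds the latency threshold (hypothetical name)."""


async def call_with_deadline(make_call, threshold_s):
    # wait_for cancels the underlying call on timeout, so it stops
    # holding resources instead of lumbering on in the background.
    try:
        return await asyncio.wait_for(make_call(), timeout=threshold_s)
    except asyncio.TimeoutError:
        raise PartitionSuspected(f"no reply within {threshold_s}s") from None


async def fake_rpc(delay_s):
    # Stand-in for a remote call; the delay simulates network latency.
    await asyncio.sleep(delay_s)
    return "ok"


def classify(delay_s, threshold_s):
    # Returns the reply, or "partition" if the threshold was exceeded.
    try:
        return asyncio.run(
            call_with_deadline(lambda: fake_rpc(delay_s), threshold_s)
        )
    except PartitionSuspected:
        return "partition"
```

The caller sees the same outcome whether the peer is unreachable or merely too slow, so the threshold is effectively the working definition of "partition" here.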



