Another thing that matters enormously is branch misprediction. For example, super scalar sample sort beats GNU's (stupefyingly fast) std::sort by 35-50% on average. I benchmarked skewed quicksort vs std::sort vs super scalar sample sort in a totally artificial setting (sorting a permutation of 0 to n-1) for a lecture: https://github.com/lorenzhs/quicksort-pivot-imbalance
(Repository does not contain the super scalar sample sort code, I don't have the rights to publish that unfortunately. I used the authors' old Pentium 4 optimized code from the paper linked at the bottom (incl PDF).)
Edit: super scalar sample sort also has large data locality benefits over quicksort, the "wavy" line for ssssort shows the caches' impact. I haven't studied that aspect in detail yet.
Very cool. In my experience, using a skewed pivot doesn't actually help in practice. Outside of the case you mention (where the input is a random permutation), sampling must be used to choose a skewed pivot, which has a fairly significant overhead.
But using a skewed pivot is not my point. It would not be particularly clever to respond to the problem of branch mispredictions by doing suboptimal things. The solution is to use algorithms that avoid hard-to-predict branches and exploit data locality -- hence super scalar sample sort, which is significantly faster than skewed quicksort even in the case where quicksort gets the pivot for free.
Regarding the sampling overhead -- super scalar sample sort samples sqrt(n) elements, sorts them, and picks a fixed number of pivots from them (e.g. 256 in the implementation I used). That is worth it, because finding the "bucket" (between which two pivots) an element belongs in then requires no branches, using predicated instructions. So the overhead is not the problem: as you can see in the other plots (in /plots), ssssort executes more instructions, yet it is still significantly faster than the other algorithms because it exploits the properties of the CPU.
Sure. I didn't actually mean ssssort. Sometimes it's worth distinguishing between comparison-based sorts and those that make use of key structure, assume a fixed-size key universe, etc. For that reason, I think the sampling overhead is worth considering.