That was my first thought, too. I installed and ran both against a tar'd set of PDF files totaling 435MB in size. My timings:
lzfse 45 MB/s encode, 229 MB/s decode, 1.12 comp ratio
zstd 181 MB/s encode, 713 MB/s decode, 1.13 comp ratio
The numbers are so dramatically different that I ran several more tests, but they all showed roughly the same results. I used default command-line options for both tools, and both produced very similar compression ratios.
Note that LZFSE has a somewhat different goal, however: it's designed to be the most power-efficient compression algorithm out there, in other words on mobile devices LZFSE optimizes for bytes-per-watt rather than bytes-per-second. Zstandard, on the other hand, runs multiple pipelines and such--it's banking on having a server-class processor to run on.
Edit: hardware is a 2013 MacBook Pro, pretty fast flash storage, and 2 cores/4 threads. I warmed cache before each run and sent output to /dev/null, so the numbers above are best-case.
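For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the method described (warm up first, discard the output, report MB/s and ratio). It is not the harness used for the numbers above: it uses Python's built-in zlib purely as a stand-in codec, the helper names `best_time` and `benchmark` are made up for illustration, and MB here means 10^6 bytes.

```python
import time
import zlib


def best_time(func, arg, repeats=3):
    """Return (best wall-clock seconds, last result) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(arg)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def benchmark(data):
    """Measure best-case encode/decode throughput and compression ratio."""
    megabytes = len(data) / 1e6
    zlib.compress(data)  # warm-up run, result discarded
    enc_seconds, packed = best_time(zlib.compress, data)
    dec_seconds, unpacked = best_time(zlib.decompress, packed)
    if unpacked != data:
        raise RuntimeError("round trip did not reproduce the input")
    return {
        "encode_mb_s": megabytes / enc_seconds,
        "decode_mb_s": megabytes / dec_seconds,
        "ratio": len(data) / len(packed),
    }


# Synthetic, highly repetitive sample (about 1 MB); real files will behave differently.
sample = bytes(range(256)) * 4096
stats = benchmark(sample)
print(stats)
```

Taking the best of several runs mirrors the "best-case" framing above; swapping in the real lzfse and zstd tools would need their own bindings or a subprocess call, which this sketch does not attempt.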
"Note that LZFSE has a somewhat different goal, however: it's designed to be the most power-efficient compression algorithm out there, in other words on mobile devices LZFSE optimizes for bytes-per-watt rather than bytes-per-second."
Sure, but is that even relevant? I mean, is there any way that lzfse could possibly be more power-efficient per byte than zstd when zstd is 3-4 times faster for the same compression ratio? According to the docs zstd doesn't have any support for multiple threads right now, so it should be a fair comparison.
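For reference, the "3-4 times faster" figure follows directly from the timings posted above. A quick check, using only those four throughput numbers and the 435MB input size (the function name `speedup` is just for illustration):

```python
# Throughput figures in MB/s, copied from the timings posted above.
timings = {
    "lzfse": {"encode": 45, "decode": 229},
    "zstd": {"encode": 181, "decode": 713},
}
INPUT_MB = 435  # size of the tar'd PDF set from the original post


def speedup(fast, slow, phase):
    """How many times faster `fast` is than `slow` for the given phase."""
    return timings[fast][phase] / timings[slow][phase]


encode_ratio = speedup("zstd", "lzfse", "encode")
decode_ratio = speedup("zstd", "lzfse", "decode")
print(f"encode: {encode_ratio:.2f}x, decode: {decode_ratio:.2f}x")
# prints "encode: 4.02x, decode: 3.11x"

# Implied wall-clock time to compress the whole input at each encode rate.
for tool, rates in timings.items():
    print(f"{tool}: {INPUT_MB / rates['encode']:.1f} s to encode")
```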
I use "fastest to complete == least power usage" as a rule of thumb because of "race to sleep". I suppose that might be thrown off by power usage characteristics varying based on number of cores working? How does one even begin to write code that prioritizes power-efficiency over performance?
For short runs, any fixed-cost warmup/cooldown periods might dominate - a latency/throughput tradeoff kind of affair. (I am just shooting my mouth off and have no idea whether that's the case here.)
As for how to do it, I'm not sure... but if I were a valued customer of various CPU suppliers, and were famous for the depth of my pockets, I'm sure I'd be able to find somebody to explain it to me ;)
Is low energy significantly different from high performance code? I thought the general advice was make the code as fast as possible so the work can be finished and the CPU slow/power down again.
Are there cases where an algorithm takes 10x the wall clock time to execute, but actually uses less energy on the same chip?
(Memory use/access is the main thing I guess that could be different.)
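One way to frame the question is energy = average power x time, counted over a fixed window so that the idle time after a fast run is included (the "race to sleep" case). The sketch below is a toy model only: every wattage and duration in it is a made-up number chosen for illustration, not a measurement of lzfse, zstd, or any real chip.

```python
def energy_joules(active_watts, active_seconds, idle_watts, window_seconds):
    """Energy over a fixed window: busy for active_seconds, idle for the rest."""
    if active_seconds > window_seconds:
        raise ValueError("window must cover the whole active period")
    idle_seconds = window_seconds - active_seconds
    return active_watts * active_seconds + idle_watts * idle_seconds


# Hypothetical numbers, not measurements.
WINDOW_S = 10.0  # compare both runs over the same 10-second window
IDLE_W = 0.5     # assumed power draw once the work is done

# "Fast" run: finishes in 1 s at high power, then idles for 9 s.
fast = energy_joules(8.0, 1.0, IDLE_W, WINDOW_S)

# "Slow" run: takes 10x longer, but at a much lower average power.
slow = energy_joules(1.0, 10.0, IDLE_W, WINDOW_S)

# Same slow run, but drawing half the fast run's power instead of an eighth.
slow_hungry = energy_joules(4.0, 10.0, IDLE_W, WINDOW_S)

print(f"fast: {fast} J, slow: {slow} J, slow_hungry: {slow_hungry} J")
# prints "fast: 12.5 J, slow: 10.0 J, slow_hungry: 40.0 J"
```

In this model the 10x slower run only wins if its average power is lower by more than the slowdown factor, after accounting for idle draw. Whether any real compressor lands on that side of the line is exactly the open question.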
>> Is low energy significantly different from high performance code?
It can be. Certain operations are more power efficient than others - subtraction and then a check for negative value instead of comparison, certain vector operations, loop unrolling, using less memory/bus traffic...
>> Are there cases where an algorithm takes 10x the wall clock time to execute, but actually uses less energy on the same chip?
Slower code can be more power efficient. You're just tuning for different results.
Interesting. Do you know of any open source examples offhand?
I'd imagine this also requires very detailed knowledge of the chip, microcode, etc. Probably hard for x86 but I guess this kinda stuff would be on ARM where the programmer has deep vertical access (like as mentioned, Apple).
That's what I was thinking. I haven't found any validation of, or even the rationale behind, lzfse's supposedly lower power usage. I can think of two possibilities.
1. Apple set out to write a fast version of LZ4 with a reasonable compression ratio, improving power usage along the way, and created LZFSE because nothing like it existed at the time. It has since been beaten handsomely by Zstandard.
2. Following a parent comment, Zstandard might do some things that depend on a highly OoO CPU with lots of cache and an extremely good branch predictor, and those could be significantly slower on ARM, even on Apple's chips, good as they are. Or LZFSE could still be slower on ARM, but with a smaller gap, so the decision would not be as cut and dried as it seems now.
Would love to know what the actual case is from someone involved in LZFSE.