Reminds me of FC8, a compression project from a few years back by Steve Chamberlin at Big Mess O' Wires.
FC8 is designed to be as fast as possible to decompress on "legacy" hardware, while still maintaining a decent compression ratio. Generic C code for compression and decompression is provided, as well as an optimized 68K decompressor for the 68020 or later CPUs. The main loop of the 68K decompressor is exactly 256 bytes, so it fits entirely within the instruction cache of the 68020/030. Decompression speed on a 68030 is about 25% as fast as an optimized memcpy of uncompressed data.
The algorithm is based on the classic LZ77 compression scheme, with a sliding history window and duplicated data replaced by (distance,length) markers pointing to previous instances of the same data. No extra RAM is required during decompression, aside from the input and output buffers. The match-finding code and length lookup table were borrowed from liblzg by Marcus Geelnard.
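The (distance,length) mechanism described above can be sketched in a few lines of C. The token format here is invented for illustration (it is not FC8's actual encoding): each group is a literal count, the literal bytes, then a match length and distance. Note the byte-by-byte match copy: it handles the overlapping case where distance is smaller than length, which a plain memcpy would get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal LZ77-style decoder sketch. Token format is invented for
 * illustration (NOT FC8's real encoding):
 *   repeat { [lit_count][literals...][match_len][match_dist] }
 * until the input is exhausted. Returns number of bytes written. */
static size_t lz_decode(const unsigned char *src, size_t srclen,
                        unsigned char *dst)
{
    size_t si = 0, di = 0;
    while (si < srclen) {
        size_t lits = src[si++];
        while (lits--)                  /* copy literal run */
            dst[di++] = src[si++];
        if (si >= srclen)
            break;
        size_t len  = src[si++];
        size_t dist = src[si++];
        while (len--) {                 /* byte-by-byte copy: correct even
                                           when dist < len (overlap) */
            dst[di] = dst[di - dist];
            di++;
        }
    }
    return di;
}
```

For example, the stream {3,'a','b','c',6,3} decodes to "abcabcabc": three literals, then six bytes copied from three bytes back, with the copy overlapping its own output.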
LZSA focuses on very fast decompression on 8-bit systems, so I suspect some of its variants could be used on MCUs as well, although matching the level of integration that Segger is selling with their compression method may not be trivial.
The author calls it "unmaintained", and Google Code was shut down in 2016; everything there is archived read-only. While such a project probably does not need regular updates, bugs are often found in code over time, and for this project there is effectively no place to report, discuss, fix, or integrate them. So "abandoned" does not seem inaccurate.
Yeah, it doesn't really have a proper home anymore, but it is pretty much finished as a library. I have used it for many years without a problem, though it may still have issues on edge cases I haven't hit. I mainly mention it here because it took me a while to find a small, simple library that can easily be embedded on smaller microcontrollers and uses no dynamic memory. Anything done well in that space is hard to come by.
Segger makes high quality stuff, but they’re expensive and their business model is “anti-modern”. The cost to use emCompress is $2500 USD.
Also, they specify that SMASH gets a worse compression ratio than LZMA but is an “order of magnitude less complex measured in lines of code”.
I’m not convinced that fewer lines of code means better performance on embedded systems. If anything, embedded code that has several thousand lines of code typically means there are hundreds of preprocessor macros that enable/disable certain paths depending on your architecture for performance.
Lines of code is not all that important. But program size is a key metric for microcontroller-type embedded systems, since the total available FLASH is typically in the 1-1000 kB range.
Same conclusion as others: no source, no figures, TFA is basically an ad.
There are already tools to compress on a PC and decompress on a small machine, with a more-than-decent compression ratio and decompression speed, for example Exomizer https://bitbucket.org/magli143/exomizer
As an example, a decompression routine for the Z80 is about 154 bytes of code, needs 156 bytes of private data, and the decompression speed is not that much lower than a memory-to-memory copy speed. It even has options to ease the case when decompressing in the same memory range as the compressed data so that you can load data in RAM and decompress in-place.
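The in-place option mentioned above can be illustrated with the same kind of toy decoder (an invented token format, not Exomizer's real one): the loader places the compressed stream at the end of the destination buffer, and decoding proceeds forward from the start. Since output is written behind the input read position, compressed bytes that have already been consumed can safely be overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-place decode sketch: the compressed stream sits at the END of
 * buf, and decoded output grows from the START of the same buffer.
 * Token format (invented for illustration):
 *   repeat { [lit_count][literals...][match_len][match_dist] } */
static size_t decode_in_place(unsigned char *buf, size_t buflen,
                              size_t complen)
{
    size_t si = buflen - complen;   /* read head: compressed data  */
    size_t di = 0;                  /* write head: decoded output  */
    while (si < buflen) {
        size_t lits = buf[si++];
        while (lits--)              /* literals: reader stays ahead */
            buf[di++] = buf[si++];
        if (si >= buflen)
            break;
        size_t len  = buf[si++];
        size_t dist = buf[si++];
        while (len--) {             /* match copy from earlier output */
            buf[di] = buf[di - dist];
            di++;
        }
    }
    return di;
}
```

In the general case the write head can catch up with the read head if the data compresses poorly in places, so real tools compute a small safety margin when placing the compressed stream; this sketch only demonstrates the buffer layout.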
While interesting, the niche for SMASH seems small.
You'd need a system big enough to have some RAM to decompress into in the first place, so that rules out the ultra-small systems with 256 bytes of RAM or less.
On mid-sized systems (say 16 kB+ of RAM), I think you'd still mostly run directly from ROM; any RAM is probably needed for other uses.
Larger systems with ARM M0 core (or better) and 64 kB+ RAM have a ton of CPU power and typically plenty of other resources as well.
Perhaps there are people who need to deal with mask ROMs, but do those still need to be that small in 2020? Or penny-pinching cases where you just can't add an extra I2C EEPROM / flash chip and your target uC doesn't have enough flash storage.
Maybe I'm missing something.
In any case, LZ-family compressors (like LZ4) have served me well so far.
Edit: That LZSA mentioned in other comment(s) looks cool. Regardless, a small niche of applications where you'd need this in the first place.
That would depend a lot on the architecture of the controller: not all of them let you execute code from RAM (Harvard vs. von Neumann architecture differences).
You would need to understand what the worst-case decoding time/operations are for a given-sized packet, and build your system/code around that.
Real-time does not necessarily mean that every operation X must take the exact same time; it just means that you know every operation X will complete within some known span.
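That bound can be turned into a concrete budget. If the decoder's inner loops do at most some fixed amount of work per output byte (true of simple LZ decoders, which emit one byte per inner-loop iteration), the worst case for a packet follows directly from its maximum decoded size. All the numbers below are made-up placeholders, not measurements:

```c
#include <assert.h>

/* Hypothetical figures -- replace with measured values for a real part. */
#define CYCLES_PER_BYTE_WORST 40ull     /* assumed worst-case decode cost */
#define MAX_DECODED_BYTES     4096ull   /* cap enforced by the packet format */
#define CPU_HZ                48000000ull

/* Worst-case decode time for one packet, in microseconds. */
static unsigned long worst_case_decode_us(void)
{
    return (unsigned long)(CYCLES_PER_BYTE_WORST * MAX_DECODED_BYTES
                           * 1000000ull / CPU_HZ);
}
```

If that figure fits inside the deadline with margin, variable per-packet decode time is acceptable in the real-time sense described above.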
Real-time systems are unfortunately named. Time-constrained systems would have been better. Variable naming is, after all, one of the two real hard things in Compsci.
I think yes in principle, although this is why you test things thoroughly. (In an embedded context there probably won't be much data flowing around, and the code will probably be simple enough that any glaring problems are fairly obvious.)
Ultimately it depends on whether you mean embedded as in a self-driving car, or embedded as in a microwave controller.
https://www.bigmessowires.com/2016/05/06/fc8-faster-68k-deco...