
Is there a unix-style streaming tool, like tar/zstd/age, that does forward error correction? I'd love to stick some ECC in that pipeline, data > zstd > age > ecc > tape, because I'm paranoid about bitrot. I search for such a thing every few months and haven't scratched the itch.

The closest I've found is something like infectious, which is more of a library than a tool.

I would prefer something in Go/Rust, since these languages have shown really high backwards compatibility over time. The last thing you want is to find out, 10 years later, that you can't build your recovery tool. I'd also accept some dusty C util with a toolchain that hasn't changed in decades.

https://github.com/vivint/infectious

OK, I just dug up blkar, which looks promising, but the more the merrier.

https://github.com/darrenldl/blockyarchive
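I haven't seen a drop-in pipeline filter either, but as a toy sketch (in Python, every name here made up) of what such a streaming FEC stage would do: buffer k data blocks, emit one XOR parity block per group, so any single known-lost block in a group can be rebuilt. Real tools like blkar and par2 use Reed-Solomon instead, which survives multiple losses per group:

```python
# Toy streaming FEC sketch: K data blocks + 1 XOR parity block per group.
# XOR parity only recovers ONE known-missing block per group; it's just the
# simplest illustration of the erasure-coding idea.

K = 4            # data blocks per parity group
BLOCK = 4096     # block size in bytes

def encode(blocks):
    """Yield each data block, plus one XOR parity block per group of K.
    (A trailing partial group gets no parity in this sketch.)"""
    parity = bytearray(BLOCK)
    for i, b in enumerate(blocks, 1):
        yield b
        parity = bytearray(x ^ y for x, y in zip(parity, b))
        if i % K == 0:
            yield bytes(parity)
            parity = bytearray(BLOCK)

def recover(group, lost_index):
    """Rebuild the single missing block in a (K data + 1 parity) group
    by XORing all the surviving blocks together."""
    acc = bytearray(BLOCK)
    for i, b in enumerate(group):
        if i != lost_index:
            acc = bytearray(x ^ y for x, y in zip(acc, b))
    return bytes(acc)
```

Because parity = d0 ^ d1 ^ ... ^ d(K-1), XORing every surviving block of a group reproduces whichever one was lost.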



If I'm not mistaken, the tape drive automatically adds ECC to each written block, then uses it to verify the block the next time you read it. So if there's bit rot on the tape (i.e. more than the ECC can fix), it will just be reported as a bad block with no data, and there wouldn't be any point in adding "second-order" ECC from the user end.


You’re exactly right. There is substantial ECC in the LTO format. If the drive can recover the data, then it’s valid.


There might be a point if you interleaved data and/or used much more error correction, such that you could recover from isolated bad blocks.


I'm not using tapes though, I'm using SSDs, flash drives, and HDDs. I'm sure they have some internal ECC, but is it enough?


Correct, all modern storage media apply plenty of internal ECC. That's basically what "bad sector" means: the drive wasn't able to correct the data in that sector using ECC. The drive will never return the actual "raw" data -- it's either valid data or a bad sector. This means that if you want to add your own application-level ECC, it needs to be enough to correct an entire sector's worth of missing data. For most applications that would be a prohibitive amount of extra overhead.
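For a rough sense of the tradeoff (illustrative numbers only, not from any particular product): an erasure code that must survive m whole lost sectors per group of k data sectors needs m parity sectors, so the overhead is m/k:

```python
# Illustrative parity-overhead arithmetic for sector-granular erasure coding.
SECTOR = 4096  # bytes per sector

for k, m in [(10, 1), (10, 2), (100, 4)]:
    overhead = 100 * m / k
    group_kib = (k + m) * SECTOR // 1024
    print(f"{k} data + {m} parity sectors ({group_kib} KiB group): "
          f"{overhead:.0f}% overhead, survives any {m} lost sectors")
```

Whether that counts as prohibitive depends on how many simultaneous sector losses you plan for and how large the groups can get before a repair job becomes unwieldy.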


It may not exactly be what you are looking for, but if you want to protect a stable data set from bit-rot after it's been created, make sure to take a look at Parchive/par2:

https://en.wikipedia.org/wiki/Parchive

https://github.com/Parchive/par2cmdline/
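If I remember the par2cmdline usage right (check `man par2` for your version; `-r` sets the redundancy percentage), the workflow looks like:

```shell
# Create recovery volumes with ~10% redundancy for an existing archive
par2 create -r10 archive.par2 archive.tar.zst

# Later: check integrity, and repair from the .par2 volumes if needed
par2 verify archive.par2
par2 repair archive.par2
```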


Parity archives used to be extremely popular back when dialup was king. I've often wondered if there's a filesystem that has that sort of granular control over how much parity there is. I'd use it, for sure.


ZFS is probably closest to what you want.

It allows you to choose the amount of parity at the disk level (as in: 1, 2, or 3 disks' worth of parity with raidz1, raidz2, and raidz3). You can also keep multiple copies of data around with copies=N (but note that when the entire pool fails, those copies are gone - this just protects you by storing multiple copies in different places, potentially on the same disk).

[edit] To add another neat feature that allows for granularity: ZFS can set attributes (compression, record size, encryption, checksum algorithm, copies, etc.) at the level of logical datasets. So you can have arbitrarily many datasets on a single pool with different settings. Sadly, parity is not one of those attributes - that's set per pool, not per dataset.
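For example (pool and dataset names made up; these are standard `zfs` commands, but verify the property names against your platform's man pages):

```shell
# Two datasets on the same pool with different attributes
zfs create -o compression=zstd -o copies=2 tank/documents
zfs create -o recordsize=1M -o checksum=sha256 tank/media

# Attributes can also be changed after creation
zfs set compression=off tank/media
zfs get compression,copies,recordsize tank/documents
```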


My only issue with ZFS is that it really only works (ideally, with least effort) with multiples of the same-size disk. I got into a bit of an argument on IRC with someone who said "just partition, ZFS can figure it out", and I'm thinking of all the 128GB and 256GB SSDs, the 640GB spindles, and so on that I want to actually use for data; it sounds like more hassle than it's worth. I'd rather have something like "take everything on this 2TB drive, and put the .PAR/.Pnn archives on at least 2 other filesystems", be they two 1TB drives, or two 128s, a 256, a 512, and a 1TB.

Eventually I think I'll start populating 2-4 SATA drives in my 15-25W TDP Celerons, Atoms, and so on, and just give them away to people who need a computer for whatever. I'll even toss in a GTX 1050 Ti.


ZFS is king imo. Btrfs is the more liberally licensed OSS competitor and ReFS is the Microsoft solution.


Still extremely popular (as in the norm) on Usenet


So, while others have pointed out that the media blocks are ECC protected, I think what you're really looking for is application/FS control. LTO supports "Logical Block Protection", which is metadata (CRCs) tracked and checked alongside the transport-level ECC on Fibre Channel and in the drive itself.

Check out section 4.9 in https://www.ibm.com/support/pages/system/files/inline-files/....

To be clear, this is a "user"-level function that basically says "here is a CRC I want the drive to check and store alongside the data I'm giving it". It needs to be supported by the backup application stack if one isn't writing the drive with SCSI passthrough or similar. It's sorta similar to adding a few bytes to a 4K HD sector (something some FC/SCSI HDs can do too), turning it into a 4K+X byte sector on the media that gets checked by the drive along the way, vs. just running in variable block mode and adding a few bytes to the beginning/end of the block being written (also possible, since tape drives can support blocks of basically any size).

The problem with these methods is that one should really be encoding a "block id" as well, describing which block this is and where it belongs, since it's entirely possible to get a file with the right ECC/protection information that is the wrong (version of the) file.
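A minimal sketch of that idea (the record format here is entirely made up): compute the CRC over a block id plus the payload, so a stale or misplaced block fails verification even when its payload is internally self-consistent:

```python
import struct
import zlib

# Sketch of user-level block protection in the spirit of LTO Logical Block
# Protection: a CRC stored alongside each block, computed over the block id
# AND the payload. Record layout (made up): 8-byte big-endian block id,
# 4-byte CRC32, then the payload.

def protect(block_id: int, payload: bytes) -> bytes:
    header = struct.pack(">Q", block_id)
    crc = zlib.crc32(header + payload)
    return header + struct.pack(">I", crc) + payload

def check(block_id: int, record: bytes) -> bytes:
    header, crc, payload = record[:8], record[8:12], record[12:]
    if header != struct.pack(">Q", block_id):
        raise ValueError("wrong block id (stale or misplaced block)")
    if struct.unpack(">I", crc)[0] != zlib.crc32(header + payload):
        raise ValueError("CRC mismatch (corrupt block)")
    return payload
```

Reading block 7 and getting a perfectly valid record for block 6 is caught by the id check, not the CRC, which is exactly the failure mode a bare checksum misses.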

So, while people talk about "bitrot", no modern piece of hardware (except Intel desktops/laptops without ECC RAM) is actually going to return data that is partially wrong, because there are multiple layers of ECC protecting it. If the media bit-rots and the ECC cannot correct it, you get read errors.


There's gotta be an API to get the raw data even if it's wrong, right?


Not usually; it's the same with HDs. You can't get the raw signal data from the drive unless you have special firmware or find a hidden read command somewhere.

The drive can't necessarily even pick "wrong" data to send you, because there are a lot more failure cases than "I got a sector but the ECC/CRC doesn't match". Embedded servo errors can mean it can't even find the right place, and then there are head-positioning and amp-tuning parameters which generally get adjusted dynamically on the fly. AFAIK this is a large part of why reading a "bad" sector can take so long: the drive is repeatedly rereading it, adjusting/biasing those tuning parameters to try to get a clean read. And there are multiple layers of signal conditioning/coding, usually in a feedback loop. The data has to get really trashed before it's not recoverable, but when that happens it's good and done. (Think about how massively CDs can get scratched/damaged before they stop playing.)


   man par2



