It's generally easier to fix an incorrect program than corrupt data. You're not ...

adw · on Feb 10, 2017

Yep. Everyone winds up with some war story about cleansing some multi-petabyte data store – but the better data engineers I know try very hard to avoid having two of them.

stouset · on Feb 10, 2017

Yeah. I'm a huge fan of dynamically (yet strongly) typed languages like Ruby. They absolutely have their place. But weakly-typed data persistence or data interchange are absolutely terrible.

krylon · on Feb 10, 2017

It's not just about type safety, though. A good RDBMS enforces things like referential integrity (FOREIGN KEY constraints) and allows you to express further constraints on your data (e.g. order.amount must be a positive number, a certain combination of columns must be unique across the table, etc.)
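
To make that concrete, here is a minimal sketch of the three kinds of constraint mentioned above, using Python's built-in sqlite3 module as a stand-in for a full RDBMS. The table names, column names, and sample values are hypothetical, chosen only for illustration. Note that SQLite only enforces foreign keys once the pragma is turned on for the connection.

```python
import sqlite3

# In-memory database; SQLite stands in for any RDBMS here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical schema: names are illustrative, not from any real system.
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
        reference   TEXT NOT NULL,
        amount      NUMERIC NOT NULL CHECK (amount > 0),        -- must be positive
        UNIQUE (customer_id, reference)                         -- combination must be unique
    );
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute(
    "INSERT INTO orders (customer_id, reference, amount) VALUES (1, 'A-1', 25)"
)


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    # customer 99 does not exist, so the FOREIGN KEY constraint fails
    "missing customer": rejected(
        "INSERT INTO orders (customer_id, reference, amount) VALUES (99, 'A-2', 10)"
    ),
    # a negative amount violates the CHECK constraint
    "negative amount": rejected(
        "INSERT INTO orders (customer_id, reference, amount) VALUES (1, 'A-3', -5)"
    ),
    # (1, 'A-1') already exists, so the UNIQUE constraint fails
    "duplicate reference": rejected(
        "INSERT INTO orders (customer_id, reference, amount) VALUES (1, 'A-1', 40)"
    ),
}

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(results, order_count)
```

The point is that the bad rows never reach the table, no matter which application or script tries to write them.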

And most (all?) RDBMS that have been around for a while have been tuned and optimized to support this efficiently. I remember reading that at some point (1980s-1990s-ish) DBMS vendors were buying compiler developers like crazy to help them with query optimization and such.

digitalzombie · on Feb 10, 2017

Seriously, eventually the data have to be structured somehow so you can make use of it.

Hence you end up writing MapReduce jobs down the road.

Might as well do it upfront with a relational DB.