I'm going to get flamed for this, but I think, even for a data startup, worrying about scaling before you have users or traction is premature, especially with how cheap hardware is becoming.
Case in point: I'm currently hacking together an inefficient, unoptimized prototype analyzing pretty large datasets on probably the worst architecture for this kind of thing known to man, and the whole thing still runs pretty well on a single $50 VPS.
Do you have full control over the amount of data your system is taking in?
The startup I founded had analytics code in a ton of iPhone applications and was handling the load just fine right up until the day it suddenly wasn't. By that point we had customers who relied on us, and we had to deal with it very quickly. Not fun. And there's certainly more to scaling than just cheap architecture. We thought EC2 would handle the overflow until we unexpectedly became completely I/O bound. Firing up a few more instances can't fix that.
If you're just running some scraper and can control what you're taking in, that's a completely different story.
You're absolutely right; I hadn't considered analytics as an example.
Some data startups I've seen, as well as my own project, take in existing data sets and simply generate reports from them for customers. That makes it a lot easier to scale.
I think there's a happy middle ground between premature optimization and naive development.
While the former shouldn't be allowed to impede one's progress toward an MVP, real customer feedback, and the potential need to adapt or pivot, neither should one ignore early optimization decisions that are inexpensive and would only minimally impede (if at all) that progress.
Being able to recognize the difference is a talent that comes with experience.