Backblaze Storage Pod 4.5 – Tweaking a Proven Design

ricardobeat · on March 7, 2015

I really enjoy these posts, and would be a happy Backblaze customer if they didn't have this (unpublicized) policy of deleting external hard-drive backups after 30 days:

    Backblaze works best if you leave the external hard drive attached to
    your computer all the time. [...] If the drive is detached for more
    than 30 days, Backblaze interprets this as data that has been
    permanently deleted and securely deletes the copy from the
    Backblaze datacenter.

This probably reduces the total amount of storage used, making the $5/month economically viable. I'd happily pay more if it weren't for this though, currently paying ~$12/month for Bitcasa.

atYevP · on March 7, 2015

Yev from Backblaze here -> we do have that on our website (https://www.backblaze.com/cloud-backup.html and https://www.backblaze.com/remote-backup-everything.html) and in some help articles but I hear you. It's a topic that comes up, but when we've done the calculations in the past, keeping a copy forever as an archive raises our costs to the point where we'd have to raise our prices and we're not too keen on that! We have notification emails so that we let you know ahead of time before data is removed though, so it's rarely become an issue in terms of "accidental deletions".

free652 · on March 7, 2015

You don't have it here -> https://www.backblaze.com/cloud-backup.html. There is no mention of 30 days. I am so what pissed about since I bought the service just to backup my external drive.

ricardobeat · on March 7, 2015

Hi Yev. It is indeed on the website, but only in the second link you mentioned. I only found out after signing up for the service last year, and then decided to cancel. I wanted to backup photo collections that I might only access once a year - not very keen on having to do a maintenance scanning session once a month. Maybe you could come up with a separate plan, with limited storage, that solves this use? In any case, thanks for taking the time to reply.

atYevP · on March 7, 2015

Yea, there are lots of ways to address it, and we talk about it every now and again though haven't come up with an elegant solution yet! One the first link it states that deleted files will be removed after 30 days. A removed hard drive acts the same as a deleted file, since it's no longer on your computer. I'll chat with our team and see if we can make it clearer.

carterf · on March 7, 2015

Damn, I didn't know that either. Been a customer for years and wish I had seen that earlier. I've also had many lapses in time. As long as I'm paying why would you ever delete my data?

atYevP · on March 7, 2015

You should be getting alerts whenever you're nearing a time lapse (up to two weeks ahead of one) so we do try to warn you when your nearing the 30 day timer (if not please contact support - seriously). The reason we have a 30 day timer is that we use the reclaimed space, if we were to infinitely house a copy of every file that was uploaded to us, it would dramatically raise our costs, and at $5/month for an unlimited product our margins are very thin as it is. Most companies that do keep data "forever" augment that cost by having either tiered pricing, or "per GB" pricing so that they can continue covering the cost of the space for as long as that file exists. Since we're unlimited and inexpensive for the service, we don't have that elasticity. That's not to say we haven't thought about enacting an additional fee for that type of service, but we haven't quite figured out a good way of doing it. But we're constantly thinking of ways to better the product, and know this is one that people do feel strongly about!

DenisM · on March 7, 2015

So if I go on a 4-week vacation, I will return to an empty account? Uh oh. That should be printed in bold red letters everywhere. It's the opposite to peace of mind.

atYevP · on March 7, 2015

No, if your computer is offline, we keep data for 6 months, this is primarily for externals that are plugged in and out.

matthewmacleod · on March 7, 2015

I'm a Backblaze user, and think this is a pretty reasonable policy — Backblaze is a backup system with a 30-day history, and not an archival or "unlimited online storage" system.

In addition, that policy seems to be pretty obvious. I definitely knew about it before I signed up, I'm sure it's signposted in the app itself, and I receive email notifications about a drive that's not plugged in.

All-in-all, seems pretty reasonable.

ejdyksen · on March 7, 2015

+1

This is a deal-breaker for me. I have limited SSD space on my MBP, so my Lightroom catalog is on an external hard drive. I can easily go 30 days between plugging it in.

That drive contains, by a very wide margin, my most important and least replaceable files.

gry · on March 7, 2015

What is your backup solution?

ripdog · on March 7, 2015

I'm not the person you replied to, but I use crashplan specifically because they offer utterly unlimited backup for the same price. Deleted files and unlimited file revisions are kept forever.

I have no idea how they do it, but they've been around for a fair while.

ejdyksen · on March 7, 2015

I've been a CrashPlan customer for about 7 years.

nonissue · on March 7, 2015

It actually makes sense. If you think about it, if every external that anyone ever plugged in was backed up and then never seen again, that could be sooo much data. What if that drive was used, wiped, renamed and then used again? You potentially have the same drive on BB twice.

A way to alleviate the issue would be to charge a small fee per drive or per account for persistent external back ups. If it was 1-2$ a month extra for the peace of mind that a drive would always be backed up, I'd gladly pay it.

At the end of the day, you have a choice as a consumer and you went with bitcasa. I'm happy paying $5/m for unlimited backup for my main computer. All said and done it's an amazing deal, even when you consider the external drive issue.

_ojkn · on March 8, 2015

I know you might not care about the security and you're only using them for the cheap storage but just keep in mind that Bitcasa aren't as secure as they claim they are as they're able to provide your data to third parties via both their web interface and their API, which means that they possess all decryption keys. i.e. the second section of the following quote from their 'Personal' page is false:

"Bitcasa keeps all of your files secure by applying client-side, AES-256 block-level encryption before they are even uploaded to your Personal Drive. Only you have access to your content. Not even Bitcasa employees can access or determine what files are in any given user’s account."

I was a customer of theirs when they first launched beta but ever since they've been screwing with their consumer customers with seemingly little thought to the consequences and their marketing copy is dangerously deceptive.

I've since moved to SpiderOak, who are quite transparent about their security and have updated their pricing to be more competitive.

grandinj · on March 7, 2015

Aaaargh. Yet another idiotic website that disables zoom gestures, making it unreadable on my iphone.

k1t · on March 7, 2015

  Secure Connection Failed

  An error occurred during a connection to www.backblaze.com.
  Cannot communicate securely with peer: no common encryption algorithm(s).
  (Error code: ssl_error_no_cypher_overlap)

Oh dear, please upgrade your SSL certs.

duskwuff · on March 7, 2015

This has nothing to do with the SSL certificate. The site is configured to use RC4, which used to be the recommended cipher to avoid certain attacks (particularly BEAST). However, RC4 has other weaknesses, and this is no longer recommended. Current versions of Firefox will actually refuse to negotiate an RC4 connection.

sp332 · on March 7, 2015

Doesn't that mean that the certificate is using the outdated RC4 cipher, and should be updated?

skuhn · on March 7, 2015

No, the SSL cert does not use RC4. RC4 is a block cipher which is used to encrypt data transferred between your browser and the server, akin to AES.

Backblaze does need to step it up in terms of their SSL configuration [1], particularly if this is indicative of their configuration for actual file transfer (no idea though). If I had to guess, they're using their OS's OpenSSL and thus trapped on 0.9.8 or 1.0.0 (both of which do not support TLS v1.2) and maybe haven't spent much time tuning the config lately.

A lot of places are afraid of diverging from their OS vendor's version of OpenSSL, but I personally think it is a mistake to get stuck in time with such a crucial component. Likewise if your service is built around nginx -- are you going to stick with whatever RHEL 5 shipped for 8 years?

[1] https://www.ssllabs.com/ssltest/analyze.html?d=www.backblaze...

WatchDog · on March 7, 2015

RC4 is a stream cipher, not a block cipher.

miduil · on March 7, 2015

@brianwski (or any other backblaze volunteer) We still can't visit your website. https://news.ycombinator.com/item?id=8999036 It's been about a month ago we mentioned this issue.

Scarbutt · on March 7, 2015

So they don't offer a linux client to keep people away from running it on servers? ;)

fraXis · on March 7, 2015

Duplicate: https://news.ycombinator.com/item?id=9154273

slyall · on March 7, 2015

I submitted and got that one. Problem is it had already vanished from the "new" page so wasn't going to make the front, ever.

Resubmitted with "Backblaze" in the title to get better traction. I got lucky this time, sometimes I submit something and get nowhere while another submit is the ones that takes off.

Maybe a tweak to the new page so that if somebody resubmits a URL that has fallen off the new page it gets pushed back on there. The current system is a bit of a lottery, just one chance for a URL to make it and then it's gone.

atYevP · on March 7, 2015

Weird how timing is everything, right?

omarforgotpwd · on March 7, 2015

Why design your own storage pods when services like Amazon S3 offer similar prices, even with bandwidth?

skuhn · on March 7, 2015

It's not trivial, but it is fairly easy to beat S3 on storage costs alone, let alone the other factors.

I personally wouldn't use Backblaze's storage pod, so my numbers are based on equivalent components (and don't reflect what's possible if you negotiate prices and optimize for your particular needs -- Backblaze's costs are surely lower than this):

  11 Supermicro 36 drive 4U server : $4000 each
  1 48-port top-of-rack switch : $5000 each
  374 4TB drive : $400 each
  22 500GB drive : $200 each
  2 CDU : $3000 each
  Rack + integration : $10000 each

Total rack cost is $220,000 for 1.4PB or 700GB with basic 2-copy redundancy. Power draw is around 8kw, so 1 year of power is around $1100. Figure on a drive failure rate of 5%, memory 2%, PSU 3% and you might spend another $18,000 to stock spares. So besides your core datacenter costs, this solution will cost $239,100 to operate for a year. These costs only improve when you amortize the hardware over a 3 year period, which is standard.

Now how much does S3 cost to store 700GB for a year? $248,530.94.

Finally, consider bandwidth costs. I'll crib from a recent comment of mine [1]:

1 gbit/s in a datacenter: $30,000 / year 500 mbit/s in S3: $142,540 / year

And that in a nutshell is why AWS doesn't make financial sense at any significant scale. S3 makes sense if you aren't really going to use all of that space: you can scale your spending up as your needs scale up. That isn't the kind of problem Backblaze or other storage businesses have, there isn't much chance of that storage going unused for any extended period of time. And if they architect their datacenter and service correctly, the down period between rolling in hardware and getting it utilized sufficiently to begin seeing a return on the investment should be minimal.

[1] https://news.ycombinator.com/item?id=9139244

toomuchtodo · on March 7, 2015

Thanks for doing the math. I've repeatedly shown people (I do Infrastructure, sometimes AWS sometimes physical) that AWS is awesome for proof of concept. All opex, no commitments. As soon as you start getting to a certain scale though, time to move onto physical equipment when you've got predictable load patterns.

If Stackoverflow can still do hardware colo'd, so can you.

skuhn · on March 7, 2015

I completely agree. I sometimes take flack for "hating" AWS (and I do hate some aspects of it), but I think it's a great platform for incubating your service or proving a concept. Every company I've been at in the last five years has used AWS to an extent, despite operating their own datacenters. It's great for random analytics jobs or other tasks that aren't steady work tied to your core business.

It quickly starts to not make sense for services that have usage-driven growth patterns. AWS's billing model just doesn't work for this -- even though higher usage is discounted, it isn't enough to overcome the skew of the model. Companies that have deployed on AWS often realize this too late: their business starts to take off, and they get crushed under giant AWS bills that would be disproportionate even at 50% off list. It's much harder to move off once you've hit scale.

acdha · on March 7, 2015

A lot of this also comes down to ops & the features you really need – e.g. your example above didn't include the costs of the geographic redundancy or strong data integrity features which are standard with S3. A lot of places simply don't have geographically redundant datacenters and either lack the support team to run something like a replicated cluster filesystem in heavy production usage or, more commonly, don't have enough usage to really push it into clear win.

In the case above, the cost differential at 700TB is roughly the cost of a single sysadmin and so the running operation would probably be a wash versus S3 unless it had heavy data churn to maximize the S3 expenses or a good way to amortize the staffing needed for 24-hour-a-day support across other projects. Many places have budget processes which favor spending more on known upfront costs than getting permission to add full-time employees in the hopes of being able to beat those costs down the road.

I'm hoping that something like OpenStack Swift matures to the point where some of the complexity at the software + ops level starts making the DIY option routine. You'll never get rid of the minimum sysadmin commitment but that becomes a lot easier if it can be 5% of the night-shift because the software doesn't require a lot of care and feeding.

skuhn · on March 7, 2015

Yeah, my solution is a gross oversimplification -- my math is just back-of-the-napkin stuff, you can actually do much better on component pricing (I certainly don't pay $5000 for a 1G top-of-rack switch) and there are other variables to consider.

To quickly address some of your points:

Geographic redundancy: not included. From a hardware perspective it's a pretty easy upgrade (build two datacenters in CA / VA, buy two racks, put one in each). From a service perspective, it's a bit more challenging. If storage is your business, these are the core components you should own though.

The US Standard region in S3 does sort of provide geographic redundancy, but it's a question of whether it's good enough for your use case. S3 is a complete black box, and you have no way of knowing with certainty that every data blob is actually in both geographic regions. Besides that, it's eventually consistent, so if you need absolute assurances that data is not lost after a write succeeds, you have to implement logic on top of S3 -- and that logic must run in EC2 instances, or you'll pay a fortune in bandwidth to execute it.

No other region in S3 provides geographic redundancy (or eventual consistency, the two are related), so if you wanted to deploy storage in Europe or APAC, you have to handle the geographic distribution yourself -- and it will cost double to store that data.

The small cost difference: 1 rack over 1 year has a pretty small differential between the two. But you can optimize the datacenter numbers, and the S3 numbers are what they are unless you get massive. For example, over 3 years the datacenter cost goes to around $275,000 whereas S3 is $745,000. That's a serious difference now. Ownership is a powerful advantage.

OpenStack Swift: I don't think it will ever be a good solution, based on the project's history. It's unfortunate, but hopefully something else will come along that does blob storage in a sane and scalable way. This is not rocket science, but there are some devils in those details. If a mature open source solution came along with immutable objects, erasure coding, background scrubbing and sane ops tools, it would blow Swift out of the water.

Finally, I think it's a (common) mistake to think that you can get by without ops personnel if you deploy on AWS. The actual site / dc ops component is pretty minor if you build right. Either way, someone still has to run the service, do capacity / project / budget plans, and other things. And with AWS you need people who are capable of winning arguments with their support -- where you will burn substantial time proving where fault lies.

S3 does a lot for you, but it's still a service that you build yours on top of and it's far from a perfect solution that never breaks. Buckets have to be primed before heavy usage. Your file naming scheme will impact your performance at scale. And so on. You need people to look at this, whether you call them ops engineers or sysadmins or developers.

acdha · on March 7, 2015

> Geographic redundancy: not included. … If storage is your business, these are the core components you should own though.

Definitely – part of what I was thinking about is that it's also a good point to actually ask what you actually need. My experience has been in fields (science, digital preservation) where it turns out that people need large amounts of storage and strong guarantees against data corruption but online access is less critical since almost everyone works with more manageable derivatives than the original data. Unlike, say, a database there also aren't concerns with multiple writers since each batch of data originates in one place and should rarely if ever change, even though multiple locations might be generating other batches at the same time which everyone would want to read. That offers a ton of optimizations which are much harder to do if you're trying to maintain the same public contract as S3.

> OpenStack Swift: I don't think it will ever be a good solution, based on the project's history

Sadly, I've been coming to the same conclusion. I've been meaning to look at GlusterFS again as they've been adding things like erasure coding & the API portion of Swift but last time I checked they still haven't added strong integrity checks or scrubbing.

> Finally, I think it's a (common) mistake to think that you can get by without ops personnel if you deploy on AWS. The actual site / dc ops component is pretty minor if you build right. Either way, someone still has to run the service, do capacity / project / budget plans, and other things.

Agreed. The main thing I was referring to is that if you have to run a 24x7 service you have certain minimum staffing requirements. That's easily justified if you have enough demand but a small operation even inside a large company might find it easier to let AWS handle the basic ops so their staff can handle the higher-level ops work during normal business hours. (This is again a good opportunity to review the actual business needs to ask whether you really need immediate responses at 3am or can live with a delay while someone gets paged)

vidarh · on March 7, 2015

It's funny how people thing AWS does away with ops. But doing ops work for people on AWS is highly lucrative - people have a really hard time getting it right.

viraptor · on March 7, 2015

I'm curious - since you went through trying out swift, when was that? Pre, or post diablo? That was the time of a pretty big change.

Also, have you tried ceph at the same scale?

skuhn · on March 7, 2015

Post-Diablo, and it never got to a point where I deployed it at scale and found issues: it flunked out before that.

Haven't tried ceph, although I look at it every few years. It sounds promising but it also sounds really complicated to setup and administer.

toomuchtodo · on March 7, 2015

It seems we've traveled the same path at different times :)

parasubvert · on March 7, 2015

So, why do Instagram and Netflix continue to use S3 for petabytes of video and image data? In Netflix' case, this doesn't even count their vast replicated CDN infrastructure between your device and S3.

The math is never this simple.

Firstly I assume you mean 700 TB, not GB.

You might as well look at reduced-redundancy S3 which brings the annual cost down to $194,220. With that you still have multiple WAN links, excellent throughput, multiple data centers, and no labour for dealing with failed drives.

And if we're truly trying to do backup / archival, where download throughput doesn't matter, let's look at Glacier, even if we do 50000 monthly uploads and retrieve 20% of the 700TB monthly, we're down to $96,630.

Most sizable businesses also aren't buying bargain basement drive arrays from SuperMicro, they're buying EMC, NetApp, 3PAR, or Hitachi.

As with anything S3 is not perfect, but it really depends on your use case if your own data center or rented equipment is appropriate. For many large businesses, Amazon makes more sense.

skuhn · on March 7, 2015

> In Netflix' case, this doesn't even count their vast replicated CDN infrastructure between your device and S3.

There is part of your answer. They only pay S3 outbound on cache misses, not on cache hits. That will lower their S3 bandwidth bill to 1/50th or 1/100th of the full cost.

Also, Netflix doesn't pay list prices for AWS, and they made a call that they wanted to be fully cloud based early in the streaming service's life. Maybe they would do it differently today, but that ship has obviously sailed for them.

On the other hand, lots of large AWS hosted services do discover that they need to move off in order to get their costs right; I have quite a bit of first hand experience there.

> Glacier

Costs and time to retrieve data from Glacier are quite different, and I would only use it for data that I basically never expect to access. A service like Backblaze couldn't use Glacier in reality, despite being a service for backups.

> Bargain basement drive arrays

On the contrary, I think that building it yourself with basic hardware building blocks is much more common at large scale Internet companies. A less technical or more appliance driven company might choose NetApp or EMC, but I don't think it's the right move for somewhere with some in-house expertise. It's the right move if you want more of a black box that you hope will work without much internal effort (hint: it won't work out that way).

moe · on March 7, 2015

Your cost estimates are outlandish, making Amazon S3 look like a bargain.

In reality 1.4PB raw will set you back well below $50k/pa, not $239k/pa.

If you're lazy you could even rent 1.4PB raw from e.g. Hetzner and you'd still be paying under $90k/pa (that would be 23 of their SX290 servers).

skuhn · on March 7, 2015

You can do significantly better with the right vendors and some negotiation -- this is specifically a pricing ballpark, not rock bottom deals. It's not outlandish at all to say that in terms of raw storage you're going to be close to S3's pricing, but it also doesn't reflect well on S3 at all. They are buying at a scale far beyond what you can hope to achieve.

This is money spent on physical equipment that you can choose to lease, that you can take depreciation on, that you can resell -- these are assets, not a line item on a monthly bill. S3 has no such benefits. And this doesn't even taking into account bandwidth, which is the real killer. The stored data has to come out sometime.

1.4PB of raw storage requires ~374 4TB hard drives. If you want to spend $50,000, you have at most $130 to spend on drives, and that's only if you get literally everything else for free including electricity. You can almost buy 4TB desktop class drives for that price from Newegg, but not quite.

Don't buy drives from retail channels, and don't buy desktop drives at all (yes, I know Backblaze does both of these things). HGST 4TB MegaScale or UltraStar drives are a safe bet, depending on your performance requirements. They run between $250-300 in quantity, which means 1.4PB is $93,500 in disks alone.

moe · on March 7, 2015

Stop switching around between hardware cost and yearly cost, jesus... (pa means per annum)

0.7PB on Amazon S3 costs $810k for 3 years (as per their cost calculator).

Renting 1.4PB raw from Hetzner costs about $270k for 3 years.

Running 1.4PB raw on your own hardware costs about $150k for 3 years (on Supermicro JBODs).

It's not that hard really, and my point merely was that the difference is much bigger than you made it sound.

parasubvert · on March 7, 2015

Apples, to apples please.

0.7PB on Amazon S3 reduced redundancy is $582,660 over three years with geo-redundancy and availability across an entire region, and lifecycle to push to Glacier which would drop the price to $289,872 annually given 50,000 monthly uploads and 20% monthly retrieval. No labour when it comes to dealing with equipment failures.

I'd also say that only SMBs and web/internet companies run the equivalent SuperMicro JBODs, most large companies use enterprise storage vendors like EMC, NetApp, HP/3PAR which are premium priced.

moe · on March 7, 2015

Apples, to apples please.

You are right, I should have compared to S3 reduced redundancy, sorry.

So we're down to $582k (S3-RR) vs $270k (Hetzner) vs $150k (own hardware).

That's still a healthy 2x-4x difference.

Glacier

If you introduce Glacier then it's not apples-to-apples anymore.

parasubvert · on March 7, 2015

"If you introduce Glacier then it's not apples-to-apples anymore."

This is a thread about storage costs for backup as a service, so I felt at least it's in the same family of tree-bearing fruit ;)

vidarh · on March 7, 2015

I've done similar calculations for clients, and fully agree.

Cutting the storage costs is easy, but bandwidth in particular is ridiculously priced at Amazon (for the cost of 1TB transfer from AWS, I can rent a server with many times as much transfer and still have money left over), to the point where even if a client for some reason insists on S3 storage for perceived safety, it can still pay to store in S3 but put boxes "in front" with capacity for a single copy of each object, and fall back to S3 in case of local drive failures.

Unless the load is totally write dominated (so not good for backup services), cutting the AWS bandwidth bill can often pay for the extra servers and colo costs many times over.

jrs235 · on March 7, 2015

I think you mean 700TB not 700GB

skuhn · on March 7, 2015

Right, just a unit problem -- the pricing is still correct. I seem to make this mistake a lot lately, since everything is billed per-gig or per-meg, but you only really care about per-TB units. Oops.

atYevP · on March 7, 2015

Don't you mean...ops? Hah...sorry puns :-/

nullc · on March 7, 2015

>1.4PB or 700GB with basic 2-copy redundancy

You mean 700TB.

atYevP · on March 7, 2015

Yev from Backblaze here -> When we first started producing the pods, we wanted to do two things: 1) provide unlimited backup 2) provide it for $5/month. We did the math at the time and found that for $5/month you'd be able to store about 30GB a year on Amazon S3 (at the time it was about $0.15/gb). So we needed to create our own solution to this problem and designed storage pod 1.0, which provided 67 terabytes for about $7500. We figured the average person would store more than 30GB of data so the Amazon route wouldn't work. Ours did and continues to, based on a variety of factors but the main one being, as Amazon's costs decrease, so do ours (except for the whole Thailand thing...https://www.backblaze.com/blog/backblaze_drive_farming/ and https://www.backblaze.com/blog/farming-hard-drives-2-years-a...). Plus it's kind of nice controlling your own destiny!

e12e · on March 7, 2015

How do you figure 0.03 per month is similar to 0.05 per 4? years?

[Ed: Not to mention, from the comments: "our drives [have done] roughly 4 gbps of traffic over the last week." I don't know about you, but I'm not doing 4 gbps sustained to s3 ...]

eCa · on March 7, 2015

Regarding your edit, if I read it correctly that quote is not from Backblaze.

e12e · on March 7, 2015

Yes, that's true. But parent asked why one would want to build a pod. It's an example of why one (not necessarily Backblaze) might want to build a pod.

kylec · on March 7, 2015

Backblaze did the math a few years ago and found that Amazon S3 was about 24x more expensive than building their own pods, for an estimated pod lifetime of about 3 years:

https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-...

vitovito · on March 7, 2015

S3 is $0.03/GB/month.

Your own pod is $0.048/GB/once.

ashish01 · on March 7, 2015

To be fair 0.03 also includes operation costs like power and space, replacements. I _feel_ that a backup company will have an operational advantage having its own storage but its more complicated than comparing cost per GB.

mbrameld · on March 7, 2015

Out of curiosity, did you assume that they just hadn't thought of services like that and didn't run any numbers to see if what they were doing made sense?