Why SSDs are worth the money (boingboing.net)
104 points by peteretep on June 17, 2011 | 72 comments


Something to consider before going the SSD route: failure rates are high right now.

The Intel X-25 series seem to be doing OK, but the Vertex2 and Vertex3 (brand new SandForce 2k-based drives) have really terrible failure rates. I've heard mixed things about the Crucial drives.

I am sure you guys have read Jeff Atwood's post[1] about how every SSD they've put into machines in the last year has failed. I shared that around with a group of friends scattered around Chicago and the Bay Area. The friend in Chicago has a company with 30 employees; they have put SSDs in 12 of the developer machines, and by the end of the year 10 had failed (Vertex2 drives).

I think SSDs have a lot to offer - the performance is there for the right use-cases (which is almost every disk I/O scenario save for a few) - but for a lot of folks the idea of having drives fail regularly is a suicide-inducing thought (current company included).

I'm sitting on my hands on an SSD not because I think they aren't awesome, are too expensive or aren't big enough... I'm waiting because the failure rate seems to be painfully high in these things. I want them to have some more time in the oven before I start sticking them in my work machines.

There also seem to be other, non-failure-related issues, as seen with the Vertex3 launch that everyone jumped on; look up "Windows BSOD Vertex3"[2] for a long list of issues. At least enough to give you pause before buying one.

I have an 11-year-old Fujitsu drive that is still running, an 8-year-old Seagate 15k SCSI drive that is still chugging, and the two SATAs in my desktop now are both 3 years old (no RAID)... I expect 5 years out of a drive at this point. Turning the speed up to 11 at the expense of digging back into my workstations or servers to replace busted drives and rebuild RAID arrays... uggggg... at least for me that sounds super painful.

The only SSDs I would consider at this point, if for some reason I had to get one, are the Intel series, namely the 500 series. The X-25s (as mentioned) are still going strong with low failure rates, and there is nothing to suggest the newer releases are any different. I'd rather see some 5-year results from those things before pulling the trigger though, which is why I haven't yet. I also haven't needed the insane speed.

[1] http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid...

[2] http://www.google.com/search?sourceid=chrome&ie=UTF-8...


SSDs are meant for boot and application drives where lots of random I/O happens. You wouldn't normally use them in RAID except to RAID0 them for better performance and more space.

Instead of anecdotes about failure rates, it's best to look at data. I don't know of any large-scale studies of SSD reliability, but http://forums.storagereview.com/index.php/topic/29329-ssd-fa... looks at return rates, which correlates with failure rates. Intel's return rate is 0.6%. The other SSDs are 2-3%. Hard drives vary from 1-6% for 1TB and 3-10% for 2TB drives. SSDs are more expensive than hard drives, so their return rates should be higher for the same failure rate (people are more likely to return a broken $300 SSD than a $50 hard drive).
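
To make that correction concrete, a toy back-of-the-envelope in Python (the return-propensity figures are made-up assumptions, not measured data):

    # Hypothetical adjustment: return rate = failure rate * P(owner bothers to return).
    # The propensity values are assumptions for the sake of the arithmetic.
    drives = {
        # name: (observed return rate, assumed return propensity)
        "Intel SSD":  (0.006, 0.9),   # pricey, so most failures get returned
        "Other SSDs": (0.025, 0.9),
        "1TB HDD":    (0.030, 0.5),   # cheap, many failures just get binned
    }
    for name, (ret, propensity) in drives.items():
        print(f"{name}: return {ret:.1%} -> implied failure ~{ret / propensity:.1%}")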

The reason many people like SSDs is because they widen the worst bottleneck in modern computers (random disk I/O) by a factor of 10-100. The difference is very noticeable. I highly recommend you try out an SSD as your boot/application drive.


> SSDs are meant for boot and application drives where lots of random I/O happens. You wouldn't normally use them in RAID except to RAID0 them for better performance and more space.

Most folks who really care about massive I/O performance are OLTP implementers (i.e. big Oracle/Postgres/MySQL installations). Those people also really care about reliability, so as long as the failure rate of SSDs is non-zero, even with screaming random IOPS capability, they'd be fools not to use some RAID 1 or RAID 5 implementation.


I'm not entirely certain return rates are a good indicator of failure rates.

First, you only return a hard drive if it's under warranty, so that gives you a 1-2 year window.

Second, if you care about security, you don't really return hard drives. Once bought, unless you are certain everything is encrypted on them, the drives never leave the premises.


You can usually get a deal with hw/drive vendors to train and certify your in-house staff to declare drives dead and then field-destroy them for replacement credit.


Intel's published reliability data shows a ~0.4% annual failure rate of the X-25M: http://images.anandtech.com/reviews/storage/Intel/320/reliab...
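
For a sense of scale, here's what a 0.4% AFR implies for a small fleet, assuming independent failures and a constant rate (both optimistic simplifications):

    # What a 0.4% annual failure rate means for a dozen dev machines over 3 years.
    afr, drives, years = 0.004, 12, 3
    p_drive_survives = (1 - afr) ** years
    p_any_failure = 1 - p_drive_survives ** drives
    print(f"P(at least one of {drives} drives fails in {years} years) = {p_any_failure:.1%}")
    # ~13% -- compare with the 10-of-12 Vertex2 failures reported upthread.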

They claim greater reliability for the new 320 series which is based on the same controller hardware, and have upped the warranty of that from 3 to 5 years: http://newsroom.intel.com/community/intel_newsroom/blog/2011...

There definitely has been a lot of noise about high failure rates in OCZ's Vertex series, but it doesn't seem to be anything inherent in SSDs or even the SandForce chipset. OCZ shipped SSDs with pre-release firmware, uses a number of different flash chip vendors, and may not test sufficiently prior to release.

I'd say buying a reliable SSD is more difficult than buying an HDD since there are quite a few variables, but there is sufficient information available these days to make an informed choice.

Performance is really worth it though! I wish there wasn't such a focus on boot times, because I only have to reboot my MacBook Pro for certain OS updates or when it crashes, but high-performance random IO makes a huge difference in day-to-day use. Applications launch almost instantly, I can do a find+grep on nested directories containing thousands of files in a fraction of a second, and compile times are pretty much only limited by CPU speed.


Anybody have some detailed technical insight into why the failure rate has been so high? (i.e., I can guess as well as the next guy, I'm asking for knowledge.) One of their initial promises was reliability due to not having moving parts, and, well, that sounds pretty reasonable as far as marketing copy goes. Why isn't it true?


jerf, the oddest part is that if you read through the de-duplication and redundancy designs of modern SSD controllers (like the SandForce SF-2k series), not only do the drives do automatic de-duplication and compression on the fly, they scatter the bits evenly across the memory arrays (in parallel) to even out the wear-and-tear of R/W cycles on the memory.

Then they typically write the bits to multiple locations, or an entire section of the drive is maintained for parity information, sort of like an on-disk RAID5 setup where the individual blocks of NAND can be thought of as "disks". For example, a 120GB drive typically has something like 150-160GB of storage space on the disk. This is not unlike what spindle disks do, with a large section of the disk not allocated by the format operation and held back for failed sectors that are re-allocated by the controller.
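
Rough arithmetic on the spare area (illustrative numbers, not from a spec sheet):

    # Over-provisioning arithmetic for the 120GB example above.
    user_capacity_gb = 120
    raw_nand_gb = 160   # example figure, not a datasheet value
    spare_gb = raw_nand_gb - user_capacity_gb
    print(f"{spare_gb} GB spare = {spare_gb / user_capacity_gb:.0%} over-provisioning")
    # -> 40 GB spare = 33% over-provisioning, available for parity data,
    #    wear leveling and remapping worn-out blocks.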

So all that work and the failure rates are still too high for comfort. It's really surprising to me and suggests that all of this needs more baking time.

I have no idea what the failure rates are in lower-end drives like the Kingstons and other drives that are likely rebranded garbage. I imagine it's not any better :)


The interesting thing is that MacBook Airs are all SSDs - so unless OS X is doing something terribly creative, why aren't we seeing the same kind of return/failure rates on MacBook Airs?


There's a tradeoff between speed and reliability, and the Samsung SSDs that I believe are used in the MacBook Airs lean towards reliability at the expense of non-random IO no better than a normal hard drive's. People who buy SSDs by themselves, on the other hand, are often just going down a list of offerings comparing IO speed to price, without any good way to judge reliability.


I haven't seen that they aren't.

That, and you never own a MacBook Air more than a couple years, because Steve Says To Buy A New One Then.


*chuckle* I was expecting the indignant Apple downvotes =)


They fail (or rather you hear about them failing) because they are new. I have graveyards filled with spinny disks that failed, but it's hardly newsworthy because everybody knows drives fail. Has anybody put half HDs and half SSDs in a data center of any size? Anecdotes and data and all that.


It is a physical property of flash memory that each flash cell can handle only a limited number of writes before it fails. If you want to know the dirty details, what happens is that electrons eventually get trapped in the dielectric of a flash cell and ruin everything. This cannot be fixed unless someone invents a new electron-capture-proof dielectric.

Flash memories use all kinds of electronics to ameliorate this problem, but in the end the most they can do is try to equalize the number of writes each cell experiences, which results in the drive working at close to its full capacity for as long as possible but then failing very quickly afterwards.
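
A rough sketch of the endurance math under ideal wear leveling (the cycle count, write amplification, and workload below are ballpark assumptions, not vendor specs):

    # Rough endurance estimate under ideal wear leveling (illustrative assumptions).
    capacity_gb = 120
    pe_cycles = 3000            # ballpark for 2011-era MLC NAND (assumed)
    write_amplification = 3.0   # assumed controller overhead
    daily_writes_gb = 50        # assumed heavy desktop workload
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    lifetime_years = total_writable_gb / daily_writes_gb / 365
    print(f"~{lifetime_years:.1f} years until the cells wear out together")
    # ~6.6 years: with perfect leveling all cells hit their limit at about the
    # same time, so wear-out failures arrive abruptly rather than gradually.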


If this was the failure mode experienced by most drives that fail, though, it wouldn't happen for years and even after you were no longer able to write to the drive you would still be able to read from it. This might be how the more reliable Intel or Samsung drives end up failing, but all evidence is that the failure modes of the highest performance but less reliable drives seem to be more interesting than that.


Why do you think it would not happen for years? As I mentioned in a previous post, there is a trade-off between price, memory size and failure rate, so some of the higher-density memories will have a much shorter life than what you may be used to.

And no, you would not be able to read from it. When a flash cell wears out, the ability to read it degrades or disappears. Depending on the memory you still may be able to recover the data using ECC or the like, but this ability tends to disappear as large numbers of cells start dying.


Mind you, there have been people working on clearing up trapped charges. I doubt there's been much usable progress (IIRC, the methods involve some kind of irradiation of the dielectric, which is difficult to perform in situ.)


I have had the pleasure and challenge of learning a lot about flash memory, and I must add that the high failure rates are not aberrations; they are more or less expected. When people talk about SSDs they assume they are like hard drives, i.e., that they should last several years unless there is some kind of defect in them.

Well, that is not true. Any flash memory cell is basically limited as to the amount of times it can be written to. This is the result of certain physical properties of the materials used, and cannot be avoided.

This means that any SSD has a limited number of writes on it before it fails. So SSDs will fail and ones that are written to a lot will fail faster. If you use an SSD for virtual memory, for example, you can really speed up your computer (if it does use a lot of virtual memory) but your SSD will fail quickly.

There is generally a three-way trade-off between cost, storage size and failure rate, and every company will strike a slightly different balance, so it is not surprising that one company's drives last longer than another's. Furthermore, within a single type of drive, different times of failure will mostly depend on how much the drive is written to.

In general we should not accuse companies whose drives fail of making shoddy merchandise (unless they fail way too soon); this is just the nature of flash memory.


No. The firmware on SSDs utilises wear leveling (they also set aside a certain percentage of flash cells for housekeeping), so for the vast majority of people whose SSDs have failed, the reason is NOT that the flash cells have died. From what I've read, the real reason seems to be software bugs in the firmware.


I can't find any references on this. Do you have a link you can share?


Fargren,

Actually all SSDs have to do this. The failure rate on NAND memory is so high at the higher densities that are being used now, that it would be impossible to ship a drive that didn't do it.

You can learn more about it here[1] - while that is a review of the Vertex 3 (a SandForce SF-2500 based SSD), the approach (de-duplication, write load-balancing, etc.) is the same for all the controllers out there.

There is another awesome article Anand did explaining all the features of some of these higher end controllers and how much work they actually do, but for the life of me I cannot find it. I'll follow up if I do.

[1] http://www.anandtech.com/show/4159/ocz-vertex-3-pro-preview-...



Damn close, but not quite. That's the article Anand put out during the height of TRIM-gate, after discovering the permanent speed degradation designed into the early drives as they filled up (fixed in all newer controller designs, FWIW).

The one I had in my brain was one done after that, focused on the SandForce SF-2k series (might have been the 2500), where Anand goes through all of the features in the controller pertaining to data safety. The way it is explained and marketed makes it sound like an SSD will fail once every 15 years, but in reality their failure rate is horrible... I have no idea why the massive discrepancy. Either these NAND chips are MUCH more volatile than we are being told (and than Anand's numbers assume - he has calculations to estimate failure rates based on load balancing), or the data parity/recovery systems in these controllers are so insufficient that they underestimate the actual data loss occurring on these things and thus underperform.

Thanks for trying to dig it up though; that article alone is certainly worth a read for anyone interested in this topic.


I am very well aware that flash memory performs wear leveling. However, this does not prevent the flash cells from failing; it merely ensures they all fail as late as possible. Wear leveling does not prevent wear, it only ensures that wear is spread evenly between cells.
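
A toy model of the difference (numbers are made up; real controllers level at page/block granularity with far more state):

    # Toy model: wear leveling delays the first cell death but can't prevent it.
    # 100 blocks, each good for 1000 writes; the workload hammers one logical block.
    BLOCKS, LIMIT = 100, 1000

    def writes_until_first_death(leveled):
        wear = [0] * BLOCKS
        for n in range(BLOCKS * LIMIT + 1):
            # leveled: always write the least-worn block; naive: always block 0
            i = min(range(BLOCKS), key=wear.__getitem__) if leveled else 0
            wear[i] += 1
            if wear[i] >= LIMIT:
                return n + 1

    print("naive:  ", writes_until_first_death(False))  # 1,000 writes
    print("leveled:", writes_until_first_death(True))   # 100,000 writes
    # Leveling multiplies lifetime by ~BLOCKS, but then every block is near
    # its limit, so the whole drive fails quickly once wear-out starts.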


I'll take reliability over almost anything. I work out on the road 200+ days of 365. A drive failing on me while out in the wilds would be a ~~~~ing catastrophe. I can picture it now: Sunday night, after a long flight, code modified on the flight, nothing pushed to a remote git repo in a couple of days... failure in such a scenario would mean I'm out of a job. No thanks, I'll take the slow HDD.


HDDs do fail, too. A lot. If drives failing for you is a catastrophe, you should consider taking backup media with you.


This may be anecdotal, but I've never had a HDD fail and neither do I know anyone who had a HDD fail.


And how many drives do you have, and how long have you had them? In the last 15 years I've probably had 20 HDDs fail out of 150 total. There was a particularly bad period where I was dealing with about 15 Dell laptops with 1.8" HDDs. Many of those drives failed, some repeatedly - bad enough that I was buying 5 drives at a time until we could replace the computers.

HDDs from every manufacturer can and do fail at fairly high rates.


If I'm counting correctly I've had 11 drives, and each was used on average 3-5 years I think. I've helped a lot of other people with their computers being the nerd kid, and the problem has not been a failed drive yet (and it has been other failed hardware, like RAM and video cards and sound cards).

If you've had about 20 drives fail, and of 15 Dell laptops a lot of drives failed, some repeatedly, don't you think that your bad experience was primarily caused by that type of drive?


Hard drive failure is probabilistic. You're just one data point. There are plenty of people who have seen a moderate number of drive failures, and some people who have so many drives fail that you have to check that they aren't tapping them with a magnet for good luck or something.

Not believing in hard drive failure because you haven't had any, despite the rest of the world telling you it's a fact of life... well, it says something interesting about human psychology, is all.
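
To put numbers on "probabilistic" (the AFR here is an assumed mid-range HDD figure, not a measurement):

    # How surprising is "11 drives, ~4 years each, zero failures"?
    afr, drives, years = 0.03, 11, 4    # 3% AFR is an assumption
    p_zero_failures = (1 - afr) ** (drives * years)
    print(f"P(a given person sees no failures) = {p_zero_failures:.0%}")
    # ~26%: roughly one person in four with that history sees zero failures,
    # so a clean personal record says little about the population rate.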


Sure I believe in hard drive failures. I'm not saying that hard drives don't fail, I'm just providing my experience with hard drive failure. Where did you get the idea that I don't believe in hard drive failure?

I realize that I'm just one data point. What I don't get is why people downvote me for providing this data point, whereas people don't seem to have a problem with blanket statements like "HDDs do fail, too. A lot." that provide zero data points.

Moreover, we are talking about SSDs vs HDDs here. "HDDs do fail, too. A lot." may be true, but if, according to Jeff Atwood (an SSD proponent), on the order of half of SSDs fail in the first year, then HDDs may fail a lot, but SSDs fail a whole lot more.


At my last job, they got me an SSD. About a month later, I started coding a new feature for their product. It was a few hours' worth of work, and they were using CVS, and didn't really have a good branching model, so I didn't do any work-in-progress checkins.

While I was coding the feature, the SSD failed. This manifested itself via all sorts of strange compiler errors, and so I rebooted. After that, Windows didn't show the SSD anymore, and my code was never seen again.

So, while the SSD did save me some time, I effectively lost it all when it failed - due in part to my stupidity, but also due to the fact that if you buy an SSD, you MUST plan for failure. This is not an option; this is "yes-I-will-wear-a-helmet-while-riding-a-motorcycle-at-70-mph".

Contrast that with my Dell laptop. I bought it in ~2007, and I've never had a problem with it (except I've had to replace the dim screen a couple times).


If you buy storage media, you must plan for failure. Having worked in data centres, I've seen spinning disk drives DOA, die after a week, die in clusters (sequential serials, different machines), etc.

SSDs aren't 100% reliable, but neither are HDs. You lost your SSD at an inopportune time, and that sucks, but it could just as easily have been a spinning disk that died, or it could have died after you finished a nightly system backup so you had no data loss at all. It's pure coincidence.


> I'll take reliability over almost anything.

Then measure reliability by IO operation instead of wall clock time.

I had a catastrophic failure in my MBP SSD after a couple years of use so heavy that if it were an HDD, I simply wouldn't have been able to do all that work.


I've had a 256GB Crucial SSD running for a year without issue; I wrote about it at http://littlebitofcode.com/2011/03/02/a-year-with-a-ssd. I'm not concerned by failure rates because I have an excellent backup strategy. I've had dozens of spinning disks fail unexpectedly - they can and will fail as spectacularly and unexpectedly as an SSD could.


After 1 year CrystalDiskInfo is giving you a 67% health rating? That seems... not great. I don't know how full the drive is, but I imagine you'll be looking at a failed drive by the 1.5-year mark. For $630 I'm having a hard time getting excited about it.

Totally agree that SSDs will fail as big and randomly as spindle disks, but at 10x the price (or whatever factor it is), is it wrong to expect better reliability?

If you need the speed and have the money, more power to you, especially if you have a good backup strategy and losing a disk is no biggie... I just don't think many people realize HOW unstable these things are. That is all I'm pointing at.


The warranty is five years. I won't lose any data, and in two years I'll get a new drive from the manufacturer. I don't expect any reliability; hard drives aren't reliable, and this is nothing new.

The CrystalDiskInfo rating is based on the flash wear patterns, so if it's at 67% the drive is likely to last 2 more years, not six more months. The way I calculated it, the drive saved me enough time in the first year to pay for itself.


It cannot be stressed enough that hard drives (of any type) are not reliable. If your storage strategy is predicated on assumed failure rates for your hard drives, then you're in for a nasty shock down the road. And as stated, proper warranties eliminate most of the short-term risk associated with failures.


I've lost 2 Crucial drives in 2 years.


Amazing how many people are commenting about how SSDs don't help much. I am stunned by how long it takes to do anything on any of my computers that don't have SSDs now. IOPS matter to me.


It sounds like a bunch of people who haven't experienced the difference first hand, trying to convince themselves that they don't care.


I have a laptop with SSD at work and an almost identical laptop at home, except it has a hard drive. There are unquestionably a lot of operations that the SSD makes much faster. But in normal day to day use, I can barely tell the difference. Booting is fast as hell, but how often do I reboot? I don't compile much on either machine, but neither do most normal users. I don't think the average laptop user is i/o bound most of the time (unless they're swapping, and then the solution is more RAM not an SSD).

To me, the biggest benefit SSD gives is battery life. I like that it's quieter too. But neither of those is worth the price premium. Which is why I haven't stepped up for my home machine even though I get to compare it on a daily basis.


SSDs aren't going to help you compile faster anyway... Other than network I/O, disk I/O is the slowest operation on a computer; speeding that up does speed up most operations for most people. Everyone is I/O bound all of the time.


For many, it's simply that the improved initial load times aren't worth our time/money.

The biggest bang for the buck comes in the form of added RAM, which augments the kernel's buffer cache on any modern OS. Many of us rarely shut down our hosts (perhaps to update the kernel, but that's about it) so we rarely have a cold buffer cache.

For the average desktop/laptop user, unless your memory usage patterns involve allocating massive heap sizes for fun and profit, or grepping through every file in your filesystem, it's extremely likely that any application you launched or document you opened in the past few days is already in the kernel's buffer cache. And the buffer cache is much faster than any SSD.
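
A quick way to watch the buffer cache at work ("bigfile.bin" is a placeholder; point it at any large file):

    # Read the same file twice; the second pass is served from the page cache.
    import time

    def timed_read(path):
        start = time.time()
        with open(path, "rb") as f:
            while f.read(1 << 20):   # 1 MB chunks
                pass
        return time.time() - start

    print("cold read:", timed_read("bigfile.bin"))  # hits the disk
    print("warm read:", timed_read("bigfile.bin"))  # served from RAM
    # With free RAM the second read is typically 10-100x faster, regardless
    # of whether the device underneath is an SSD or a spinner.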


I do care and I have no doubt that SSDs are faster and better. I too am holding off until their reliability improves. But I just tested starting my iMac with a 7200 RPM drive in it and it took 31 seconds from the startup chime until I was able to fully use it. So I can also understand people's hesitation.

And besides, if I booted over to Windows 7, it'd take about 4 minutes until this machine was usable, so it's all a matter of perspective.

EDIT: to be fair I just measured Windows 7 and it took 1:30 before the machine was usable.


The boot time is just a convenient "wow" factor, the real benefit comes from the thousands of little everyday tasks becoming CPU-bound instead of IO-bound.

A 7200 RPM drive is absolutely not comparable.


I recently replaced the 5400rpm drive in my MacBook with a larger-capacity 7200rpm drive, and the difference in performance was quite noticeable. Of course nowhere near the jump from 7200rpm to SSD, but still a decent jump. Downside is that the new drive is loud enough to hear, whereas the original drive was dead silent.


Yeah, I did the same before jumping to 100% SSD. Also, heat/power consumption.

If you need the storage of a spinner and can't afford a big enough SSD, running a small SSD for the system/apps with a 7200 RPM as a storage drive in the place of the optical drive works pretty well. Look for the cheaper versions of the OptiBay (which is a bit of a ripoff).


I had one, felt no meaningful difference in my everyday tasks, and returned it. If I have tasks where the HDD is the bottleneck, I simply move them onto a ramdisk, or I launch them and make a tea or take a walk. I'd rather have loads of storage space for the same price.


Since I posted this link last year: http://news.ycombinator.com/item?id=1584998 .... I've put a few DBs on SSD (PostgreSQL and FileMaker Pro). The real-world performance under extreme concurrent usage has been insane, i.e. something like an order of magnitude faster. For instance, a query on FileMaker Pro that would take up to 10 minutes on a heavily used instance (100 people accessing at one time) now takes seconds.


Dropbox is an awesome way to offset the failure risk. I keep all my development projects on Dropbox, so the moment I write to disk, it's syncing with at least one other running workstation plus the Dropbox remote storage.

(The brains of my homedir are symlinked to Dropbox as well: all dot files including shell history)
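
A minimal sketch of that symlink setup (paths and file list are illustrative; assumes ~/Dropbox is on the same filesystem as $HOME):

    # Move selected dotfiles into Dropbox and leave symlinks behind in $HOME.
    import os

    DROPBOX = os.path.expanduser("~/Dropbox/dotfiles")
    for name in [".bashrc", ".bash_history", ".gitconfig"]:   # illustrative list
        home_path = os.path.expanduser("~/" + name)
        target = os.path.join(DROPBOX, name)
        if os.path.exists(home_path) and not os.path.islink(home_path):
            os.makedirs(DROPBOX, exist_ok=True)
            os.replace(home_path, target)   # move the real file into Dropbox
            os.symlink(target, home_path)   # shell keeps using the same path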


I do the same thing. Won't save you when there is no internet, though.

Then again, I lost count of how many times Dropbox saved my butt, so that should be a no-brainer regardless of your storage technology.


A very expensive way, though? My notebook's HD is 250GB, almost filled.


Sorry to see this and respond late, but yeah, mine is pretty big too. Most of that doesn't change hour-to-hour in my use case luckily, so incremental backup to disk is good enough for the bulk. If you work mostly with code or other text, you can probably find a few folders of frequent business that fit on a small Dropbox plan and just sync those.


If your servers were doing as many writes as reads, then SSDs might not be as fast. I had 4 Intel 80GB SSDs (RAID10) vs 4 15k RPM SCSI drives (RAID10), and the SCSI drives were faster with heavy InnoDB writes. You might wanna do some stress benchmarking before going into production if you have heavy writes. Now, I'm sure the FusionIO line of drives will probably be faster than SCSI, though I haven't tested them.


High failure rate, high price, low capacity, but great speeds. So the ideal setup is SSD for your OS and programs and a spinning disk for your documents?


SSD for your OS, apps and documents where speed matters, backed up to your spinner.


Yep, that's how I do it. I just have /home mounted on another drive in my Linux environment, and under Windows I changed the location of my My Documents, My Music, Downloads, etc. folders to a traditional drive. You could move your entire user profile over, but some apps do a lot of IO in your AppData folder, so you may want to keep that on the SSD.


Here's something I'm curious to know: how often do you reboot your computer? I've got two I use daily; one has an uptime of 47 days, the other (a little MacBook that spends all day on the road with me) about 32 days. They're just never off. Do people still turn their machines off?


My Macbook comes out of sleep mode almost instantaneously, so I come out of it with uptimes that are similar to yours. My PC desktop, meanwhile, gets rebooted when Windows Update demands it, so it varies.

There are workplaces where people might be rebooting daily, I don't know. I just know that at home or at work, I don't.


Running 3 RAIDed SSD drives on my home workstation... seriously considering upgrading to one or more RAIDed HyperOs DDR2 RAM disk drives. The write latency is what's nasty for compile times.


Give NILFS2 [1] a try. It's log-structured, so writes do not cause in-place overwrites of filesystem metadata, directories or files.

I have a very old, very slow SSD extracted from an Asus EeePC in an auxiliary computer; NILFS2 makes it crazy fast write-wise compared to XFS and ext4.

[1] http://www.nilfs.org/en/


Why don't you compile on a RAM disk then? It's not like you need to persist the intermediate files...


Curious about this, is it RAID-0 for speed or some other variant?


RAID 0, although I've heard something or other about a larger performance gain being realized by other RAID configurations.


Are you sure writes are your bottleneck there?
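
One way to check, comparing a tmpfs against the build disk (a rough sketch; the /dev/shm path assumes Linux):

    # Time fsync'd writes to tmpfs vs. the current (on-disk) directory.
    import os, time

    def time_writes(dirpath, n=200, size=64 * 1024):
        data = os.urandom(size)
        start = time.time()
        for i in range(n):
            path = os.path.join(dirpath, f"tmp{i}.o")
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())   # force the write down to the device
            os.remove(path)
        return time.time() - start

    print("tmpfs:", time_writes("/dev/shm"))   # RAM-backed
    print("disk: ", time_writes("."))          # wherever the build tree lives
    # If the tmpfs number is dramatically lower, writes really are the bottleneck.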


Yes. Large code base, large number of intermediary files and binplace outputs. Compiling itself is an issue too, of course, but I'm already on a fairly high-end i7 chip with tri-channel DDR. More CPUs or faster CPUs would be nice, but I'll take a faster hard drive over processor speed any day.


Everybody keeps talking of boot drives - and the guy in the video boots his laptop. Do you really boot often enough to make a difference?

I think I see my machine boot about 10 times when I reinstall Windows (used to be yearly 10 years ago, much less frequent now), then basically never, modulo city-wide power failures and stealthy Windows Update ninjas in the night.

Is it something peculiar to laptops, e.g. when you forget to charge them and they shutdown completely instead of sleeping?


My MacBook Air screams. I bought a MacBook Pro about a year ago, and my brand-new MBA (top model) is noticeably faster - not only boot and shutdown times, but the overall snappiness of the machine. The main reason for this is the SSD inside. I love it and will be installing one in my PC desktop soon enough. I can live with the high failure rate, although I haven't run into any problems myself yet.


I have seen the light and been converted to the Church of SSD. The expense /is/ worth it. I routinely run Windows 7 and Windows Server 2008 (running Windows Deployment Server to reimage computers at different physical locations) on my MacBook Pro via VMware Fusion at all times (in addition to all the other random apps I have open on the Mac). I haven't noticed /any/ performance penalties day in, day out. Compare this to my previous MacBook Pro from three years ago, which begged for mercy booting Windows XP. Sure, I have more RAM and an i7 now, but it was always disk thrashing that killed me... that's not a problem any more.


I have only good things to say about the Corsair SSDs. I'm all SSD now, and they're very fast, with very good reviews, for a very decent price.


I recently bought from OWC to be shipped internationally. It took 32 days to arrive, the drive was DOA, and it caused consistent kernel panics (even when doing an initial clone). YMMV, but if you're wanting to do an SSD upgrade, get ready to take failures into account.



