ZFS write performance issues with WD20EARS

Newegg.com currently has a very nice offer on a Hitachi 2TB drive spinning at 7200 RPM, for only $90 ($89.99 to be precise):
http://newegg.com/Product/Product.aspx?Item=N82E16822145369

More information here:
http://fudzilla.com/home/news/latest/newegg-selling-hitachi-deskstar-2tb-7200rpm-drive-for-8999-shipped
 
I'm not sure how you can recommend a 5-platter MONSTER drive when clean 3-platter drives have now reached the same 2TB capacity.

So the newest drives will be:
3-platter WD EARS (4K sectors)
3-platter Samsung F4 EcoGreen (4K sectors)

That saves a lot of power and should be much more reliable as well. It also makes the 5400rpm drives faster than a lot of lower-density 7200rpm drives. With ZFS there's no real reason to go 7200rpm anymore, I think. Sequential I/O is what HDDs do well, even at 5400rpm; for random access you use an SSD configured as L2ARC, so your HDDs won't have to seek. 5400rpm or 7200rpm doesn't matter: HDDs suck like floppy disks when they have to seek, so let's prevent that and use them for sequential I/O as much as possible.

The 4K sector issue is annoying though; I think recordsize tuning or a different number of disks in a vdev could solve any issues, as posted before in this thread.
 
sub_mesa said:
I'm not sure how you can recommend a 5-platter MONSTER drive when clean 3-platter drives have now reached the same 2TB capacity.

So the newest drives will be:
3-platter WD EARS (4K sectors)
3-platter Samsung F4 EcoGreen (4K sectors)

That saves a lot of power and should be much more reliable as well. It also makes the 5400rpm drives faster than a lot of lower-density 7200rpm drives. With ZFS there's no real reason to go 7200rpm anymore, I think. Sequential I/O is what HDDs do well, even at 5400rpm; for random access you use an SSD configured as L2ARC, so your HDDs won't have to seek. 5400rpm or 7200rpm doesn't matter: HDDs suck like floppy disks when they have to seek, so let's prevent that and use them for sequential I/O as much as possible.

The 4K sector issue is annoying though; I think recordsize tuning or a different number of disks in a vdev could solve any issues, as posted before in this thread.


I can recommend them because they work, and are the best 2tb drives out right now.


I agree it would be nice to have a 7200 rpm 3-platter hitachi or seagate drive, but that doesn't change the fact that if you are building large storage arrays with cheap, commodity parts, the hitachi drive is the only choice which makes sense at 2tb.

And as far as power usage goes, they don't use any more than any of the other 7200 rpm drives I've used.

In fact, among 7200 rpm 2tb drives, they use less than the wd drives and are about even with the seagate raid drives (which cost about 70 dollars more).

And to say there is no reason to use 7200 rpm drives goes against the mountains of evidence and loads of anecdotal reports online.

The fact of the matter is, if you were a frequent reader of the zfs mailing list, you'd see plenty of reports of "green" drives (5400-5900rpm) being terrible for ZFS, and generally 7200 rpm drives are recommended.
 
wonslung said:
the hitachi drive is the only choice which makes sense at 2tb

(...)

The fact of the matter is, if you were a frequent reader of the zfs mailing list, you'd see plenty of reports of "green" drives (5400-5900rpm) being terrible for ZFS, and generally 7200 rpm drives are recommended.

I just bought 2 x Seagate Barracuda LP 5900RPM 2TB (which use 4 x 500GB platters). I will share how they perform (compared to the Samsung F3 1TB drives that I currently have).

They should not bring any troubles, as they have 512B sectors.
 
There are a few people experimenting with ZFS, RAID, and these new, low power hard drives. I'll probably join the fray soon. Is there a standard and useful file system benchmark which we can all use to compare?
 
Don't get me wrong, zfs will function fine on lower rpm drives provided the firmware doesn't lie, but I'm just one of the people who believe there is not much gain in going under 7200 rpm (for the 5400-5900 rpm drives, the power difference isn't enough to warrant the loss in performance for me).

If you are doing write once/read many with mostly sequential access, the 5400-5900 rpm drives will probably be fine, especially in mirrored vdevs.

But when you start talking about raidz and raidz2 you're going to notice a huge issue, especially if you have ANY random writes/reads.

The problem comes in because of how ZFS and raidz work, and the low rotational speed.

Remember, raidz writes a single block across all drives, so whenever you have random i/o it has to sync those blocks across ALL drives in the vdev.

Having multiple vdevs can speed this up some, but basically a raidz vdev is as slow as its slowest drive. This is why 5400 rpm drives are just so bad for raidz and raidz2.
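
To put rough numbers on the "as slow as its slowest drive" rule of thumb above, here is a minimal sketch. It assumes each raidz vdev delivers about the random IOPS of its slowest member and that vdevs in a pool add up; the per-drive IOPS figures are made-up placeholders, not measurements from this thread.

```python
def pool_random_iops(vdevs):
    """Estimate pool random IOPS under the rule of thumb that a raidz
    vdev is limited by its slowest drive and that vdevs add up.

    vdevs: list of vdevs, each a list of per-drive random IOPS.
    """
    return sum(min(drives) for drives in vdevs)


# Hypothetical per-drive figures, for illustration only.
SLOW_DRIVE_IOPS = 60   # assumed 5400 rpm class drive
FAST_DRIVE_IOPS = 80   # assumed 7200 rpm class drive

one_wide_vdev = [[FAST_DRIVE_IOPS] * 5 + [SLOW_DRIVE_IOPS]]
two_narrow_vdevs = [[FAST_DRIVE_IOPS] * 3, [FAST_DRIVE_IOPS] * 3]

print("one 6-disk raidz with one slow drive:", pool_random_iops(one_wide_vdev))
print("two 3-disk raidz vdevs:", pool_random_iops(two_narrow_vdevs))
```

Under these assumptions, a single slow disk drags the whole vdev down, while splitting the same disks into more vdevs raises the estimate.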
 
wonslung, I disagree with almost everything you said. I also follow several mailing lists, so feel free to refer to any concrete messages to make your point.

My point is:
You recommend a HDD that is a few generations old and was the first 2TB iteration; the monster disk to avoid; 5 massive platters of 400GB each. Twice the power consumption of Green drives while being slower for sequential workloads than the newest generation Green drives, which now have 666GB platters. The new Samsung F4EG pulls around 140MB/s, which is exceptional for a 5400rpm disk. And you could also argue that fewer mechanical parts and less friction / heat generation generally improve the reliability of a HDD as well.

The whole issue with '5400rpm being slow on ZFS' is not the meager 25% faster seek times and random I/O IOps of 7200rpm drives, but rather that the WD EARS drives use 4K sectors with 512-byte emulation. So there is no reason to avoid 5400rpm; newer HDDs are getting 4K sectors too. The Samsung F4 7200rpm series, for example, also gets 4K sectors, and thus would have the same issues in RAID-Z.
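
As a rough check on where a figure of about 25% could come from, the sketch below computes average rotational latency (the time for half a revolution) at both spindle speeds. This only covers rotational latency, not head movement, so it is an illustration of the order of magnitude rather than a full seek-time model.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half a revolution."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2.0 * 1000.0


for rpm in (5400, 7200):
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")

# Latency of a 7200 rpm drive relative to a 5400 rpm drive.
ratio = avg_rotational_latency_ms(7200) / avg_rotational_latency_ms(5400)
print(f"7200 rpm latency relative to 5400 rpm: {ratio:.2f}")
```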

I believe the issue here is that the 128KiB recordsize is spread over all data disk members in the RAID-Z (that is, minus parity disks). Thus 4 disks in RAID-Z would be 128 / (4 - 1) = ~43KiB = not aligned with 4K, while 128 / (3 - 1) = 64KiB would be aligned. I still have to test this theory, and I hope I can suggest some fixes. Also, some people assume it's the 4K sectors if they get low ZFS performance, while I personally inspected some systems which actually had memory starvation and needed kmem tuning with only 2 or 4GB RAM.
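
The arithmetic behind this theory can be sketched for a range of RAID-Z widths. This models only the simple assumption stated above (one 128KiB record split evenly over the data disks); it is not a description of how ZFS actually lays out blocks, and the helper names are made up for illustration.

```python
RECORDSIZE = 128 * 1024  # bytes; the recordsize discussed in the post
SECTOR = 4096            # bytes; physical sector size of a 4K drive


def per_disk_share_kib(total_disks, parity=1, recordsize=RECORDSIZE):
    """Size in KiB of each data disk's share of one record."""
    data_disks = total_disks - parity
    return recordsize / data_disks / 1024


def is_4k_aligned(total_disks, parity=1, recordsize=RECORDSIZE):
    """True if the record divides into whole 4K sectors per data disk."""
    data_disks = total_disks - parity
    return recordsize % (data_disks * SECTOR) == 0


for disks in range(3, 10):
    share = per_disk_share_kib(disks)
    verdict = "aligned" if is_4k_aligned(disks) else "NOT aligned"
    print(f"raidz1 with {disks} disks: {share:6.2f} KiB per data disk, {verdict}")
```

Under this model, widths with a power-of-two number of data disks (3, 5 or 9 disks in raidz1) come out aligned, while a 4-disk raidz1 does not.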

Once i've completed my tests i'll post my findings in here.
Cheers.
 
I don't know what the standard usage case for zfs is, but I'm serving files over nfs, and my
gigabit ethernet card seems to reliably be the bottleneck when serving files.
 
There is some big confusion spread here....

There is nothing bad with the 5400 rpm speed as such. Recording density has recently increased enough that a 5400 rpm drive is able to sustain 150 MBps read/write, while a 10-15,000 rpm drive is able to sustain, say, 300 MBps read/write.
However, this only applies to sequential operations, where the drive can prefetch data into the cache (data is read from the media at much higher speeds).

The big issue is the 'green' drives. These drives save energy by... not using it :) How does a drive not use energy? It can do this in many ways. Spinning the platters at lower speed is only one of these. Another way is to use slower and less power-hungry electronics. SAS drives have more complex electronic assemblies, and the same drive in its SAS version consumes more than with SATA. Of course, with the slower electronics the performance is worse. Another very big power drain in the drive is the head assembly movement. It is simple: if you want small reposition times, you have to spend more energy. One significant way to save energy in the 'green' drives is to make the head assembly move slower and less aggressively, so they have abysmal random seek times. There is almost nothing else a drive does. Maybe a 'green' drive could shut off parts (motors, chips), but this affects its ability to respond quickly to requests.

In a single-user system, or in systems with predictable sequential load, any drive will perform well. In a multi-tasking system, when the drive is asked for data scattered all over, things are much different. I still have some old Cheetah drives here. Compared to a green drive, they are better for random type load, but cannot compete on sequential tasks, with their 'poor' 30 MBps ;)

So, in summary, while the improvements in recording density have diminished the difference between slower spinning and faster spinning drives in terms of sequential load, the higher end drives spend power on things more critical for performance, such as seeks and faster electronics. It seems that recently it is the electronics that are the limiting factor for drive performance. When we talk about 'slow' drives, we usually talk about the random access times and multitasking performance of the drive.
 
sub_mesa said:
Twice the power consumption of Green drives while being slower for sequential workloads than the newest generation Green drives, which now have 666GB platters.
FWIW, higher density media improves random seek times too.
 
FWIW, higher density media improves random seek times too.
I do not believe this is true. Yes the distance between two tracks is smaller, so if you
only need to shift N tracks, it will take less time, and with more bits in each track,
you will probably need to seek less often on some workloads. This still does not change the
distance (and amount of time it takes) that the head needs to move between two random tracks.
 
I have bought 2 x Seagate LP 5900RPM 2TB, put ZFS mirror on top of them, so far so good ;p

PS: They are 512-byte-sector drives; they do not use any 4K emulation, at least.
 
vermaden said:
I have bought 2 x Seagate LP 5900RPM 2TB, put ZFS mirror on top of them, so far so good ;p

PS: They are 512-byte-sector drives; they do not use any 4K emulation, at least.

What's your usual "workload" on those?
 
oliverh said:
What's your usual "workload" on those?

I would not be lying if I said it's next to none; it's a home (not only file) server, and about 1.8TB of storage space is enough for me. I have done several benchmarks with iozone/dd and it seems to work OK under load. I haven't faced any 'sleep issues' with these drives, whereas WD drives like to sleep after 8 seconds or so.

It's still 'under development' hardware, so I may run some tests. It's also not CPU limited, since it uses that motherboard with the mobile Intel GM965 chipset and a T8100 CPU: http://www.msi.com/index.php?func=downloaddetail&type=bios&maincat_no=388&prod_no=1267.

The FreeBSD base system is on an 8GB Kingston 133x CompactFlash card.

It currently has 1GB of RAM, but after I move all data to it, I will transfer 4GB RAM into it after selling older server parts.
 
sub_mesa said:
wonslung, I disagree with almost everything you said. I also follow several mailing lists, so feel free to refer to any concrete messages to make your point.

My point is:
You recommend a HDD that is a few generations old and was the first 2TB iteration; the monster disk to avoid; 5 massive platters of 400GB each. Twice the power consumption of Green drives while being slower for sequential workloads than the newest generation Green drives, which now have 666GB platters. The new Samsung F4EG pulls around 140MB/s, which is exceptional for a 5400rpm disk. And you could also argue that fewer mechanical parts and less friction / heat generation generally improve the reliability of a HDD as well.

The whole issue with '5400rpm being slow on ZFS' is not the meager 25% faster seek times and random I/O IOps of 7200rpm drives, but rather that the WD EARS drives use 4K sectors with 512-byte emulation. So there is no reason to avoid 5400rpm; newer HDDs are getting 4K sectors too. The Samsung F4 7200rpm series, for example, also gets 4K sectors, and thus would have the same issues in RAID-Z.

I believe the issue here is that the 128KiB recordsize is spread over all data disk members in the RAID-Z (that is, minus parity disks). Thus 4 disks in RAID-Z would be 128 / (4 - 1) = ~43KiB = not aligned with 4K, while 128 / (3 - 1) = 64KiB would be aligned. I still have to test this theory, and I hope I can suggest some fixes. Also, some people assume it's the 4K sectors if they get low ZFS performance, while I personally inspected some systems which actually had memory starvation and needed kmem tuning with only 2 or 4GB RAM.

Once i've completed my tests i'll post my findings in here.
Cheers.

You can disagree all you want. I'm well aware of the 4k issue. This is the number one reason that those drives aren't good for ZFS (and more accurately, raidz1,2,3).

Raidz uses a variable block size, which is one of the main reasons the 4k drives suffer. I'm on all the same mailing lists, but I also have tested most of the major drives. 5400 RPM drives are CONSIDERABLY slower in raidz configurations than 7200 rpm drives, due to the same IOPS issues that make raidz a poor choice for random i/o in the first place (it all goes back to how raidz works).

Now, for sequential access, this is much less of a problem, but the simple facts are that right now, the best 2tb drive for ZFS is the hitachi 2TB hands down, regardless of what you seem to think.

I do believe this will change, and could change overnight if WD would release a firmware which wasn't flawed.
 
noz said:
I've read through the thread and apparently the EARS and even some of the EADS drives are problematic. Are there any drives by WD that are actually GOOD for ZFS (green/blue/black or model number)? Or, is the only solution to use non-WD 4K drives?

I'm eyeing the Samsung Spinpoint F4: http://www.newegg.com/Product/Product.aspx?Item=N82E16822152245



There are some older, non-4k drives which wd makes which are fine, and some more expensive raid drives which wd makes which work well too. Right now the best drives for ZFS in a 2TB size are the hitachi drives. The best 1tb drive from my testing is a tie between the samsung spinpoint f3's and the seagate 7200.12's.

Currently we aren't using any 1.5 tb drives, but in the past we tried several 5400 and 5900 rpm drives and found them very slow for raidz arrays (not too bad for mirrored arrays which have several vdevs). Though if you add a good ssd SLOG device, it can help dramatically.
 
danbi said:
There is some big confusion spread here....

There is nothing bad with the 5400 rpm speed as such. Recording density has recently increased enough that a 5400 rpm drive is able to sustain 150 MBps read/write, while a 10-15,000 rpm drive is able to sustain, say, 300 MBps read/write.
However, this only applies to sequential operations, where the drive can prefetch data into the cache (data is read from the media at much higher speeds).

The big issue is the 'green' drives. These drives save energy by... not using it :) How does a drive not use energy? It can do this in many ways. Spinning the platters at lower speed is only one of these. Another way is to use slower and less power-hungry electronics. SAS drives have more complex electronic assemblies, and the same drive in its SAS version consumes more than with SATA. Of course, with the slower electronics the performance is worse. Another very big power drain in the drive is the head assembly movement. It is simple: if you want small reposition times, you have to spend more energy. One significant way to save energy in the 'green' drives is to make the head assembly move slower and less aggressively, so they have abysmal random seek times. There is almost nothing else a drive does. Maybe a 'green' drive could shut off parts (motors, chips), but this affects its ability to respond quickly to requests.

In a single-user system, or in systems with predictable sequential load, any drive will perform well. In a multi-tasking system, when the drive is asked for data scattered all over, things are much different. I still have some old Cheetah drives here. Compared to a green drive, they are better for random type load, but cannot compete on sequential tasks, with their 'poor' 30 MBps ;)

So, in summary, while the improvements in recording density have diminished the difference between slower spinning and faster spinning drives in terms of sequential load, the higher end drives spend power on things more critical for performance, such as seeks and faster electronics. It seems that recently it is the electronics that are the limiting factor for drive performance. When we talk about 'slow' drives, we usually talk about the random access times and multitasking performance of the drive.




EXACTLY. Too many people here don't understand that the so-called "green" drives aren't green at all in raid arrays. They sometimes actually use MORE energy due to poor seek times (they have to spin longer and more often to read the same amount of data). The fact of the matter is, they work well enough for what they are designed for, but for raid arrays, they tend to not offer any considerable amount of power saving. If you REALLY want to save power, use 2.5 inch drives.

Let me be clear, the 5400 rpm drives aren't bad drives, they just aren't great for raidz. If you are building mirrored vdevs, or just don't care about random i/o at all, then go ahead and use them, but don't use them thinking they are going to save you a lot of energy on a ZFS raidz array. One of the biggest problems with them is they are set to "die" after 1-2 seconds of use, so if you use them in raid arrays, they will wear out FAST AS HELL due to all the powerup/power down cycles.

But hey, if you really don't want to take my word for it, just ask someone else who builds raid systems on a regular basis. I don't know ANYONE building raid systems who recommend using green drives.
 
wonslung said:
One of the biggest problems with them is they are set to "die" after 1-2 seconds of use
Are you sure this holds true for all green drives though? As vermaden reports, Seagate LP drives (for one) don't appear to be sleeping...
 
wonslung said:
There are some older, non-4k drives which wd makes which are fine, and some more expensive raid drives which wd makes which work well too. Right now the best drives for ZFS in a 2TB size are the hitachi drives. The best 1tb drive from my testing is a tie between the samsung spinpoint f3's and the seagate 7200.12's.

I actually bought two of those F4's I mentioned to replace the green EARS drives I'm using at the moment. I'll know in a few days whether or not they're good.
 
noz said:
I actually bought two of those F4's I mentioned to replace the green EARS drives I'm using at the moment. I'll know in a few days whether or not they're good.
Looking forward to this. :)


vermaden said:
In short, very low power consumption and noise, with 'typical 5400 RPM' performance.
I also noticed this remark:

silentpcreview said:
The F4 seems to be just the ticket for users who want quiet, high efficiency drives but are paranoid about the frequent head-parking endemic to the Caviar Greens.
 
It seems that this little patch can 'fix' issues with 4k WD Green drives:
http://lists.freebsd.org/pipermail/freebsd-fs/2010-October/009706.html


Code:
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
-*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+*ashift = highbit(MAX(MAX(4096, pp->sectorsize), SPA_MINBLOCKSIZE)) - 1;

I've been using this for 3 months with 20 2TB 4KB-sector WDC disks (in 2
raidz2 arrays of 10) without any issues. Writes go at 300MB/s.
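
To show what the one-line change above does numerically, here is a small Python sketch that mirrors the two C expressions. It assumes `SPA_MINBLOCKSIZE` is 512 and that `highbit()` returns the 1-based position of the highest set bit; the Python function names are made up for illustration and are not part of ZFS.

```python
SPA_MINBLOCKSIZE = 512  # assumed value of the ZFS constant (1 << 9)


def highbit(value):
    """1-based index of the highest set bit, 0 for 0 (assumed semantics)."""
    return value.bit_length()


def ashift_original(sectorsize):
    """ashift as computed by the unpatched line."""
    return highbit(max(sectorsize, SPA_MINBLOCKSIZE)) - 1


def ashift_patched(sectorsize):
    """ashift as computed by the patched line, never below 4096 bytes."""
    return highbit(max(max(4096, sectorsize), SPA_MINBLOCKSIZE)) - 1


for reported in (512, 4096):
    print(f"drive reports {reported}-byte sectors: "
          f"original ashift={ashift_original(reported)}, "
          f"patched ashift={ashift_patched(reported)}")
```

So a drive that reports 512-byte sectors gets ashift 9 without the patch and ashift 12 (4096-byte alignment) with it.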
 
I tried the suggested patch, but unfortunately it killed my ZFS Pool:

Code:
pool: tank
state: UNAVAIL
scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	tank        UNAVAIL      0     0     0  insufficient replicas
	  raidz1    UNAVAIL      0     0     0  corrupted data
	    ad6     ONLINE       0     0     0
	    ad10    ONLINE       0     0     0
	    ad12    ONLINE       0     0     0
 