ZFS performance degradation over time

zdb is still running; I'll edit the rest of the results in later. It is taking forever.

Code:
# zdb tank
    version=13
    name='tank'
    state=0
    txg=2002316
    pool_guid=15631058209680076792
    hostid=1304739570
    hostname='leviathan'
    vdev_tree
        type='root'
        id=0
        guid=15631058209680076792
        children[0]
                type='raidz'
                id=0
                guid=14185904529334632668
                nparity=1
                metaslab_array=23
                metaslab_shift=36
                ashift=9
                asize=12002376286208
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=15290683616584576164
                        path='/dev/da0'
                        whole_disk=0
                        DTL=35
                children[1]
                        type='disk'
                        id=1
                        guid=8251901779817056534
                        path='/dev/da1'
                        whole_disk=0
                        DTL=34
                children[2]
                        type='disk'
                        id=2
                        guid=9617199221839498887
                        path='/dev/da2'
                        whole_disk=0
                        DTL=33
                children[3]
                        type='disk'
                        id=3
                        guid=11494989113403118025
                        path='/dev/da3'
                        whole_disk=0
                        DTL=94
                children[4]
                        type='disk'
                        id=4
                        guid=10053854906903946266
                        path='/dev/da4'
                        whole_disk=0
                        DTL=31
                children[5]
                        type='disk'
                        id=5
                        guid=2928242912600629893
                        path='/dev/da5'
                        whole_disk=0
                        DTL=87
                children[6]
                        type='disk'
                        id=6
                        guid=13488841482098780283
                        path='/dev/da6'
                        whole_disk=0
                        DTL=27
                children[7]
                        type='disk'
                        id=7
                        guid=668559818837929671
                        path='/dev/da7'
                        whole_disk=0
                        DTL=128
Uberblock

        magic = 0000000000bab10c
        version = 13
        txg = 2210544
        guid_sum = 9377515222552487103
        timestamp = 1310829373 UTC = Sat Jul 16 11:16:13 2011

Dataset mos [META], ID 0, cr_txg 4, 46.7M, 131 objects
Dataset tank [ZPL], ID 16, cr_txg 1, 4.43T, 78621 objects

I am actually considering replacing all 8 drives with different ones. I have had way too many issues with these and I'm kind of fed up. A bunch of them also have a really high load cycle count and will probably need warranty replacement eventually (I've already had to RMA 4 of the 8). I'm in Canada, so I have a limited drive selection - mainly ncix.com, newegg.ca, and canadacomputers.com. I can't find the exact drives you mention, but I could get WD Caviar Blacks. Do they have any ridiculous settings that would be problematic for RAIDZ, or would they work well?

edit: Just got back from the store - bought 8 Western Digital Caviar Black (WD2002FAEX) 2TB SATA3 7200RPM 64MB cache (OEM) drives. I'm going to start replacing all of my drives one at a time.
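
The per-disk procedure I have in mind is roughly the following (just a sketch from the man pages - the device names assume the disks keep showing up as da0 through da7, so I'll double-check zpool status before pulling anything):
Code:
# zpool offline tank da0     # take the old Green drive offline before pulling it
# camcontrol devlist         # after swapping, confirm the new Black shows up at the same device node
# zpool replace tank da0     # resilver onto the new drive sitting in the same slot
# zpool status tank          # wait for the resilver to complete before touching the next disk

Then repeat for da1 through da7, one at a time.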


I'm still running 8.0-RELEASE. I've never done an upgrade of FreeBSD. Is it reliable/stable to upgrade both the OS and a ZFS pool? Not losing data is the most important thing to me.
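
If it is doable, I'm guessing the sequence would be something like this (pieced together from the handbook and assuming 8.2-RELEASE is the target, so please correct me if I have the flags wrong):
Code:
# freebsd-update -r 8.2-RELEASE upgrade   # fetch and merge the new release
# freebsd-update install                  # then reboot and run "freebsd-update install" again
# zpool upgrade tank                      # only once I'm sure I won't need the old kernel any more
# zfs upgrade -r tank

My understanding is that the zpool upgrade step is one-way - an older kernel can't import a newer pool version - so I'd leave it until last.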
 
Hi,

the main issue with WD Green drives is that they are "Advanced Format", aka 4k-sector drives (can anyone second me on that?). You can confirm it by posting your exact model number here, or by googling it yourself, to see whether that model is a 4k drive. If so, then your pool needs to be redone from scratch, or the drives replaced one by one (zpool replace) with regular 512-byte drives, like e.g. those Black drives you mentioned. If the Black drives are bigger, you will gain that capacity after the final drive is replaced. If you decide to redo the pool, you need to back everything up, destroy the pool, then:
Code:
# gnop create -S 4096 da0
# zpool create tank raidz da0.nop da{1,2,3,4,5,6,7}
# zpool export tank
# gnop destroy da0.nop
# zpool import tank
# zdb tank

Then look for "ashift=12" instead of the 9 you have now. Worth noting: ashift is set per vdev, so you can mix 4k and regular drives in the same pool in separate vdevs, but not within the same vdev, or your performance goes kaput.
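
If you just want to check the ashift without waiting for a full zdb run, something like this ought to be enough, since -C only dumps the cached pool configuration:
Code:
# zdb -C tank | grep ashift
                ashift=12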

PS. I am paranoid enough to run raidz2 on eight drives. I've actually had two drives give up on me at the same time, so I was glad I made that choice. If you decide to redo your pool, maybe that is worth considering?

/Sebulon
 
I'm so sick of the Green drives doing stupid stuff like the constant load cycling (the head-parking behaviour that wdidle3 is meant to fix) that I figured I'd just cough up the money to replace them entirely.

I'm not going to recreate the pool from scratch because I have 5TB of used space and nowhere else to put that data. I'll just incrementally replace each drive with the 2TB ones.

I'm willing to accept the risk of not being able to survive 2 simultaneous failures. My entire array is for backups of my workstation and for "backups" of my media, so it's nothing irreplaceable. If I lose it I'll be sad, but I can always rebuild it.


Also how long will 'zdb' with no arguments take? It's been running for at least 4 or 5 hours. Is there any point in letting it run, or should I just kill it?
 
Kill it. zdb traverses all of your data before it's done, so how long it takes depends on how fast the pool reads and how much data is stored - which in your case is quite a lot.

And I wouldn't be too sad about not finding any Samsung drives, if I were you - they drop like flies :) I've had to replace all of mine within a year. The main thing nowadays, I think, is to have 5.4k rpm 3.5" drives instead of 7.2k rpm 3.5" ones. They are just as good at shuffling data at large block sizes, but they generally run a lot cooler, so they last longer.

Another option, if you have enough interfaces and enough power connectors, would be to connect all of the new drives at once, create a tank2, and zfs send/recv between the pools. Perhaps less tedious than replacing one drive at a time and waiting for each resilver to finish?
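
Off the top of my head, the send/recv would look roughly like this (tank2 and the snapshot name are just examples - check the man pages, I'm typing from memory):
Code:
# zfs snapshot -r tank@migrate                    # recursive snapshot of everything
# zfs send -R tank@migrate | zfs recv -Fd tank2   # replicate all datasets, snapshots and properties

When it's done, you could destroy the old pool, export tank2 and re-import it under the old name with zpool import tank2 tank.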

I wish you all the best!

/Sebulon
 
Well, this is infuriating.

I finished rebuilding my array with 8x 2TB WD Black drives. My resilvering speed increased as fewer Green drives remained in the array; from watching gstat, the final drive was resilvering at up to 60MB/s.

My read speed from the array locally seems to be good. Copying files from the pool to somewhere else in the pool gives me around 80MB/s. Copying to my UFS system drive (300GB Seagate Barracuda) gives me about 60MB/s, which is probably around the max speed of that drive.

Writing to the array over the network seems good - around 75MB/s, although it seems to fluctuate. Watching gstat is weird - no disk activity for 5 seconds, then all of the data is written at once.

Reading from the array over the network - AWFUL. I can't get more than 8MB/s or so, and I get the exact same speeds via FTP and Samba.

I just removed samba33 and installed samba34 with AIO and some of the suggestions here. No difference.
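
For reference, the kind of AIO settings those threads suggest looks roughly like this in smb.conf, which is approximately what I tried (I'm not certain these are the exact values I ended up with):
Code:
[global]
    # use asynchronous I/O for requests larger than 16KB
    aio read size = 16384
    aio write size = 16384
    use sendfile = yes
    socket options = TCP_NODELAY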

I was using kern.minvnodes=25000 and kern.maxvnodes=75000, and tried setting them to 1000/10000 respectively; no difference.
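
For what it's worth, I was just changing those on the fly with sysctl, along the lines of:
Code:
# sysctl kern.minvnodes=25000
# sysctl kern.maxvnodes=75000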

Prefetch was disabled; I enabled it and rebooted. No difference.
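
(That was just flipping the loader tunable and rebooting - roughly this in /boot/loader.conf:)
Code:
# enable ZFS file-level prefetch (1 disables it)
vfs.zfs.prefetch_disable="0"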

What the hell is going on?


edit: All of the above was from my FreeBSD box to my Win7 workstation with a 300GB VelociRaptor HDD. I just tried copying files to my Win7 HTPC with an Agility 3 SSD, and I get 70MB/s+ over SMB. All the computers are plugged into the same 48-port switch and are in the same room. Oh my god, I think I'm going to have a brain aneurysm.
 
garrettmoore said:
Reading from the array over the network - AWFUL. I can't get more than 8MB/s or so, and I get the exact same speeds via FTP and Samba.

Is read performance better if you share a UFS file system via Samba or FTP? It would certainly be very odd if this were down to ZFS, given the other performance tests you mention in your last post...

Thanks, Andy.
 
I get the exact same transfer speeds to my desktop from a UFS partition. It seems like it's some sort of networking issue.

The server is using an onboard Realtek NIC on a Gigabyte motherboard:
Code:
re0: <RealTek 8168/8168B/8168C/8168CP/8168D/8168DP/8111B/8111C/8111CP/8111DP PCIe Gigabit Ethernet> port 0xce00-0xceff mem 0xfddff000-0xfddfffff,0xfdde0000-0xfddeffff irq 18 at device 0.0 on pci3

The desktop is also using an onboard Realtek NIC on a Gigabyte motherboard.

The HTPC is using an onboard Marvell Yukon NIC.

Performance seems pretty stable now on the server.

edit: Just tried benchmarking with bonnie++ using all default settings. First, my gmirror RAID1 array (UFS, 2x 300GB Seagate Barracuda, used for the OS):
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
leviathan       16G  1044  99 46138   6 18059   3  1962  93 40800   4 239.1   4
Latency              8094us     322ms    3969ms   93464us     230ms    4940ms
Version  1.96       ------Sequential Create------ --------Random Create--------
leviathan           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16907  23 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               292ms      36us      39us   95931us      63us      50us

ZFS:
Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
leviathan       16G   141  97 182266  37 148702  29   459  99 483250  53 182.0   9
Latency             64380us    5597ms    5687ms   37178us     542ms     595ms
Version  1.96       ------Sequential Create------ --------Random Create--------
leviathan           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20104  61 29755  54 20827  94 29241  95 +++++ +++ 28826  91
Latency               184ms     252ms     353us   20234us      64us     174us

If I'm reading that right, that's 182MB/s write and 483MB/s read? Seems good to me! Also, my speeds are totally consistent now - I'm not noticing any slowdown like I used to see. Hooray!!!
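
For reference, both runs were just the stock invocation, something along the lines of the following (the directory is just wherever I pointed it; bonnie++ picks a test size of twice RAM by itself, which is where the 16G comes from):
Code:
# bonnie++ -d /tank/bench -u root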
 