Seeing really poor ZFS reads/writes

I'm not really sure where to start with this, and I am a really big FreeBSD n00b, so please let me know if I missed any applicable information or if there is anything I can add that will help troubleshoot this issue.

Machine has 4GB RAM, 4x1 TB drives for the zpool, and a USB key for booting the OS.

Thanks in advance.

Basically, when I run zpool iostat 1 on a fully updated 4-disk SATA raidz zpool, I'm seeing peak writes and reads of about 5M (~5 MB/s).

Here is my zpool status:

Code:
pine# zpool status
  pool: tank
 state: ONLINE
 scan: scrub canceled on Sun Apr  3 08:19:57 2011
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    ad4p1   ONLINE       0     0     0
	    ad5p1   ONLINE       0     0     0
	    ad6p1   ONLINE       0     0     0
	    ad7p1   ONLINE       0     0     0

errors: No known data errors

Here is my loader.conf:
Code:
kern.cam.boot_delay=10000
zfs_load="YES"
ahci_load="YES"

I am using this command to check read/write speeds:
# dd if=/dev/zero of=foo bs=2M count=10000 ; dd if=foo of=/dev/null bs=2M

I'm pretty sure that this isn't a hardware issue, as I was getting better performance when I was running Solaris on the same setup.
 
You have ahci.ko loaded at boot, but your drives show up as adX instead of adaX, which means that AHCI is not actually being used. What hardware do you have there?

Code:
# dd if=/dev/zero of=foo bs=2M count=10000
Check gstat -I 5000000 to see how much the drives are 'loaded' during that 'test'.

Also, post the output of this command here: # dmesg | grep -E "^ad".
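
A quick way to verify whether ahci(4) really attached, just as a sketch (standard base system commands):
Code:
# kldstat | grep ahci
# dmesg | grep -i ahci
# camcontrol devlist
kldstat shows whether the module is loaded, the dmesg line shows whether the driver attached to your controller, and camcontrol devlist lists the disks CAM sees; with AHCI working they show up as adaX.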
 
Hi,

I'm wondering what you mean by "fully updated"?
Could you please show the output of:
Code:
# zpool upgrade

One other thing is that your drives are named "ad4p1", as if they were all gpart-partitioned. Why?
I would suggest destroying the partition data and creating the pool with the raw devices instead.
Code:
# gpart delete -i 1 ad4
# gpart destroy ad4
# gpart show ad4
no such geom: ad4
(repeat for each drive)
# zpool create tank raidz ad{4,5,6,7}
# zpool status
tank     ONLINE 0 0 0
 raidz1  ONLINE 0 0 0
  ad4    ONLINE 0 0 0
  ad5    ONLINE 0 0 0
  ad6    ONLINE 0 0 0
  ad7    ONLINE 0 0 0

@vermaden
I've had
Code:
ahci_load="YES"
in loader.conf, but I had to recompile the kernel to make my drives show up as adaX devices.

/Sebulon
 
Oh, and are you absolutely sure you are reading/writing against the pool? Because 5 MB/s smells a lot like USB ;)

/Sebulon
 
Sebulon said:
@vermaden
I've had ahci_load="YES" in loader.conf, but I had to recompile the kernel to make my drives show up as adaX devices.

I only need to load ahci.ko at boot to get adaX devices. What hardware do you have?
 
@vermaden
I have
  • 3x Lycom <SiI 3124 SATA300 controller> PCI-X
  • onboard <Intel ICH9 SATA300 controller>

Only the ones on the Intel controller were automatically recognized.

/Sebulon
 
Sebulon said:
Hi,

I'm wondering what you mean by "fully updated"?
Could you please show the output of:
Code:
# zpool upgrade


/Sebulon
Code:
pine# zpool upgrade
This system is currently running ZFS pool version 28.

All pools are formatted using this version.
vermaden said:
Also, post the output of this command here: # dmesg | grep -E "^ad".

Code:
pine# dmesg | grep -E "^ad"
ad4: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-master UDMA100 SATA
ad5: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-slave UDMA100 SATA
ad6: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-master UDMA100 SATA
ad7: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-slave UDMA100 SATA
ad4: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-master UDMA100 SATA
ad5: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-slave UDMA100 SATA
ad6: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-master UDMA100 SATA
ad7: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-slave UDMA100 SATA


vermaden said:
Check gstat -I 5000000 to see how much the drives are 'loaded' during that 'test'.

During this it basically looked like everything was 'passing' through the da0 device before going to the RAID; you'd see it spike on that, then go to the ad volumes. I've posted a video of the sample copy here. Let me know if you can read the text, etc.

http://www.youtube.com/watch?v=aQOEFTFze8k

Here is the zpool iostat 1:

Code:
pine# zpool iostat 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        3.00T   637G      5    118   119K   710K
tank        3.00T   637G      0      0  2.49K      0
tank        3.00T   637G     23    615  47.4K  3.27M
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0  1.50K      0
tank        3.00T   637G     47     25   102K  1.14M
tank        3.00T   637G     62     13   132K  1.75M
tank        3.00T   637G     14    931  33.5K  4.62M
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      6      2  15.0K   130K
tank        3.00T   637G     14    580  32.0K  3.16M
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G     10      1  21.0K   128K
tank        3.00T   637G      7    627  16.5K  3.24M
tank        3.00T   637G      1      0  3.99K      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0      0      0      0
tank        3.00T   637G      0  1.27K  2.00K  4.61M
 
vermaden said:
@Sebulon

The siis(4) driver is not in the GENERIC kernel. Have you tried just loading both ahci(4) and siis(4) at boot?

That tip was awesome man, thanks a bunch! I didn't even know that driver existed =) It also solved a snag with the backplane on my disk rack; the drives are working with true hot-swap again, as they should be.
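
For reference, loading both drivers from loader.conf looks something like this (a sketch of my own setup; on 8.x no kernel recompile should be needed for these, as far as I can tell):
Code:
ahci_load="YES"
siis_load="YES"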

@thehigherlife
OK, first of all, FreeBSD 8.2 is at zpool version 15. You are using the same pool as from Solaris, which is at a higher version, 28. There is a lot of difference in between. It shouldn't matter that much, but for troubleshooting I would suggest that you destroy the pool and build it up again with FreeBSD so it's in sync at 15, as you cannot downgrade a pool.
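
If you want to double check where both sides stand, something like this should show it (just a sketch; "tank" being your pool name):
Code:
# zpool upgrade -v
# zpool get version tank
zpool upgrade -v lists the versions the running system supports, and the version property shows what the pool itself is at.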

The "da" is your usb-key. You don´t copy "through" the root device, you copy directly to what´s mounted in the filesystem, which in your case is your pool. But in the film you showed a copy over the network, I saw Finder there. You´ll have to start by measuring the speed you get when copying locally on the NAS. Take a file that´s small enough to fit in your RAM. Then logged into the NAS:
Code:
# dd if=/pool/completlyinnocenttestfile.mkv of=/dev/zero bs=1m
Repeat that until you get something like 2-3 GB/s of transfer; that means it is completely buffered in RAM. Then:
Code:
# dd if=/pool/completlyinnocenttestfile.mkv of=/pool/completlyinnocenttestfile-2.mkv bs=1m
And measure with iostat and gstat while it runs. What speed do you get then?
Only after that do you start tweaking the network, NFS or Samba for example.
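
For the iostat/gstat measuring step, something along these lines works (a sketch; adjust the device names to match yours):
Code:
# iostat -x -w 1 ad4 ad5 ad6 ad7
# gstat -I 1000000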

Also, remember what I said about the gpart partitions; ZFS likes raw devices better.

/Sebulon
 
Sebulon said:
@thehigherlife
OK, first of all, FreeBSD 8.2 is at zpool version 15. You are using the same pool as from Solaris, which is at a higher version, 28. There is a lot of difference in between. It shouldn't matter that much, but for troubleshooting I would suggest that you destroy the pool and build it up again with FreeBSD so it's in sync at 15, as you cannot downgrade a pool.

/Sebulon

I'm using a FreeBSD-provided patch for 8.2 that brings in support for v28. The pool was originally v22, which I upgraded to 28 in FreeBSD. As for rebuilding, at the moment it isn't an option, as I don't have anywhere else to put all of the data that is currently on there. Also, I was seeing similar issues when I was running the dd command; pretty much the same thing happened when I ran gstat.
 
I just noticed this unimported pool as well, which was not there when I was using OpenSolaris.

Code:
  pool: mgmt
    id: 3554407418723745840
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
	devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

	mgmt         UNAVAIL  insufficient replicas
	  raidz1-0   UNAVAIL  insufficient replicas
	    dsk/ad4  UNAVAIL  cannot open
	    dsk/ad5  UNAVAIL  cannot open
	    dsk/ad6  UNAVAIL  cannot open
	    dsk/ad7  UNAVAIL  cannot open
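
I'm guessing these are leftover labels from the old "mgmt" pool from OpenSolaris. If I understand it right, the labels ZFS sees on a raw disk can be dumped with zdb, something like this (a sketch; it only reads, so it should be safe to repeat for each drive):
Code:
# zdb -l /dev/ad4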
 
thehigherlife said:
Code:
pine# zpool upgrade
This system is currently running ZFS pool version 28.

All pools are formatted using this version.
Maybe you have hit some bug, as you are using the latest 'testing' ZFS v28?

thehigherlife said:
Code:
pine# dmesg | grep -E "^ad"
ad4: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-master UDMA100 SATA
ad5: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-slave UDMA100 SATA
ad6: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-master UDMA100 SATA
ad7: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-slave UDMA100 SATA
ad4: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-master UDMA100 SATA
ad5: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata2-slave UDMA100 SATA
ad6: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-master UDMA100 SATA
ad7: 953869MB <SAMSUNG HD103SJ 1AJ100E4> at ata3-slave UDMA100 SATA
When using two drives per ATA channel they share that channel's bandwidth, so while a single drive attached as master could hit a theoretical 100-130 MB/s, when both master and slave use the channel at the same time they 'fight' for the bandwidth. It should not be as low as 5 MB/s, though.
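
You can also check what transfer mode each drive actually negotiated with the legacy ata(4) driver, something like this (a sketch, repeat for each adX):
Code:
# atacontrol mode ad4
# atacontrol cap ad4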

thehigherlife said:
During this it basically looked like everything was 'passing' through the da0 device before going to the RAID; you'd see it spike on that, then go to the ad volumes. I've posted a video of the sample copy here. Let me know if you can read the text, etc.

http://www.youtube.com/watch?v=aQOEFTFze8k

I think that you passed 500000 (5 zeroes) instead of 5000000 (6 zeroes) ;)

@Sebulon

You're welcome, mate ;)
 
@thehigherlife:
OK, then I'm with vermaden on that one. Maybe a bug. Or are you perhaps running dedup with too little RAM, since you don't have any cache device?
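
That is quick to check, something along these lines (just a sketch; "tank" being your pool):
Code:
# zfs get dedup tank
# zpool list tank
On a v28 pool, zpool list should also show the dedup ratio, if I remember correctly.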
And I saw from the iostat output that you have filled the pool up almost entirely, so I understand that you'd want to avoid rebuilding it. Unfortunately, it's the only way to troubleshoot and rule out possible fault areas. Maybe you'd consider getting one 3TB drive, creating a secondary pool and doing send/recv between them. That's what I do, and I'd recommend it any day of the week. It's always good to have a complete disaster recovery copy, just in case.

/Sebulon
 
What type of troubleshooting would be helpful to determine if and where the bug resides? The system isn't crashing; I'm just seeing poor performance. I'm willing to help try to track down where the bug is.
 
Sebulon said:
@thehigherlife:
Or are you perhaps running dedup with too little RAM, since you don't have any cache device?

Told ya=)
Add way more RAM or a cache device to solve that. A suggestion for a cache device would be e.g. a Vertex 2 120GB or an Intel 320 120GB.
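
Adding a cache device afterwards is just one command, something like this (a sketch; ada4 here is only a made-up name for the SSD):
Code:
# zpool add tank cache ada4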

/Sebulon
 