ZFS slow on FreeBSD 9

Hello!

I'm installing FreeBSD 9.0-RELEASE on a DL120/G7. It has:
CPU: Intel E3-1240, 4 cores @ 3.3 GHz
RAM: 8 GB
HDD: 4x 2 TB

Code:
# camcontrol devlist
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus0 target 1 lun 0 (ada1,pass1)
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus1 target 0 lun 0 (ada2,pass2)
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus1 target 1 lun 0 (ada3,pass3)

There is no tuning in /boot/loader.conf, only these settings in /etc/sysctl.conf:
Code:
vfs.zfs.txg.write_limit_override=1073741824
kern.maxvnodes=250000

I created a raidz pool:

Code:
# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

When I then simply try to move files from one place to another, I get around 10-15 MB/s. :(
What can I do about that?
Thanks.
 
I fixed this issue, but I still get around 15 MB/s when I copy a file from one directory to another.
My /boot/loader.conf:

Code:
zfs_load="YES"
vfs.zfs.prefetch_disable="1"
vfs.root.mountfrom="zfs:tank/root"
vfs.zfs.txg.write_limit_override=1073741824

Code:
r8# zfs get all tank/root/d1
NAME          PROPERTY              VALUE                  SOURCE
tank/root/d1  type                  filesystem             -
tank/root/d1  creation              Wed Apr  4 13:29 2012  -
tank/root/d1  used                  502G                   -
tank/root/d1  available             4.65T                  -
tank/root/d1  referenced            502G                   -
tank/root/d1  compressratio         1.00x                  -
tank/root/d1  mounted               yes                    -
tank/root/d1  quota                 none                   default
tank/root/d1  reservation           none                   default
tank/root/d1  recordsize            128K                   default
tank/root/d1  mountpoint            /mnt/d1                inherited from tank/root
tank/root/d1  sharenfs              off                    default
tank/root/d1  checksum              off                    local
tank/root/d1  compression           off                    default
tank/root/d1  atime                 on                     default
tank/root/d1  devices               on                     default
tank/root/d1  exec                  on                     default
tank/root/d1  setuid                on                     default
tank/root/d1  readonly              off                    default
tank/root/d1  jailed                off                    default
tank/root/d1  snapdir               hidden                 default
tank/root/d1  aclmode               discard                default
tank/root/d1  aclinherit            restricted             default
tank/root/d1  canmount              on                     default
tank/root/d1  xattr                 off                    temporary
tank/root/d1  copies                1                      default
tank/root/d1  version               5                      -
tank/root/d1  utf8only              off                    -
tank/root/d1  normalization         none                   -
tank/root/d1  casesensitivity       sensitive              -
tank/root/d1  vscan                 off                    default
tank/root/d1  nbmand                off                    default
tank/root/d1  sharesmb              off                    default
tank/root/d1  refquota              none                   default
tank/root/d1  refreservation        none                   default
tank/root/d1  primarycache          all                    default
tank/root/d1  secondarycache        all                    default
tank/root/d1  usedbysnapshots       0                      -
tank/root/d1  usedbydataset         502G                   -
tank/root/d1  usedbychildren        0                      -
tank/root/d1  usedbyrefreservation  0                      -
tank/root/d1  logbias               latency                default
tank/root/d1  dedup                 off                    default
tank/root/d1  mlslabel                                     -
tank/root/d1  sync                  disabled               local
tank/root/d1  refcompressratio      1.00x                  -

Code:
r8# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0

errors: No known data errors
 
Well... there are no answers, so we moved the system to Debian and disk performance is better... I'm very upset.
 
Qaz said:
Well... there are no answers, so we moved the system to Debian and disk performance is better... I'm very upset.

How about UFS? Debian won't fix the disk problem either.
 
The Debian installer might align the partitions to the correct offsets.
Isn't it best practice to hand whole disks to ZFS, thus avoiding such things altogether?
 
We tried to set up the system on UFS; it was slower than ZFS, and with ZFS we have the right offsets:

Code:
r8# zdb|grep ashift
            ashift: 12

It's just slower and I don't know what to do.
 
@Qaz

gkontos said:
You need to align your drives for 4K. Have a look at this thread.

Using gnop is the only thing we know you have done, since you never told us anything more than that. Without more info, there's very little we can do for you. But depending on your setup, there may very well be more that could be done; we might have suggestions for tweaking your configuration, and maybe your hardware too.

Although, if you're happier using Debian, stick with that. At least you won't be disappointed.

/Sebulon
 
I have posted my sysctls, the type of RAID I use, and so on. I can give more information, just say what else is needed. I am aligning my drives for 4K and I think I have good hardware, hence my question: why is it so slow?
 
Crivens said:
Isn't it best practice to hand whole disks to ZFS, thus avoiding such things altogether?

Well, it does simplify things. But you have to partition if you want to be able to boot from them.

/Sebulon
 
Qaz said:
Code:
r8# zdb|grep ashift
            ashift: 12

I see that your ZFS pool was built on top of GPT partitions. If those partitions were not 4KB sector aligned, then even a zpool with an ashift of 12 wouldn't have performed well.

Really, we needed to see the output of gpart show to check your GPT partition alignment, but I guess it's too late now.
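
For anyone checking this on their own system, a quick sketch (the device name is just an example):

Code:
# list the partition table; a freebsd-zfs partition whose start offset
# (counted in 512-byte sectors) is divisible by 8 is 4 KB aligned
gpart show ada0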
 
Let us see how the drives are performing when you copy files.

First run the commands

$ iostat -xz -w 1 -c 120 > iostat-2min.txt

and

$ zpool iostat -v 10 12 > zpool-2min.txt

This should capture 2 minutes of iostat data from your drives. In the meantime, also start copying files. Then post the results back here.
 
I don't have that server anymore, but I have another one and its performance is not very good either. Here are the listings:

http://pastehtml.com/view/bv1gfavuu.txt

http://pastehtml.com/view/bv1gp2e6e.txt

Code:
>uname -a
FreeBSD 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011     
root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Code:
NAME  PROPERTY  VALUE    SOURCE
tank  version   15       default

Code:
zfs get version
NAME           PROPERTY  VALUE    SOURCE
ssd            version   4        -
tank           version   4        -
tank/root      version   4        -
tank/root/tmp  version   4        -
tank/root/var  version   4        -

Code:
cat /boot/loader.conf
autoboot_delay="3"
loader_logo="beastie"

zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
#geom_mirror_load="YES"
vfs.zfs.zio.use_uma="0"

pf_load="YES"

accf_data_load="YES"
accf_http_load="YES"

aio_load="YES"

Code:
#for zfs
vfs.zfs.txg.write_limit_override=1073741824
kern.maxvnodes=250000

#mysql zfs tuning
vfs.zfs.prefetch_disable=1

Code:
zpool status
  pool: ssd
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ssd         ONLINE       0     0     0
          ad8       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad4p3   ONLINE       0     0     0
            ad6p3   ONLINE       0     0     0

errors: No known data errors

Code:
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (3411.50-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
real memory  = 17179869184 (16384 MB)
avail memory = 16431161344 (15669 MB)

atacontrol list
ATA channel 2:
    Master:  ad4 <ST33000651AS/CC45> SATA revision 2.x
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <ST33000651AS/CC45> SATA revision 2.x
    Slave:       no device present
ATA channel 4:
    Master:  ad8 <OCZ-VERTEX3/2.15> SATA revision 2.x
    Slave:       no device present
 
Qaz, for the record, I have also been considering 9.0-RELEASE as a replacement for a Solaris box. My setup is a fast motherboard with a Core i3, 2 GB RAM, one Adaptec 3085, and 4x 400 GB SAS disks. With an old release of Solaris Express Community Edition, build snv_113, I get > 300 MB/s read performance out of that setup. FreeBSD 9.0 cuts those speeds by a factor of four, to 60-80 MB/s. I've done no tuning, and I'll open another thread asking for tips, but it's frustrating that the same hardware is so slow. If this were my home NAS I'd only be worried about backups taking 24 hours instead of 6, but there's no way I can replace a busy Solaris box at the office with FreeBSD at these numbers. The FreeBSD Handbook says amd64 is largely auto-tuning (Solaris is), but if this is the best FreeBSD can do, I wonder why Solaris is four times as fast on the same hardware.
 
Looking at some sample output in bv1gfavuu.txt,

Code:
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4       56.9   0.0  4009.1     0.0    0  48.4  87
ad6       60.9   0.0  3551.1     0.0    7  54.8 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4       54.9   0.0  3603.0     0.0    7  49.8  90
ad6       57.9   0.0  3708.4     0.0    0  64.0  91
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4       57.9   0.0  4075.6     0.0    2  66.1  94
ad6       40.0   0.0  2756.3     0.0    2  37.6  69

both ad4 and ad6 are almost 100% busy but only manage a total read speed of roughly 7 MB/s. The following snippets

Code:
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4       24.0 364.3  1476.1 23789.7   10  17.5  99
ad6       28.9 355.3  1802.5 23009.7   10  20.6 101
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        2.0 561.6   127.7 55570.3   10  17.3 100
ad6        3.0 571.6   178.1 56349.8   10  17.2  99
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        5.0 325.7   319.7 24960.3   10  35.0 101
ad6        3.0 431.6   114.9 38440.3   10  25.2  98
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4       47.0 108.0  2808.8 13496.4   10  59.3 100
ad6       63.0   2.0  3662.5     9.0    7  98.1 103

show that writing is much more impressive. So your problem is actually read speed. I would advise setting vfs.zfs.prefetch_disable=0 first (i.e. re-enabling prefetch) to see how it impacts the performance of the whole system.

PS: recordsize should be set to match the record size of the database.
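
For example, a hedged sketch of both changes (the dataset name tank/root/d1 is taken from the earlier zfs get output, and 16K assumes a MySQL/InnoDB page size; adjust to your own database):

Code:
# re-enable prefetch at runtime (also remove vfs.zfs.prefetch_disable=1
# from /etc/sysctl.conf so the change survives a reboot)
sysctl vfs.zfs.prefetch_disable=0

# match recordsize to the database page size, e.g. 16K for InnoDB;
# this only affects files written after the change
zfs set recordsize=16K tank/root/d1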
 
jem said:
I see that your ZFS pool was built on top of GPT partitions. If those partitions were not 4KB sector aligned, then even a zpool with an ashift of 12 wouldn't have performed well.

Speaking of GPT alignment, we're not doing ourselves any favors with the default number of partition entries. The default of 128 entries in the header causes 34 sectors to be used. I don't recall the number I used; it might have been 240.

Mine typically look like this:
Code:
silo# gpart show ada0
=>        64  5860533041  ada0  GPT  (2.7T)
          64         192     1  freebsd-boot  (96k)
         256    33554432     2  freebsd-swap  (16G)
    33554688  5826978416     3  freebsd-zfs  (2.7T)
  5860533104           1        - free -  (512B)

I think I used commands like gpart create -s gpt -n 240 ada0, which ends up with the free space starting at sector 64 instead of 34. I then used gnop with a 4096-byte sector size to get the 4K arrangement within ZFS.
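
That gnop step goes roughly like this (a sketch from memory; device and partition names are examples, and the temporary .nop provider disappears after a reboot while the pool keeps its ashift):

Code:
# create a temporary 4K-sector provider on top of one partition
gnop create -S 4096 ada0p3

# build the pool against the .nop device so ZFS picks ashift=12
zpool create tank raidz ada0p3.nop ada1p3 ada2p3 ada3p3

# afterwards: export, remove the gnop, and re-import; ashift stays at 12
zpool export tank
gnop destroy ada0p3.nop
zpool import tank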

I've seen recipes where people specify the start address to cause the partitions to be 4k aligned, but the gap irritated me so I made the table a little bigger.

Code:
silo# zdb |grep ashift
            ashift: 12
 
Note: ZFS in FreeBSD 8.2, while stable for most things, is not all that performant. This is a known issue, and it's why you'll always get "upgrade to 8-STABLE after 8.2" advice when asking ZFS questions on the mailing lists.

Fortunately, FreeBSD 8.3 was just released, so you can stick to -RELEASE, and get all the latest ZFS fixes and speed-ups.
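
If you are on 8.2-RELEASE now, the binary upgrade is roughly (a sketch, assuming a GENERIC kernel; see freebsd-update(8) for the full procedure):

Code:
# fetch and apply the upgrade to 8.3-RELEASE
freebsd-update -r 8.3-RELEASE upgrade
freebsd-update install
shutdown -r now
# after the reboot, run install again to finish updating the userland
freebsd-update install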

Note also that just handing the entire drive to ZFS and using gnop is not enough to get proper 4K alignment. Even if you give the whole drive to ZFS, it doesn't use the entire disk starting at LBA 0; there's some slack at the beginning for various things. To make sure you are 4K-aligned, use a single GPT partition that starts at the 1 MB boundary (-b 2048 or -b 512, depending) and spans the entire disk, then use that partition for ZFS. You'll find things go much smoother/faster that way.
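
A sketch of that partitioning, assuming a blank disk and example device names (gpart's -a flag rounds the start of the partition to the requested alignment):

Code:
# create a GPT and add a single 1 MB-aligned partition spanning the disk
gpart create -s gpt ada0
gpart add -t freebsd-zfs -a 1m -l disk0 ada0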

Finally, with those drives, you need to get the wdidle3.exe program from Western Digital, and disable all the power-saving features (mainly the Idle Timeout). The default for those drives is under 8 seconds, and will cause all kinds of havoc with RAID controllers and ZFS setups. (We've actually returned/replaced all our *EARS* drives due to the horrible performance of the drives under FreeBSD.)
 
lucinda said:
Qaz, for the record, I have also been considering 9.0-RELEASE as a replacement for a Solaris box. My setup is a fast motherboard with a Core i3, 2 GB RAM, one Adaptec 3085, and 4x 400 GB SAS disks. With an old release of Solaris Express Community Edition, build snv_113, I get > 300 MB/s read performance out of that setup. FreeBSD 9.0 cuts those speeds by a factor of four, to 60-80 MB/s. I've done no tuning, and I'll open another thread asking for tips, but it's frustrating that the same hardware is so slow. If this were my home NAS I'd only be worried about backups taking 24 hours instead of 6, but there's no way I can replace a busy Solaris box at the office with FreeBSD at these numbers. The FreeBSD Handbook says amd64 is largely auto-tuning (Solaris is), but if this is the best FreeBSD can do, I wonder why Solaris is four times as fast on the same hardware.

I wouldn't qualify a Core i3 and 2 GB of RAM as "a fast motherboard". I'd barely qualify it as "a usable desktop". :) Especially if you are sticking SAS drives onto it.

There most certainly is something wrong with your setup. My home NAS box can barely be considered "desktop-class", considering it's only a lowly 2.8 GHz P4 CPU (single-core, HTT-enabled) with 2 GB of RAM, running 32-bit FreeBSD 8-stable from January (r226546) using the on-board ICH7 SATA controller (non-AHCI) with 4x 500 GB WD Caviar Black HDs.

Pool configuration is a simple dual-mirror setup. And I get, under normal usage, 40-60 MBps of throughput (as shown by zpool iostat and gstat) locally. And I can see the odd burst up to 30 MBps per disk (120 MBps for the pool).

If my horribly under-powered setup can do 60 MBps with ancient SATA controllers and hard drives, then you're doing something wrong if you can only get 60 MBps out of SAS drives and a SAS controller.
 
peter@ said:
I've seen recipes where people specify the start address to cause the partitions to be 4k aligned, but the gap irritated me so I made the table a little bigger.

True, there are many recipes out there regarding proper alignment. Quoting from revision r230059

[CMD=""]# gpart add -b 34 -s 94 -t freebsd-boot ad0[/CMD]

I find this always works when dealing with ZFS-on-root systems.

Dealing with pools in large arrays also requires proper alignment:

[CMD=""]# gpart add -t freebsd-zfs -l disk0 -b 2048 -a 4k daX[/CMD]

I recently had to deal with a large array consisting of 19 striped mirrors on Intel SSD drives with very poor performance.

The array had to be destroyed. Then gpart(8) was used to align each disk, followed by gnop(8). On top of that, a stripe of two SSDs was used for cache and a mirror for log.

The results were amazing; we could then easily see 100 MBps of write speed on that array. At some point it got to 300 MBps, which means that speed was no longer a bottleneck.
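
Adding those auxiliary devices looks roughly like this (a sketch; the pool and SSD device names are examples):

Code:
# two SSDs striped as L2ARC (cache) and two mirrored as a SLOG (log)
zpool add tank cache da20 da21
zpool add tank log mirror da22 da23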
 