ZFS on /usr - SLOW

Hi, I just put /usr on ZFS and things are really slow, and I'm wondering if anyone can help me with some ideas.

[~]$ iostat 1
      tty             ad0              ad1             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1  135 28.64  74  2.06  47.53  94  4.39  13  1  9  1 77
   0  390  0.00   0  0.00   0.00   0  0.00  11  0 12  2 75
   1  130 64.00   1  0.06  64.00  74  4.62   2  1  4  0 93
   7 8223 25.49  77  1.91   0.00   0  0.00   6  0  5  1 88
   2  778  8.86 162  1.40  33.62 205  6.73  10  0  9  3 78
   0  125  0.00   0  0.00   0.00   0  0.00  16  0  2  1 80
   0  130  0.00   0  0.00   0.00   0  0.00  18  0  4  0 78
   8 8375 64.00   3  0.19   0.00   0  0.00  11  0  4  2 84
   2  928 12.19  47  0.56  64.00   1  0.06  12  0  5  2 80
   0  125  9.15 286  2.55  35.58 249  8.65  14  0 11  1 74
   0  130 11.33  76  0.84   1.00   4  0.00  13  0  7  1 79
   0  247  0.00   0  0.00  64.00   3  0.19  16  0  4  2 78
   1  130  0.00   0  0.00   0.00   0  0.00  16  0  6  2 76
   7 8375 10.92  49  0.52   0.00   0  0.00   5  0  4  0 91
   2 1208  5.60 144  0.79  33.88 264  8.73  12  0  8  1 80
   0  125  0.00   0  0.00   1.00   4  0.00   9  0  5  2 84
   0  130  0.00   0  0.00   0.00   0  0.00  11  0  2  1 86
   0  130 60.00   1  0.06  64.00  69  4.31  25  0  9  2 65
   0  130 64.00   1  0.06  62.70 335 20.50  22  0 12  2 64
   0  130  7.33 163  1.17  37.69 274 10.08  10  0  7  0 83
   0  130 10.86 173  1.83  38.80 189  7.16  14  0 16  2 69

ad0 has /usr on it.

The points where ad0 goes up to about 2 MB/s are just me running `ls`.
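
As a rough cross-check of those figures: in `iostat` output, MB/s is approximately KB/t times tps, divided by 1024. The sketch below recomputes a few rows. The function name and the choice of sample rows are only for illustration; the numbers are the ad0 columns copied from the output above.

```python
# Cross-check iostat columns: MB/s should be roughly KB/t * tps / 1024.

def mb_per_s(kb_per_transfer, transfers_per_s):
    """Estimate throughput in MB/s from average transfer size and rate."""
    return kb_per_transfer * transfers_per_s / 1024.0

# (KB/t, tps, MB/s as reported by iostat) for some busy ad0 rows.
AD0_SAMPLES = [
    (28.64, 74, 2.06),
    (25.49, 77, 1.91),
    (8.86, 162, 1.40),
    (9.15, 286, 2.55),
]

for kb_t, tps, reported in AD0_SAMPLES:
    estimate = mb_per_s(kb_t, tps)
    print(f"{kb_t:6.2f} KB/t x {tps:3d} tps = {estimate:.2f} MB/s "
          f"(iostat reported {reported:.2f})")
```

The busiest ad0 rows move only about 5 to 11 KB per transfer at 140 to 290 transfers per second, so the disk is doing many small I/Os rather than a few large ones.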

At someone's suggestion I changed recordsize to 64K, but I haven't noticed much improvement. Here's the output of `zfs get all base/usr`:

# zfs get all base/usr
NAME      PROPERTY       VALUE                  SOURCE
base/usr  type           filesystem             -
base/usr  creation       Thu Jun  4 11:21 2009  -
base/usr  used           10.1G                  -
base/usr  available      23.5G                  -
base/usr  referenced     10.1G                  -
base/usr  compressratio  1.00x                  -
base/usr  mounted        yes                    -
base/usr  quota          none                   default
base/usr  reservation    none                   default
base/usr  recordsize     64K                    local
base/usr  mountpoint     /usr                   local
base/usr  sharenfs       off                    default
base/usr  checksum       on                     default
base/usr  compression    off                    default
base/usr  atime          on                     default
base/usr  devices        on                     default
base/usr  exec           on                     default
base/usr  setuid         on                     default
base/usr  readonly       off                    default
base/usr  jailed         off                    default
base/usr  snapdir        hidden                 default
base/usr  aclmode        groupmask              default
base/usr  aclinherit     secure                 default
base/usr  canmount       on                     default
base/usr  shareiscsi     off                    default
base/usr  xattr          off                    temporary
base/usr  copies         1                      default

Here's my /boot/loader.conf:

zfs_load="YES"
vm.kmem_size="330M"
vm.kmem_size_max="330M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="5M"

Some memory info
# sysinfo mem
RAM information

System memory information
Maximum Memory Module Size: 1024 MB
Maximum Total Memory Size: 2048 MB
Maximum Capacity: 2 GB
Number Of Devices: 2

INFO: Run `dmidecode -t memory` to see further information.

System memory summary
Total real memory available:    887 MB
Logically used memory:          311 MB
Logically available memory:     575 MB

Swap information
Device          512-blocks     Used    Avail Capacity
/dev/ad0s1b        4139904       0B     2.0G     0%
 
And `zpool status`:
  pool: base
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        base        ONLINE       0     0     0
          ad0s1e    ONLINE       0     0     0
          ad0s1d    ONLINE       0     0     0
          ad0s1f    ONLINE       0     0     0

errors: No known data errors

  pool: zster
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zster       ONLINE       0     0     0
          ad1s1d    ONLINE       0     0     0
          ad1s1e    ONLINE       0     0     0
          ad1s2d    ONLINE       0     0     0
          ad1s3d    ONLINE       0     0     0

errors: No known data errors

I haven't scrubbed since I moved /usr to ZFS; apart from that, the pools show no errors.

Any ideas?
 
That's why it's so slow: you are striping across slices on the same disk. Writes to pool "base" get spread across all three slices on the same drive, causing massive drive thrashing. Same with pool "zster", which stripes across four slices of ad1.

For example, the default record (block) size in ZFS is 128K. To write out a 1 MB file, it will write the metadata to each vdev (slice), then write 128 KB to the first vdev, then 128 KB to the next vdev, then 128 KB to the next vdev, and so on. So for every 128 KB write, the drive head has to move back and forth between each slice.
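
A toy model of that, as a sketch only: it assumes strict round-robin placement of 128 KB records and counts one long head move whenever consecutive writes to the same physical disk land in different slices. The real ZFS allocator is more complicated, and the names `diskA`, `diskB`, `diskC` are hypothetical whole-disk vdevs added for comparison.

```python
# Toy model: striping 128 KB records across vdevs, counting head moves
# between slices of the same physical disk. Not how ZFS really allocates.

RECORD_KB = 128  # default ZFS recordsize

def stripe(file_kb, vdevs):
    """Assign each record of a file to a vdev in strict round-robin order."""
    n_records = -(-file_kb // RECORD_KB)  # ceiling division
    return [vdevs[i % len(vdevs)] for i in range(n_records)]

def count_seeks(placement, disk_of):
    """Count writes that land in a different slice than the previous
    write to the same physical disk."""
    last_slice = {}
    seeks = 0
    for vdev in placement:
        disk = disk_of[vdev]
        if disk in last_slice and last_slice[disk] != vdev:
            seeks += 1
        last_slice[disk] = vdev
    return seeks

# Pool "base" as posted: three slices, all on ad0.
same_disk = {"ad0s1e": "ad0", "ad0s1d": "ad0", "ad0s1f": "ad0"}
# Hypothetical alternative: three whole disks, one vdev each.
separate = {"diskA": "diskA", "diskB": "diskB", "diskC": "diskC"}

seeks_same = count_seeks(stripe(1024, list(same_disk)), same_disk)
seeks_separate = count_seeks(stripe(1024, list(separate)), separate)

print("1 MB file, slices on one disk:", seeks_same, "slice-to-slice moves")
print("1 MB file, separate disks:    ", seeks_separate, "slice-to-slice moves")
```

In this model a 1 MB file is 8 records, and with all three slices on one drive almost every record costs a move to another slice, while with one vdev per disk none do.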

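Note also that ZFS only turns on the drive's onboard write cache itself when the vdev is a whole disk; that is how it behaves on Solaris. With individual slices it leaves the cache setting alone, so check that the write cache is actually enabled on these drives, because running without it greatly reduces write throughput.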

To get the best performance out of ZFS, use whole drives, not slices. And if you're going to use slices, then use slices on separate disks so that the I/O is spread between disks.
 
That makes perfect sense. Thank you.

Hm, the reason I did this in the first place was that my drive was poorly partitioned and I wanted to be able to use it without repartitioning. So much for that then :/
 
For a much lighter-weight way to lump up disparate slices and partitions (though not as featureful or supposedly fault tolerant as zfs), gconcat(8) worked well for me back in 2007. And you can even get all weird and bust it up into two hunks (either two separate concats, or just bsdlabel) and gmirror it.

Edit: phoenix is right, gconcat(8) is the point.
 
So long as you don't try to use gmirror using multiple slices/partitions from the same physical disk. :)

gconcat would work, though.
 