Bad write performance

Hey guys,

I have an HP ProLiant DL320e Gen8 server with 8 GB DDR3 RAM and four 2 TB HDDs (WD Black, SATA3).

FreeBSD 9.0-RELEASE-p3 is installed on one partition (UFS filesystem), and the rest of the storage is set up as a ZFS pool (RAID 0; not very safe, but that's not the point of my problem).

First of all, here is some information about my configuration and system:

Code:
df -h

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2    7.9G    5.8G    1.5G    79%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ada0p4    7.9G     32M    7.2G     0%    /tmp
linprocfs      4.0k    4.0k      0B   100%    /compat/linux/proc
storage        7.1T    2.6T    4.6T    36%    /var/www

Code:
zpool status storage
 pool: storage
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          ada1      ONLINE       0     0     0
          ada2      ONLINE       0     0     0
          ada3      ONLINE       0     0     0
          ada0p5    ONLINE       0     0     0

errors: No known data errors

Code:
zfs get all
NAME     PROPERTY              VALUE                  SOURCE
storage  type                  filesystem             -
storage  creation              Wed Jul 10 17:05 2013  -
storage  used                  2.56T                  -
storage  available             4.55T                  -
storage  referenced            2.56T                  -
storage  compressratio         1.00x                  -
storage  mounted               yes                    -
storage  quota                 none                   default
storage  reservation           none                   default
storage  recordsize            128K                   default
storage  mountpoint            /var/www               local
storage  sharenfs              off                    default
storage  checksum              on                     default
storage  compression           off                    local
storage  atime                 off                    local
storage  devices               on                     default
storage  exec                  on                     default
storage  setuid                on                     default
storage  readonly              off                    default
storage  jailed                off                    default
storage  snapdir               hidden                 default
storage  aclmode               discard                default
storage  aclinherit            restricted             default
storage  canmount              on                     default
storage  xattr                 off                    temporary
storage  copies                1                      default
storage  version               5                      -
storage  utf8only              off                    -
storage  normalization         none                   -
storage  casesensitivity       sensitive              -
storage  vscan                 off                    default
storage  nbmand                off                    default
storage  sharesmb              off                    default
storage  refquota              none                   default
storage  refreservation        none                   default
storage  primarycache          metadata               local
storage  secondarycache        all                    default
storage  usedbysnapshots       0                      -
storage  usedbydataset         2.56T                  -
storage  usedbychildren        19.7M                  -
storage  usedbyrefreservation  0                      -
storage  logbias               latency                default
storage  dedup                 off                    local
storage  mlslabel                                     -
storage  sync                  standard               local
storage  refcompressratio      1.00x                  -

Code:
cat /boot/loader.conf
aio_load="YES"
cc_htcp_load="YES"

vm.kmem_size="5G"
vm.kmem_size_max="7G"

vfs.zfs.arc_max="4G"
vfs.zfs.arc_meta_limit="1G"
vfs.zfs.vdev.cache.size="64M"
vfs.zfs.prefetch_disable="1"

kern.maxproc=10000
kern.maxdsiz="1G"
kern.maxbcache=64M
kern.ipc.maxpipekva=4M

net.inet.tcp.syncache.hashsize=32768
net.inet.tcp.syncache.bucketlimit=32
net.inet.tcp.syncache.cachelimit=1048576
net.inet.tcp.hostcache.hashsize=65536
net.inet.tcp.hostcache.cachelimit=1966080
net.inet.tcp.tcbhashsize=524288
vm.pmap.pg_ps_enabled=1

Code:
cat /etc/sysctl.conf

vfs.usermount=0
vfs.read_max=32

vfs.aio.max_aio_queue=1024
vfs.aio.max_aio_queue_per_proc=256
vfs.aio.max_aio_per_proc=32
vfs.aio.max_aio_procs=32

vm.pmap.shpgperproc=2048
kern.threads.max_threads_per_proc=4096
kern.ipc.somaxconn=4096
kern.ipc.maxsockets=204800
kern.ipc.nmbjumbop=262144
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbo16=32768
kern.ipc.nmbclusters=262144
kern.ipc.maxsockbuf=10485760
kern.sched.slice=1
kern.ps_arg_cache_limit=4096
kern.maxfiles=204800
kern.maxfilesperproc=200000
kern.maxvnodes=250000

net.inet.tcp.recvspace=65535 # FILE Server
net.inet.tcp.rfc1323=1
net.inet.tcp.delayed_ack=0
net.inet.tcp.recvbuf_max=10485760
net.inet.tcp.recvbuf_inc=65535
net.inet.tcp.sendbuf_max=10485760
net.inet.tcp.sendbuf_inc=65535

net.inet.ip.ttl=128
net.inet.ip.intr_queue_maxlen=4096
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.rtexpire=2
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024

# Security
net.inet.ip.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.icmp.maskrepl=0
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.drop_synfin=1
net.inet.tcp.icmp_may_rst=0
net.inet.udp.blackhole=1
net.inet.tcp.blackhole=2
net.inet.tcp.log_in_vain=1
net.inet.udp.log_in_vain=1
security.bsd.map_at_zero=0 # may break system
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
security.bsd.conservative_signals=1
security.bsd.unprivileged_proc_debug=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.hardlink_check_uid=1
security.bsd.hardlink_check_gid=1

# IPV6
net.inet6.icmp6.nodeinfo=0
net.inet6.ip6.use_tempaddr=1
net.inet6.ip6.prefer_tempaddr=1
net.inet6.icmp6.rediraccept=0
net.inet6.ip6.accept_rtadv=0
net.inet6.ip6.auto_linklocal=0

net.inet.tcp.msl=5000
net.inet.tcp.maxtcptw=200000
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.keepidle=60000
net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.ecn.enable=1

Unfortunately I don't have much experience with FreeBSD. On Ubuntu, for example, each HDD has its own I/O scheduler, and I had to change it from the default (CFQ) to deadline to get better I/O performance. Does FreeBSD have something similar?

My problem is the following: large files (from 300 MB up to 5 GB) are uploaded to the server and later downloaded. The upload speed is very good (up to 100 Mbit/s on a 1 Gbit network), but once the upload reaches 100%, it takes a long time until the full file is written to disk.

I'm using PHP-FPM and NGINX for the upload. The PHP function move_uploaded_file is used to save the file, and it does not copy the file if the temporary upload directory is on the same filesystem as the destination directory. So I changed:

Code:
PHP.INI:
upload_tmp_dir = /var/www/tmp

NGINX.CONF:
client_body_temp_path /var/www/cache 1 2;
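Why the temp directories matter: a move within one filesystem can be a plain rename, while a move across filesystems has to copy every byte. The Python sketch below illustrates that idea. It is not PHP's actual implementation, and the helper names same_filesystem and move_file are hypothetical. It assumes that comparing the st_dev device IDs from stat(2) is a good enough test for "same filesystem". Note that every ZFS dataset is mounted as a separate filesystem, so a directory that is its own dataset would force a copy.

```python
import os
import shutil


def same_filesystem(src_dir: str, dst_dir: str) -> bool:
    """Return True if both paths are on the same mounted filesystem.

    stat(2) reports one device ID per mounted filesystem, so equal st_dev
    values mean os.rename() can move a file between them without copying data.
    """
    return os.stat(src_dir).st_dev == os.stat(dst_dir).st_dev


def move_file(src: str, dst: str) -> str:
    """Move src to dst and report the strategy used ("rename" or "copy")."""
    src_dir = os.path.dirname(os.path.abspath(src))
    dst_dir = os.path.dirname(os.path.abspath(dst))
    if same_filesystem(src_dir, dst_dir):
        os.rename(src, dst)  # only directory entries change; file data is not rewritten
        return "rename"
    shutil.copyfile(src, dst)  # the whole file is read and written again
    os.remove(src)
    return "copy"
```

For the PHP.INI setting above, same_filesystem("/var/www/tmp", "/var/www") would be expected to return True as long as /var/www/tmp is a plain directory inside the storage dataset rather than a dataset of its own.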

Once the upload reaches 100%, zpool iostat -v 1 shows a write speed of 80-90 Mbit/s on the storage pool every 2-3 seconds, but only in short bursts. Maybe the I/O is blocking?

For a 550 MB file, it takes about 30 to 60 seconds until it is stored on disk and I get the download link for it.
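Some back-of-the-envelope arithmetic puts these numbers side by side. This is a small Python sketch that assumes decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits, 8 bits per byte). The function names write_rate and transfer_seconds are hypothetical.

```python
def write_rate(size_mb: float, seconds: float) -> tuple[float, float]:
    """Average rate for writing size_mb megabytes in `seconds`, as (MB/s, Mbit/s)."""
    mb_per_s = size_mb / seconds
    return mb_per_s, mb_per_s * 8


def transfer_seconds(size_mb: float, mbit_per_s: float) -> float:
    """Time needed to move size_mb megabytes over a link of mbit_per_s."""
    return size_mb * 8 / mbit_per_s


# 550 MB stored in 30-60 seconds after the upload finished:
for secs in (30, 60):
    mb_s, mbit_s = write_rate(550, secs)
    print(f"{secs} s: {mb_s:.1f} MB/s = {mbit_s:.0f} Mbit/s")

# The same file uploaded at 100 Mbit/s:
print(f"upload: {transfer_seconds(550, 100):.0f} s")
```

Under these assumptions, the post-upload step averages roughly 9-18 MB/s (about 73-147 Mbit/s), and it takes about as long as the 44 s upload itself. That is slow for a sequential write to a four-disk stripe.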

I hope you can give me some tips or help me find the problem, e.g. by tweaking my ZFS configuration.

Thank you!
 
Start by removing these:

Code:
vm.kmem_size="5G"
vm.kmem_size_max="7G"

You should never touch those on the AMD64 architecture; the kernel's autotuning picks reasonable defaults.

If performance is still bad, check whether your disks are so-called 4K-sector disks that are being used with the wrong alignment because the system detects them as 512 bytes/sector disks. Use zdb(8) to find out which ashift values are in use: an ashift of 9 means 512 bytes/sector, 12 means 4096 bytes/sector. You'll also need properly 4K-aligned GPT partitions on such disks.
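The ashift value that zdb reports is the base-2 logarithm of the sector size the vdev was created for. The Python sketch below shows that mapping. The helper names are hypothetical, and needs_read_modify_write only illustrates why a 512-byte layout hurts on a 4K drive. Any write that does not cover whole 4096-byte physical sectors forces the drive to read, modify, and rewrite a full sector internally.

```python
PHYSICAL_SECTOR = 4096  # assumed physical sector size of a 4K ("Advanced Format") disk


def ashift_to_sector_size(ashift: int) -> int:
    """ashift is log2 of the sector size: 9 -> 512, 12 -> 4096."""
    return 1 << ashift


def sector_size_to_ashift(sector_size: int) -> int:
    """Inverse mapping; sector sizes must be powers of two."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1


def needs_read_modify_write(offset: int, length: int,
                            physical: int = PHYSICAL_SECTOR) -> bool:
    """True if a write of `length` bytes at byte `offset` only partially covers
    at least one physical sector, so the drive must read that sector first."""
    return offset % physical != 0 or length % physical != 0


# With ashift=9, ZFS may issue 512-byte aligned writes, which a 4K disk
# has to turn into read-modify-write cycles:
print(needs_read_modify_write(offset=512, length=512))    # True
print(needs_read_modify_write(offset=4096, length=8192))  # False
```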
 
I will remove the settings you mentioned, @kpa.

I just found out that my system uses 4K-sector disks and my ashift is set to 9, but it should be 12, as you wrote. Is it possible to change this and realign the pool?
 
You have to recreate the pool; there's no way to change the ashift property of a vdev (ZFS's concept of a single "virtual" device, which can be a single physical disk, a mirror of two or more disks, or a RAID-Z(1/2) array of disks) after it has been created.

Search for "ZFS", "gnop" and "4k" and you should find the relevant threads.
 
Why is there one partition in the pool? Does it start on a 4K sector boundary?

Remember that there are two separate issues: aligning the partitions on 4K sector boundaries, and the ZFS block size (ashift=12). Both are needed.
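Checking the first point comes down to simple arithmetic on a partition's start sector, for example the start column of gpart show. The Python sketch below assumes the disk reports 512-byte logical sectors. Under that assumption, a partition is 4K-aligned when its start sector is a multiple of 8. The helper names are hypothetical.

```python
LOGICAL_SECTOR = 512  # assumed logical sector size the disk reports
BOUNDARY = 4096       # physical sector size to align to


def is_4k_aligned(start_sector: int) -> bool:
    """True if the partition's byte offset is a multiple of 4096."""
    return (start_sector * LOGICAL_SECTOR) % BOUNDARY == 0


def next_aligned_sector(start_sector: int) -> int:
    """Smallest start sector >= start_sector that lies on a 4K boundary."""
    per_boundary = BOUNDARY // LOGICAL_SECTOR  # 8 logical sectors per 4K
    return -(-start_sector // per_boundary) * per_boundary  # ceiling division


# 34 is typically the first usable sector of a GPT disk with 512-byte
# sectors, a common reason for misaligned partitions.
for start in (34, 40, 2048):
    print(start, is_4k_aligned(start), next_aligned_sector(start))
```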
 