Iffy Samba 4.1 performance on FreeBSD 10.0-RELEASE

I'm in the process of building a NAS for serving media on my home network over NFS and SMB.

I'm running FreeBSD 10.0-RELEASE off a USB stick on an ASRock C2550DI motherboard with 16 GB of ECC memory and two built-in Intel i210 NICs, bonded together with lagg(4). I'm also using jumbo frames:
Code:
$ ifconfig
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether bc:5f:f4:fd:aa:6b
        inet6 fe80::be5f:f4ff:fefd:aa6b%igb0 prefixlen 64 scopeid 0x1 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether bc:5f:f4:fd:aa:6b
        inet6 fe80::be5f:f4ff:fefd:aa6c%igb1 prefixlen 64 scopeid 0x2 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128 
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 
        inet 127.0.0.1 netmask 0xff000000 
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether bc:5f:f4:fd:aa:6b
        inet 10.0.1.250 netmask 0xffffff00 broadcast 10.0.1.255 
        inet6 fe80::be5f:f4ff:fefd:aa6b%lagg0 prefixlen 64 scopeid 0x4 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
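For reference, a lagg(4) configuration like this is driven from /etc/rc.conf; a minimal sketch matching the output above (a reconstruction, not a paste from my actual config) would be:
Code:
ifconfig_igb0="up mtu 9000"
ifconfig_igb1="up mtu 9000"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 10.0.1.250 netmask 255.255.255.0"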
On the client side, I'm using a MacBook Pro with OS X 10.8.5, also with jumbo frames enabled:
Code:
$ ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 9000
        options=b<RXCSUM,TXCSUM,VLAN_HWTAGGING>
        ether 00:1e:c2:18:6e:4c 
        inet6 fe80::21e:c2ff:fe18:6e4c%en0 prefixlen 64 scopeid 0x4 
        inet 10.0.1.6 netmask 0xffffff00 broadcast 10.0.1.255
        media: 1000baseT <full-duplex,flow-control>
        status: active
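Jumbo frames set with plain ifconfig(8) on the Mac don't survive a reboot; networksetup(8) makes the MTU persistent. Something along these lines, assuming it accepts en0 as the device name:
Code:
$ sudo networksetup -setMTU en0 9000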
Using benchmarks/iperf, I can measure a throughput of 116 MB/s, which is consistent with Gigabit Ethernet.
Code:
$ iperf -c 10.0.1.250 -fM
------------------------------------------------------------
Client connecting to 10.0.1.250, TCP port 5001
TCP window size: 0.13 MByte (default)
------------------------------------------------------------
[  4] local 10.0.1.6 port 50814 connected with 10.0.1.250 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1157 MBytes   116 MBytes/sec
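The other end is just the stock iperf listener:
Code:
$ iperf -s
Note that with a single-NIC GbE client, one TCP stream is all it takes to saturate the link; the LACP bundle only pays off with multiple clients (or multiple flows hashing to different ports).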
I'm using Samba 4.1.4, installed from the pkg(8) repository, with the following configuration:
Code:
$ pkg info | grep samba41
samba41-4.1.4_1                A free SMB/CIFS and AD/DC server and client for UNIX
/usr/local/etc/smb4.conf
Code:
[global]
  workgroup = WORKGROUP

  server string = NAS

  security = user
  map to guest = Bad User

  log file = /var/log/samba4/%m.log
  max log size = 50

  dns proxy = no

  load printers = no
  printcap name = /dev/null
  disable spoolss = yes

[test]
  path = /mnt/test
  public = yes
  only guest = yes
  read only = no
  create mask = 0644
  directory mask = 0755
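A quick way to smoke-test the guest setup, straight from the server, is to list the shares anonymously with smbclient(1), which ships in the same package:
Code:
$ smbclient -N -L localhost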
For testing, I've created an md(4)-backed in-memory file system and copied a 1 GB test file to it:
Code:
mkdir /mnt/test && mdmfs -s 1280m md4 /mnt/test && chmod a+rwx /mnt/test
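For the NFS side of the comparison, the file system also has to be exported. A minimal sketch (the network restriction is my guess based on the addresses above):
Code:
# /etc/exports
/mnt/test -network 10.0.1.0 -mask 255.255.255.0

# /etc/rc.conf
rpcbind_enable="YES"
nfs_server_enable="YES"
mountd_enable="YES"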

On the client side, I've mounted the share over SMB and NFS:
Code:
$ mount_smbfs //GUEST@10.0.1.250/test ~/mnt/smbtest
$ mount -o tcp 10.0.1.250:/mnt/test ~/mnt/nfstest
$ mount
[...]
//GUEST:@10.0.1.250/test on /Users/pva/mnt/smbtest (smbfs, nodev, nosuid, noowners, mounted by pva)
10.0.1.250:/mnt/test on /Users/pva/mnt/nfstest (nfs, nodev, nosuid, mounted by pva)
I'm currently seeing 36/34 MB/s read/write speeds over SMB:
Code:
$ pv ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > /dev/null
1.02GiB 0:00:29 [36.1MiB/s] [================================>] 100%
$ pv hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts >  ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts
1.02GiB 0:00:31 [33.8MiB/s] [================================>] 100%
And 93/100 MB/s read/write speed over NFS:
Code:
$ pv ~/mnt/nfstest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > /dev/null
1.02GiB 0:00:11 [92.9MiB/s] [================================>] 100%
$ pv hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > ~/mnt/nfstest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts
1.02GiB 0:00:10 [ 100MiB/s] [================================>] 100%
For completeness' sake, scp(1) speeds are about 34 to 62 MB/s (read) and 28 to 31 MB/s (write), depending on the cipher used:
Code:
$ scp -c aes128-cbc 10.0.1.250:/mnt/test/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts /dev/null
hd_dts_hd_master_audio_sound_check_7_1_lossle 100% 1049MB  33.8MB/s   00:31    
$ scp -c aes128-cbc hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts 10.0.1.250:/mnt/test
hd_dts_hd_master_audio_sound_check_7_1_lossle 100% 1049MB  28.4MB/s   00:37
$ scp -c arcfour 10.0.1.250:/mnt/test/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts /dev/null
hd_dts_hd_master_audio_sound_check_7_1_lossle 100% 1049MB  61.7MB/s   00:17
$ scp -c arcfour hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts 10.0.1.250:/mnt/test
hd_dts_hd_master_audio_sound_check_7_1_lossle 100% 1049MB  30.9MB/s   00:34
While performance over NFS is pretty much what I'd expect, the SMB speeds seem to be a tad on the slow side. What kind of performance should I expect to achieve using SMB over Gigabit Ethernet on modern hardware? And if the performance I'm seeing is not up to par, does anyone have any pointers as to where to start tuning on the server and/or client side?
 
I've been playing with samba41 performance quite a bit of late. I'm getting about 90MB/s reads and 60MB/s writes from my W8.1 workstation. Not perfect, but way better than the defaults. Have a look at the smb4.conf in my post here https://forums.freebsd.org/viewtopic.php?f=43&t=45504 for some pointers. Things that help alongside the smb4.conf settings:

Code:
aio_load="YES"
in /etc/rc.conf
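To pick the module up without a reboot:
Code:
# kldload aio
$ kldstat | grep aio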

And a load of ZFS tweaking, if you're using it; this made the single greatest increase for me so far. I was playing with this after enabling an SSD-based ZIL and L2ARC, which I'd recommend; lots of fun to be had there. These go in /etc/sysctl.conf:
Code:
vfs.zfs.l2arc_write_max=629145600    # Maximum number of bytes written to l2arc per feed       # was 8388608 (8 MB/s), now 600 MB/s
vfs.zfs.l2arc_write_boost=629145600  # Mostly only relevant in the first few hours after boot  # was 8388608 (8 MB/s), now 600 MB/s
vfs.zfs.l2arc_headroom=2             # Scan-ahead distance, in multiples of write_max          # 2 (default)
vfs.zfs.l2arc_feed_secs=1            # l2arc feed period                                       # 1 (default)
vfs.zfs.l2arc_feed_min_ms=100        # Minimum l2arc feed period                               # 200 (default)
vfs.zfs.l2arc_noprefetch=0           # Controls whether streaming data is cached or not        # 1 (default)
vfs.zfs.l2arc_feed_again=1           # Controls whether feed_min_ms is used or not             # 1 (default)
vfs.zfs.l2arc_norw=1                 # No reads and writes at the same time                    # 1 (default)
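These are all runtime sysctls, so you can experiment live before committing anything to /etc/sysctl.conf:
Code:
# sysctl vfs.zfs.l2arc_noprefetch=0
# sysctl vfs.zfs.l2arc_write_max=629145600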
 
silkie said:
I've been playing with samba41 performance quite a bit of late. I'm getting about 90MB/s reads and 60MB/s writes from my W8.1 workstation.
Thanks for the information! Turns out the single biggest improvement was to upgrade the client from OS X 10.8.5 to 10.9.2, since the latter supports SMB2 as well as jumbo frames.

After the upgrade, I started seeing read speeds comparable to yours, but writes were as slow as before:
Code:
$ pv ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > /dev/null
1.02GiB 0:00:12 [84.7MiB/s] [================>] 100%
$ pv hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts
1.02GiB 0:00:32 [32.4MiB/s] [================>] 100%
The jump in performance caused by the change of SMB protocol level can be verified by forcing clients to use CIFS in smb4.conf:
Code:
max protocol = NT1
This causes the read speed to halve:
Code:
$ pv ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > /dev/null
1.02GiB 0:00:24 [43.1MiB/s] [================>] 100%
According to smb.conf(5), max protocol already defaults to SMB3 in this version, so it doesn't need to be set explicitly in the config file.
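On the Mavericks side, the dialect a share actually negotiated can be checked with smbutil(1); look for the SMB_VERSION field in the output:
Code:
$ smbutil statshares -a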

Next, I started playing around with Samba configuration options in order to improve the write speed.
Code:
use sendfile = true
got me an increase of nearly 17 MB/s:
Code:
$ pv hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts
1.02GiB 0:00:21 [  49MiB/s] [================>] 100%
and adding
Code:
min receivefile size = 16384
got me another 10 MB/s:
Code:
$ pv hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts > ~/mnt/smbtest/hd_dts_hd_master_audio_sound_check_7_1_lossless.m2ts
1.02GiB 0:00:17 [59.3MiB/s] [================================>] 100%
I forgot to mention in my previous post that I already had the aio(4) kernel module loaded. Without it, reads crawl along at around 300 - 400 kB/s. I've now also enabled AIO in smb4.conf for all reads and writes:
Code:
aio read size = 1
aio write size = 1
So, in conclusion, the following config options combined with the aio(4) kernel module nearly doubled my write performance:
Code:
use sendfile = true
min receivefile size = 16384
aio read size = 1
aio write size = 1
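To confirm that smbd picked these up as intended, testparm(1) can dump the effective values (the -v flag includes defaults):
Code:
$ testparm -sv /usr/local/etc/smb4.conf | egrep 'sendfile|receivefile|aio'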
Oh, and I haven't gotten round to creating a ZFS pool yet, but that's next on my list as soon as the chassis I've ordered arrives.
 
Out of curiosity, I went and benchmarked the Samba configuration listed above from a Windows 8 virtual machine (the IE10 – Win8 image available from Modern.IE) running under VMware Fusion 6.0.2, and I'm seeing reads in the 90 - 110 MB/s range and writes of 60 - 65 MB/s. Slightly faster than Mavericks, but not by a significant margin.
 