Terrible performance with NFS and iSCSI

Hello,

I'm having terrible performance with NFS and iSCSI from one of my FreeBSD servers. NFS and iSCSI both drop below 5 MB/sec from one server to another, while SCP runs at ~60 MB/sec and CIFS goes at line speed (~100 MB/sec).

The server with the issue:
Code:
# uname -a
FreeBSD gin 9.1-RELEASE-p5 FreeBSD 9.1-RELEASE-p5 #0 r254003: Wed Aug  7 03:09:03 UTC 2013     ferry@gin:/usr/obj/usr/src/sys/KA-NERU  amd64

A few tests I did:
iSCSI: 17 seconds for 100 MB = ~5.8 MB/sec (tested from a VMware ESXi server)
Code:
# date; dd if=/dev/zero of=test.dd bs=1M count=100; date
Fri Aug  9 19:48:43 UTC 2013
100+0 records in
100+0 records out
Fri Aug  9 19:49:00 UTC 2013

NFS: 29 seconds for 100 MB = ~3.6 MB/sec (from another FreeBSD server)
Code:
# dd if=/dev/zero of=test.dd bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 29.008874 secs (3614673 bytes/sec)

SCP: 38 seconds for 2 GB = ~52.6 MB/sec (from the same FreeBSD server as the NFS test)
Code:
# scp /mnt/zpool/tmp/test3.dd  user@10.0.0.4:/mnt/zpool/tmp/
Password:
test3.dd                                          100% 2000MB  52.6MB/s   00:38

CIFS: From a Windows machine via Samba I get ~98 MB/sec.

I've already swapped the switch and the network cables, but the low performance persists.
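To rule out the raw network path itself, a plain TCP test between the two boxes is probably also worth running (assuming the benchmarks/iperf port is installed on both ends); something like:

Code:
# on the server (10.0.0.4)
iperf -s
# on the client
iperf -c 10.0.0.4 -t 30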

My iSCSI settings (copied from a ZFSguru box I also have, which runs without issues):

Code:
[Global]
Comment "Global configuration"
NodeBase "lan.network.gin"
PidFile /var/run/istgt.pid
AuthFile /usr/local/etc/istgt/auth.conf
MediaDirectory /var/istgt
LogFacility "local7"
DiscoveryAuthMethod Auto
Timeout 30
NopInInterval 20
MaxSessions 16
MaxConnections 4
MaxR2T 32
MaxOutstandingR2T 16
DefaultTime2Wait 2
DefaultTime2Retain 60
FirstBurstLength 262144
MaxBurstLength 1048576
MaxRecvDataSegmentLength 262144
InitialR2T Yes
ImmediateData Yes
DataPDUInOrder Yes
DataSequenceInOrder Yes
ErrorRecoveryLevel 0

[UnitControl]
Comment "Internal Logical Unit Controller"
AuthMethod CHAP Mutual
AuthGroup AuthGroup10000
Portal UC1 127.0.0.1:3261
Netmask 127.0.0.1

[PortalGroup1]
Comment "PortalGroup1"
Portal DA1 10.0.0.4:3260

[InitiatorGroup1]
Comment "Gin"
InitiatorName "ALL"
Netmask 10.0.0.0/24

[LogicalUnit1]
TargetName ginesx
Mapping PortalGroup1 InitiatorGroup1
AuthGroup AuthGroup1
UnitType Disk
QueueDepth 64
LUN0 Storage /dev/zvol/zpool/esx1 200GB
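For reference, the properties of the zvol backing LUN0 can be checked like this:

Code:
# show the block size, compression and sync settings of the zvol used for LUN0
zfs get volblocksize,compression,sync zpool/esx1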

My NFS-related settings in rc.conf:
Code:
nfs_reserved_port_only="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
rpcbind_enable="YES"
nfs_flags="-u -t -n 4" # serve udp, serve tcp, start 4 instances
mountd_flags="-l -p 1026"
mountd_enable="YES"
rpc_lockd_enable="YES"
rpc_lockd_flags="-p 1027"
rpc_statd_enable="YES"
rpc_statd_flags="-p 1028"

and my /etc/exports:
Code:
/mnt/gin/vm  -alldirs -maproot=root 10.0.0.5
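On the client (10.0.0.5) I can still experiment with the mount options; a sketch of mounting the export over TCP with larger transfer sizes (the /mnt/vm mount point is just an example):

Code:
# mount the export over TCP with 64k read/write sizes; /mnt/vm is just an example mount point
mount -t nfs -o tcp,rsize=65536,wsize=65536 10.0.0.4:/mnt/gin/vm /mnt/vm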
 
Just to test a bit more, I took the OS disk out of the ZFSguru box and put it in the machine with the issues.

Code:
$ uname -a
FreeBSD zfsguru.bsd 9.1-RELEASE FreeBSD 9.1-RELEASE #0: Tue Jan 29 23:54:13 CET 2013     jason@zfsguru:/usr/obj/tmpfs/2-source/sys/OFED-POLLING-ALTQ  amd64

Once it had booted and I started the iSCSI target in ZFSguru (with the same config I use on FreeBSD itself), it ran at ~40 MB/sec.

12 seconds for 500 MB = ~41.6 MB/sec
Code:
# date; dd if=/dev/zero of=test.dd bs=1M count=500; date
Fri Aug  9 21:03:46 UTC 2013
500+0 records in
500+0 records out
Fri Aug  9 21:03:58 UTC 2013

But I want to run FreeBSD on this machine, not ZFSguru, so I'm still trying to figure out what the issue is.
 
Have you turned off the ZFS "atime" option on the filesystem? If not, it will perform a last-accessed time update every time you access a block over iSCSI or read/write a file over NFS. This hurts performance pretty badly.

You can do this with:
zfs set atime=off tank/filesystem

Assuming "tank/filesystem" is the name of the filesystem you use for NFS or iSCSI.

You will likely want to run your iSCSI targets and NFS shares on a different filesystem from the one holding CIFS shares or general OS data, so you can tweak this setting without affecting other data.
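For example, to see what it is currently set to on your pool (going by the pool name from your istgt config; the dataset behind the NFS export will have a different name, so adjust accordingly):

Code:
# list atime recursively for the whole pool
zfs get -r atime zpool
# then turn it off on the dataset that backs the NFS export (name here is just a placeholder)
zfs set atime=off zpool/filesystem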
 
throAU said:
Have you turned off the ZFS "atime" option on the filesystem? If not, it will perform a last-accessed time update every time you access a block over iSCSI or read/write a file over NFS. This hurts performance pretty badly.

You can do this with:
zfs set atime=off tank/filesystem

Assuming "tank/filesystem" is the name of the filesystem you use for NFS or iSCSI.

You will likely want to run your iSCSI targets and NFS shares on a different filesystem from the one holding CIFS shares or general OS data, so you can tweak this setting without affecting other data.

Good tip, I have not.

I have kept working on this since ZFSguru did not show the issue. I started from a basic FreeBSD 9.1 install and tested the speeds from two different machines on the network, going up one patch level after each successful test. Even at p5 I still did not hit the problem, which baffled me, so I started turning more services on. Once I enabled PF the issue appeared again, and I can now reproduce it by turning PF on and off.

Tomorrow I am going to work through my PF config to see what causes the issue.
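The way I toggle it for the tests is roughly:

Code:
# disable pf, run the dd test, then enable it again and re-test
pfctl -d
pfctl -e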
 
I tested PF some more: even with an empty PF config and PF enabled it was still slow, and as soon as I disabled PF it was fast again.
 
Interesting.

Any reason you're running pf on the same box? It sounds like you may have stumbled onto something: every packet has to pass through pf when it is enabled (even with an empty rule-set), and with NFS or iSCSI you're pushing a very high data rate, much higher than most people put through pf. Normally you wouldn't run NFS or iSCSI on a box with a firewall on it; having those services on a box connected to the Internet would be a bit risky IMHO, firewalled or not.

Try with ipfw?
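Something like this in rc.conf should be enough for a quick comparison; an "open" ruleset just measures the overhead of ipfw itself:

Code:
firewall_enable="YES"
firewall_type="open"   # pass everything, only the overhead of ipfw is measured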
 
throAU said:
Interesting.

Any reason you're running pf on the same box? It sounds like you may have stumbled onto something: every packet has to pass through pf when it is enabled (even with an empty rule-set), and with NFS or iSCSI you're pushing a very high data rate, much higher than most people put through pf. Normally you wouldn't run NFS or iSCSI on a box with a firewall on it; having those services on a box connected to the Internet would be a bit risky IMHO, firewalled or not.

Try with ipfw?

The box runs on my LAN and is not connected directly to the Internet. By default I install pf on every machine I have for the added security.
I was indeed already thinking last night of switching to ipfw to see how the performance compares.

This is also the first time I have seen this issue; other file servers with the same setup do not show it. I'm guessing it is a combination of pf and the network card, which for the moment is the on-board Realtek until the Intel dual-port card is delivered.
 
PF is known to be a performance killer in such setups because it does not yet have good SMP support to take advantage of multiple CPUs/cores, and the excessive locking hurts performance badly. Things are changing in 10-CURRENT in this regard.
 
I tried ipfw today, but it has the same issue: with ipfw enabled the speeds are below 4 MB/sec, and when I disable ipfw they go back up to ~60 MB/sec.
 
I found a PCIe Intel card at home, put it in the server, and that fixed the problem right away.

My guess is that there is a problem in the Realtek driver in combination with packet filtering (both pf and ipfw triggered it). For future reference, my on-board network card is:
Code:
re0@pci0:3:0:0: class=0x020000 card=0x81681849 chip=0x816810ec rev=0x06 hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
    class      = network
    subclass   = ethernet
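If anyone wants to keep using the on-board re(4) NIC, disabling the hardware offloads might be worth a try before swapping cards; I have not verified that it helps, but it is a cheap test:

Code:
# turn off checksum offload and TSO on the Realtek NIC
ifconfig re0 -rxcsum -txcsum -tso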
 
Well, Realtek cards are not among the ones recommended for high-throughput setups. Intel cards are, and for good reason :)
 
Yeah, read the comments in the driver source for one of the Realtek NICs. It's an older Realtek chip, but it gives an idea of the kind of brain damage they typically engage in. Some highlights...

From if_rl.c:

Code:
 * The RealTek 8139 PCI NIC redefines the meaning of 'low end.' This is
 * probably the worst PCI ethernet controller ever made, with the possible
 * exception of the FEAST chip made by SMC. The 8139 supports bus-master
 * DMA, but it has a terrible interface that nullifies any performance
 * gains that bus-master DMA usually offers.
 *
 * For transmission, the chip offers a series of four TX descriptor
 * registers. Each transmit frame must be in a contiguous buffer, aligned
 * on a longword (32-bit) boundary. This means we almost always have to
 * do mbuf copies in order to transmit a frame, except in the unlikely
 * case where a) the packet fits into a single mbuf, and b) the packet
 * is 32-bit aligned within the mbuf's data area. The presence of only
 * four descriptor registers means that we can never have more than four
 * packets queued for transmission at any one time.
 *
 * Reception is not much better. The driver has to allocate a single large
 * buffer area (up to 64K in size) into which the chip will DMA received
 * frames. Because we don't know where within this region received packets
 * will begin or end, we have no choice but to copy data from the buffer
 * area into mbufs in order to pass the packets up to the higher protocol
 * levels.
 *
 * It's impossible given this rotten design to really achieve decent
 * performance at 100Mbps, unless you happen to have a 400Mhz PII or
 * some equally overmuscled CPU to drive it.

Before buying a network adapter, I'll often have a look at the driver source; it's always entertaining.

I suspect the Intel NIC you are using has far less CPU overhead to drive it, and thus there is more left over for pf.
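Watching the CPU while a transfer is running makes the difference in driver overhead fairly visible, for example:

Code:
# show system processes/threads and CPU states while the dd test runs
top -SH
# or watch interrupts and context switches
systat -vmstat 1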
 