FreeBSD 8.2 / ESXi 5 Performance Oddities

When I create a FreeBSD/amd64 8.2 VM whose disk resides on a datastore on an iSCSI-attached SAN, I observe some strange performance quirks related to disk I/O.

I have two Cisco C200 servers with 48G of RAM and 10GigE NICs connected to a Cisco Nexus 5548UP, which connects via a 60Gb EtherChannel to a stack of Cisco 3750X switches, which in turn connects to an EMC VNXe 3300 SAN via iSCSI over four aggregated 1GigE links.

I'm aware that iSCSI over such a configuration is limited to 1 gig of bandwidth per LUN.
I also know the disk array can saturate that 1 gig. I've observed this by testing at the application layer with other VMs and by monitoring the switchports/EtherChannels that carry this traffic.
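For anyone checking a similar setup: the active paths and the path selection policy for each LUN can be listed on the ESXi host itself. A rough sketch (ESXi 5 esxcli syntax; device names will obviously differ):

Code:
# list devices with their current path selection policy
esxcli storage nmp device list
# list every path and its state (active/standby) per device
esxcli storage core path list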

If I run a very basic disk test with dd:
Code:
dd if=/dev/zero of=/root/foo bs=512 
^C4459648+0 records in
4459647+0 records out
2283339264 bytes transferred in 21.095505 secs (108238189 bytes/sec)

This looks about right. I can write to the disk at about 108MB/sec, which is a reasonable number for my configuration, I think. Other benchmarks like bonnie++ and iozone show a lot of variability in their results, sometimes more than 30% between runs.
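One caveat for anyone repeating this: bs=512 generates a huge number of tiny writes, so syscall overhead dominates. A sketch of the larger-block variant, plus watching the disk from a second terminal (the file name is just an example):

Code:
# sequential write, 1 MB blocks, ~2 GB total
dd if=/dev/zero of=/root/foo bs=1m count=2048
# in another terminal, watch per-device throughput, queue depth and latency
gstat
iostat -x 1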

Here's the output of `mount`:
Code:
[root@iscsi ~]# mount
/dev/da0s1a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/da0s2d on /usr (ufs, local, soft-updates)
/dev/da1 on /opt (ufs, local, soft-updates)
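(For reference: da0 holds / and /usr, da1 holds /opt. Confirming which virtual disk is which from inside the guest is straightforward, e.g.:)

Code:
camcontrol devlist       # shows the CAM devices behind da0/da1
diskinfo -v /dev/da1     # size and layout of the disk in question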

If I try to transfer data from a very fast workstation to a Samba share residing on /dev/da0 (which in reality lives on the iSCSI SAN) on that machine, I get about 5 megabytes per second of throughput.

If I create another disk on a datastore that's locally attached to the host running my VM and export that via Samba, I can read and write to it from my workstation at a bit over 100 megabytes/second.

For what it's worth, iperf shows that the bandwidth between the VM and the workstation I'm using for testing is 933 Mbit/sec. That's about what I'd expect from my configuration.
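(That was just stock iperf, roughly like this; the hostname is whatever your VM answers to:)

Code:
# on the VM
iperf -s
# on the workstation
iperf -c iscsi.supersecret.testdomain.net -t 30 -i 5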

If I clone this VM and put the resulting clone's disk on a locally attached datastore, I can read/write to the Samba shares at over 100 megabytes/second.

If I watch systat -v during writes to or reads from the "slow" share, I see that when the transfer starts, em0 and mpt0 (the Ethernet and SCSI devices on the VM) initially generate lots of interrupts, as they should. VM CPU usage goes up and things look normal. Soon, though, the observed transfer speed drops, em0 and mpt0 stop generating interrupts for periods of time, and CPU usage drops to 0%.

While watching systat -v during a "fast" transfer, things look normal: em0 and mpt0 generate lots of interrupts and CPU usage is appropriate for the workload.
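For anyone following along, besides systat -v these are also worth a look (intervals are arbitrary):

Code:
systat -vmstat 1    # per-device interrupt rates, CPU states, disk KB/t and tps
vmstat -i           # cumulative interrupt counts for em0/mpt0
gstat               # per-provider busy %, queue length and latency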

I've experimented with many different sysctl variables and with turning various features and tuning options in smb.conf on and off. Over the course of a few days, I've determined that none of these had any appreciable impact, other than reducing the maximum transfer speed of the "fast" shares. For example, sysctl vfs.write_behind=0 reduces my maximum throughput to the fast share by about 5%.
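To give an idea of the kinds of knobs involved, a sketch (the specific variables beyond vfs.write_behind are just examples, not recommendations):

Code:
# check the current values first
sysctl vfs.write_behind vfs.read_max
sysctl vfs.hirunningspace vfs.lorunningspace
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
# flip one at runtime and re-run the transfer
sysctl vfs.write_behind=0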

I only observe this slow transfer speed when the exported share is on a datastore which is located on my iSCSI device. NFS has similar performance oddities.

It seems to me that somewhere, something is getting starved: the buffers fill up and things go south from there. I just can't figure out where the problem is. Anyone have any ideas?

Here are the relevant config files:

Code:
RC.CONF

hostname="iscsi.supersecret.testdomain.net"
sshd_enable="YES"
ifconfig_em0="DHCP mtu 9000"
samba_enable="YES"
rpcbind_enable="YES"
nfs_server_enable="YES"
mountd_enable="YES"
mountd_flags="-r"

SYSCTL.CONF
 
# $FreeBSD: src/etc/sysctl.conf,v 1.8.34.1.6.1 2010/12/21 17:09:25 kensmith Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
vfs.read_max=256

LOADER.CONF

#kern.hz=1000 /*this was a silly thing to try*/
# Beginning of the block added by the VMware software
vmxnet_load="YES"
# End of the block added by the VMware software
# Beginning of the block added by the VMware software
vmxnet3_load="YES"
# End of the block added by the VMware software
aio_load="YES"


SMB.CONF

[global]

netbios name = JUPITER
workgroup = INSIGHT
encrypt passwords = yes
preferred master = yes
domain master = no
local master = yes
security = user
domain logons = no
inherit permissions = yes
follow symlinks = yes
unix extensions = no
dns proxy = no

[test]
path = /usr/export/test
writeable = yes
write list = me

[opt]
path = /opt
writeable = yes
write list = me
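The smb.conf tuning mentioned earlier was mostly the usual suspects; examples of the sort of [global] options toggled (none of which made a lasting difference):

Code:
socket options = TCP_NODELAY SO_SNDBUF=131072 SO_RCVBUF=131072
use sendfile = yes
aio read size = 16384
aio write size = 16384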
 
I've noticed that if I have a virtual machine on a datastore local to the host and add a second disk that lives on a datastore on the iSCSI SAN, disk performance on the local disk degrades by about 50%.
 
Did you try any other guest operating system?
There are some penalties if you have many ESX(i) guests running off a single SAN LUN, but nowhere near 50%, and that scenario requires more than two virtual machines.

For what it's worth, FreeBSD runs just fine on ESX 4.1 residing on a fibre-channel-attached LUN. There should be some iSCSI tunables in vSphere; you should check there.

Also, checking interrupts on the ESXi host wouldn't be a bad idea. But since idiots removed the service console from version 5, I don't know how exactly you're supposed to do that.
 
Thanks for your input, Zare. I believe you're right. There's no way I should be seeing I/O performance like this.

To make sure the problem isn't specific to FreeBSD, I installed CentOS 6. The performance is slightly better, but the same basic problem remains: stalled I/O.

I logged into one of my ESXi hosts and looked at the output of dmesg after I'd run a couple of benchmarks. Here's a sample:

Code:
2011-11-03T14:05:00.476Z cpu14:2062)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22423 microseconds to 451891 microseconds.
2011-11-03T14:05:01.245Z cpu14:2062)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22423 microseconds to 1065519 microseconds.
2011-11-03T14:05:01.279Z cpu12:2060)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22543 microseconds to 2287540 microseconds.
2011-11-03T14:05:01.956Z cpu14:859100)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 2287540 microseconds to 448736 microseconds.
2011-11-03T14:05:03.237Z cpu8:2056)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22636 microseconds to 471095 microseconds.
2011-11-03T14:05:07.569Z cpu13:2061)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22708 microseconds to 1176154 microseconds.
2011-11-03T14:05:07.569Z cpu13:2061)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22708 microseconds to 2642677 microseconds.
2011-11-03T14:05:07.583Z cpu11:859102)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 22890 microseconds to 5430004 microseconds.
2011-11-03T14:05:12.569Z cpu12:861149)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 23416 microseconds to 11122647 microseconds.
2011-11-03T14:05:21.413Z cpu6:2054)ScsiDeviceIO: 2316: Cmd(0x412401ae9c00) 0x12, CmdSN 0x1354b to dev "naa.600508e0000000004d7a76bac30f180d" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2011-11-03T14:05:22.212Z cpu3:2051)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 11122647 microseconds to 2218132 microseconds.
2011-11-03T14:05:25.956Z cpu15:2063)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 2218132 microseconds to 441434 microseconds.
2011-11-03T14:05:28.031Z cpu8:859100)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 24660 microseconds to 653117 microseconds.
2011-11-03T14:05:43.404Z cpu9:861149)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 24778 microseconds to 1478760 microseconds.
2011-11-03T14:05:49.579Z cpu0:863193)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 1478760 microseconds to 287452 microseconds.
2011-11-03T14:05:56.962Z cpu10:865247)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 24943 microseconds to 500936 microseconds.
2011-11-03T14:06:15.912Z cpu11:2059)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25099 microseconds to 527911 microseconds.
2011-11-03T14:06:21.368Z cpu13:861149)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25140 microseconds to 1065213 microseconds.
2011-11-03T14:06:26.386Z cpu9:2057)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 1065213 microseconds to 209317 microseconds.
2011-11-03T14:06:43.481Z cpu9:2057)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25320 microseconds to 509678 microseconds.
2011-11-03T14:06:48.847Z cpu14:861149)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25412 microseconds to 1414710 microseconds.
2011-11-03T14:06:53.567Z cpu12:859100)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 1414710 microseconds to 277555 microseconds.
2011-11-03T14:06:57.423Z cpu10:2058)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25510 microseconds to 578942 microseconds.
2011-11-03T14:06:58.749Z cpu14:859102)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25553 microseconds to 1278990 microseconds.
2011-11-03T14:06:59.899Z cpu8:2056)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 1278990 microseconds to 251087 microseconds.
2011-11-03T14:07:01.236Z cpu6:865242)WARNING: ScsiDeviceIO: 1218: Device naa.6006048c82736b0174ae4844edc1c73f performance has deteriorated. I/O latency increased from average value of 25553 microseconds to 518278 microseconds.
2011-11-03T14:07:04.647Z cpu10:859100)ScsiDeviceIO: 1198: Device naa.6006048c82736b0174ae4844edc1c73f performance has improved. I/O latency reduced from 518278 microseconds to 102810 microseconds.

That doesn't look right at all.
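If you want to watch those latencies live rather than grepping the log afterwards, esxtop on the host should show them broken down by device; roughly (key bindings from memory):

Code:
# on the ESXi host
esxtop
# press 'u' for the disk device view, then watch:
#   DAVG/cmd - latency at the device/array
#   KAVG/cmd - latency added inside the VMkernel
#   GAVG/cmd - total latency as seen by the guest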
 
As it turns out, this problem was related to a failing Catalyst 3750X.

The smoking gun appeared last night. We had converted all our datastores from iSCSI to NFS, which basically tripled performance, but some problems remained. I was running benchmarks on my VMs when two of them simply locked up. I then got a stream of emails from my SAN saying that, all of a sudden, ports were miscabled. Frustrated, I called it a night and went to bed. When I came in to work this morning, one of the seven 3750 switches in my stack had no link light on any port. Power cycling the switch was fruitless, so we removed it from the stack and moved the SAN over to another switch.

When I fired up the VMs and ran some benchmarks, performance was optimal.

The moral of the story: Sometimes it really is a hardware problem.
 
Zare said:
But since idiots removed service console from version 5

Very much not true. You can SSH to ESXi, you just have to enable it. You're logging in to the ESXi userspace (BusyBox) directly now (finally).
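If it isn't enabled yet, it only takes a minute; roughly (ESXi 5, paths from memory):

Code:
# From the DCUI: Troubleshooting Options -> Enable SSH
# From the vSphere Client: Configuration -> Security Profile -> Services -> Properties -> SSH -> Start
# Or, from a local ESXi Shell, this should work as well:
vim-cmd hostsvc/enable_ssh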

@no_fuse: Great that you found the problem. I admit I didn't go through all the comments in the thread, since you found the solution and it turned out to be strictly hardware-related. I once had a problem on EVA storage with micro-bursting; I think that's worth checking for as well.
 