Poor disk performance in VMWare

tuaris · Jul 25, 2014

A problem I've always had when running Linux with VMware, as both a host and a guest, has been poor disk performance. With FreeBSD guests I've noticed the performance is a lot better than with Linux. However, it's still a problem at times. I've noticed that a simple copy of a very small file brings the entire system to a grinding halt for almost half a second on FreeBSD 10 with Workstation 10. This is strange, since I have 6 GB of RAM and 4 CPUs assigned to that machine. Oddly, it does not do that when I give it fewer resources. That's just one observation.

My bigger problem is on my VMware ESXi cluster. I run many FreeBSD 9.2 64-bit and 32-bit VMs. Disk performance is still better than with a Linux guest, but nowhere near as good as with a Windows guest. Moderate I/O slows the system down quite a bit. For example, the nightly cron jobs wreak havoc on my SAN. When one machine starts its cron job (the "find" command specifically), the other machines slow down significantly. It snowballs, since the other virtual machines in the cluster start their nightly cron jobs at around the same time or shortly after. The end result is that all the FreeBSD VMs grind to a halt for about an hour or more, completely inaccessible.

I could disable or reschedule the cron jobs, but that won't fix the underlying issue. The ESXi cluster and SAN perform well with my Windows-based VMs, so it's not a bandwidth or hardware bottleneck. I use NAS4Free and iSCSI.

What can I "tweak" on the FreeBSD VMs to improve disk performance?
I already have the emulators/open-vm-tools-nox11 port installed.

SirDice · Jul 25, 2014

tuaris said:
I already have the emulators/open-vm-tools-nox11 port installed.

I'd dump that one and install the original VMware Tools from VMware itself. Don't install the binary versions, though; they've been compiled for FreeBSD 8.x. Extract the files and recompile the drivers.

This should do the trick:

Code:

mount -t cd9660 -o ro /dev/cd0 /mnt

cd /tmp
tar -zxvf /mnt/vmware-freebsdtools.tar.gz # not sure if that's the correct name

cd /tmp/vmware-tools-distrib/lib/modules/source
tar -xvf vmblock.tar
tar -xvf vmmemctl.tar
tar -xvf vmxnet3.tar

cd vmblock-only && make && make install && cd -
cd vmmemctl-only && make && make install && cd -
cd vmxnet3-only && make && make install && cd -

cd ../../../

./vmware-install.pl -d # -d uses default settings for everything

tuaris · Aug 1, 2014

Thank you.

For FreeBSD 10.0-RELEASE I used the patches from http://ogris.de/vmware/freebsd10.html. I was able to get the vmblock and vmmemctl modules to build. The vmxnet3 module fails with the error:

Code:

vmxnet3_rx.c:330:21: error: use of undeclared identifier 'M_FRAG'

It's probably not a problem since vmxnet3 is included in 10?

tuaris · Oct 18, 2014

For anyone else having the same issue with a similar setup: I found the underlying cause of the poor disk performance. In my (poor) design there is a single 1 Gbps link between the VMware ESXi cluster and the iSCSI cluster. All those VMs saturate the connection when the nightly maintenance tasks run at the same time. It was a "bandwidth or hardware bottleneck" after all, which I had incorrectly ruled out earlier.
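As a rough back-of-the-envelope sketch of why one link becomes the choke point: 1 Gbps is at most about 125 MB/s before any TCP or iSCSI overhead, and that is shared by every VM doing I/O at once. The helper name `per_vm_mbytes` and the VM count of 20 are made up for illustration; the thread does not say how many VMs are in the cluster.

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound on per-VM throughput when
# several VMs share one link. Ignores protocol overhead entirely.
#   $1 = link speed in Mbit/s
#   $2 = number of VMs doing I/O at the same time (assumed value)
per_vm_mbytes() {
    link_mbit=$1
    vm_count=$2
    # Mbit/s -> MB/s (divide by 8), then split evenly across the VMs.
    echo $(( link_mbit / 8 / vm_count ))
}

# One VM alone on a 1 Gbps link, then 20 VMs at once (integer MB/s).
per_vm_mbytes 1000 1
per_vm_mbytes 1000 20
```

With these assumed numbers each VM is left with single-digit MB/s at best, which fits the symptoms described above.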

What maintenance tasks should I look at rescheduling?

wblock@ · Oct 18, 2014

The periodic daily and periodic weekly jobs in /etc/crontab run find(1) for a couple of things. Part of the daily job is /etc/periodic/security/100.chksetuid and /etc/periodic/security/110.neggrpperm. One or both of those is fairly disk intensive. The weekly job rebuilds the locate(1) database, trying to catalog every file.

Might be easiest to set the daily and weekly execution times ten or twenty minutes later for each VM. Depending on the VMs, some of those jobs might be unnecessary and could be disabled in periodic.conf(5).
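A minimal sketch of the staggering idea, assuming the stock /etc/crontab start time of 03:01 for `periodic daily` (the ps output later in the thread shows the job starting at 3:01AM) and a 15-minute spacing. The function name `staggered_time` and the per-VM index are hypothetical; any scheme that gives each VM a distinct offset would do.

```shell
#!/bin/sh
# Hypothetical helper: shift a base start time by (VM index * step) minutes
# and print it as "minute hour", the order of the first two crontab fields.
#   $1 = base hour, $2 = base minute (no leading zeros)
#   $3 = VM index, starting at 0
#   $4 = spacing between VMs in minutes (assumed: 15)
staggered_time() {
    total=$(( $1 * 60 + $2 + $3 * $4 ))
    total=$(( total % 1440 ))          # wrap around midnight
    echo "$(( total % 60 )) $(( total / 60 ))"
}

# Print the daily line for the third VM (index 2) for pasting into /etc/crontab.
vm_index=2
printf '%s\t*\t*\t*\troot\tperiodic daily\n' "$(staggered_time 3 1 "$vm_index" 15)"
```

The same function can be applied to the weekly entry. With many VMs the offsets may run well into the morning, so a smaller step, or disabling the heavy jobs instead, may be preferable.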

tuaris · Jan 7, 2015

wblock@ said:
The periodic daily and periodic weekly jobs in /etc/crontab run find(1) for a couple of things. Part of the daily job is /etc/periodic/security/100.chksetuid and /etc/periodic/security/110.neggrpperm. One or both of those is fairly disk intensive. The weekly job rebuilds the locate(1) database, trying to catalog every file.

I created the file /etc/periodic.conf with the following contents:

Code:

security_status_neggrpperm_enable="NO"
security_status_chksetuid_enable="NO"
weekly_locate_enable="NO"

However the two security tasks still ran the following day.

Code:

root  1282  0.0  0.0  14184  1424 ??  Ss  8Dec14  0:28.87 |-- /usr/sbin/cron -s
root  56464  0.0  0.0  14184  1492 ??  I  3:01AM  0:00.00 | `-- cron: running job (cron)
root  56466  0.0  0.0  14540  1776 ??  Is  3:01AM  0:00.02 |  `-- /bin/sh - /usr/sbin/periodic daily
root  56475  0.0  0.0  14540  1792 ??  I  3:01AM  0:00.03 |  |-- /bin/sh - /usr/sbin/periodic daily
root  56734  0.0  0.0  14540  1768 ??  I  3:03AM  0:00.01 |  | `-- /bin/sh /etc/periodic/daily/450.status-security
root  56735  0.0  0.0  14540  1776 ??  I  3:03AM  0:00.01 |  |  `-- /bin/sh - /usr/sbin/periodic security
root  56742  0.0  0.0  14540  1788 ??  I  3:03AM  0:00.00 |  |  |-- /bin/sh - /usr/sbin/periodic security
root  56744  0.0  0.0  14540  1784 ??  I  3:03AM  0:00.01 |  |  | `-- /bin/sh - /etc/periodic/security/100.chksetuid
root  56748  0.0  0.1  9948  3188 ??  DN  3:03AM  0:08.22 |  |  |  |-- find -sx / /dev/null -type f ( -perm -u+x -or -perm -g+x
root  56749  0.0  0.0  14540  1784 ??  I  3:03AM  0:00.00 |  |  |  `-- /bin/sh - /etc/periodic/security/100.chksetuid
root  56751  0.0  0.0  9952  1296 ??  I  3:03AM  0:00.00 |  |  |  `-- cat

Is there something that needs to be restarted to apply the changes?

tuaris · Jan 7, 2015

After reading the man page a little more, I think I also need to set these?

Code:

security_status_chksetuid_period="NO"
security_status_neggrpperm_period="NO"

tuaris · Jan 7, 2015

tuaris said:
After reading the man page a little more, I think I also need to set these?

Code:

security_status_chksetuid_period="NO"
security_status_neggrpperm_period="NO"

Those worked.
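For reference, the settings from the posts above combined into one /etc/periodic.conf would look roughly like this. It only consolidates what was already posted in this thread. Whether the `_enable` lines are still needed once the `_period` lines are set is not something the thread establishes, so both are kept here.

```shell
# /etc/periodic.conf (consolidated from the posts in this thread)

# Skip the two find(1)-heavy security checks.
security_status_chksetuid_enable="NO"
security_status_chksetuid_period="NO"
security_status_neggrpperm_enable="NO"
security_status_neggrpperm_period="NO"

# Skip the weekly locate(1) database rebuild.
weekly_locate_enable="NO"
```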
