FreeBSD 9, ZFS, VirtualBox, memory utilisation growing, fatal trap 9 after some time

Dear all,

I need your help to understand the behaviour I am about to describe.

I am running FreeBSD 9.0-STABLE amd64 with a custom kernel in which I kept only the drivers I need for this build. The system has 8 GB of RAM. The system is installed on a ZFS mirror, and there is an extra raidz zpool for data.

I installed vbox-ose 4.0.14 and run a Windows VM in it, with all the VirtualBox extensions for 4.0.14 installed. The VDI for the VM is stored on the zfsroot mirror. The VM has a shared folder which points to the raidz zpool on the FreeBSD host.

Everything seems fine until, after some time, I go back and try to access the VM and the whole system hangs with the following error.

Code:
kernel: Fatal trap 9: general protection fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: instruction pointer     = 0x20:0xffffffff8098edc6
kernel: stack pointer           = 0x28:0xffffff8234a29910
kernel: frame pointer           = 0x28:0xffffff8234a29960
kernel: code segment            = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags        = interrupt enabled, resume, IOPL = 0
kernel: current process         = 6 (pagedaemon)
kernel: trap number             = 9

If I do not run virtualbox, the system is stable.

I started checking here and there for anything odd. I found the following and would welcome any pointers.

top(1) shows wired memory constantly increasing, while neither the VirtualBox process nor any other process consumes more memory:
Code:
last pid:  2636;  load averages:  0.77,  0.44,  0.21                                                     up 0+02:34:59  15:21:17
56 processes:  2 running, 53 sleeping, 1 waiting
CPU:  3.6% user,  0.0% nice,  7.1% system,  0.0% interrupt, 89.3% idle
Mem: 126M Active, 50M Inact, 5540M Wired, 228K Cache, 2193M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root          2 155 ki31     0K    32K CPU1    1 291:25 170.46% idle
 2550 boris        18  20    0  2482M  2279M IPRT S  1   3:08 25.10% VirtualBox
 2211 boris         1  22    0 76968K 46196K select  0   0:42  4.20% Xvnc
   12 root         19 -84    -     0K   304K WAIT    0   0:45  0.59% intr
    0 root        307 -92    0     0K  4912K -       1   0:08  0.00% kernel
 2274 boris         1  20    0 16700K  1952K select  0   0:05  0.00% top
 2531 boris         3  20    0   197M 30708K uwait   1   0:03  0.00% VirtualBox
   13 root          3  -8    -     0K    48K -       0   0:02  0.00% geom
 2535 boris         9  20    0 84256K 12324K uwait   1   0:01  0.00% VBoxSVC
 2533 boris         1  20    0 45176K  4972K select  1   0:01  0.00% VBoxXPCOMIPCD
    3 root          7  -8    -     0K   144K tx->tx  0   0:01  0.00% zfskern
   93 root          1 -20    -     0K    16K IPRT S  0   0:01  0.00% TIMER
 2031 root          1  20    0 60716K  5512K select  0   0:01  0.00% smbd
 2271 boris         1  20    0 67664K  4684K select  0   0:00  0.00% xterm
 2216 boris         1  20    0 67668K  6876K select  0   0:00  0.00% fluxbox
   16 root          1  16    -     0K    16K syncer  1   0:00  0.00% syncer
   14 root          1 -16    -     0K    16K -       0   0:00  0.00% yarrow
   15 root         32 -68    -     0K   512K -       0   0:00  0.00% usb
 2027 root          1  20    0 51628K  3780K select  0   0:00  0.00% nmbd
    1 root          1  20    0  6280K   424K wait    1   0:00  0.00% init
   18 root          1 -16    -     0K    16K sdflus  1   0:00  0.00% softdepflush
 2215 boris         1  20    0 67664K  3904K select  0   0:00  0.00% xterm
 1740 root          1  20    0 12184K  1272K select  1   0:00  0.00% syslogd
 2575 boris         1  20    0 17572K  2296K wait    1   0:00  0.00% bash
 2570 root          1  20    0 68016K  4280K sbwait  1   0:00  0.00% sshd
   17 root          1 -16    -     0K    16K vlruwt  0   0:00  0.00% vnlru
 2077 root          1  20    0 14260K  1292K nanslp  0   0:00  0.00% cron
    9 root          1 -16    -     0K    16K psleep  1   0:00  0.00% bufdaemon
 2574 boris         1  20    0 68016K  4292K select  1   0:00  0.00% sshd
    2 root          1 -16    -     0K    16K -       1   0:00  0.00% fdc0
 2636 boris         1  20    0 16700K  2012K CPU0    1   0:00  0.00% top
 2035 root          1  20    0 58144K  4736K select  0   0:00  0.00% winbindd
 2241 boris         1  20    0 17572K  2244K ttyin   0   0:00  0.00% bash
 1373 root          1  20    0 10372K  3396K select  0   0:00  0.00% devd
 2273 boris         1  20    0 17572K  2220K wait    1   0:00  0.00% bash
    6 root          1 -16    -     0K    16K psleep  0   0:00  0.00% pagedaemon
 2143 root          1  43    0 58136K  4764K select  1   0:00  0.00% winbindd
 2068 root          1  20    0 46876K  3112K select  0   0:00  0.00% sshd
 2144 root          1  43    0 58084K  4948K select  1   0:00  0.00% winbindd
 2147 root          1  52    0 12184K   984K ttyin   1   0:00  0.00% getty
 2149 root          1  52    0 12184K   984K ttyin   0   0:00  0.00% getty
 2154 root          1  52    0 58136K  5108K select  1   0:00  0.00% winbindd
 2146 root          1  52    0 12184K   984K ttyin   1   0:00  0.00% getty
 2152 root          1  52    0 12184K   984K ttyin   0   0:00  0.00% getty
 2150 root          1  52    0 12184K   984K ttyin   0   0:00  0.00% getty
 2153 root          1  52    0 12184K   984K ttyin   1   0:00  0.00% getty
 2148 root          1  52    0 12184K   984K ttyin   1   0:00  0.00% getty
 2151 root          1  52    0 12184K   984K ttyin   0   0:00  0.00% getty
 2155 root          1  20    0 60716K  5444K select  1   0:00  0.00% smbd
 1268 root          1  52    0 14364K   976K select  0   0:00  0.00% moused
    8 root          1 155 ki31     0K    16K pgzero  0   0:00  0.00% pagezero
   94 root          2 -16    -     0K    32K sleep   0   0:00  0.00% ng_queue
    4 root          1 -16    -     0K    16K waitin  0   0:00  0.00% sctp_iterator
    7 root          1 -16    -     0K    16K psleep  1   0:00  0.00% vmdaemon
   10 root          1 -16    -     0K    16K audit_  0   0:00  0.00% audit
    5 root          1 -16    -     0K    16K ccb_sc  0   0:00  0.00% xpt_thrd

By the time I finished writing this post, wired memory was up by another few hundred megabytes:
Code:
Mem: 122M Active, 51M Inact, 5922M Wired, 228K Cache, 1814M Free

while the VirtualBox process is still stable in terms of memory utilisation:
Code:
 2550 boris        18  20    0  2479M  2276M IPRT S  0   6:54 31.69% VirtualBox

I then looked elsewhere and found the following:
Code:
vmstat -m | grep solaris
      solaris 62898 3180453K       -  2267653  16,32,64,128,256,512,1024,2048,4096

The 3180453K figure keeps going up.

A short while after capturing the output above, it is now:

Code:
vmstat -m | grep solaris
      solaris 63893 3284856K       -  2374412  16,32,64,128,256,512,1024,2048,4096
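
To keep an eye on both numbers over time, a rough shell loop along these lines could log them once a minute (the interval and log path are arbitrary choices of mine):

Code:
#!/bin/sh
# Append a timestamp, the wired page count and the "solaris" malloc zone
# line to a log once a minute; adjust interval and log path as needed.
LOG=/var/tmp/memgrowth.log
while :; do
    printf '%s wired_pages=%s ' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(sysctl -n vm.stats.vm.v_wire_count)" >> "$LOG"
    vmstat -m | grep solaris >> "$LOG"
    sleep 60
done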

This system ran fine for quite a few months under ZFS v15 and FreeBSD 8.2-STABLE.
I am wondering if I should tune something in ZFS.

For info, I have the following in /boot/loader.conf:

Code:
more /boot/loader.conf 
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"
hw.hptrr.attach_generic=0
# Disable ZFS prefetching

# http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
# Increases overall speed of ZFS, but when disk flushing/writes occur,
# system is less responsive (due to extreme disk I/O).
# NOTE: Systems with 4 GB of RAM or more have prefetch enabled by default.
vfs.zfs.prefetch_disable="1"

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds.  This
# should increase throughput and decrease the "bursty" stalls that
# happen during immense I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
# default in FreeBSD since ZFS v28
vfs.zfs.txg.timeout="5"

and in /etc/sysctl.conf:

Code:
more /etc/sysctl.conf 
# $FreeBSD: release/9.0.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
# Increase number of vnodes; we've seen vfs.numvnodes reach 115,000
# at times.  Default max is a little over 200,000.  Playing it safe...
# If numvnodes reaches maxvnode performance substantially decreases.
kern.maxvnodes=250000


# Set TXG write limit to a lower threshold.  This helps "level out"
# the throughput rate (see "zpool iostat").  A value of 256MB works well
# for systems with 4 GB of RAM, while 1 GB works well for us w/ 8 GB on 
# disks which have 64 MB cache.

vfs.zfs.txg.write_limit_override=1073741824
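
For what it is worth, I can double-check at runtime that these tunables are actually in effect with something along these lines:

Code:
sysctl vfs.zfs.prefetch_disable vfs.zfs.txg.timeout \
    vfs.zfs.txg.write_limit_override kern.maxvnodes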

Any help in identifying what causes this constant increase in memory utilisation would be appreciated, as I suspect the crash happens when memory eventually runs out.

Thanks,

Boris
 
Try limiting the size of the ARC to see if it makes a difference; put this in /boot/loader.conf:
Code:
vfs.zfs.arc_max="2048M"

2 GB is just my guess at a good value for your system; play around with it if necessary.
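
After a reboot you can check whether the cap is being honoured by comparing the live ARC size against the configured maximum, for example:

Code:
# current ARC size vs. configured maximum, in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max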
 
The ZFS ARC uses wired memory, so as that value increases it means the ZFS ARC is growing in size, until it runs out of memory and locks up the system.

The reason this happens is that you are double (possibly triple) caching every file accessed in the VM. The VM accesses a file and caches it in the VM's RAM. Since the virtual disk image is in the pool, ZFS also caches the blocks in the ARC. The more files you access in the VM, the more RAM ZFS uses to cache blocks in the ARC, until there is no RAM left on the host and the pagedaemon (which looks for memory pages to write out to swap/disk) locks everything up.

You need to limit the ARC size so that there is enough room in RAM for the ARC + VirtualBox + the VM's memory. Possibly also set the primarycache property to metadata on the ZFS filesystem where the VM disk image is stored (cache only file metadata instead of the actual file contents). That way, only the VM caches the accessed files, instead of both the VM and the host system (a minimal sketch follows below).
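
A minimal sketch of the primarycache change, assuming the VM image lives in a dataset called zroot/vm (substitute your own dataset name):

Code:
# cache only metadata for the dataset holding the VM disk image
zfs set primarycache=metadata zroot/vm
# confirm the property took effect
zfs get primarycache zroot/vm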
 
Thanks a lot, that seems to have addressed the memory issue: after a few hours, wired memory has stopped increasing and is now stable.

To be clear, by the time I read your response, wired memory had reached 7768M (so it took only a few hours to reach that level of utilisation).

I applied your recommended ZFS config, rebooted, and now wired memory does not go beyond 4472M. The output of
Code:
 vmstat -m | grep solaris
is also stable, peaking at around 2105084K.

For good measure, I forced some I/O activity from the VM to the shared folder (on the zpool), and it is still going on as I write this.

Thanks again. I will let it run overnight and add another VM to make sure it is rock solid.

Boris
 
To close on this, thanks again for the recommendation. I tested overnight and found no issues; memory utilisation is still as it should be.

Thanks for your support,

Boris
 