ZFS Need help with interpreting zfs status and statistic reports

We have a FreeBSD-12.0-p3 host that originally started life as version 10.2. It is used to host a number of bhyve VMs, all but one of which are also running FreeBSD-12.0. INET09 is supposed to be a CentOS-6.9 system but will not boot into the installer. SAMBA-01 is a FreeBSD-10.2 system which we are not touching until the alternate DC, SAMBA-02, is configured and joined to the domain.
Code:
vm list
NAME      DATASTORE  LOADER     CPU  MEMORY  VNC  AUTOSTART  STATE
inet09    default    grub       2    4G      -    No         Locked (vhost03.hamilton.harte-lyne.ca)
inet13    default    bhyveload  2    4G      -    Yes [1]    Running (2018)
inet14    default    bhyveload  2    4G      -    Yes [2]    Running (2027)
inet16    default    bhyveload  2    4G      -    Yes [3]    Running (2863)
inet17    default    bhyveload  2    4G      -    Yes [4]    Running (80363)
inet18    default    bhyveload  2    4G      -    Yes [5]    Running (3129)
inet19    default    bhyveload  2    4G      -    Yes [6]    Running (3179)
samba-01  default    bhyveload  2    4G      -    Yes [7]    Running (3156)
samba-02  default    bhyveload  2    4G      -    Yes [8]    Running (45944)

The majority of our VMs were created using this template:
Code:
loader="bhyveload"
cpu=2
memory=4G
utctime=yes
network0_type="virtio-net"
network0_switch="public"
disk0_type="virtio-blk"
disk0_name="disk0"
disk0_dev="sparse-zvol"
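
For reference, a guest built from this template would be created along these lines (the template name, guest name, disk size and ISO file here are only placeholders, not taken from our actual setup):
Code:
# vm-bhyve: create a guest from the template above, with a 50G sparse zvol
vm create -t default -s 50G inet20
# then boot the installer from an ISO
vm install inet20 FreeBSD-12.0-RELEASE-amd64-disc1.iso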


Recently our ZFS health scans have reported that we are at 80% utilisation. These are the current storage statistics:

Code:
zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bootpool  1.98G   283M  1.71G        -         -    15%    13%  1.00x  ONLINE  -
zroot     10.6T  8.52T  2.11T        -         -    55%    80%  1.00x  ONLINE  -

zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
bootpool                  283M  1.58G   280M  /bootpool
zroot                    4.33T   674G   140K  /zroot
zroot/ROOT               30.1G   674G   140K  none
zroot/ROOT/default       30.1G   674G  16.1G  /
zroot/tmp                10.6M   674G   215K  /tmp
zroot/usr                 901M   674G   140K  /usr
zroot/usr/home            140K   674G   140K  /usr/home
zroot/usr/ports           900M   674G   900M  /usr/ports
zroot/usr/src             140K   674G   140K  /usr/src
zroot/var                 154M   674G   140K  /var
zroot/var/audit           140K   674G   140K  /var/audit
zroot/var/crash           140K   674G   140K  /var/crash
zroot/var/log            94.3M   674G  9.54M  /var/log
zroot/var/mail           2.62M   674G   174K  /var/mail
zroot/var/tmp            56.6M   674G  56.1M  /var/tmp
zroot/vm                 4.28T   674G  10.7G  /zroot/vm
zroot/vm/inet09           157K   674G   157K  /zroot/vm/inet09
zroot/vm/inet13           617G   674G   169K  /zroot/vm/inet13
zroot/vm/inet13/disk0     617G   674G   185G  -
zroot/vm/inet14           487G   674G   151K  /zroot/vm/inet14
zroot/vm/inet14/disk0     487G   674G  99.7G  -
zroot/vm/inet16           266G   674G   169K  /zroot/vm/inet16
zroot/vm/inet16/disk0     266G   674G  65.4G  -
zroot/vm/inet17          1.39T   674G   169K  /zroot/vm/inet17
zroot/vm/inet17/disk0    1.39T   674G   391G  -
zroot/vm/inet18           607G   674G   157K  /zroot/vm/inet18
zroot/vm/inet18/disk0     607G   674G   113G  -
zroot/vm/inet19           507G   674G   169K  /zroot/vm/inet19
zroot/vm/inet19/disk0     507G   674G   209G  -
zroot/vm/samba-01         198G   674G   169K  /zroot/vm/samba-01
zroot/vm/samba-01/disk0   198G   674G  82.2G  -
zroot/vm/samba-02        48.6G   674G   169K  /zroot/vm/samba-02
zroot/vm/samba-02/disk0  48.6G   674G  25.9G  -
zroot/vm/samba_dc01.img   210G   880G  3.34G


Also, quite recently, we have had inet17, which hosts our cyrus-imap service, hang on several occasions. In every case there is nothing in the various logs (messages/maillog/auth.log) on the VM, nor anything in the VM log or in /var/log/messages on the host. I am casting about for possible contributing factors and my concern is that ZFS might be one.

The host was originally installed using the guided ZFS-on-root option. I can no longer recall how I configured bhyve when it was installed. The zpool arrangement is:
Code:
zpool status
  pool: bootpool
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0 days 00:00:09 with 0 errors on Thu Mar 28 16:22:10 2019
config:

    NAME        STATE     READ WRITE CKSUM
    bootpool    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p2  ONLINE       0     0     0
        ada1p2  ONLINE       0     0     0
        ada2p2  ONLINE       0     0     0
        ada3p2  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(7) for details.
  scan: scrub in progress since Sat May  4 02:45:00 2019
    7.33T scanned at 24.7M/s, 7.29T issued at 24.6M/s, 8.54T total
    0 repaired, 85.38% done, 0 days 14:47:41 to go
config:

    NAME            STATE     READ WRITE CKSUM
    zroot           ONLINE       0     0     0
      raidz2-0      ONLINE       0     0     0
        ada0p4.eli  ONLINE       0     0     0
        ada1p4.eli  ONLINE       0     0     0
        ada2p4.eli  ONLINE       0     0     0
        ada3p4.eli  ONLINE       0     0     0

The host system was set up as encrypted ZFS-on-root RAID-Z2 with four (4) 2 TB HDDs, and there are four (4) additional 2 TB drives in the chassis (8 x 2 TB in total).

What is the suggested course of action?
 
What is the suggested course of action?
Well, what exactly is your question?
Disks full? Add more storage space or clean up old stuff. With regards to CentOS not working, did you install it with XFS perhaps? As far as I know that doesn't play nice, try ext4 instead.
 
My real question is: why did one of my VMs simply stop responding? Is this problem storage related? Whether it is relevant or not, none of my VMs exhibited this behaviour before upgrading to 12.0.

My direct question is: Do I actually have a problem with storage?

My follow-up question depends upon the answer to the first: if I do have a problem with storage, what is the best course of action to increase it?

With LVM-based systems I would simply increase the size of the LV by adding one of the spare disks and growing the logical volume. But with ZFS I understand that one cannot add a disk to a RAID-Z2 vdev. I have run across stray references to FreeBSD-12 being able to do that, but no actual documentation describing if or how this can be done. Can it? If so, where is the documentation? I can find nothing in the current Handbook respecting this feature.

The CentOS reference was simply to explain the VM in the locked state. That VM never completed installation because it never reaches the point where the CentOS installer actually runs. This is a problem that I raised elsewhere on these forums and it has been put to one side for now. Apparently one needs a particular OS release ISO to get it to work. I have obtained an ISO image that is purported to work but have not yet found the time to try it out.
 
My real question is: why did one of my VMs simply stop responding? Is this problem storage related?
It's more likely memory related. Did you limit the ARC size? How much memory does the host have? And how much of it is assigned to VMs? Make sure there's enough left over for ARC and the OS of the host itself. I've noticed that bhyve and ZFS can start competing for the same free memory, which results in both fighting over it and stalling in the meantime.

My direct question is: Do I actually have a problem with storage?
Besides the high-water (80%) mark? Not that I can tell.
 
It's more likely memory related. Did you limit the ARC size?
Code:
sysctl vfs | egrep 'arc'
vfs.zfs.arc_min_prescient_prefetch_ms: 6
vfs.zfs.arc_min_prefetch_ms: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_strategy: 0
vfs.zfs.arc_meta_limit: 16453230592
vfs.zfs.arc_free_target: 347916
vfs.zfs.arc_kmem_cache_reap_retry_ms: 0
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_grow_retry: 60
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_no_grow_shift: 5
vfs.zfs.arc_min: 8226615296
vfs.zfs.arc_max: 65812922368
I think that these are the defaults. I cannot recall actually setting any of these.

How much memory does the host have?
Code:
sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 68650315776
hw.usermem: 37536940032
hw.realmem: 68719476736
And how much of it is assigned to VMs?
Code:
grep memory /zroot/vm/**/*conf
/zroot/vm/inet09/inet09.conf:memory=4G
/zroot/vm/inet13/inet13.conf:memory=4G
/zroot/vm/inet14/inet14.conf:memory=4G
/zroot/vm/inet16/inet16.conf:memory=4G
/zroot/vm/inet17/inet17.conf:memory=4G
/zroot/vm/inet18/inet18.conf:memory=4G
/zroot/vm/inet19/inet19.conf:memory=4G
/zroot/vm/samba-01/samba-01.conf:memory=4G
/zroot/vm/samba-02/samba-02.conf:memory=4G
9 x 4 GB = 36 GB, or about 56% of the roughly 64 GB of physical memory.

Make sure there's enough left over for ARC and the OS of the host itself. I've noticed that bhyve and ZFS can start competing for the same free memory, which results in both fighting over it and stalling in the meantime.

From what I can tell from the above, unallocated memory should come to about 28 GB.

Besides the high-water (80%) mark? Not that I can tell.

Does FreeBSD-12 support adding disks to a raidz pool?
 
I think that these are the defaults. I cannot recall actually setting any of these.
The only one you need to set is vfs.zfs.arc_max. By default the ZFS ARC maximum is 1 GB less than the total amount of RAM. You will typically want to limit this.
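
For example, to pin the ARC at roughly half of physical memory you could put something like this in /boot/loader.conf on the host (the 32G figure is only an illustration; size it for your own workload):
Code:
# /boot/loader.conf
# Cap the ZFS ARC so the bhyve guests and the host OS keep enough free memory.
# 32G is an example value, not a recommendation for this particular machine.
vfs.zfs.arc_max="32G"
The setting takes effect at the next boot; I believe recent releases also accept it as a runtime sysctl, but loader.conf is the conventional place for it.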

Does FreeBSD-12 support adding disks to a raidz pool?
Adding an additional set of four disks as a second RAID-Z2 vdev is certainly supported. You can also replace each 2 TB drive with a 4 TB (or even bigger) drive, one at a time, and grow the entire pool (autoexpand) once all disks have been replaced and resilvered. But I suspect you want to add a single disk to the existing RAID-Z2 vdev (RAID-Z expansion)? I'm not sure that has landed already; I'm not even sure the current ZFS will get that feature. It may come when ZoL is imported and enabled, though.
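
To illustrate the first two options (the device names below are assumptions based on your existing layout; any new disk would first need the same partitioning and geli setup as the current pool members):
Code:
# Option 1: add the four spare drives as a second RAID-Z2 vdev.
zpool add zroot raidz2 ada4p4.eli ada5p4.eli ada6p4.eli ada7p4.eli

# Option 2: grow the existing vdev by replacing each drive with a larger one.
zpool set autoexpand=on zroot
zpool replace zroot ada0p4.eli ada8p4.eli   # hypothetical new, larger drive
zpool status zroot                          # wait for the resilver before the next replace
Note that with option 1 the new vdev starts out empty; existing data is not rebalanced across the two vdevs, so new writes will simply favour the emptier one.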
 
Re: Storage.
Given the number of VMs installed, most of which run the same OS as the host, would enabling ZFS de-duplication be worth considering in my situation?
 
Irrelevant for us. It has to do with changes in the Linux 5.0 kernel, which we don't have, or use, or import.
I am confused over the nomenclature then. If the Linux kernel will not support ZFS on Linux then has ZoL become something else?
 
I've seen references by ZFS developers that performance begins to degrade at >=80% utilization. You've hit 80% and recently started experiencing issues. Sounds like correlation to me.

First Internet search hit:
Unfortunately for you, with jails and things like transmission running, that does not apply to you: you will start to experience pain, then painful pain, then severe pain, then agony as your write speeds drop, because you are writing other stuff to the pool, and doing this introduces fragmentation, which is where the 80% thing comes from. It won't actually BREAK anything to pass the 80% mark, but it could eventually get bad enough that you're cursing ZFS and want to swear off it for the rest of your life. Every time you free space on your pool, you create little fragmented regions of free space in between the other stuff that's on your pool. As the pool fills to its ultimate capacity, ZFS is forced to allocate those useless little bits of scattershot space in order to fulfill your space requests, and that's a miserable experience to fill those all in.

That's write performance-specific, but maybe it's what's causing your VM to hang or crash, too.
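
If you want to keep an eye on that, the relevant numbers are already tracked as pool properties, e.g.:
Code:
# Capacity and free-space fragmentation for the pool
zpool list -o name,size,allocated,free,fragmentation,capacity zroot
zpool get fragmentation,capacity zroot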
 
Given the number of VMs installed, most of which run the same OS as the host, would enabling ZFS de-duplication be worth considering in my situation?
The rule of thumb is that you need 5 GB of RAM per TB of storage in order to use dedup effectively. With 10 TB of storage you'd need at least 50 GB of RAM for ZFS alone. I wouldn't go there.
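
If you're still curious, zdb can simulate deduplication against the existing data without enabling the feature, which at least shows what ratio you would get before committing the RAM:
Code:
# Simulate dedup on the pool and print the would-be DDT and dedup ratio.
# This walks the entire pool, so it takes a while and generates a lot of read I/O.
zdb -S zroot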
 
All the systems we are running on this host formerly ran on a host with just 1 TB of storage. Going to IMAP-3 is supposed to consume more disk, but a greater than eight-fold increase seems a little outlandish. Therefore I infer something else is at work.

We take regular ZFS snapshots of the host and also of those guests that have ZFS filesystems. I have a few questions relating to this practice:

1. Is there any benefit to running snapshots on the guests as well as on the host?
2. How much additional space does this take over just running snapshots on the host?
3. How can I determine how much space would be recovered by rebasing/deleting snapshots on a given guest and on the host system? (See the sketch after this list for what I have in mind.)
4. We run regular scrubs on both the host and the guests. Is there actually any benefit to scrubbing the guests as well as the host?
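
Regarding question 3, this is roughly what I had in mind, if it is even the right approach (the snapshot names below are only placeholders):
Code:
# Per-dataset breakdown of space held by snapshots
zfs list -r -o name,used,usedbysnapshots,usedbydataset zroot/vm

# Dry-run destroy of a snapshot, or a % range of snapshots, to report
# how much space would actually be reclaimed without deleting anything
zfs destroy -nv zroot/vm/inet17/disk0@2019-01-01
zfs destroy -nv zroot/vm/inet17/disk0@2019-01-01%2019-04-01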

I am not really sure what questions to ask in these circumstances.
 