ZFS scrub stalling - Windows VM with SQL Server misbehaves

Hi,

I am running a FreeBSD 12.0 server with a few VMs. One of them is a Windows Server 2016 guest which hosts a few applications and an SQL Server database. Each VM has a ZVOL assigned as its disk.
When I start scrubbing the pool, the scrub is very slow at first, and if it runs long enough it stalls. The Windows Server VM does not crash, but any access to the SQL database fails.
I have to stop and restart the VM after cancelling the scrub.

I don't remember when I last had a successful scrub. I have tried many things, but for now I had to suspend the scrub. I will wait for the weekend to stop the VMs and try scrubbing again.

Does the scrub always have to be performed with the VMs powered down? It used to work well, but I have to admit that this particular VM is getting busier. Maybe scrubbing a pool that hosts an MSSQL database is a bad idea?
Would increasing the ARC (or adding an L2ARC device) do any good?

Thanks,
tcn
 
Scrubbing is at its core a simple operation: read (and thereby verify the checksums of) every used block. So there shouldn't be any problem like the one you describe. I have several bhyve VMs with zvol disks on my system as well (two of them running Windows) and have never had a problem with scrubs. Then again, I don't run database servers in those Windows VMs ...
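
For reference, that is really all a scrub amounts to from the command line (a minimal sketch; "tank" is just a placeholder pool name):
Code:
# kick off a scrub on the pool (replace "tank" with your pool name)
zpool scrub tank

# watch progress: scanned/issued rates, repaired bytes, errors
zpool status -v tank

# a scrub can be paused and resumed later without losing progress
zpool scrub -p tank    # pause
zpool scrub tank       # resume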
 
As Zirias said: Scrubbing should be transparent from a functionality point of view. And scrubbing is a good and necessary thing (see the old NetApp research paper), so don't stop doing it.

But scrubbing competes for throughput with the normal workload. Try this: turn scrubbing off, and use tools like iostat and "zpool iostat" to see how much workload you are getting from your Windows VM; it should not be keeping the disks 100% busy. Then turn scrubbing on while watching the disk I/O. If you get to the point where the normal workload and the scrub together completely overload the system, then you have a problem that is difficult to solve.
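
Something along these lines (a rough sketch; "tank" is a placeholder pool name and the 5-second intervals are arbitrary):
Code:
# baseline with the scrub stopped: per-device I/O and per-vdev pool I/O
iostat -x 5
zpool iostat -v tank 5

# GEOM-level view with %busy per provider
gstat

# then start the scrub and watch the same numbers again
zpool scrub tank
zpool iostat -v tank 5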

You might also be running out of memory. As a matter of fact, I would try *reducing* the ARC in addition to increasing it; it could be that your system doesn't have enough physical memory for your VMs, your other workload, and ZFS's internal usage, which grows while scrubbing.
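
If you want to experiment with the ARC ceiling, something like this (a sketch; the 16 GiB value is only an example, and as far as I know vfs.zfs.arc_max can be changed at runtime on FreeBSD 12, with /boot/loader.conf making it persistent):
Code:
# current ceiling and current ARC usage
sysctl vfs.zfs.arc_max
sysctl kstat.zfs.misc.arcstats.size

# lower the ceiling at runtime, e.g. to 16 GiB
sysctl vfs.zfs.arc_max=17179869184

# make the setting stick across reboots
echo 'vfs.zfs.arc_max="17179869184"' >> /boot/loader.conf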
 
Thanks for your replies!

I have already decreased the ARC; it was taking half the 128GB of memory when I decided to drop it down.
sysctl vfs.zfs.arc_max
Code:
vfs.zfs.arc_max: 34359738368

Currently, this is what the memory situation looks like with all VMs running: top
Code:
last pid: 17155;  load averages:  5.01,  3.55,  2.96                                                                                             up 41+11:42:01  08:56:19
321 processes: 1 running, 319 sleeping, 1 zombie
CPU:  3.2% user,  0.0% nice,  4.4% system,  0.0% interrupt, 92.4% idle
Mem: 16G Active, 42G Inact, 4043M Laundry, 49G Wired, 257M Buf, 14G Free
ARC: 31G Total, 11G MFU, 16G MRU, 91M Anon, 1179M Header, 3240M Other
     24G Compressed, 45G Uncompressed, 1.89:1 Ratio
Swap: 64G Total, 17G Used, 47G Free, 26% Inuse

and:
zpool iostat -v emTank
Code:
                               capacity     operations    bandwidth
pool                        alloc   free   read  write   read  write
--------------------------  -----  -----  -----  -----  -----  -----
emTank                      10.4T  11.0T    174    396  1.86M  5.53M
  raidz2                    10.4T  11.0T    174    396  1.86M  5.53M
    gpt/disk0.eli               -      -     12     60   331K  1.74M
    gpt/disk6.eli               -      -     13     61   339K  1.75M
    gpt/disk2.eli               -      -     13     60   359K  1.74M
    gpt/disk3.eli               -      -     12     57   332K  1.74M
    gpt/disk4.eli               -      -     12     60   339K  1.75M
    gpt/disk5.eli               -      -     13     60   359K  1.74M
--------------------------  -----  -----  -----  -----  -----  -----

The only thing I do not like is the fragmentation level:
zpool list
Code:
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
emTank    21.4T  10.4T  11.0T        -         -    28%    48%  1.00x  ONLINE  -

We are moving soon, and I will probably rebuild the array with brand-new disks to get rid of the fragmentation.

I will check the I/O statistics while running a scrub over the weekend. It usually hangs after about 60 gigs or so. I will also check again for a memory shortage, but I doubt that is it; I do not remember having memory issues when I first tried to diagnose this problem.


Best regards,
tcn
 
I started a scrub tonight. It is still running, but the issue rate is very slow. I guess the scan speed drops while it waits for the issued portion to catch up; while actually scanning, it runs at around 650M/s.
I read somewhere that a scrub will take about 10% of memory (10% of free memory, I guess), so maybe it is a memory issue after all... 13G free should be enough, though, no?

Code:
  pool: emTank
state: ONLINE
  scan: scrub in progress since Fri May 31 20:02:18 2019
        439G scanned at 187M/s, 64.9M issued at 27.7K/s, 10.4T total
        0 repaired, 0.00% done, no estimated completion time
config:

        NAME                        STATE     READ WRITE CKSUM
        emTank                      ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            gpt/disk0.eli           ONLINE       0     0     0
            gpt/disk6.eli           ONLINE       0     0     0
            gpt/disk2.eli           ONLINE       0     0     0
            gpt/disk3.eli           ONLINE       0     0     0
            gpt/disk4.eli           ONLINE       0     0     0
            gpt/disk5_S1Z28YZJ.eli  ONLINE       0     0     0
        spares
          gpt/disk7.eli             AVAIL
          gpt/disk1.eli             AVAIL

errors: No known data errors
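
While it runs I am keeping an eye on memory and the ARC with the following (rough checks, nothing fancy):
Code:
# free memory and paging activity
vmstat 5

# ARC size versus the configured maximum
sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

# swap usage
swapinfo

# per-vdev load while the scrub is issuing I/O
zpool iostat -v emTank 5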

tcn
 
I had to stop the scrub. It hung at 4TB out of the 10TB to scan. The speed just kept going down and the estimated completion time kept increasing.

I am not sure what makes the scrub stall like this. The ZFS I/O stats were normal and memory was still OK (about 5 GB free).

Is it bad to take snapshots during a scrub? I have frequent snapshots (every 15 minutes) on some fast-changing datasets, and snapshot transfers (from this host to a remote one) run during the night. Maybe that is what is causing the delay or the stall?

I will have to re-test next weekend.
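
When I do, I will also try pausing the scrub around the nightly replication window to see whether the sends are the trigger (a rough plan, based on my reading of zpool(8)):
Code:
# before the nightly snapshot transfers start: pause the scrub (progress is kept)
zpool scrub -p emTank

# after the transfers finish: resume it
zpool scrub emTank

# confirm whether the scrub is paused or running again
zpool status emTank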

tcn
 