Need assistance with sluggish ZFS host

Hi Everyone,

I am trying to come back to FreeBSD after moving to Linux in 1998. I am building a host with ZFS and several MySQL instances in jails. It works, but I am hitting some weird behavior that I can't explain.

First, a bit about my hardware. Yes, I remember the days when you had to pick hardware to suit FreeBSD, not the other way around. Here it is:

1TB of RAM
2x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
16x TOSHIBA PX05SVB096Y SAS3 SSD
Dell HBA330+ controller (it is a Dell box)
Intel(R) Ethernet Connection XL710
4 NUMA domains

FreeBSD 11.1-RELEASE-p9

There are two ZFS pools:
- system root on 2x NVMe M.2 drives in a mirror
- data pool with 16x Toshiba drives in mirrors (8 vdevs)
- compression enabled (lz4), no dedup

My plan was to use most of the RAM as ARC and run the MySQL instances with direct flushing or small buffers. I started testing by running some well-known SQL statements, and it worked very well, although not always.

Let's rewind to the very first day.

I had to install the MySQL data from xtrabackup files. I had them in tar+lzop: 800G in the file, around 1.8T unpacked. I ran "lzop -d file | tar xf -" and bang: after unpacking around half of the data, the system stopped responding properly. I repeated this a few times with exactly the same result:

- Whatever was already open, say a file in vim, still worked
- any other action, such as ls, vi, or top, hung frozen

After a few tries I started top first and kept watching. Nice, that gave me a clue: the frozen processes were waiting for another process that was trying to free ARC. Leaving it for hours made no progress, so warm reboot.

The next step was to limit ARC to 640G and set the memory target and minimum to 128/64. It worked: I was able to unpack the files. Then I ran MySQL with some tests with no issues; however, I noticed wired memory growing fast and far beyond ARC. Disabling UMA helped a bit but made things slow, so I enabled UMA again.
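For reference, the ARC lines from my loader.conf ended up as below (a sketch; the byte values match the sysctl dump I post further down):

Code:
# /boot/loader.conf -- ARC limits described above
vfs.zfs.arc_max="687194767360"   # 640G cap
vfs.zfs.arc_min="137438953472"   # 128G floor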

Then further testing with unpacking. After two test runs over MySQL, with ARC sitting around 350G and wired around 500G, I started "pbzip2 -d file > deleteme". The .bz2 file was 400G, with around 1.8T of unpacked content.
The whole process runs at about 15% of the drives' speed until around 400G is unpacked; then it slows down to next to nothing. ARC is full (640G), wired memory leaves around 63G free (less than the minimum I set) and is still growing slowly. The system is terribly slow: I can launch top, but it takes minutes to start; there is no chance of opening a file, and ls sometimes returns after minutes. Killing bzip2 doesn't help. Once I tried to delete the "deleteme" file while the system was dying; actually I had two files, one left over from a previous test. I ran "rm file1; rm file2" and then ls. The files were gone, but the system deteriorated further. Warm reboot after around 30 minutes of nothing except top still running and reporting 59G of free RAM. After the reboot, surprise: the files were back :) Conclusion: it was unable to flush the transaction.
After that I limited the number of blocks deleted in a single transaction to 200000 and then to 50000, but that had no influence; see the sketch below.
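If anyone wants to reproduce this: if I remember the knob right, the per-transaction delete limit is set like this (name quoted from memory, so double-check it on your system):

Code:
# cap how many blocks one transaction group may free (knob name from memory)
sysctl vfs.zfs.free_max_blocks=200000   # later tried 50000 -- neither helped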

I switched the timers to HPET when I allowed the C3 state. Disallowing C3 (C1/C2 still allowed, powerd running) made the SQL tests faster but improved nothing else.
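For the record, roughly what I used (the C-state line is per CPU; performance_cx_lowest in rc.conf would be the persistent equivalent):

Code:
sysctl kern.timecounter.hardware=HPET   # timer source
sysctl dev.cpu.0.cx_lowest=C2           # cap C-states at C2, repeated per CPU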

So where am I now?
I can kill it every time I run unpacking for long enough. I tested with compression disabled, with no difference. While dead it deteriorates slowly even when left alone, showing a load around 5 with only top running, and a reboot fixes it. BTW, it also fails to reboot cleanly, stopping after the famous "uptime" message; disabling the USB wait doesn't help.

What should I look at or read about to fix it?

Cheers,
 
Have you tried monitoring your system to see where the bottleneck is? Meaning, run all of top(1) (top -SH and top -SH -mio -ototal), gstat(8) (gstat -p), systat(1) (systat -vmstat, systat -zarc, systat -iostat, systat -pigs), and zpool(8) (zpool iostat -Td, zpool status -v -Td 5) simultaneously in ssh sessions. Also tell syslogd(8) to forward events to a remote host, so you can monitor what is happening; that way you won't be stuck if the host becomes unresponsive again. A minimal sketch of the forwarding follows.
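Something like this on the problem host, where loghost is a placeholder for your remote machine:

Code:
# /etc/syslog.conf -- forward everything to a remote collector
*.*     @loghost
# apply with: service syslogd restart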

Do you have a swap partition (swapctl -l)? Can you turn it off (swapoff -a)?
 
- gstat with no options shows little activity on the drives, and next to nothing when the box is dead

[root@testmother ~]# swapctl -l
Device: 1024-blocks Used:
/dev/ada0p2 2097152 0
/dev/ada1p2 2097152 0

vm.swap_enabled="0"

I ran some of the others and didn't spot anything. I will run them again tomorrow and post the results.

Cheers,
 
I've had similar issues on a few VMs I created. It turned out to be memory related: too many things were trying to grab RAM, and when it ran out everything eventually stalled.
 
For the VMs it was easy, I just gave them more memory. But in your case that's probably not useful. For your situation I would limit ARC (since it interferes with MySQL's own caching) and verify that none of the MySQL instances are configured to use too much memory. Make sure ARC plus the maximum memory usage of all the MySQL instances adds up to less than the amount of RAM the machine has; a sketch of that budget follows. You can also find a couple of specific settings for each ZFS filesystem that holds a MySQL database: https://wiki.freebsd.org/ZFSTuningGuide#MySQL
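As an illustration of the kind of budget I mean (numbers made up, not a recommendation):

Code:
# illustration only -- make the pieces sum below physical RAM (1T here)
#   vfs.zfs.arc_max                          640G
#   8 jails x innodb_buffer_pool_size (16G)  128G
#   OS, jails, network buffers, slack       ~100G
#   total                                   ~870G  < 1T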
 
I've had similar issues on a few VMs I created. It turned out to be memory related: too many things were trying to grab RAM, and when it ran out everything eventually stalled.

Any tips on how to limit it?
For the VMs it was easy, I just gave them more memory. But in your case that's probably not useful. For your situation I would limit ARC (since it interferes with MySQL's own caching) and verify that none of the MySQL instances are configured to use too much memory. Make sure ARC plus the maximum memory usage of all the MySQL instances adds up to less than the amount of RAM the machine has. You can also find a couple of specific settings for each ZFS filesystem that holds a MySQL database: https://wiki.freebsd.org/ZFSTuningGuide#MySQL

Yep, it would be difficult to add more when it already has 1TB.

I set the MySQL buffer to only 2G, so the whole process takes 3G. I am aware of the double-caching issue; I tried "innodb_flush_method = O_DIRECT", but SQL alone is not the problem, it dies when I copy files. I have started thinking it could be too much RAM for the CPU's computing power. It is easy to scan 64G of RAM to find and evict 5G, but not as easy to scan 700G of RAM to find and evict 250G.
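For completeness, the MySQL side is just this my.cnf fragment:

Code:
# my.cnf -- small pool plus direct flush, as described above
[mysqld]
innodb_buffer_pool_size = 2G
innodb_flush_method     = O_DIRECT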

I noticed there is a ZFS setting for the number of background workers doing compression, defaulting to 3/4 of the available threads. Some people say having it at 200% helps. Is there anything similar working on ARC or wired memory, i.e. on the scans and evictions? Maybe I need more CPU power there, given the RAM is so large?

How can I find what else stays in wired? It grows to 1.4x the ARC size.
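The only thing I have found to poke at it myself is the UMA and kernel malloc statistics; is there a better tool?

Code:
vmstat -z   # per-UMA-zone usage (where the ZFS buffers live)
vmstat -m   # kernel malloc statistics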
 
I tried to rsync the files out, both to defragment and to change the recordsize, which I had screwed up (128K instead of 16K under InnoDB; the fix itself is the one-liner after this list). I started rsync and:

- It ran at 50-100MB/s while reading from ARC.
- It slowed down 10x once it ran beyond the cached data. I tested this by re-rsyncing a large file, which again ran fast up to the same point (around 50%) and then slowed down the same way.
- Left for hours it deteriorated, and now there is no transfer although rsync is still there. It seems to be the same issue I had before, but this time I can still launch things, so hopefully I can collect the output of the commands above.
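For the record, the recordsize change itself is a one-liner; the dataset name here is illustrative, and the property only applies to newly written files:

Code:
# 16K matches the InnoDB page size; existing files keep their old recordsize
zfs set recordsize=16K Raid10/mysql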

First top:

Code:
last pid: 39994;  load averages:  3.08,  3.16,  3.91                                                                                                                                                                 up 0+21:04:14  07:16:55
39 processes:  4 running, 34 sleeping, 1 stopped
CPU:  0.0% user,  0.0% nice,  5.7% system,  0.0% interrupt, 94.3% idle
Mem: 22M Active, 55M Inact, 25M Laundry, 692G Wired, 306G Free
ARC: 642G Total, 622G MFU, 13G MRU, 428K Anon, 7368M Header, 821M Other
     634G Compressed, 874G Uncompressed, 1.38:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  870 root          1  52    0 46444K  3420K CPU38  38   0:01  49.83% sudo
36306 zabbix        1  52    0 34204K  3820K piperd 35  73:27   8.46% zabbix_agentd
32268 zabbix        1  52    0 13180K  2728K CPU16  16   0:00   8.40% sh
98222 zabbix        1  52    0  8356K  2148K piperd 49   0:00   7.55% sed
96024 root          1  52    0 46444K  3428K CPU15  15   0:00   7.18% sudo
41294 zabbix        1  52    0 34204K  3824K select 40  73:15   0.78% zabbix_agentd
28420 root          1  20    0 20164K  3588K CPU56  56   0:02   0.05% top
10987 zabbix        1  20    0 35788K  9780K nanslp 70   0:43   0.05% zabbix_agentd
46093 root          1  20    0 10496K  2612K select 65   1:51   0.01% syslogd
13449 root          1  20    0 10464K  1552K select 66   0:10   0.01% powerd
69986 root          1  20    0 85232K  7828K select 64   0:03   0.01% sshd
90137 root          1  20    0 20584K 12492K select 57   0:12   0.00% ntpd
94177 root          1  52    0 11864K  2920K arc_re 31 522:08   0.00% rsync
80547 root          1  52    0 13912K  3676K arc_re 32 437:28   0.00% rsync
18407 zabbix        1  52    0 34204K  3808K piperd 58  73:35   0.00% zabbix_agentd
44610 zabbix        1  52    0 35788K  4744K piperd 65  13:11   0.00% zabbix_agentd
93587 root          1  52    0 28892K  3996K nanslp 61   2:48   0.00% smartd
69409 root          1  52    0 46652K 15584K STOP   79   0:48   0.00% dtrace
14650 root          1  20    0 85232K  7660K select  6   0:07   0.00% sshd

Stuck on arc_reclaim_thread! (Both rsyncs show STATE arc_re in the listing above.)
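Next I will grab the kernel stacks of the stuck rsyncs to confirm, using the PIDs from the top output:

Code:
# kernel stack of a blocked process
procstat -kk 94177
procstat -kk 80547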
 
top -SH

Code:
last pid: 52449;  load averages:  2.64,  2.73,  3.30                                                                                                                                                                 up 0+21:12:44  07:25:25
1557 processes:83 running, 1346 sleeping, 1 stopped, 127 waiting
CPU:  0.0% user,  0.0% nice,  2.5% system,  0.0% interrupt, 97.5% idle
Mem: 23M Active, 55M Inact, 25M Laundry, 691G Wired, 307G Free
ARC: 642G Total, 622G MFU, 13G MRU, 491K Anon, 7368M Header, 821M Other 634G Compressed, 874G Uncompressed, 1.38:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root       155 ki31     0K  1280K CPU0    0  20.6H 100.00% idle{idle: cpu0}
   11 root       155 ki31     0K  1280K CPU6    6  20.6H 100.00% idle{idle: cpu6}
   11 root       155 ki31     0K  1280K CPU2    2  20.6H 100.00% idle{idle: cpu2}
   11 root       155 ki31     0K  1280K CPU7    7  20.6H 100.00% idle{idle: cpu7}
   11 root       155 ki31     0K  1280K CPU14  14  20.4H 100.00% idle{idle: cpu14}
   11 root       155 ki31     0K  1280K CPU56  56  20.3H 100.00% idle{idle: cpu56}
   11 root       155 ki31     0K  1280K CPU23  23  20.1H 100.00% idle{idle: cpu23}
   11 root       155 ki31     0K  1280K CPU17  17  20.1H 100.00% idle{idle: cpu17}
   11 root       155 ki31     0K  1280K CPU20  20  20.0H 100.00% idle{idle: cpu20}
   11 root       155 ki31     0K  1280K CPU35  35  19.9H 100.00% idle{idle: cpu35}
   11 root       155 ki31     0K  1280K CPU21  21  19.9H 100.00% idle{idle: cpu21}
   11 root       155 ki31     0K  1280K CPU59  59  19.8H 100.00% idle{idle: cpu59}
   11 root       155 ki31     0K  1280K CPU58  58  19.7H 100.00% idle{idle: cpu58}
   11 root       155 ki31     0K  1280K RUN    39  18.9H 100.00% idle{idle: cpu39}
43739 root        52    0 46444K  3428K select 25   0:05 100.00% sudo
   11 root       155 ki31     0K  1280K RUN    48  20.2H  99.77% idle{idle: cpu48}
   11 root       155 ki31     0K  1280K CPU22  22  20.3H  99.73% idle{idle: cpu22}
   11 root       155 ki31     0K  1280K CPU18  18  19.6H  99.55% idle{idle: cpu18}
   11 root       155 ki31     0K  1280K CPU60  60  18.9H  99.25% idle{idle: cpu60}
   11 root       155 ki31     0K  1280K CPU38  38  19.8H  98.55% idle{idle: cpu38}
   11 root       155 ki31     0K  1280K CPU31  31  19.9H  96.95% idle{idle: cpu31}
   11 root       155 ki31     0K  1280K CPU34  34  20.0H  93.34% idle{idle: cpu34}
   11 root       155 ki31     0K  1280K CPU1    1  20.7H  49.62% idle{idle: cpu1}
31100 root        28    0 24260K  6752K CPU48  48   0:01   0.21% top
   12 root       -60    -     0K  2032K WAIT    0   2:27   0.12% intr{swi4: clock (0)}
   27 root       -16    -     0K    16K -      23   1:31   0.10% racctd
10987 zabbix      20    0 35788K  9780K nanslp 23   0:44   0.04% zabbix_agentd
13449 root        20    0 10464K  1552K select 14   0:11   0.01% powerd
   16 root       -16    -     0K    16K -      31   0:37   0.01% rand_harvestq
46093 root        20    0 10496K  2612K select 56   1:51   0.01% syslogd
69986 root        20    0 85232K  7828K select 18   0:03   0.01% sshd
   21 root       -16    -     0K    16K psleep 38   0:04   0.00% vmdaemon
   12 root       -92    -     0K  2032K WAIT    0   2:42   0.00% intr{irq287: ixl0:q0}
90137 root        20    0 20584K 12492K select 60   0:12   0.00% ntpd{ntpd}
   12 root       -92    -     0K  2032K WAIT    0   0:01   0.00% intr{irq296: ixl1:q0}
    5 root        -8    -     0K   192K spa->s  6   4:09   0.00% zfskern{trim Raid10}
   12 root       -92    -     0K  2032K WAIT    1   0:33   0.00% intr{irq288: ixl0:q1}
    0 root       -92    -     0K 20112K -      56   0:01   0.00% kernel{ixl0 aq}
    5 root        -8    -     0K   192K spa->s 20   0:20   0.00% zfskern{trim zroot}


top -SH -mio -ototal

Code:
last pid: 90557;  load averages:  2.45,  2.62,  3.20                                                                                                                                                                 up 0+21:14:06  07:26:47
1561 processes:84 running, 1349 sleeping, 1 stopped, 127 waiting
CPU:  0.0% user,  0.0% nice,  7.3% system,  0.0% interrupt, 92.7% idle
Mem: 25M Active, 55M Inact, 25M Laundry, 691G Wired, 307G Free
ARC: 642G Total, 622G MFU, 13G MRU, 495K Anon, 7368M Header, 821M Other 634G Compressed, 874G Uncompressed, 1.38:1 Ratio
Swap: 4096M Total, 4096M Free

  PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
    5 root           10      0      0     15      0     15  48.39% zfskern{txg_thread_enter}
87961 root            1      1     10      0      0     10  32.26% zfs
46093 root            4      0      0      2      0      2   6.45% syslogd
57285 root            3      2      2      0      0      2   6.45% sudo
52853 root            3      2      2      0      0      2   6.45% sudo
79214 root            0      0      0      0      0      0   0.00% ftcleanup
69409 root            0      0      0      0      0      0   0.00% dtrace
94177 root            0      0      0      0      0      0   0.00% rsync
38103 root            0      0      0      0      0      0   0.00% bash
69986 root            2      0      0      0      0      0   0.00% sshd
57386 root            0      0      0      0      0      0   0.00% bash
84516 root            0      0      0      0      0      0   0.00% sshd
15202 mailnull        0      0      0      0      0      0   0.00% exim
20801 root            0      0      0      0      0      0   0.00% bash
14650 root            0      0      0      0      0      0   0.00% sshd
37796 root            0      0      0      0      0      0   0.00% getty
32316 root            0      0      0      0      0      0   0.00% getty
25008 root            0      0      0      0      0      0   0.00% getty
23383 root            0      0      0      0      0      0   0.00% getty
19741 root            0      0      0      0      0      0   0.00% getty
14635 root            0      0      0      0      0      0   0.00% getty
13728 root            0      0      0      0      0      0   0.00% getty
11065 root            0      0      0      0      0      0   0.00% getty
89155 root            0      0      0      0      0      0   0.00% cron
18335 root            0      0      0      0      0      0   0.00% sshd
93587 root            0      0      0      0      0      0   0.00% smartd
13449 root            7      0      0      0      0      0   0.00% powerd
90137 root            2      0      0      0      0      0   0.00% ntpd{ntpd}
44610 zabbix          0      0      0      0      0      0   0.00% zabbix_agentd
41294 zabbix          5      1      0      0      0      0   0.00% zabbix_agentd
36306 zabbix          3      0      0      0      0      0   0.00% zabbix_agentd
18407 zabbix          0      0      0      0      0      0   0.00% zabbix_agentd
10987 zabbix          2      0      0      0      0      0   0.00% zabbix_agentd
93779 zabbix          0      0      0      0      0      0   0.00% zabbix_agentd
86550 root            0      0      0      0      0      0   0.00% devd
98361 root            0      0      0      0      0      0   0.00% moused
13727 root            0      0      0      0      0      0   0.00% adjkerntz
   27 root            2      0      0      0      0      0   0.00% racctd
   26 root            3      0      0      0      0      0   0.00% syncer


gstat -p

Code:
dT: 1.012s  w: 1.000s
L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da1
    0      0      0      0    0.0      0      0    0.0    0.0| da2
    0      0      0      0    0.0      0      0    0.0    0.0| da3
    0      0      0      0    0.0      0      0    0.0    0.0| da4
    0      0      0      0    0.0      0      0    0.0    0.0| da5
    0      0      0      0    0.0      0      0    0.0    0.0| da6
    0      0      0      0    0.0      0      0    0.0    0.0| da7
    0      0      0      0    0.0      0      0    0.0    0.0| da8
    0      0      0      0    0.0      0      0    0.0    0.0| da9
    0      0      0      0    0.0      0      0    0.0    0.0| da10
    0      0      0      0    0.0      0      0    0.0    0.0| da11
    0      0      0      0    0.0      0      0    0.0    0.0| da12
    0      0      0      0    0.0      0      0    0.0    0.0| da13
    0      0      0      0    0.0      0      0    0.0    0.0| da14
    0      0      0      0    0.0      0      0    0.0    0.0| da15
 
systat -vmstat

Code:
    3 users    Load  2.63  2.66  3.10                  Apr 13 07:29
   Mem usage:  69%Phy 64%Kmem
Mem: KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  110724    9104   812164    13160 322617k  count
All  111268    9628   822448    19056          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt  1133 total
  1          31       349  125  494 1135   38  117     26 cow         cpu0:timer
                                                       69 zfod        cpu1:timer
2.6%Sys   0.0%Intr  0.0%User  0.0%Nice 97.4%Idle         ozfod       cpu41:time
|    |    |    |    |    |    |    |    |    |           %ozfod       cpu40:time
=                                                         daefr       cpu5:timer
                                           dtbuf      115 prcfr       cpu48:time
Namei     Name-cache   Dir-cache   8388608 desvn      846 totfr       cpu27:time
   Calls    hits   %    hits   %    662440 numvn          react       cpu78:time
     711     711 100                607811 frevn          pdwak       cpu54:time
                                                          pdpgs       cpu36:time
Disks  ada0  ada1   da0   da1   da2   da3   da4           intrn       cpu66:time
KB/t  14.67 15.46  0.00  0.00  0.00  0.00  0.00   723924k wire        cpu26:time
tps       8     7     0     0     0     0     0     22032 act         cpu11:time
MB/s   0.11  0.11  0.00  0.00  0.00  0.00  0.00     56216 inact       cpu43:time
%busy     0     0     0     0     0     0     0     25928 laund       cpu10:time
                                                  322617k free        cpu4:timer
                                                          buf         cpu8:timer
                                                                      cpu64:time
                                                                      cpu42:time
                                                                      cpu55:time
                                                                      cpu24:time
                                                                      cpu17:time
                                                                      cpu45:time
                                                                      cpu65:time
                                                                      cpu16:time
                                                                      cpu79:time
                                                                      cpu76:time
                                                                      cpu49:time
                                                                      cpu22:time
                                                                      cpu44:time
                                                                      cpu32:time
                                                                      cpu68:time
                                                                      cpu39:time
                                                                      cpu69:time
                                                                      cpu74:time
                                                                      cpu34:time
                                                                      cpu61:time
                                                                      cpu20:time
                                                                      cpu12:time

 
systat -zarc
Code:
                       Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
     ZFS ARC            642G    621G     12G    428K   7367M      0K    820M
                                rate    hits  misses   total hits total misses
     arcstats                  :  0%       0       0     51354648     39471192
     arcstats.demand_data      :  0%       0       0     46362087      4523687
     arcstats.demand_metadata  :  0%       0       0      2670818      4992774
     arcstats.prefetch_data    :  0%       0       0      2174883     29882852
     arcstats.prefetch_metadata:  0%       0       0       146860        71879
     zfetchstats               :  0%       0       0      4492247      9591496
     arcstats.l2               :  0%       0       0            0            0
     vdev_cache_stats          :  0%       0       0         8650        19624
systat -iostat
Code:
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   >>>>>>>>>>>
          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
cpu  user|
     nice|
   system|XX
interrupt|
     idle|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

          /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
ada0  MB/s
      tps|
ada1  MB/s
      tps|
da0   MB/s
      tps|
da1   MB/s
      tps|
da2   MB/s
      tps|
da3   MB/s
      tps|
da4   MB/s
      tps|
da5   MB/s
      tps|
da6   MB/s
      tps|
da7   MB/s
      tps|
da8   MB/s
      tps|
da9   MB/s
      tps|
da10  MB/s
      tps|
da11  MB/s
      tps|
da12  MB/s
      tps|
da13  MB/s
      tps|
da14  MB/s
      tps|
da15  MB/s
      tps|
pass0 MB/s
      tps|
pass1 MB/s
      tps|
pass2 MB/s
      tps|
pass3 MB/s
      tps|
pass4 MB/s
      tps|
pass5 MB/s
      tps|
pass6 MB/s
      tps|

systat -pigs

Code:
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   >>>>>>>>>>>>>

                    /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root        zfskern XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root           idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
zpool status -v -Td 5
Code:
Fri Apr 13 07:34:05 AEST 2018
  pool: Raid10
state: ONLINE
  scan: scrub repaired 0 in 0h23m with 0 errors on Mon Apr  9 14:28:25 2018
config:

        NAME        STATE     READ WRITE CKSUM
        Raid10      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p1   ONLINE       0     0     0
            da8p1   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da1p1   ONLINE       0     0     0
            da9p1   ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da2p1   ONLINE       0     0     0
            da10p1  ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da3p1   ONLINE       0     0     0
            da11p1  ONLINE       0     0     0
          mirror-4  ONLINE       0     0     0
            da4p1   ONLINE       0     0     0
            da12p1  ONLINE       0     0     0
          mirror-5  ONLINE       0     0     0
            da5p1   ONLINE       0     0     0
            da13p1  ONLINE       0     0     0
          mirror-6  ONLINE       0     0     0
            da6p1   ONLINE       0     0     0
            da14p1  ONLINE       0     0     0
          mirror-7  ONLINE       0     0     0
            da7p1   ONLINE       0     0     0
            da15p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu Apr  5 21:04:54 2018
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
 
My ARC tunes:
Code:
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 171798691840
vfs.zfs.arc_free_target: 1813966
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_min: 137438953472
vfs.zfs.arc_max: 687194767360
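(Decoded for readability: vfs.zfs.arc_max 687194767360 = 640 GiB, vfs.zfs.arc_min 137438953472 = 128 GiB, vfs.zfs.arc_meta_limit 171798691840 = 160 GiB.)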
 
My advice: don't go back to FreeBSD. Reason? Bloated as hell.
 
Rebooted to recover, set the timer to 100HZ and disabled TRIM.

After 6h of running rsync out, with read-only activity on ZFS:
- The first change is that wired is now close to ARC: 477G ARC, 496G wired.
- It is still dying and ready for a reboot now; ^C on the running top takes ages.

Reducing ARC to 64G now.
 
Here is some fun....
After a reboot with loader.conf set to a 64G ARC limit: 2x rsync out, and then...

- While ARC was climbing: 2x 30MB/s
- After hitting 64G of ARC: 2x 50MB/s
- Expanded ARC to 128G: 2x 30MB/s while climbing, 2x 40MB/s with a full ARC
- Decreased ARC to 32G: 2x 60MB/s

In general the load on the drives reflects the transfer rate. ARC is 100% MFU.
Now I am scratching my head: why is the transfer slower with a 128G ARC?!


WCPU on the rsyncs is 25% while ARC is full and the transfer is 2x 60MB/s, and 90% while ARC is climbing and the transfer is 2x 30MB/s.
 
Long shot maybe, but have you tested your memory with memtest?

Also, what is the output of zdb | grep ashift? It might be irrelevant in this case, but it never hurts to know whether your pool is properly aligned.
 
Long shot maybe, but have you tested your memory with memtest?

Also, what is the output of zdb | grep ashift? It might be irrelevant in this case, but it never hurts to know whether your pool is properly aligned.
ashift is 12. The memory is tested; this is a new server, and Dell runs memtest plus SMART tests (short and long) on the drives before delivery.
 
Since with the reduced ARC it doesn't want to die, I played around a bit.

- There is a general rule: if ARC is climbing, no matter from what value or up to what limit, I get 2x 30MB/s
- I shifted the shrink shift from 7 to 11 and am now building ARC up to 256G. Seems better so far; see the sketch below.
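The shift in question is vfs.zfs.arc_shrink_shift from my tunes above. If I read it right, a larger shift means a smaller eviction step per reclaim pass (roughly ARC size >> shift), so treat this as an experiment rather than a recommendation:

Code:
# /boot/loader.conf -- smaller reclaim steps per pass (default here was 7)
vfs.zfs.arc_shrink_shift="11"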

Next reboot: re-enabling TRIM and 1000HZ.
 
There is more that I don't understand. I am rsyncing out, with no writing. After two days of READING, fragmentation dropped from 32% to 12%. Does it somehow heal on READ when it sees fragmentation?
 