We are using a FreeBSD 11.0-RELEASE-p9 server as an iSCSI storage backend for a VMware ESXi 6 cluster. The server is a Dell PowerEdge R620 with an attached MD1220 disk enclosure full of SSDs. The SSDs are set up as mirrored vdevs, with a zvol shared over iSCSI to VMware, which has formatted it with VMFS5. What we are experiencing is high system CPU usage that seems to correspond with high write activity. This has caused some VM performance problems, including I/O timeouts that required fscks/reboots of VMs.
Another thing to note is that we are relying on lz4 compression to allow for some "oversubscription": we present an 8 TB volume to VMware via iSCSI, but there are really only 5 TB available in the pool.
My suspicion is that this is a result of fragmentation in the metaslabs, but the pool is only at 52% capacity; previous experience (and what is considered the "traditional" wisdom in ZFS-land) says that 80% capacity is about the magic tipping point. I'm not sure how to definitively prove whether or not that is actually the issue. The block size is 4 KB to match the underlying SSDs, and from the
zdb -M output, it looks like there are still plenty of 4 KB blocks available. Any pointers or suggestions would be most appreciated.

zdb -M (just the summary):
Code:
pool storage fragmentation 71%
9: 25113486 **************
10: 69352157 **************************************
11: 73587828 ****************************************
12: 73809494 ****************************************
13: 33958358 *******************
14: 16266259 *********
15: 7790609 *****
16: 3767282 ***
17: 1679263 *
18: 507941 *
19: 99539 *
20: 11143 *
21: 4242 *
22: 1517 *
23: 157 *
24: 26 *
25: 1 *
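For context on how I'm reading that histogram: each bucket n counts free segments of 2^n bytes, so everything from bucket 12 (2^12 = 4 KiB) upward can still satisfy a volblocksize-sized allocation. A quick tally of those buckets (counts copied from the summary above):

```shell
# Sum the zdb -M bucket counts from bucket 12 (4 KiB) upward; these
# are the free segments large enough for one 4 KiB volblocksize write.
counts="73809494 33958358 16266259 7790609 3767282 1679263 507941 99539 11143 4242 1517 157 26 1"
total=0
for c in $counts; do
    total=$((total + c))
done
echo "free segments >= 4 KiB: $total"
# prints: free segments >= 4 KiB: 137895831
```

So on the order of 138 million 4 KiB-capable free segments, which is why I don't think we're starved for appropriately sized blocks despite the 71% fragmentation figure.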
Code:
$ zpool list storage
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
storage  5.09T  2.66T  2.42T         -    71%    52%  1.00x  ONLINE  -
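As a back-of-the-envelope check on the oversubscription: if the current 1.78x lz4 ratio holds, even a completely full 8 TB zvol would compress down to roughly 4.5 TB, which still fits under the pool's 5.09T SIZE (this ignores metadata and ZFS's reserved slop space, so the real margin is thinner than it looks):

```shell
# Projected physical usage of a 100%-full 8 TB zvol at the current
# 1.78x compressratio (ignores metadata and reserved slop space).
awk 'BEGIN { printf "projected: %.2f TB of 5.09 TB\n", 8 / 1.78 }'
# prints: projected: 4.49 TB of 5.09 TB
```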
Code:
$ zpool status storage
  pool: storage
 state: ONLINE
  scan: scrub canceled on Fri Jul 7 01:37:15 2017
config:

        NAME                    STATE     READ WRITE CKSUM
        storage                 ONLINE       0     0     0
          mirror-0              ONLINE       0     0     0
            multipath/0RY81DLA  ONLINE       0     0     0
            multipath/0RY82EYA  ONLINE       0     0     0
          mirror-1              ONLINE       0     0     0
            multipath/0RY82PAA  ONLINE       0     0     0
            multipath/0RY82PGA  ONLINE       0     0     0
          mirror-2              ONLINE       0     0     0
            multipath/0RY82SJA  ONLINE       0     0     0
            multipath/0RY82XHA  ONLINE       0     0     0
          mirror-3              ONLINE       0     0     0
            multipath/0RY82ZGA  ONLINE       0     0     0
            multipath/0RY842VA  ONLINE       0     0     0
          mirror-4              ONLINE       0     0     0
            multipath/0RY842YA  ONLINE       0     0     0
            multipath/0RY84A8A  ONLINE       0     0     0
          mirror-5              ONLINE       0     0     0
            multipath/0RY84BNA  ONLINE       0     0     0
            multipath/0RY84BXA  ONLINE       0     0     0
          mirror-6              ONLINE       0     0     0
            multipath/0RY84D2A  ONLINE       0     0     0
            multipath/0RY84D4A  ONLINE       0     0     0
        spares
          multipath/0RY857DA    AVAIL
Code:
$ zfs get all storage/vm-storage
NAME                PROPERTY              VALUE                  SOURCE
storage/vm-storage  type                  volume                 -
storage/vm-storage  creation              Fri Jul 7 13:17 2017   -
storage/vm-storage  used                  2.66T                  -
storage/vm-storage  available             2.26T                  -
storage/vm-storage  referenced            2.66T                  -
storage/vm-storage  compressratio         1.78x                  -
storage/vm-storage  reservation           none                   default
storage/vm-storage  volsize               8T                     local
storage/vm-storage  volblocksize          4K                     -
storage/vm-storage  checksum              on                     default
storage/vm-storage  compression           lz4                    local
storage/vm-storage  readonly              off                    default
storage/vm-storage  copies                1                      default
storage/vm-storage  refreservation        none                   default
storage/vm-storage  primarycache          all                    local
storage/vm-storage  secondarycache        all                    default
storage/vm-storage  usedbysnapshots       0                      -
storage/vm-storage  usedbydataset         2.66T                  -
storage/vm-storage  usedbychildren        0                      -
storage/vm-storage  usedbyrefreservation  0                      -
storage/vm-storage  logbias               throughput             inherited from storage
storage/vm-storage  dedup                 off                    default
storage/vm-storage  mlslabel              -
storage/vm-storage  sync                  standard               local
storage/vm-storage  refcompressratio      1.78x                  -
storage/vm-storage  written               2.66T                  -
storage/vm-storage  logicalused           4.65T                  -
storage/vm-storage  logicalreferenced     4.65T                  -
storage/vm-storage  volmode               dev                    local
storage/vm-storage  snapshot_limit        none                   default
storage/vm-storage  snapshot_count        none                   default
storage/vm-storage  redundant_metadata    all                    default
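One consistency check on those numbers: logicalused over used should roughly reproduce the reported compressratio, and it does, within rounding of the human-readable sizes:

```shell
# logicalused (4.65T) divided by used (2.66T) should approximate the
# reported 1.78x compressratio; the small gap is rounding in the
# human-readable sizes printed by zfs get.
awk 'BEGIN { printf "effective ratio: %.2fx\n", 4.65 / 2.66 }'
# prints: effective ratio: 1.75x
```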
Relevant config from ctl.conf:
Code:
target iqn.2017-03.com.flightaware:target0 {
        alias "VMware Target"
        portal-group pg0
        auth-group ag0
        lun 0 {
                path /dev/zvol/storage/vm-storage
                size 8796093022208
        }
}
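One thing I'm unsure about, given the thin provisioning: whether space freed inside VMFS ever makes it back to the zvol. ctl.conf(5) documents an `unmap` LUN option; a sketch of what I believe that would look like (not yet tested on this box, so treat it as a hypothetical change) is:

```
lun 0 {
        path /dev/zvol/storage/vm-storage
        size 8796093022208
        # Hypothetical addition: advertise UNMAP so ESXi can release
        # blocks freed in VMFS back to the thin zvol (see ctl.conf(5)).
        option unmap "on"
}
```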
Output of top:
Code:
CPU: 0.7% user, 0.0% nice, 64.8% system, 3.7% interrupt, 30.8% idle
Mem: 5880K Active, 39M Inact, 104G Wired, 21G Free
ARC: 90G Total, 10G MFU, 69G MRU, 46M Anon, 7070M Header, 3924M Other
Swap: 16G Total, 1800K Used, 16G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root 8 - 0K 12240K - 13 65.8H 36.05% kernel{cblk0 taskq_4}
0 root 8 - 0K 12240K - 17 65.5H 33.79% kernel{cblk0 taskq_9}
0 root -16 - 0K 12240K - 7 88.7H 32.92% kernel{zio_write_intr_7}
0 root 8 - 0K 12240K - 18 65.7H 32.66% kernel{cblk0 taskq_6}
12 root -92 - 0K 1440K WAIT 6 94.8H 31.78% intr{irq316: ix1:q6}
0 root -16 - 0K 12240K - 13 88.6H 31.24% kernel{zio_write_intr_5}
0 root 8 - 0K 12240K - 17 65.7H 31.07% kernel{cblk0 taskq_7}
0 root 8 - 0K 12240K CPU12 12 65.5H 30.49% kernel{cblk0 taskq_11}
0 root 8 - 0K 12240K - 6 65.8H 30.09% kernel{cblk0 taskq_8}
0 root 8 - 0K 12240K - 4 65.8H 29.43% kernel{cblk0 taskq_13}
0 root -15 - 0K 12240K - 17 113.9H 29.00% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 13 113.8H 28.95% kernel{zio_write_issue_}
0 root 8 - 0K 12240K - 19 65.7H 28.86% kernel{cblk0 taskq_0}
0 root -16 - 0K 12240K - 3 88.6H 28.84% kernel{zio_write_intr_3}
0 root 8 - 0K 12240K - 8 65.2H 28.69% kernel{cblk0 taskq_3}
0 root -16 - 0K 12240K - 4 88.6H 28.53% kernel{zio_write_intr_2}
0 root -15 - 0K 12240K - 13 113.9H 28.46% kernel{zio_write_issue_}
0 root 8 - 0K 12240K CPU1 1 65.7H 28.42% kernel{cblk0 taskq_5}
0 root -16 - 0K 12240K - 13 88.6H 28.29% kernel{zio_write_intr_0}
0 root -15 - 0K 12240K - 14 113.8H 28.24% kernel{zio_write_issue_}
0 root -16 - 0K 12240K - 5 88.7H 28.24% kernel{zio_write_intr_1}
0 root -16 - 0K 12240K - 19 88.6H 28.01% kernel{zio_write_intr_4}
0 root 8 - 0K 12240K - 23 65.8H 27.92% kernel{cblk0 taskq_12}
0 root -15 - 0K 12240K - 2 113.9H 27.84% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 20 113.9H 27.14% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 8 113.9H 26.90% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 14 113.9H 26.60% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 9 113.9H 26.58% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 5 113.9H 26.55% kernel{zio_write_issue_}
0 root -16 - 0K 12240K - 15 88.6H 26.20% kernel{zio_write_intr_6}
0 root -15 - 0K 12240K - 18 113.9H 26.17% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 23 113.9H 26.07% kernel{zio_write_issue_}
0 root 8 - 0K 12240K - 17 65.7H 25.62% kernel{cblk0 taskq_10}
0 root -15 - 0K 12240K - 1 113.9H 25.41% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 15 113.9H 25.33% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 17 113.9H 25.31% kernel{zio_write_issue_}
0 root 8 - 0K 12240K - 21 65.7H 25.12% kernel{cblk0 taskq_1}
0 root 8 - 0K 12240K - 21 65.2H 25.02% kernel{cblk0 taskq_2}
0 root -15 - 0K 12240K - 7 113.9H 24.84% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 17 113.8H 23.88% kernel{zio_write_issue_}
0 root -15 - 0K 12240K - 20 113.9H 22.09% kernel{zio_write_issue_}
12 root -88 - 0K 1440K WAIT 4 36.1H 19.28% intr{irq320: mps1}
12 root -88 - 0K 1440K WAIT 9 36.0H 19.05% intr{irq319: mps0}
4 root -16 - 0K 96K RUN 11 33.8H 18.50% cam{doneq1}
4 root -16 - 0K 96K - 15 35.6H 17.57% cam{doneq3}
4 root -16 - 0K 96K - 9 33.9H 17.13% cam{doneq2}
4 root -16 - 0K 96K - 3 35.2H 16.59% cam{doneq0}
1319 root -16 - 0K 128K - 19 78.4H 15.66% ctl{work0}
5 root -8 - 0K 160K tx->tx 5 119.0H 13.98% zfskern{txg_thread_enter}
4 root -16 - 0K 96K - 8 25.1H 13.61% cam{doneq4}
5 root -8 - 0K 160K arc_us 17 58.1H 11.11% zfskern{arc_user_evicts_}
0 root -16 - 0K 12240K icl_tx 3 42.4H 10.84% kernel{cfiscsitx}
0 root -16 - 0K 12240K icl_rx 3 41.8H 10.31% kernel{cfiscsirx}
12 root -92 - 0K 1440K WAIT 0 63.5H 5.83% intr{irq301: ix0:q0}
7401 adam.schumach 22 0 26264K 6844K CPU20 20 0:02 4.11% top
1319 root -16 - 0K 128K - 2 18.3H 3.80% ctl{work4}
0 root -92 - 0K 12240K - 15 354:53 2.88% kernel{ix1:q6}
0 root -16 - 0K 12240K icl_rx 7 26.4H 2.83% kernel{cfiscsirx}
34115 root 52 0 30168K 5412K piperd 19 108:19 2.17% zabbix_agentd
0 root -16 - 0K 12240K icl_tx 8 18.4H 2.12% kernel{cfiscsitx}
1319 root -16 - 0K 128K - 0 691:39 2.08% ctl{work3}
34117 root 52 0 30168K 5436K piperd 19 108:45 1.98% zabbix_agentd
0 root -16 - 0K 12240K - 18 143:22 1.46% kernel{zio_read_intr_7_}
0 root -16 - 0K 12240K - 2 143:35 1.45% kernel{zio_read_intr_7_}
0 root -16 - 0K 12240K - 9 143:35 1.44% kernel{zio_read_intr_7_}
0 root -16 - 0K 12240K - 7 143:03 1.43% kernel{zio_read_intr_2_}
0 root -16 - 0K 12240K - 19 142:17 1.43% kernel{zio_read_intr_1_}
0 root -16 - 0K 12240K - 5 142:13 1.38% kernel{zio_read_intr_3_}
[....]