ZFS zfs send/receive slow transfer speed

That's one of the reasons I'm moving data off this server. It doesn't have enough SSD slots to allow for mirroring of the special device.

I found something else out. Two days ago I wrote a 10GB test file and reading it back was very fast: 500+MB/sec.
Re-reading the same file now is quite slow:

Code:
cat test1.img | pv -rtab > /dev/null
9.77GiB 0:01:06 [ 150MiB/s] [ 150MiB/s]

But if I run it a second time, it's fast again:

Code:
cat test1.img | pv -rtab > /dev/null
9.77GiB 0:00:17 [ 566MiB/s] [ 566MiB/s]

I assume some sort of cache goes into effect.
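I suppose one way to confirm it's the ARC would be to compare the hit/miss counters before and after the second read (the sysctl names below are the stock FreeBSD ones):

Code:
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
sysctl kstat.zfs.misc.arcstats.size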

Pretty much any file I haven't accessed in a while only gets about 120MB/sec.
I tested with dd as well and got slightly better speeds.

This has me wondering whether I've made a mistake somewhere in configuring this server, or whether I'm going crazy somehow. Veeam routinely reports writing at 300MB/sec. How can I really test what the hardware can do?
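(I suppose a read-only dd straight from one member disk, bypassing ZFS entirely, would at least show what a single drive can deliver; da0 below is just an example.)

Code:
dd if=/dev/da0 of=/dev/null bs=1M count=8192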

Code:
-rw-r--r--   1 root    pulsar  10485760000 May 13 23:18 test1.img

dd if=test1.img of=/dev/null bs=1M
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 55.541516 secs (188791388 bytes/sec)

Code:
dd if=test1.img of=/dev/null bs=128k
80000+0 records in
80000+0 records out
10485760000 bytes transferred in 30.720653 secs (341326078 bytes/sec)

I don't know what to believe anymore
 
Yes, any time you read a file it will be warm in the cache if you come back before it is evicted. That's one of the cardinal problems with benchmarking different configurations: you have to run long enough to invalidate the cached items that would otherwise be hit at the start of a subsequent experiment. (For ZFS this can be very hard to guarantee; starting a test immediately after a reboot is one of the best approaches, although zpool export/import works well, too.)
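For example (pool name and path below are placeholders), an export/import cycle drops everything cached for that pool before the next timed read:

Code:
zpool export tank
zpool import tank
cat /tank/test1.img | pv -rtab > /dev/null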

That said, this sure feels slow for your configuration. I would be tempted to capture gstat, zpool iostat, and hotkernel output while reading a set of large, known-cold (not recently read or written) files. (You can use cat to read a number of large files: cat filepat* | pv > /dev/null.) If no device is hitting close to 95% busy in gstat, it's likely not a layout/device limitation, so something else is going on. I've never used a special device, so I don't know how it plays into the situation here, but it should show up in gstat utilization if it is a bottleneck.
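Something along these lines, in separate terminals, while the cold read runs (pool name and file pattern are placeholders):

Code:
gstat -p -I 1s                  # per-disk busy% and throughput
zpool iostat -v tank 1          # per-vdev / per-disk breakdown
cat filepat* | pv > /dev/null   # the cold read itself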

Do you have any non-bone-stock zpool/zfs settings or sysctls set?
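A quick way to check is to look for any property whose SOURCE isn't default, plus any vfs.zfs overrides in loader.conf or sysctl.conf (pool name is a placeholder):

Code:
zpool get all tank | awk '$4 != "default" && $4 != "-"'
zfs get all tank | grep -w local
grep -i zfs /boot/loader.conf /etc/sysctl.conf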

(Using dtrace / hotkernel is not kernel debugging per se, and I've done it in production many times; from Oracle's description: DTrace is a comprehensive dynamic tracing framework for troubleshooting kernel and application problems on production systems in real time.)
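In practice it's simple: start hotkernel in one terminal, kick off the cold read in another, and Ctrl-C hotkernel when the read finishes; it prints the sampled summary on exit.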
 
Is there a way to know if the backplane and/or external cables are getting saturated? This server is made up of 3 chassis: 1 x 36 HDD (multipath) + a 44 HDD expansion + a 24 HDD expansion, all wired up via LSI 9207i/e cards that run at 6Gbps.
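(Back-of-the-envelope: even a single SAS2 x4 wide port should be good for roughly 4 x 6Gbps = 24Gbps, on the order of 2GB/sec after protocol overhead, so at ~150MB/sec I'd be surprised if the links themselves were saturated, but I don't know how to confirm it.)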

There is no disk that reaches anywhere near 90%+ while doing the read.

I don't have any special ZFS config and no special sysctls.

I tried checking IRQs on the LSI cards: mpr0/1 are the local multipath backplanes, mps0/1 are the external chassis. They all seem to be getting good usage:

Code:
vmstat -i 1 100

irq80: mpr0                          484        484
irq81: mpr1                            0          0
irq82: mps0                          391        391
irq114: mps1                         545        544


Code:
cat file.back | pv -rtab > /dev/null
29.8GiB 0:03:05 [ 164MiB/s] [ 164MiB/s]

Code:
/usr/local/share/dtrace-toolkit/hotkernel

zfs.ko`dbuf_write_physdone                                 54   0.0%
kernel`vm_reserv_alloc_page                                54   0.0%
zfs.ko`arc_is_unauthenticated                              55   0.0%
kernel`rxd_frag_to_sd                                      55   0.0%
zfs.ko`mze_compare                                         55   0.0%
kernel`ithread_loop                                        56   0.0%
zfs.ko`list_insert_tail                                    56   0.0%
zfs.ko`vdev_geom_io_start                                  56   0.0%
zfs.ko`arc_buf_destroy_impl                                56   0.0%
zfs.ko`dbuf_hold                                           57   0.0%
zfs.ko`zio_compress_zeroed_cb                              57   0.0%
kernel`trap                                                57   0.0%
zfs.ko`metaslab_trace_fini                                 57   0.0%
zfs.ko`vdev_geom_fill_unmap_cb                             57   0.0%
kernel`xpt_done                                            57   0.0%
kernel`mpssas_scsiio_complete                              57   0.0%
zfs.ko`list_link_active                                    57   0.0%
kernel`mpssas_action_scsiio                                58   0.0%
kernel`uma_zalloc_smr                                      58   0.0%
zfs.ko`dbuf_prefetch_impl                                  59   0.0%
zfs.ko`multilist_sublist_unlock                            59   0.0%
zfs.ko`dsl_dir_tempreserve_space                           59   0.0%
zfs.ko`dnode_rele_and_unlock                               60   0.0%
kernel`xpt_run_allocq                                      60   0.0%
kernel`vop_lock                                            61   0.0%
zfs.ko`zfs_freebsd_getattr                                 61   0.0%
kernel`microtime                                           62   0.0%
zfs.ko`sa_get_db                                           63   0.0%
zfs.ko`dnode_block_freed                                   64   0.0%
kernel`kvprintf                                            64   0.0%
zfs.ko`dnode_sync                                          65   0.0%
kernel`_sleep                                              65   0.0%
zfs.ko`range_tree_is_empty                                 65   0.0%
zfs.ko`spa_indirect_vdevs_loaded                           65   0.0%
kernel`dastart                                             66   0.0%
kernel`dofileread                                          67   0.0%
kernel`mps_push_sge                                        67   0.0%
kernel`ixl_update_stats_counters                           68   0.0%
kernel`taskqueue_enqueue                                   69   0.0%
kernel`getbinuptime                                        69   0.0%
kernel`vm_object_deallocate                                69   0.0%
zfs.ko`dbuf_create                                         69   0.0%
zfs.ko`abd_free                                            69   0.0%
kernel`lockmgr_slock                                       70   0.0%
kernel`xpt_done_process                                    71   0.0%
kernel`foffset_lock                                        71   0.0%
zfs.ko`dmu_objset_spa                                      71   0.0%
zfs.ko`zap_hash                                            73   0.0%
zfs.ko`dmu_zfetch_fini                                     73   0.0%
kernel`lookup                                              75   0.0%
kernel`seltdclear                                          75   0.0%
zfs.ko`zio_vdev_io_assess                                  76   0.0%
zfs.ko`dbuf_include_in_metadata_cache                      78   0.0%
zfs.ko`dbuf_dirty                                          79   0.0%
kernel`rms_runlock                                         80   0.0%
zfs.ko`arc_evictable_space_increment                       80   0.0%
kernel`devvn_refthread                                     81   0.0%
kernel`vm_page_free_prep                                   82   0.0%
kernel`binuptime                                           82   0.0%
zfs.ko`zfs_tstamp_update_setup_ext                         82   0.0%
kernel`pmap_is_prefaultable                                82   0.0%
kernel`userret                                             82   0.0%
zfs.ko`avl_find                                            83   0.0%
kernel`vm_page_alloc_domain_after                          83   0.0%
zfs.ko`dbuf_write_ready                                    83   0.0%
zfs.ko`zfs_blkptr_verify                                   84   0.0%
zfs.ko`arc_buf_alloc_impl                                  87   0.0%
zfs.ko`arc_evict_state                                     87   0.0%
kernel`_pmap_unwire_ptp                                    88   0.0%
kernel`lockmgr_unlock                                      89   0.0%
kernel`seltdwait                                           89   0.0%
kernel`vm_page_tryxbusy                                    90   0.0%
kernel`lock_destroy                                        92   0.0%
zfs.ko`arc_read_done                                       92   0.0%
zfs.ko`avl_rotation                                        92   0.0%
zfs.ko`zio_gang_tree_free                                  93   0.0%
zfs.ko`dsl_dir_diduse_transfer_space                       93   0.0%
zfs.ko`dataset_kstats_update_read_kstats                   95   0.0%
zfs.ko`zio_nowait                                          95   0.0%
kernel`xpt_done_td                                         97   0.0%
zfs.ko`arc_read                                            98   0.0%
kernel`sigqueue_move_set                                   99   0.0%
kernel`_thread_lock                                        99   0.0%
kernel`vm_fault                                           100   0.0%
zfs.ko`list_create                                        101   0.0%
kernel`vn_io_fault                                        101   0.0%
zfs.ko`list_insert_head                                   101   0.0%
kernel`xpt_run_devq                                       103   0.0%
kernel`mpr_push_ieee_sge                                  103   0.0%
zfs.ko`avl_destroy_nodes                                  104   0.0%
kernel`lock_init                                          104   0.0%
zfs.ko`kmem_cache_free                                    104   0.0%
zfs.ko`dmu_read_uio_dbuf                                  105   0.0%
kernel`knote                                              106   0.0%
dtrace.ko`dtrace_dynvar_clean                             107   0.0%
kernel`cloneuio                                           107   0.0%
zfs.ko`arc_buf_destroy                                    107   0.0%
kernel`ahci_intr                                          108   0.0%
zfs.ko`dnode_hold_impl                                    108   0.0%
zfs.ko`__sx_xunlock                                       110   0.0%
zfs.ko`sa_attr_op                                         114   0.0%
zfs.ko`dnode_destroy                                      115   0.0%
kernel`sx_try_xlock_int                                   115   0.0%
kernel`cache_fplookup                                     115   0.0%
zfs.ko`zfs_zaccess_aces_check                             115   0.0%
kernel`foffset_unlock_uio                                 115   0.0%
kernel`nanouptime                                         116   0.0%
kernel`_sx_sunlock_int                                    118   0.0%
kernel`spinlock_enter                                     118   0.0%
zfs.ko`zfs_acl_next_ace                                   119   0.0%
zfs.ko`dbuf_whichblock                                    120   0.0%
kernel`cache_lookup                                       120   0.0%
kernel`tcp_output                                         122   0.0%
zfs.ko`dsl_dataset_block_kill                             122   0.0%
zfs.ko`dbuf_rele                                          122   0.0%
zfs.ko`avl_numnodes                                       123   0.0%
kernel`iflib_rxeof                                        123   0.0%
zfs.ko`zio_execute                                        124   0.0%
kernel`taskqueue_run_locked                               124   0.0%
kernel`vm_map_lookup_entry                                125   0.0%
kernel`maybe_yield                                        126   0.0%
zfs.ko`range_tree_add_impl                                127   0.0%
zfs.ko`kmem_cache_alloc                                   128   0.0%
zfs.ko`vdev_queue_offset_compare                          130   0.0%
kernel`vmspace_fork                                       131   0.0%
zfs.ko`metaslab_trace_init                                131   0.0%
zfs.ko`vdev_stat_update                                   135   0.0%
kernel`cpu_set_syscall_retval                             135   0.0%
zfs.ko`abd_iter_advance                                   138   0.0%
zfs.ko`dmu_read_uio_dnode                                 139   0.0%
zfs.ko`list_head                                          140   0.0%
kernel`cv_init                                            142   0.0%
kernel`ahci_ch_intr_main                                  143   0.0%
zfs.ko`spa_config_exit                                    143   0.0%
kernel`ahci_ch_intr                                       145   0.0%
zfs.ko`dbuf_destroy                                       145   0.0%
kernel`keg_alloc_slab                                     148   0.0%
zfs.ko`dbuf_compare                                       148   0.0%
kernel`sx_init_flags                                      148   0.0%
zfs.ko`arc_access                                         152   0.0%
kernel`cache_enter_time                                   158   0.0%
zfs.ko`zio_vdev_io_done                                   161   0.0%
zfs.ko`metaslab_alloc_dva                                 166   0.0%
zfs.ko`dbuf_cache_multilist_index_func                    166   0.0%
ipfw.ko`ipfw_chk                                          168   0.0%
kernel`rangelock_enqueue                                  169   0.0%
kernel`rangelock_unlock                                   169   0.0%
zfs.ko`vdev_queue_io_to_issue                             170   0.0%
kernel`vm_radix_lookup_le                                 171   0.0%
zfs.ko`metaslab_rangesize32_compare                       173   0.0%
kernel`rms_rlock                                          178   0.0%
zfs.ko`arc_change_state                                   185   0.0%
zfs.ko`dmu_buf_will_dirty_impl                            186   0.0%
kernel`vn_read                                            191   0.0%
zfs.ko`range_tree_seg32_compare                           194   0.0%
kernel`_rm_rlock                                          194   0.0%
zfs.ko`avl_remove                                         199   0.0%
kernel`_sx_slock_int                                      200   0.0%
zfs.ko`spa_config_enter                                   204   0.0%
kernel`kern_sigaction                                     205   0.0%
zfs.ko`buf_hash_find                                      220   0.0%
kernel`sys_read                                           224   0.0%
zfs.ko`list_remove                                        225   0.0%
kernel`pagecopy                                           239   0.0%
kernel`lock_mtx                                           240   0.0%
kernel`cpu_fetch_syscall_args                             244   0.0%
zfs.ko`dmu_zfetch_prepare                                 244   0.0%
zfs.ko`avl_add                                            248   0.0%
kernel`vm_map_pmap_enter                                  248   0.0%
kernel`kern_setitimer                                     249   0.0%
kernel`vm_radix_insert                                    253   0.0%
zfs.ko`arc_released                                       254   0.0%
zfs.ko`abd_iterate_func                                   255   0.0%
zfs.ko`multilist_remove                                   257   0.0%
zfs.ko`zio_remove_child                                   260   0.0%
kernel`vm_object_collapse                                 261   0.0%
zfs.ko`zfs_rangelock_enter_impl                           263   0.0%
kernel`pmap_kextract                                      264   0.0%
kernel`_sx_xlock_hard                                     265   0.0%
zfs.ko`abd_iter_map                                       269   0.0%
zfs.ko`zfs_rangelock_exit                                 272   0.0%
kernel`uiomove_faultflag                                  273   0.0%
kernel`sys_write                                          277   0.0%
zfs.ko`zrl_remove                                         282   0.0%
zfs.ko`multilist_insert                                   286   0.0%
kernel`pmap_enter                                         289   0.0%
zfs.ko`cityhash4                                          290   0.0%
zfs.ko`zio_create                                         290   0.0%
kernel`__mtx_lock_sleep                                   297   0.0%
zfs.ko`abd_copy_to_buf_off_cb                             299   0.0%
kernel`vm_page_rename                                     312   0.0%
zfs.ko`zio_add_child                                      313   0.0%
kernel`bounce_bus_dmamap_load_ma                          319   0.0%
zfs.ko`dbuf_rele_and_unlock                               323   0.0%
kernel`vm_radix_lookup                                    335   0.0%
zfs.ko`zfs_btree_find                                     342   0.0%
kernel`selfdfree                                          351   0.0%
kernel`vm_page_alloc_noobj_domain                         383   0.0%
kernel`doselwakeup                                        391   0.0%
zfs.ko`dbuf_read                                          399   0.0%
zfs.ko`zfs_read                                           399   0.0%
zfs.ko`zio_wait                                           405   0.0%
zfs.ko`zio_ready                                          414   0.0%
zfs.ko`dbuf_hold_impl                                     425   0.0%
kernel`malloc                                             426   0.0%
kernel`vm_radix_remove                                    429   0.0%
kernel`free_pv_chunk_dequeued                             442   0.0%
zfs.ko`aggsum_add                                         449   0.0%
kernel`sched_idletd                                       459   0.0%
zfs.ko`dmu_buf_hold_array_by_dnode                        501   0.0%
zfs.ko`arc_buf_access                                     506   0.0%
kernel`fget_unlocked                                      556   0.0%
kernel`selrecord                                          557   0.0%
zfs.ko`dbuf_find                                          608   0.0%
kernel`pipe_read                                          658   0.0%
zfs.ko`zrl_add_impl                                       664   0.0%
zfs.ko`lz4_decompress_zfs                                 821   0.0%
kernel`kern_select                                        863   0.0%
kernel`vm_reserv_rename                                  1041   0.0%
kernel`pagezero_erms                                     1086   0.0%
zfs.ko`zio_done                                          1097   0.0%
kernel`memmove_erms                                      1208   0.0%
kernel`cpu_search_highest                                1226   0.0%
kernel`free                                              1228   0.0%
kernel`pipe_poll                                         1254   0.0%
kernel`uma_zfree_arg                                     1353   0.0%
kernel`amd64_syscall                                     1519   0.0%
kernel`uma_zalloc_arg                                    1620   0.0%
kernel`memset_erms                                       1639   0.0%
kernel`PHYS_TO_VM_PAGE                                   1705   0.0%
kernel`pipe_write                                        2334   0.0%
kernel`copyin_nosmap_erms                                3713   0.1%
kernel`pmap_copy                                         3757   0.1%
zfs.ko`lz4_compress_zfs                                  3760   0.1%
kernel`0xffffffff81                                      4465   0.1%
kernel`copyout_nosmap_erms                               4702   0.1%
kernel`get_pv_entry                                      4740   0.1%
zfs.ko`fletcher_4_sse2_native                            5318   0.1%
kernel`memcpy_erms                                       9943   0.2%
kernel`spinlock_exit                                     9949   0.2%
kernel`lock_delay                                       25034   0.5%
kernel`hpet_get_timecount                               39640   0.8%
kernel`pmap_try_insert_pv_entry                         66611   1.4%
kernel`pmap_remove_pages                                93348   2.0%
kernel`cpu_idle                                        359224   7.6%
kernel`acpi_cpu_idle                                  4021373  85.1%
 
Nope, I'll paste them below.

The 36-HDD main server is SAS3, which is why it uses the mpr driver; it has an LSI 9308 card (I may have explained that incorrectly above). The 9207-8e cards are for the external chassis.

Another interesting tidbit: some of the drives are 12Gbps and identify as SCSI. camcontrol identify doesn't work on them; I have to use camcontrol inquiry.

Can this mix of SAS2/SAS3 controllers and 6/12Gbps drives cause the overall speed to be affected? SATA/SAS link speeds range from 1.5 to 12Gbps, but in my case I'm barely getting 1Gbps, so I'm not sure where that's coming from. In any case, I guess this proves that ZFS can work wonders when you have lots of RAM and a fast special device. I never even knew there were speed issues until I started doing this transfer.
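(If it's useful, camcontrol devlist -v shows which scbus/controller each daX sits behind, which makes it easier to tell the 6Gbps and 12Gbps drives apart.)

Code:
camcontrol devlist -v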

Code:
camcontrol identify da0
pass0: <HGST HUS726T6TALE604 VKAZW420> ACS-2 ATA SATA 3.x device
pass0: 1200.000MB/s transfers, Command Queueing Enabled

protocol              ACS-2 ATA SATA 3.x
device model          HGST HUS726T6TALE604

Code:
camcontrol inquiry da143
pass153: <HGST HUS726060AL5210 ADR2> Fixed Direct Access SPC-4 SCSI device
pass153: Serial Number         K8K7D7DN
pass153: 600.000MB/s transfers, Command Queueing Enabled

Code:
dev.mps.0.spinup_wait_time: 3
dev.mps.0.chain_alloc_fail: 0
dev.mps.0.enable_ssu: 1
dev.mps.0.max_io_pages: -1
dev.mps.0.max_chains: 16384
dev.mps.0.chain_free_lowwater: 11215
dev.mps.0.chain_free: 16384
dev.mps.0.io_cmds_highwater: 395
dev.mps.0.io_cmds_active: 0
dev.mps.0.msg_version: 2.0
dev.mps.0.driver_version: 21.02.00.00-fbsd
dev.mps.0.firmware_version: 20.00.07.00
dev.mps.0.max_evtframes: 32
dev.mps.0.max_replyframes: 2048
dev.mps.0.max_prireqframes: 128
dev.mps.0.max_reqframes: 2048
dev.mps.0.msix_msgs: 1
dev.mps.0.max_msix: 16
dev.mps.0.disable_msi: 0
dev.mps.0.disable_msix: 0
dev.mps.0.debug_level: 0x3,info,fault
dev.mps.0.%domain: 0
dev.mps.0.%parent: pci4
dev.mps.0.%pnpinfo: vendor=0x1000 device=0x0087 subvendor=0x1000 subdevice=0x3040 class=0x010700
dev.mps.0.%location: slot=0 function=0 dbsf=pci0:4:0:0
dev.mps.0.%driver: mps
dev.mps.0.%desc: Avago Technologies (LSI) SAS2308
dev.mps.%parent:

Code:
dev.mps.1.spinup_wait_time: 3
dev.mps.1.chain_alloc_fail: 0
dev.mps.1.enable_ssu: 1
dev.mps.1.max_io_pages: -1
dev.mps.1.max_chains: 16384
dev.mps.1.chain_free_lowwater: 9111
dev.mps.1.chain_free: 16384
dev.mps.1.io_cmds_highwater: 798
dev.mps.1.io_cmds_active: 0
dev.mps.1.msg_version: 2.0
dev.mps.1.driver_version: 21.02.00.00-fbsd
dev.mps.1.firmware_version: 20.00.07.00
dev.mps.1.max_evtframes: 32
dev.mps.1.max_replyframes: 2048
dev.mps.1.max_prireqframes: 128
dev.mps.1.max_reqframes: 2048
dev.mps.1.msix_msgs: 1
dev.mps.1.max_msix: 16
dev.mps.1.disable_msi: 0
dev.mps.1.disable_msix: 0
dev.mps.1.debug_level: 0x3,info,fault
dev.mps.1.%domain: 1
dev.mps.1.%parent: pci14
dev.mps.1.%pnpinfo: vendor=0x1000 device=0x0087 subvendor=0x1000 subdevice=0x3040 class=0x010700
dev.mps.1.%location: slot=0 function=0 dbsf=pci0:134:0:0
dev.mps.1.%driver: mps
dev.mps.1.%desc: Avago Technologies (LSI) SAS2308
dev.mps.0.use_phy_num: 1
dev.mps.0.dump_reqs_alltypes: 0
dev.mps.0.encl_table_dump:

Code:
dev.mpr.1.prp_page_alloc_fail: 0
dev.mpr.1.prp_pages_free_lowwater: 0
dev.mpr.1.prp_pages_free: 0
dev.mpr.1.use_phy_num: 1
dev.mpr.1.dump_reqs_alltypes: 0
dev.mpr.1.spinup_wait_time: 3
dev.mpr.1.chain_alloc_fail: 0
dev.mpr.1.enable_ssu: 1
dev.mpr.1.max_io_pages: -1
dev.mpr.1.max_chains: 16384
dev.mpr.1.chain_free_lowwater: 16381
dev.mpr.1.chain_free: 16384
dev.mpr.1.io_cmds_highwater: 66
dev.mpr.1.io_cmds_active: 0
dev.mpr.1.msg_version: 2.5
dev.mpr.1.driver_version: 23.00.00.00-fbsd
dev.mpr.1.firmware_version: 16.00.01.00
dev.mpr.1.max_evtframes: 32
dev.mpr.1.max_replyframes: 2048
dev.mpr.1.max_prireqframes: 128
dev.mpr.1.max_reqframes: 2048
dev.mpr.1.msix_msgs: 1
dev.mpr.1.max_msix: 96
dev.mpr.1.disable_msix: 0
dev.mpr.1.debug_level: 0x3,info,fault
dev.mpr.1.%domain: 0
dev.mpr.1.%parent: pci3
dev.mpr.1.%pnpinfo: vendor=0x1000 device=0x0097 subvendor=0x1000 subdevice=0x30e0 class=0x010700
dev.mpr.1.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mpr.1.%driver: mpr
dev.mpr.1.%desc: Avago Technologies (LSI) SAS3008
dev.mpr.0.prp_page_alloc_fail: 0
dev.mpr.0.prp_pages_free_lowwater: 0
dev.mpr.0.prp_pages_free: 0
dev.mpr.0.use_phy_num: 1
dev.mpr.0.dump_reqs_alltypes: 0
dev.mpr.0.spinup_wait_time: 3
dev.mpr.0.chain_alloc_fail: 0
dev.mpr.0.enable_ssu: 1
dev.mpr.0.max_io_pages: -1
dev.mpr.0.max_chains: 16384
dev.mpr.0.chain_free_lowwater: 10454
dev.mpr.0.chain_free: 16384
dev.mpr.0.io_cmds_highwater: 597
dev.mpr.0.io_cmds_active: 0
dev.mpr.0.msg_version: 2.5
dev.mpr.0.driver_version: 23.00.00.00-fbsd
dev.mpr.0.firmware_version: 16.00.01.00
dev.mpr.0.max_evtframes: 32
dev.mpr.0.max_replyframes: 2048
dev.mpr.0.max_prireqframes: 128
dev.mpr.0.max_reqframes: 2048
dev.mpr.0.msix_msgs: 1
dev.mpr.0.max_msix: 96
dev.mpr.0.disable_msix: 0
dev.mpr.0.debug_level: 0x3,info,fault
dev.mpr.0.%domain: 0
dev.mpr.0.%parent: pci2
dev.mpr.0.%pnpinfo: vendor=0x1000 device=0x0097 subvendor=0x1000 subdevice=0x30e0 class=0x010700
dev.mpr.0.%location: slot=0 function=0 dbsf=pci0:2:0:0
dev.mpr.0.%driver: mpr
dev.mpr.0.%desc: Avago Technologies (LSI) SAS3008
dev.mpr.%parent:
 
That's one of the reasons I'm moving data off this server. It doesn't have enough SSD slots to allow for mirroring of the special device.
I appreciate that you understand the issue... the ongoing viability of your entire 250TB service literally hangs on the survival of a single SSD... and we know that SSDs fail, often without warning...

If that were my beast to tend, I'd be addressing the single point of failure of the special device by:
  1. actively managing upwards to share responsibility for the ongoing risk of 100% permanent data loss;
  2. being too scared to try to remove the special device, even if I could, because performance might fall off a cliff;
  3. looking at the SMART data every day for any indication that the SSD might have any sort of problem; and
  4. looking for any way to create a (preferably triple) mirror for the special device, with the required SSDs spread over separate controllers (a sketch of the attach command is below).
I'd audit all slots, and consider all options to attach another SSD. I would sacrifice spares and use carrier(s) to provision the SSDs you need to mirror the special device. I'd then manage any loss of spares separately.
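If a slot can be found, attaching a second SSD to the existing special device turns it into a mirror in place. The device names below are purely illustrative, so check zpool status for the actual special vdev member first:

Code:
zpool status tank                 # identify the current special device, e.g. nvd0p1
zpool attach tank nvd0p1 nvd1p1   # nvd1p1 being the hypothetical new SSD (or partition)
zpool status tank                 # the special vdev should now show as a mirror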

I see that your root is also not mirrored. That's far from ideal on such a significant system, but a loss would probably be recoverable, with an outage (and accessible backups). Prudence suggests that the root backup and recovery plan should be kept under review. Mirror the root if you can, but mirroring the special device is infinitely more important.
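If you do get a chance to mirror the root, the usual GPT recipe is roughly the following; disk names and partition indices are assumptions, so check gpart show for your actual layout (and copy the ESP instead of the gptzfsboot step on a UEFI system):

Code:
gpart backup da0 | gpart restore -F da1                     # clone the partition layout onto the new disk
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1   # legacy BIOS boot blocks
zpool attach zroot da0p3 da1p3                              # attach the matching freebsd-zfs partition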
 