ZFS SCSI Reservation Conflict Error

Hey All,

Almost all the disks have reported a 'Reservation Conflict' error, as shown below. I can see these errors in the system log, and then the machine rebooted automatically. I'm not sure what caused the SCSI error. The machine is built with FreeBSD 12.0; I know that version is not supported, but I need to know what is causing the error. Experts, please shed some light on how to fix it. I would be grateful.

The pool is built from mirrors of 2TB disks, with multipath configured. Disk health is fine, checked with smartctl.
Code:
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): CAM status: SCSI Status Error
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): SCSI status: Reservation Conflict
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): Error 5, Unretryable error
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): WRITE(10). CDB: 2a 00 d2 a1 86 10 00 00 08 00
Boot time: Wed Mar 17 16:07 (taken from the output of the 'last' command).

From the devd log, I could see many 'Freezing devq' messages like this one:

Mar 17 16:59:19 DHAGKTMNODE2 kernel: mpr0: mprsas_action_scsiio: Freezing devq for target ID 202

Around the time of the SCSI errors, I could see that ZFS was doing sync operations:
Code:
Mar 17 16:50:24 DHAGKTMNODE2 kernel: sync for DH_Storage_Pool2 took 20303ms, txg:11062506 passes: 20096, pass1: 19412, vdev config update: 65, sync done: 3
Mar 17 16:53:47 DHAGKTMNODE2 kernel: sync for DH_Storage_Pool2 took 29279ms, txg:11062553 passes: 20632, pass1: 19814, vdev config update: 70, sync done: 2
Mar 17 16:54:20 DHAGKTMNODE2 kernel: sync for DH_Storage_Pool2 took 22749ms, txg:11062561 passes: 22556, pass1: 19681, vdev config update: 123, sync done: 1
Mar 17 16:59:13 DHAGKTMNODE2 kernel: sync for DH_Storage_Pool2 took 15765ms, txg:11062681 passes: 15563, pass1: 15069, vdev config update: 109, sync done: 4
 
Adding another set of errors from the devd log.

Code:
Wed Mar 17 15:59:45 2021 : !system=ZFS subsystem=ZFS type=ereport.fs.zfs.probe_failure  class=ereport.fs.zfs.probe_failure ena=15043313915320608769 pool=DH_Storage_Pool2 pool_guid=4932231606721651745 pool_context=0 pool_failmode=continue vdev_guid=14843815605522272636 vdev_type=disk vdev_path=/dev/multipath/37674-xjW0 parent_guid=17651707294085808965 parent_type=mirror prev_state=0
Wed Mar 17 15:59:46 2021 : !system=ZFS subsystem=ZFS type=ereport.fs.zfs.probe_failure  class=ereport.fs.zfs.probe_failure ena=15043953545794551809 pool=DH_Storage_Pool2 pool_guid=4932231606721651745 pool_context=0 pool_failmode=continue vdev_guid=14204739948752811285 vdev_type=disk vdev_path=/dev/multipath/37672-xjW0 parent_guid=2016141952207378883 parent_type=mirror prev_state=0
Wed Mar 17 15:59:46 2021 : !system=ZFS subsystem=ZFS type=ereport.fs.zfs.probe_failure  class=ereport.fs.zfs.probe_failure ena=15044249244197200897 pool=DH_Storage_Pool2 pool_guid=4932231606721651745 pool_context=0 pool_failmode=continue vdev_guid=7398236981593665038 vdev_type=disk vdev_path=/dev/multipath/37666-xjW0 parent_guid=2476058577896223538 parent_type=mirror prev_state=0
Wed Mar 17 15:59:46 2021 : !system=ZFS subsystem=ZFS type=ereport.fs.zfs.probe_failure  class=ereport.fs.zfs.probe_failure ena=15044512437053175809 pool=DH_Storage_Pool2 pool_guid=4932231606721651745 pool_context=0 pool_failmode=continue vdev_guid=2908492550656596063 vdev_type=disk vdev_path=/dev/multipath/37683-xjW0 parent_guid=4124375689250620131 parent_type=mirror prev_state=0
Wed Mar 17 15:59:47 2021 : !system=ZFS subsystem=ZFS type=ereport.fs.zfs.probe_failure  class=ereport.fs.zfs.probe_failure ena=15044998976159881217 pool=DH_Storage_Pool2 pool_guid=4932231606721651745 pool_context=0 pool_failmode=continue vdev_guid=10918247806717777362 vdev_type=disk vdev_path=/dev/multipath/37661-xjW0 parent_guid=2842548206687464147 parent_type=mirror prev_state=0
 
My first question was going to be: Who the heck is using SCSI reservations? That's very unusual, and setting up and using SCSI reservations is a minefield. Then I saw one key word in your post: "multipath". You have some sort of multipath configured! And clearly whatever multipath you are using is either broken or misconfigured.

Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): CAM status: SCSI Status Error
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): SCSI status: Reservation Conflict
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): Error 5, Unretryable error
Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): WRITE(10). CDB: 2a 00 d2 a1 86 10 00 00 08 00
That is perfectly logical. All it says is that this host tried to write to the disk (it issued the command 2A, which is a 10-byte write command), and the disk replied with "Reservation Conflict", meaning another I_T nexus (probably the other multipath port?) has the disk reserved.
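
To make the CDB reading above concrete, here is a small Python sketch that splits the 10 bytes from the log line into fields. It assumes the standard WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control). The function name decode_write10 and the dictionary keys are made up for illustration; they do not come from any FreeBSD tool.

```python
import struct


def decode_write10(cdb_hex: str) -> dict:
    """Split a WRITE(10) CDB, given as space-separated hex bytes, into fields."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("a WRITE(10) CDB is exactly 10 bytes")
    # ">" = big-endian, no padding; B = 1 byte, I = 4 bytes, H = 2 bytes
    opcode, flags, lba, group, blocks, control = struct.unpack(">BBIBHB", cdb)
    if opcode != 0x2A:
        raise ValueError(f"opcode {opcode:#04x} is not WRITE(10)")
    return {
        "opcode": opcode,
        "flags": flags,
        "lba": lba,                 # starting logical block address
        "group": group,
        "transfer_blocks": blocks,  # number of logical blocks to write
        "control": control,
    }


if __name__ == "__main__":
    # CDB copied from the kernel log line in the post.
    fields = decode_write10("2a 00 d2 a1 86 10 00 00 08 00")
    print(f"LBA {fields['lba']:#x}, {fields['transfer_blocks']} blocks")
```

So the rejected command was an ordinary 8-block write starting at LBA 0xd2a18610. Nothing in the CDB itself is unusual, which fits the reading that the problem is the reservation held on the disk, not the I/O that was attempted.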

Debug your multipath configuration.
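
As a starting point for that debugging, a rough Python sketch like the one below can tally which devices and targets are reporting the conflict, so you can see whether it follows one path or hits every disk. It only assumes the log line format shown in the post; the function name count_reservation_conflicts is hypothetical, and the log file location in the usage note is an assumption about your syslog setup.

```python
import re
from collections import Counter

# Matches the CAM device prefix in the kernel log, e.g. "(da21:mpr0:0:76:0):"
PERIPH_RE = re.compile(
    r"\((?P<dev>\w+):(?P<hba>\w+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\)"
)


def count_reservation_conflicts(lines):
    """Count 'Reservation Conflict' log lines per (device, HBA, target)."""
    counts = Counter()
    for line in lines:
        if "SCSI status: Reservation Conflict" not in line:
            continue
        m = PERIPH_RE.search(line)
        if m:
            counts[(m["dev"], m["hba"], int(m["target"]))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the post, used here as sample input.
    sample = [
        "Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): CAM status: SCSI Status Error",
        "Mar 17 15:59:52 DHAGKTMNODE2 kernel: (da21:mpr0:0:76:0): SCSI status: Reservation Conflict",
    ]
    for key, n in count_reservation_conflicts(sample).items():
        print(key, n)
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/log/messages") if that is where your kernel messages go. If every conflict comes from the da devices behind one port, that points at the path setup; if both paths of the same disk report it, look for something else holding the reservation.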
 