Since upgrading from FreeBSD 11.2 to 12 I am getting loads of ZFS failures!?

Code:
mfisyspd2: 2861588MB (5860533168 sectors) SYSPD volume (deviceid: 11)
mfisyspd2:  SYSPD volume attached
mfisyspd3 on mfi0
mfisyspd3: 9537536MB (19532873728 sectors) SYSPD volume (deviceid: 12)
mfisyspd3:  SYSPD volume attached
mfisyspd4 on mfi0
mfisyspd4: 9537536MB (19532873728 sectors) SYSPD volume (deviceid: 13)
mfisyspd4:  SYSPD volume attached
mfisyspd5 on mfi0
mfisyspd5: 9537536MB (19532873728 sectors) SYSPD volume (deviceid: 14)
mfisyspd5:  SYSPD volume attached
mfisyspd6 on mfi0
mfisyspd6: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 15)
mfisyspd6:  SYSPD volume attached
mfisyspd7 on mfi0
mfisyspd7: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 16)
mfisyspd7:  SYSPD volume attached
mfi0: 2721 (boot + 27s/0x0002/info) - Inserted: PD 09(e0x3e/s2) Info: enclPd=3e, scsiType=0, portMap=06, sasAddr=4433221105000000,0000000000000000
ipmi0: mfi0: 2722 (boot + 27s/0x0002/info) - Inserted: PD 0a(e0x3e/s0)
IPMI device rev. 1, firmware rev. 3.27, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
mfi0: 2723 (boot + 27s/0x0002/info) - Inserted: PD 0a(e0x3e/s0) Info: enclPd=3e, scsiType=0, portMap=05, sasAddr=4433221106000000,00ipmi0: 00000000000000
Attached watchdog
ipmi0: Establishing power cycle handler
mfi0: 2724 (boot + 27s/0x0002/info) - Inserted: PD 0b(e0x3e/s1)
mfi0: 2725 (boot + 27s/0x0002/info) - Inserted: PD 0b(e0x3e/s1) Info: enclPd=3e, scsiType=0, portMap=07, sasAddr=4433221107000000,0000000000000000
mfi0: 2726 (boot + 27s/0x0002/info) - Inserted: PD 0c(e0x3e/s7)
mfi0: 2727 (boot + 27s/0x0002/info) - Inserted: PD 0c(e0x3e/s7) Info: enclPd=3e, scsiType=0, portMap=00, sasAddr=4433221100000000,0000000000000000
mfi0: 2728 (boot + 27s/0x0002/info) - Inserted: PD 0d(e0x3e/s3)
mfi0: 2729 (boot + 27s/0x0002/info) - Inserted: PD 0d(e0x3e/s3) Info: enclPd=3e, scsiType=0, portMap=03, sasAddr=4433221104000000,0000000000000000
mfi0: 2730 (boot + 27s/0x0002/info) - Inserted: PD 0e(e0x3e/s6)
mfi0: 2731 (boot + 27s/0x0002/info) - Inserted: PD 0e(e0x3e/s6) Info: enclPd=3e, scsiType=0, portMap=04, sasAddr=4433221101000000,0000000000000000
mfi0: 2732 (boot + 27s/0x0002/info) - Inserted: PD 0f(e0x3e/s4)
Trying to mount root from zfs:zroot/ROOT/default []...
mfi0: 2733 (boot + 27s/0x0002/info) - Inserted: PD 0f(e0x3e/s4) Info: enclPd=3e, scsiType=0, portMap=01, sasAddr=4433221102000000,0000000000000000
mfi0: 2734 (boot + 27s/0x0002/info) - Inserted: PD 10(e0x3e/s5)
mfi0: 2735 (boot + 27s/0x0002/info) - Inserted: PD 10(e0x3e/s5) Info: enclPd=3e, scsiType=0, portMap=02, sasAddr=4433221103000000,0000000000000000
mfi0: 2736 (boot + 28s/0x0020/info) - Controller operating temperature within normal range, full operation restored
mfi0: 2737 (602080052s/0x0020/info) - Time established as 01/29/19 12:27:32; (28 seconds since power on)
 
There were no errors before. Today I upgraded to FreeBSD 12 and ran zfs upgrade -a, and now it's nothing but errors!
Code:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 3 days 12:43:04 with 0 errors on Wed Jan 23 16:42:31 2019
config:

    NAME           STATE     READ WRITE CKSUM
    tank           ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        gpt/tank0  ONLINE       0     0    12
        gpt/tank1  ONLINE       0     0     6
        gpt/tank2  ONLINE       0     0     3
    logs
      mirror-1     ONLINE       0     0     0
        gpt/log0   ONLINE       0     0     0
        gpt/log1   ONLINE       0     0     0
    cache
      gpt/cache0   ONLINE       0     0     0
      gpt/cache1   ONLINE       0     0     0

errors: No known data errors
 
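If the checksum errors turn out to be transient (bad cabling or a flaky controller rather than dying disks), the counters can be reset as the status output suggests. Roughly, assuming the pool name tank from above:

Code:
# clear the READ/WRITE/CKSUM counters on every vdev in the pool
zpool clear tank
# scrub again and watch whether the CKSUM column starts climbing again
zpool scrub tank
zpool status tank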
Something is off. I started a scrub, and look at that: 2.30G/s from three SATA drives!?

Code:
zpool status tank
  pool: tank
state: ONLINE
  scan: scrub in progress since Tue Jan 29 14:34:57 2019
    106G scanned at 2.30G/s, 1.11M issued at 24.8K/s, 3.28T total
    0 repaired, 0.00% done, no estimated completion time
 
Code:
mfi0@pci0:2:0:0:    class=0x010400 card=0x04561014 chip=0x005f1000 rev=0x02 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'MegaRAID SAS-3 3008 [Fury]'
    class      = mass storage
    subclass   = RAID

Code:
mfisyspd2: hard error cmd=read 4845273709-4845274260
mfi0: I/O error, cmd=0xfffffe00042b47f8, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd2: hard error cmd=read 4845274261-4845274812
mfi0: I/O error, cmd=0xfffffe00042b7c08, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd2: hard error cmd=read 4845274813-4845275364
done
Waiting (max 60 seconds) for system thread `bufspacedaemon-4' to stop... mfi0: I/O error, cmd=0xfffffe00042b4000, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd3: hard error cmd=read 15841268517-15841269068
mfi0: I/O error, cmd=0xfffffe00042b6640, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd3: hard error cmd=read 15841269069-15841269604
mfi0: I/O error, cmd=0xfffffe00042b4a18, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd3: hard error cmd=read 15841271349-15841271900
done
Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... mfi0: I/O error, cmd=0xfffffe00042b42a8, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd4: hard error cmd=read 19230759640-19230760191
mfi0: I/O error, cmd=0xfffffe00042b4dd0, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd5: hard error cmd=read 19231135032-19231135543
mfi0: I/O error, cmd=0xfffffe00042b8158, status=0x3c, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd3: hard error cmd=read 15841313037-15841313588
 
At boot I see

Code:
zfs: i/o error - all block copies unavailable

I did

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
 
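To double-check that -i 1 really points at the freebsd-boot partition (index 1 is the usual zroot layout, but it is worth verifying before rewriting bootcode), the partition tables can be listed first:

Code:
# confirm which partition index holds freebsd-boot on each disk
gpart show ada0
gpart show ada1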
I have a similar problem. One month after upgrading from 11.2 to 12.0 (SuperMicro, two CPUs, 128GB RAM), four of six disks started failing.
There are three mirrored ZFS pools. I couldn't reboot the machine (trap 12 after reboot). I swapped the AVAGO 3108 controller (all disks JBOD) for another one and scrubbed all pools after boot. There were many errors on all three pools, all repaired without any issue. After one day of running with the changed controller, there are no problems.
I'm not sure whether the controller is bad. Tests will follow.

Jiri

Code:
mfisyspd4: hard error cmd=read 4970160086-4970160637
mfi0: I/O error, cmd=0xfffffe00f96102a8, status=0x3c, scsi_status=0
mfi0: sense error 59, sense_key 4, asc 0, ascq 0
mfisyspd4: hard error cmd=read 4970160638-4970161189
mfi0: I/O error, cmd=0xfffffe00f96128e8, status=0x3c, scsi_status=0
mfi0: sense error 65, sense_key 1, asc 77, ascq 133
mfisyspd4: hard error cmd=write 2416654514-2416655065
mfi0: I/O error, cmd=0xfffffe00f9610880, status=0x3c, scsi_status=0
mfi0: sense error 72, sense_key 2, asc 102, ascq 65
mfisyspd4: hard error cmd=write 2416655066-2416655617
mfi0: I/O error, cmd=0xfffffe00f9613d18, status=0x3c, scsi_status=0
mfi0: sense error 10, sense_key 4, asc 0, ascq 0
mfisyspd3: hard error cmd=write 3780939470-3780940021
mfi0: I/O error, cmd=0xfffffe00f9613058, status=0x3c, scsi_status=0
mfi0: sense error 65, sense_key 1, asc 233, ascq 73
mfisyspd3: hard error cmd=write 3605797064-3605797615
mfi0: I/O error, cmd=0xfffffe00f9610dd0, status=0x3c, scsi_status=0
mfi0: sense error 64, sense_key 9, asc 0, ascq 0
mfisyspd4: hard error cmd=write 2624874302-2624874853
mfi0: I/O error, cmd=0xfffffe00f96103b8, status=0x3c, scsi_status=0
mfi0: sense error 11, sense_key 8, asc 72, ascq 59
mfisyspd2: hard error cmd=write 3780940726-3780941277
mfi0: I/O error, cmd=0xfffffe00f9612530, status=0x3c, scsi_status=0
mfi0: sense error 71, sense_key 6, asc 86, ascq 72
mfisyspd2: hard error cmd=write 3780941278-3780941829
mfi0: I/O error, cmd=0xfffffe00f9613b80, status=0x3c, scsi_status=0
mfi0: sense error 1, sense_key 7, asc 0, ascq 0
mfisyspd2: hard error cmd=write 3605802140-3605802691
mfi0: I/O error, cmd=0xfffffe00f96135a8, status=0x3c, scsi_status=0
mfi0: sense error 3, sense_key 4, asc 52, ascq 64
mfisyspd2: hard error cmd=write 3780942845-3780943396
mfi0: I/O error, cmd=0xfffffe00f9611430, status=0x3c, scsi_status=0
mfi0: sense error 73, sense_key 11, asc 0, ascq 0
mfisyspd2: hard error cmd=write 3780943660-3780944211
mfi0: I/O error, cmd=0xfffffe00f9611dc0, status=0x3c, scsi_status=0
mfi0: sense error 88, sense_key 11, asc 192, ascq 2
mfisyspd3: hard error cmd=write 3780943660-3780944211
 
Both of these are not ZFS problems, but problems in the underlying communication between the computer (more accurately: the SCSI/SAS HBA) and the disks. In both cases you are using the mfi(4) driver for an LSI controller; you might want to try switching to the mrsas(4) driver. If you search this forum there is lots of discussion about using mfi versus mrsas, though I don't remember the details.

In Ofloo's case the I/O problem is strange: the SCSI stack does not report an actual error (ASC/ASCQ are both zero), and I don't know what status=0x3c means. In Jiri's case it's even stranger: while some ASC/ASCQ values are also zero, there is a huge variety of crazy sense codes and keys. I don't think these are real I/O errors, but some sort of communication problem.
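A quick way to confirm which driver has claimed the controller (matching the pciconf output earlier in the thread) is, for example:

Code:
# mfi0@pci0:... means mfi(4) owns the card; after switching it should attach as mrsas0
pciconf -l | grep -E '^(mfi|mrsas)'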
 
Add hw.mfi.mrsas_enable=1 to /boot/loader.conf, reboot, and see if it gets better. One of my pools got destroyed because of this, because after the upgrade a scrub somehow started up.
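For reference, a minimal sketch of the change (note that under mrsas(4) the disks reappear as da(4) devices, so anything that references mfisyspd or mfid device nodes has to be updated):

Code:
# /boot/loader.conf
# let mrsas(4) claim MegaRAID controllers that both drivers support
hw.mfi.mrsas_enable=1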
 
As I expected, the old AVAGO 3108 controller tested fine without any problems; the errors were probably produced by driver/firmware issues. I also tried setting hw.mfi.mrsas_enable=1 (and changed swap in fstab from mfisyspd0p3/mfisyspd1p3 to da0p3 and da1p3); after reboot all ZFS pools are OK without any issue. Now running on the mrsas driver.
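For anyone making the same switch, the swap change described above looks roughly like this in /etc/fstab (the partition indexes are from this particular system and may differ):

Code:
# swap moves from mfisyspd*p3 to da*p3 once mrsas(4) owns the controller
/dev/da0p3    none    swap    sw    0    0
/dev/da1p3    none    swap    sw    0    0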
 
I had the exact same problem; because of the driver issues some pools got damaged, but all is restored now.
 