Dear readers,
I got another crash with the mpslsi driver.
NFS services were disrupted this time, so people noticed, whatever the underlying failure was. In previous crashes, where one disk was lost, there was no disruption of service unless other issues were combined with it, such as:
- doing [cmd=]ls /some/nfs/mount/.zfs/snapshot[/cmd], which would hang whichever NFS-mounted file system was being listed.
- the [cmd=]gpart recover ...[/cmd] causing the panic mentioned above.
This time someone else ended up rebooting it, so I don't know whether it lost a disk or not. The root system was still online, though, so someone could log in and run [cmd=]shutdown -r now[/cmd]. When the system came back up, zfs had 1 checksum error on a root disk, just like after the other crashes.
Note also that this was the third crash, and all three happened on a Friday night. So on Monday, I'll try to figure out what it is about Fridays.
@Sebulon
Maybe you can tell me more about what periodic was doing in your case, and what I should change to fix it like you did. I already added
Code:
daily_status_security_chksetuid_enable="NO"
to my /etc/periodic.conf long before the crash, but I didn't reboot or restart periodic or anything. Is a restart needed?
Unlike the last time with mpslsi, there is no line like this (which was the last line, on a second server, after which things seemed fine):
Code:
Nov 3 09:17:17 bcnas1bak kernel: (da0:mpslsi0:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Here is a bunch of the log, including the shutdown and the annoying "shutdown terminated abnormally":
Code:
Nov 18 16:58:46 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f669c40
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a1 2a 0 1 0 0 length 131072 SMID 888 command timeout cm
0xffffff800f669c40 ccb 0xffffff0037d1e000
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_alloc_tm freezing simq
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: timedout cm 0xffffff800f669c40 allocated tm 0xffffff800f6340f8
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f668218
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a2 2a 0 1 0 0 length 131072 SMID 861 command timeout cm
0xffffff800f668218 ccb 0xffffff03d58a9800
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: queued timedout cm 0xffffff800f668218 for processing by tm 0xffffff800f6340f8
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f670d98
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a3 2a 0 1 0 0 length 131072 SMID 1005 command timeout cm
0xffffff800f670d98 ccb 0xffffff0026b7a800
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: queued timedout cm 0xffffff800f670d98 for processing by tm 0xffffff800f6340f8
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f6457f8
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a4 2a 0 1 0 0 length 131072 SMID 289 command timeout cm
0xffffff800f6457f8 ccb 0xffffff003d4da000
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: queued timedout cm 0xffffff800f6457f8 for processing by tm 0xffffff800f6340f8
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f662ae8
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 771 command timeout cm
0xffffff800f662ae8 ccb 0xffffff0031968800
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: queued timedout cm 0xffffff800f662ae8 for processing by tm 0xffffff800f6340f8
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 2 41 a5 bf 0 0 18 0 length 12288 SMID 345 completed cm
0xffffff800f648e38 ccb 0xffffff03d5887000 during recovery ioc 804b scsi 0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 2 41 a5 bf 0 0 18 0 length 12288 SMID 345 terminated ioc 804b scsi
0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 771 completed timedout cm
0xffffff800f662ae8 ccb 0xffffff0031968800 during recovery ioc 804b scsi 0 state(da10:mpslsi0:0:21:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0
0 0 0 0 length 0 SMID 771 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a1 2a 0 1 0 0 length 131072 SMID 888 completed timedout cm
0xffffff800f669c40 ccb 0xffffff0037d1e000 during recovery ioc 8048 scsi 0 state c (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a3 2a 0 1
0 0 length 131072 SMID 1005 completed timedout cm 0xffffff800f670d98 ccb 0xffffff0026b7a800 during recovery ioc 804b scsi 0 state
c(da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a3 2a 0 1 0 0 length 131072 SMID 1005 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a2 2a 0 1 0 0 length 131072 SMID 861 completed timedout cm
0xffffff800f668218 ccb 0xffffff03d58a9800 during recovery ioc 804b scsi 0 state c (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a2 2a 0 1
0 0 length 131072 SMID 861 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a4 2a 0 1 0 0 length 131072 SMID 289 completed timedout cm
0xffffff800f6457f8 ccb 0xffffff003d4da000 during recovery ioc 804b scsi 0 state c (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 13 32 a4 2a 0 1
0 0 length 131072 SMID 289 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:10:42 bcnas1 kernel: (noperiph:mpslsi0:0:21:0): SMID 1 abort TaskMID 888 status 0x4a code 0x0 count 6
Nov 18 17:10:42 bcnas1 kernel: (noperiph:mpslsi0:0:21:0): SMID 1 finished recovery after aborting TaskMID 888
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_free_tm releasing simq
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_scsiio_timeout checking sc 0xffffff800f629000 cm 0xffffff800f6513e8
Nov 18 17:10:42 bcnas1 kernel: (da10:mpslsi0:0:21:0): READ(10). CDB: 28 0 2 40 3b 14 0 0 21 0 length 16896 SMID 483 command timeout cm
0xffffff800f6513e8 ccb 0xffffff0026bd2000
Nov 18 17:10:42 bcnas1 kernel: mpslsi0: mpssas_alloc_tm freezing simq
...
Nov 18 17:39:46 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 79 0 0 0 88 0 length 69632 SMID 252 completed timedout cm
0xffffff800f643420 ccb 0xffffff05b6b56800 during recovery ioc 804b scsi 0 state c xf(da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 79 0 0 0
88 0 length 69632 SMID 252 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:39:46 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 7a 0 0 0 88 0 length 69632 SMID 159 completed timedout cm
0xffffff800f63da08 ccb 0xffffff05ada40000 during recovery ioc 804b scsi 0 state c xf(da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 7a 0 0 0
88 0 length 69632 SMID 159 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:39:46 bcnas1 kernel: (da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 7b 0 0 0 88 0 length 69632 SMID 572 completed timedout cm
0xffffff800f656a20 ccb 0xffffff0026be3000 during recovery ioc 804b scsi 0 state c xf(da10:mpslsi0:0:21:0): WRITE(10). CDB: 2a 0 a 12 7b 0 0 0
88 0 length 69632 SMID 572 terminated ioc 804b scsi 0 state c xfer 0
Nov 18 17:39:46 bcnas1 kernel: (noperiph:mpslsi0:0:21:0): SMID 39 abort TaskMID 304 status 0x4a code 0x0 count 32
Nov 18 17:39:46 bcnas1 kernel: (noperiph:mpslsi0:0:21:0): SMID 39 finished recovery after aborting TaskMID 304
Nov 18 17:39:46 bcnas1 kernel: mpslsi0: mpssas_free_tm releasing simq
Nov 18 17:39:56 bcnas1 shutdown: reboot by uwe:
Nov 18 17:39:58 bcnas1 ntpd[68834]: ntpd exiting on signal 15
Nov 18 17:40:28 bcnas1 rc.shutdown: 30 second watchdog timeout expired. Shutdown terminated.
Nov 18 17:40:28 bcnas1 init: /bin/sh on /etc/rc.shutdown terminated abnormally, going to single user mode
Nov 18 17:40:28 bcnas1 syslogd: exiting on signal 15
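Since the full log is long, a tally of the timeouts per disk and SCSI command may be easier to compare between crashes. The following is only a rough sketch: the regular expression is based on the "command timeout" lines above, and the name tally_timeouts is made up for illustration. It would need adjusting if the driver's message format differs.

```python
import re
from collections import Counter

# Pattern modelled on the lines above, e.g.
# "(da10:mpslsi0:0:21:0): WRITE(10). CDB: ... SMID 888 command timeout cm"
# It deliberately ignores the later "completed timedout" recovery lines.
TIMEOUT_RE = re.compile(
    r"\((?P<dev>da\d+):mpslsi\d+:[\d:]+\): "
    r"(?P<cmd>[A-Z ]+\(\d+\))\. CDB: .* SMID \d+ command timeout"
)


def tally_timeouts(lines):
    """Count 'command timeout' events per (device, SCSI command)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("cmd"))] += 1
    return counts
```

It could be fed the system log, for example with tally_timeouts(open("/var/log/messages")), and the resulting counts printed per disk. If the same disk (da10 here) dominates in every crash, that would point at the disk or its slot rather than at the driver.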
Peter