ZFS Root mount waiting for CAM

Hi,
My server has problems booting:
Code:
Root mount waiting for: CAM
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Root mount waiting for: CAM
ahcich7: Poll timeout on slot 1 port 0
ahcich7: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 80 serr 00000000 cmd 0004c117
(aprobe2:ahcich7:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe2:ahcich7:0:0:0): CAM status: Command timeout
(aprobe2:ahcich7:0:0:0): Error 5, Retries exhausted

It does eventually boot, though. I can't find which device is attached to ahcich7:
Code:
[root@numenor ~]# grep ahcich7 /var/log/dmesg.boot-20231030
ahcich7: <AHCI channel> at channel 1 on ahci2
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Poll timeout on slot 1 port 0
ahcich7: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 80 serr 00000000 cmd 0004c117
(aprobe2:ahcich7:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe2:ahcich7:0:0:0): CAM status: Command timeout
(aprobe2:ahcich7:0:0:0): Error 5, Retries exhausted
Any idea what is happening? All pools are healthy, anyway.
Thanks,
Regards,
Xavier
 
Thanks, SirDice, but how do I identify the faulty drive?
Code:
[root@numenor ~]# for I in $(seq 1 5) ; do smartctl -H /dev/ada$I; done
smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-STABLE amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
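The loop above only reports the overall health verdict, and seq 1 5 starts at ada1, so ada0 (the log and cache device of zroot) is never queried. For matching device names to physical drives, a per-disk model and serial listing is more useful. A minimal sketch, assuming FreeBSD's kern.disks sysctl and the "Device Model:" / "Serial Number:" lines that smartctl -i prints for ATA drives; list_disk_ids is a hypothetical helper name:

```shell
#!/bin/sh
# Print "device<TAB>model<TAB>serial" for every disk the kernel currently sees,
# so each entry can be matched against the label on the physical drive.
# Assumes FreeBSD (kern.disks sysctl) and smartmontools' smartctl.
list_disk_ids() {
    for d in $(sysctl -n kern.disks); do
        # smartctl -i prints identity lines such as "Device Model:" and "Serial Number:"
        info=$(smartctl -i "/dev/$d" 2>/dev/null)
        model=$(printf '%s\n' "$info" | sed -n 's/^Device Model:[[:space:]]*//p')
        serial=$(printf '%s\n' "$info" | sed -n 's/^Serial Number:[[:space:]]*//p')
        # "?" means smartctl returned no ATA identity for this device
        printf '%s\t%s\t%s\n' "$d" "${model:-?}" "${serial:-?}"
    done
}
```

Run it as root, since smartctl needs raw device access. A drive whose serial is on a physical label but not in this list is the one that failed to attach.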
 
Looks like the drive is so bad it doesn't even want to attach any more, so it's going to be tricky to find. Tick off the disks you are able to access; the drive that's left over is the broken one. I've also just checked the disk cabinets and looked for activity lights: the drives that are still good should show activity, and the broken drive won't.
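The tick-off approach can be partly automated from the boot messages. Every channel the AHCI driver creates logs a line like "ahcich7: <AHCI channel> at channel 1 on ahci2" (as in the grep output above), and every device that attaches logs a line like "ada0 at ahcich0 bus 0 scbus0 target 0 lun 0". A minimal sketch, assuming those FreeBSD message formats; unattached_ahci_channels is a hypothetical name:

```shell
#!/bin/sh
# List AHCI channels that were created at boot but have no device attached.
# Reads dmesg text on stdin, e.g.: unattached_ahci_channels < /var/run/dmesg.boot
# Assumes FreeBSD boot message formats:
#   "ahcich7: <AHCI channel> at channel 1 on ahci2"
#   "ada0 at ahcich0 bus 0 scbus0 target 0 lun 0"
unattached_ahci_channels() {
    awk '
        # Every channel the AHCI driver created
        /^ahcich[0-9]+: <AHCI channel>/ {
            ch = $1; sub(/:$/, "", ch); seen[ch] = 1
        }
        # Any peripheral (ada, cd, pass, ...) that attached to a channel
        $2 == "at" && $3 ~ /^ahcich[0-9]+$/ { used[$3] = 1 }
        END {
            for (ch in seen) if (!(ch in used)) print ch
        }
    ' | sort
}
```

Empty ports are reported too, so the output narrows the search rather than proving a failure. Since the grep above shows no "adaN at ahcich7" line, ahcich7 would be in the list, and the drive cabled to that port is the one to check.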
 
Code:
[root@numenor ~]# zpool status -v
  pool: backup
 state: ONLINE
  scan: scrub repaired 0B in 00:00:05 with 0 errors on Mon Oct  2 07:37:51 2023
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          ada5p2    ONLINE       0     0     0

errors: No known data errors

  pool: multimedia
 state: ONLINE
  scan: scrub repaired 0B in 03:06:08 with 0 errors on Sun Oct 15 10:08:44 2023
config:

        NAME        STATE     READ WRITE CKSUM
        multimedia  ONLINE       0     0     0
          ada4p2    ONLINE       0     0     0

errors: No known data errors

  pool: nextcloud-data
 state: ONLINE
  scan: scrub repaired 0B in 00:06:16 with 0 errors on Mon Oct  2 07:44:04 2023
config:

        NAME            STATE     READ WRITE CKSUM
        nextcloud-data  ONLINE       0     0     0
          ada5p1        ONLINE       0     0     0

errors: No known data errors

  pool: timemachine
 state: ONLINE
  scan: scrub repaired 0B in 02:33:28 with 0 errors on Sun Oct 15 09:36:15 2023
config:

        NAME         STATE     READ WRITE CKSUM
        timemachine  ONLINE       0     0     0
          ada4p1     ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 09:40:25 with 0 errors on Sun Oct  1 16:29:11 2023
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0
        logs
          ada0p1    ONLINE       0     0     0
        cache
          ada0p2    ONLINE       0     0     0

errors: No known data errors
All pools are healthy. I don't understand what is happening; I have no other disks...
 
Found it! I removed and checked every disk against the model list and the zpool status output.
One of my SSDs is dead, completely dead. The second member of the zroot pool has been moved to an HDD: ada2p3 -> WDC WD10EFRX-68JCSN0
Thanks, SirDice!
Regards
Xavier
 