Lost Device

I have a disk drive that is being lost and then reconnected about seven seconds later. So far this has not caused a problem when the drive is accessed. I found the messages in /var/log/messages, and it has happened a couple of times that I am now aware of. I have run fsck /ad24 on the device and it comes back clean. The device is used for backups, and each time the error occurred the disk was idle and not being accessed, so no errors have shown up. I have not been able to find any adequate information about a cause or a fix.

What is the cause of this?
Do I have a bad drive that needs to be replaced?

Any ideas?

Here is the environment
Code:
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012
    root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: AMD FX(tm)-4130 Quad-Core Processor             (3817.46-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x600f12  Family = 15  Model = 1  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1e98220b<SSE3,PCLMULQDQ,MON,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x1c9bfff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,NodeId,Topology,<b23>,<b24>>
  TSC: P-state invariant, performance statistics
real memory  = 8589934592 (8192 MB)
avail memory = 8189784064 (7810 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 16
 cpu1 (AP): APIC ID: 17
 cpu2 (AP): APIC ID: 18
 cpu3 (AP): APIC ID: 19

Here is the problem
Code:
Jun 19 03:21:41 Phaedra kernel: (ada5:(pass6:ahcich9:0:ahcich9:0:0:0:0): lost device
Jun 19 03:21:41 Phaedra kernel: 0): passdevgonecb: devfs entry is gone
Jun 19 03:21:41 Phaedra kernel: (ada5:ahcich9:0:0:0): removing device entry
Jun 19 03:21:48 Phaedra kernel: ada5 at ahcich9 bus 0 scbus10 target 0 lun 0
Jun 19 03:21:48 Phaedra kernel: ada5: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
Jun 19 03:21:48 Phaedra kernel: ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Jun 19 03:21:48 Phaedra kernel: ada5: Command Queueing enabled
Jun 19 03:21:48 Phaedra kernel: ada5: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
Jun 19 03:21:48 Phaedra kernel: ada5: Previously was known as ad24
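
For anyone who wants to measure the gap between each "lost device" and the reattach across a whole log, here is a minimal Python sketch. It assumes the line format shown above (syslog timestamp, then "lost device" or "adaN at ahcichN"). The function name reconnect_gaps and the placeholder year are my own choices, since syslog lines carry no year, and gaps that cross a year boundary would come out wrong.

```python
import re
from datetime import datetime

# Patterns based on the kernel messages quoted above.
LOST = re.compile(r"\((ada\d+):.*lost device")
ATTACHED = re.compile(r"kernel: (ada\d+) at ahcich\d+")


def parse_time(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp."""
    month, day, clock = line.split()[:3]
    # Syslog omits the year; 2000 is only a placeholder (a leap year,
    # so a Feb 29 entry still parses).
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def reconnect_gaps(lines):
    """Return (device, lost_time, seconds_until_reattach) tuples."""
    pending = {}  # device name -> time it was reported lost
    gaps = []
    for line in lines:
        lost = LOST.search(line)
        if lost:
            pending.setdefault(lost.group(1), parse_time(line))
            continue
        attached = ATTACHED.search(line)
        if attached and attached.group(1) in pending:
            dev = attached.group(1)
            start = pending.pop(dev)
            gaps.append((dev, start, (parse_time(line) - start).total_seconds()))
    return gaps


sample = [
    "Jun 19 03:21:41 Phaedra kernel: (ada5:(pass6:ahcich9:0:ahcich9:0:0:0:0): lost device",
    "Jun 19 03:21:41 Phaedra kernel: (ada5:ahcich9:0:0:0): removing device entry",
    "Jun 19 03:21:48 Phaedra kernel: ada5 at ahcich9 bus 0 scbus10 target 0 lun 0",
]
for dev, when, secs in reconnect_gaps(sample):
    print(dev, when.strftime("%b %d %H:%M:%S"), secs)
```

To run it over the real log, pass the open file instead of the sample list, for example reconnect_gaps(open("/var/log/messages")).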

Here is /etc/fstab
Code:
# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/ada0p2     /               ufs     rw      1       1
/dev/ada0p3     none            swap    sw      0       0
/dev/ad18s1     /storage1       ufs     rw      1       1
/dev/ad20s1     /storage2       ufs     rw      1       1
/dev/ad14s1     /storage3       ufs     rw      1       1
/dev/ad16s1     /storage4       ufs     rw      1       1
/dev/ad24s1     /storage5       ufs     rw      1       1
/dev/cd0        /cdrom          cd9660  ro,noauto 0     0
/dev/da0s1      /flash          msdosfs rw,noauto 0     0
/dev/da0s1      /flash0         msdosfs rw,noauto 0     0
/dev/da1s1      /flash1         msdosfs rw,noauto 0     0
/dev/da2s1      /flash2         msdosfs rw,noauto 0     0
/dev/da3s1      /flash3         msdosfs rw,noauto 0     0
//nobody@Prometheus/public /Prometheus_public smbfs rw,noauto -N  0  0
 
I have disabled the power management so let's see what that does.

One question. Could the drive be dropped while it is being accessed?
 
If there is a problem with the drive or controller, probably. But the most common cause of disconnect problems is the drive going to sleep and the controller or operating system overreacting.
 
An update, if anybody is interested.

The drive kept disconnecting, so I tried the following: changed the cable (same result) and attached the drive to a different SATA port on the motherboard (same result).

I then noticed that the machine had rebooted.
Code:
Aug  1 10:08:46 Phaedra kernel: Trying to mount root from ufs:/dev/ada0p2 [rw]...
Aug  1 10:08:46 Phaedra kernel: WARNING: / was not properly dismounted
Aug  1 10:08:46 Phaedra kernel: WARNING: /storage1 was not properly dismounted
Aug  1 10:08:46 Phaedra kernel: WARNING: /storage2 was not properly dismounted
Aug  1 10:08:46 Phaedra kernel: WARNING: /storage3 was not properly dismounted
Aug  1 10:08:46 Phaedra kernel: WARNING: /storage4 was not properly dismounted
Aug  1 10:08:46 Phaedra kernel: WARNING: /storage5 was not properly dismounted
Aug  1 10:08:46 Phaedra savecore: reboot after panic: softdep_deallocate_dependencies: unrecovered I/O error
Aug  1 10:08:46 Phaedra savecore: writing core to vmcore.1
Aug  1 10:08:49 Phaedra kernel: re0: link state changed to UP
Aug  1 10:09:18 Phaedra dbus[1836]: [system] Activating service name='org.freedesktop.ConsoleKit' (using servicehelper)
Aug  1 10:09:18 Phaedra dbus[1836]: [system] Activating service name='org.freedesktop.PolicyKit1' (using servicehelper)
Aug  1 10:09:18 Phaedra dbus[1836]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Aug  1 10:09:18 Phaedra dbus[1836]: [system] Successfully activated service 'org.freedesktop.ConsoleKit'
Aug  1 10:09:28 Phaedra ntpdate[2019]: ntpdate 4.2.4p5-a (1)
Aug  1 10:09:23 Phaedra ntpdate[2019]: step time server 198.123.30.132 offset -5.318068 sec
Aug  1 10:09:23 Phaedra ntpdate[2020]: ntpdate 4.2.4p5-a (1)
Aug  1 10:09:23 Phaedra ntpdate[2020]: adjust time server 132.163.4.102 offset 0.001558 sec
Aug  1 10:14:03 Phaedra fsck: /dev/ad18s1: 96 files, 12093644 used, 228257286 free (70 frags, 28532152 blocks, 0.0% fragmentation)
Aug  1 10:18:43 Phaedra fsck: /dev/ad20s1: 181472 files, 20941437 used, 97320790 free (7038 frags, 12164219 blocks, 0.0% fragmentation)
Aug  1 10:19:05 Phaedra fsck: /dev/ad14s1: 150 files, 7674 used, 35470811 free (235 frags, 4433822 blocks, 0.0% fragmentation)
Aug  1 10:37:21 Phaedra kernel: (ada5:ahcich9:0:0:0): lost device
Aug  1 10:37:21 Phaedra kernel: (pass6:ahcich9:0:0:0): passdevgonecb: devfs entry is gone
Aug  1 10:37:21 Phaedra kernel: (ada5:ahcich9:0:0:0): removing device entry
Aug  1 10:37:27 Phaedra kernel: ada5 at ahcich9 bus 0 scbus10 target 0 lun 0
Aug  1 10:37:27 Phaedra kernel: ada5: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
Aug  1 10:37:27 Phaedra kernel: ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Aug  1 10:37:27 Phaedra kernel: ada5: Command Queueing enabled
Aug  1 10:37:27 Phaedra kernel: ada5: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
Aug  1 10:37:27 Phaedra kernel: ada5: Previously was known as ad24
Aug  1 10:51:01 Phaedra fsck: /dev/ad16s1: 27665 files, 369402426 used, 576622732 free (1668 frags, 72077633 blocks, 0.0% fragmentation)
Aug  1 10:51:01 Phaedra kernel: (ada5:(pass6:ahcich9:0:ahcich9:0:0:0:0): lost device
Aug  1 10:51:01 Phaedra kernel: 0): passdevgonecb: devfs entry is gone
Aug  1 10:51:02 Phaedra kernel: (ada5:ahcich9:0:0:0): removing device entry
Aug  1 10:51:06 Phaedra kernel: ada5 at ahcich9 bus 0 scbus10 target 0 lun 0
Aug  1 10:51:06 Phaedra kernel: ada5: <WDC WD1002FAEX-00Z3A0 05.01D05> ATA-8 SATA 3.x device
Aug  1 10:51:06 Phaedra kernel: ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Aug  1 10:51:06 Phaedra kernel: ada5: Command Queueing enabled
Aug  1 10:51:06 Phaedra kernel: ada5: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
Aug  1 10:51:06 Phaedra kernel: ada5: Previously was known as ad24

I ran df -H and found that the capacity was 100%. I took that to mean a clean drive. No data!
I ran ls -la /storage5 and found no files or directories, not even the . files. Oh no, I lost everything!
I unmounted the drive and then mounted it again. The message was that it was not properly dismounted.
I ran fsck /storage5 and it ended with a mess of errors, which were fixed.
I mounted the drive and it created a lost+found.
Now I can see data stored on the drive, but I am not sure of the integrity of it.
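
One way to check the integrity, assuming the original files still exist on the source disks, is to compare checksums of each original against its copy on the backup drive. The sketch below is only an illustration: compare_trees and sha256_of are names I made up, and it assumes the backup mirrors the source directory layout, which may not match how the backups here were actually made.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, backup_root):
    """Return (missing, mismatched) lists of paths relative to source_root."""
    source_root, backup_root = Path(source_root), Path(backup_root)
    missing, mismatched = [], []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            missing.append(str(rel))      # never copied, or lost on the backup
        elif sha256_of(src) != sha256_of(dst):
            mismatched.append(str(rel))   # present but contents differ
    return missing, mismatched
```

Calling compare_trees with the source directory and the matching directory under /storage5 returns the files that are absent from the backup and the files whose contents differ. Files that changed on the source after the last backup will also show up as mismatched, so the lists are a starting point rather than proof of corruption.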

As I said before, this is a backup drive only. The panic occurred during a backup, that is, while writing to the disk.
Bad disk! Bad, bad disk!

That is unless anybody has any other ideas.

Keith
 