Hi -
For the last couple of years, one of the two SATA disks in my FreeBSD box has been detaching every month or so (sometimes every week). When the disk that contains the swap partition detaches, the box panics and reboots. When the disk that contains /usr/home detaches, I've been able to recover without a reboot by using atacontrol to reattach the ATA device and then remounting the filesystem.
This started around FreeBSD 6.x, and the box is now running FreeBSD 7.2-p3 with the same hardware. In fact, all hardware in the box (including the disks) has been replaced, since I initially thought this was hardware related, but apparently it is not. The box has gone through the 6.0, 6.1, 6.2, 7.0, 7.1, and 7.2 upgrades, and every release has shown the same issue.
When the disk holding /usr/home (ad12) detaches, here are the kernel messages:
Code:
Device ad12s1d went missing before all of the data could be written to it; expect data loss.
Jun 26 01:36:41 dax kernel: pid 57353 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
Jun 26 01:41:10 dax kernel: pid 44038 (rtorrent), uid 1000 inumber 12718081 on /usr/home: out of inodes
Jun 26 01:44:36 dax kernel: pid 8034 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
Jun 26 01:44:46 dax kernel: pid 38672 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
Jun 26 01:44:56 dax kernel: pid 21014 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
Jun 26 01:45:07 dax kernel: pid 57353 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
Jun 26 01:45:17 dax kernel: pid 8034 (httpd), uid 80 inumber 19642453 on /usr/home: out of inodes
[...]
I'm assuming the "out of inodes" errors are just the kernel getting confused because the filesystem is still mounted but the disk has disappeared.
When the disk holding the swap partition (ad8) detaches, the errors are a little different. Sometimes the box hangs after tons of g_vfs_done errors:
Code:
subdisk8: detached
ad8: detached
g_vfs_done():ad8s3d[READ(offset=27569928192, length=2048)]error = 6
swap_pager: I/O error - pagein failed; blkno 9694,size 4096, error 6
g_vfs_done():ad8s3d[READ(offset=27569930240, length=2048)]error = 6
vm_fault: pager read error, pid 685 (devd)
g_vfs_done():ad8s1a[READ(offset=423264256, length=32768)]error = 6
g_vfs_done():ad8s3d[READ(offset=27569932288, length=2048)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 685 (devd)
g_vfs_done():ad8s1a[READ(offset=98304, length=16384)]error = 6
g_vfs_done():ata4: FAILURE - already active DMA on this device
unknown: setting up DMA failed
ata4: FAILURE - already active DMA on this device
unknown: setting up DMA failed
ad8s1a[READ(offset=192741376, length=16384)]error = 6
g_vfs_done():ad8s3d[WRITE(offset=26008551424, length=16384)]error = 6
g_vfs_done():ad8s4d[WRITE(offset=118730604544, length=12288)]error = 6
g_vfs_done():ad8s3d[READ(offset=27569934336, length=2048)]error = 6
[...]
Or it will try to write a dump file, error out, then reboot:
Code:
g_vfs_done():ad8s1f[WRITE(offset=99211411456, length=16384)]error = 6
g_vfs_done():ad8s1f[WRITE(offset=99211673600, length=16384)]error = 6
g_vfs_done():ad8s1f[WRITE(offset=99211853824, length=16384)]error = 6
g_vfs_done():ad8s1f[WRITE(offset=100559978496, length=16384)]error = 6
g_vfs_done():ad8s1f[WRITE(offset=103449427968, length=16384)]error = 6
g_vfs_done():ad8s1f[WRITE(offset=103449493504, length=16384)]error = 6
/dev: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 1
Uptime: 2d11h2m39s
Physical memory: 999 MB
Dumping 292 MB: 277 261 245 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5Attempt to write outside dump device boundaries.
** DUMP FAILED (ERROR 6) **
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
The hardware in the box is fairly standard. Core 2 Duo w/Intel ICH and 2x WD SATA disks.
Recently, someone suggested jumpering the disks down from SATA300 to SATA150, citing SATA chipset firmware incompatibilities. I tried this, and it didn't resolve the issue.
Here's a dmesg.boot from the box, showing hardware, etc.:
http://www.prolixium.com/share/txt/freebsd/dmesg.boot.20100628.txt
Troubleshooting this is difficult because this is a dedicated server at a hosting provider a few states away. I do have serial console access (that is logged), though, and this is actually the only way I was able to see the reasons for the panics and hangs, as no logs could be written to the disk(s) when they're detached.
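Since the logged serial console is the only record of these events, a small script can help summarize which disk detached and how many I/O errors followed. This is only a rough sketch in Python: the function name summarize and the regular expressions are my own, written against the message formats pasted above, and other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Matches the device/operation part of lines such as:
#   g_vfs_done():ad8s3d[READ(offset=27569928192, length=2048)]error = 6
# The "g_vfs_done():" prefix is not required, because console output is
# sometimes interleaved and the prefix ends up on a different line.
IO_ERROR_RE = re.compile(
    r"(?P<dev>ad\d+)s\d+[a-h]"
    r"\[(?P<op>READ|WRITE)\(offset=\d+, length=\d+\)\]"
    r"error = (?P<err>\d+)"
)

# Matches lines such as "ad8: detached" (but not "subdisk8: detached").
DETACH_RE = re.compile(r"^(?P<dev>ad\d+): detached")


def summarize(lines):
    """Return (detached_disks, error_counts) for an iterable of log lines.

    error_counts maps (disk, operation, error_number) to a count.
    """
    detached = []
    errors = Counter()
    for line in lines:
        line = line.strip()
        m = DETACH_RE.match(line)
        if m:
            detached.append(m.group("dev"))
            continue
        m = IO_ERROR_RE.search(line)
        if m:
            key = (m.group("dev"), m.group("op"), int(m.group("err")))
            errors[key] += 1
    return detached, errors


if __name__ == "__main__":
    import sys

    # Usage: python summarize_console.py < console.log
    disks, counts = summarize(sys.stdin)
    print("detached:", disks)
    for (dev, op, err), n in sorted(counts.items()):
        print(f"{dev} {op} error {err}: {n}")
```

Feeding it the console log on stdin prints the detached disks and a per-disk count of failed reads and writes, which may make it easier to compare one incident with another.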
I submitted a bug report for this quite a while back, and it hasn't been touched:
http://www.freebsd.org/cgi/query-pr.cgi?pr=129426
I suspect this is because I can't provide much information and have no way of reproducing the problem on demand.
This box performs a variety of tasks: web/DNS/jabber server, IPv6 router, IPv4/IPv6 firewall, VPN termination, etc.
Any pointers on where I should look next? Ideas?
Thanks in advance!
- Mark