panic and auto reboot

Crotalus · May 24, 2013

I think I know the answer to this, but I need some verification. While running tar to back up some files from ad8s1 to ad3s1, the following occurred. ad3 is a Seagate ST3500630A 500 GB ATA drive and ad8 is a Seagate ST2000L003 2 TB drive; both are connected directly to the motherboard (ATA slave and SATA 3, respectively). The 2 TB Seagate drive is new, about 6 months old. The messages lead me to believe that the source drive is defective. Maybe? However, I can access the drive from a Windows machine using Samba without any problems. Searching the internet turns up several comments about the error, but no real fixes for hard drives. I don't know what the message for the destination drive indicates.

fsck now runs cleanly on the drive; I have run it a couple of times.

Code:

Prometheus# df -H
Filesystem                 Size    Used   Avail Capacity  Mounted on
/dev/ad4s2a                960G    8.6G    875G     1%    /
devfs                      1.0k    1.0k      0B   100%    /dev
//NOBODY@PHAEDRA/PUBLIC    193G     47G    145G    25%    /Phaedra_public
/dev/ad3s1                 492G    319G    133G    71%    /storage1
/dev/ad14s1                484G     94G    351G    21%    /storage3
/dev/ad8s1                 1.9T    120G    1.7T     7%    /storage4
/dev/ad10s1                968G    359G    532G    40%    /storage5
/dev/ad12s1                290G    165G    101G    62%    /storage2
Prometheus#

Prometheus.group1 kernel log messages:

Code:

+++ /tmp/security.3CHKEnTP      2013-05-23 03:03:24.000000000 -0600
+g_vfs_done():ad8s1[READ(offset=4611686757124882432, length=16384)]error = 5
+g_vfs_done():ad8s1[READ(offset=4611686757124882432, length=16384)]error = 5
+dev = ad3s1, block = 20536560, fs = /storage1
+panic: ffs_blkfree: freeing free block
+cpuid = 0
+KDB: stack backtrace:
+#0 0xffffffff8063dcbe at kdb_backtrace+0x5e
+#1 0xffffffff8060aed7 at panic+0x187
+#2 0xffffffff80829438 at ffs_blkfree_cg+0x668
+#3 0xffffffff8082953f at ffs_blkfree+0xff
+#4 0xffffffff80830072 at ffs_indirtrunc+0x3c2
+#5 0xffffffff80830031 at ffs_indirtrunc+0x381
+#6 0xffffffff808311bb at ffs_truncate+0xc1b
+#7 0xffffffff8084e58c at ufs_inactive+0x20c
+#8 0xffffffff80698251 at vinactive+0x71
+#9 0xffffffff8069d898 at vputx+0x2d8
+#10 0xffffffff806a09b2 at kern_unlinkat+0x1a2
+#11 0xffffffff809000c4 at amd64_syscall+0x1f4
+Copyright (c) 1992-2012 The FreeBSD Project.
+Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
+       The Regents of the University of California. All rights reserved.
+FreeBSD is a registered trademark of The FreeBSD Foundation.
+FreeBSD 8.3-RELEASE #0: Mon Apr  9 21:23:18 UTC 2012
+    root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
+Timecounter "i8254" frequency 1193182 Hz quality 0
+CPU: AMD Athlon(tm) 64 Processor 3500+ (2210.20-MHz K8-class CPU)
+  Origin = "AuthenticAMD"  Id = 0xf7a  Family = f  Model = 7  Stepping = 10
+  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
+  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
+real memory  = 4294967296 (4096 MB)
+avail memory = 4060057600 (3871 MB)

cpm@ · May 25, 2013

FYI, read about the possible causes of the panic(9) you reported:
http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028860.html.

Run fsck -y twice in single-user mode to repair the corrupted file system. See the fsck(8) man page for details.

Also check the drive for read errors using dd(1) as follows:
% dd if=/dev/ad8s1 of=/dev/null bs=128k

If dd hits bad sectors, you should see error messages spewing to the console and written to /var/log/messages.
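A minimal sketch for pulling the relevant lines out of the log once the dd pass finishes. It assumes the FreeBSD default log location /var/log/messages and that the kernel tags disk errors with the device name (ad8, or a slice such as ad8s1), as in the g_vfs_done() lines quoted in this thread. The function name disk_messages is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print log lines that mention a given disk or one of its
# slices/partitions (e.g. ad8, ad8s1, ad8s1a), but not a different disk such as ad80.
# Assumption: messages are in /var/log/messages unless another file is given.
disk_messages() {
    dev="$1"
    log="${2:-/var/log/messages}"
    # Require a non-alphanumeric character (or line start/end) around the name
    # so that "ad8" does not also match "ad80".
    grep -E "(^|[^[:alnum:]])${dev}(s[0-9]+[a-h]?)?([^[:alnum:]]|\$)" "$log"
}
```

For example, disk_messages ad8 lists what the kernel logged about ad8 and its slices. An empty result after a full dd pass is a good sign, though it only covers the sectors dd actually read.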

Terri_Kennedy · May 25, 2013

Crotalus said:

Code:

+++ /tmp/security.3CHKEnTP      2013-05-23 03:03:24.000000000 -0600
+g_vfs_done():ad8s1[READ(offset=4611686757124882432, length=16384)]error = 5
+g_vfs_done():ad8s1[READ(offset=4611686757124882432, length=16384)]error = 5

Well, 5 is EIO ("Input/output error"), so that would normally indicate a drive problem. But the offset being displayed (roughly 4.6 × 10^18 bytes) is far beyond the end of a 2 TB drive, so I suspect either memory corruption or a bug.
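To put a number on "way out of bounds": the sketch below (POSIX sh arithmetic, assuming 64-bit shell integers as in FreeBSD sh and bash) splits the reported offset into 2^62 plus a remainder. The remainder is about 738.7 GB, which would fall inside the 1.9 TB ad8s1, and it is an exact multiple of the 16 KiB read length. That pattern is consistent with a single flipped high bit, which fits the memory-corruption theory, but it is one reading of one number, not proof.

```sh
#!/bin/sh
# Decompose the bad offset from the g_vfs_done() messages.
# Assumption: the shell uses 64-bit signed arithmetic (true of FreeBSD sh and bash).
offset=4611686757124882432      # byte offset reported in the log
bit62=$(( 1 << 62 ))            # 2^62 = 4611686018427387904

high_set=$(( (offset & bit62) != 0 ))   # 1 if bit 62 is set in the offset
low=$(( offset - bit62 ))               # offset with bit 62 removed (valid because high_set is 1)
blocks=$(( low / 16384 ))               # same value in 16 KiB units (the read length)

echo "bit 62 set:          $high_set"
echo "offset minus 2^62:   $low bytes (~$(( low / 1000000000 )) GB, integer division)"
echo "in 16 KiB blocks:    $blocks (remainder $(( low % 16384 )))"
```

Run with sh, it reports bit 62 as set, a remainder of 738697494528 bytes (45086517 blocks of 16 KiB, remainder 0).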

It isn't unreasonable to expect a filesystem-related panic when I/O is landing in the wrong places on the drive.

While I'm not one to say "test your memory with memtest86+ and check your power supply" as a first reaction in most cases, it is something I would suggest in this particular case.

fsck(8) does not check the entire partition for corruption; it only looks at some of the metadata, and if that looks OK, the partition gets a clean bill of health. On rare occasions, I've had to back up the contents of a corrupted partition, re-initialize it, and restore the data.

Crotalus · May 27, 2013

I did another fsck on the source drive and it was clean. Albert Einstein said "Insanity is doing the same thing over and over again and expecting different results."

I did check the memory and it appears to be fine. I also did not find any bad sectors on the drive.

Here is the interesting thing: I ran the same job again and it ran to completion successfully. It must be gremlins playing with the FreeBSD little devil. Having a potentially flaky drive is not a warm, fuzzy feeling.

Question: when fsck repairs the file system, does it change the physical location on the drive where data is stored? I would think so, and as a result the tar job could now be reading different sectors of the drive.

Time will tell if I have another problem.

Keith

kpa · May 27, 2013

Time to run smartctl(8) from sysutils/smartmontools on the drive in one of its self-test modes.
