My FreeBSD 9.1-RELEASE system panicked the other night during my nightly rsync backup job. Here is the pertinent info; the panic message and backtrace are in the first code block below.
The system restarted okay, and after fsck repaired the inconsistencies on the / filesystem, the system completed the reboot normally. Since the panic was caused by a mangled directory on the external USB drive, I ran fsck on it to clean up any problems. This completed normally. I then re-ran my nightly rsync backup job, and this too completed normally. I have had no problems since. I have run fsck on the drive filesystem each day since the panic and the filesystem has checked out clean each time.
I ran various smartctl commands against the external USB drive, and although the overall health of the drive was reported as okay, I did see one item in the drive attributes that concerned me: Seek_Error_Rate (ID 7 in the SMART attribute table below).
Based on my research, a deteriorating seek error rate can be caused by servo or head problems, or by high temperatures. Its normalized value (066, worst 060) is also approaching its threshold of 030. Should I be thinking about replacing the drive based on this data?
Code:
panic: ufs_dirbad: /mnt/backup: bad dir ino 5060196 at offset 0: mangled entry
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80b2774f at ufs_dirbad+0x4f
#3 0xffffffff80b28e49 at ufs_lookup_ino+0x6a9
#4 0xffffffff8096ceb8 at vfs_cache_lookup+0xf8
#5 0xffffffff80c68880 at VOP_LOOKUP_APV+0x40
#6 0xffffffff80974554 at lookup+0x464
#7 0xffffffff80975669 at namei+0x4e9
#8 0xffffffff80986993 at kern_statat_vnhook+0xb3
#9 0xffffffff80986b55 at kern_statat+0x15
#10 0xffffffff80986c1a at sys_lstat+0x2a
#11 0xffffffff80bd7ae6 at amd64_syscall+0x546
#12 0xffffffff80bc3447 at Xfast_syscall+0xf7
Uptime: 14h4m38s
Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   100   253   006    -    0
  3 Spin_Up_Time            PO----   098   098   000    -    0
  4 Start_Stop_Count        -O--CK   092   092   020    -    8889
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR--   066   060   030    -    4383733
  9 Power_On_Hours          -O--CK   100   100   000    -    848
 10 Spin_Retry_Count        PO--C-   100   100   034    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    435
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   067   041   045    Past 33 (Min/Max 33/33)
192 Power-Off_Retract_Count -O--CK   100   100   000    -    315
193 Load_Cycle_Count        -O--CK   092   092   000    -    17455
194 Temperature_Celsius     -O---K   033   059   000    -    33 (0 18 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   081   057   000    -    11118736
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ------   100   253   000    -    0
202 Data_Address_Mark_Errs  -O--CK   100   253   000    -    0
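
In case it helps anyone reading the table, here is a rough Python sketch of the threshold comparison I am asking about. It relies on the rule from the smartctl documentation that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH, and that WORST is the lowest normalized value recorded so far. It assumes the column layout pasted above (ID, name, flags, VALUE, WORST, THRESH, FAIL, raw value). The names `Attr`, `parse_row`, and `SAMPLE` are made up for illustration and are not part of smartmontools.

```python
from dataclasses import dataclass

# A few rows copied from the attribute table above (illustrative subset).
SAMPLE = """\
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 0
7 Seek_Error_Rate POSR-- 066 060 030 - 4383733
190 Airflow_Temperature_Cel -O---K 067 041 045 Past 33 (Min/Max 33/33)
197 Current_Pending_Sector -O--C- 100 100 000 - 0
"""


@dataclass
class Attr:
    ident: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold
    raw: str     # vendor-specific raw value, kept as text

    @property
    def margin(self) -> int:
        """Normalized points left before VALUE reaches THRESH."""
        return self.value - self.thresh

    @property
    def status(self) -> str:
        if self.value <= self.thresh:
            return "failing now"
        if self.worst <= self.thresh:
            return "failed in the past"
        return "ok"


def parse_row(line: str) -> Attr:
    # maxsplit=7 keeps a raw value such as "33 (Min/Max 33/33)" in one piece.
    ident, name, _flags, value, worst, thresh, _fail, raw = line.split(None, 7)
    return Attr(int(ident), name, int(value), int(worst), int(thresh), raw.strip())


for attr in map(parse_row, SAMPLE.splitlines()):
    print(f"{attr.ident:>3} {attr.name:<24} margin={attr.margin:>3}  {attr.status}")
```

On these rows the sketch reports Seek_Error_Rate as ok with a margin of 36 points (066 against a threshold of 030), and it flags Airflow_Temperature_Cel as having failed in the past, which matches the "Past" entry in the FAIL column. It only compares normalized values; it does not try to interpret the vendor-specific raw numbers.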