Received this email from one of my servers:
Code:
This message was generated by the smartd daemon running on:
host name: pisces
DNS domain: example.com
The following warning/error was logged by the smartd daemon:
Device: /dev/twa0 [3ware_disk_00], ATA error count increased from 0 to 1
Device info:
WDC WD7500AYYS-01RCA0, S/N:WD-WCAPT0349997, WWN:5-0014ee-2ab2c4ece, FW:30.04G30, 750 GB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.
Okay, so I have a bad block somewhere on disk0 of the 3ware array.
Code:
root@pisces:/# smartctl -a /dev/twa0 -d 3ware,0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.1-RELEASE-p10 i386] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital RE2 Serial ATA
Device Model: WDC WD7500AYYS-01RCA0
Serial Number: WD-WCAPT0349997
LU WWN Device Id: 5 0014ee 2ab2c4ece
Firmware Version: 30.04G30
User Capacity: 750,156,374,016 bytes [750 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA/ATAPI-7 (minor revision not indicated)
Local Time is: Tue Jan 28 12:48:50 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x05) Offline data collection activity
was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (15960) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 198) minutes.
Conveyance self-test routine
recommended polling time: ( 6) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   207   184   021    Pre-fail  Always       -       6633
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       49
  5 Reallocated_Sector_Ct   0x0033   192   192   140    Pre-fail  Always       -       60
  7 Seek_Error_Rate         0x000e   198   198   051    Old_age   Always       -       22
  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       51115
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       426
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       428
194 Temperature_Celsius     0x0022   114   101   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   172   172   000    Old_age   Always       -       28
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 50951 -
# 2 Extended offline Completed: read failure 90% 50939 106612099
# 3 Extended offline Completed without error 00% 50774 -
# 4 Extended offline Completed without error 00% 50606 -
# 5 Extended offline Completed without error 00% 50438 -
# 6 Extended offline Completed without error 00% 50270 -
# 7 Extended offline Completed without error 00% 50103 -
# 8 Extended offline Completed without error 00% 49935 -
# 9 Extended offline Completed without error 00% 49767 -
#10 Extended offline Completed without error 00% 49600 -
#11 Extended offline Completed without error 00% 49432 -
#12 Extended offline Completed without error 00% 49264 -
#13 Extended offline Completed without error 00% 49096 -
#14 Extended offline Completed without error 00% 48927 -
#15 Extended offline Completed without error 00% 48760 -
#16 Extended offline Completed without error 00% 48592 -
#17 Extended offline Completed without error 00% 48424 -
#18 Extended offline Completed without error 00% 48256 -
#19 Extended offline Completed without error 00% 48089 -
#20 Extended offline Completed without error 00% 47921 -
#21 Extended offline Completed without error 00% 47753 -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@pisces:/#
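For anyone who wants to pull the failing LBA out of a self-test log like the one above without reading it by eye, here is a minimal sketch. The function name `first_bad_lbas` is my own invention, and the sample rows are copied from the log above. It only filters text on stdin and never touches a device.

```shell
#!/bin/sh
# Print the LBA_of_first_error column for every self-test row
# that ended in a read failure. Input: self-test log text on stdin.
first_bad_lbas() {
    awk '/^#/ && /read failure/ { print $NF }'
}

# Three sample rows taken from the smartctl output in this post.
sample='# 1 Extended offline Completed without error 00% 50951 -
# 2 Extended offline Completed: read failure 90% 50939 106612099
# 3 Extended offline Completed without error 00% 50774 -'

printf '%s\n' "$sample" | first_bad_lbas
```

On the live box the same filter could presumably be fed from `smartctl -l selftest -d 3ware,0 /dev/twa0`, though I have only tried it against the pasted text.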
Ah, self-test #2 logged a read failure at LBA 106612099! I need a bit more info:
Code:
root@pisces:/# diskinfo -v da0
da0
512 # sectorsize
2197949513216 # mediasize in bytes (2T)
4292870143 # mediasize in sectors
0 # stripesize
0 # stripeoffset
267218 # Cylinders according to firmware.
255 # Heads according to firmware.
63 # Sectors according to firmware.
T0349997CC0B5000EB8E # Disk ident.
root@pisces:/root#
Okay, so the sector size is 512 bytes. More info:
Code:
root@pisces:/# gpart show
=> 63 4292870080 da0 MBR (2T)
63 4292869329 1 freebsd [active] (2T)
4292869392 751 - free - (375k)
=> 0 4292869329 da0s1 BSD (2T)
0 1048576 1 freebsd-ufs (512M)
1048576 8310592 2 freebsd-swap (4G)
9359168 6250496 4 freebsd-ufs (3G)
15609664 1048576 5 freebsd-ufs (512M)
16658240 18628608 6 freebsd-ufs (8.9G)
35286848 4257582481 7 freebsd-ufs (2T)
=> 63 101595074 da1 MBR (48G)
63 101594997 1 freebsd [active] (48G)
101595060 77 - free - (38k)
=> 0 101594997 da1s1 BSD (48G)
0 101594997 4 freebsd-ufs (48G)
root@pisces:/# df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/da0s1a 507630 266638 200382 57% /
devfs 1 1 0 100% /dev
/dev/da0s1g 2061818022 1271149146 625723436 67% /backup
/dev/da1s1d 49199012 37754490 7508602 83% /home
/dev/da0s1e 507630 108 466912 0% /tmp
/dev/da0s1f 9018222 3753794 4542972 45% /usr
/dev/da0s1d 3024526 1947222 835342 70% /var
root@pisces:/#
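LBA 106612099 lies past the start of the last partition (offset 35286848 within da0s1), so the bad block is on /dev/da0s1g, which is mounted on /backup.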
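The lookup can be sketched as a small script. This rests on one big assumption: that the LBA smartctl reports for the physical disk can be read directly as an LBA on da0. That may well not hold, since da0 is a 2T array unit while the disk itself is 750 GB, so the controller is mapping addresses somewhere in between. The function name `find_partition` and the `table` variable are made up for illustration. The rows are the start, size and index columns from the da0s1 table above, and 63 is the start of the slice within da0.

```shell
#!/bin/sh
# Find which BSD partition would hold a given da0 LBA.
# $1 = LBA on da0 (assumed, see the text), $2 = first sector of the slice on da0
# stdin = rows of "start size index", with start relative to the slice
# stdout = "index offset_within_partition"
find_partition() {
    awk -v lba="$1" -v base="$2" '
        {
            start = base + $1      # absolute first sector of this partition
            end   = start + $2     # one past the last sector
            if (lba + 0 >= start && lba + 0 < end)
                printf "%d %d\n", $3, lba - start
        }'
}

# start size index, copied from "gpart show" for da0s1
table='0 1048576 1
1048576 8310592 2
9359168 6250496 4
15609664 1048576 5
16658240 18628608 6
35286848 4257582481 7'

printf '%s\n' "$table" | find_partition 106612099 63
```

Under that assumption it reports index 7 (the "g" partition) and a partition-relative sector of 71325188, i.e. 106612099 - 63 - 35286848.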
Going by a post at http://forums.freenas.org/threads/fix-bad-blocks.3708/, it looks like I need to overwrite the bad sector with 'dd'. Here is what I tried:
Code:
root@pisces:/# dd bs=512 seek=106612099 if=/dev/zero of=/dev/twa0 -d 3ware,0 count=1
dd: unknown operand -d
root@pisces:/# dd bs=512 seek=106612099 if=/dev/zero of=/dev/3ware,0 count=1
dd: /dev/3ware,0: Operation not supported
root@pisces:/# dd bs=512 seek=106612099 if=/dev/zero of=/dev/da0 count=1
dd: /dev/da0: Operation not permitted
root@pisces:/# dd bs=512 seek=106612099 if=/dev/zero of=/dev/da0s7 count=1
dd: /dev/da0s7: Operation not supported
root@pisces:/# dd bs=512 seek=106612099 if=/dev/zero of=/dev/da0s1g count=1
dd: /dev/da0s1g: Operation not permitted
root@pisces:/#
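One thing worth noting about the attempts above: dd counts `seek=` and `skip=` from the start of whatever device is named, so the same sector number cannot be right for da0, da0s1 and da0s1g all at once. Here is a rough sketch of the per-device offsets, again assuming the reported LBA really is a da0 LBA (unverified on this array). The helper `seek_for` is hypothetical. The script only prints read-only probe commands (`of=/dev/null`) and does not run them, and it does not explain the "Operation not permitted" errors.

```shell
#!/bin/sh
LBA=106612099        # LBA_of_first_error from the self-test log
SLICE_START=63       # start of da0s1 within da0 (MBR table)
PART_START=35286848  # start of da0s1g within da0s1 (BSD label)

# Sector number to use for a given device, counted from that device's start.
seek_for() {
    case "$1" in
        da0)    echo "$LBA" ;;
        da0s1)  echo $((LBA - SLICE_START)) ;;
        da0s1g) echo $((LBA - SLICE_START - PART_START)) ;;
        *)      return 1 ;;
    esac
}

for dev in da0 da0s1 da0s1g; do
    # Read-only probe: the device is the input, so nothing gets written.
    echo "dd if=/dev/$dev of=/dev/null bs=512 skip=$(seek_for "$dev") count=1"
done
```

If the assumption holds, a read at the right offset should fail with an I/O error on all three devices, which would be a cheap way to confirm the mapping before writing anything.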
Code:
root@pisces:/# uname -a
FreeBSD pisces.dawnsign.com 9.1-RELEASE-p10 FreeBSD 9.1-RELEASE-p10 #0: Sun Jan 12 10:32:09 UTC 2014 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
root@pisces:/#
Do I need to take this drive offline, i.e. reboot into safe mode or boot from a LiveCD, drop to a shell, and run the dd from there? I want to make sure I don't overwrite any good data or lose it some other way! What's the best way to repair this particular sector? This is the first bad sector I've seen on this server, and it has been running since 2007!
~Doug