Help with bad disk?

  • Thread starter Deleted member 2077

It's FreeBSD 9.0, with a UFS file system. It won't mount, so I ran fsck on it. fsck reports that it can't read some sectors, and then it fails. Is there any way to mark those sectors as bad and continue on?

Code:
[root]# fsck -y /u1
** /dev/ada1p1
** Last Mounted on /u1
** Phase 1 - Check Blocks and Sizes

CANNOT READ BLK: -407822336
UNEXPECTED SOFT UPDATE INCONSISTENCY

CONTINUE? yes

THE FOLLOWING DISK SECTORS COULD NOT BE READ: 3887145002, 3887145003,
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
853315 files, 405221026 used, 75482220 free (26956 frags, 9431908 blocks, 0.0% fragmentation)

***** FILE SYSTEM STILL DIRTY *****

***** PLEASE RERUN FSCK *****
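A side note on the negative block number in that output. One plausible reading (an assumption, not something fsck states) is that the block number was printed through a 32-bit signed integer on this i386 system and wrapped around. Adding 2^32 back gives a block that starts 42 sectors before the first unreadable sector, which would be consistent with the sector list. A minimal sketch of that arithmetic, with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: undo a 32-bit signed wraparound.
# Assumption: the shell does 64-bit arithmetic (FreeBSD sh, bash and dash do).
unwrap32() {
    if [ "$1" -lt 0 ]; then
        echo $(( $1 + 4294967296 ))
    else
        echo "$1"
    fi
}

blk=$(unwrap32 -407822336)          # block number reported by fsck
first_bad=3887145002                # first sector fsck could not read
echo "unwrapped block: $blk"                                 # 3887144960
echo "offset of first bad sector: $(( first_bad - blk ))"    # 42
```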
 
Write data to them using dd(1). It should force the firmware to mark them as bad and allocate new ones from spare blocks (if it hasn't already run out of them).
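A minimal sketch of what that could look like, assuming the sector numbers fsck printed are 512-byte sectors counted from the start of /dev/ada1p1 (an assumption; verify it before writing anything). The helper name is hypothetical, and it only prints the dd(1) command so it can be reviewed first. Running the printed command destroys whatever was in those sectors.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a dd command that overwrites
# COUNT sectors starting at sector FIRST on DEVICE with zeros.
# Assumption: 512-byte sectors, numbered relative to DEVICE.
zero_sectors_cmd() {
    device=$1
    first=$2
    count=$3
    echo "dd if=/dev/zero of=$device bs=512 seek=$first count=$count"
}

# The two sectors from the fsck output above.
zero_sectors_cmd /dev/ada1p1 3887145002 2
```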

Install sysutils/smartmontools (or run it from a live disk) to check on the status of the disk and know the progress of its deterioration (how many "corrected" bad blocks, how many spare left, how many uncorrectable, etc.).
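For example, once smartmontools is installed, smartctl -A prints the drive's attribute table. Here is a small sketch that filters that table down to the counters usually associated with bad sectors. The attribute names are the ones smartctl commonly reports, but they vary by vendor, so treat the list as an assumption. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only the SMART attribute rows that usually
# track reallocated, pending and uncorrectable sectors.
# Assumption: the drive reports these attribute names (vendor-dependent).
bad_sector_attrs() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}

# Usage (needs root and sysutils/smartmontools; not run here):
#   smartctl -A /dev/ada1 | bad_sector_attrs
```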

But ultimately (and probably soon) you'll have to replace the disk. Make sure you have proper backups. Good luck!
 
There's already been a hardware failure: those sectors went bad.

So dd(1) yes, but use it to back up that entire drive to another drive or a file. Don't write to that drive until there's a backup.
 
Thanks guys, I'm using this to copy:
Code:
dd if=/dev/twed0 of=/dev/twed1 bs=4096 conv=noerror,sync

Both are 2 TB disks of the same make and model.

iostat says it's going at 10 MB/s. If my math is right, this is going to take 56 hours? Ouch.
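The estimate can be checked with shell arithmetic. This sketch assumes "2TB" means 2 * 10^12 bytes and "10MB/s" means 10^7 bytes per second. The helper name is hypothetical, and integer division rounds down, so about 55.6 hours shows up as 55.

```shell
#!/bin/sh
# Hypothetical helper: whole hours needed to copy BYTES at RATE bytes/second.
# Integer arithmetic, so the result is rounded down.
copy_hours() {
    echo $(( $1 / $2 / 3600 ))
}

# Assumptions: "2TB" = 2 * 10^12 bytes, "10MB/s" = 10^7 bytes/s.
copy_hours 2000000000000 10000000    # prints 55
```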

Is there any way to monitor dd to see its progress or time left?
 
feralape said:
Thanks guys, I'm using this to copy:
Code:
dd if=/dev/twed0 of=/dev/twed1 bs=4096 conv=noerror,sync

A bs of 64k or 128k will speed up the copy. More will usually not make it any faster.
 
Beastie said:

Neat trick!

I didn't see this in the man page. Where are this and other [?] tricks documented?
 
wblock@ said:
A bs of 64k or 128k will speed up the copy. More will usually not make it any faster.

I could be mistaken, but the guide I read said that you want to use whatever the disk's native sector size is. These are Seagate drives; internally they use 4096-byte sectors, but they present 512 to the BIOS.

I could restart it, but I'm already 5 hours into it.
 
Think of it as "buffer size", not block size. 64k or 128k are even multiples of both 512 and 4096 anyway. It should work even if they weren't; FreeBSD dropped block devices long ago.

With a better buffer size, I estimate the copy should be complete in somewhere under nine hours, possibly half that.
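A quick sketch of the "even multiple" point, using shell arithmetic and a hypothetical helper name. One trade-off to keep in mind: with conv=noerror,sync, a failed read is padded out to the full buffer, so a larger bs zero-fills more data around each bad sector.

```shell
#!/bin/sh
# Hypothetical check: is BUF an exact multiple of SECTOR?
is_multiple() {
    [ $(( $1 % $2 )) -eq 0 ]
}

for buf in 65536 131072; do          # 64k and 128k
    for sector in 512 4096; do
        if is_multiple "$buf" "$sector"; then
            echo "$buf is a multiple of $sector"
        fi
    done
done

# The copy itself would then look like this (not run here):
#   dd if=/dev/twed0 of=/dev/twed1 bs=64k conv=noerror,sync
```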

If you want to point out the guide you're following, I'd like to look at it.
 
feralape said:
Neat trick!

I didn't see this in the man page. Where are this and other [?] tricks documented?

If dd receives a SIGINFO (see the status argument for stty(1)) signal,
the current input and output block counts will be written to the standard
error output in the same format as the standard completion message.

Guess what control sequence corresponds to SIGINFO. :)
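On FreeBSD the default status character is Ctrl+T, so pressing it in the terminal where dd is running prints the counts. From another terminal, the same signal can be sent by name. A small sketch with a hypothetical helper; the Linux branch rests on the assumption that GNU dd reports on SIGUSR1 instead, since Linux has no SIGINFO.

```shell
#!/bin/sh
# Hypothetical helper: name of the signal that makes dd print its progress.
# Assumption: BSD dd reacts to SIGINFO, GNU dd on Linux to SIGUSR1.
dd_status_signal() {
    case "$(uname -s)" in
        Linux) echo USR1 ;;
        *)     echo INFO ;;
    esac
}

# Usage, from a second terminal while dd is running (not run here):
#   pkill -"$(dd_status_signal)" -x dd
```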
 
wblock@ said:
If you want to point out the guide you're following, I'd like to look at it.

Just a random page from a Google search. It's for Linux, but I'm assuming dd is probably the same on both:

http://www.cgsecurity.org/wiki/Damaged_Hard_Disk#The_classic_method_using_.27dd.27

Specifically this:

Disks should be copied on sector boundaries. The sector size of most hard drives is currently 512 bytes but the industry is starting to move (post 1999) to a 4KB (4096 byte) sector size. Check your disk specifications to find out.

It's quite possible I'm misunderstanding it, but I read it to mean that you should use the native sector/block size?
 
Not quite. They are trying to say that copying with a buffer size that's an even multiple of the disk sector size is better. I'm pretty sure it doesn't matter on FreeBSD, and probably not on Linux, but I do still use even multiples myself. The examples below that statement show that they are using multiples of the block size, like 256K.
 
This finally finished. I copied the disk with dd using this command (from a FreeBSD 9.0 i386 install disk, in single-user mode).

Code:
dd if=/dev/twed0 of=/dev/twed1 bs=4096 conv=noerror,sync

When that finished, I put the disk back in the server, which is again running 9.0 i386.

Code:
[root]# fsck -y /dev/ada1
fsck: Could not determine filesystem type
[root]# fsck -y /dev/ada1p1
fsck: Could not determine filesystem type

[root]# mount /dev/ada1p1 /u1
mount: /dev/ada1p1 : Operation not permitted

Do I need to rebuild the partition table or something? Shouldn't it already be there?
 
Yes, the partition table was copied.
# gpart show ada1
will show it.

Typically, the p1 partition of a GPT disk is a boot partition. But your original was not, so the copy won't be either. Try:
# fsck -y -t ufs /dev/ada1p1

This assumes you're using the new disk.
 
Code:
fsck_ffs -y ...
fsck_ufs -y ...
I don't know how likely it is in this case, but I've often hit that exact error when running fsck on a thumb drive, and one of those two commands has worked each time. In fact, I've defaulted to the former for all external disks and thumb drives since then, even putting the command on a label attached to the thumb drive...
 
wblock@ said:
Yes, the partition table was copied.
# gpart show ada1
will show it.

Typically, the p1 partition of a GPT disk is a boot partition. But your original was not, so the copy won't be either. Try:
# fsck -y -t ufs /dev/ada1p1

This assumes you're using the new disk.

The disk already has data on it (FreeBSD UFS, from 6.3). Did I need to format the disk first?

Code:
[root]# dmesg | grep ada1
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST32000542AS CC34> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)

[root]# ls -l /dev/ada1*
crw-r-----  1 root  operator    0,  74 Jun  4 12:23 /dev/ada1
crw-r-----  1 root  operator    0, 120 Jun  4 12:31 /dev/ada1p1


[root]# gpart show ada1
=>        34  3907029101  ada1  GPT  (1.8T)
          34        2014        - free -  (1M)
        2048  3907027087     1  freebsd-ufs  (1.8T)


[root]# fsck -y -t ufs /dev/ada1p1
** /dev/ada1p1
BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
ioctl (GCINFO): Inappropriate ioctl for device
fsck_ufs: /dev/ada1p1: can't read disk label

The gpart output looks correct; on the "bad" disk I had the partition at a 1M offset. I don't understand why the disk label is missing.
 
The idea was to copy the bad disk to a new disk, then put the bad disk away as a backup and work on the new disk. Putting anything on the new disk first won't matter; dd(1) will just overwrite everything on it.

fsck(8) is having trouble because blocks that it needs are missing. Now that you have a copy, there's no risk in trying things like sysutils/ffs2recov. If it fails, you can run the dd(1) copy again (quicker this time) and try something else.

This is also a good time to start thinking about backup.
 