[ZFS] zpool stalls when replace/clear/scrub is called

Hi, I've just started to play around with ZFS but rather quickly got stuck with some strange system behaviour when trying to replace a disk in a ZFS mirror. My setup is an Atom board with FreeBSD 8.0 on two USB sticks mirrored with gmirror, plus two SATA drives.

At first I used two 400GB SATA drives but then decided to switch to two 1.5TB drives instead. To do that I simply unplugged one 400GB drive, replaced it with a 1.5TB drive and booted the system. The result was that the mirror became degraded, with the old drive marked as unavailable and the new one as online.

Code:
  pool: spool
 state: DEGRADED
 scrub: none requested
config:

	NAME           STATE     READ WRITE CKSUM
	spool          DEGRADED     0     0     0
	  mirror       DEGRADED     0     0     0
	    replacing  DEGRADED     0     0     0
	      ad4/old  UNAVAIL      0   233     0  cannot open
	      ad4      ONLINE       0     0     0
	    ad6        ONLINE       0     0     0

I've googled and tried all sorts of things, but with no success. What puzzles me is that when I run 'zpool replace/clear/scrub <pool>', the command stalls and seems to hang the I/O part of the system. If I then issue 'gmirror status', that stalls as well. I have no idea what's going on; I'd really appreciate some guidance here, as I'm out of ideas.
 
This is a common problem when "replacing" working drives. If you don't tell ZFS to offline the working drive, it can get confused. The correct process for replacing a working drive with another, larger drive is:
  1. zpool offline spool ad4
  2. shutdown -p now
  3. <replace drives>
  4. <boot>
  5. zpool replace spool ad4 ad4

(If the system supports hot-pluggable SATA, you can avoid the reboot.)
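For example, on a controller that does support hot-swap, the swap could look roughly like this. This is just a sketch assuming the legacy ata(4) stack, where ad4 sits on channel ata2 (adjust the channel number to your hardware):

Code:
  zpool offline spool ad4    # take the old disk out of the pool
  atacontrol detach ata2     # detach the ATA channel before pulling the disk
  <swap the drive>
  atacontrol attach ata2     # re-scan the channel so the new disk shows up
  zpool replace spool ad4    # single-argument form: resilver onto the new disk at the same device name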

When the system ends up like this, you have to do some mucking around to get things working again.

Try detaching ad4 from the mirror (zpool detach spool ad4). If that works, then you can attach the "new" ad4 to the mirror (zpool attach spool ad6 ad4).
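In full, with a status check in between, that would be something like this (device names as in your pool output above):

Code:
  zpool detach spool ad4      # drop the stuck "replacing" member
  zpool status spool          # the mirror should now show ad6 with the stale entry gone
  zpool attach spool ad6 ad4  # attach the new ad4 as a mirror of ad6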

If that doesn't work, you can try booting to single-user mode, doing a "zpool export spool", then zeroing out the entire drive to remove all traces of ZFS (dd if=/dev/zero of=/dev/ad4 bs=16m). Then import the pool, and try attaching the drive to the pool again.
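Put together, that recovery sequence would look something like this (a sketch; note that zeroing a whole 1.5TB drive with dd will take several hours):

Code:
  zpool export spool                  # from single-user mode
  dd if=/dev/zero of=/dev/ad4 bs=16m  # wipe all ZFS labels from the disk
  zpool import spool
  zpool attach spool ad6 ad4          # re-attach the wiped disk to the mirror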
 
phoenix said:
This is a common problem when "replacing" working drives. If you don't tell ZFS to offline the working drive, it can get confused.

My thought was more or less to test the scenario where a drive has failed and I have to replace it with a new one, but I realise I didn't state that in the post, my bad.

If I do get a failed disk, shouldn't I just be able to replace it with a disk of the same size or bigger without any hassle?
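(What I had in mind was the simple in-place case: swap the dead disk for a new one in the same slot and run something like:

Code:
  zpool replace spool ad4

as I understand it, the one-argument form tells ZFS that the device at that name has been swapped for a fresh disk.)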

Anyway, last night I managed to "solve" the lock-up. First I exported the pool, and after importing it again I was able to run a scrub, after which the "old ghost disk" disappeared. Tonight I'll reinstall the whole system with the old 400GB disks and see if I can manage to do this without getting some kind of lock-up.
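For reference, the sequence that cleared it was roughly this (from memory, so take the exact order with a grain of salt):

Code:
  zpool export spool
  zpool import spool
  zpool scrub spool
  zpool status spool   # the stale replacing/ad4/old entry was gone after the scrub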
 