ZFS I/O error after replacing disk

daveted · Jul 6, 2010

Hello,

I have ZFS storage on a disk bay with 16 disks.
One of my disks (da13) crashed; it was shown as UNAVAILABLE in zpool status.
Before I did anything, my zpool status output looked like this:

Code:

vino# zpool status 
  pool: syspool
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Jul  1 05:41:39 2010
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            aacd1   ONLINE       0     0     0
            aacd2   ONLINE       0     0     0
        spares
          da15      AVAIL   

errors: No known data errors

  pool: vol01
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Jul  6 10:37:04 2010
config:

        NAME        STATE     READ WRITE CKSUM
        vol01       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da13    UNAVAILABLE
          spare
            da14    AVAILABLE       0     0     0

errors: No known data errors

So I manually ran:

zpool replace vol01 da14 da13

At that point, instead of removing da13 and putting the spare da14 in its place, the system added a "sub spare" to the second raidz2 vdev, containing da13 (unavailable) and da14...
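
A note on the hot spare behavior described here: zpool replace takes the pool, then the device being replaced, then the new device. When the new device is a configured hot spare, ZFS places it next to the failed disk in a temporary "spare" vdev and resilvers onto it. The spare only becomes a permanent member of the raidz2 vdev after the failed disk is detached with zpool detach. The sketch below prints that sequence for this pool. The run and promote_spare helpers and the DRY_RUN switch are hypothetical, and the commands are only echoed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: echo the command (DRY_RUN=1, the default) or run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: promote_spare POOL FAILED SPARE
promote_spare() {
    pool=$1 failed=$2 spare=$3
    # Argument order: pool, device being replaced, replacement device.
    run zpool replace "$pool" "$failed" "$spare"
    # Let the resilver finish (check "zpool status") before detaching;
    # detaching the failed disk makes the spare a permanent member.
    run zpool detach "$pool" "$failed"
}
```

For example, promote_spare vol01 da13 da14 prints the replace command first and the detach command second.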

After that I went back to the office and removed the crashed disk.
The "fake" spare then disappeared, and now I have:

Code:

 zpool status 
  pool: syspool
 state: ONLINE
 scrub: scrub completed with 0 errors on Thu Jul  1 05:41:39 2010
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            aacd1   ONLINE       0     0     0
            aacd2   ONLINE       0     0     0
        spares
          da15      AVAIL   

errors: No known data errors

  pool: vol01
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Jul  6 10:37:04 2010
config:

        NAME        STATE     READ WRITE CKSUM
        vol01       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da14    ONLINE       0     0     0

errors: No known data errors

Coool

But then the problem began.

Now I have a new hard drive for da13. I pushed it into the bay, and I want da13 to be part of the second raidz2 vdev again, with da14 as a separate spare (as it was originally).
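
One way to get back to the original layout, assuming the new da13 is healthy, is to resilver onto da13 in place of da14 and then register da14 as a hot spare again. This is a minimal dry-run sketch: restore_layout and run are hypothetical helpers, and zpool add may refuse da14 if it still sees an old ZFS label on it, in which case its -f option forces the add.

```shell
#!/bin/sh
# Hypothetical helper: echo the command (DRY_RUN=1, the default) or run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: restore_layout POOL CURRENT NEW
restore_layout() {
    pool=$1 current=$2 new=$3
    # Resilver onto the new disk in place of the disk now in the vdev.
    run zpool replace "$pool" "$current" "$new"
    # After the resilver completes, the old disk has left the pool
    # and can be added back as a hot spare.
    run zpool add "$pool" spare "$current"
}
```

Called as restore_layout vol01 da14 da13, it prints the same replace command that fails with an I/O error later in this thread, followed by the spare add.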

But every command I type that contains "da13" gives me the same result, for example:

Code:

vino# zpool replace vol01 da14 da13
cannot replace da14 with da13: I/O error
vino# zpool add vol01 spare da13
cannot add to 'vol01': I/O error

And the kernel log shows many messages like this one:

Code:

(da13:mpt0:0:2:13): Retrying Command (per Sense Data)
(da13:mpt0:0:2:13): READ(6). CDB: 8 0 0 0 1 0 
(da13:mpt0:0:2:13): CAM Status: SCSI Status Error
(da13:mpt0:0:2:13): SCSI Status: Check Condition
(da13:mpt0:0:2:13): HARDWARE FAILURE asc:0,0
(da13:mpt0:0:2:13): No additional sense information
(da13:mpt0:0:2:13): Retries Exhausted
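
To see how many of these failures the kernel has logged for a single disk, a small filter over the kernel messages is enough. This sketch assumes the message format shown here, where every line for device da13 starts with "(da13:"; summarize_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: summarize_errors DEVICE
# Reads kernel messages on stdin and counts, for DEVICE only, the
# "HARDWARE FAILURE" lines and the commands that ended in "Retries Exhausted".
summarize_errors() {
    awk -v dev="($1:" '
        index($0, dev) == 1 && /HARDWARE FAILURE/ { hw++ }
        index($0, dev) == 1 && /Retries Exhausted/ { ex++ }
        END { printf "hardware_failure=%d retries_exhausted=%d\n", hw, ex }
    '
}
```

Typical use would be dmesg | summarize_errors da13. Matching the literal "(da13:" prefix rather than a regular expression keeps da1 messages from being counted for da13.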

What should I do to resolve this problem and get da13 back into my pool, as a spare or otherwise?

Thank you for reading and for your help.

Best regards,

opodgorski · Jul 6, 2010

Yo bro!
Can you try this:
create a slice on your hard drive and make a new UFS filesystem on it,
run dd if=/dev/da13 of=/dev/null count=1000 (using your device),
check the kernel logs,
and then, if you get I/O errors, check your server's backplane.

Thank you.
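
The dd check suggested here can be wrapped so its exit status gives a plain yes/no answer. This is a sketch, not a substitute for reading the kernel log: read_test is a hypothetical helper, it relies on dd's default block size of 512 bytes, and it discards dd's own messages so only the verdict is printed.

```shell
#!/bin/sh
# Hypothetical helper: read_test DEVICE [COUNT]
# Reads COUNT blocks of 512 bytes (default 1000) from DEVICE and reports
# whether dd completed without an error.
read_test() {
    dev=$1 count=${2:-1000}
    if dd if="$dev" of=/dev/null count="$count" 2>/dev/null; then
        echo "$dev: read OK"
    else
        echo "$dev: read FAILED (check the kernel log)"
        return 1
    fi
}
```

Running read_test /dev/da13 on the failing disk should report FAILED, and the matching "Retries Exhausted" entries should then appear in the kernel log.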

phoenix · Jul 6, 2010

Sounds like a bad drive.

Try the following, and watch the logs/console for errors:

Code:

# dd if=/dev/da13 of=/dev/null bs=1M

I'm betting there will be a lot of them.

daveted · Jul 6, 2010

Hi bro!

Here is the output of my dd command:

Code:

vino# dd if=/dev/da13 of=/dev/null count=1000
dd: /dev/da13: Input/output error
0+0 records in
0+0 records out
0 bytes transferred in 0.011337 secs (0 bytes/sec)
vino#
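
In dd's summary, "N+M records in" counts N full and M partial input blocks. "0+0 records in" together with the Input/output error means not even the first block could be read, which points at the drive, cable, or controller port rather than a single bad sector further into the disk. A small sketch that pulls that total out of dd's output; records_read is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: records_read
# Reads dd's summary on stdin and prints full + partial input records.
records_read() {
    awk '/records in$/ { split($1, r, "+"); print r[1] + r[2]; exit }'
}
```

For example, dd if=/dev/da13 of=/dev/null count=1000 2>&1 | records_read prints 0 for the output shown here.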

phoenix · Jul 6, 2010

Yeah, you have either a dead disk, a dead cable, or a dead port on the controller. Most likely the disk. Move it to another system and see if you get the same error.
