ZFS Non-existent disk error message

I am running ZFS on FreeBSD 8.2-RELEASE-p3

I have started to get these messages in /var/log/messages:
Code:
Jan 12 04:04:05 bsa kernel: ad6: FAILURE - SMART timed out LBA=12734214
Jan 12 04:34:05 bsa kernel: ad6: FAILURE - SMART timed out LBA=12734214
Jan 13 02:34:05 bsa kernel: ad6: FAILURE - SMART timed out LBA=12734214
I have had messages like this before, and replacing the relevant drive has cured the problem. However, drive ad6 does not exist!

The output of zpool status is below (I am currently running a scrub on pool tank):
Code:
[root@bsa /var/log]# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub in progress for 0h31m, 13.06% done, 3h31m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
 scrub: scrub completed after 0h3m with 0 errors on Tue Nov 25 11:23:45 2014
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

errors: No known data errors
[root@bsa /var/log]#
Does anybody have any idea what is happening here? According to zpool(8), device ad6 does not exist, but the kernel reports errors on that device... weird and worrying.
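I suppose I could also check which disks the kernel itself sees; if I understand correctly, this sysctl lists the attached disks (assuming it behaves the same on 8.2):
Code:
sysctl kern.disks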

Any help appreciated

Jerry
 
I suspect gpt/disk0 and gpt/disk1 are GPT labels on top of disks ad4 & ad6.

What happens if you run the following?
Code:
gpart show ad6
(Edit: it's ad6 you have the problem with, not ad4.)
 
Code:
[root@bsa /var/log]# gpart show ad4
gpart: No such geom: ad4.

[root@bsa /var/log]# gpart show ad6
=>      34  78165293  ad6  GPT  (37G)
        34       128    1  freebsd-boot  (64K)
       162   8388608    2  freebsd-swap  (4.0G)
   8388770  69776557    3  freebsd-zfs  (33G)

[root@bsa /var/log]# gpart show ad12
=>      34  78165293  ad12  GPT  (37G)
        34       128     1  freebsd-boot  (64K)
       162   8388608     2  freebsd-swap  (4.0G)
   8388770  69776557     3  freebsd-zfs  (33G)
Umm... the only devices that have a GEOM are ad6 and ad12, which must be gpt/disk0 and gpt/disk1.

I should try a scrub on the zroot pool then?

So - any idea how I tell which drive is ad6 and which is ad12? They are SSDs mounted on a blank expansion-slot card, because the machine only has four drive bays, all of which contain 2 TB SATA drives.

Why do ad8, ad10, ad14 and ad16 not show a GEOM with gpart(8)?

Cheers for the clue - any more help appreciated
 
Point taken about the FreeBSD version and lack of support. My attitude is usually "if it ain't broke, don't fix it". But I guess it's broken now...
I'm a bit cautious about upgrading. The machine was originally configured by a chap who has since left the company, and the documentation is very patchy. It acts as much more than just a file server, and upgrading will almost certainly require the purchase of a new machine to replace this one, as I can't risk having this machine (or its replacement) out of service for more than a few hours.
 
Ah OK. You must have the second SATA port unconnected (or have a CD ROM in it).

So your root pool is a mirror on top of ad6 and ad12.
You can use
Code:
gpart show -v ad6
to actually show the labels on top of the disk. This should confirm which one has the disk0 label and which one has the disk1 label.
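If the labels are in use, glabel(8) should also list each one alongside the device it sits on:
Code:
glabel status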

Running a scrub sounds reasonable. If you keep getting timeouts on this disk, then you can try changing the cable but it's possible the disk is starting to have problems.
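If you do run one on zroot, the command is simply:
Code:
# zpool scrub zroot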

When this system was installed, you (or whoever set it up) must have partitioned the disks (which is needed for the root pool, as you need a partition to install boot code to) and then added the labels. The labels were then used when creating the pool.

The second pool (tank) has just been created using the raw disk devices, with no partitions or labels. There's nothing particularly wrong with that (although these days SATA disks should ideally be accessed as adaX devices in AHCI mode, if supported by the hardware).
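For comparison, a raw-disk pool like tank would have been created with something along these lines (a sketch; I obviously don't know the exact command that was used):
Code:
# zpool create tank raidz2 ad8 ad10 ad14 ad16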
 
The ad6 disk is most likely broken. I recently replaced a 3 TB Seagate that had the same issue. It appeared to be working fine, but it would hang the whole system, and overall performance of the pool was terrible. Trying to read SMART data (smartctl -a /dev/ad6) resulted in time-outs. Don't run a scrub, as that could make things worse. Replace the disk and let it resilver.
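If smartmontools is installed, even the quick overall health summary may be enough to show the problem, although on a disk in this state it may just time out as well:
Code:
smartctl -H /dev/ad6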
 
Why do ad8, ad10, ad14 and ad16 not show a GEOM with gpart(8)?
The whole disk has been given to ZFS, which is the best thing to do. But because ZFS is handling the whole disk, there simply aren't any partitions on it. No partitioning means gpart(8) has nothing to show; it obviously can't show something that doesn't exist ;)
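You can confirm this disk by disk; for a raw ZFS member, gpart(8) should report the same error you saw for ad4, e.g.:
Code:
[root@bsa ~]# gpart show ad8
gpart: No such geom: ad8.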
 
usdmatt - thanks very much for that.

On this version it's actually gpart show -l ad6 (or ad12).
It turns out ad6 is disk0 and ad12 is disk1.
How do I tell which physical disk is ad6 (disk0)? I want to identify which physical disk to replace.

To replace the disk, is it just a matter of doing
Code:
zpool offline zroot gpt/disk0
then replacing the physical disk and then doing
Code:
zpool replace zroot gpt/disk0
or do I use ad6 instead of gpt/disk0 in both commands?

Jerry
 
I actually thought it was -v, tested it and confirmed it should be -l, then put -v in the forum post...

Identifying the physical disk can sometimes be more difficult than it should be. ad6 should be the second SATA port. (It's SATA port 1 that doesn't have a disk in it; I had it the wrong way round in an earlier post.)

You can also run
Code:
diskinfo -v ad6
which may print a serial number that you can match to the label printed on the physical disk.
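If smartmontools is installed and the disk still responds, smartctl prints the serial number as well:
Code:
smartctl -i /dev/ad6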

To replace the disk, while keeping the labels, you'll need to do the following:

Code:
# zpool offline zroot gpt/disk0
.. replace physical disk ..
.. recreate the partition table and partitions ..
# gpart create -s gpt ad6
# gpart add -s 64k -l boot0 -t freebsd-boot ad6
# gpart add -s 4G -l swap0 -t freebsd-swap ad6
# gpart add -l disk0 -t freebsd-zfs ad6
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad6
.. put the zfs partition back in the pool ..
# zpool replace zroot gpt/disk0

Disclaimer: It's possible there could be errors in the commands above, all typed from memory. Also, I labelled the boot & swap partitions boot0 & swap0 respectively. Not so important with the boot partition, but if the swap partition had a label on the old disk (as seen in the gpart show -l output), you'll want to make sure you use the same label. That label could be specified in /etc/fstab in order to mount the swap.
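To be safe, check the labels actually in use on the surviving disk first, and reuse the same names:
Code:
gpart show -l ad12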

This also of course relies on the original installer having made sure to put boot code on both ad6 & ad12, and not just one disk. Hopefully they did; with ad6 being a new blank disk, your system should scan all disks and manage to find the boot code on ad12. Ideally the installer should have tested booting the machine with ad6 disconnected before going live, to make sure it will actually boot with the first half of the mirror missing.
 
You need to partition ad6 the same way as ad12. ZFS would like to have the full disk, but these disks are used to boot from. Each one therefore needs a GPT freebsd-boot partition and can't be completely assigned to ZFS.
 
usdmatt - thanks enormously for that help - it's saved me potentially hours of heartache and documentation scouring...

But I'm a little puzzled. Does this mean that partitions boot0 and swap0 are not mirrored? It seems that ZFS is only mirroring gpt/disk0. What mechanism mirrors the other two partitions, or are they not mirrored?

diskinfo does indeed return a string it calls "Disk ident.", which I'm assuming (hoping) is printed on the disk somewhere... the string is CVGB00660198040GGN, which feels like a serial number.

Jerry
 
You do not really need to mirror the boot partitions; you only need to make sure that bootcode is installed on both disks, and that if the bootcode is ever upgraded, it is upgraded on both. Pretty much every FreeBSD root-on-ZFS guide/script I have seen just installs bootcode to all disks manually and leaves it at that.
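For example, to make sure the other half of the mirror has bootcode too, the same command as above pointed at ad12 should do it:
Code:
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad12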

It's not uncommon for multiple swap partitions, like the two you have, to just be mounted separately, giving 8 GB in total. It's also possible that the swap was mirrored another way, using something like gmirror(8). In some ways a simple gmirror makes more sense for swap than using ZFS. The output of
Code:
swapinfo
should show how the swap has been configured.
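For reference, a gmirror-based swap would look roughly like this (a sketch with a hypothetical mirror name, not something taken from your system):
Code:
# gmirror label -b prefer gswap0 ad6p2 ad12p2
.. and in /etc/fstab ..
/dev/mirror/gswap0  none  swap  sw  0  0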
 
usdmatt - you are dead right - a quick check of /etc/fstab shows they are mounted separately:

Code:
# Device          Mountpoint          FSType     Options  Dump  Pass
/dev/gpt/swap0    none                swap       sw       0     0
/dev/gpt/swap1    none                swap       sw       0     0

linproc           /compat/linux/proc  linprocfs  rw       0     0


One last question. The boot disks are SSDs which are now quite old, and I may not be able to purchase the same model. Can I replace the old one with a different SSD that is the same size or bigger, or do they both need to be identical?
 
Can I replace the old one with a different SSD that is the same size or bigger, or do they both need to be identical?
That shouldn't be a problem, as long as the new disk isn't smaller than the other one. You do want to keep an eye on the transfer rates, though: not all SSDs have the same speeds, and you really want another disk with comparable performance.
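diskinfo(8) has a naive built-in benchmark (the -t flag) that can help you compare the old and new disks:
Code:
diskinfo -t ad12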
 
The old disks appear to be ~40 GB: 64 KB for boot, 4 GB for swap, and ~33 GB left over for ZFS.
As long as the new freebsd-zfs partition ends up at least as big as the old 33 GB one, you'll have no problems.

In the commands I put above, you'll notice there's no size specified for the last partition, meaning that the freebsd-zfs partition will just use whatever space is left on the disk.
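Once the new disk is partitioned, a quick check will confirm the new freebsd-zfs partition is at least as big as the old one before you run the zpool replace:
Code:
gpart show ad6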
 