Solved: zpool confusion. Where is my partition?

I started a zpool with three filesystems (in /dev/) ada0p4, ada1p1, ada2p1.

ZFS is very new to me so I probably made mistakes. I first created these partitions on the corresponding devices with the partition type freebsd-zfs (is this step necessary?). Then I ran zpool create storage /dev/ada0p4 /dev/ada1p1 /dev/ada2p1. I'm not sure of the differences between partitions, slices, or even filesystems for that matter, but everything seemed to be working for the most part.

I think one of my disks is malfunctioning, so I bought a 10TB HDD to copy everything over before hunting down the faulty drive. I put it in the PC and connected it, and I just noticed that one of my zpool partitions no longer appears to exist.

In /dev, both ada0p4 and ada1p1 are present. However ada2p1 is not. I now have ada2 and ada3 and I'm not sure which one is the new hdd.

If I run gpart show ada2 I get gpart: No such geom: ada2 and similarly for ada3.

However the zpool appears to be fine! I haven't tried accessing any of the files yet, but if I run zpool status I get:
Code:
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 7h2m with 0 errors on Wed Nov 22 07:04:03 2017
config:

    NAME                            STATE     READ WRITE CKSUM
    storage                         ONLINE       0     0     0
      ada0p4                        ONLINE       0     0     0
      ada1p1                        ONLINE       0     0     0
      diskid/DISK-S1XWJX0B300492p1  ONLINE       0     0     0

errors: No known data errors
So what is going on? Where is my 3rd partition? I see that diskid/DISK-S1XWJX0B300492p1 is in /dev, but why does gpart show nothing?
 
First, you have a non-redundant pool (basically a RAID0 stripe across 3 drives), meaning replacing a drive will be difficult, and if any drive dies completely, your whole pool is gone. Replacing a drive before it's dead is doable, but requires multiple steps done in precisely the right order, with lots of ways to kill the entire pool.
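
If you do end up swapping out the failing drive while it is still readable, the rough shape of it is a single zpool replace followed by waiting for the resilver. This is only a sketch: the device names below are examples and assume the new disk has been partitioned as ada3p1.
Code:
# copy everything from the old provider onto the new one,
# then drop the old provider automatically when the resilver finishes
zpool replace storage ada2p1 ada3p1
# watch the resilver progress
zpool status storage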

Second, edit your /boot/loader.conf and add the following lines:
Code:
kern.geom.label.disk_ident.enable="0"           # Disable the auto-generated Disk IDs  for disks
kern.geom.label.gptid.enable="0"        # Disable the auto-generated GPT UUIDs for disks
kern.geom.label.ufsid.enable="0"        # Disable the auto-generated UFS UUIDs for filesystems

That will prevent the kernel from picking the /dev/diskid, /dev/ufsid, or /dev/gptid paths to access disks. Instead, it'll use the regular /dev/ada*, /dev/da*, or similar direct device nodes (or /dev/gpt if you use proper GPT labels for your partitions).

Reboot and your pool will be using the proper device names in the listing.

You're seeing the "geom doesn't exist" messages because you are trying to access a GEOM provider that is "hidden". Only a single GEOM provider can be active at a time. There are multiple providers made available during the boot process (/dev/ada2 /dev/diskid/DISK-S1XWJX0B300492 /dev/gpt/some-label-if-you-created-one and so on). Once the kernel picks one to use, the rest are hidden from view. If you try gpart show /dev/diskid/DISK-S1XWJX0B300492 you will see the output you're expecting for ada2.

You're seeing "geom doesn't exist" for ada3 because it doesn't have any partition scheme enabled on the drive yet. You need to create one first.

Third, you really should consider just destroying the pool and starting over with some level of redundancy. With 3 drives you could run a raidz1 vdev, which gives you two drives' worth of storage but lets you survive the death of a drive without losing the entire pool: zpool create storage raidz1 ada0p4 ada1p1 ada2p1. If you can fit a fourth drive in the system, you could run multiple mirror vdevs for better performance: zpool create storage mirror ada0p4 ada1p1 mirror ada2p1 ada3p1. That way, you could lose 1 drive from each mirror without losing the entire pool. Alternatively, you could run a raidz2 vdev, allowing you to lose any two drives without losing the pool, but with worse performance than the multi-mirror setup: zpool create storage raidz2 ada0p4 ada1p1 ada2p1 ada3p1

Now would be a good time to read through the Handbook sections on ZFS, the zpool(8) man page, and other online ZFS resources that go over what a pool is, what vdevs are, how the redundancy works, etc. And be prepared to destroy and recreate your pool a few times before everything gels and you get the hang of things (I've had to destroy 24-drive pools due to horrible misconfiguration like doing 24-drive raidz2, 12-drive raidz2, and using dedupe with only 24 GB of RAM). :)
 
Yes, I recently learned about this little problem with my pool. Alas, my HDDs are all different sizes and I can't afford more drives at the moment, so my immediate plan is to copy the zpool data over to the new 10TB drive, delete the zpool, use a single partition per disk (except for the drive that has a separate root partition), and mount all of them at a single mount point with fuse. That way a disk failure only costs me the data on that particular disk, which is the least bad option for me right now; it's not the end of the world if I lose a little bit. The future problem will be migrating what's left of the data to a raidz at some later point, when I can afford to buy a bunch of hard drives (if such a time ever comes). I'm not sure how I'll do it since my motherboard is out of SATA ports, but that's another problem for another time.

I also just learned about glabel. I am currently trying to make labels for all my partitions so I can set up my fstab after I get this data/repartitioning business taken care of, but it won't let me put a label on my root partition. Maybe because it's mounted? [Edit] Found my own answer: you have to use tunefs -L for UFS filesystems.
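
For anyone finding this later, the tunefs route ends up looking roughly like this, run against the unmounted filesystem. The device name, label, and mount point are just examples (the ones I plan to use for the new drive):
Code:
tunefs -L WDPurple10TB /dev/ada2p1      # writes a UFS volume label into the superblock
# the filesystem then shows up as /dev/ufs/WDPurple10TB, so /etc/fstab can use:
/dev/ufs/WDPurple10TB   /mnt/WDPurple10TB   ufs   rw   2   2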
 
Well, I put the 3 lines into my /boot/loader.conf and rebooted, but zpool status is still the same. I think ada3 is probably the one that is part of the zpool and was previously ada2; I'm assuming it stopped using the normal device name because that name changed after I installed the new HDD. But of course I want to verify this before I start rewriting partition tables.
 
Well for some reason it seems like my loader.conf file is not working.
Code:
kldload snd_driver
linux_load="YES"
nvidia_load="YES"
kern.geom.label.disk_ident.enable="0"           # Disable the auto-generated Disk IDs  for disks
kern.geom.label.gptid.enable="0"        # Disable the auto-generated GPT UUIDs for disks
kern.geom.label.ufsid.enable="0"        # Disable the auto-generated UFS UUIDs for filesystems
I am using the nvidia driver, so that part seems to be working, but I can still see entries in /dev/gptid and /dev/diskid.

Is there another way to verify the correspondence between the diskid and the device node?
 
I deleted the first three lines. There must have been an error there. I guess they were doing nothing anyway since the nvidia driver still seems to work, or at least I have video.

I can now see the device node name. Thanks for the help.
 

Your first line was wrong. You don't run programs (kldload) in the config file. :) You just tell it to load modules: snd_driver_load="YES"
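
So with that fixed, the whole file would look something like:
Code:
snd_driver_load="YES"
linux_load="YES"
nvidia_load="YES"
kern.geom.label.disk_ident.enable="0"           # Disable the auto-generated Disk IDs for disks
kern.geom.label.gptid.enable="0"                # Disable the auto-generated GPT UUIDs for disks
kern.geom.label.ufsid.enable="0"                # Disable the auto-generated UFS UUIDs for filesystems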
 
Is there a way to change the label of my root partition (mounted at /)?
I'm still using /dev/ada0p3 in my fstab and I want something that will never change if I change the drives again.

I tried tunefs -L rootfs /dev/ada0p3 and it says it can't write the superblock. I assume it's because it's mounted, but I can't exactly unmount it. It still gets mounted in single user mode, right?
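
[Edit] One idea I'm considering, completely untested: since a GPT label lives in the partition table rather than in the filesystem, gpart modify can apparently set one even while the partition is mounted. A rough sketch of what I mean, with the index and label just guesses for my layout; per the provider-hiding discussion above, /dev/gpt/rootfs would presumably only become usable after switching fstab over and rebooting, since the partition is currently open via /dev/ada0p3.
Code:
gpart modify -i 3 -l rootfs ada0      # label GPT partition index 3 on ada0
# then in /etc/fstab:
/dev/gpt/rootfs   /   ufs   rw   1   1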
 
So I started rsyncing my files from the zpool, and I see my new HDD has a lot of unusable space:
Code:
Filesystem                         Size    Used   Avail Capacity  Mounted on
/dev/ufs/WDPurple10TB              8.8T    1.8T    6.3T      22%  /mnt/WDPurple10TB
8.8T - 1.8T should leave 7T available, but I only have 6.3T. So why is there 700G of unusable space?
Is this normal? That seems huge.

When I run gpart show /dev/ada2 I get:
Code:
=>         40  19532873648  ada2  GPT  (9.1T)
           40  19532873640     1  freebsd-ufs  (9.1T)
  19532873680            8        - free -  (4.0K)

Here it's 9.1T, not 8.8T, so I'm missing another 300G, for about 1TB of lost space in total.

Did I get a faulty drive? I did buy a refurb because it was cheaper. I hope it's just rsync holding some space hostage without marking it as used, because the output of df -h makes no sense to me. On my zpool the numbers add up correctly.
 
Base10 vs Base2 math. Disk makers list the sizes using Base10 math (1,000,000 bytes in a Megabyte). OSes list sizes in Base2 math (1,048,576 bytes in a Megabyte). Scale that up to gigabytes and terabytes and you'll see why you're "missing" 700 GB.

IOW, you're not missing anything, it's just different ways of doing math. Blame the drive makers.
 
I understand the difference between 10**3 and 2**10, and that accounts for the difference between the 9.1T shown by gpart show ada2 and the advertised 10TB: 10**12 / 2**40 ≈ 0.909, and 10TB * 0.909 ≈ 9.1T.

This does not explain the 1TB of extra missing space shown by df -h. Does software keep a list of bad sectors? Is it possible I could have ~11% bad sectors? If that's true it sounds like I should return the drive. 300G is a lot but I could live with that. 1TB is ridiculous.

My hope is that since rsync is still running then df is incorrect. I'll check when it's done.
 
UFS filesystem creation defaults to "hiding" 8% of the usable space.
Look at the "-m minfree" option in tunefs(8).
tunefs -p /dev/ada2p1
will show you the current "minimum percentage of free space".
The size shown as available to the user with df will be the size of the filesystem minus 8%.
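
To put numbers on it, 8% of that 8.8T filesystem is roughly the 700G you're missing:
Code:
8.8T * 0.08 ≈ 0.7T    # the gap between (Size - Used) and Avail in your df output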
 
Wow, I didn't know that. I definitely want to disable that as much as possible; that explains the missing 700G. Once the drive is full, files will not tend to change much, so if it's only a matter of write performance I'm not worried about that. But the manual is confusing. What is the proper way to set minfree to 0? Can I do tunefs -m 0 /dev/ufs/WDPurple10TB? I'm not sure what the +o part means in the explanation of the -m flag.

I'm not into the computer science of how filesystems work. I guess if it's constantly optimizing fragmentation then sure it's good to have some free space. But 8% seems like a lot, especially on larger drives. But I don't see how it would affect performance. Is there another filesystem I should be using if I want to maximize available space? How low can I set this thing? I want all the space available. It's just storing media files.
 
Can I do tunefs -m 0 /dev/ufs/WDPurple10TB?
Yes, in single-user mode.
I'm not sure what the +o part means in the explanation of the -m flag.
It's just meant as attention points; it has nothing to do with the parameter you pass with -m.
I'm not into the computer science of how filesystems work.
Neither am I :p
But I don't see how it would affect performance.
https://en.wikipedia.org/wiki/Fragmentation_(computing)
https://en.wikipedia.org/wiki/File_system_fragmentation
Is there another filesystem I should be using if I want to maximize available space?
You could look into ZFS; its integration in FreeBSD is neat.
Nevertheless, ZFS will also exhibit fragmentation.
 
That worked a charm. Thanks. Hopefully I don't run into performance issues, but it's just a private media server. Should be fine.
 