ZFS can't import zroot pool after failed freebsd-update (unresponsive tx->tx state)

I ran freebsd-update from 11.1 to 11.2-RELEASE, but I did not have enough disk space and the "freebsd-update install" stage failed with a no-disk-space error - so my system is screwed.

I booted from the FreeBSD 11.1 install ISO (LiveCD), thinking I would roll back the ZFS snapshots on the zroot pool and everything would be OK - but no luck.

When I try to run (from FreeBSD LiveCD):
Code:
zpool import -f -R /tmp/z zroot
or:
Code:
zpool import -f -N zroot

the command never ends, CPU utilization is 0%, and the zpool process stays in the "tx->tx" state forever. I just can't import my zroot pool.
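For reference, this is how the stuck process looks from another console (the MWCHAN column in ps shows the wait channel; exact output will vary):
Code:
# the import never returns; from a second console:
ps -axl | grep zpool
# shows the zpool process sleeping on "tx->tx" with 0% CPU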

My partitions are OK - no problem there. When I run:
Code:
zpool import
the zroot pool is found. That's OK.

When I run:
Code:
zdb -l /dev/ada0p3
that's also OK; I get the label info for the zroot pool (version 5000, proper device paths, everything looks OK).

It's a virtual server with only 512 MB of RAM - but so far it has run well.

So - is it really possible to cripple a zpool by filling it completely? Can I heal the zroot pool?
 
It sounds possible - though you typically encounter gradual performance degradation first. Try looking into the recovery options (-F, -X, -T) for zpool import.
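Roughly, those look like this (a sketch only; -X and -T are undocumented last-resort options, and the txg number below is a placeholder you would have to determine yourself, e.g. with zdb):
Code:
# dry run: report whether a rewind import would succeed, without doing it
zpool import -F -n zroot
# rewind to the last consistent transaction group
zpool import -F zroot
# extreme rewind (undocumented; may discard more data)
zpool import -FX zroot
# import the pool state at a specific txg (undocumented; 1234567 is a placeholder)
zpool import -T 1234567 -o readonly=on zroot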
 
First off: you mention zdb, which is a command not normally used for regular ZFS administration. So I have to wonder: did you use it on your pool to "optimize" or change stuff? Because that could explain something here.

Alas, try this: # zpool import -fNR /mnt -o readonly=on zroot - does that do anything for you? You won't be able to change much, but at least you might be able to access your filesystems and their data to see what could be wrong. For example, I'd try zfs list -rt all zroot | less next, to see where you're missing out on disk space and whether those snapshots are still usable (and how much space they consume).
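Spelled out step by step, that would be something like this (a sketch; /mnt as the altroot is just an example):
Code:
# import without mounting, read-only, with /mnt as altroot
zpool import -f -N -R /mnt -o readonly=on zroot
# then inspect all datasets and snapshots
zfs list -rt all zroot | less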

You said you'd try to roll back your snapshot; which one, and what command did you use? Did you free up space beforehand? I have a theory as to what caused your problems: snapshots don't take up much space at first, but as you keep changing things on your system, their disk usage grows. So if you had several snapshots, the amount of disk space required could basically double or triple, depending on the setup (the list command above should show as much).
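To see what each snapshot pins down (illustrative; these are the standard zfs list properties):
Code:
# USED = space freed by destroying just that snapshot;
# space shared between several snapshots only shows up once the others are gone
zfs list -rt snapshot -o name,used,referenced zroot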

I'd say your first priority should be to regain access to your data; then you can worry about salvaging the system. What kind of ZFS pool do you use anyway - a mirror, raidz, or just a single disk?

(edit)

PS: 512 MB is seriously stretching it. With that amount of memory it would be much safer to rely on UFS.
 
ShelLuser, thanks. I did not do any tricks with zdb. I just wanted to make sure that zdb shows the correct basic information about zroot and that the output doesn't suggest broken partitions, etc.
It's a simple single zpool on a virtualized SATA disk, /dev/ada0p3.
There is still 300 MB of RAM free.

Next, I tried:

Code:
zpool import -fR /tmp/z -o readonly=on zroot

and in read-only mode the pool was completely mounted under /tmp/z (when running from the FreeBSD install ISO in Live mode, I can't mount to /mnt).
And yes, I can list all snapshots with
Code:
zfs list -rt all
with no problem.
ZFS shows AVAIL 0 (completely full).

The problem is that I can't delete snapshots in read-only mode. Game over?
 
I don't see why you can't mount stuff on /mnt, to be honest. That is, I think I understand why you think you can't do that: you probably get an error about a read-only filesystem? That doesn't mean you can't mount there. This is the result of a - in my opinion - rather stupid default setting made by the installer.

Small sidestep: if you set up this system using the installer, then your root filesystem (from memory: zroot/ROOT/default, I think) isn't automatically mounted after you import the pool. That's because its canmount property is set to noauto in order to cater to sysutils/beadm. As a result, the other filesystems which can be mounted automatically are being mounted directly onto /mnt, which doesn't contain their mountpoints yet; the system then tries to create those mountpoints, which obviously fails. The solution is to simply trigger the error message, then mount your root filesystem manually using # zfs mount zroot/ROOT/default, after which you can mount the rest of the filesystems using # zfs mount -a. Or you can set canmount back to on for the root dataset, after which everything will behave normally again: # zfs set canmount=on zroot/ROOT/default.
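As a concrete sketch (dataset name assumed to be the installer default):
Code:
zpool import -f -N -R /mnt zroot
zfs mount zroot/ROOT/default    # mount the root dataset by hand first
zfs mount -a                    # then everything else
# or, alternatively, make the root dataset mount automatically again:
zfs set canmount=on zroot/ROOT/default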

Alas, if all this works as normal, then I'd try # zpool export zroot followed by another attempt at an import with recovery options: # zpool import -FnNR /mnt zroot.

The -n makes this a dry run: it doesn't change anything yet, but tests whether it would be feasible to import and fix the pool. If you don't get any weird error messages, then remove the -n and let it run for real.
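Put together, the whole attempt would look something like this (a sketch; /mnt as the altroot is just an example):
Code:
zpool export zroot
# dry run: -F = rewind to an earlier txg, -n = report only, -N = don't mount
zpool import -F -n -N -R /mnt zroot
# if that reports success, repeat without -n:
zpool import -F -N -R /mnt zroot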

(edit): If this works, then definitely expect some data loss, because it'll ignore/roll back the last transactions. I hope that by doing so you'll end up with a responsive pool again. Then you can free up space and see where to go from there. I'd start by removing snapshots which you no longer need.
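Something along these lines (the snapshot name is purely an example - list first, then destroy what you can miss):
Code:
zfs list -rt snapshot -o name,used zroot
zfs destroy zroot@pre-upgrade    # example name; pick a real one from the list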

(edit2): Instead of trying to roll back a snapshot, it might be more feasible to try using # freebsd-update rollback. I'm not sure if this will still work after all this mess, but I think it'll be much more reliable than just rolling back a snapshot and (possibly) risking inconsistencies (an assumption on my end; I don't know for sure because I don't know your filesystem layout).
 
Code:
zpool import -FnNR
Same situation as with a normal (read-write) import - the command never ends. Even "zpool list" from another console never returns. The system is 100% idle (no interrupts). Still 240 MB of RAM free. The zpool process is in the "tx->tx" state with 0% CPU utilization.

Code:
freebsd-update rollback
I can't run this command because I am unable to boot from the broken zroot pool; I am booting from the FreeBSD install LiveCD.
The root filesystem is cd9660, mounted read-only.
 
Code:
zpool import -FnNR
Same situation as with a normal (read-write) import - the command never ends.
Then I'm running out of ideas.

What kind of swap space did you set up, and what is its size? Is it embedded within the ZFS pool or did you set up a separate partition? If it's the latter, can you verify that it's being picked up and used by your rescue environment (swapinfo -h)?

If it isn't being used, then try enabling it (see swapon(8)) before you try to import the ZFS pool. This is just a theory, but I'm wondering if the system is somehow running out of memory, also because 512 MB isn't exactly much and ZFS is very resource-intensive.
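Something like this, assuming the swap partition is ada0p2 (adjust to your own layout):
Code:
swapinfo -h              # is any swap active in the rescue environment?
swapon /dev/ada0p2       # if not, enable the swap partition by hand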

I'm aware of your comment about free RAM but even so... Can't hurt to verify.

Code:
freebsd-update rollback
I can't run this command because I am unable to boot from broken zroot pool, I am booting from FreeBSD Install Live CD.
I know; this is something to consider once you've regained access to your system.
 
Swap turned on (separate swap partition, 1 GB) - no luck. Fortunately, I can restore from backup. Thanks for your assistance (and for all the text).

Lesson learned: never fill a zpool to 100%. It looks like ZFS can no longer commit any transactions once the pool is completely full.
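In the future I'll probably keep a small reservation around so the pool can never be filled completely (just a sketch; the size and dataset name are arbitrary):
Code:
# an empty, never-mounted dataset whose reservation keeps some space free
zfs create -o reservation=1G -o mountpoint=none zroot/reserved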
 