System panic

_martin · Dec 12, 2022

Perfect, almost there. Assuming you have enough free space in /var run this from single mode: /etc/rc.d/savecore start and check the /var/crash afterwards.
edit: actually in single mode your fileset(s) may be in readonly mode. Do you know how to remount it read-write?

Tracker · Dec 12, 2022

_martin said:
Perfect, almost there. Assuming you have enough free space in /var run this from single mode: /etc/rc.d/savecore start and check the /var/crash afterwards.
edit: actually in single mode your fileset(s) may be in readonly mode. Do you know how to remount it read-write?

Ok, so I set readonly=off and ran savecore.

I am finally able to see /var/crash/core.txt.0 !!! Alongside vmcore.last plus a couple of other new files

However, less on that file doesn't work. What do I do with it, and how do I access it from, say, an Ubuntu stick?

_martin · Dec 12, 2022

What does it mean it doesn't work ? That's the text summary of the crash, it should be readable. Copy the whole contents of the /var/crash to that usb stick and share it from there.

_martin · Dec 12, 2022

I never did this before as there was no need, but you can save some time and avoid manual file copying. After a crash, once you are in single user mode, mount the USB key to, let's say, /a and run the savecore command manually: savecore /a /dev/ada0p2. It will save the dump to that directory directly, so it will be on the USB key right away.
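A minimal sketch of that sequence, written as a dry run that only prints the commands unless DRY_RUN=0 is set. The USB device name (/dev/da0s1) and its FAT filesystem are assumptions made for illustration; /a and /dev/ada0p2 are the names used in this thread.

```shell
#!/bin/sh
# Sketch: save a kernel dump straight onto a USB key from single user mode.
# ASSUMPTIONS: USB_DEV and its msdosfs filesystem are hypothetical;
# check the real device name before setting DRY_RUN=0.
USB_DEV="${USB_DEV:-/dev/da0s1}"    # hypothetical USB key partition
DUMP_DEV="${DUMP_DEV:-/dev/ada0p2}" # dump device named in the thread
MNT="${MNT:-/a}"                    # mount point suggested in the thread
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

save_dump_to_usb() {
    run mkdir -p "$MNT"
    run mount -t msdosfs "$USB_DEV" "$MNT"
    # savecore takes the target directory first, then the dump device.
    run savecore "$MNT" "$DUMP_DEV"
    run umount "$MNT"
}

save_dump_to_usb
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 once the device names are confirmed.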

Tracker · Dec 12, 2022

_martin said:
What does it mean it doesn't work ? That's the text summary of the crash, it should be readable. Copy the whole contents of the /var/crash to that usb stick and share it from there.

When I tried less on one of the files it said there was no debugger or something like that. Now trying to copy the files; one of them is 450+ MB as well. Will post the files soon.

_martin · Dec 12, 2022

Ok, you don't have gdb installed. Not a problem. Along with that please can you do cksum /boot/kernel/kernel and post what version of FreeBSD you're running exactly?

Tracker · Dec 12, 2022

_martin said:
Ok, you don't have gdb installed. Not a problem. Along with that please can you do cksum /boot/kernel/kernel and post what version of FreeBSD you're running exactly?

Oops, logged out now - really need to focus on recovering data and getting my system back now.

Here are the files:
"bounds" contains only

1

core.txt.0 contains only

Unable to find a kernel debugger.
Please install the devel/gdb port or gdb package.

info.0 contains only

Dump header from device: /dev/ada0p2
Architecture: amd64
Architecture Version: 2
Dump Length: 477515776
Blocksize: 512
Compression: none
Dumptime: 2022-12-12 12:21:49 +0400
Hostname: toaster
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 13.1-RELEASE-p3 GENERIC
Panic String: VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)

Dump Parity: 3470720052
Bounds: 0
Dump Status: good

info.last contains only

Dump header from device: /dev/ada0p2
Architecture: amd64
Architecture Version: 2
Dump Length: 477515776
Blocksize: 512
Compression: none
Dumptime: 2022-12-12 12:21:49 +0400
Hostname: toaster
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 13.1-RELEASE-p3 GENERIC
Panic String: VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)

Dump Parity: 3470720052
Bounds: 0
Dump Status: good

and then there are the vmcore files (binary, I think), which are 450+ MB

Let me know if this helps fix the system please?

_martin · Dec 12, 2022

The text alone is not enough; a full trace should be provided. But this is very important:

Panic String: VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)

The issue you are having is related to ZFS (the cause of the panic is ZFS) and you need somebody who knows ZFS internals to tell you more (hence the PR).
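For anyone who wants to pull the panic string out of a savecore info file without reading the whole thing, here is a small sketch. The helper name info_field is made up for this example, and the sample lines are copied from the info.0 posted above; on a real system the file would be /var/crash/info.0.

```shell
#!/bin/sh
# Sketch: print the value of one "Key: value" line from a savecore info file.
# info_field is a hypothetical helper name, not part of FreeBSD.
info_field() {
    # $1 = field name (e.g. "Panic String"), $2 = path to info file
    awk -v key="$1" '
    {
        sub(/^[ \t]+/, "")          # ignore leading indentation
        prefix = key ": "
        if (index($0, prefix) == 1) {
            print substr($0, length(prefix) + 1)
            exit                    # first match only
        }
    }' "$2"
}

# Demo on a few lines taken from the info.0 shown in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Dump header from device: /dev/ada0p2
  Architecture: amd64
  Version String: FreeBSD 13.1-RELEASE-p3 GENERIC
  Panic String: VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)
  Dump Status: good
EOF

info_field "Version String" "$sample"
info_field "Panic String" "$sample"
rm -f "$sample"
```

The version string and the panic string together are the two lines most worth pasting into a PR alongside the backtrace.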

freebsd-version -kru
HW details - at least some description.
+stack backtrace and we have all info needed

Tracker · Dec 12, 2022

_martin said:
freebsd-version -kru
HW details - at least some description.
+stack backtrace and we have all info needed

Please check image

Tracker · Dec 12, 2022

So anyone still following this thread - seems like the culprit IS zfs, as many suspected, see this message: https://forums.freebsd.org/threads/system-panic.87387/post-591355

Now trying to recover data:
I'm trying to boot into a later BE (p4) and activated it via beadm activate - however it seems to be chrooting me into p3 only (as shown by uname -a, even after reboot). What am I doing wrong?

(Later BE to recover latest data)

See image for reference

Tracker · Dec 12, 2022

Strange. The BE is set to p4, but after logging in to single user mode (which is the only thing I can do rn) uname reports the p3 version.

Please see image below for this

Why is this happening?

SirDice · Dec 12, 2022

p4 didn't involve the kernel, it only had some userland updates. P5 is also just a couple of userland updates. So a p3 kernel is perfectly normal.

_martin · Dec 12, 2022

I opened PR 268333.

Tracker · Dec 12, 2022

SirDice said:
p4 didn't involve the kernel, it only had some userland updates. P5 is also just a couple of userland updates. So a p3 kernel is perfectly normal.

So p3 is a month old, I'd like to backup from p4/p5 .... How can I make that happen?

Should I try running freebsd-update fetch/install?

Edit: Sorry I'm a bit confused about this. I guess the files/data don't really depend upon p3/4/5 , or do they?

_martin said:
I opened PR 268333.

Thank you! Please let me know if there's anything else you need from me or if there's a solution to my issue!

Tracker · Dec 12, 2022

Ok _martin's advice finally cracked it!!

Basically now zroot/tmp is not mounted.

Next what should I do? Is there a way to fix this zroot/tmp issue for good or do I need to still go ahead and backup coz this might blow up soon?

_martin said:
I see the system is panicking during /tmp cleanup. The idea is to either disable this fileset or create a new one. The thing is, I don't want to touch ZFS too much as we don't know what state it is in. Disabling it, however, should be ok.
In single user mode do zfs set mountpoint=none zroot/tmp and reboot. This dataset will then not be mounted; instead, /tmp in / will be used. This could be the convenience you need to get to the full system and do a backup from there.
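The single user mode steps above can be sketched as a short script. It is a dry run by default and only prints what it would do; the extra zfs get line is an addition for this example so the old mountpoint is on record before it is changed.

```shell
#!/bin/sh
# Sketch: stop a suspect dataset from being mounted, as suggested in the thread.
# Dry run by default; set DRY_RUN=0 to really execute the commands.
DATASET="${DATASET:-zroot/tmp}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_dataset_mount() {
    # Show the current mountpoint first so it can be restored later
    # (addition for this example, not part of the original advice).
    run zfs get -H -o value mountpoint "$1"
    # The least invasive change: the dataset stays intact, it is just not mounted.
    run zfs set mountpoint=none "$1"
    run reboot
}

disable_dataset_mount "$DATASET"
```

To undo it later, the mountpoint property can be set back to the value printed by the first command.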

SirDice · Dec 12, 2022

Can you rename it? Or would that blow up ZFS? zfs rename zroot/tmp zroot/tmp.broken
If that works I would create a new tmp: zfs create -o mountpoint=/tmp zroot/tmp
After it's been mounted make sure to chmod 1777 /tmp as it needs the sticky(7) bit there.

Or you can just leave it as-is. It just means /tmp ends up in zroot/ROOT/default

_martin · Dec 12, 2022

I told him not to touch ZFS as much as possible. Maybe there are more issues there anyway but this way he's in full environment and can do a backup. Setting mount point to none is the least invasive approach.

He doesn't need to do anything to /tmp directory. FreeBSD "self-healing" /etc/rc.d/tmp does take care of it.

Jose · Dec 12, 2022

_martin said:
The thing is - we are all guessing. We don't know what's happening.

Yup. I've gleaned from reading too many of the OP's posts that it's an older system prone to overheating. My stab in the dark is that some aging component has started to fail, but only shows symptoms when the system overheats. Not too many paths forward besides new hardware.

I admire the time and effort you and others have spent trying to save the OP's data, though.

Tracker · Dec 12, 2022

Somehow I'm not able to mount the other disk that I need to backup to, after doing

 geli attach /dev/da0p3

Enter passphrase:

sudo mount /dev/da0p3.eli /mnt

mount: /dev/da0p3.eli: No such file or directory

I see the eli active though.

Also the data seems to have taken a hit: Firefox won't start without asking me to create a new profile, when I had multiple windows running. And Chrome won't even start. That's where I had some of my important stuff.

Jose said:
OP's posts that it's an older system prone to overheating.

What specifically gives it away that it has overheating issues?

Jose said:
I admire the time and effort you and others have spent trying to save the OP's data, though.

Definitely. All of them are rockstars for having gone out of their way to help me

Even though my data seems to have taken a hit

Jose · Dec 12, 2022

Tracker said:
What specifically gives it away that it has overheating issues?

Tracker said:
...this machine is pretty old so I guess that should be reasonable? Was already having temp issues when overloaded...

Also the reboots during compilations are pretty typical effects of overheating.

covacat · Dec 12, 2022

looks a lot like https://support.oracle.com/knowledge/Sun Microsystems/2421977_1.html
just i can't remember my larry support account

_martin · Dec 12, 2022

covacat: Sounds interesting, even more so that the KB is not that old. Sadly I don't have valid MOS either.

It's up to you how you decide to do a backup; there are more ways to skin a cat. I would opt for a filesystem backup using rsync and would not do zfs send. I mean, since you do have a corrupted pool, the issue is there one way or the other. It would be my personal preference though.

In a private chat you mentioned the disk you're using is somewhat of a backup of the original one. Make sure you don't have pools with the same name on both disks.
If da0p3.eli doesn't exist after you entered the passphrase, then you didn't enter the proper one. Syslog (/var/log/messages) might give you more information about that.

_martin · Dec 12, 2022

Actually, I do have MOS support. I won't blindly copy-paste the contents of the link here though.
The suggested solutions were actually mentioned here already (scrub). If that fails, a restore is needed.

I went through the pictures you shared here again. The one where you share zpool status -v (those 3 errors) is important. This picture is what led me to the suggestion to disable the zroot/tmp dataset in the first place. I suggest you attempt to clean it up this way.

a) chromium: I'm not sure how much data you have there (bookmarks, saved passwords, etc.) but I'd rather have chromium recreate everything from scratch. As root, with chromium not running, do this (purposely split into two commands):

Code:

cp -rp /usr/home/c1utt4r/.config/chromium /var/crash
rm -rf /usr/home/c1utt4r/.config/chromium

b) zroot/tmp: As mentioned before, you could probably remove zroot/tmp and recreate it. But this is something I'd rather do _after_ you have the backup done.
Interesting point: if you can't fix metadata on a dataset you should restore the whole pool, i.e. don't trust the pool at all.

You had reported issues only on a) and b) so I'd say your data are still safe. And as you don't have any other means of backup this is the only option for you.

Tracker · Dec 12, 2022

covacat said:
looks a lot like https://support.oracle.com/knowledge/Sun Microsystems/2421977_1.html
just i can't remember my larry support account

This is the link that the error output points to: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/ .... seems like a metadata level corruption .... not sure how to get rid of it .... but first I guess I need to salvage whatever data remains

_martin said:
If da0p3.eli doesn't exist after you entered passphrase you didn't enter a proper one then. Syslog (/var/log/messages) might give you more information about that.

Passphrase is correct - it attaches itself but it doesn't mount. Here is the output to show

 

sudo zdb -l /dev/da0p3.eli
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 3557598
    pool_guid: 10535025700179738651
    hostid: 2647270205
    hostname: ''
    top_guid: 1525963299974165836
    guid: 1525963299974165836
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1525963299974165836
        path: '/dev/ada0p3.eli'
        phys_path: 'id1,enc@n3061686369656d30/type@0/slot@3/elmdesc@Slot_02/p3/eli'
        whole_disk: 1
        metaslab_array: 67
        metaslab_shift: 31
        ashift: 12
        asize: 311476617216
        is_log: 0
        DTL: 284
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3

_martin said:
It's up to you how you decide to do a backup, there are more ways to skin a cat. I would opt for filesystem backup using rsync and would not do zfs send. I mean as you do have corrupted pool issue is there one way or the other. It would be my personal preference though.

I was hoping to use zfs for file permissions, etc being the same, and possibly easier. If data is corrupted (as it seems) maybe zfs is a better option than rsync ?

_martin · Dec 12, 2022

If that eli device holds a ZFS pool you need to import it; you can't mount it as a regular FS (you are actually mounting it as FFS, as that's the default fs for FreeBSD).
Also, as I had suspected, that pool is also named zroot. If you run zpool import you should see the pool.

I'm not particularly proud of editing my posts but I noticed this:

Code:

zdb -l /dev/da0p3.eli 

path: '/dev/ada0p3.eli'

You didn't explain how you got to that disk but pay attention. It seems those are clones of some sort -- you can make a mess if you try to import it.
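The check above (label path versus the device actually inspected) can be done read-only before any zpool import is attempted. Below is a sketch; label_field and check_label are made-up helper names, and the sample label lines are taken from the zdb output posted earlier. Note that a mismatch only shows the pool was last used under a different device name: it may be a clone, or just a disk that was moved to another port, so treat the warning as a prompt to look closer rather than a verdict.

```shell
#!/bin/sh
# Sketch: compare the device path stored in a ZFS label with the device
# that was actually inspected. Reads `zdb -l` output; changes nothing.
# label_field and check_label are hypothetical helper names.

# Print the first "key: value" from zdb -l output on stdin, quotes stripped.
label_field() {
    awk -v key="$1" '
    {
        sub(/^[ \t]+/, "")
        prefix = key ": "
        if (index($0, prefix) == 1) {
            print substr($0, length(prefix) + 1)
            exit
        }
    }' | tr -d "'"
}

# $1 = device that was passed to zdb -l, stdin = its output.
# Returns 1 when the label path differs from $1.
check_label() {
    label=$(cat)
    name=$(printf '%s\n' "$label" | label_field name)
    guid=$(printf '%s\n' "$label" | label_field pool_guid)
    path=$(printf '%s\n' "$label" | label_field path)
    echo "pool name:  $name"
    echo "pool guid:  $guid"
    echo "label path: $path"
    if [ "$path" != "$1" ]; then
        echo "WARNING: label was written for $path, not $1"
        return 1
    fi
    return 0
}

# Demo with lines from the label shown in this thread.
check_label /dev/da0p3.eli <<'EOF' || true
    name: 'zroot'
    pool_guid: 10535025700179738651
    hostname: ''
        path: '/dev/ada0p3.eli'
        phys_path: 'id1,enc@n3061686369656d30/type@0/slot@3/elmdesc@Slot_02/p3/eli'
EOF
```

On the real system the input would come from zdb -l /dev/da0p3.eli piped into check_label /dev/da0p3.eli.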
