System panic

Ok so memtest86 results after running for 5 hrs+ ..... Think that's enough.

Was hoping to find some errors with RAM. The theory of memory hog Chrome instance starting to crash + ports compiling causing reboot made some sense. But the memtest86 doesn't seem to show any issues with memory. Check image below

Is this definite enough to conclude RAM is fine?

Will try some other steps and report back
 

Attachments

  • IMG20221210211858.jpg
Try to comment out the line
Code:
kld_list="/boot/modules/i915kms.ko"
Then
Code:
pkg update -f
pkg install -f gpu-firmware-kmod
Update as I promised on this - was able to install and comment it out. Same issue. Panic and not booting.

Will try to have an empty/zfs enable rc.conf next
 
Dear Tracker,
Before buying a new machine I would at least try a reinstall of 13.1 - you never know.
I would follow the FreeBSD-12 instead. FreeBSD-12.4 will be supported for quite some time. Then the smoke on ZFS might have settled. There is nothing wrong tracking a mature release - if there is no other show stopper. At least I will do so.
Kind regards,
Christoph
 
Will try to have an empty/zfs enable rc.conf next
Ok this is interesting! A blank rc.conf allowed me to log in as a normal user! But without zfs enabled I don't think I'll be able to mount the filesystem. Will try with zfs enabled again and report. Need to go back into single user mode again FML 🤣

But seems like we are getting closer to finding the cause?
I would follow the FreeBSD-12 instead. FreeBSD-12.4 will be supported for quite some time. Then the smoke on ZFS might have settled. There is nothing wrong tracking a mature release - if there is no other show stopper. At least I will do so.
Yes this makes sense, although with the above blank rc.conf I was able to login. So maybe will keep this option as backup if I hit a dead end later.
 
Ok very weird. A blank rc.conf lets me log in to multiuser mode

While an rc.conf with only
zfs_enable="YES"

Causes the panic and doesn't boot!

Could zfs be the culprit? Seems like it but not sure how to go ahead
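
The narrowing-down above (toggling one rc.conf line at a time) can be scripted. This is only a minimal sketch: `comment_out_var` is a hypothetical helper name, not a FreeBSD tool, and it assumes the file is writable (in single user mode the filesystem may still need readonly=off first).

```sh
# Sketch only: comment out every uncommented assignment of variable $1
# in the rc.conf-style file $2. The helper name is hypothetical.
comment_out_var() {
    var=$1
    file=$2
    # Temp file next to the target; explicit template works with BSD and GNU mktemp.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Prefix matching lines with '#'; already-commented lines do not match.
    sed "s/^\([[:space:]]*\)\(${var}=\)/\1#\2/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # cat instead of mv keeps the file's mode and owner
    status=$?
    rm -f "$tmp"
    return $status
}

# Hypothetical usage:
#   comment_out_var zfs_enable /etc/rc.conf
```

Keeping a copy of the full rc.conf (as with rc.conf.backup in this thread) before editing makes it easy to go back.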
 
Restored rc.conf.backup to rc.conf, as it was earlier with all the values. Then went on to _comment out_ zfs_enable="YES" and now I can even see the GUI login screen as earlier! Woohoo.

But it doesn't allow me to login because zfs is the filesystem :/ (failed to execute login)

Now I need to figure out how to fix zfs I guess? Any pointers?
 
Just some random commands, check if nothing unusual.
Code:
zpool status -x
zpool list -v
zfs list
I was actually using `zfs list` output, all this while, to set readonly=off variable to be able to edit files in single user mode.

This command actually gives some errors that are related to Chrome!!! Had a sneaky feeling something had to do with Chromium

See attached image below
zpool status -v
It asks to restore the file in question if possible or to restore the entire pool from backup. What should I do? 🤔
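
To compare the flagged files between reboots and scrubs, the list can be pulled out of the status text. A minimal sketch, assuming the list follows a line containing "Permanent errors have been detected" as in the message quoted in this thread; `list_permanent_errors` is a hypothetical helper name.

```sh
# Sketch only: print the entries listed under "Permanent errors" in
# `zpool status -v` output. Reads the status text on stdin, so it also
# works on output saved to a file. The function name is hypothetical.
list_permanent_errors() {
    awk '
        /^[ \t]*pool:/ { in_list = 0 }                      # a new pool section starts
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list && NF { sub(/^[ \t]+/, ""); print }         # trimmed, non-empty entries
    '
}

# Hypothetical usage on the pool from this thread:
#   zpool status -v zroot | list_permanent_errors
```

Saving the result after each scrub (for example to a file outside the affected datasets) makes it easier to see whether entries really come and go.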

Did you install a ZFS development version, perchance? Either sysutils/openzfs-kmod or sysutils/openzfs.

(Thanks to Erichans for teaching me about these.)
No I had the standard zfs, might have switched automatically if freebsd changed it with versions 12.x to 13.1.

I however remember doing some operations with zfs and asking about it when it wasn't working earlier. Maybe I messed up something then (however it worked fine for a couple of months) that's come back to bite me now 🤔
 

Attachments

  • IMG20221210223013.jpg
Code:
zpool scrub -w zroot
May take a lot of time ...
Thanks. Started the scrub without w option but can see it progressing under `zpool status`

Quick question:
1) It shows Chromium files and locations in the image in my last reply. Would the scrub basically be deleting those files? Could I have done whatever scrub would do with those files manually?

2) How does this kind of problem arise in the first place? Chrome(ium) interfering and corrupting zfs pool? 🤔
 
Also, most important question: after the scrub, setting zfs_enable="YES" in rc.conf should get things back to normal? I hope

UPDATE: Scrub finished, but it's still showing errors: Permanent errors have been detected in the following files.....like earlier

Should I do zfs_enable again or do I need to do something else?
 
So I tried enabling `zfs_enable="YES"` in rc.conf again after scrub.

Tried rebooting and same issue again! Won't boot in multiuser mode, panic! ☹️
 
Memtest was successful, you did let it run for more than enough time.

Looking at the stacktrace you provided, it's a bit interesting. It seems the picture doesn't show all the information (the screen was scrolled); there seems to be an issue before frame 5. Reading the output from the bottom up, after the warning you can already see the vpanic() function. What's also interesting is that the process that triggered it is rm. That's weird.

I tried to think about it, but without a dump to give more information I don't know. When in single user mode, are there any other messages prior to the crash? Any MCA errors, etc.
 
If you configure the dump and you're willing to share it I can do the bureaucracy, I'll open a PR. If you are up to it we should trigger the crash as close to GENERIC without any additional modules as possible. First:
a) can you confirm you're running GENERIC kernel? i.e. you didn't compile it yourself.

If yes we can move to point 2, and that is triggering the crash without the compiled drm module, with zfs only. Can you comment out everything in rc.conf and leave it with the zfs_enable="YES" only? Are you able to trigger the crash this way? If so, please share the stacktrace of the failure.

Third point: configure dump device. Do you have swap partition on your system? If you're not sure please show us the gpart show so we can check. If yes we need to do what I mentioned above. Once you have a crash in /var/crash we are ready to open PR.
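
For the dump device step, a minimal sketch of the rc.conf lines usually involved. It assumes a swap partition exists and is listed in /etc/fstab (which `gpart show` and the fstab itself would confirm); it is not a verified config for this machine.

```sh
# Sketch only: candidate /etc/rc.conf lines for saving a crash dump.
dumpdev="AUTO"         # use the first suitable swap device from /etc/fstab as dump device
dumpdir="/var/crash"   # directory where savecore(8) stores the dump on the next boot (the default)
```

Note that /var/crash has to be writable on the boot after the panic for the dump to be saved there.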
 
Ok so memtest86 results after running [....]

Is this definite enough to conclude RAM i

No, memtest doesn't stress the RAM enough. I use SuperPi to establish that the RAM has the right timings etc (Linux binary, but runs in Linuxulator).

And mprime/prime95 to establish the CPU is OK.
 

Seems multiple runs of scrub and clear may be required.
Hmm. So I rebooted the system again in single user mode and the chromium files that were showing as permanent errors have disappeared automatically. Now the only permanent error is this

zroot/tmp:<0x3>

Trying to run scrub again now. Not sure about clear - should I be running it?
 
After mount of zroot/tmp
Everything which is in /tmp can be safely deleted.
Code:
rm -fR /tmp/* /tmp/.??*
& run scrub again.
Perform
Code:
zpool import zroot
zfs mount -a
verify
/boot/loader.conf
/etc/rc.conf
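
One way to do the "verify" step is to check that the ZFS switches are still present and uncommented in both files. A minimal sketch, assuming a root-on-ZFS install that loads the module via zfs_load="YES" in loader.conf; `has_yes` and `check_zfs_conf` are hypothetical helper names.

```sh
# Sketch only: succeed if variable $1 is set to YES on an uncommented line of file $2.
has_yes() {
    grep -Eq "^[[:space:]]*$1=\"?[Yy][Ee][Ss]\"?" "$2"
}

# Report the two ZFS switches; file paths default to the ones named in the thread.
check_zfs_conf() {
    loader_conf=${1:-/boot/loader.conf}
    rc_conf=${2:-/etc/rc.conf}
    if has_yes zfs_load "$loader_conf"; then
        echo "zfs_load: set"
    else
        echo "zfs_load: missing"
    fi
    if has_yes zfs_enable "$rc_conf"; then
        echo "zfs_enable: set"
    else
        echo "zfs_enable: missing"
    fi
}

# Hypothetical usage:
#   check_zfs_conf
```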
 
No, memtest doesn't stress the RAM enough. I use SuperPi to establish that the RAM has the right timings etc (Linux binary, but runs in Linuxulator).

And mprime/prime95 to establish the CPU is OK.
This is interesting. Will try to keep in mind. Hopefully I solve this issue for now by scrubbing but yes crashing upon compiling earlier does make me still wonder if it could be a RAM issue.
If you configure the dump and you're willing to share it I can do the bureaucracy, I'll open a PR. If you are up to it we should trigger the crash as close to GENERIC without any additional modules as possible. First:
a) can you confirm you're running GENERIC kernel? i.e. you didn't compile it yourself.

If yes we can move to point 2, and that is triggering the crash without the compiled drm module, with zfs only. Can you comment out everything in rc.conf and leave it with the zfs_enable="YES" only? Are you able to trigger the crash this way? If so, please share the stacktrace of the failure.

Third point: configure dump device. Do you have swap partition on your system? If you're not sure please show us the gpart show so we can check. If yes we need to do what I mentioned above. Once you have a crash in /var/crash we are ready to open PR.
Thanks for this offer. I'll see if the issue doesn't get resolved then maybe will go down this route. For now my primary objective is just to get it running.

Yes I think I'm using GENERIC although the boot menu shows 2 options, probably something I must have tried to compile long back as a learning exercise.

The zfs_enable causes issues and panic when enabled in rc.conf as the only line in it as you suggested.
 
After mount of zroot/tmp
Everything which is in /tmp can be safely deleted.
Code:
rm -fR /tmp/* /tmp/.??*
& run scrub again.
Perform
Code:
zpool import zroot
zfs mount -a
verify
/boot/loader.conf
/etc/rc.conf
Ok this is very strange. I just mentioned that the tmp file was the only one showing. However it seems like the Chromium files have reappeared under permanent errors, unsure if it happened after the scrub but as far as I can tell it seemed to not be there before as I mentioned earlier. Strange because Chromium was obviously not run since the system is unable to run.

Should I just delete the Chromium files along with the tmp file, as you suggested earlier? Although that Chromium instance might have my new data
 
Ok this is very strange. I just mentioned that the tmp file was the only one showing. However it seems like the Chromium files have reappeared under permanent errors, unsure if it happened after the scrub but as far as I can tell it seemed to not be there before as I mentioned earlier. Strange because Chromium was obviously not run since the system is unable to run.

Should I just delete the Chromium files along with the tmp file, as you suggested earlier? Although that Chromium instance might have my new data
I am not a zfs pro, but from the link I sent you, I have the impression that the 'clear' is required to reset the error stats.
Anyhow if it is just a Chrome directory, why not delete the whole directory and scrub/clear once after this?
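
The scrub/clear pass suggested here can be written as one sequence. A minimal sketch using only the commands already mentioned in this thread; `scrub_cycle` and the `ZPOOL` override are hypothetical names, and as noted earlier more than one pass may be needed before the error list settles.

```sh
# Sketch only: one scrub/clear pass over a pool, then print the status.
# ZPOOL is a hypothetical override so the sequence can be dry-run with `echo`.
ZPOOL=${ZPOOL:-zpool}

scrub_cycle() {
    pool=$1
    "$ZPOOL" scrub -w "$pool" &&     # -w waits until the scrub has finished
        "$ZPOOL" clear "$pool" &&    # reset the pool's error counters
        "$ZPOOL" status -v "$pool"   # list any files still flagged
}

# Dry run that only prints the commands:
#   ZPOOL=echo; scrub_cycle zroot
```

Deleting or restoring the flagged files first, as discussed above, is what gives the following scrub a chance to come back clean.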
 
It's apparently oscillating between showing zroot/tmp:<0x3> plus the Chromium files as permanent errors and showing just zroot/tmp:<0x3> again

That suggestion to delete everything in tmp - when I run ls on /tmp it seems to show a couple of files with BE names matching recent BEs as well.

And that 0x3 in brackets just isn't there at all under tmp directory.
 