synth/pkg causing kernel panics and reboots

Well, this is something I've not run across before. Yesterday I decided to do a synth upgrade-system, then went out for a while. When I came back my system had rebooted; I just assumed there had been a power outage. But trying to do anything resulted in a bunch of "this and that not found" errors. So I ran synth again and sat there and watched. When it came to installing the packages everything looked fine, then it started installing a package and BAM, kernel panic and instant reboot...too fast for me to see.

So I tried to pkg add each new package, but each package said it was already installed, yet executing anything gave missing lib errors. So I had to force add, pkg add -f <pkg name>. Everything went well for 30 or 40 packages, then BAM, I force added a package and got a kernel panic and reboot. Assuming that was the problem package, I moved it aside and tried to install the remaining packages. Now because I'm lazy and don't want to type pkg add -f for 322 packages (the number of new packages synth built), I just ran pkg add -f * in the synth/live_packages/All dir. Again, everything was going fine...then BAM, kernel panic and reboot, but this time it was on a package that had previously installed fine. So at this point I'm stuck. Other than a fresh install I'm out of ideas. It's been about 20-something hours now.
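
Since the panic is too fast to see which package was being installed, one option is to install the packages one at a time and record each name on disk before the install starts. The following is only a rough sketch, not something tested on the affected machine; the names install_logged, LOG and INSTALL_CMD are made up for this example, and it assumes sync gets the log line onto disk before the panic hits.

```shell
#!/bin/sh
# Sketch only: install packages one at a time and log each name to disk
# BEFORE the install starts, so the last "start" line survives a panic.
# LOG and INSTALL_CMD are hypothetical names chosen for this example.
LOG="${LOG:-/var/tmp/pkg-install.log}"
INSTALL_CMD="${INSTALL_CMD:-pkg add -f}"

install_logged() {
    for p in "$@"; do
        echo "start $p" >> "$LOG"
        sync                      # try to flush the log line before the risky step
        if $INSTALL_CMD "$p"; then
            echo "done  $p" >> "$LOG"
        else
            echo "FAIL  $p" >> "$LOG"
        fi
        sync
    done
}
```

Run from the synth/live_packages/All dir, this would be called as install_logged * after sourcing the file. After a reboot, the last "start" line without a matching "done" line names the package that was being installed when the panic happened. If that name changes from run to run, the package itself is probably not the cause.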

System: FreeBSD 11 amd64. Root file system is UFS, not ZFS, because I thought the overhead would be too much; root is only a 128GB SSD. /home and /usr/ports are on a different non-SSD drive. AMD FX-6100 processor, 16GB RAM.
 
Random lockups or panics are typically caused by bad memory. As long as the bad bit isn't used things work but as soon as something tries to read/write to the bad memory things go haywire. If you get a panic during a disk write the filesystem could end up being corrupted. This can result in the weird missing libraries (filesystem checks can only repair fairly simple errors, so you often end up with missing files).
 
Tested the RAM and it's fine...but I was thinking it might be the SSD. Any suggestions for testing an SSD? I did buy it used, so I'm not sure how many hours were on it.
 
Testing the SSD is a bit tricky because you don't want to write a lot of random data to it. But reading the disk shouldn't be too much of a burden; something like dd if=/dev/da0 of=/dev/null will read the whole disk end-to-end. Any read errors would obviously be bad. You'll also want to run smartctl(8) (sysutils/smartmontools) to view the disk's SMART data.
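
As a rough sketch of the read test and SMART check described above: the function names read_test and smart_report are invented for this example, and /dev/da0 is only the example device from the post (the real device name may differ). The block size is given in bytes so the same line works with different dd implementations.

```shell
#!/bin/sh
# Sketch only: read a device (or file) end-to-end and discard the data.
# The return value is dd's exit status; non-zero means a read failed.
read_test() {
    dd if="$1" of=/dev/null bs=1048576
}

# Print the SMART data for a device.
# Needs sysutils/smartmontools installed and root privileges.
smart_report() {
    smartctl -a "$1"
}
```

For example, read_test /dev/da0 followed by smart_report /dev/da0, both as root. Any read errors should also show up on the console and in the dmesg output.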
 
Yes, I ran smartmontools and the manufacturer's utility, and every test says the hardware is fine. So I guess I'm stuck with the reinstall option. Thanks.
 
Also make sure it's not some timing problem due to using wrong settings in the BIOS/UEFI. Especially when it comes to DRAM timing settings and clock multipliers. Those could also result in random errors.
 
I would still try to do a write test to that SSD. It would be a shame to reinstall only to find out it still had problems. You're not sacrificing much of the device's life; they are pretty resilient.
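
A minimal, non-destructive way to do a write test is to write a file of random data onto a filesystem on the SSD, read it back, and compare checksums. This is just a sketch under that assumption; write_verify is a made-up name, and it writes an ordinary file rather than touching the raw device.

```shell
#!/bin/sh
# Sketch only: write_verify DIR [MB]
# Writes MB mebibytes of random data to a file in DIR, reads it back,
# and compares checksums. Returns 0 if they match.
write_verify() {
    dir="$1"
    mb="${2:-64}"
    dst="$dir/write-test.$$"
    # Checksum the data in flight while tee writes a copy to the disk.
    want="$(dd if=/dev/urandom bs=1048576 count="$mb" 2>/dev/null | tee "$dst" | cksum)"
    sync
    # Checksum what actually comes back from the file.
    got="$(cksum < "$dst")"
    rm -f "$dst"
    [ "$want" = "$got" ]
}
```

For example: write_verify / 4096 && echo OK. Note that a small file will most likely be read back from the cache rather than from the SSD, so a size larger than the installed RAM gives a more meaningful result.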
 
Well...every test I did said the hardware is OK, SSD included. So I installed TrueOS on a 2nd SSD I had lying around just to get up and running. Hating TrueOS...I'll probably try to get the FreeBSD drive working tomorrow...gonna go watch some 3rd division US soccer today to clear my mind :)
 
Random lockups or panics are typically caused by bad memory.
True. But if it is a trap 12 always at the same address it isn't random. It is either a defective memory module (but normally several addresses will be affected, so this isn't likely) or you've hit a kernel bug. Any place the kernel expects that data might not be valid has either an ASSERT or a test for NULL before it tries to use the data. Even a corrupted filesystem shouldn't panic an already-booted kernel (the loader and friends get a pass because of the environment they're in). If it does, and it is reproducible - even with a corrupted filesystem - it is a bug and should be reported so the kernel can add it to the list of places it checks for invalid data.
 
The kernel panics happen during the install of the packages that have been built. Synth builds all 322 packages successfully. I have manually installed the created packages using pkg, and the panic happens on different packages. Given that, it would seem hardware related, but all the tests say everything is fine. I'll enable kernel debugging tonight and see if I can get any useful info from that before I wipe this drive and start from scratch.
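
For reference, a minimal sketch of what enabling crash dumps usually looks like on FreeBSD, assuming a swap partition is configured and large enough to hold the dump. The file names in the comments assume this is the first dump saved on the machine.

```shell
# Additions to /etc/rc.conf (the file uses sh variable syntax)
dumpdev="AUTO"        # write kernel crash dumps to the configured swap device
dumpdir="/var/crash"  # where savecore(8) stores the dump on the next boot

# After the next panic and reboot, as root, something like:
#   cat /var/crash/info.0
#   kgdb /boot/kernel/kernel /var/crash/vmcore.0
# Typing "bt" at the kgdb prompt prints the backtrace of the panic.
```

The panic message and backtrace from the dump should show whether it is the same trap at the same address every time, which is the information needed for a bug report.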
 