"All buffers synced" hang on FreeBSD9

chrcol · May 9, 2012

I updated a server to 9 and perhaps shouldn't have done the installworld as now I can't roll back.

On a 9.x kernel it hangs for ages on

Code:

All buffers synced

when it does eventually reboot it then gets

Code:

improper unmount on /

On 8.x no such issue.

It uses a 3ware device for hardware RAID1.

If I hit ctrl-alt-del during the hang it seems to rerun the shutdown script generating various errors as everything is already unmounted. Errors include going into single user mode due to

Code:

/bin/sh on /etc/rc.shutdown terminated abnormally

and

Code:

some processes would not die; ps axl advised

After that there are very long waits for the system processes (vnlru, bufdaemon, syncer, etc.) to stop. It doesn't actually drop into single user mode, so I can't run the ps command.

Terry_Kennedy · May 14, 2012

chrcol said:
It uses a 3ware device for hardware RAID1.

This could be a twa(4), twe(4), or tws(4) device, depending on the controller model. You might want to check the LSI site to see if there's a firmware update available for your controller. Also, I believe tws is new in FreeBSD 9.0 (on 8.2-RELEASE a driver from LSI needed to be downloaded and installed). A quick look doesn't show any critical post-9.0 changes to tws, though.

Ben · Jun 13, 2012

I have the same problem but no special controller. But it seems it hangs randomly. Any ideas?

kpa · Jun 13, 2012

I've seen this too on recent versions of 9-STABLE. My first suspicion was the USB memory sticks I was using as the root filesystem, but I then switched to Root on ZFS without any USB mass storage devices and it still kept happening. Just today I rebooted my system to install the latest OS updates and it mysteriously rebooted just fine. There's usually a pattern: if you reboot again soon after bringing the system up, it reboots without a hang; with longer uptimes it usually hangs.

Ben · Jun 13, 2012

I have 9-RELEASE on ZFS. As you describe, immediate reboot works, after long uptime it won't.

Have you tried working on ACPI settings? I want to try

Code:

hw.acpi.disable_on_reboot
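
For anyone who wants to try the same thing, here is a minimal sketch of how that knob could be set persistently. It assumes the stock acpi(4) sysctl and a value of 1 (disable ACPI just before rebooting). Whether it helps with this particular hang is untested.

Code:

# /etc/sysctl.conf
# Assumption: 1 asks the kernel to turn ACPI off right before rebooting.
hw.acpi.disable_on_reboot=1

The same value can be applied to the running system with sysctl hw.acpi.disable_on_reboot=1 before the next reboot.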

Remilia · Aug 10, 2012

Bumping this one. Just updated from 8-stable to 9-stable, and my server doesn't reboot normally any more (it only reboots normally if the uptime was negligible, less than 15 or even 10 minutes). Half of the time it doesn't even react to Ctrl-Alt-Del or Ctrl-C and never gets to the watchdog timer finishing, requiring a hardware reset.

This is extremely inconvenient as every reboot (OS update, kernel rebuild, etc) now requires contacting the data-centre staff so they actually press the reset button.

Hardware is a Q9400 on a Q33 motherboard by Gigabyte with AHCI enabled (using /dev/ada* for SATA drives).

Software is FreeBSD 9-stable r239151 (9.1-PRERELEASE), ZFS-enabled, but everything except /usr/src and jails is on UFS.

The system was built clean from a fresh svn checkout with make buildworld / make kernel / make installworld, mergemaster done, jails recreated from scratch (I thought it was my jail setup ruining the shutdown process, so I took the time to remake and rebuild all of them).

I didn't merge the new hast user, though, but I doubt it should affect the shutdown process.

xtaz · Aug 11, 2012

I recently built a server from scratch, updated it to the very latest RELENG_9_1 code which currently says 9.1-PRERELEASE. And I see something slightly different. When mine tries to unmount / it fails and comes up with some error text which mentions "dirty" in it and it then hangs the system on a "press a key to reboot" message or something like that. If I then power cycle it it says root wasn't dismounted properly, and recovers from journal etc.

I'm using 9.1 amd64 with softupdates journalling enabled. I was going to copy down the error text from it and post it into the mailing lists but I haven't had time to do it yet.

Stochastix · May 29, 2013

Just adding an entry to say that this problem hasn't gone away. I had frequent reboot hangs of this nature this morning while testing an upgrade from 9.0 to 9.1. This is a fairly generic box. The board is GA-EP45-UD3L (CoreDuo) with an Intel 330 SSD system drive (ZFS root), and two SATA enterprise drives in a ZFS mirror. There's no UFS whatsoever. Network interfaces are re0 and fxp0. The box is running bind and Postgres/Apache/PHP. That's about it.

I had one such freeze after just two minutes of uptime: the kernel update portion of /usr/sbin/freebsd-update.

The system drive was actually cloned off another installation from a ZFS mirror. Never saw this freeze on the parent box, a low-end Xeon on a Supermicro board with 32 GB ECC, running the same drives and the same stack. On that box the interfaces are em0 and fxp0.

I'm guessing it's some kind of BIOS quirk.

xtaz · May 29, 2013

Old thread, but just noticed my post I made at the bottom of this thread last year. I found that my specific problem was related to the mail/postgrey port. If that had been running on the system at any time, even if it was stopped, then the OS wouldn't shut down properly with a clean unmount. The others in this thread probably have a different issue to this though.

Terry_Kennedy · May 30, 2013

xtaz said:
I found that my specific problem was related to the mail/postgrey port. If that had been running on the system at any time, even if it was stopped, then the OS wouldn't shut down properly with a clean unmount.

I suspect that's just triggering the actual low-level problem (which means there are probably other ways to trigger it). By the time the system gets to the "syncing buffers" state (particularly if it did not report "some processes would not die; ps axl advised"), there should not be any user activity / user dirty buffers left to impede shutdown / restart.