Reboot hangs at 'All buffers synced' after freebsd-update of FreeBSD 9.0 to -p3 on ZFS root

Hello FreeBSD community,


I have a problem with FreeBSD 9.0-RELEASE installed on a ZFS mirror, following the instructions here:
http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/9.0-RELEASE


Immediately after installation I run:
Code:
# freebsd-update fetch
# freebsd-update install


The update is always successful, but when I invoke any of:
Code:
# reboot / # shutdown -r / # halt / # poweroff / etc.
the system tries to reboot, but stops at the
Code:
All buffers synced
message.


The funny thing is that the server still responds to pings, prints messages if I attach a USB stick, etc.


This problem doesn't take place if FreeBSD is installed "normally", on UFS.

It also doesn't occur when I remove 'world' from Components in /etc/freebsd-update.conf
(so the line looks like this):
Code:
Components src kernel
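
To double-check which components freebsd-update will actually touch, something like the following sh sketch could be used. The function name active_components is made up for illustration, and it assumes the usual layout of the file: a single active "Components ..." line, with "#" starting a comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of freebsd-update): print the components
# named on the first active, uncommented "Components" line of a
# freebsd-update.conf-style file.
active_components() {
    awk '$1 == "Components" { $1 = ""; sub(/^ +/, ""); print; exit }' "$1"
}

# Example, run by hand:
#   active_components /etc/freebsd-update.conf
```

With the line shown above, this should print just the components that are left after removing 'world'.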


The problem also occurs when I invoke
Code:
freebsd-update rollback


---

I searched a lot, but found nothing related to freebsd-update.


Thank you for your time,
Mark
 
Two more things:

1) Interestingly, after running freebsd-update fetch we get a list of files to be updated, with this message:
Code:
The following files will be updated as part of updating to 9.0-RELEASE-p4
but after the reboot we've got the -p3 version.


2) The problem is very easy to reproduce: create a virtual machine in VirtualBox with two disks, install using the instructions I used (http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/9.0-RELEASE), reboot after installation, log in as root, and run:
Code:
freebsd-update fetch
freebsd-update install
reboot
The VM will hang at the 'All buffers synced' message. Maybe it is a bug?


I will try with 9.1-RC1 now and share the results.
 
OK, I tried with 9.1-RC1, but there are no updates for it, so I don't know whether the problem exists there.

I would rather have put all of this in one message, but I cannot edit my posts, so feel free to merge them.
 
The problem is still open! (just to be clear)

Please test it in your environment, or tell me where I should report this bug.

Thank you.
 
I am also experiencing the same symptoms where the reboot process hangs at All buffers synced, but this is on a physical machine.

I got to this stage after re-installing my FreeBSD box with ZFS on root, following the HOWTO below:

http://www.freebsdwiki.net/index.php/ZFS,_booting_from

This is also occurring when performing a freebsd-update fetch, freebsd-update install, and a reboot following that.

System is running 9.0-RELEASE on an Intel Xeon.
 
As a follow up, I find it ironic that I was performing an update from 9.0-RELEASE to 9.0-p4 (as per freebsd-update fetch output)... and upon restarting the system, the output in uname -a states that the system is running at 9.0-p3, not p4.
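
For anyone who wants to check this on their own box, here is a minimal sh sketch that compares the patch level in the version string the running kernel reports (uname -r) with the one freebsd-update announced. The helper names patch_level and same_patch_level are made up for illustration, and the sketch only assumes version strings of the form seen in this thread, such as 9.0-RELEASE-p4.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print the numeric patch level of a version string such as
# "9.0-RELEASE-p3"; print 0 when there is no -pN suffix.
patch_level() {
    case "$1" in
        *-p[0-9]*) echo "${1##*-p}" ;;
        *)         echo 0 ;;
    esac
}

# Succeed when both version strings carry the same patch level.
same_patch_level() {
    [ "$(patch_level "$1")" = "$(patch_level "$2")" ]
}

# Example, run by hand after the reboot:
#   same_patch_level "$(uname -r)" "9.0-RELEASE-p4" \
#       || echo "running kernel reports $(uname -r)"
```

This only compares the two strings; it says nothing about why they differ.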
 
I am also experiencing this on reboot running -CURRENT. Must be something MFC'd recently?

I did get a stack trace on shutdown (which ironically resulted in a reboot).
It went by too quickly to catch the contents, but I think I saw some locking-related entries in the backtrace.
 
I believe it's resolved on current as of r241022.

I just rebuilt world and kernel to today's sources and the problem is gone.
It says MFC after 1 week on the commit (9/28).
 
I'm glad that I'm not the only one with this problem :)

deepdish said:
but this is on a physical machine.
deepdish said:
System is running 9.0-RELEASE on an Intel Xeon.
Of course I'm experiencing this on a physical machine too (and it is Xeon-based). I just wanted to point out that it is probably not a hardware problem, because the VM behaves the same way.

I have reported these problems (the hang, and the update to p3 instead of p4) to the FreeBSD developers and am waiting for a reply now.


Thank you all for replying to this topic!
 
Broke my FreeBSD install

Excuse my newness - I am generally a Linux user and don't have much experience with FreeBSD yet.

I have experienced this very same issue installing FreeBSD on my new file server. It is an E3 Xeon system with a mirrored ZFS root.

I too ran freebsd-update fetch and freebsd-update install and noticed my system had hung on All buffers synced. After hard rebooting, BIOS reports there are no valid boot partitions anymore.

Could either of you tell me what could have gone wrong, if either of you experienced this same issue, and if there is an easy way to fix this?

Thanks in advance,

JG
 
Jumped back onto this tonight to begin troubleshooting this issue.

It looks like my system is booting again. I can't say why its failure to find a boot device coincided with running updates and rebooting, but the two issues may have been unrelated, as it is now finding a boot device again.

On the other hand, I get taken to the mountroot prompt, which I am not yet familiar with. I am currently reading up on what I need to do to mount my ZFS root and boot into FreeBSD.

JG
 
AlexJ said:
There's something wrong... IMHO. Let's hope that it isn't health or family problems on Colin's side that are delaying the maintenance of freebsd-update(8).

As csup/cvsup are soon to be deprecated for ports, src, or both, people may wish to get up to speed on svn (subversion) if they don't wish to run into the above-linked PRs. (I've never used freebsd-update, so that is all I can suggest, not knowing enough details about the PRs or its usage.)
 
jb_fvwm2 said:
As csup/cvsup are soon to be deprecated for ports, src, or both, people may wish to get up to speed on svn (subversion) if they don't wish to run into the above-linked PRs.
I'm not sure I understand what you mean. Are you talking about compiling everything versus grabbing precompiled binaries? If that is the case, then it is just a personal preference.
If one is geeky enough to review ALL the source code before installing something, or wants to change something in the source, then yes, it can pay off. But if you have more than a hundred servers on hand, then I (and the owners/managers of companies) don't think it is a good idea to recompile kernel and world on every server. There is a mechanism to compile once and redistribute the result across computers, and this must work.
 
Personal preference maybe, but one may want to consider whether one is fully informed of all possible errors that could occur with svn vs. all that could occur with freebsd-update... (csup, IMHO, is more tolerant of error recovery in such instances, as far as I know. I could be wrong.)
 
jb_fvwm2 said:
Personal preference maybe, but one may want to consider whether one is fully informed of all possible errors that could occur with svn

I'm afraid I didn't explain my thoughts well enough. The point is: use cvs/svn to download the source code, review/change it, then compile ONCE and redistribute the update across the server farm as a binary update.
There is no point in taking a bunch of servers out of service for a long period of time by giving them all THE SAME useless job. I can't see any reason to do that if I have a pretty powerful machine that can prepare the update for a particular group of servers with the same parameters and quickly redistribute it with minimum downtime.

jb_fvwm2 said:
Csup imho being more tolerant of error recovery in such instances (as far as I know. I could be wrong.)

csup(1) IMHO should have been dropped a long time ago (a very LONG time ago), since CVS has no mechanism to check integrity (strictly speaking this is not about csup, but about CVS and the primary repository). Besides that, both CVS and SVN use a centralized client/server model, which is by its nature vulnerable to DDoS and has a single point of failure.
 
m6tt said:
I believe it's resolved on current as of r241022.

I just rebuilt world and kernel to today's sources and the problem is gone.
It says MFC after 1 week on the commit (9/28).

I have the same problem on 9.1-RELEASE (ZFS boot), but this patch does not apply to 9.1.

Any suggestions?
 
It seems there are no patches for 9.1-RELEASE. I've tried to apply the patches manually (since some suggested the failures might just be offset errors), but the system became highly unstable. Poudriere is a nice tool to test stability :)
 
Well, count me in too!

After updating one of my servers from 9.0-RELEASE (p4) to 9.1-RELEASE, I hit the same problem. I use the GENERIC kernel, binary-updated via freebsd-update(8). After installing the kernel (first pass of freebsd-update install) the system rebooted after a noticeable delay. After installing world (second pass) it didn't reboot. Luckily this server is in the same building, and when I went to the console the last message was "All buffers synced"...

It's not ZFS-root related, since my root partition is UFS and I use ZFS only for /var. I think this is hardware-related, because I've already updated some of my other servers with no such problems. So I disabled USB3 and AMD C6 Pstate min in the BIOS, with no effect at all. My dmesg output may be useful:
Code:
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012
    root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: AMD A6-3650 APU with Radeon(tm) HD Graphics (2600.00-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x300f10  Family = 12  Model = 1  Stepping = 0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16262606848 (15509 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <GBT    GBTUACPI>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  2
cpu3 (AP): APIC ID:  3
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
ctl: CAM Target Layer loaded
acpi0: <GBT GBTUACPI> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, cfca0000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xf800-0xf8ff mem 0xd0000000-0xdfffffff,0xfdfc0000-0xfdffffff irq 18 at device 1.0 on pci0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 4.0 on pci0
pci1: <ACPI PCI bus> on pcib1
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xde00-0xdeff mem 0xfd9ff000-0xfd9fffff,0xfd9f8000-0xfd9fbfff irq 16 at device 0.0 on pci1
re0: Using 1 MSI-X message
re0: Chip rev. 0x2c800000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 50:e5:49:bd:35:4f
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 5.0 on pci0
pci2: <ACPI PCI bus> on pcib2
re1: <RealTek 8168/8111 B/C/CP/D/DP/E/F PCIe Gigabit Ethernet> port 0xee00-0xeeff mem 0xfdeff000-0xfdefffff irq 17 at device 0.0 on pci2
re1: Using 1 MSI message
re1: Chip rev. 0x38000000
re1: MAC rev. 0x00000000
miibus1: <MII bus> on re1
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus1
rgephy1:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re1: Ethernet address: 00:1d:0f:bf:1c:5b
ahci0: <AHCI SATA controller> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,0xfb00-0xfb0f mem 0xfe02f000-0xfe02f7ff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ohci0: <OHCI (generic) USB controller> mem 0xfe02e000-0xfe02efff irq 18 at device 18.0 on pci0
usbus0 on ohci0
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfe02d000-0xfe02d0ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
ohci1: <OHCI (generic) USB controller> mem 0xfe02c000-0xfe02cfff irq 18 at device 19.0 on pci0
usbus2 on ohci1
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfe02b000-0xfe02b0ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
pci0: <serial bus, SMBus> at device 20.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib3: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci3: <ACPI PCI bus> on pcib3
rl0: <RealTek 8139 10/100BaseTX> port 0xce00-0xceff mem 0xfdbff000-0xfdbff0ff irq 20 at device 6.0 on pci3
miibus2: <MII bus> on rl0
rlphy0: <RealTek internal media interface> PHY 0 on miibus2
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:1d:0f:c2:e4:a5
ohci2: <OHCI (generic) USB controller> mem 0xfe02a000-0xfe02afff irq 18 at device 20.5 on pci0
usbus4 on ohci2
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: <ISA Option ROM> at iomem 0xc0000-0xcefff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to deny, logging disabled
DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
ugen0.1: <AMD> at usbus0
uhub0: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <AMD> at usbus1
uhub1: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ugen2.1: <AMD> at usbus2
uhub2: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <AMD> at usbus3
uhub3: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <AMD> at usbus4
uhub4: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
uhub4: 2 ports with 2 removable, self powered
uhub0: 5 ports with 5 removable, self powered
uhub2: 5 ports with 5 removable, self powered
uhub1: 5 ports with 5 removable, self powered
uhub3: 5 ports with 5 removable, self powered
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <Hitachi HDS723020BLA642 MN6OA800> ATA-8 SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <Hitachi HDS723020BLA642 MN6OA800> ATA-8 SATA 3.x device
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
Timecounter "TSC-low" frequency 10156236 Hz quality 800
GEOM_MIRROR: Device mirror/leo0rootm launched (2/2).
GEOM_MIRROR: Device mirror/leo0tmpm launched (2/2).
GEOM_MIRROR: Device mirror/leo0usrm launched (2/2).
Trying to mount root from ufs:/dev/ufs/leo0root [rw]...
WARNING: / was not properly dismounted
ZFS filesystem version 5
ZFS storage pool version 28
re0: link state changed to UP
re1: link state changed to UP
rl0: link state changed to UP

Now I have two options: go back to 9.0-RELEASE or reboot the system via the UPS (NUT)...
I'm willing to test any solution.
 
I forgot to mention that after a reset or power off/on of the machine I get messages about UFS filesystems that were not properly dismounted. It seems that the system hangs after the buffer sync but before dismounting the filesystems. When the system locks up there is no visible disk activity (as indicated by the HDD LED).
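
A quick way to list which filesystems were flagged after such a forced reset, sketched in sh. It assumes the kernel prints the same "WARNING: ... was not properly dismounted" line as in the dmesg output above; unclean_mounts is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read boot messages on stdin and print the mount
# point from every "WARNING: <mountpoint> was not properly dismounted" line.
unclean_mounts() {
    sed -n 's/^WARNING: \(.*\) was not properly dismounted$/\1/p'
}

# Example, run by hand after the machine comes back up:
#   dmesg | unclean_mounts
```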
 
I think the problem becomes bigger for me...
Recently I upgraded another server to FreeBSD 9.1-RELEASE via freebsd-update(8) and it refused to reboot in the same manner as my above mentioned one!
The last message was
Code:
All buffers synced

So if we suppose that freebsd-update causes the problem, recompiling world and kernel should solve it.
I updated src/* via subversion, rebuilt world and the kernel (GENERIC in my case) and reinstalled them, as I had done for many years before the announced replacement of CVSup...
I even ran mergemaster(8) to check whether any file had not been replaced correctly.

The result is the same - no reboot or power off, the machine can be rebooted only via direct hardware access (to the power switch/reset).
As a last resort I tried to replace /etc/rc.shutdown with my older one (from 9.0-RELEASE) which rebooted fine... No effect at all...
I'm a bit worried, since not all my servers have individual UPSes that allow a hardware reboot like the following:
Code:
upsmon -c fsd

Just a reminder: I use gptboot, my root is GPT and /var is ZFS.
After the hardware reboot, UFS+S checks volume integrity because of the unclean shutdown.

Thanks in advance for any help on this problem!
 
Thank you very much for your reply! I should mainly follow the mailing lists, despite the comfortable interface of these forums.

I can confirm that after updating to 9-STABLE (via SVN) the system reboots properly. I'm not brave enough to patch just one file and put the integrity of the system at risk that way. Having stayed on the RELEASE branch for about a decade, I now have some concerns about the security and stability of my system. As the Handbook (25.5.2:1) says:
This is still a development branch, however, and this means that at any given time, the sources for FreeBSD-STABLE may or may not be suitable for any particular purpose

But as I guess I have no choice until the next Release...

On the topic: as far as I understand, the problem is not hardware-related but ZFS-related? Should it be mentioned in the Errata or issued as an official update to the affected release? The patch was issued long before the 9.1-RELEASE announcement.
 
von_Gaden said:
I'm not brave enough to patch just one file

As far as I can see, the patch for this problem consists of several commits to a dozen different files. So the task is much harder than patching one file.

von_Gaden said:
But as I guess I have no choice until the next Release...

That is why I am still on the 8-SECURITY branch. :-\

von_Gaden said:
On the topic: as far as I understand the problem is not hardware but ZFS related? Should it be mentioned in Errata or issued as official update to the affected release? The patch is issued long before 9-1 RELEASE announcement.

Yes, it's a software-only problem, though I have seen reports of it occurring on UFS-only computers. The patch itself definitely touches ZFS code in the kernel.

And the final commit for the aforementioned PR was made on 21 Jan, almost a month after the 9.1 release.
 
My UFS-only servers (with or without softupdates journaling) and those with gjournaled gmirrors (usually /var) work really fine.

The patched file from the above mentioned mailing list posts is /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
Its SVN log for the 9-STABLE branch shows Revision 243484, Modified Sat Nov 24 12:42:29 2012 UTC (3 months, 1 week ago) by avg. The src tree I compiled has exactly this revision.
I was surprised that the same file in the RELENG-9.1 branch has Revision 239080, Modified Sun Aug 5 23:54:33 2012 UTC (6 months, 4 weeks ago) by kensmith.
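
If someone wants to repeat this comparison on their own checkouts, a rough sh sketch follows. It relies on svn info printing a "Last Changed Rev:" line for a working-copy path; last_changed_rev is a made-up helper name, and the two tree locations in the example are placeholders, not real paths.

```shell
#!/bin/sh
# Hypothetical helper: read `svn info` output on stdin and print the
# number from its "Last Changed Rev:" line.
last_changed_rev() {
    sed -n 's/^Last Changed Rev: \([0-9][0-9]*\)$/\1/p'
}

# Example, run by hand with two checked-out trees (placeholder paths):
#   f=sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
#   svn info "/path/to/stable-9/$f"   | last_changed_rev
#   svn info "/path/to/releng-9.1/$f" | last_changed_rev
```

If the two numbers differ, the file was changed on one branch after the other was cut, which matches what the SVN log shows here.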
 