Cannot power off by software or push buttons - Investigating why

Not wishing to hijack this other post:
I started this one...

I have a FreeBSD 15.0-RELEASE-p5 host with an ASUS N3150I-C motherboard in a 1U rackmount case. The PSU has about six months of use from new. The 'server' runs four thick jails just fine. However, I have noticed that when I need to reboot it after a freebsd-update install, or for any other reason, after three or more days of running, it no longer powers off with the 'poweroff' command or responds to the physical power on/off and reset buttons on the 1U case. The last output from the shutdown command executed by poweroff is about bridge0. To power off the machine, the power lead has to be pulled out of the PSU. The first restart afterwards requires saving settings in UEFI, which then confirms that no settings have been changed. The server then boots fine and runs normally. After all jails have started I can run poweroff and get the expected result. The physical buttons on the case also work for now.

I looked in /var/log/messages and found this:

Code:
kernel: bridge0: WARNING: Adding member interface re0 which has an IP address assigned is deprecated and will be unsupported in a future release.

So, my thoughts are, fix the known issue, then investigate further.
1. Get the bridge for vnet jails running in the preferred non-deprecated config
2. Try poweroff after four days, then five, then six etc to see if a pattern emerges with what runs from /etc/periodic
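
A rough way to make step 2 repeatable is to record the uptime right before each poweroff test, so a hang can be matched to a number of days. The following is only a sketch under assumptions: boot_epoch and uptime_days are made-up helper names, the poweroff-test log tag is arbitrary, and it assumes kern.boottime prints in its usual "{ sec = ..., usec = ... }" form.

```shell
#!/bin/sh
# Sketch: log how long the host has been up just before a poweroff test.
# boot_epoch and uptime_days are hypothetical helpers, not FreeBSD tools.

# Extract the boot time (seconds since the epoch) from the output of
# `sysctl -n kern.boottime`, which looks like:
#   { sec = 1700000000, usec = 123456 } <human-readable date>
boot_epoch() {
    sed -n 's/^{ sec = \([0-9][0-9]*\),.*/\1/p'
}

# Whole days between two epoch timestamps: uptime_days BOOT NOW
uptime_days() {
    echo $(( ($2 - $1) / 86400 ))
}

# Only does anything where kern.boottime exists (i.e. on FreeBSD).
if boot=$(sysctl -n kern.boottime 2>/dev/null | boot_epoch) && [ -n "$boot" ]; then
    days=$(uptime_days "$boot" "$(date +%s)")
    logger -t poweroff-test "uptime ${days} days before poweroff"
fi
```

Run it immediately before issuing poweroff; the entry ends up in /var/log/messages alongside any shutdown output that gets written.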

Starting with the bridge configuration

My /etc/rc.conf has this:

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="addm re0 up"
ifconfig_re0="-lro SYNCDHCP"

I tried this, to see if that cures the problem, but it wouldn't get an IPv4 address from DHCP:

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="-lro SYNCDHCP addm re0"

So I tried the following. The static IP address was bound to bridge0, but bridge0 did not like the defaultrouter despite it being the correct one.
My jails should all get their addresses via DHCP from static reservations, but they refused to start as they couldn't find the DHCP server.

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="inet 10.26.4.10/24 addm re0"
defaultrouter="10.26.4.254"

Following the handbook instructions in section 34.8.1, I tried this, which doesn't work either:

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="addm re0 up"
ifconfig_re0="up"
ifconfig_bridge0="DHCP"

I have put it back to how it was.
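
Note: one possible reason the handbook-style attempt above fails is that /etc/rc.conf is read as shell variable assignments, so when ifconfig_bridge0 is assigned twice only the last value survives and "addm re0 up" is never applied. This is offered as a likely explanation, not a confirmed diagnosis. A minimal demonstration that runs in any sh:

```shell
# rc.conf entries are plain shell assignments: the last one for a name wins.
ifconfig_bridge0="addm re0 up"
ifconfig_bridge0="DHCP"

# Only the second value is left for the network scripts to use.
echo "ifconfig_bridge0=${ifconfig_bridge0}"   # prints: ifconfig_bridge0=DHCP
```

If that is what happened here, keeping everything in a single assignment avoids the overwrite.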

If you have a known working FreeBSD 15.0-RELEASE bridge configuration for vnet jails that can DHCP, please post it here.
 
I will try:

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="inet 10.26.4.10 netmask 255.255.255.0 addm re0 up"
defaultrouter="10.26.4.254"
 
That last config didn't work. I cannot ping the default gateway 10.26.4.254 from the host with the bridge IP 10.26.4.10.

The config below does work, I can ping the default gateway, but it is also deprecated as the IP is assigned to the interface not the bridge:

Code:
cloned_interfaces=bridge0
ifconfig_bridge0="addm re0 up"
ifconfig_re0="inet 10.26.4.10 netmask 255.255.255.0"
defaultrouter="10.26.4.254"

Does anyone have a running config for FreeBSD 15.0-RELEASE that has IP assigned to the bridge and also has vnet jails on the host that can DHCP their IP addresses?
 

The following test setup works for me. The servers, hosts and guests are configured as simply as possible, with no firewall, and all run as bhyve(8) guests.

The jail(8) host and the jail guest get their IP lease from another VM, running net/dhcpd.

DHCP server: dhcpd.conf
Code:
subnet 192.168.2.0 netmask 255.255.255.0 {
   range 192.168.2.20 192.168.2.30 ;
   option subnet-mask 255.255.255.0 ;
...

Jail host: /etc/rc.conf, configured according to handbook 7.5.3. Creating a VNET Jail:
Code:
defaultrouter="192.168.2.1"

cloned_interfaces="bridge0"
ifconfig_bridge0="SYNCDHCP addm vtnet0 up"
ifconfig_vtnet0="up"
No IP configured for vtnet0.

The jail configuration is copied from handbook 7.5.3. Creating a VNET Jail, green and red highlighted lines are edited.

/etc/jail.conf
Rich (BB code):
vnet {
  # STARTUP/LOGGING
  exec.consolelog = "/var/log/jail_console_${name}.log";

  # PERMISSIONS
  allow.raw_sockets;
  exec.clean;
  mount.devfs;
  devfs_ruleset = 6;

  # PATH/HOSTNAME
  path = "/Jails/15.0R-V1";
  host.hostname = "${name}";

  # VNET/VIMAGE
  vnet;
  vnet.interface = "${epair}b";

  # NETWORKS/INTERFACES
  $id = "154"; 
  #$ip = "192.168.2.${id}/24";
  $gateway = "192.168.2.1";
  $bridge = "bridge0"; 
  $epair = "epair${id}";

  # ADD TO bridge INTERFACE
  exec.prestart  = "/sbin/ifconfig ${epair} create up";
  exec.prestart += "/sbin/ifconfig ${epair}a up descr jail:${name}";
  exec.prestart += "/sbin/ifconfig ${bridge} addm ${epair}a up";
  #exec.start    += "/sbin/ifconfig ${epair}b ${ip} up";
  exec.start    += "/sbin/dhclient ${epair}b";
  exec.start    += "/sbin/ifconfig ${epair}b up";
  exec.start    += "/sbin/route add default ${gateway}";
  exec.start    += "/bin/sh /etc/rc";
  exec.stop     = "/bin/sh /etc/rc.shutdown";
  exec.poststop = "/sbin/ifconfig ${bridge} deletem ${epair}a";
  exec.poststop += "/sbin/ifconfig ${epair}a destroy";
}

Jail host: /etc/devfs.rules
Rich (BB code):
[devfsrules_jail_vnet=6]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add include $devfsrules_jail
add path pf unhide
add path bpf* unhide
# without /dev/bpf* no dhcp lease of 'exec.start    += "/sbin/dhclient ${epair}b" '

Jail host, running "vnet" jail:
Rich (BB code):
root@BHY-Jails:~ # ifconfig
vtnet0: flags=1008b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=880028<VLAN_MTU,JUMBO_MTU,LINKSTATE,HWSTATS>
        ether 58:9c:fc:0a:af:c8
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=10<VLAN_HWTAGGING>
        ether 58:9c:fc:10:cd:f1
        inet 192.168.2.22 netmask 0xffffff00 broadcast 192.168.2.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        bridge flags=0<>
        member: epair154a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                port 4 priority 128 path cost 2000 vlan protocol 802.1q
        member: vtnet0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                port 1 priority 128 path cost 2000 vlan protocol 802.1q
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
epair154a: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: jail:vnet
        options=200009<RXCSUM,VLAN_MTU,RXCSUM_IPV6>
        ether 58:9c:fc:10:d9:b3
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        
root@BHY-Jails:~ # netstat -rn4
 Routing tables

Internet:
Destination        Gateway            Flags         Netif Expire
default            192.168.2.1        UGS         bridge0
127.0.0.1          link#2             UH              lo0
192.168.2.0/24     link#3             U           bridge0
192.168.2.22       link#2             UHS             lo0

Jail "vnet":
Rich (BB code):
root@vnet:~ # ifconfig
 lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
epair154b: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=200009<RXCSUM,VLAN_MTU,RXCSUM_IPV6>
        ether 58:9c:fc:10:b6:e9
        inet 192.168.2.26 netmask 0xffffff00 broadcast 192.168.2.255
        groups: epair
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        
root@vnet:~ # netstat -rn4
 Routing tables

Internet:
Destination        Gateway            Flags         Netif Expire
default            192.168.2.1        UGS       epair154b
127.0.0.1          link#6             UH              lo0
192.168.2.0/24     link#5             U         epair154b
192.168.2.26       link#6             UHS             lo0

root@vnet:~ # ping -c2 freebsd.org
PING freebsd.org (96.47.72.84): 56 data bytes
64 bytes from 96.47.72.84: icmp_seq=0 ttl=47 time=165.989 ms
64 bytes from 96.47.72.84: icmp_seq=1 ttl=47 time=164.807 ms

--- freebsd.org ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 164.807/165.398/165.989/0.591 ms
 
It's very unlikely that a simple warning about the network configuration prevents your machine from powering off. That said, it's better to address this problem because it will bite you in future FreeBSD versions (hence the warning).

That last config didn't work. I cannot ping the default gateway 10.26.4.254 from the host with the bridge IP 10.26.4.10 .
It should work, though. What does ifconfig say? Also look at your routing tables (netstat -r).
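
To follow up on that suggestion, something like the following could show at a glance which gateway and interface the IPv4 default route uses. It is a small sketch: default_route is a made-up helper name, and it assumes the netstat -rn4 column layout shown earlier in this thread (Destination, Gateway, Flags, Netif).

```shell
# default_route: hypothetical helper. Reads `netstat -rn4` output on stdin
# and prints the gateway and interface of the IPv4 default route.
default_route() {
    awk '$1 == "default" { print $2, $4; exit }'
}

# On the jail host:
#   netstat -rn4 | default_route
```

With the IP address on the bridge, the interface printed should be bridge0 rather than re0; no output at all would mean there is no default route.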
 
I found a configuration for assigning an IP address to the bridge that works for my network.

My machine is connected to a switched network that has VLANs. The ethernet port that the host is connected to has VLANs assigned and the host uses the untagged VLAN present on that port. This is not a problem when the IPv4 is assigned to the interface. The network backbone has Link Aggregated connections and Spanning Tree is enabled to detect accidental loops between the switches.

Although this host uses only the untagged VLAN, what made the difference was enabling VLAN support in /boot/loader.conf:
Code:
if_vlan_load="YES"

With that loaded, bringing up the physical interface in /etc/rc.conf enables the bridge to acquire an IPv4 address via DHCP:
Code:
cloned_interfaces=bridge0
ifconfig_bridge0="SYNCDHCP addm re0 up"
ifconfig_re0="up"

ifconfig now lists all of the VLANs present on the host interface, but DHCP works untagged, just as it did before. All vnet jails can get their addresses via DHCP again too.

I use static reservations for all of my DHCP clients and I had to change the jail host's reservation from the interface MAC address to a new 'FreeBSD' MAC address belonging to bridge0.
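
For anyone making the same reservation change, the bridge's MAC address can be pulled out of the ifconfig output. This is just a sketch: first_ether is a made-up helper name, and it assumes the "ether" line format shown in the ifconfig listings in this thread.

```shell
# first_ether: hypothetical helper. Prints the first "ether" address
# found in ifconfig output supplied on stdin.
first_ether() {
    awk '$1 == "ether" { print $2; exit }'
}

# On the jail host:
#   ifconfig bridge0 | first_ether
```

The address it prints is the one the DHCP server sees once the lease is requested by bridge0 instead of re0.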

I am not sure if I need to enable STP on the host's bridge0, it works OK without it.
 
Rebooting and powering off by software and hardware switches has been working OK while trying to fix the bridge IP problem. Now that this has been achieved I will reboot in 2 days, 4 days, 8 days, 16 days to see if the machine still has a problem powering off. Previously, when a software poweroff was issued, the shutdown process would stop after outputting to the screen that bridge0 had been taken down.
 
I missed the 4-day power-off interval. I found that executing 'poweroff' or 'reboot' has the desired effect after 1 or 2 days. However, after 8 days of uptime the machine gets stuck in the shutdown process, with the console, network, and the power and reset buttons all unresponsive. Cutting the power is the only way to switch this machine off.

It did not happen on 12.x, 13.x or 14.x. I believe it is a new fault since installing 15.0.

I have run iocage with vnet jails on this machine for years with the iocage documented configuration of having the IP address assigned to the interface, not the bridge. I have managed to get the vnet jails running acceptably with the host's IP address assigned to the bridge instead of the interface. Sadly, I noticed a serious problem yesterday with my zrepl backups failing to complete.

I have been migrating jails from another host to this one so that the other host can be reinstalled with 15.0. I cleared the bookmarks and holds before using zfs send/recv, so the transferred datasets appear to zrepl as new jails. Consequently, when zrepl runs it wants to do a full synchronisation of those datasets. The jail datasets are between 30GB and 250GB in size. I noticed that they were failing with between 1GB and 4GB transferred.

I started a ping on the destination (zrepl sink) back to the sender and it was soon apparent that the host with the IP assigned to the bridge could not deal with sustained outbound traffic. The pings stop and an error is displayed on both consoles saying that the connection is broken. My SNMP console reports that the iocage host is down, not the zrepl sink.

I have put the configuration back to interface assigned IP and zrepl is working fine again. All of the recently transferred jail datasets have successfully replicated to the zrepl sink in one operation, approx 400GB with no glitches.

This could be an age-related fault in the equipment, but that doesn't explain why putting the configuration back to interface-assigned IP fixes both the data transfer problem under load and the mysterious disconnection. Restarting netif does not restore the network connection; a reboot is necessary to do this.

This could be a Realtek driver issue. I am using the FreeBSD kernel source driver. If it is the driver, it works well with interface assigned IP, but not under load with bridge assigned IP.

The machine is passively cooled in a 1U case. The only other adapter choice I have right now is another Realtek device, an M.2 A+E key 2.5Gbps RTL8125B, which can be plugged into the unused wifi adapter slot. I have never tried this NIC with FreeBSD.

From dmesg, Realtek NIC on motherboard:
Code:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0x91204000-0x91204fff,0x91200000-0x91203fff irq 18 at device 0.0 on pci1
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000
miibus0: <MII bus> on re0
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: 2c:56:dc:78:ee:73
re0: netmap queues/slots: TX 1/256, RX 1/256

From ifconfig (IPv6 omitted, one vnet jail listed for brevity):
Code:
re0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
    ether 2c:56:dc:78:ee:73
    inet 10.26.4.10 netmask 0xffffff00 broadcast 10.26.4.255

bridge0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=10<VLAN_HWTAGGING>
    ether 58:9c:fc:10:c6:87
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    bridge flags=0<>
    member: vnet0.3 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            port 11 priority 128 path cost 2000 vlan protocol 802.1q

I am just going to ignore the deprecation warning for interface assigned IP for the time being. I will put my iocage jail hosts back to interface assigned IP as I know this works under load.
 
responds to the physical power on/off and reset buttons on the 1U case.
Wait, what? It doesn't respond to the reset button?!? Doesn't matter how hard the software might lock up, a reset is hard-wired. This isn't some software interrupt that could be masked or ignored.
 
The physical switches for power on/off and reset are momentary contact push button switches. They are connected to the motherboard by jumper cables which are then connected to a microcontroller on the motherboard which can be controlled by ACPI to allow software to power off the machine. It also allows the BIOS/UEFI to switch it on via a RTC alarm setting.

It could be a hardware fault. I might connect my oscilloscope to pin 16 (PS-On) on the ATX power connector to monitor it going high. That line has to go high to power off the PSU and is pulled to ground to switch it on. It's just awkward to do on this machine.

I will disable power management (powerd) for the time being to see if that makes any difference in the next 4-8 days.
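
For reference, a minimal sketch of that change, assuming powerd is currently enabled through /etc/rc.conf (the usual place, but an assumption here):

```shell
# /etc/rc.conf fragment: keep powerd from starting at the next boot.
powerd_enable="NO"
```

The same edit can be made with sysrc powerd_enable=NO, and service powerd onestop stops an instance that is already running without waiting for a reboot.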

I don't have any identical spare N3150I-C motherboards at the moment to swap out. I have a similar 2-core version N3050I-C that I could set up as an experimental machine to see if the power off behavior travels.
 