We've had a netboot setup in our co-location facility for some time now. It's not used often, as it's mainly intended for new installs and rescue purposes. The last time I had to netboot something, I had no problems.
Tonight I'm seeing a box hang while loading the kernel or its modules (we use mfsBSD, so the ZFS, OpenSolaris, geom_uzip, and zlib kernel modules get loaded):
Code:
Intel(R) Boot Agent GE v1.2.28
Copyright (C) 1997-2005, Intel Corporation
CLIENT MAC ADDR: 00 E0 81 D0 15 85 GUID: 00000000 0000 0000 0000 000000000000
CLIENT IP: 10.0
Building the boot ler and the BTX
Star/boot/kernel/kernel text=0x63e133 data=0xc27a8+0xa3048 syms=[0x8+0xa8d68+0x8+0x9b5a0]
/boot/kernel/zfs.ko size 0x19eb18 at 0xae8000
loading required module 'opensolaris'
/boot/kernel/opensolaris.ko size 0x3868 at 0xc87000
/boot/kernel/geom_uzip.ko size 0x31d8 at 0xc8b000
loading required module 'zlib'
/boot/kernel/zlib.ko size 0xdc40 at 0xc8f000
/
Note that the serial console output is a bit garbled. I've come to expect this with most serial console redirection implementations, and I see similar junk on working setups.
The only way to recover here is to power cycle or reset. Even with a keyboard available locally the box appears locked up.
Another data point: we have older FreeBSD netboot NFS trees exported as well. The above is trying to boot 8.1. If I try to boot an 8.3 kernel, it doesn't even finish loading the kernel over NFS. The 8.1 kernel is a few megabytes smaller, which really makes me wonder whether I'm exhausting some memory resource here.
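To put rough numbers on the memory question, here is a small Python sketch that adds up the sizes the loader printed in the console output above. It assumes the usual reading of the loader line (text, then data plus bss, then the symbol and string tables with their 8-byte size words). The names `KERNEL_SECTIONS`, `parse_modules`, `kernel_bytes`, and `highest_loaded_address` are made up for this example. It says nothing about how much memory the loader actually has to work with; it only shows how far the loaded image extends.

```python
import re

# Module lines copied verbatim from the console output above.
LOADER_OUTPUT = """\
/boot/kernel/zfs.ko size 0x19eb18 at 0xae8000
/boot/kernel/opensolaris.ko size 0x3868 at 0xc87000
/boot/kernel/geom_uzip.ko size 0x31d8 at 0xc8b000
/boot/kernel/zlib.ko size 0xdc40 at 0xc8f000
"""

# Sizes from the "text=... data=... syms=[...]" line of the 8.1 kernel.
# Assumption: data is data+bss, syms is the symbol and string tables,
# each preceded by an 8-byte size word.
KERNEL_SECTIONS = {
    "text": [0x63E133],
    "data": [0xC27A8, 0xA3048],
    "syms": [0x8, 0xA8D68, 0x8, 0x9B5A0],
}

MODULE_RE = re.compile(r"^(\S+) size (0x[0-9a-f]+) at (0x[0-9a-f]+)$", re.M)


def parse_modules(text):
    """Return (path, size, load_address) for each 'size X at Y' line."""
    return [(path, int(size, 16), int(addr, 16))
            for path, size, addr in MODULE_RE.findall(text)]


def kernel_bytes(sections):
    """Sum every size component the loader reported for the kernel."""
    return sum(sum(parts) for parts in sections.values())


def highest_loaded_address(modules):
    """Address just past the end of the module loaded highest in memory."""
    return max(addr + size for _, size, addr in modules)


if __name__ == "__main__":
    modules = parse_modules(LOADER_OUTPUT)
    kbytes = kernel_bytes(KERNEL_SECTIONS)
    top = highest_loaded_address(modules)
    print(f"kernel:  {kbytes} bytes ({kbytes / 2**20:.1f} MiB)")
    for path, size, addr in modules:
        print(f"{path}: {size} bytes at {addr:#x}")
    print(f"image ends at {top:#x} ({top / 2**20:.1f} MiB)")
```

Running the same arithmetic against whatever the 8.3 loader manages to print before it stops would show how much larger that image is, which is the comparison the memory theory hinges on.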
The DHCP configuration is pretty simple, and the root path contains an mfsBSD mfsroot:
Code:
host h21.i.xxx.com {
    hardware ethernet 00:e0:81:d0:15:85;
    fixed-address 10.99.88.121;
    next-server 10.99.88.111;
    filename "/freebsd83-64/boot/pxeboot";
    option root-path "10.99.88.111:/tank1/exports/netboot/freebsd83-64";
}
If I run tcpdump during the boot, I simply see the traffic stop. I believe the checksum errors shown here are just the result of the network card doing TX and RX checksum offloading. h11 is the NFS/TFTP/DHCP server, h21 is the host trying to netboot.
Code:
h21.i.xxx.com.4031 > h11.i.xxx.com.nfs: 104 read [|nfs]
02:52:25.347065 IP (tos 0x0, ttl 64, id 4853, offset 0, flags [none], proto UDP (17), length 1180, bad cksum 0 (->9dae)!)
h11.i.xxx.com.nfs > h21.i.xxx.com.4031: reply ok 1152 read REG 555 ids 0/0 [|nfs]
02:52:25.349063 IP (tos 0x0, ttl 20, id 4204, offset 0, flags [none], proto UDP (17), length 132)
h21.i.xxx.com.4032 > h11.i.xxx.com.nfs: 104 read [|nfs]
02:52:25.349113 IP (tos 0x0, ttl 64, id 4854, offset 0, flags [none], proto UDP (17), length 1180, bad cksum 0 (->9dad)!)
h11.i.xxx.com.nfs > h21.i.xxx.com.4032: reply ok 1152 read REG 555 ids 0/0 [|nfs]
02:52:25.351111 IP (tos 0x0, ttl 20, id 4205, offset 0, flags [none], proto UDP (17), length 132)
h21.i.xxx.com.4033 > h11.i.xxx.com.nfs: 104 read [|nfs]
02:52:25.351162 IP (tos 0x0, ttl 64, id 4855, offset 0, flags [none], proto UDP (17), length 1180, bad cksum 0 (->9dac)!)
h11.i.xxx.com.nfs > h21.i.xxx.com.4033: reply ok 1152 read REG 555 ids 0/0 [|nfs]
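As a sanity check on the offloading theory, here is a rough Python sketch that recomputes the IPv4 header checksum for the server's replies from the fields tcpdump printed. It assumes h11 is 10.99.88.111 and h21 is 10.99.88.121 (taken from the DHCP host entry) and that the header carries no IP options. The function names `build_header` and `ipv4_header_checksum` are made up for this example. If the recomputed value matches the one tcpdump shows after `->`, then the packet is fine apart from a checksum field that was still zero at capture time, which is what you would expect to see on the sending host when the NIC fills the checksum in later.

```python
import socket
import struct


def build_header(src, dst, ident, total_length, ttl=64, proto=17):
    """Build a 20-byte IPv4 header (no options) with a zero checksum field."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                   # version 4, header length 5 words
        0,                      # tos 0x0
        total_length,           # "length" from the tcpdump line
        ident,                  # "id" from the tcpdump line
        0,                      # flags [none], offset 0
        ttl,
        proto,                  # 17 = UDP
        0,                      # checksum field left at zero
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def ipv4_header_checksum(header):
    """One's-complement sum of the header's 16-bit words, then inverted."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Assumed addresses: h11 (server) and h21 (client) from the DHCP entry.
    server, client = "10.99.88.111", "10.99.88.121"
    for ident in (4853, 4854, 4855):
        header = build_header(server, client, ident, 1180)
        print(f"id {ident}: checksum {ipv4_header_checksum(header):04x}")
```

Note that only the packets sent by h11 are flagged in the capture. The requests from h21 carry no checksum complaint, which also fits a capture taken on the server with TX offloading enabled.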
I'll have to dig around a bit to find another client to try, since there's nothing there that I can just pull out of service to test.