Yesterday (30/03), after updating a server (Dell PE 1950), several things started to break. The server had been rock-solid for months, and after downgrading to 8-STABLE as of 01-01-2010 everything returned to normal.
I'm not quite sure what's wrong, as several things break and no dumps get written on panic.
What broke:
It's a mail server running Communigate Pro (tried 5.1, 5.2 & 5.3), accepting connections on both IPv4 and IPv6. The server uses v6 sockets for v4 addresses, like this:
Code:
CGServer 826 root 45u IPv6 0xffffff00174c8a50 0t0 TCP [2001:610:xxx:xxx:xxx:xxx:xxx:200]:smtp (LISTEN)
CGServer 826 root 47u IPv6 0xffffff0001faf370 0t0 TCP [::217.xxx.xxx.xxx]:smtp (LISTEN)
Directly after the upgrade I noticed that connections *out* to other IPv4 mail servers were no longer succeeding, and ipfw was seeing some weird packets:
Code:
Mar 30 06:27:34 adinava kernel: ipfw: 65530 Accept TCP 1.23.2.0:28859 65.55.92.152:25 out via bce0
Obviously 1.23.2.0 is not a local IP, so I checked lsof:
Code:
CGServer 824 root 49u IPv6 0xffffff00174b3a50 0t0 TCP [2001:610:xxx:xxx:xxx:xxx:xxx:200]:28859->[::65.55.92.152]:smtp (SYN_SENT)
Somehow the server was trying to connect to an IPv4 address from an IPv6 address, where the IPv6 address apparently overflows the IPv4 storage and ends up as '1.23.2.0'...
For comparison, this is what it looks like when it works:
Code:
CGServer 105 root 94u IPv6 0xffffff00a4ccd000 0t0 TCP [::217.195.117.200]:14532->[::65.55.92.152]:smtp (ESTABLISHED)
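As a sanity check on where the 1.23.2.0 in the ipfw log might come from: if something simply took the low 32 bits of an IPv6 source address and used them as an IPv4 address, an address whose last two groups are 117:200 would come out as exactly 1.23.2.0. Below is a minimal Python sketch of that arithmetic. The 2001:db8:: address is a placeholder from the documentation prefix (not the real, masked address above), and `low32_as_ipv4` is a helper name made up for this illustration. It only shows that the guess is numerically plausible. It says nothing about what the kernel actually does.

```python
import ipaddress


def low32_as_ipv4(addr: str) -> str:
    """Read the last 4 bytes of an IPv6 address as a dotted-quad IPv4 address."""
    packed = ipaddress.IPv6Address(addr).packed  # 16 bytes, network byte order
    return str(ipaddress.IPv4Address(packed[-4:]))


if __name__ == "__main__":
    # Hypothetical global address ending in 117:200 (bytes 01 17 02 00).
    print(low32_as_ipv4("2001:db8::117:200"))  # 1.23.2.0
    # An address written with an embedded IPv4 part keeps that part intact.
    print(low32_as_ipv4("::65.55.92.152"))     # 65.55.92.152
```

In the working case the source socket address already carries the IPv4 address in its low 32 bits, so the same truncation would be harmless there.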
At first I assumed this was a problem in the daemon, so I temporarily disabled ipv6 for CGatePro (the ipv6 is not yet added as MX anyway) and forced it to use ipv4 sockets. That worked, until I tried to reload the ipfw rules and hit a panic:
Code:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff803e3b77
stack pointer = 0x28:0xffffff8076d73890
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1467 (ipfw)
trap number = 9
panic: general protection fault
cpuid = 0
Uptime: 4m24s
Cannot dump. Device not defined or unavailable.
panic: bufwrite: buffer is not busy???
cpuid = 0
Uptime: 4m24s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort
Automatic reboot in 15 seconds - press a key on the console to abort
ipfw: ouch!, skip past end of rules, denying packet
It should have dumped (device is defined) and rebooted, but it hung there.
When I rebooted it, my rules file (/etc/ipfw.rules.sh) had been truncated to zero bytes. Assuming this was an ipfw problem, I left the rules out temporarily (I have IPFIREWALL_DEFAULT_TO_ACCEPT).
After ~4 minutes the server hung again, this time with the screen filled with 'ipfw: ouch!, skip past end of rules, denying packet' messages, which were also logged to /var/log/messages. This seemed a bit weird, as only the default 'allow-all' rule was present.
So I decided to recompile the kernel without ipfw and reboot. Again after ~4 minutes, I got the following panic:
Code:
dev = mfid0s1f, block = 1, fs = /var
panic: ffs_blkfree: freeing free block
cpuid = 3
Uptime: 4m34s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort
Again the server didn't reboot but just froze (no num-lock LED action either).
I think a change to 8-STABLE between 01/01/2010 and 30/03/2010 seriously broke something, but I have no idea how to go about finding out what exactly. There are no dumps and no logging other than the 'ipfw: ouch!' message.
Any ideas? dmesg & kernconf @ http://ra.phid.ae/dmesg.txt