Connections = disk write errors!

Hi there everyone!

First off, I posted this here because I couldn't find a forum that fit it best; I apologize if I made a mistake.

I have a dedicated server from a host in Denver, running FreeBSD 7.4, that runs an ircd, IRC services, an eggdrop, Apache, inetd, sshd and Icecast. As Icecast begins receiving connections, the other apps start getting "permission denied" errors on disk writes; in some cases they appear to be denied disk access entirely from that point on, and eventually the apps crash! This includes IRC services and even connections to the ircd itself, the web server, sshd and inetd; finally, Icecast itself implodes.

Although the issue begins at around 10 connections, it gets progressively worse as the connections increase; at around 260 connections, everything pukes and the server locks everything OUT, including SSH! The only remedy is to REBOOT the server to restore disk access; that is, until Icecast starts receiving connections again!

I have gone through all the tunables and tweaks I can find. I had the host run tests on the hardware; they couldn't figure it out either, and recommended replacing the entire machine, which they did... and lo and behold, the problem is STILL there.

I don't know how to describe it; it's like a "memory leak, but for descriptors". I'm not even sure Icecast itself is to blame, but it surely aggravates the problem REAL QUICK.

Here's some info (note that the server is currently running nothing except sshd):

The server itself is an Intel E800 (3 GHz) with 4 GB of RAM and a 1 TB HDD, dedicated, and connected to the net via 100BASE-T Ethernet.

OS: FreeBSD 7.4-RELEASE (GENERIC) #0: Fri Feb 18 01:55:22 UTC 2011
Code:
Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize-cur       131072 kB
  stacksize-cur        8192 kB
  coredumpsize     infinity kB
  memoryuse-cur     3994576 kB
  memorylocked-cur  1331525 kB
  maxprocesses         5547
  openfiles           11095
  sbsize           infinity bytes
  vmemoryuse       infinity kB
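The openfiles value above is the soft limit of the shell that ran `limits`; daemons started at boot may run under a different login class. As far as I know, FreeBSD caps a process's descriptors at the smaller of its RLIMIT_NOFILE soft limit and kern.maxfilesperproc, so one quick sanity check is to compare that cap with the number of connections in play. A minimal sketch; effective_nofile and headroom are hypothetical helper names, not system tools:

```shell
#!/bin/sh
# Hypothetical helper: the smaller of a soft RLIMIT_NOFILE and
# kern.maxfilesperproc, roughly the most descriptors one process can hold.
effective_nofile() {
    soft=$1      # e.g. output of `ulimit -n`; may be the word "unlimited"
    perproc=$2   # e.g. output of `sysctl -n kern.maxfilesperproc`
    if [ "$soft" = unlimited ] || [ "$soft" -gt "$perproc" ]; then
        echo "$perproc"
    else
        echo "$soft"
    fi
}

# Hypothetical helper: headroom LIMIT IN_USE -> descriptors still available.
headroom() {
    echo $(( $1 - $2 ))
}
```

With the values in this post (openfiles 11095 here, kern.maxfilesperproc=18450 in the tunables), `effective_nofile 11095 18450` prints 11095 and `headroom 11095 260` prints 10835. Assuming roughly one descriptor per listener (a rough assumption that ignores logs and other open files), 260 connections would use only a small fraction of the per-process cap.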

top:
Code:
last pid:  4080;  load averages:  0.00,  0.00,  0.00                        up 0+22:42:57  00:57:40
12 processes:  1 running, 11 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 6992K Active, 59M Inact, 297M Wired, 124K Cache, 417M Buf, 3559M Free

tunables:
Code:
security.bsd.see_other_uids=0
net.inet.ip.fw.enable=1
net.inet.ip.fw.verbose=0
net.inet.ip.fw.verbose_limit=1
net.inet.ip.fw.dyn_short_lifetime=5
net.inet.ip.fw.dyn_udp_lifetime=5
net.inet.ip.fw.dyn_rst_lifetime=1
net.inet.ip.fw.dyn_fin_lifetime=1
net.inet.ip.fw.dyn_syn_lifetime=5
net.inet.ip.fw.dyn_ack_lifetime=10

net.inet.tcp.nolocaltimewait=1
net.inet.tcp.msl=5000
net.inet.tcp.delayed_ack=0
net.inet.tcp.finwait2_timeout=5
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1

net.inet.ip.fastforwarding=1
net.inet.ip.redirect=0
net.inet.ip.random_id=1
net.inet.ip.portrange.first=2048
net.inet.ip.portrange.last=63500
net.inet.ip.portrange.randomized=0

net.inet.icmp.icmplim=2000
net.inet.icmp.icmplim_output=0

kern.ipc.somaxconn=32500
kern.maxfiles=36958
kern.maxfilesperproc=18450
kern.coredump=0
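Two of these tunables are easy to misread: net.inet.tcp.msl is in milliseconds and TIME_WAIT lasts twice the MSL, while kern.maxfilesperproc is a per-process cap inside the system-wide kern.maxfiles. A small awk-based sketch that turns lines like the ones above into those derived numbers, plus the idle lifetime of ipfw dynamic rules for established TCP sessions. tunable_report is a hypothetical name; the script only reads text and does not query the kernel:

```shell
#!/bin/sh
# Hypothetical helper: print a few derived values from sysctl.conf-style
# "name=value" lines read from FILE (or stdin when no FILE is given).
tunable_report() {
    awk -F= '
        /^[ \t]*#/ { next }          # skip comment lines
        NF < 2     { next }          # skip blank or malformed lines
        { v[$1] = $2 }
        END {
            # TIME_WAIT lasts 2 * MSL; net.inet.tcp.msl is in milliseconds.
            if ("net.inet.tcp.msl" in v)
                printf "time_wait_ms=%d\n", 2 * v["net.inet.tcp.msl"]
            # Processes at the per-process cap needed to fill the system table.
            if (("kern.maxfiles" in v) && ("kern.maxfilesperproc" in v))
                printf "procs_to_fill_maxfiles=%.2f\n", v["kern.maxfiles"] / v["kern.maxfilesperproc"]
            # Idle seconds before an ipfw dynamic rule for an established
            # TCP session expires (the FreeBSD default is 300).
            if ("net.inet.ip.fw.dyn_ack_lifetime" in v)
                printf "dyn_ack_lifetime_s=%d\n", v["net.inet.ip.fw.dyn_ack_lifetime"]
        }' "${1:--}"
}
```

Run against the tunables above (e.g. `tunable_report /etc/sysctl.conf`), it should print time_wait_ms=10000 (60000 with the stock msl of 30000), procs_to_fill_maxfiles=2.00 (two processes at their per-process cap would nearly fill the system table), and dyn_ack_lifetime_s=10 (versus the default 300).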


ipfw:
Code:
/sbin/ipfw -q -f flush
/sbin/ipfw -q -f flush

/sbin/ipfw zero
/sbin/ipfw zero

/sbin/ipfw add 2 check-state
/sbin/ipfw add 4 check-state

/sbin/ipfw add 6 unreach 255 tcp from any to any tcpflags fin,psh,urg recv any
/sbin/ipfw add 8 deny tcp from any to any tcpflags fin,psh,urg recv any

/sbin/ipfw add 10 unreach 255 tcp from any to any tcpflags !fin,!syn,!rst,!psh,!ack,!urg recv any
/sbin/ipfw add 12 deny tcp from any to any tcpflags !fin,!syn,!rst,!psh,!ack,!urg recv any

/sbin/ipfw add 14 unreach 255 tcp from any to any tcpflags syn,fin recv any
/sbin/ipfw add 16 deny tcp from any to any tcpflags syn,fin recv any

/sbin/ipfw add 18 unreach 255 tcp from any to any tcpflags fin,rst recv any
/sbin/ipfw add 20 deny tcp from any to any tcpflags fin,rst recv any

/sbin/ipfw add 22 unreach 255 tcp from any to any ipoptions ssrr,lsrr,rr,ts recv any
/sbin/ipfw add 24 deny tcp from any to any ipoptions ssrr,lsrr,rr,ts recv any

/sbin/ipfw add 26 unreach 255 tcp from any to any tcpflags ack,rst recv any
/sbin/ipfw add 28 deny tcp from any to any tcpflags ack,rst recv any

/sbin/ipfw add 30 unreach 255 icmp from any to any via any
/sbin/ipfw add 32 deny icmp from any to any via any

### /sbin/ipfw add 34 unreach 255 all from any to any frag via any
### /sbin/ipfw add 36 deny all from any to any frag via any

/sbin/ipfw add 38 unreach 255 tcp from any to any established via any
/sbin/ipfw add 40 deny tcp from any to any established via any

/sbin/ipfw add 42 pass tcp from me to (DNS private IP) 53 out via re0 setup limit dst-addr 100
/sbin/ipfw add 44 pass udp from me to (DNS Private IP) 53 out via re0 limit dst-addr 100

/sbin/ipfw add 46 pass tcp from (private IP) to me 22 in via re0 setup limit src-addr 5

/sbin/ipfw add 48 pass udp from me to any 123 out via re0 limit dst-addr 2

/sbin/ipfw add 1000 pass tcp from any to any 80 in via re0 setup limit src-addr 5


/sbin/ipfw add 2000 pass tcp from any to me 843 in via re0 setup limit src-addr 5
/sbin/ipfw add 2002 pass tcp from any to me 6667 in via re0 setup limit src-addr 5
/sbin/ipfw add 2004 pass tcp from any to me 6697 in via re0 setup limit src-addr 5

/sbin/ipfw add 3000 pass tcp from 127.0.0.1 to 127.0.0.1 6670 out via lo0 keep-state
/sbin/ipfw add 3002 pass tcp from 127.0.0.1 to 127.0.0.1 8000 out via lo0 keep-state
/sbin/ipfw add 3004 pass tcp from me to (private IP) 10000 out via re0 limit dst-addr 5

/sbin/ipfw add 5000 pass tcp from (private IP) to me 8000 in via re0 setup limit src-addr 5

/sbin/ipfw add 5004 pass tcp from any to me 8000 in via re0 setup limit src-addr 3
/sbin/ipfw add 5006 pass tcp from any to me 8001 in via re0 setup limit src-addr 3

/sbin/ipfw add 5008 pass tcp from 127.0.0.1 to 127.0.0.1 8000 in via lo0 setup keep-state
/sbin/ipfw add 5010 pass tcp from me to any 80 out via re0 limit dst-addr 5


/sbin/ipfw add 65511 unreach 255 tcp from any to me via any
/sbin/ipfw add 65512 unreach 255 udp from any to me via any
/sbin/ipfw add 65513 unreach 255 icmp from any to me via any

/sbin/ipfw add 65514 unreach 255 tcp6 from any to me via any
/sbin/ipfw add 65515 unreach 255 udp6 from any to me via any
/sbin/ipfw add 65516 unreach 255 icmp6 from any to me via any


/sbin/ipfw add 65517 unreach 255 tcp from me to any via any
/sbin/ipfw add 65518 unreach 255 udp from me to any via any
/sbin/ipfw add 65519 unreach 255 icmp from me to any via any

/sbin/ipfw add 65520 unreach 255 tcp6 from me to any via any
/sbin/ipfw add 65521 unreach 255 udp6 from me to any via any
/sbin/ipfw add 65522 unreach 255 icmp6 from me to any via any


/sbin/ipfw add 65523 unreach 255 tcp from any to any via any
/sbin/ipfw add 65524 unreach 255 udp from any to any via any
/sbin/ipfw add 65525 unreach 255 icmp from any to any via any

/sbin/ipfw add 65526 unreach 255 tcp6 from any to any via any
/sbin/ipfw add 65527 unreach 255 udp6 from any to any via any
/sbin/ipfw add 65528 unreach 255 icmp6 from any to any via any

/sbin/ipfw add 65529 deny ip from any to any via any
/sbin/ipfw add 65530 deny ip via any

/sbin/ipfw add 65531 deny ip6 from any to any via any
/sbin/ipfw add 65532 deny ip6 via any

/sbin/ipfw add 65533 deny any via any
/sbin/ipfw add 65534 deny any via any
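Several rules above use `setup limit src-addr N`, which tells ipfw to keep at most N concurrent connections (dynamic states) per source address for that rule; rules 5004 and 5006 allow 3 per address on ports 8000 and 8001. The sketch below is a toy model of that per-source counting, not ipfw's implementation: it ignores state timeouts such as dyn_ack_lifetime and only counts explicit opens and closes. simulate_limit is a hypothetical name:

```shell
#!/bin/sh
# Toy model of ipfw "limit src-addr N": at most N concurrent sessions per
# source address. Input lines are "open ADDR" or "close ADDR"; one verdict
# line ("accept ADDR" or "deny ADDR") is printed for every "open".
simulate_limit() {
    awk -v max="$1" '
        $1 == "open" {
            if (active[$2] + 0 < max + 0) {
                active[$2]++
                print "accept", $2
            } else {
                print "deny", $2
            }
        }
        $1 == "close" && active[$2] > 0 { active[$2]-- }
    '
}
```

With a limit of 3, three listeners sharing one NAT address are admitted, and a fourth from that same address is turned away by the rule until one of the others disconnects.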

I'm not sure what else to include.

But this thing sure has me PUZZLED. I have run FreeBSD for years, but only RECENTLY has this issue cropped up, on the last two hosts...

ANY help would be appreciated!

-DjZ-
 
hi again,

Sorry if I sound standoffish (I can't really convey tone very well in a forum, hehe), but I'm finding it hard to believe that no one in this forum has any response or suggestion about this issue. I would think it's a VERY important one, since it comes down to whether FreeBSD can actually handle that many connections or not.

this may be a huge BUG in FreeBSD!

I have used FreeBSD for years, and it's my *nix OS of choice over Linux or whatnot for its robustness and its ability to handle that many connections, but this latest test really makes it FALL FLAT on its FACE! :(

Or perhaps, when the moderators edited this post and moved it to its proper forum (to that moderator: thanks for fixing the post), they forgot to set the "new post" flag on my message, so it hasn't been noticed, or it has simply been routed to a dead forum. :(

-DjZ-
:( :(
 
Maybe a few details would help... I ran 7.4-STABLE until last month, and never had problems like that, but I never ran Icecast, IRC, or eggdrop services.

Is the system running on real hardware, or is it a virtual machine? Details might help somebody spot something. Motherboard? Disk? Swap? NICs?

Is it really 7.4-RELEASE, or is it patched with freebsd-update? I guess it doesn't have the bug fixes from 7.4-STABLE?

What are you planning to do after February 28, 2013 when FreeBSD-7.x goes EOL?
 
hi!

and thanks for the replies!




I tried to be as detailed as I could without writing a book, hehe.

It's a dedicated machine with real hardware, not virtual.

Wblock@:

I will try (again) to run it with sysctl.conf completely commented out.

I did this before, but ran into other issues that, at the time, pointed to defective hardware, which has since been replaced. I wanted to wait until "after the holidays" to try again...


Tingo:

It's hard to say whether it behaves differently with or without Icecast. I have seen this happen when the eggdrop was "busy" with connections as well (at the time, I thought it was something else), but Icecast sure makes it happen a LOT faster.


Uniballer:

It is 7.4-RELEASE straight from the install (not patched).

I wish I could update beyond 7.4, but I couldn't get anything to run on 8.x, and I couldn't even get 9.x to install (it kept saying the loader was missing or corrupted; I assume it wasn't compatible with the equipment).

-DjZ-
 
An ugly hardware problem like issues with, say, receiving Ethernet packets while reading the disk causing corrupted disk reads would be really hard to diagnose and quite fatal. It might show up very much like what you are describing.

It concerns me greatly that you can't boot newer versions on this hardware.
 
hi everyone!

Uniballer said:
An ugly hardware problem like issues with, say, receiving Ethernet packets while reading the disk causing corrupted disk reads would be really hard to diagnose and quite fatal. It might show up very much like what you are describing.

quite FATAL indeed!

UPDATE!

Uniballer:
I'm afraid I "munched ideas" in my previous post and confused the issue a little about "installing newer versions", and I apologize.

What I meant, or should have said, was that I have been unable to install, or get working properly, anything newer than 7.4 on a local piece of hardware here; I was NOT referring to the hardware currently located at the datacenter. In actuality, I did get 8.3 to install and operate, but I couldn't get my apps to run on it (because the ports tree was corrupted/disabled), and 9.0 would install but wouldn't boot, failing with the error "loader missing". I was later told that might have been because I was trying to install it from an actual burned CD rather than in a "virtual" environment, or because the hardware I was trying to install it on doesn't support that new UEFI thingy.

But I have some good news and some bad news.

Since my last visit here, and after reading what you said, I instructed the host techs to "run a full hardware diagnostic" and then to "reinstall", REPLACING 7.4 with 9.0. That took them a day, and it took me another day to find and install all the packages for my apps on 9.0 (not to mention building a new security model). I have to say that I'm quite PLEASED with how it all turned out: http://www.warp-radio.com. These are the very apps that are affected by this issue, including the website itself (the video portion is a separate service and isn't affected). The tests will begin around 6pm CST.

-DjZ-
:) :)
 
UPDATE: (again)

the tests FAILED!

In fact, the whole entire thing failed.

I was unable to establish a connection from the studio to Icecast for some reason; I became extremely bothered and just "walked away" for the day.

I will try again today to see if I can get this thing back up and running... I really don't know WHAT would cause (or WHY there would be) a complete failure from a specific IP to a specific PORT, even though other tests showed that the connection was working just FINE; I suspect that Icecast might not be fully compatible with 9.0.

I have recompiled everything, too; only the cfg files were carried over.

more on this later... (sigh)

-DjZ-
 
hi..

Tonight, I was able to log into the system and present a broadcast...

However, I DID confirm that the issue has returned: as the number of connections goes up, so do disk/descriptor access errors, resulting in connections being dropped (by the kernel?) and services/apps crashing as their disk/stack access is revoked internally.

With 9.0, it takes a considerable amount of load before it happens, but once it occurs, the connections DO NOT COME BACK. Once the descriptors are revoked, they are GONE.

I gather that the problem starts early on, but it seems to take a certain number of "dead" descriptors before it becomes noticeable.

and I'm not even sure WHAT would cause this...

-DjZ-
 
Sorry, let me reword this.

It seems that once a descriptor is "revoked", it's gone, and if enough of them are gone, it becomes noticeable: connections in the ircd, web server, inetd, ipfw, bots, and finally the audio server itself start being affected and become unstable. If enough of these descriptors are dropped, the services/apps themselves crash, and (I conclude) eventually the box itself may crash, though I haven't been able to drive the issue that far, since at some point before that, all my connections to the box are lost and can't be recovered.
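One way to test the "dead descriptor" theory is to sample how many files each daemon holds open over time: a count that only climbs under load and never falls back would suggest a leak in that process, while a flat count would point elsewhere. A sketch, assuming FreeBSD's fstat(1) (or /proc on Linux); count_fds and only_grows are hypothetical helper names, and the fstat-based count is approximate because fstat also lists entries such as the text file and working directory:

```shell
#!/bin/sh
# Hypothetical helpers for spotting a descriptor leak in one process.

# count_fds PID -> approximate number of files PID has open.
count_fds() {
    if [ -d "/proc/$1/fd" ]; then
        # Linux: one directory entry per open descriptor.
        ls "/proc/$1/fd" | wc -l | tr -d ' '
    else
        # FreeBSD: fstat(1) prints a header, then one line per open file,
        # including text/wd/root entries, so treat the result as a trend.
        fstat -p "$1" | tail -n +2 | wc -l | tr -d ' '
    fi
}

# only_grows N1 N2 ... -> success if the samples never decrease and the
# last one is larger than the first (a leak-like trend).
only_grows() {
    first=$1
    prev=$1
    for n in "$@"; do
        if [ "$n" -lt "$prev" ]; then
            return 1
        fi
        prev=$n
    done
    [ "$prev" -gt "$first" ]
}
```

For example, record `count_fds` for the Icecast process (and the ircd, eggdrop, etc.) every minute during a test, then pass each process's readings, oldest first, to `only_grows`; a success status means that count climbed without ever dropping.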

DjZ
 
Please execute this command and post back the results (example values from my system attached):
# sysctl kern. | grep files
kern.maxfiles: 32808
kern.maxfilesperproc: 29527
kern.openfiles: 278


Preferably when the system is under load/having problems, if possible.

The value of kern.maxproc may also be of interest.
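To capture these numbers while the problem is actually happening, it can help to log them at a fixed interval during a test. A minimal sketch, assuming FreeBSD's `sysctl -n` and the kern.openfiles and kern.maxfiles names used above; pct_used and watch_files are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers for logging FreeBSD's system-wide file table.

# pct_used OPEN MAX -> percentage of the table in use, two decimals.
pct_used() {
    awk -v o="$1" -v m="$2" 'BEGIN { printf "%.2f\n", 100 * o / m }'
}

# watch_files [SECONDS] -> one timestamped sample per interval until Ctrl-C.
# FreeBSD only: reads the kern.openfiles and kern.maxfiles sysctls.
watch_files() {
    interval=${1:-5}
    while :; do
        open=$(sysctl -n kern.openfiles)
        max=$(sysctl -n kern.maxfiles)
        echo "$(date '+%H:%M:%S') openfiles=$open maxfiles=$max used=$(pct_used "$open" "$max")%"
        sleep "$interval"
    done
}
```

For example, `watch_files 5 | tee openfiles.log` while listeners connect. With the example values above (278 of 32808), `pct_used 278 32808` prints 0.85. If kern.openfiles stays far below kern.maxfiles while services are failing, system-wide file-table exhaustion is probably not the cause.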
 
Hi there Savagedlight, here it is as you requested: :)

Code:
kern.maxfiles: 36958
kern.maxfilesperproc: 18450
kern.openfiles: 110
kern.maxproc: 6164

I have just run another test, and it all puked (failed).

sigh...

DjZ
 
Just to give you all an update:

This issue has now destroyed THREE drives!

Could this be the result of a rootkit, or of a hacker (gaining access through a rootkit) somehow trying to "virtualize" the box in such a way that I couldn't detect it?

In any case, this issue has officially claimed the lives of three separate hard drives in a row.

-DjZ-
 
dj-zath said:
this may be a huge BUG in FreeBSD!

I've run 7.4 for several years as an SMTP relay + HTTP proxy for about 500 users and not run into any issues like this.

edit:
I had no tuning done to the box.

Not saying it isn't potentially a bug with some combination of hardware or software, but I suspect it is something peculiar to your setup, which is why others haven't had much to suggest or posted "me too!".
 