Router causes 10-second connection delays

Hornpipe2 · Nov 14, 2009

Hey all. I'm using FreeBSD 7.1-RELEASE-p8 on a PC under the desk as a combination Apache webserver and, via ipf / ipnat, router for my two-machine LAN. The webserver hosts a "fan site" running some continuous PHP scripts that poke the online game dragcave.net, and in addition the two desktop users may be playing the game - in effect, three machines (router + desktop + desktop) trying to access dragcave.net at once.

What seems to be happening is that one or two machines sometimes get a "monopoly" on connections. When this happens, the favored machine gets quick access to dragcave.net, while the other users get hit with very slow HTTP connections (almost exactly 10 seconds) each time they try to access the site. Effectively this limits us to one user at a time, or the webserver, but not all three. Highly frustrating. (Interestingly, it seems to be affected by the OS used on each desktop... my Linux box always gets stuck with slow connects, the Windows one almost never, and the Mac was intermittent until recently when it got much worse.)

I don't believe this is a connection limit imposed by dragcave.net, as other users report successfully hosting up to six players at once from a single IP, each reloading frantically or doing things which hammered the server, and nobody experienced problems. I would like to try to pin this down if it is happening at my end, and I think the FreeBSD router is highly suspect.

Here is my complete ipnat.rules. I have a couple of specific forwarded ports but I don't think that will affect anything. I don't know where else to look to troubleshoot this... any advice would be appreciated.

Code:

# cat /etc/ipnat.rules
map sis0 192.168.1.0/24 -> 0/32 portmap tcp/udp auto
map sis0 192.168.1.0/24 -> 0/32
rdr sis0 0/0 port 51413 -> 192.168.1.253 port 51413 tcp/udp
rdr sis0 0/0 port 41203 -> 192.168.1.252 port 41203 tcp/udp
rdr sis0 0/0 port 41203 -> 192.168.1.252 port 23399 tcp/udp

Hornpipe2 · Nov 14, 2009

Bizarrely it only happens (maybe) when trying to make HTTP requests. My Linux desktop is almost never able to connect while the other desktop (Mac) is running - even telnet to port 80 is slow and never returns. But... here's traceroute output from the Linux machine (while still unable to make HTTP requests!) which shows a mere 70ms response time. No problems there.

Code:

traceroute to dragcave.net (64.251.19.28), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.116 ms  0.093 ms  0.095 ms
 2  10.113.0.1 (10.113.0.1)  18.118 ms  18.119 ms  18.096 ms
 3  24.144.0.193 (24.144.0.193)  18.102 ms  18.168 ms  18.147 ms
 4  68.88.181.181 (68.88.181.181)  18.666 ms  18.657 ms  19.467 ms
 5  bb1-p4-1.ltrkar.sbcglobal.net (151.164.242.204)  19.493 ms  19.979 ms *
 6  151.164.99.177 (151.164.99.177)  34.463 ms  41.249 ms  41.234 ms
 7  te7-2.ccr02.ord03.atlas.cogentco.com (154.54.11.237)  31.733 ms  25.312 ms  29.955 ms
 8  vl3491.ccr02.ord01.atlas.cogentco.com (154.54.6.209)  29.962 ms  29.988 ms te1-1.ccr02.ord01.atlas.cogentco.com (154.54.29.21)  29.930 ms
 9  te7-7.ccr02.atl01.atlas.cogentco.com (154.54.28.74)  56.332 ms  56.338 ms te4-8.ccr02.atl01.atlas.cogentco.com (154.54.29.109)  56.343 ms
10  154.54.30.30 (154.54.30.30)  69.291 ms te8-1.ccr01.mia01.atlas.cogentco.com (154.54.26.10)  69.284 ms te7-1.ccr01.mia01.atlas.cogentco.com (154.54.3.26)  69.177 ms
11  vl3512.na21.b015452-0.mia01.atlas.cogentco.com (66.250.14.182)  71.126 ms  71.034 ms  64.129 ms
12  Infolink.demarc.cogentco.com (38.112.4.126)  67.933 ms  67.951 ms  65.861 ms
13  ge2-edge.mia.infolink.com (64.251.0.66)  71.833 ms  71.813 ms  71.823 ms
14  dragcave.net (64.251.19.28)  70.798 ms  69.981 ms  71.657 ms

I've added

Code:

pass in quick all
pass out quick all

to the top of ipf.rules and flushed / reloaded the table... still no help there either. ARGH

Hornpipe2 · Nov 14, 2009

Sorry for spamming this thread but I want to keep people up-to-date on what's happening. Running netstat on the Linux box while attempting to connect shows...

Code:

tcp        0      1 neuromancer.local:51124 dragcave.net:www        SYN_SENT

which seems to imply that the SYN went out but the server's SYN-ACK has not made it back to this machine: either it was never sent, it has not been properly routed back through the NAT, or... ???
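To put a number on how long a connection sits in SYN_SENT, the handshake can be timed directly from each desktop. This is only a minimal sketch, assuming Python 3 is available on the machine being tested; the function name `time_connect` is made up for illustration and nothing here is specific to ipf or ipnat.

```python
import socket
import time


def time_connect(host, port=80, timeout=30.0):
    """Return the seconds needed to complete the TCP handshake.

    Returns None if the connection is refused, unreachable, or times out.
    """
    start = time.monotonic()
    try:
        # create_connection() returns only after the three-way handshake
        # completes, so the elapsed time covers the SYN_SENT period.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

Calling `time_connect("dragcave.net")` a few times from each desktop while the others are active should show whether the delay clusters around a fixed value (which would point at SYN retransmissions) or varies freely.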

Hornpipe2 · Nov 15, 2009

Used tcpdump yesterday on the outgoing interface of the router to try to see what was going on. As I expected, when my machine is booted into Linux, SYN packets get routed out to the site but never seem to receive a SYN-ACK. I then rebooted to Windows and tried the exact same thing. This time SYN packets also left the router, but were met with a near-immediate SYN-ACK from the remote machine.

The Mac, which has the 10-second delay, also sits in SYN_SENT so I believe the problems all stem from the same root cause. I now think the issue is somewhere between my cable modem and the remote website.

I believe the next step is to call my ISP and see if they can help me troubleshoot.

Christopher · Nov 18, 2009

You aren't by chance using PPPoE to connect to your ISP, are you?

Hornpipe2 · Nov 18, 2009

No, not that I know of. Good suggestion though.

Here's the next twist on the issue: I ran tcpdump on the Mac, which generally gets through after around 10 seconds of waiting. Check out this log:

Code:

17:18:27.716021 IP 192.168.1.253.54002 > dragcave.net.http: S 3266557446:3266557446(0) win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 674295824 0,sackOK,eol>
17:18:30.292637 IP 192.168.1.253.54002 > dragcave.net.http: S 3266557446:3266557446(0) win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 674295829 0,sackOK,eol>
17:18:33.293068 IP 192.168.1.253.54002 > dragcave.net.http: S 3266557446:3266557446(0) win 65535 <mss 1460,nop,wscale 0,nop,nop,timestamp 674295835 0,sackOK,eol>
17:18:36.338745 IP 192.168.1.253.54002 > dragcave.net.http: S 3266557446:3266557446(0) win 65535 <mss 1460,sackOK,eol>
17:18:36.406663 IP dragcave.net.http > 192.168.1.253.54002: S 2792411854:2792411854(0) ack 3266557447 win 5840 <mss 1460,nop,nop,sackOK>
17:18:36.413414 IP 192.168.1.253.54002 > dragcave.net.http: . ack 1 win 65535
17:18:36.413457 IP 192.168.1.253.54002 > dragcave.net.http: P 1:802(801) ack 1 win 65535

Well that explains at least something: the Mac makes three attempts at connecting (once every 3 seconds), then a fourth using a stripped-down SYN packet that drops the wscale and timestamp options. The response to that comes back immediately. So, something in the chain between here and there is not happy about one format but is okay with the other. My guess is Ubuntu Linux never tries a "compatible" format, but Windows always does. Unsure about what FreeBSD is doing. Again, in times of no load, everything connects OK.
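The difference between the failing and the successful SYN can be pulled out mechanically rather than by eye. The following is a rough sketch, assuming Python 3 and tcpdump lines in the same text format as the log above; the helper name `tcp_options` is hypothetical, and the two sample lines are copied from that log.

```python
import re


def tcp_options(line):
    """Return the set of TCP option names in a tcpdump line's <...> block.

    Padding entries (nop, eol) are ignored; option values are discarded.
    """
    match = re.search(r"<([^>]*)>", line)
    if match is None:
        return set()
    names = set()
    for item in match.group(1).split(","):
        fields = item.split()
        if fields and fields[0] not in ("nop", "eol"):
            names.add(fields[0])
    return names


# First (unanswered) and fourth (answered) SYN from the Mac's capture.
first_syn = (
    "17:18:27.716021 IP 192.168.1.253.54002 > dragcave.net.http: "
    "S 3266557446:3266557446(0) win 65535 "
    "<mss 1460,nop,wscale 0,nop,nop,timestamp 674295824 0,sackOK,eol>"
)
fourth_syn = (
    "17:18:36.338745 IP 192.168.1.253.54002 > dragcave.net.http: "
    "S 3266557446:3266557446(0) win 65535 <mss 1460,sackOK,eol>"
)

dropped = tcp_options(first_syn) - tcp_options(fourth_syn)
print(sorted(dropped))  # prints ['timestamp', 'wscale']
```

Running the same comparison on captures from the Linux and Windows boots would show whether Windows omits those options from its very first SYN, which would fit the pattern seen so far.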

I had the suspicion that the remote machine is running Windows IIS or something and is therefore being stupidly strict about connections, but nmap -O seems pretty certain it's a Linux variant.

Christopher · Nov 19, 2009

Reading from the ipnat(5) man page (I use pf nat myself so I wanted to learn more about the syntax) I saw this block of text about using "portmap auto" with a /32.

WARNING: It is not advisable to use the auto feature if you are map'ing to a /32 (i.e. 0/32) because the NAT code will try to map multiple hosts to the same port number, outgoing and ultimately this will only succeed for one of them. The problem here is that the map directive tells the NAT code to use the next address/port pair available for an outgoing connection, resulting in no easily discernible relation between external addresses/ports and internal ones.

That looks like what might be happening here.

Hornpipe2 · Nov 19, 2009

Wow, that's really a strange warning. Especially since 99% of all IPNAT config guides online recommend the same three rules, using 0/32 to refer to local address. Good catch, it's buried way down there in the man pages.

I changed my ruleset to:

Code:

map sis0 192.168.1.0/24 -> 0/32 portmap tcp/udp 1024:65535
map sis0 192.168.1.0/24 -> 0/32

Unfortunately it didn't fix my problem. Back to the drawing board. I'm thinking of spamming the ipf mailing list with my question to see if anyone out there has had the same issue. There is an open issue about ICMP which kind of mimics my problem (apparently ICMP packets can't be portmapped, so only one machine at a time on LAN can ping a certain external site)...

Hornpipe2 · Nov 20, 2009

On a whim I bumped to 7.2-RELEASE-p4. Two reboots later and the problem is still here.