tcpdump analysis

hi,

I'm having difficulty receiving data from some hosts, but not others. I'm not sure what the issue is, but it's either my FBSD server or the current DSL modem configuration. When tech support at my ISP configures the modem the way they support, I don't have the same connectivity issues. Problem solved, in their opinion. But their configuration doesn't work with my LAN. It's a replacement modem, refurbished. A power outage precipitated my difficulties: even though the modem was plugged into my APC UPS, it fried/died somehow.

Nevertheless, they can get their modem to work. And I'm left to believe my server is malfunctioning somehow, despite no changes in configuration in the days preceding the power outage.

I'm relatively sure my ipfw/nat firewall is not the issue. I disabled it, I ran an open firewall, I rewrote the rule set, and I ran the same rule set that was in place in the days preceding the outage. The behavior is the same in all scenarios. Nothing is suspiciously denied. Nothing is logged to /var/log/security, debug, or messages.

Client browsers on my LAN, and on my server, all connect to the problem hosts, but then spin until timeout waiting for data.

I dumped traffic for a working site and a non-working site. Maybe the contrast in the data points to a problem, but I could really use some help interpreting it. I manually added some comments to the dump of the non-working site after the fact.

The non-working sites appear dead, though tcping and tcptraceroute, plus tcpdump and dynamic rules, appear to show the host is up and I am connected. I'm just not receiving data, apparently.

Regarding the working sites: they're serviceable, but not perfect. Connections to image servers and 3rd-party ad servers sometimes hang, but that doesn't otherwise impair the functionality of the site.

working site attachment forums.txt

non-working site netsltn.txt


Any insight greatly appreciated. Thanks in advance
 


Try running the second dump with -XXXX added, so you can see what's inside. For the example below I used [cmd=]tcpdump -XXXX -s 0 -pnli <nic> host <webhost>[/cmd].

The first 'P' after the three-way handshake should be the HTTP GET request.

Example:

Code:
[B]Handshake initiate:[/B]
15:19:09.277404 IP myip.65446 > 194.109.6.92.80: [B][color="Red"]S[/color][/B] 3673583399:3673583399(0) win 65535 <mss 1460,nop,wscale 3,sackOK,timestamp 1553748665 0>
        0x0000:  000b 23fd a966 0016 7627 48a8 0800 4500  ..#..f..v'H...E.
        0x0010:  003c 0af7 4000 4006 fb35 c27b a94a c26d  .<..@.@..5.{.J.m
        0x0020:  065c ffa6 0050 daf6 6f27 0000 0000 a002  .\...P..o'......
        0x0030:  ffff 1a0a 0000 0204 05b4 0103 0303 0402  ................
        0x0040:  080a 5c9c 52b9 0000 0000                 ..\.R.....

[B]Handshake ack'ed[/B]
15:19:09.292952 IP 194.109.6.92.80 > myip.65446: [B][color="Red"]S[/color][/B] 4046684236:4046684236(0) [B][color="Red"]ack[/color][/B] 3673583400 win 65535 <mss 1460,nop,wscale 1,nop,nop,timestamp 949621547 1553748665,sackOK,eol>
        0x0000:  0016 7627 48a8 000b 23fd a966 0800 4500  ..v'H...#..f..E.
        0x0010:  0040 24b8 4000 3806 e970 c26d 065c c27b  .@$.@.8..p.m.\.{
        0x0020:  a94a 0050 ffa6 f133 804c daf6 6f28 b012  .J.P...3.L..o(..
        0x0030:  ffff 4bb0 0000 0204 05b4 0103 0301 0101  ..K.............
        0x0040:  080a 389a 132b 5c9c 52b9 0402 0000       ..8..+\.R.....

[B]Three-way handshake ack:[/B]
15:19:09.293004 IP myip.65446 > 194.109.6.92.80: . [B][color="Red"]ack[/color][/B] 1 win 8326 <nop,nop,timestamp 1553748680 949621547>
        0x0000:  000b 23fd a966 0016 7627 48a8 0800 4500  ..#..f..v'H...E.
        0x0010:  0034 0af8 4000 4006 fb3c c27b a94a c26d  .4..@.@..<.{.J.m
        0x0020:  065c ffa6 0050 daf6 6f28 f133 804d 8010  .\...P..o(.3.M..
        0x0030:  2086 6ae6 0000 0101 080a 5c9c 52c8 389a  ..j.......\.R.8.
        0x0040:  132b                                     .+

[B]Requesting URL (Push, requesting ACK):[/B]
15:19:09.293778 IP myip.65446 > 194.109.6.92.80: [B][color="Red"]P[/color][/B] 1:237(236) [B][color="Red"]ack[/color][/B] 1 win 8326 <nop,nop,timestamp 1553748681 949621547>
        0x0000:  000b 23fd a966 0016 7627 48a8 0800 4500  ..#..f..v'H...E.
        0x0010:  0120 0af9 4000 4006 fa4f c27b a94a c26d  ....@.@..O.{.J.m
        0x0020:  065c ffa6 0050 daf6 6f28 f133 804d 8018  .\...P..o(.3.M..
        0x0030:  2086 599f 0000 0101 080a 5c9c 52c9 389a  ..Y.......\.R.8.
        0x0040:  132b 4745 5420 2f20 4854 5450 2f31 2e30  .+[B][color="Red"]GET./.HTTP/1.0[/color][/B]
        0x0050:  0d0a 486f 7374 3a20 7777 772e 7873 3461  ..[B][color="Red"]Host:.www.xs4a[/color][/B]
        0x0060:  6c6c 2e6e 6c0d 0a41 6363 6570 743a 2074  [B][color="Red"]ll.nl[/color][/B]..Accept:.t
        0x0070:  6578 742f 6874 6d6c 2c20 7465 7874 2f70  ext/html,.text/p
        0x0080:  6c61 696e 2c20 7465 7874 2f63 7373 2c20  lain,.text/css,.
        0x0090:  7465 7874 2f73 676d 6c2c 202a 2f2a 3b71  text/sgml,.*/*;q
        0x00a0:  3d30 2e30 310d 0a41 6363 6570 742d 456e  =0.01..Accept-En
        0x00b0:  636f 6469 6e67 3a20 677a 6970 2c20 636f  coding:.gzip,.co
        0x00c0:  6d70 7265 7373 2c20 627a 6970 320d 0a41  mpress,.bzip2..A
        0x00d0:  6363 6570 742d 4c61 6e67 7561 6765 3a20  ccept-Language:.
        0x00e0:  656e 0d0a 5573 6572 2d41 6765 6e74 3a20  en..User-Agent:.
        0x00f0:  4c79 6e78 2f32 2e38 2e36 7265 6c2e 3520  Lynx/2.8.6rel.5.
        0x0100:  6c69 6277 7777 2d46 4d2f 322e 3134 2053  libwww-FM/2.14.S
        0x0110:  534c 2d4d 4d2f 312e 342e 3120 4f70 656e  SL-MM/1.4.1.Open
        0x0120:  5353 4c2f 302e 392e 386b 0d0a 0d0a       SSL/0.9.8k....

[B]Server replies with ACK, 200 OK, headers, plus payload:[/B]
15:19:09.393977 IP 194.109.6.92.80 > myip.65446: [B][color="Red"].[/color][/B] 1:1449(1448) [B][color="Red"]ack[/color][/B] 237 win 33304 <nop,nop,timestamp 949621646 1553748681>
        0x0000:  0016 7627 48a8 000b 23fd a966 0800 4500  ..v'H...#..f..E.
        0x0010:  05dc 2562 4000 3806 e32a c26d 065c c27b  ..%b@.8..*.m.\.{
        0x0020:  a94a 0050 ffa6 f133 804d daf6 7014 8010  .J.P...3.M..p...
        0x0030:  8218 7020 0000 0101 080a 389a 138e 5c9c  ..p.......8...\.
        0x0040:  52c9 4854 5450 2f31 2e31 2032 3030 204f  R.[B][color="Red"]HTTP/1.1.200.O[/color][/B]
        0x0050:  4b0d 0a44 6174 653a 2053 6174 2c20 3234  [B][color="Red"]K[/color][/B]..Date:.Sat,.24
        0x0060:  204f 6374 2032 3030 3920 3133 3a31 393a  .Oct.2009.13:19:
        0x0070:  3039 2047 4d54 0d0a 5365 7276 6572 3a20  09.GMT..Server:.
        0x0080:  4170 6163 6865 2f31 2e33 2e33 3720 2855  Apache/1.3.37.(U

--->8---- snip

See what your Push requests do. I see you're sending three of them in a row after the three-way handshake has completed successfully. I wonder what's in them, and what the other side does after that.
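
As a reading aid for the example dump: tcpdump prints a TCP data segment as first:last(length), where last is the sequence number following the final byte, and the first 16-bit word on the 0x0010 line of the hex dump is the total length of the IP packet. Here is a small sketch of that arithmetic in plain sh, using the numbers from the example. The function names seg_len and ip_total are made up for illustration, and the 32-byte TCP header (20 bytes plus the nop,nop,timestamp options) is read off the dump, so treat it as an assumption for other captures.

```shell
# Length of a TCP segment that tcpdump prints as first:last(length).
seg_len() {
    echo $(( $2 - $1 ))
}

# Total IPv4 packet size: 20-byte IP header + TCP header + payload.
ip_total() {
    tcp_header=$1
    payload=$2
    echo $(( 20 + tcp_header + payload ))
}

seg_len 1 237        # GET request, 1:237(236): prints 236
ip_total 32 236      # prints 288
echo $(( 0x0120 ))   # IP total length in the request dump: prints 288

seg_len 1 1449       # first reply segment, 1:1449(1448): prints 1448
ip_total 32 1448     # prints 1500
echo $(( 0x05dc ))   # IP total length in the reply dump: prints 1500
```

So the request is a small 288-byte packet, while the first reply segment is a full 1500-byte packet, the largest that fits in a standard Ethernet frame.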
 
One thing immediately stands out.

In the dump of traffic to and from a notebook on my private LAN, I see a 192.168.* IP going out. (It's the first time I've seen that in the several dumps I've done.) Probably not a good thing.

I've got an ipfw firewall with a divert rule to /sbin/natd. Flags: use sockets, same ports. My rule reads:
Code:
divert natd all from any to any via $oif

The man page isn't as clear as I'd wish about doing nat with ipfw. The nat directive examples all use
Code:
ipfw nat 123
What's the "123"? Just an arbitrary identifier?

A suitable replacement command for me might be:
Code:
ipfw nat 123 config if bge0 log deny_in same_ports
?
Is the reset or log option recommended or necessary?

And I saw in another thread a user seeing his private IP address going out to the net. Recommendation was: "if you can switch to pf, do it. now." Is ipfw broken or hopeless? I'll switch to pf if I must. Would appreciate some guidance in that regard.

But the 192.168 packet doesn't fully explain my issues. The latter 2 of 3 attachments don't show the 192.168.* IP going out.

It looks like my ACK is being sent. Within the braces, it looks like there may be beginning and ending byte windows. The ending byte window appears to match in the ACK packets, if I'm reading it right.
 


It is rather difficult to analyse these data, because they're not entirely 'clean'. It looks like you visited the same site a number of times, leading to traffic of previous connections mixing in with newer traffic. And I see a lot of gibberish in the host's reply packets, which is probably down to both sides supporting (and using) gzip, which is of course not human-readable. An HTTP OK / header reply should be readable though, but I don't see one.

Could you try repeating the dump, but this time using e.g. fetch from a console instead of Firefox (no pollution from cache and such)?

RFC1918 addresses can only show up in tcpdump when you're sniffing an interface 'before NAT', e.g. the LAN interface of a NAT router (with NAT happening on the WAN interface), or on a laptop in the LAN. Those addresses should not show up on the NAT'ing interface.
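
One way to check this, sketched under the assumption that bge0 is the NAT'ing (external) interface and that the LAN uses 192.168.0.0/16 addresses, as described earlier in the thread: capture only packets that still carry a private source address. On a correctly NAT'ing interface this should normally stay silent.

```shell
# Run as root. Prints only packets on the external interface
# whose source address is still a private 192.168.x.x LAN address.
tcpdump -pnli bge0 'src net 192.168.0.0/16'
```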

As to firewalls, with or without NAT, I'm entirely partial to PF myself, but that's mere opinion. An informed, proven and tested one though.
 
Progress made

Hi,

Just an update. I decided to call the technical support of the company I'm doing business with before posting more dump data here to see if they can see anything unusual about my traffic. Their tech support recommended I change the MTU setting on my WinXP notebook from default 1500 to 1492. It worked. Both for browser access and proprietary software access on this WinXP.

To do: Win Vista. There is apparently a command-line tool, 'netsh', which will change the MTU. However, the syntax tech support supplied didn't satisfy Vista. I'm expecting a callback unless I can find it first.

To do too: FBSD. If I change the MTU on my external interface bge0 from the current 1500 to 1492, will it have major unintended consequences? And is it easy to change back to 1500 if I need to for any reason?
 
Shouldn't be a problem, though I find it quite weird that a company tells you to make such a fundamental change. The last time I saw something like this was about eight years ago when a business DSL customer couldn't connect to (I think) the SAP website (return traffic didn't arrive). Changing the MTU on his connection fixed that. Never seen it since.

Changing the MTU is simply part of ifconfig(8). It can be changed on the fly, and it can be set in /etc/rc.conf.
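
A minimal sketch, assuming bge0 is the external interface (as in the question) and that the commands are run as root. The ping line is an optional check, with www.example.com standing in for one of the problem hosts: -D sets the Don't Fragment bit in FreeBSD's ping, and 1464 bytes of payload plus 28 bytes of ICMP and IP headers makes a 1492-byte packet.

```shell
# Show the current MTU of the external interface.
ifconfig bge0 | grep mtu

# Lower the MTU on the fly; this takes effect immediately.
ifconfig bge0 mtu 1492

# Optional check: 1464 + 28 = 1492 bytes, sent with Don't Fragment set.
ping -D -c 3 -s 1464 www.example.com

# To change back later, run the same command with the old value:
# ifconfig bge0 mtu 1500
```

To keep the setting across reboots, mtu 1492 would go on the existing ifconfig_bge0 line in /etc/rc.conf; the exact line depends on how the interface is configured there.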
 
If I didn't see it with my own eyes...

My chin hit my keyboard. I followed along with his instructions, figuring it's the same kind of help desk script they're all trained to follow.

But it worked. And that's what counts in the end. Words can't express my relief.

And thanks again for your efforts: Dutch, Phoenix, Anomie. Next time you're in PDX, drinks are on me. :-)


I'll post a follow-up when I get a command string that sets the MTU for Vista, and to report whether setting the MTU on FBSD works as well.


I guess I'm equally relieved that it's not my server, firewall, hardware, or even the DSL modem or ISP network. Whew!
 
DutchDaemon said:
The last time I saw something like this was about eight years ago when a business DSL customer couldn't connect to (I think) the SAP website (return traffic didn't arrive). Changing the MTU on his connection fixed that. Never seen it since.

DSL connections are notorious for this problem. I had the same thing when changing my company from a T1 to a DSL line - randomly, PCs couldn't connect to sites. One minute they'd get a timeout, click reload and it was instantly there. I was doing the same thing as OP - analyzing sniffer traffic and wasn't understanding why it was so random.

The reason is that DSL (or more specifically PPPoE) adds its own 8 bytes of overhead to every frame, which leaves room for only 1492 bytes of IP packet instead of the usual 1500. (Exceptionally poor design decision IMO.) Small packets, like the handshake and most requests, fit and get through fine. A full-sized 1500-byte packet does not fit. If it can't be fragmented, and the ICMP message that should tell the sender to use smaller packets gets filtered somewhere along the way, the packet is silently dropped and timeouts are the result.
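
The arithmetic behind the 1492 figure, as a minimal sh sketch. It assumes the 8-byte PPPoE/PPP header and plain 20-byte IPv4 and TCP headers without options; the function names are made up for illustration.

```shell
PPPOE_OVERHEAD=8     # 6-byte PPPoE header + 2-byte PPP protocol field
IP_TCP_HEADERS=40    # 20-byte IPv4 header + 20-byte TCP header, no options

# Largest IP packet that fits once PPPoE has taken its share of the frame.
pppoe_mtu() {
    echo $(( $1 - PPPOE_OVERHEAD ))
}

# TCP maximum segment size a host derives from a given MTU.
mss_for_mtu() {
    echo $(( $1 - IP_TCP_HEADERS ))
}

pppoe_mtu 1500      # prints 1492
mss_for_mtu 1500    # prints 1460, as in the <mss 1460> of the example handshake
mss_for_mtu 1492    # prints 1452
```

A host that believes the MTU is 1500 advertises an MSS of 1460 and so invites 1500-byte packets, which no longer fit through the PPPoE link. With the MTU lowered to 1492 it advertises 1452 instead, and the other side sends segments that fit.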

Glad you got the problem fixed OP.



*edit - see about 60% of the way down this page for a more complete explanation, complete with diagrams. That page is what saved my sanity. ;) :) *
 
Just a follow-up regarding WinVista

The command
Code:
netsh interface ipv4 set subinterface "Local Area Connection" mtu=1450 store=persistent
must be run as administrator. That tripped me up the first go-round.

For WinXP there is a utility called DrTCP which I got from http://www.dslreports.com/drtcp. It did the trick.

I've now got the whole private net on the same MTU value. Previous problems with certain websites are gone.

The VPN to work now behaves a little squirrelly, but it's not a roadblock.
 