Apparent TCP Hang when streaming

Hello. I've got a strange issue that I'm not even sure how to troubleshoot. It occurs when I'm streaming an MP3 file using PHP/Apache (fpassthru) from a FreeBSD 8.0 server. At seemingly random times the stream appears to hang for 2 or 3 minutes and then resumes. I don't see anything in the Apache or PHP logs when this occurs, and I don't know where else in BSD land to look for clues. The network from server to client (over the internet) appears to be up, or at least routable, as I can jump onto an SSH connection to the server and run ping tests back and forth.
Unfortunately, a number of changes (new router, DSL, server hardware and server OS!) all happened at about the same time this problem cropped up, so it's hard for me to narrow down where it is coming from. Because it was easy, I loaded a web server (same version) and the streaming app onto a Linux box (Ubuntu 9.10), and after a few days the symptom hasn't occurred, which leads me to think there's a problem on the new BSD server (hardware or software). Does anyone have any suggestions on what I could monitor or inspect to find out what's going on?

Thanks,

John
 
I seem to have a similar problem. It happens with NFS and samba shares. Still haven't figured out what exactly is causing it.

Everything works fine for some time then speed drops to almost nothing for a couple of minutes.
 
Try running these commands and see if it changes the situation for the better:
# sysctl kern.ipc.maxsockbuf=2097152

# sysctl net.inet.tcp.recvspace=262144
# sysctl net.inet.tcp.sendspace=262144
# sysctl net.inet.tcp.mssdflt=1452

# sysctl net.inet.udp.recvspace=65535
# sysctl net.inet.udp.maxdgram=65535

# sysctl net.local.stream.recvspace=65535
# sysctl net.local.stream.sendspace=65535

I have these values on my FreeBSD server right now, and they're doing wonders for network performance (example: Samba can saturate a 1 Gbps connection in both directions, whereas it would top out at around 60 MB/s without these adjustments).

If you start getting NOQUEUE errors or the box stops accepting TCP connections, you can undo the changes by rebooting the box (ctrl+alt+delete in console, for example).
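
If you'd rather not reboot to undo the changes, one option is to record the current values before touching anything. Here's a minimal sketch, assuming `sysctl` prints each setting as `name: value` (the usual FreeBSD output format). The function name `make_restore_cmds` and the output file name are made up for illustration.

```sh
#!/bin/sh
# Read "name: value" lines (as printed by `sysctl name ...`) on stdin
# and print one "sysctl name=value" restore command per line.
make_restore_cmds() {
    sed -n 's/^\([A-Za-z0-9_.][A-Za-z0-9_.]*\): \(.*\)$/sysctl \1=\2/p'
}

# Example use, run as root BEFORE applying the new values
# (restore-sysctl.sh is a hypothetical file name):
#   sysctl kern.ipc.maxsockbuf net.inet.tcp.recvspace net.inet.tcp.mssdflt \
#       | make_restore_cmds > restore-sysctl.sh
```

Running `sh restore-sysctl.sh` afterwards should put the saved values back, provided the box is still responsive enough to get a shell.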

If everything works fine after these changes, you can add the values to /etc/sysctl.conf (minus the # sysctl part)
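
As a rough sketch of that last step, a small sed filter can strip the `# sysctl ` prefix so the remaining `name=value` lines are in the format /etc/sysctl.conf expects. The function name `to_sysctl_conf` and the file `tuning.txt` are hypothetical, used here only for illustration.

```sh
#!/bin/sh
# Keep only lines of the form "# sysctl name=value" and print "name=value".
# Blank lines and anything else are dropped.
to_sysctl_conf() {
    sed -n 's/^# sysctl \([^ ]*=[^ ]*\)$/\1/p'
}

# Example use (tuning.txt is a hypothetical file holding the commands above):
#   to_sysctl_conf < tuning.txt >> /etc/sysctl.conf
```

Check the resulting lines by eye before appending them, since a bad value in /etc/sysctl.conf gets applied on every boot.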
 
Hi Marie, thanks for the suggestion. It took me a few days to try it, as I was loading various BSD and Linux systems onto the box to determine whether the issue is specific to FreeBSD 8 (it doesn't appear so, as it occurred with all of them). I tried your suggestion above, but unfortunately it had no effect. I am now trying to isolate the network pieces, which is proving hard. If anyone has suggestions for network monitoring tools that may help identify the problem, or FreeBSD log files that might be helpful, please chime in :)

J
 
Is there any chance your modem (whatever type it is) is hanging during these periods? Does forcing it to restart help when it happens? Or are you able to see if it's restarting on its own?

If the problem is frequent or reproducible, you could also try connecting the BSD box directly to the modem (unless that would require too much of a config hassle, e.g. PPPoE).

What kind of router are you using?
Is there any P2P (BitTorrent, P2P games, etc.) going on? That usually causes trouble for devices with a limited state table size.
 
>Is there any chance your modem (whatever type it is) is hanging at these periods?

Maybe. But I have often had a continuous SSH connection open from the same client to the same server during one of these freezes. The SSH connection is not affected, so it's not a wholesale modem hang, but rather something at the session level.

>Does it help forcing it to restart when it happens? Or are you able to see if it's restarting on its own?

I haven't tried restarting the modem/router, but if I leave everything alone, the session activity (streaming) resumes on its own after a couple of minutes.

>If the problem is frequent or reproducible, you could also try to connect the BSD box directly to the modem (unless that would require too much of a config hassle - read PPPoE etc).

It's an integrated modem/router, so unfortunately that would be difficult. The problem is fairly reproducible though (several times a day), so I'm trying to rule out the networking hardware and cables one at a time right now.

>What kind of a router are you using?

It's an Actiontec Q1000 for Qwest's fiber DSL.

>Is there any P2P (bittorrent, p2p games etc) going on? That usually messes with devices which have a limited states table size

No, I don't believe so. But that's a good point; I'll try turning off all the other devices on the network (satellite, game console, Netflix, laptops...). Maybe it's something stupid like an IP conflict with a device that's trying to call home for an update.

Thanks for the suggestions.

John
 
Resolved! (mostly)

A quick update for anyone with similar issues. I finally found a solution, although the exact nature of the cause still escapes me. I have Qwest fiber DSL (I think it's VDSL?), which requires an Actiontec Q1000 router/modem. Apparently the embedded NIC in my Dell PowerEdge and the Q1000 don't get along very well when there is any kind of packet loss. I was able to monitor the problem by sniffing the TCP traffic on both sides, and any time a retransmission was required it would trigger this symptom. To fix the issue, I placed the Q1000 in bridged mode and connected another NAT router to be the backbone of the network. This has worked flawlessly for several days now.
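
For anyone who wants a cheaper check than a full packet capture, here is a rough sketch that watches the kernel's TCP retransmit counter instead. It is not what I actually ran (I used a sniffer on both ends). It assumes `netstat -s -p tcp` prints a line of the form `N data packets (M bytes) retransmitted`, as FreeBSD does, and the function names `retrans_count` and `watch_retrans` are made up for illustration.

```sh
#!/bin/sh
# Pull the retransmitted-packet counter out of `netstat -s -p tcp` output
# supplied on stdin. Prints the first matching count and stops.
retrans_count() {
    awk '/data packets? \(.*\) retransmitted$/ { print $1; exit }'
}

# Poll the counter every N seconds (default 5) and print a timestamped
# line whenever it has grown since the previous poll.
watch_retrans() {
    prev=$(netstat -s -p tcp | retrans_count)
    while sleep "${1:-5}"; do
        cur=$(netstat -s -p tcp | retrans_count)
        if [ "${cur:-0}" -gt "${prev:-0}" ]; then
            echo "$(date '+%H:%M:%S') retransmits: +$((${cur:-0} - ${prev:-0}))"
        fi
        prev=$cur
    done
}

# Example use: run `watch_retrans 5` in one terminal while streaming and
# see whether the timestamps line up with the hangs.
```

The counter is system-wide, so other traffic on the server will show up too. It only tells you when retransmissions happen, not on which connection.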

Thanks for the help :)

J
 