Netgraph control messages on nodes with high traffic

Hi.

Our FreeBSD 8.2 boxes process high volumes of traffic and work as policers/service control engines. Everything is handled in our own self-written netgraph module. When traffic approaches 4-6 Gb/s in both directions, we see a significant slowdown of netgraph control messages sent to the module. According to our investigation, this is not a problem in our module: every node involved in the traffic processing starts to answer slowly. Since such a box handles traffic for up to 10K users at a time, there is a significant control message flow on the node, so the slowdown becomes critical. More importantly, it looks like processing a control message stops the node's traffic processing for a while.

During peak hours I see ng_queue processes appear in top, and in the worst cases they can consume up to 100% of a processor core. Packet delays and losses start to appear. At the same time, total CPU utilization is less than 50%, and everything on the system works just fine on interfaces not involved in the traffic processing: SSH is smooth, the console is responsive, and so on.

Removing the control message flow helps a lot: ng_queue CPU usage drops immediately, and the traffic losses and delays disappear. The netgraph control messages of our module do not require any significant CPU processing. In fact, even simple netgraph control messages sent in a loop to, say, ng_ether nodes cause immediate traffic degradation and ng_queue CPU usage.

I haven't finished all the tests yet, but so far: simple messages without parameters are processed fast and do not cause degradation (again, this needs to be double-checked; maybe they are less hungry or something). At the moment I assume there is some kind of lock on the data flow while messages are processed. I would appreciate it if someone with knowledge of the kernel architecture, and specifically the netgraph subsystem, could comment on this. Maybe there is a straightforward way to fix the netgraph control message handling.
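To make that assumption concrete, here is a toy model in C. It is not netgraph code and says nothing about how the kernel actually behaves; it only counts how many packets would have to be deferred if every control message held a node exclusively for a few packet times. The function name and the parameters ticks, ctl_period and ctl_hold are made up for illustration.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not the real netgraph algorithm):
 * one data packet arrives per tick. Every ctl_period ticks a control
 * message arrives and holds the node exclusively for ctl_hold ticks.
 * A packet that arrives while the node is held is counted as deferred,
 * standing in for work that would end up in a queue.
 * ctl_period == 0 means "no control messages at all".
 */
static long
deferred_packets(long ticks, long ctl_period, long ctl_hold)
{
	long deferred = 0;
	long hold_left = 0;
	long t;

	assert(ticks >= 0 && ctl_period >= 0 && ctl_hold >= 0);

	for (t = 0; t < ticks; t++) {
		/* A control message arrives and takes the node exclusively. */
		if (ctl_period > 0 && t % ctl_period == 0)
			hold_left = ctl_hold;

		/* The packet of this tick cannot pass while the node is held. */
		if (hold_left > 0) {
			deferred++;
			hold_left--;
		}
	}
	return (deferred);
}
```

In this model, deferred_packets(1000, 100, 5) gives 50 deferred packets and deferred_packets(1000, 10, 5) gives 500, so the deferred share grows linearly with the control message rate even though each message is cheap. That would match what I see, but only if the exclusive-hold assumption is right.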

P.S.

I've found a workaround: instead of netgraph messages, use data hooks to control the module, together with the ng_ksocket module or the NgSocket C library. This worked well in a proof-of-concept setup, but it requires switching from the very convenient netgraph message parsing library to our own protocol. I would rather spend my efforts on improving message handling than on reinventing the wheel.
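To give an idea of what "own protocol" means in practice, below is a minimal sketch in C of a fixed-size control record that could be carried over a data hook. Everything in it is hypothetical and is not our real protocol: the CTL_MAGIC marker, the struct ctl_record fields (cmd, user_id) and the function names are placeholders. The point is only that encoding, validation and byte order now have to be done by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTL_MAGIC	0x4e474354u	/* "NGCT", hypothetical marker */
#define CTL_REC_LEN	12		/* magic + cmd + user_id, 4 bytes each */

/* Hypothetical control record; field names are illustrative only. */
struct ctl_record {
	uint32_t cmd;		/* command code */
	uint32_t user_id;	/* subscriber the command applies to */
};

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value stored in network (big-endian) byte order. */
static uint32_t
get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	    (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Encode a record; returns bytes written, or 0 if buf is too small. */
static size_t
ctl_encode(const struct ctl_record *r, uint8_t *buf, size_t len)
{
	assert(r != NULL && buf != NULL);
	if (len < CTL_REC_LEN)
		return (0);
	put_be32(buf, CTL_MAGIC);
	put_be32(buf + 4, r->cmd);
	put_be32(buf + 8, r->user_id);
	return (CTL_REC_LEN);
}

/* Decode a record; returns 1 on success, 0 on short or foreign input. */
static int
ctl_decode(const uint8_t *buf, size_t len, struct ctl_record *r)
{
	assert(buf != NULL && r != NULL);
	if (len < CTL_REC_LEN || get_be32(buf) != CTL_MAGIC)
		return (0);
	r->cmd = get_be32(buf + 4);
	r->user_id = get_be32(buf + 8);
	return (1);
}
```

On the userland side the encoded buffer would be written to the data hook (for example with NgSendData from libnetgraph), and the node would run the decode step on the received data. Everything the message parsing library gives for free, such as ASCII conversion and argument type checking, would have to be rebuilt on top of this.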