Dummynet and a lot of queues

Hello!

I have a 7.2-RELEASE-p5 system with an ipfw configuration in which there are a lot of queues (more than 1500) created inside one pipe:

Code:
        bwEmployes="30Mbit/s"
        ipEmployes="10.252.0.0/16, 10.1.4.0/24"

        # Employes pipe and queue
        ${fwcmd} pipe 40 config bw ${bwEmployes}
        ${fwcmd} queue 41 config weight 50 pipe 40 mask src-ip 0xffffffff   # one dynamic queue per internal source IP
        ${fwcmd} queue 42 config weight 50 pipe 40 mask dst-ip 0xffffffff   # one dynamic queue per internal destination IP
        ${fwcmd} add queue 41 ip from ${ipEmployes} to any in via ${ifInt}
        ${fwcmd} add queue 42 ip from any to ${ipEmployes} out via ${ifInt}

Can FreeBSD manage a lot of queues easily? Do I need to tune any kernel limits for this configuration?

P.S. The server crashed after about 40 days of uptime. Is this somehow related to this configuration?

P.P.S. I can give any additional info if it helps.
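
The kernel limits that usually come up for this kind of setup are exposed as dummynet sysctls. A minimal sketch of the ones worth checking with ~1500 dynamic queues (the sysctl names are standard; the example value is an illustrative assumption, not a recommendation):

Code:
# Sketch: dummynet sysctls relevant when many dynamic queues exist.
sysctl net.inet.ip.dummynet.hash_size    # hash table size used for dynamic queues (default 64)
sysctl net.inet.ip.dummynet.expire       # whether empty dynamic queues are deleted immediately
sysctl net.inet.ip.intr_queue_maxlen     # IP input queue length

# Example (illustrative value only): spread ~1500 dynamic queues over more hash buckets
sysctl net.inet.ip.dummynet.hash_size=512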
 
Perhaps you should do a 'top -S' and look for dummynet's CPU usage.
PS: Why did you put both queues on the same pipe? I assume you want to share 30 mbps download AND another 30mbps for upload instead of 30mbps for both download and upload, right? If so, then you should configure two pipes, and connect the dynamic queues to different pipes for download and upload.
 
ecazamir said:
Perhaps you should do a 'top -S' and look for dummynet's CPU usage.
PS: Why did you put both queues on the same pipe? I assume you want to share 30 mbps download AND another 30mbps for upload instead of 30mbps for both download and upload, right? If so, then you should configure two pipes, and connect the dynamic queues to different pipes for download and upload.

Here is my "top -S" output from the console:

Code:
last pid: 51392;  load averages:  0.00,  0.00,  0.00                                                                                                       up 53+20:06:33  08:48:33
73 processes:  5 running, 49 sleeping, 19 waiting
CPU:  0.0% user,  0.0% nice,  0.4% system,  1.6% interrupt, 98.0% idle
Mem: 10M Active, 526M Inact, 216M Wired, 244K Cache, 112M Buf, 2503M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171 ki31     0K     8K CPU2   2 1261.7 100.00% idle: cpu2
   11 root        1 171 ki31     0K     8K CPU3   3 1260.5 100.00% idle: cpu3
   14 root        1 171 ki31     0K     8K CPU0   0 1283.1 99.56% idle: cpu0
   13 root        1 171 ki31     0K     8K RUN    1 1276.0 99.07% idle: cpu1
   15 root        1 -44    -     0K     8K WAIT   3  52.1H  4.05% swi1: net
   45 root        1 -68    -     0K     8K -      2  23.4H  0.10% dummynet

And about the shaping... I share the total bandwidth (upload and download) between users equally, so I put both directions of traffic into one pipe...
In other words, these rules are intentional... I really need it to work this way...

P.S. I made some kernel tuning changes and am still waiting to see whether the server fault occurs again... For now everything looks clean...
 
I've had some problems in the following scenario:
- more than approx. 150 concurrent users, each with two upper limits on download and upload: one for 'nearby traffic' and the other for 'global traffic'.
- traffic classification/shaping is done with ipfw + dummynet, only dynamic pipes (upper limit only)

If the number of users is below 150, everything works as expected, but as the number of users increases, shaping no longer works correctly, even when the total used bandwidth is below 75% of the link's available bandwidth.

I saw that:
- dummynet's CPU usage is above 50%
- the number of context switches is above 40k (systat -vmstat, Csw counter)
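
For reference, the readings above come from standard base-system tools; a quick sketch of how to watch them (em0 is just a placeholder interface name):

Code:
top -S                 # look at the 'dummynet' and 'swi1: net' kernel threads
systat -vmstat 1       # the 'Csw' field shows context switches per second
netstat -w 1 -I em0    # packets in/out per second on the interface (em0 is a placeholder)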

I assume that for a large number of users, considering that dummynet seems to be single-threaded, a single- or dual-core CPU at a high frequency (say 3 GHz) will work better than a quad-core CPU at a lower clock (2.3 GHz).

I never experienced crashes, only dummynet not being able to keep up with high data and packet rates:
- problematic data rate: above 50mbit/s
- problematic packet rate: above 10kpps
- context switches: above 40k/s
These numbers were recorded on an Intel quad-core CPU @ 2.3 GHz.

I don't know how much it would help to enable DEVICE_POLLING

Regarding your 'single pipe' instead of two:
If your provider bandwidth is a symmetric 30 Mbit/s and you share a single pipe for both upload and download, then you'll never use the maximum available upload while downloading, and vice versa. You will only be able to reach a _combined_ upload-plus-download rate of 30 Mbit/s. You should have one pipe for download, with a rate slightly lower than what your provider gives you (95% should be fine), and another pipe for upload, at 95% of your ISP's maximum uplink speed.
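
To illustrate, a minimal sketch of that two-pipe layout, reusing the variable names from the first post (the pipe/queue numbers and the ~95% rates are assumptions):

Code:
# Sketch: separate download and upload pipes at ~95% of a symmetric 30 Mbit/s link.
${fwcmd} pipe 40 config bw 28500Kbit/s                                # download pipe
${fwcmd} pipe 50 config bw 28500Kbit/s                                # upload pipe
${fwcmd} queue 41 config weight 50 pipe 50 mask src-ip 0xffffffff     # per-host upload queues
${fwcmd} queue 42 config weight 50 pipe 40 mask dst-ip 0xffffffff     # per-host download queues
${fwcmd} add queue 41 ip from ${ipEmployes} to any in via ${ifInt}
${fwcmd} add queue 42 ip from any to ${ipEmployes} out via ${ifInt}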
 
ecazamir said:
I saw that:
- dummynet's CPU usage is above 50%
- the number of context switches is above 40k (systat -vmstat, Csw counter)

I assume that for a large number of users, considering that dummynet seems to be single-threaded, a single- or dual-core CPU at a high frequency (say 3 GHz) will work better than a quad-core CPU at a lower clock (2.3 GHz).

...

I never experienced crashes, only dummynet not being able to keep up with high data and packet rates:
- problematic data rate: above 50mbit/s
- problematic packet rate: above 10kpps
- context switches: above 40k/s
These numbers were recorded on an Intel quad-core CPU @ 2.3 GHz.

I don't know how much it would help to enable DEVICE_POLLING

My Csw counter is near 7K and dummynet usage is near 0.68% of CPU time. Device polling is enabled and in-kernel NAT is used. After some tuning, the system has now been running for more than 40 days (current uptime is 54 days). But I'm still waiting to see what happens next...
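
As an aside, the in-kernel NAT mentioned above is configured through ipfw itself; a minimal sketch, assuming an external interface called em0 (the interface name, rule numbers and network are placeholders):

Code:
# Sketch: ipfw in-kernel NAT (libalias). em0, the rule numbers and the network are placeholders.
${fwcmd} nat 1 config if em0 same_ports reset
${fwcmd} add 100 nat 1 ip from 10.252.0.0/16 to any out via em0
${fwcmd} add 110 nat 1 ip from any to any in via em0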

ecazamir said:
Regarding your 'single pipe' instead of two:
If your provider bandwidth is a symmetric 30 Mbit/s and you share a single pipe for both upload and download, then you'll never use the maximum available upload while downloading, and vice versa. You will only be able to reach a _combined_ upload-plus-download rate of 30 Mbit/s.

Yes, I want exactly this. All users have equal queues inside one big pipe. What they use their bandwidth for depends on their own wishes.
 
ecazamir said:
- problematic data rate: above 50mbit/s
- problematic packet rate: above 10kpps
- context switches: above 40k/s
These numbers were recorded on an Intel quad-core CPU @ 2.3 GHz.

I don't know how much it would help to enable DEVICE_POLLING

I've tried DEVICE_POLLING, and I have new numbers:
Max bandwidth: approx. 100 Mbit/s (out of 200)
Max PPS: 15k x 4 = 60000 PPS aggregate packet rate
IPFW rules count: 550
At this packet rate, my machine started dropping packets, until I set net.inet.ip.process_options=0, which released the beast. After setting net.inet.ip.process_options to zero, the packet loss went away with or without DEVICE_POLLING. I don't know what the new maximum achievable PPS might be, but it's more than 80k PPS and 150 Mbit/s.
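
For completeness, a sketch of how DEVICE_POLLING is typically enabled (kernel options plus per-interface activation; em0 is a placeholder and the driver has to support polling):

Code:
# Kernel configuration (rebuild the kernel afterwards):
#   options DEVICE_POLLING
#   options HZ=1000
# Then enable polling per interface at runtime:
ifconfig em0 polling
ifconfig em0              # POLLING should now show up in the interface options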

PS: My current kernel is 8.1-PRERELEASE, tuned with the following settings:
/boot/loader.conf
Code:
net.isr.direct=0
net.isr.maxthreads=4
net.isr.bindthreads=1
and /etc/sysctl.conf:
Code:
net.inet.ip.fastforwarding=1
net.inet.ip.intr_queue_maxlen=500
net.inet.ip.dummynet.io_fast=1
net.inet.ip.process_options=0
 