"no buffer space" - interface buffer problem gets worse

One of my interfaces is frequently locking up, and it is getting worse. And I can't find a knob to tune the interface-specific buffers.

What I can find is about tuning TCP and UDP buffers - but those seem to apply to individual connections, whereas this is a problem with one specific interface. Other comments mention mbuf shortage, which also doesn't really make sense, looking at netstat -m:

Code:
29993/23902/53895 mbufs in use (current/cache/total)
20973/9077/30050/3053964 mbuf clusters in use (current/cache/total/max)
78/8116 mbuf+clusters out of packet secondary zone in use (current/cache)
51/392/443/1526982 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/452439 9k jumbo clusters in use (current/cache/total/max)
0/0/0/254497 16k jumbo clusters in use (current/cache/total/max)
49659K/25697K/75357K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
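
For comparison, these are the per-connection knobs I keep finding; a quick way to look at them (reading is harmless, and none of these is interface-specific):

Code:
# socket buffer limits - per connection, not per interface
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.sendspace net.inet.tcp.recvspace
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
sysctl net.inet.udp.recvspace
# the cluster limit behind the netstat -m output above
sysctl kern.ipc.nmbclusters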

The problem is, when the interface locks up, it can only be recovered with ifconfig down/up on the affected machine. And this has the side effect that rtadvd (and probably other things) no longer works afterwards and needs to be restarted.
I would very much like to get rid of this problem and increase the buffer space for that interface - but how?
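
For now the only workaround is a crude down/up by hand; roughly this (interface name and the rtadvd restart are specific to my setup):

Code:
#!/bin/sh
# recover the locked-up interface
ifconfig eiface1s down
ifconfig eiface1s up
# rtadvd no longer works after the down/up, so restart it as well
service rtadvd restart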
 
What type of hardware is the problem interface?
Output from pciconf -lv would probably be most useful
 
No hardware, it's virtual. And I checked the source, there is not even a buffer. :(
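
For reference, these are ng_eiface netgraph interfaces; this is roughly how I look at the node (assuming the node name matches the interface name, which is how I created them):

Code:
# list all netgraph nodes and their types
ngctl list
# show the hooks of the node behind the problem interface
ngctl show eiface1s: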

Also, while I cannot really reproduce the issue at will, there is a conditio sine qua non: this interface transports all my personal data - music, websurfing, video watching, downloads - and there is no problem with that. Only when I transfer the output of a running llvm build from the other machine does the buffer run full - and it does not seem to matter whether the build is in base or ports.
 
Code:
-------------------------          ---------------------------------------------------------
  client      ------------|        | server                                         
             | ngbridge0  |        |-----------------                          
   xterm     |            |        | ngbridge1       |                              
   ssh   --- | eiface1c   |  wire  |                 |                      
           / |        alc |--------| igb    eiface1s | --- sshd                    
   xterm  /  |            |        |                 | \    cu -------------           
   ssh   /   | eiface2c   |        |                 |  \                   |
              ------------         |        eiface2s |   \ sshd             |
                                   |            fib2 |      cu ------       |
                                   |                 |               |      |
                                   |                 |     ---------------------
                                   |                 |    | guest   COM1   COM2
                                   |                 |    |
                                   |          socket | -- | vtnet0
                                   |            nfsd |    | nfs

That's about how it looks. eiface1s is the problematic one.
There are more things connected to ngbridge1, and these continue to work. Only that one interface locks up, and only when cu is running over it with some ongoing output.

While it didn't seem to make any sense that there would be any relation between a (virtual) serial connection and the network, there is something strange with the serials. In a normal ssh login, doing an ls might produce 1700 bytes, and what goes encrypted over the wire is about 2100 bytes. With a cu console, however, doing the same ls of 1700 bytes sends about 40'000 bytes over the wire. Maybe it sends every character as a separately encrypted packet?
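
If somebody wants to check that theory, something like this should show it (interface name is from my setup, and any interface on the path would do):

Code:
# per-second packet and byte counters while the cu output scrolls by
netstat -w 1 -I eiface1s
# or look at the actual sizes of the ssh packets carrying the cu session
tcpdump -ni eiface1s -c 100 'tcp port 22'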
 
Is the llvm build running on the guest, while you are on the server monitoring the progress of the build over the guest's serial ports?
If so, can you ssh to the guest over vtnet0 and monitor the build from there, just to see if there is any difference?
It is possible/likely that the serial port is doing something differently.
 
I didn't bother to configure that. The guest gets created ad hoc from a script, then the script opens COM2 and displays some progress report through it. When I'm bored enough, I open COM1 and tail -f the actual output logs. Then I switch to some other screen and do some useful things, and suddenly the radio stops playing and the stuff I'm doing breaks apart. That's how this happens.

To have ssh login, that would need to be enabled in the guest - password or kerberos. But these guests build themselves: a guest can do a buildworld/installworld, then the snapshot is switched, and the next time it runs from its own new build (so that the build is already tested and seen working before rolling it out). And the result might also be used as an image for contingency rescue of cloud VMs - I am not eager to have network access enabled beforehand in them; I would need to think of a sensible way to put that into the code.

Yesterday I tried connecting not via eiface1s but from the laptop via VPN, and only then did I notice the data volume amplification that apparently comes from the serials. That is now the first explanation I see that could make some sense: something in the network virtualization gets hammered with syscalls, and then occasionally locks up.
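
Next time it locks up I intend to watch the drop counters and the mbuf zones while the cu output is running, roughly like this (interface name again from my setup):

Code:
# per-second packets, errors and drops on the suspect interface
netstat -w 1 -d -I eiface1s
# snapshot of the mbuf-related uma zones
vmstat -z | egrep 'mbuf|cluster'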
 