FreeBSD 9.1 high load averages and interrupt/idle

Greetings,

I have several servers with consumer-grade hardware, all running great with 9.1-RELEASE, but one keeps freezing after a few days, sometimes only a few hours after a restart. It is also the only server with high load averages and high interrupt time on CPU1, even when the machine is idle (the CPU is an Intel Pentium G630T 2.3 GHz dual core on an Intel H61 chipset). None of the other servers have these problems.

The server was upgraded from 9.0-RELEASE (I don't recall any problems under 9.0-RELEASE) and is a dedicated router/firewall with three gigabit NICs: one onboard Realtek and two ports on an Intel PT/1000 dual-port card.
  1. It runs PPPoE, PF, OpenVPN and Avahi (configured to route Bonjour's zero-config traffic between the private subnets).
  2. The Realtek is connected to the ISP's broadband modem.
  3. The Intel ports are each connected to a different private subnet; one subnet is jumbo-frame only (MTU 9000).

Network load is not too high (see the output below). I would appreciate any help urgently, as this server is in production and links the other machines to the outside world. Thank you so much.

Code:
root@moon:/ # uname -a
FreeBSD moon 9.1-RELEASE-p3 FreeBSD 9.1-RELEASE-p3 #0: Mon Apr 29 18:27:25 UTC 2013     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Code:
root@moon:/ # netstat -w1
            input        (Total)           output
  
 packets  errs idrops      bytes    packets  errs      bytes colls
        12     0     0       2171         21     0       2798     0
        13     0     0       2382         12     0       1983     0
         7     0     0        970          5     0        778     0
         3     0     0        390          3     0        502     0
         3     0     0        406          3     0        518     0
         3     0     0        390          3     0        502     0
         3     0     0        358          3     0        470     0
         7     0     0        678          7     0        750     0
        13     0     0       1836         22     0       2624     0
         7     0     0        752          5     0        667     0
         3     0     0        374          3     0        486     0
         9     0     0        916          9     0        943     0
        33     0     0       5153         61     0       7941     0
        58     0     0      11935         73     0      19171     0
       293     0     0     276038        256     0     145615     0
       159     0     0     145884        140     0      77236     0
         4     0     0        574          7     0       1064     0
         6     0     0        756          5     0        715     0
         5     0     0        622          4     0        583     0
         5     0     0        576          4     0        576     0
         5     0     0        642          4     0        601     0

Code:
root@moon:/ # top -S -P -b -d2
last pid:  1568;  load averages:  0.60,  0.58,  0.51  up 0+01:21:26    13:13:05
53 processes:  2 running, 50 sleeping, 1 waiting

Mem: 27M Active, 18M Inact, 178M Wired, 23M Buf, 7630M Free
Swap: 8192M Total, 8192M Free



  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root             2 155 ki31     0K    32K RUN     1 142:40 154.69% idle
   12 root            13 -84    -     0K   208K WAIT    0  19:29 50.49% intr
    0 root            13 -52    0     0K   208K -       1   0:16  0.00% kernel
  920 root             1  20    0 44884K  3476K select  1   0:05  0.00% ppp
   14 root             1 -16    -     0K    16K -       1   0:01  0.00% yarrow
 1176 root             1  20    0 12052K  1568K select  1   0:01  0.00% powerd
 1313 user1           1  20    0 69944K  5608K select  0   0:01  0.00% sshd
 1249 avahi            1  20    0 32372K  3164K select  0   0:01  0.00% avahi-daemon
   15 root             8 -68    -     0K   128K -       1   0:01  0.00% usb
 1173 root             1  20    0 24256K  3284K select  0   0:00  0.00% ntpd
  828 _pflogd          1  20    0 12184K  1896K bpf     0   0:00  0.00% pflogd
 1388 root             1  20    0 19592K  3332K pause   1   0:00  0.00% csh
 1311 root             1  30    0 69944K  5580K sbwait  0   0:00  0.00% sshd
   16 root             1 -16    -     0K    16K tzpoll  0   0:00  0.00% acpi_thermal
 1261 root             1  20    0 22332K  4568K select  1   0:00  0.00% sendmail
   13 root             3  -8    -     0K    48K -       1   0:00  0.00% geom
  812 root             1 -16    -     0K    16K pftm    1   0:00  0.00% pfpurge
   17 root             1  16    -     0K    16K syncer  1   0:00  0.00% syncer

last pid:  1568;  load averages:  0.60,  0.58,  0.51  up 0+01:21:28    13:13:07
53 processes:  2 running, 50 sleeping, 1 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system, 49.6% interrupt, 50.4% idle
Mem: 27M Active, 18M Inact, 178M Wired, 23M Buf, 7630M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root             2 155 ki31     0K    32K CPU1    1 142:43 154.69% idle
   12 root            13 -84    -     0K   208K WAIT    0  19:30 50.68% intr
    0 root            13 -52    0     0K   208K -       1   0:16  0.00% kernel
  920 root             1  20    0 44884K  3476K select  1   0:05  0.00% ppp
   14 root             1 -16    -     0K    16K -       1   0:01  0.00% yarrow
 1176 root             1  20    0 12052K  1568K select  1   0:01  0.00% powerd
 1313 user1           1  20    0 69944K  5608K select  1   0:01  0.00% sshd
 1249 avahi            1  20    0 32372K  3164K select  0   0:01  0.00% avahi-daemon
   15 root             8 -68    -     0K   128K -       1   0:01  0.00% usb
 1173 root             1  20    0 24256K  3284K select  0   0:00  0.00% ntpd
  828 _pflogd          1  20    0 12184K  1896K bpf     1   0:00  0.00% pflogd
 1388 root             1  20    0 19592K  3332K pause   1   0:00  0.00% csh
 1311 root             1  30    0 69944K  5580K sbwait  0   0:00  0.00% sshd
   16 root             1 -16    -     0K    16K tzpoll  0   0:00  0.00% acpi_thermal
 1261 root             1  20    0 22332K  4568K select  1   0:00  0.00% sendmail
   13 root             3  -8    -     0K    48K -       1   0:00  0.00% geom
  812 root             1 -16    -     0K    16K pftm    1   0:00  0.00% pfpurge
   17 root             1  16    -     0K    16K syncer  1   0:00  0.00% syncer
 
The intr 'process' does seem to take up a lot of CPU. You can have a look with systat -vmstat to see which device is causing all those interrupts. It may simply be a defective card.
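
One way to put numbers on this, offered as a rough sketch rather than a definitive tool: save the output of vmstat -i twice, a known number of seconds apart, and compare the totals per interrupt source. The Python below assumes each data line ends with a total column and a rate column, which is how vmstat -i prints on FreeBSD. The function names are made up for this example.

```python
def parse_vmstat_i(text):
    """Return {source: total} parsed from the text of one `vmstat -i` run."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines end with two integers (total, rate); this skips the header.
        if len(fields) < 3 or not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        if name == "Total":
            continue  # summary line, not a device
        counts[name] = int(fields[-2])
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source between two snapshots, busiest first."""
    old, new = parse_vmstat_i(before), parse_vmstat_i(after)
    rates = {
        name: (new[name] - old[name]) / seconds
        for name in new
        if name in old
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

For example, run vmstat -i > before.txt, wait ten seconds, run vmstat -i > after.txt, then pass the contents of the two files and 10 to interrupt_rates. The source at the top of the list is the one to look at first.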
 
SirDice said:
The intr 'process' does seem to take up a lot of CPU. You can have a look with systat -vmstat to see which device is causing all those interrupts. It may simply be a defective card.

Thanks @SirDice.

The server just froze again and I had to hard-reboot the machine. After it came back up, the high load averages and the high interrupt time were gone! I generated a lot of traffic and captured the command output as you suggested; it seems re0 and em0 account for most of the interrupts. Do I have an intermittently defective re0 and/or em0?

Btw, the server froze again after I put a lot of traffic through it. :(

Code:
$ systat -vmstat

    1 users    Load  0.27  0.22  0.17                  Jul 22 14:47

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act   38824    5908   598732     7080 7809324  count
All  165520    6800 1074435k    22312          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow    6126 total
  1          27       13k   52  21k 4485  106             zfod        atkbd0 1
                                                          ozfod     2 ehci0 16
 8.1%Sys   2.0%Intr  1.8%User  0.0%Nice 88.0%Idle        %ozfod     2 ehci1 23
|    |    |    |    |    |    |    |    |    |    |       daefr   730 cpu0:timer
====+>                                                    prcfr  1306 em0 264
                                         4 dtbuf        2 totfr     8 em1 265
Namei     Name-cache   Dir-cache    205550 desvn          react  3165 re0 266
   Calls    hits   %    hits   %       594 numvn          pdwak     1 ahci0 267
       3       3 100                    41 frevn          pdpgs   912 cpu1:timer
                                                          intrn
Disks  ada0 pass0                                  190276 wire
KB/t  32.00  0.00                                   26688 act
tps       1     0                                   15500 inact
MB/s   0.02  0.00                                         cache
%busy     0     0                                 7809324 free
                                                    22976 buf
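
To read the Interrupts column above as shares, here is a small worked calculation in Python. The counts are copied from the systat screen above (atkbd0 shows no count there and is left out); the function name is made up for this example. Since this sample was taken while pushing traffic through the box, busy NICs are expected to dominate, so the shares alone do not prove that a card is defective.

```python
def interrupt_shares(rates):
    """Return (source, percent of total) pairs, largest share first."""
    total = sum(rates.values())
    shares = [(name, 100.0 * count / total) for name, count in rates.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Per-source counts as shown on the systat -vmstat screen (total shown: 6126).
rates = {
    "ehci0": 2,
    "ehci1": 2,
    "cpu0:timer": 730,
    "em0": 1306,
    "em1": 8,
    "re0": 3165,
    "ahci0": 1,
    "cpu1:timer": 912,
}

for name, percent in interrupt_shares(rates):
    print(f"{name:12s} {percent:5.1f}%")
```

The per-source counts add up to the 6126 total on the screen, with re0 at roughly half of all interrupts and em0 at roughly a fifth.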

Code:
$ top -S -P -b -d2
last pid:  1352;  load averages:  0.01,  0.02,  0.04  up 0+00:19:41    15:09:18
50 processes:  2 running, 47 sleeping, 1 waiting

Mem: 25M Active, 15M Inact, 181M Wired, 22M Buf, 7632M Free
Swap: 8192M Total, 8192M Free



  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root             2 155 ki31     0K    32K CPU1    1  38:51 200.00% idle
   12 root            13 -84    -     0K   208K WAIT    0   0:05  0.00% intr
  929 root             1  20    0 44884K  3476K select  0   0:04  0.00% ppp
    0 root            13 -52    0     0K   208K -       0   0:04  0.00% kernel
  828 _pflogd          1  20    0 12184K  1932K bpf     1   0:01  0.00% pflogd
   14 root             1 -16    -     0K    16K -       1   0:01  0.00% yarrow
 1176 root             1  20    0 12052K  1568K select  1   0:00  0.00% powerd
 1249 avahi            1  20    0 32372K  3164K select  0   0:00  0.00% avahi-daemon
 1173 root             1  20    0 24256K  3280K select  0   0:00  0.00% ntpd
 1312 user1           1  20    0 69944K  5608K select  1   0:00  0.00% sshd
   15 root             8 -68    -     0K   128K -       1   0:00  0.00% usb
 1310 root             1  26    0 69944K  5580K sbwait  0   0:00  0.00% sshd
   16 root             1 -16    -     0K    16K tzpoll  0   0:00  0.00% acpi_thermal
 1261 root             1  20    0 22332K  4568K select  1   0:00  0.00% sendmail
   13 root             3  -8    -     0K    48K -       0   0:00  0.00% geom
  812 root             1 -16    -     0K    16K pftm    1   0:00  0.00% pfpurge
   17 root             1  16    -     0K    16K syncer  1   0:00  0.00% syncer
   18 root             1 -16    -     0K    16K sdflus  1   0:00  0.00% softdepflush

last pid:  1352;  load averages:  0.01,  0.02,  0.04  up 0+00:19:43    15:09:20
50 processes:  2 running, 47 sleeping, 1 waiting
CPU 0:  0.0% user,  0.0% nice,  0.4% system,  0.8% interrupt, 98.8% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 25M Active, 15M Inact, 181M Wired, 22M Buf, 7631M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root             2 155 ki31     0K    32K CPU1    1  38:55 200.00% idle
 1352 ijunus           1  20    0 18620K  2324K CPU0    0   0:00  0.10% top
   12 root            13 -84    -     0K   208K WAIT    0   0:05  0.00% intr
  929 root             1  20    0 44884K  3476K select  0   0:04  0.00% ppp
    0 root            13 -52    0     0K   208K -       0   0:04  0.00% kernel
  828 _pflogd          1  20    0 12184K  1932K bpf     1   0:01  0.00% pflogd
   14 root             1 -16    -     0K    16K -       1   0:01  0.00% yarrow
 1176 root             1  20    0 12052K  1568K select  1   0:00  0.00% powerd
 1249 avahi            1  20    0 32372K  3164K select  0   0:00  0.00% avahi-daemon
 1173 root             1  20    0 24256K  3280K select  0   0:00  0.00% ntpd
 1312 user1           1  20    0 69944K  5608K select  0   0:00  0.00% sshd
   15 root             8 -68    -     0K   128K -       1   0:00  0.00% usb
 1310 root             1  26    0 69944K  5580K sbwait  0   0:00  0.00% sshd
   16 root             1 -16    -     0K    16K tzpoll  0   0:00  0.00% acpi_thermal
 1261 root             1  20    0 22332K  4568K select  1   0:00  0.00% sendmail
   13 root             3  -8    -     0K    48K -       0   0:00  0.00% geom
  812 root             1 -16    -     0K    16K pftm    1   0:00  0.00% pfpurge
   17 root             1  16    -     0K    16K syncer  1   0:00  0.00% syncer
 
User23 said:
I would disable the Realtek onboard NIC in the BIOS and use another Intel PCI/PCIe card first.

Thanks @User23. I wish I had an empty slot, but you're pointing me in the right direction.

UPDATE:
I managed to get another motherboard and it runs flawlessly, so I'm pretty sure it's a faulty motherboard and/or onboard re0. It's in RMA now.

Also, while on the subject of high interrupts: I found that if I boot the box with a monitor connected to the VGA port and then unplug the VGA cable, the interrupt count for ehci0 (in my case it shares irq16 with pcib1, pcib2, pcib3, vgapci0 and em0) explodes and keeps climbing rapidly. I have read on some forums that other people are experiencing the same thing. In my case the solution was to boot without a monitor connected to the VGA port.

Hope this can be useful for everyone.
 
How did you get that re0 to work with FreeBSD 9.1? Mine just completely died but works great with 8.4-RELEASE now.
 
kvi said:
How did you get that re0 to work with FreeBSD 9.1? Mine just completely died but works great with 8.4-RELEASE now.

Hi @kvi,

I'm not quite sure I understand your question. Are you saying that in your case 9.1-RELEASE doesn't recognize the Realtek port?
 
It did recognize it, but no matter what I did it kept reporting "status: no carrier" and no data would go through it. Installing Vista got the NIC working again, but reinstalling 9.1-RELEASE killed it again. I actually filed a bug report on the re driver, as I was instructed to in another thread.

Now I have 8.4-RELEASE and it works fine. I'd love to give 9.x a shot, but I couldn't get my Realtek to work with it.
 
I see, just read your thread.

I did a standard install, nothing exotic; perhaps my Realtek chipset is different from yours. The motherboard is a Foxconn H61S and I believe it uses the RTL8111E chipset, if that helps.
 