High interrupt rate on ehci0

Goomba · Jul 9, 2011

Hello,

I am using FreeBSD 8.2-RELEASE on a new computer intended to be used as a backup server on my home LAN. After the system has been running for a few days, the processor becomes overwhelmed by interrupts from either ehci0 or ehci1.

Here is some background on my installation:

I am using the "GELI + ZFS" configuration described in another post. I followed this guide to the letter, except that I did not create a swap partition, since the machine has 6 GB of RAM. The important detail from this guide is that the computer is set to boot from a USB thumbdrive, which I physically remove once the operating system has finished booting.

Here is some output from top, to illustrate the problem:

Code:

last pid: 13766;  load averages:  0.00,  0.00,  0.00         up 3+03:18:46  02:52:33
42 processes:  1 running, 41 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system, 19.5% interrupt, 80.5% idle
Mem: 29M Active, 6844K Inact, 4147M Wired, 16K Cache, 1536M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1137 root        1  44    0  9372K  2100K select  0   0:28  0.00% top
...

This is what vmstat -i has to say about it:

Code:

interrupt                          total       rate
irq16: ehci0                 11672434566      43042
irq19: atapci0+                   611601          2
irq23: ehci1                      544034          2
cpu0: timer                    542358880       1999
irq256: re0                      9199472         33
cpu1: timer                    542348239       1999
Total                        12767496792      47080

Here are the relevant portions of devinfo -v:

Code:

nexus0
  ...
  acpi0
    ...
    pcib0 pnpinfo _HID=PNP0A08 _UID=0 at handle=\_SB_.PCI0
      pci0
        ...
        ehci0 pnpinfo vendor=0x8086 device=0x3b3c subvendor=0x1462 subdevice=0x7636 class=0x0c0320 at slot=26 function=0 handle=\_SB_.PCI0.USBE
          usbus0
            uhub0
              uhub2 pnpinfo vendor=0x8087 product=0x0020 devclass=0x09 devsubclass=0x00 sernum="" release=0x0000 intclass=0x09 intsubclass=0x00 at bus=1 hubaddr=1 port=0 devaddr=2 interface=0
        ...
        ehci1 pnpinfo vendor=0x8086 device=0x3b34 subvendor=0x1462 subdevice=0x7636 class=0x0c0320 at slot=29 function=0 handle=\_SB_.PCI0.EUSB
          usbus1
            uhub1
              uhub3 pnpinfo vendor=0x8087 product=0x0020 devclass=0x09 devsubclass=0x00 sernum="" release=0x0000 intclass=0x09 intsubclass=0x00 at bus=1 hubaddr=1 port=1 devaddr=2 interface=0
                ukbd0 pnpinfo vendor=0x413c product=0x2105 devclass=0x00 devsubclass=0x00 sernum="" release=0x0352 intclass=0x03 intsubclass=0x01 at bus=2 hubaddr=4 port=1 devaddr=4 interface=0
...

For this particular boot, the bootable USB thumbdrive was on ehci1 before it was removed.

I set up a cron job to log the output of vmstat -i every half hour. Today, between 10:29 and 10:59 local time, the interrupt rate reported for ehci0 jumped from 1 to 1967, and it has been rising steadily since then. The only cron job scheduled in that window is ntpdate, and that job runs every hour. The only USB device plugged into the machine is a USB keyboard.
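Note that the rate column of vmstat -i is the total count divided by the time since boot (the cpu0 timer line above works out to roughly 2000 per second over the 3+03:18 uptime), so a storm that started hours ago shows up diluted. A minimal sketch for getting the rate over a single logging interval instead, assuming each half-hourly snapshot was saved as plain vmstat -i output in its own file; the function name irq_delta is hypothetical, not an existing tool:

```sh
#!/bin/sh
# irq_delta (hypothetical helper): print the per-second interrupt rate of each
# source between two saved `vmstat -i` snapshots.
# Usage: irq_delta OLD_SNAPSHOT NEW_SNAPSHOT SECONDS_BETWEEN_THEM
# Assumes the layout shown above: source name (may contain spaces, e.g.
# "irq16: ehci0"), then the total, then the since-boot average rate.
irq_delta() {
    awk -v secs="$3" '
        # Skip the column header and the "Total" summary line.
        /^interrupt/ || /^Total/ { next }
        {
            # The last two fields are total and rate; the rest is the name.
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            total = $(NF - 1)
            # While reading the first file, just remember the old totals.
            if (FNR == NR) { old[name] = total; next }
            # Second file: difference of totals divided by the interval.
            if (name in old)
                printf "%s %d\n", name, (total - old[name]) / secs
        }
    ' "$1" "$2"
}
```

For example, `irq_delta ehci-1029.txt ehci-1059.txt 1800` (file names hypothetical) would print one line per interrupt source with its average rate during that half hour only.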

The computer itself has a "MSI H55M-P33 Intel H55 LGA1156" motherboard, four SATA HDDs, and a 2.8 GHz Intel Pentium G6950.

Does anyone know what might cause this problem, or what I might be able to do to track down the cause? I'm not very experienced with non-Windows systems, so I'm not sure what to try next.

Thank you for any assistance!

starslab · Apr 19, 2013

Sorry for the necromancy, but I should note that this issue appears to be tracked as PR 156596. I'd like to add that I'm also seeing it on 9.0-RELEASE and 9.1-RELEASE, on an Intel DH67BL motherboard.

Interestingly, I'm also running a GELI+ZFS configuration. Mine doesn't boot from removable storage though.

Terri_Kennedy · Apr 20, 2013

Goomba (resurrected by starslab) said:
Does anyone know what might cause this problem, or what I might be able to do to track down the cause? I'm not very experienced with non-Windows systems, so I'm not sure what to try next.

In addition to the PR you found, another possibility is an IRQ shared with another device. Not every device gets a specific driver in FreeBSD. Unsupported hardware obviously doesn't, but neither do some devices, such as certain PCI bus controllers, that "just work" without a specific driver.

If an interrupt is shared between two devices with drivers, normally the drivers will cooperate to make sure the interrupt gets handled by the appropriate driver. For example, on one of my systems uhci0 and vgapci0 share IRQ 16 even though they're on separate PCI buses:

Code:

uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xaf80-0xaf9f irq 16 at device 26.0 on pci0
vgapci0: <VGA-compatible display> mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 16 at device 4.0 on pci7

Anyway, if a device with no driver attached raises a shared interrupt, the interrupt is passed to whatever driver(s) registered a handler for that IRQ. Each such driver then checks why its own device generated an interrupt. On seeing that its device isn't requesting service, it may try to dismiss the interrupt, or it may simply expect the interrupt to go away on its own. But since nothing has serviced the request from the other device sharing the IRQ, that device interrupts again. And again.
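To see which attached devices ended up on the same IRQ, one can group the probe lines in /var/run/dmesg.boot by their "irq N" field. A rough sketch, assuming the probe-line format quoted above ("... irq 16 at device ..."); the function name shared_irqs is hypothetical. A device with no driver attached prints no irq field at boot, so it will not show up here; pciconf -lv lists such devices with a "none" name instead.

```sh
#!/bin/sh
# shared_irqs (hypothetical helper): list IRQs claimed by more than one
# device in a FreeBSD boot log (default: /var/run/dmesg.boot).
# Expects probe lines like "uhci0: <desc> ... irq 16 at device 26.0 on pci0".
shared_irqs() {
    awk '
        {
            for (i = 1; i < NF; i++)
                # Match the separate word "irq" followed by a number,
                # so "irqs 0-23" on the ioapic line is ignored.
                if ($i == "irq" && $(i + 1) ~ /^[0-9]+$/) {
                    dev = $1
                    sub(/:$/, "", dev)          # "uhci0:" -> "uhci0"
                    irq = $(i + 1)
                    count[irq]++
                    devs[irq] = devs[irq] " " dev
                    break
                }
        }
        END {
            # Only report IRQs with two or more devices on them.
            for (irq in count)
                if (count[irq] > 1)
                    print "irq " irq ":" devs[irq]
        }
    ' "${1:-/var/run/dmesg.boot}"
}
```

Run against the two lines quoted above, this would report `irq 16: uhci0 vgapci0`.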

The interrupt counts you see are attributed to whichever driver registered that interrupt. If no driver has registered a handler for the IRQ, the kernel will log:

Code:

stray irqN

Where N is the IRQ number.
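If those messages do appear, a quick per-IRQ tally shows whether one line keeps firing. A minimal sketch, assuming kernel messages end up in /var/log/messages (the usual syslogd destination on FreeBSD; adjust the path otherwise); the function name stray_irqs is hypothetical:

```sh
#!/bin/sh
# stray_irqs (hypothetical helper): count "stray irqN" kernel messages per IRQ.
# Reads /var/log/messages by default; pass another file, or "-" for stdin.
stray_irqs() {
    # -o prints only the matched text, so timestamps don't split the counts.
    grep -oE 'stray irq[0-9]+' "${1:-/var/log/messages}" | sort | uniq -c
}
```

For the current kernel message buffer, `dmesg | stray_irqs -` works the same way.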

Some systems let you change the way the BIOS sets up interrupts. For example, some Dell systems have this option:

PowerEdge R300 Hardware manual said:
System Interrupt Assignment (Standard)

Controls the interrupt assignment of PCI devices in the system. When set to distributed, the interrupt routing will be swizzled to minimize IRQ sharing.

There's an extensive FreeBSD whitepaper on the subject here.

If changing that option doesn't help, or your BIOS doesn't offer that option, check with your motherboard / system manufacturer to see if there's a newer BIOS available.

starslab · Apr 20, 2013

Interesting. I'll poke around in the BIOS on the machine in question later.

In my case, vmstat -i shows a single driver on IRQ 16: ehci0. However, cat /var/run/dmesg.boot | grep 16 indicates three devices on that IRQ: ehci0, vgapci0, and vgapci0's PCI-PCI bridge.
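A plain grep 16 also matches any line that happens to contain the digits 16 somewhere. Anchoring the pattern to the irq token narrows it to the probe lines for that IRQ. A minimal sketch; the function name irq_lines is hypothetical:

```sh
#!/bin/sh
# irq_lines (hypothetical helper): show only the boot probe lines on IRQ N.
# "irq N" must be followed by a space or end of line, so "irq 16" matches
# while "irq 160" or an unrelated "16" elsewhere on a line does not.
# Usage: irq_lines N [FILE]   (FILE defaults to /var/run/dmesg.boot)
irq_lines() {
    grep -E "irq $1( |\$)" "${2:-/var/run/dmesg.boot}"
}
```

For example, `irq_lines 16` lists just the ehci0, vgapci0 and bridge lines described above.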

If this issue happens only on modern Intel Core architecture machines, could it be some unexpected behavior of the processor-integrated graphics? From reading the PR, nobody has actually caught the USB driver misbehaving, and on the laptop reported there, the problem only surfaces when it's plugged into or unplugged from power. Laptop screens like to change their brightness depending on the power source.

I just figured out what makes my machine start storming: pulling the VGA connector out of the back. Is there a way for me to add this information to the PR, or should I leave that to people who know what they're doing?

bthomson · Sep 12, 2014

Here is the new bugzilla URL for this issue.
