High interrupt rate on ehci0

Post by Goomba » 09 Jul 2011, 03:13

Hello,

I am using FreeBSD 8.2-RELEASE on a new computer intended to be used as a backup server on my home LAN. After the system has been running for a few days, the processor starts spending a large share of its time servicing interrupts on either [FILE]ehci0[/FILE] or [FILE]ehci1[/FILE].

Here is some background on my installation:

I am using the "GELI + ZFS" configuration described in another post. I basically followed this guide to the letter, with the exception of not installing a swap partition; the machine has 6 GB of RAM. The important thing to note from this guide is that I have the computer set to boot from a USB thumbdrive, which is physically removed once the operating system has finished booting.

Here is some output from [cmd=]top[/cmd], to illustrate the problem:
Code: Select all
last pid: 13766;  load averages:  0.00,  0.00,  0.00         up 3+03:18:46  02:52:33
42 processes:  1 running, 41 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system, 19.5% interrupt, 80.5% idle
Mem: 29M Active, 6844K Inact, 4147M Wired, 16K Cache, 1536M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1137 root        1  44    0  9372K  2100K select  0   0:28  0.00% top
...


This is what [cmd=]vmstat -i[/cmd] has to say about it:
Code: Select all
interrupt                          total       rate
irq16: ehci0                 11672434566      43042
irq19: atapci0+                   611601          2
irq23: ehci1                      544034          2
cpu0: timer                    542358880       1999
irq256: re0                      9199472         33
cpu1: timer                    542348239       1999
Total                        12767496792      47080


Here are the relevant portions of [cmd=]devinfo -v[/cmd]:
Code: Select all
nexus0
  ...
  acpi0
    ...
    pcib0 pnpinfo _HID=PNP0A08 _UID=0 at handle=\_SB_.PCI0
      pci0
        ...
        ehci0 pnpinfo vendor=0x8086 device=0x3b3c subvendor=0x1462 subdevice=0x7636 class=0x0c0320 at slot=26 function=0 handle=\_SB_.PCI0.USBE
          usbus0
            uhub0
              uhub2 pnpinfo vendor=0x8087 product=0x0020 devclass=0x09 devsubclass=0x00 sernum="" release=0x0000 intclass=0x09
              intsubclass=0x00 at bus=1 hubaddr=1 port=0 devaddr=2 interface=0
        ...
        ehci1 pnpinfo vendor=0x8086 device=0x3b34 subvendor=0x1462 subdevice=0x7636 class=0x0c0320 at slot=29 function=0 handle=\_SB_.PCI0.EUSB
          usbus1
            uhub1
              uhub3 pnpinfo vendor=0x8087 product=0x0020 devclass=0x09 devsubclass=0x00 sernum="" release=0x0000 intclass=0x09 intsubclass=0x00
              at bus=1 hubaddr=1 port=1 devaddr=2 interface=0
                ukbd0 pnpinfo vendor=0x413c product=0x2105 devclass=0x00 devsubclass=0x00 sernum="" release=0x0352 intclass=0x03
                intsubclass=0x01 at bus=2 hubaddr=4 port=1 devaddr=4 interface=0
...


For this particular boot, the bootable USB thumbdrive was on [FILE]ehci1[/FILE] before it was removed.

I configured a cronjob to log the output of [cmd=]vmstat -i[/cmd] every half hour. Today, between 10:29 and 10:59 local time, the interrupt rate of [FILE]ehci0[/FILE] jumped from 1 to 1967. It has been steadily rising since then. The only cronjob that is scheduled to run between those times is an [cmd=]ntpdate[/cmd], and that task runs every hour. The only USB device plugged into the machine is a USB keyboard.
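One caveat when reading such a log: as far as I can tell, the rate column of [cmd=]vmstat -i[/cmd] is the cumulative total divided by uptime, not the current rate. The figures above fit that reading: 11672434566 interrupts over roughly 271,000 seconds of uptime comes to about 43,000 per second. The rate during a given interval therefore has to be computed from the difference in the total column between two samples. Below is a minimal Python sketch of that calculation, assuming two logged samples taken a known number of seconds apart. The function names and the sample figures in the demo are made up for illustration and are not taken from the actual log.

```python
import re

# Matches one data row of `vmstat -i`: "<name>  <total>  <rate>".
# The header row has no numeric columns, so it does not match.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return {interrupt name: cumulative total} from `vmstat -i` output."""
    totals = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            totals[m.group("name")] = int(m.group("total"))
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second for each source between two samples."""
    old = parse_vmstat_i(before)
    new = parse_vmstat_i(after)
    return {name: (new[name] - old[name]) / seconds
            for name in new if name in old}


if __name__ == "__main__":
    # Hypothetical samples taken 1800 seconds apart (illustrative numbers only).
    sample_1 = """interrupt                          total       rate
irq16: ehci0                      100000          1
irq256: re0                      3000000         33
"""
    sample_2 = """interrupt                          total       rate
irq16: ehci0                     3700000         36
irq256: re0                      3059400         33
"""
    for name, rate in interval_rates(sample_1, sample_2, 1800).items():
        print(f"{name}: {rate:.0f} interrupts/s during the interval")
```

Running this against two consecutive entries from the cron log would show how hard the controller is actually interrupting in that half hour, which can be far higher than the since-boot average suggests.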

The computer itself has a "MSI H55M-P33 Intel H55 LGA1156" motherboard, four SATA HDDs, and a 2.8 GHz Intel Pentium G6950.

Does anyone know what might cause this problem, or what I might be able to do to track down the cause? I'm not very experienced with non-Windows systems so I'm not sure what to try next.

Thank you for any assistance! :)
Goomba

Post by starslab » 19 Apr 2013, 10:48

Sorry for the necromancy, but I should note that this issue appears to exist as PR 156596. I'd like to add that I'm seeing this on 9.0-RELEASE and 9.1-RELEASE as well, on an Intel DH67BL motherboard.

Interestingly, I'm also running a GELI+ZFS configuration. Mine doesn't boot from removable storage though.

Post by Terry_Kennedy » 20 Apr 2013, 05:09

Goomba (resurrected by starslab) wrote: Does anyone know what might cause this problem, or what I might be able to do to track down the cause? I'm not very experienced with non-Windows systems so I'm not sure what to try next.

In addition to the PR you found, another possibility is an IRQ shared with another device. Not every device gets its own driver in FreeBSD: unsupported hardware obviously doesn't, but neither do things like some PCI bus controllers, which "just work" without a specific driver.

If an interrupt is shared between two devices with drivers, normally the drivers will cooperate to make sure the interrupt gets handled by the appropriate driver. For example, on one of my systems [file]uhci0[/file] and [file]vgapci0[/file] share IRQ 16 even though they're on separate PCI buses:
Code: Select all
uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xaf80-0xaf9f irq 16 at device 26.0 on pci0
vgapci0: <VGA-compatible display> mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 16 at device 4.0 on pci7


Anyway, if a device with no driver attached generates a shared interrupt, it will get handled by whatever driver(s) handle the supported device. That device driver will then check to see why the device generated an interrupt. After seeing that its device isn't requesting service, it may try to dismiss the interrupt. Or it may simply expect the interrupt to go away on its own. But since the other device that's sharing the IRQ didn't see anything done to service its request, it will interrupt again. And again.
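The loop described above can be illustrated with a toy model. This is only a sketch in Python, assuming a level-triggered shared line (as legacy PCI interrupt lines are) and assuming that a driver fully clears its own device's request each time the handler runs. The function name, the tuple layout, and the limit parameter are hypothetical, and nothing here reflects actual kernel code.

```python
def handler_runs(devices, limit=1000):
    """Count how often the shared handler runs before the line goes quiet.

    devices: list of (asserting, has_driver) tuples sharing one
             level-triggered interrupt line.
    limit:   cap on iterations so the storm case terminates
             (hypothetical, for this demo only).
    """
    asserting = [a for a, _ in devices]
    runs = 0
    # A level-triggered line keeps firing while any device still asserts it.
    while any(asserting) and runs < limit:
        runs += 1
        for i, (_, has_driver) in enumerate(devices):
            if has_driver:
                # A device with a driver gets serviced, so its request clears.
                asserting[i] = False
            # A device without a driver is never serviced and keeps asserting.
    return runs


if __name__ == "__main__":
    # Both devices have drivers: the request is serviced once.
    print(handler_runs([(True, True), (False, True)]))
    # The asserting device has no driver: the handler runs until the cap.
    print(handler_runs([(False, True), (True, False)]))
```

In the second case the count only stops because of the artificial cap, which is the toy equivalent of an interrupt storm being charged to the driver that registered the IRQ.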

The interrupt counts you see will get associated with whatever driver registered that interrupt. If there's no driver registered for the IRQ, the kernel will log:
Code: Select all
stray irqN

Where N is the IRQ number.

Some systems let you change the way the BIOS sets up interrupts. For example, some Dell systems have this option:
PowerEdge R300 Hardware manual wrote: System Interrupt Assignment (Standard)

Controls the interrupt assignment of PCI devices in the system. When set to distributed, the interrupt routing will be swizzled to minimize IRQ sharing.

There's an extensive FreeBSD whitepaper on the subject here.

If changing that option doesn't help, or your BIOS doesn't offer that option, check with your motherboard / system manufacturer to see if there's a newer BIOS available.

Post by starslab » 20 Apr 2013, 09:10

Interesting. I'll poke around in the BIOS on the machine in question later.

In my case, [CMD="$"]vmstat -i[/CMD] shows a single driver on IRQ 16, that being [FILE]ehci0[/FILE]. However, [CMD="$"]cat /var/run/dmesg.boot | grep 16[/CMD] indicates three devices on that IRQ - [FILE]ehci0[/FILE], [FILE]vgapci0[/FILE], and [FILE]vgapci0[/FILE]'s PCI-PCI bridge.
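A plain grep for 16 also matches any line that merely contains those digits (addresses, sizes, device numbers), so the result needs checking by eye. A stricter way to list which devices share an IRQ is to group the attach lines by their "irq N" field. Here is a minimal Python sketch, assuming the attach lines look like the [file]uhci0[/file] and [file]vgapci0[/file] lines quoted earlier in this thread. The function name and the regular expression are my own and may need adjusting for other message formats.

```python
import re
from collections import defaultdict

# Matches attach lines such as
#   "uhci0: <...> port 0xaf80-0xaf9f irq 16 at device 26.0 on pci0"
# and captures the device name and the IRQ number.
ATTACH = re.compile(r"^(?P<dev>\w+): .*\birq (?P<irq>\d+)\b")


def devices_by_irq(dmesg_text):
    """Return {irq number: [device names]} from dmesg-style attach lines."""
    shared = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = ATTACH.match(line)
        if m:
            shared[int(m.group("irq"))].append(m.group("dev"))
    return dict(shared)


if __name__ == "__main__":
    # Reads the boot messages from the path used in the post above.
    with open("/var/run/dmesg.boot") as f:
        for irq, devs in sorted(devices_by_irq(f.read()).items()):
            if len(devs) > 1:
                print(f"irq {irq}: {', '.join(devs)}")
```

Only IRQs with more than one attached device are printed. A device with no driver attached may not appear in these lines at all, so an empty result does not rule out sharing.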

If this issue is happening only on modern Intel Core architecture machines, could this be some unexpected behavior of the processor-integrated graphics? Reading through the PR, nobody has been able to actually catch the USB driver misbehaving, and the problem on the laptop only surfaces when it's plugged/unplugged from power - laptop screens like to change their brightness depending on power source.

I just figured out what causes my machine to start storming - when I pull the VGA connector from the back. Is there a way for me to add this information to the PR, or should I leave that to people who know what they're doing?