XenServer - S-ATA / PCI-Passthrough - Kernel Panic

Hi,

I thought I would share this to help the FreeBSD project improve its Xen support: http://forums.anandtech.com/showthread. ... st36387753

My hardware
  • Mainboard: SuperMicro X10SAT
  • CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz

I have a FreeBSD 10.0-RELEASE VM running on XenServer 6.2. Today I succeeded in passing the SuperMicro board's onboard 6-port S-ATA controller (Intel C226) through to that VM. Unfortunately, it only works if I give the VM fewer than the maximum available 8 vCPUs. With 7 vCPUs, for example, the FreeBSD VM boots. With 8 vCPUs, it ends in a kernel panic like this:
Code:
[...]
Netvsc initializing... SMP: AP CPU #4 Launched!
SMP: AP CPU #6 Launched!
panic: can't schedule timer
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff807a14dd at xentimer_et_start+0xed
#3 0xffffffff80d66d6d at loadtimer+0xfd
#4 0xffffffff80d657fd at handleevents+0x2dd
#5 0xffffffff80d65fc8 at timercb+0x308
#6 0xffffffff807a152d at xentimer_intr+0x4d
#7 0xffffffff80883e5b at intr_event_handle+0x9b
#8 0xffffffff80d8d1c8 at intr_execute_handlers+0x48
#9 0xffffffff80d96909 at xen_intr_handle_upcall+0x159
#10 0xffffffff80c760ac at Xxen_intr_upcall+0x8c
#11 0xffffffff80861238 at mi_startup+0x118
#12 0xffffffff802d3e0c at btext+0x2c
Uptime: 1m15s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
Rebooting...
SMP: AP CPU #7 Launched!
Also interesting: the HDDs on the S-ATA controller only spin up at about the same time as this line appears on the boot screen: "Netvsc initializing... SMP: ..."
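
To see at a glance which functions are involved in the panic, the backtrace can be reduced to its function names. This is only a minimal sketch, assuming the console output has been saved as text; `backtrace_funcs` is a hypothetical helper name, not a FreeBSD tool.

```shell
#!/bin/sh
# backtrace_funcs: read a FreeBSD panic log on stdin and print one
# function name per stack frame, in the order the frames are listed.
# Lines that are not stack frames are ignored.
backtrace_funcs() {
    sed -n 's/^#[0-9][0-9]* 0x[0-9a-f][0-9a-f]* at \([A-Za-z0-9_]*\)+0x[0-9a-f]*$/\1/p'
}

# Demo with three frames copied from the panic above.
printf '%s\n' \
    '#1 0xffffffff808af8b5 at panic+0x155' \
    '#2 0xffffffff807a14dd at xentimer_et_start+0xed' \
    '#3 0xffffffff80d66d6d at loadtimer+0xfd' | backtrace_funcs
```

In the trace above, the frame directly below `panic` is `xentimer_et_start`, reached from the timer interrupt path (`xentimer_intr`, `timercb`, `handleevents`, `loadtimer`).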

Even though the system seems to work properly after booting with fewer than 8 vCPUs, the boot process still shows the following errors:
Code:
[...]
SMP: AP CPU #1 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #6 Launched!
g_dev_taste: make_dev_p() failed (gp->name=ada0, error=17)
ugen0.2: <QEMU 0.10.2> at usbus0
g_dev_taste: make_dev_p() failed (gp->name=ada0p1, error=17)
Trying to mount root from ufs:/dev/ada0p2 [rw]...
[...]

Searching for this error, my friend Google led me here:
http://forums.freenas.org/index.php?thr ... ugh.16574/
Quote from that thread:
I guess you've got naming conflict between normal disk names of passed through AHCI controller and fake disk with the same name created by XEN PV drivers (for "compatibility" reasons). FreeBSD GEOM does not deny having two providers with the same name, that is why you see both of them in `gpart show`, but that is definitely considered wrong practice. As a workaround you can make CAM subsystem to not use ada0 device name. Try to set via loader prompt or via GUI tunables interface something like: hint.ada.0.at="scbus100".

With that hint set, the "g_dev_taste" problem seems to be solved.
... But the kernel panic still occurs when I configure the VM with 8 vCPUs.
Any idea why this is happening?
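
On a plain FreeBSD install without a tunables GUI, the hint quoted above could be made persistent by adding it to /boot/loader.conf. The following is a minimal sketch under that assumption; `add_tunable` is a hypothetical helper, and the demo writes to a scratch file rather than the real loader.conf.

```shell
#!/bin/sh
# add_tunable FILE LINE: append LINE to FILE unless an identical line
# is already present, so repeated runs do not create duplicates.
add_tunable() {
    file=$1
    line=$2
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demo against a scratch file; on the VM the target would be /boot/loader.conf.
conf=$(mktemp)
add_tunable "$conf" 'hint.ada.0.at="scbus100"'
add_tunable "$conf" 'hint.ada.0.at="scbus100"'   # second call changes nothing
cat "$conf"
rm -f "$conf"
```

The check for an existing line is there so the helper can be rerun safely; a reboot is needed before the loader picks up the hint.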


Also, for people who are trying to do the same thing as I am, here is what I did:

1) Get the PCI address on the XenServer side:
Code:
lspci
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller [8086] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3 Processor Integrated Graphics Controller [8086] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086] (rev 04)
00:16.3 Serial controller [0700]: Intel Corporation 8 Series/C220 Series Chipset Family KT Controller [8086] (rev 04)
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-LM [8086] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086] (rev d4)
00:1c.1 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 [8086] (rev d4)
00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086] (rev d4)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 [8086] (rev d4)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller [8086] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086] (rev 04)
00:1f.6 Signal processing controller [1180]: Intel Corporation 8 Series Chipset Family Thermal Management Controller [8086] (rev 04)
01:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21] (rev 01)
02:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
03:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
03:04.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
03:05.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
03:07.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
03:09.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5] (rev ba)
07:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912] (rev 03)
08:00.0 PCI bridge [0604]: Texas Instruments XIO2213A/B/XIO2221 PCI Express to PCI Bridge [Cheetah Express] [104c] (rev 01)
09:00.0 FireWire (IEEE 1394) [0c00]: Texas Instruments XIO2213A/B/XIO2221 IEEE-1394b OHCI Controller [Cheetah Express] [104c] (rev 01)
0a:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086] (rev 03)

In my case this line was relevant:
Code:
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086] (rev 04)
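
Picking the address out of the listing can also be scripted. This is a rough sketch that assumes the `lspci` output format shown above; `sata_addrs` and `to_xe_pci` are hypothetical helper names, and the `0/0000:` prefix is simply copied from the `other-config:pci` value used in step 3 (PCI domain 0000 is assumed).

```shell
#!/bin/sh
# sata_addrs: read lspci output on stdin and print the bus:device.function
# address (first field) of every line describing a SATA controller.
sata_addrs() {
    awk '/SATA controller/ { print $1 }'
}

# to_xe_pci ADDR: prefix ADDR with "0/0000:", the form used for
# other-config:pci in this post.
to_xe_pci() {
    printf '0/0000:%s\n' "$1"
}

# On the XenServer host one would run:  lspci | sata_addrs
# Demo with two lines from the listing above:
printf '%s\n' \
    '00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086] (rev 04)' \
    '01:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21] (rev 01)' \
    | sata_addrs
to_xe_pci 00:1f.2
```

Note that this board has two SATA controllers (Intel and ASMedia), so the list still has to be checked by hand before passing anything through.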

2) Many threads, how-tos, and tutorials now talk about hiding the PCI device (xen-pciback.hide) ... this did not work for me, nor did it seem to be relevant, so I skipped the following step.
Code:
vi /boot/extlinux.conf

label xe
[...]
append /boot/xen.gz [...] splash xen-pciback.hide=(00:1f.2) --- /boot/initrd-2.6-xen.img
[...]

extlinux -i /boot
shutdown -r now

3) I simply continued with this:
Code:
# Get the UUID of the VM and put it manually into the variable UUID
xe vm-list

UUID="88e0869b-4de6-5f6a-d438-fde555d40015"

# Try a clean shutdown first; force it only if that fails
if ! xe vm-shutdown uuid="${UUID}"; then
    echo ''
    xe vm-shutdown uuid="${UUID}" force=true
fi
# Assign the S-ATA controller (PCI address 00:1f.2) to the VM
xe vm-param-set other-config:pci="0/0000:00:1f.2" uuid="${UUID}"
xe vm-start uuid="${UUID}"

In case something goes wrong, here is the command to remove the PCI device from the VM:
Code:
xe vm-param-remove param-name=other-config param-key=pci uuid="${UUID}"
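
To confirm that the parameter was actually set (or removed), the value can be read back. This is a minimal sketch, assuming `xe vm-param-get` accepts `param-name` and `param-key` the same way `vm-param-remove` does above, and that it fails when the key is missing; `pci_assigned` is a hypothetical helper name.

```shell
#!/bin/sh
# pci_assigned UUID ADDR: succeed if the VM's other-config:pci value
# equals "0/0000:ADDR". A failing xe call (for example a missing key)
# is treated the same as "not assigned".
pci_assigned() {
    uuid=$1
    addr=$2
    current=$(xe vm-param-get uuid="$uuid" param-name=other-config param-key=pci 2>/dev/null) || return 1
    [ "$current" = "0/0000:$addr" ]
}

# Usage on the XenServer host (UUID as set in step 3):
#   if pci_assigned "$UUID" 00:1f.2; then echo "passthrough configured"; fi
```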

Please let me know if someone has an idea about this kernel panic. Thanks
Kind regards


EDIT: It seems I'm not alone ... yet for me, 8 vCPUs work perfectly without the S-ATA PCI passthrough.
http://freebsd.1045724.n5.nabble.com/Fr ... 86694.html
http://lists.freebsd.org/pipermail/free ... 02034.html
There seems to be a patch for this issue (going by its subject line, a debugging patch): http://xenbits.xen.org/people/royger/00 ... imer.patch
Code:
From 8ea40470d15de47aa3bd6004fc5783f94535e00d Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Mon, 17 Feb 2014 16:08:58 +0100
Subject: [PATCH] xen: debug Xen PV timer

---
 sys/dev/xen/timer/timer.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/sys/dev/xen/timer/timer.c b/sys/dev/xen/timer/timer.c
index 354085b..a31343b 100644
--- a/sys/dev/xen/timer/timer.c
+++ b/sys/dev/xen/timer/timer.c
@@ -76,6 +76,8 @@ static devclass_t xentimer_devclass;
 
 #define	XENTIMER_QUALITY	950
 
+#define NUM_RETRIES	60
+
 struct xentimer_pcpu_data {
 	uint64_t timer;
 	uint64_t last_processed;
@@ -413,8 +415,10 @@ xentimer_et_start(struct eventtimer *et,
 	 *     equipped to deal with start failures.
 	 */
 	do {
-		if (++i == 60)
-			panic("can't schedule timer");
+		if (++i == NUM_RETRIES) {
+			panic("can't schedule timer on vCPU#%d, interval: %" PRIu64 "ns",
+			    cpu, first_in_ns);
+		}
 		next_time = xen_fetch_vcpu_time() + first_in_ns;
 		error = xentimer_vcpu_start_timer(cpu, next_time);
 	} while (error == -ETIME);
-- 
1.7.7.5 (Apple Git-26)
I still have to find out whether raising the NUM_RETRIES #define would be an acceptable workaround for this. Still, if someone has a better clue, or any other experience of this kind, I would love to hear about it ;)


The following setting may also be interesting for some:
I also remembered that I set 'hw.pci.enable_msi=1' and 'hw.pci.enable_msix=0' in /etc/sysctl.conf; someone else found this was necessary to use an LSI controller in passthrough mode.
... But it didn't solve my kernel panic issue either ...






Is there a chance that changing the following (kern.hz) could make a difference?

Code:
sysctl -a | grep hz
kern.clockrate: { hz = 100, tick = 10000, profhz = 8128, stathz = 127 }
kern.hz: 100
[...]
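
For reference, the `tick` value in `kern.clockrate` is the length of one clock tick in microseconds, i.e. 1000000 / hz. The sketch below only redoes that arithmetic; `tick_us` is a hypothetical helper, and whether a different `kern.hz` (a boot-time tunable that would go into /boot/loader.conf) has any effect on the panic is not known here.

```shell
#!/bin/sh
# tick_us HZ: print the length of one clock tick in microseconds.
tick_us() {
    echo $((1000000 / $1))
}

tick_us 100     # the hz value reported in the output above
tick_us 1000
```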

Or might changing one of the following have a positive effect?
Code:
sysctl -a | grep '950'
kern.osrevision: 199506
kern.eventtimer.et.XENTIMER.quality: 950
kern.eventtimer.choice: XENTIMER(950) LAPIC(400) i8254(100) RTC(0)
kern.timecounter.tc.XENTIMER.quality: 950
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.choice: TSC-low(-100) ACPI-fast(900) i8254(0) HPET(950) XENTIMER(950) dummy(-1000000)
vm.kmem_size_max: 1319413950874
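
Since the panic is raised inside `xentimer_et_start`, one experiment would be to select a different event timer through the `kern.eventtimer.timer` sysctl, for example LAPIC. Whether that avoids the panic, or works at all under Xen, is untested here. As a small sketch, the choice list can be ranked to see the alternatives; `timers_by_quality` is a hypothetical helper name.

```shell
#!/bin/sh
# timers_by_quality: read a "NAME(quality) NAME(quality) ..." list, as
# printed by kern.eventtimer.choice, and print "quality name" pairs,
# highest quality first.
timers_by_quality() {
    tr ' ' '\n' |
        sed -n 's/^\(.*\)(\(-\{0,1\}[0-9][0-9]*\))$/\2 \1/p' |
        sort -k1,1nr
}

# On the VM:  sysctl -n kern.eventtimer.choice | timers_by_quality
# Demo with the value from the output above:
echo 'XENTIMER(950) LAPIC(400) i8254(100) RTC(0)' | timers_by_quality
```

The same helper also accepts the `kern.timecounter.choice` list, including negative qualities such as `TSC-low(-100)`.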

Since I don't really know what I might break by changing one of them, some advice would be helpful ;)
Thanks
 
I attached some screenshots.
 

Attachments

  • XenServer-FreeBSD-PCI_Passthrough-8x_vCPU.png
  • XenServer-FreeBSD-PCI_Passthrough-7x_vCPU.png
  • XenServer-FreeBSD-PCI_Passthrough-6x_vCPU.png
Here is the second part of the attachment.
 

Attachments

  • XenServer-FreeBSD-PCI_Passthrough-1x_vCPU.png
  • XenServer-FreeBSD-PCI_Passthrough-1x_vCPU_-_hint.ada.0.0.at.png
Leander said:
I thought I should move this to FreeBSD developement.
Just ask a moderator to move it. Please don't make copies of the same thread.

Threads merged.
 