Hi everyone,
I have assembled a new system that has turned out to have a recurring problem: every so often it locks up completely. All connections time out, the console is unresponsive, and if the ZFS pool was busy, the disk activity LEDs remain in the state they were in when it happened. As far as I can tell, it is frozen through and through.
I made a post on the "Base System / Storage" board, since I was convinced this all started when I attached the Supermicro AOC-USAS2-L8e card and configured the ZFS pool - the system seemed to run just fine for two weeks prior to this. I've now verified that it will also happen without that card connected, and that the issue may in fact have been present from the beginning.
I can't seem to pin down a trigger. Moving data around seems to make a freeze happen sooner. I can fairly reliably provoke a freeze within a couple of hours by repeatedly copying a 50 GB file over a Samba share, but it can also happen while the system is sitting more or less idle.
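For reference, the repeated-copy test could be scripted roughly like this. It is only a sketch: copy_loop, its arguments, and the paths in the usage note are made-up names, and it assumes the Samba share is already mounted on the machine doing the copying. Each pass is logged before and after the copy, so after a freeze the log shows which pass was running.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): copy SRC to DST
# PASSES times and record the start and end of every pass in LOGFILE.
# Usage: copy_loop SRC DST PASSES LOGFILE
copy_loop() {
    src=$1; dst=$2; passes=$3; log=$4
    i=1
    while [ "$i" -le "$passes" ]; do
        # Log the start before doing any work, so the log still shows
        # which pass was running if the machine locks up mid-copy.
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass $i start" >> "$log"
        sync
        cp "$src" "$dst" || return 1
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass $i done" >> "$log"
        sync
        i=$((i + 1))
    done
}
```

Example use, with placeholder paths: copy_loop /data/big.bin /mnt/share/big.bin 20 /var/tmp/copy_loop.log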
I've tried to rule out a few things:
- Memory: I've tried two separate sets of memory, 2x4 GB and 4x8 GB. No difference.
- Network: Using a PCI-E 1x Marvell NIC instead of the Realtek 8111F on the motherboard made no difference.
- Overheating: I suspected overheating at first and still haven't ruled it out, even though healthd -d has yet to report anything higher than 36 degrees. Last time I tried scrubbing the zpool, it ran for 2.5 hours before locking up and then immediately locked up another 3 times, minutes after rebooting, before I aborted it.
- HBA card: Still freezes when not present.
- Firmware: The motherboard is running its latest firmware and so is the SSD system drive. I don't think there's anything else with updatable firmware, other than the HBA card, which is running its latest IT firmware from Supermicro (16), even if that isn't the latest firmware for the LSI 2008 controller (18).
- FreeBSD version: Started out with 10.0-RELEASE and switched to 10.0-STABLE. No difference.
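Since the overheating check above relies on watching readings by hand, a small heartbeat logger is one way to see what the temperature was just before a freeze. This is a rough sketch under some assumptions: heartbeat and TEMP_CMD are made-up names, and the default sensor command assumes the ACPI thermal zone seen in dmesg (acpi_tz0) is exposed as the hw.acpi.thermal.tz0.temperature sysctl. If that sysctl reports nothing useful on this board, TEMP_CMD can be pointed at some other command that prints a reading.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append a timestamped
# temperature reading to LOG every INTERVAL seconds, COUNT times.
# TEMP_CMD is an assumed override for the command that prints the reading.
# Usage: heartbeat LOG COUNT INTERVAL
heartbeat() {
    log=$1; count=$2; interval=$3
    cmd=${TEMP_CMD:-"sysctl -n hw.acpi.thermal.tz0.temperature"}
    n=0
    while [ "$n" -lt "$count" ]; do
        # Fall back to "unknown" if the sensor command fails.
        temp=$($cmd 2>/dev/null) || temp=unknown
        echo "$(date '+%Y-%m-%d %H:%M:%S') temp=$temp" >> "$log"
        # Ask the kernel to flush pending writes so the most recent
        # line has a chance of surviving a hard lockup.
        sync
        n=$((n + 1))
        [ "$n" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

Example use, with a placeholder log path: heartbeat /var/tmp/heartbeat.log 8640 10 (one reading every 10 seconds for about a day). After a freeze and reboot, the timestamp on the last line narrows down when the lockup happened.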
What I haven't tried yet:
- Buying a different motherboard.
- Moving the system to a regular 2.5" hard drive instead of the SSD. I've seen some weird behavior from SSDs. Even though this one ran just fine as the old server's system drive, it's now managed by a different and faster controller, so it might be worth a shot.
- Temporarily installing Windows to see if it happens regardless of the OS.
Hardware:
PSU: Corsair RM450
Motherboard: ASUS P8H77-M Pro
CPU: Intel Core i3-3250
RAM: Corsair XMS3
Any ideas or suggestions would be welcome.
Code:
# uname -a
FreeBSD dingo.pawtuxet.dk 10.0-STABLE FreeBSD 10.0-STABLE #0 r264493: Tue Apr 15 12:42:40 CEST 2014 dingo@dingo.pawtuxet.dk:/usr/obj/usr/src/sys/GENERIC amd64
Code:
# dmesg
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.0-STABLE #0 r264493: Tue Apr 15 12:42:40 CEST 2014
dingo@dingo.pawtuxet.dk:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4 (tags/RELEASE_34/final 197956) 20140216
CPU: Intel(R) Core(TM) i3-3250 CPU @ 3.50GHz (3500.07-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x306a9 Family = 0x6 Model = 0x3a Stepping = 9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x3d9ae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX,F16C>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
Standard Extended Features=0x281<GSFSBASE,SMEP,ENHMOVSB>
TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 32979369984 (31451 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ioapic0 <Version 2.0> irqs 0-23 on motherboard
Cuse4BSD v0.1.33 @ /dev/cuse
kbd1 at kbdmux0
random: <Software, Yarrow> initialized
acpi0: <ALASKA A M I> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of 67, 1 (4) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
Event timer "HPET3" frequency 14318180 Hz quality 440
Event timer "HPET4" frequency 14318180 Hz quality 440
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem 0xf7800000-0xf7bfffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
agp0: <IvyBridge desktop GT1 IG> on vgapci0
agp0: aperture size is 256M, detected 262140k stolen memory
vgapci0: Boot video device
xhci0: <Intel Panther Point USB 3.0 controller> mem 0xf7d00000-0xf7d0ffff irq 16 at device 20.0 on pci0
xhci0: 32 byte context size.
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <Intel Panther Point USB 2.0 controller> mem 0xf7d17000-0xf7d173ff irq 23 at device 26.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
hdac0: <Intel Panther Point HDA Controller> mem 0xf7d10000-0xf7d13fff irq 22 at device 27.0 on pci0
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
pci3: <ACPI PCI bus> on pcib3
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xf0004000-0xf0004fff,0xf0000000-0xf0003fff irq 16 at device 0.0 on pci3
re0: Using 1 MSI-X message
re0: Chip rev. 0x48000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: d8:50:e6:41:59:24
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 28.6 on pci0
pci4: <ACPI PCI bus> on pcib4
atapci0: <Marvell ATA controller> port 0xd040-0xd047,0xd030-0xd033,0xd020-0xd027,0xd010-0xd013,0xd000-0xd00f mem 0xf7c10000-0xf7c101ff irq 18 at device 0.0 on pci4
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
ehci1: <Intel Panther Point USB 2.0 controller> mem 0xf7d16000-0xf7d163ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel Panther Point SATA300 controller> port 0xf110-0xf117,0xf100-0xf103,0xf0f0-0xf0f7,0xf0e0-0xf0e3,0xf0d0-0xf0df,0xf0c0-0xf0cf irq 19 at device 31.2 on pci0
ata4: <ATA channel> at channel 0 on atapci1
ata5: <ATA channel> at channel 1 on atapci1
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
atapci2: <Intel Panther Point SATA300 controller> port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,0xf070-0xf07f,0xf060-0xf06f irq 19 at device 31.5 on pci0
ata6: <ATA channel> at channel 0 on atapci2
ata7: <ATA channel> at channel 1 on atapci2
acpi_button0: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
ppc1: <Parallel port> port 0x378-0x37f irq 5 on acpi0
ppc1: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc1
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
orm0: <ISA Option ROM> at iomem 0xc0000-0xce7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
Timecounters tick every 1.000 msec
hdacc0: <Realtek ALC892 HDA CODEC> at cad 0 on hdac0
hdaa0: <Realtek ALC892 Audio Function Group> at nid 1 on hdacc0
pcm0: <Realtek ALC892 (Rear Analog 7.1/2.0)> at nid 20,22,21,23 and 24,26 on hdaa0
pcm1: <Realtek ALC892 (Front Analog)> at nid 27 and 25 on hdaa0
pcm2: <Realtek ALC892 (Rear Digital)> at nid 30 on hdaa0
pcm3: <Realtek ALC892 (Onboard Digital)> at nid 17 on hdaa0
hdacc1: <Intel Panther Point HDA CODEC> at cad 3 on hdac0
hdaa1: <Intel Panther Point Audio Function Group> at nid 1 on hdacc1
pcm4: <Intel Panther Point (HDMI/DP 8ch)> at nid 5 on hdaa1
pcm5: <Intel Panther Point (HDMI/DP 8ch)> at nid 7 on hdaa1
random: unblocking device.
usbus0: 5.0Gbps Super Speed USB v3.0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 480Mbps High Speed USB v2.0
ugen1.1: <Intel> at usbus1
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ugen0.1: <0x8086> at usbus0
uhub1: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen2.1: <Intel> at usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub1: 8 ports with 8 removable, self powered
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
ugen2.2: <vendor 0x8087> at usbus2
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
ugen2.3: <Logitech> at usbus2
uhub5: <Logitech Logitech BT Mini-Receiver, class 9/0, rev 2.00/49.00, addr 3> on usbus2
uhub5: 3 ports with 1 removable, bus powered
ugen2.4: <Logitech> at usbus2
ukbd0: <Logitech Logitech BT Mini-Receiver, class 0/0, rev 2.00/49.00, addr 4> on usbus2
kbd2 at ukbd0
ugen2.5: <Logitech> at usbus2
ada0 at ata4 bus 0 scbus2 target 0 lun 0
ada0: <Samsung SSD 840 PRO Series DXM06B0Q> ATA-9 SATA 3.x device
ada0: Serial Number S12RNEAD401503J
ada0: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 8192bytes)
ada0: 244198MB (500118192 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad8
ugen2.6: <vendor 0x2548> at usbus2
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
Timecounter "TSC-low" frequency 1750035404 Hz quality 1000
Trying to mount root from ufs:/dev/ada0p2 [rw]...
WARNING: / was not properly dismounted
ums0: <Logitech Logitech BT Mini-Receiver, class 0/0, rev 2.00/49.00, addr 5> on usbus2
ums0: 14 buttons and [XYZT] coordinates ID=2
ums0: 8 buttons and [XYZT] coordinates ID=5
umodem0: <vendor 0x2548 product 0x1002, class 0/0, rev 1.10/10.00, addr 6> on usbus2
umodem0: data interface 1, has CM over data, has break
ums1: <vendor 0x2548 product 0x1002, class 0/0, rev 1.10/10.00, addr 6> on usbus2
ums1: 3 buttons and [XY] coordinates ID=0
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled
pid 830 (xfsettingsd), uid 1001: exited on signal 11 (core dumped)
info: [drm] Initialized drm 1.1.0 20060810
drmn0: <Intel IvyBridge> on vgapci0
info: [drm] MSI enabled 1 message(s)
info: [drm] AGP at 0xe0000000 256MB
iicbus0: <Philips I2C bus> on iicbb0 addr 0xff
iic0: <I2C generic I/O> on iicbus0
iic1: <I2C generic I/O> on iicbus1
iicbus2: <Philips I2C bus> on iicbb1 addr 0x0
iic2: <I2C generic I/O> on iicbus2
iic3: <I2C generic I/O> on iicbus3
iicbus4: <Philips I2C bus> on iicbb2 addr 0x0
iic4: <I2C generic I/O> on iicbus4
iic5: <I2C generic I/O> on iicbus5
iicbus6: <Philips I2C bus> on iicbb3 addr 0x0
iic6: <I2C generic I/O> on iicbus6
iic7: <I2C generic I/O> on iicbus7
iicbus8: <Philips I2C bus> on iicbb4 addr 0x0
iic8: <I2C generic I/O> on iicbus8
iic9: <I2C generic I/O> on iicbus9
iicbus10: <Philips I2C bus> on iicbb5 addr 0x0
iic10: <I2C generic I/O> on iicbus10
iic11: <I2C generic I/O> on iicbus11
iicbus12: <Philips I2C bus> on iicbb6 addr 0x0
iic12: <I2C generic I/O> on iicbus12
iic13: <I2C generic I/O> on iicbus13
iicbus14: <Philips I2C bus> on iicbb7 addr 0x0
iic14: <I2C generic I/O> on iicbus14
iic15: <I2C generic I/O> on iicbus15
info: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
info: [drm] Driver supports precise vblank timestamp query.
drmn0: taking over the fictitious range 0xe0000000-0xf0000000
info: [drm] GMBUS timed out, falling back to bit banging on pin 7 [gmbus bus dpd]
info: [drm] Initialized i915 1.6.0 20080730