Previously stable server reboots during jail usage

dch

Developer
This previously stable (2+ years) server has had 3 unexpected reboots in the last week, each time during jail-related work (creation, deletion, poudriere runs). zpool scrubs are clean, and there is no log entry or trace of a crash or error before any of the reboots. I suspect a h/w issue but don't have much to go on. Ideas welcome! I do see something in dmidecode (see below), but I'm not familiar enough with its output to know what the issue might actually be.

- is there anything else I should check?

- is the dmidecode "error" actually an error?

Code:
Handle 0x0057, DMI type 16, 15 bytes
Physical Memory Array
   Location: System Board Or Motherboard
   Use: System Memory
   Error Correction Type: Single-bit ECC
   Maximum Capacity: 32 GB
   Error Information Handle: 0x0059
   Number Of Devices: 4

Handle 0x0056, DMI type 17, 28 bytes
Memory Device
   Array Handle: 0x0057
   Error Information Handle: 0x005A

system config:

Code:
# config
# /etc/rc.conf
hostname="wintermute"
panicmail_autosubmit="YES"
dumpdev="AUTO"

# uname
FreeBSD wintermute 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015  root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
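
Since `dumpdev="AUTO"` is set, a kernel panic would normally leave a dump that savecore(8) writes out on the next boot. A minimal Python sketch for checking whether any such files exist, assuming the default dump directory `/var/crash` has not been overridden in rc.conf; the function name `find_crash_artifacts` is purely illustrative:

```python
from pathlib import Path

# File name prefixes written by savecore(8) (vmcore.N, info.N)
# and by crashinfo(8) (core.txt.N).
CRASH_PREFIXES = ("vmcore.", "info.", "core.txt.")


def find_crash_artifacts(dumpdir="/var/crash"):
    """Return sorted names of crash dump files found in dumpdir.

    Assumption: dumpdir is the rc.conf default; pass another path
    if the dump directory has been changed on the host.
    """
    root = Path(dumpdir)
    if not root.is_dir():
        return []
    return [
        entry.name
        for entry in sorted(root.iterdir())
        if entry.name.startswith(CRASH_PREFIXES)
    ]


if __name__ == "__main__":
    try:
        found = find_crash_artifacts()
    except OSError as exc:  # e.g. directory not readable by this user
        print("could not read dump directory:", exc)
    else:
        print(found if found else "no crash dump files found")
```

An empty result after an unexpected reboot would be consistent with a reset that never reached the kernel's panic path (for example power or hardware), though it does not prove it.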

Logs:

Code:
# dmesg
# full dmesg is at https://dpaste.de/awrT/raw
FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015
root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz (3400.09-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306a9 Family=0x6 Model=0x3a Stepping=9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
XSAVE Features=0x1<XSAVEOPT>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33195868160 (31658 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
cpu4 (AP): APIC ID: 4
cpu5 (AP): APIC ID: 5
cpu6 (AP): APIC ID: 6
cpu7 (AP): APIC ID: 7

# /var/log/messages
Oct 19 05:24:28 wintermute last message repeated 2 times
Oct 21 09:55:19 wintermute syslogd: kernel boot file is /boot/kernel/kernel
Oct 21 09:55:19 wintermute kernel: Copyright (c) 1992-2015 The FreeBSD Project.
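
One way to tell a clean reboot from a hard reset in `/var/log/messages` is to look at the line just before each boot marker. A minimal Python sketch, assuming that an orderly shutdown leaves a `syslogd: exiting on signal` line as the last entry before the next boot (an assumption about typical syslogd behaviour, not something shown in the log above); the names `classify_boots` and `SAMPLE` are illustrative:

```python
BOOT_MARKER = "syslogd: kernel boot file is"
CLEAN_MARKER = "syslogd: exiting on signal"  # assumed clean-shutdown trace


def classify_boots(lines):
    """Return a list of (boot_line, previous_line, looks_clean) tuples.

    looks_clean is True only when the last non-empty line before the
    boot marker contains CLEAN_MARKER.
    """
    results = []
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER in line:
            looks_clean = previous is not None and CLEAN_MARKER in previous
            results.append((line, previous, looks_clean))
        if line.strip():
            previous = line
    return results


# The three lines quoted in this post.
SAMPLE = [
    "Oct 19 05:24:28 wintermute last message repeated 2 times",
    "Oct 21 09:55:19 wintermute syslogd: kernel boot file is /boot/kernel/kernel",
    "Oct 21 09:55:19 wintermute kernel: Copyright (c) 1992-2015 The FreeBSD Project.",
]

for boot, prev, clean in classify_boots(SAMPLE):
    label = "clean shutdown" if clean else "no shutdown record"
    print(label, "| previous line:", prev)
```

On the excerpt above, the line before the Oct 21 boot is an ordinary "last message repeated" entry from two days earlier, so the sketch reports no shutdown record for that boot.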

The only thing I can find is that some of the dmidecode info *might* suggest an ECC RAM error in one of the banks, but as I've never looked at this output before, I am flying blind:

Code:
# dmidecode
# full output at https://dpaste.de/5wAy/raw
...
Handle 0x0057, DMI type 16, 15 bytes
Physical Memory Array
   Location: System Board Or Motherboard
   Use: System Memory
   Error Correction Type: Single-bit ECC
   Maximum Capacity: 32 GB
   Error Information Handle: 0x0059
   Number Of Devices: 4

Handle 0x0056, DMI type 17, 28 bytes
Memory Device
   Array Handle: 0x0057
   Error Information Handle: 0x005A
   Total Width: 128 bits
   Data Width: 64 bits
   Size: 8192 MB
   Form Factor: DIMM
   Set: None
   Locator: ChannelA-DIMM0
   Bank Locator: BANK 0
   Type: DDR3
   Type Detail: Synchronous
   Speed: 1333 MHz
   Manufacturer: Kingston
   Serial Number: 2E120C77
   Asset Tag: 9876543210
   Part Number: 9965525-100.A00LF
   Rank: 2

Handle 0x0059, DMI type 18, 23 bytes
32-bit Memory Error Information
   Type: OK
   Granularity: Unknown
   Operation: Unknown
   Vendor Syndrome: Unknown
   Memory Array Address: Unknown
   Device Address: Unknown
   Resolution: Unknown

Handle 0x0058, DMI type 20, 19 bytes
Memory Device Mapped Address
   Starting Address: 0x00000000000
   Ending Address: 0x001FFFFFFFF
   Range Size: 8 GB
   Physical Device Handle: 0x0056
   Memory Array Mapped Address Handle: 0x0062
   Partition Row Position: Unknown
   Interleave Position: 1
   Interleaved Data Depth: 2

Handle 0x005B, DMI type 17, 28 bytes
Memory Device
   Array Handle: 0x0057
   Error Information Handle: No Error
   Total Width: 128 bits
   Data Width: 64 bits
   Size: 8192 MB
   Form Factor: DIMM
   Set: None
   Locator: ChannelA-DIMM1
   Bank Locator: BANK 1
   Type: DDR3
   Type Detail: Synchronous
   Speed: 1333 MHz
   Manufacturer: Kingston
   Serial Number: 3012D576
   Asset Tag: 9876543210
   Part Number: 9965525-100.A00LF
   Rank: 2

Handle 0x005A, DMI type 18, 23 bytes
32-bit Memory Error Information
   Type: OK
   Granularity: Unknown
   Operation: Unknown
   Vendor Syndrome: Unknown
   Memory Array Address: Unknown
   Device Address: Unknown
   Resolution: Unknown

Handle 0x005D, DMI type 20, 19 bytes
Memory Device Mapped Address
   Starting Address: 0x00400000000
   Ending Address: 0x005FFFFFFFF
   Range Size: 8 GB
   Physical Device Handle: 0x005B
   Memory Array Mapped Address Handle: 0x0062
   Partition Row Position: Unknown
   Interleave Position: 1
   Interleaved Data Depth: 2

...
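
For anyone reading the excerpt above: in the SMBIOS tables, `Error Information Handle` holds the handle of another record (DMI type 18) rather than an error code, and dmidecode prints `No Error` or `Not Provided` when no such record is referenced. A minimal Python sketch, assuming the text layout shown above, that follows each memory device's handle to its type 18 record and reports that record's `Type` field. The function names and the trimmed `SAMPLE` text are illustrative and not part of dmidecode:

```python
import re

# Trimmed from the dmidecode output in this post.
SAMPLE = """\
Handle 0x0056, DMI type 17, 28 bytes
Memory Device
   Array Handle: 0x0057
   Error Information Handle: 0x005A
   Locator: ChannelA-DIMM0

Handle 0x005B, DMI type 17, 28 bytes
Memory Device
   Array Handle: 0x0057
   Error Information Handle: No Error
   Locator: ChannelA-DIMM1

Handle 0x005A, DMI type 18, 23 bytes
32-bit Memory Error Information
   Type: OK
   Granularity: Unknown
"""

HEADER = re.compile(r"^Handle (0x[0-9A-Fa-f]+), DMI type (\d+),")


def parse_records(text):
    """Map each handle (as int) to {"type": int, "fields": {name: value}}."""
    records = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {"type": int(match.group(2)), "fields": {}}
            records[int(match.group(1), 16)] = current
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            current["fields"][key.strip()] = value.strip()
    return records


def memory_error_status(records):
    """For each type 17 device, resolve its error handle to a status string."""
    status = {}
    for rec in records.values():
        if rec["type"] != 17:
            continue
        locator = rec["fields"].get("Locator", "unknown")
        ref = rec["fields"].get("Error Information Handle", "")
        if ref.lower().startswith("0x"):
            target = records.get(int(ref, 16))
            if target is None:
                status[locator] = "referenced record missing"
            else:
                status[locator] = target["fields"].get("Type", "unknown")
        else:
            status[locator] = ref  # e.g. "No Error" or "Not Provided"
    return status


print(memory_error_status(parse_records(SAMPLE)))
```

On the excerpt in this post, ChannelA-DIMM0's handle 0x005A resolves to a record with `Type: OK`, and ChannelA-DIMM1 reports `No Error`. Read that way, the tables do not appear to record a memory error, although SMBIOS data is filled in by firmware and may not reflect ECC events seen at runtime.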
 