Solved bhyve - FreeBSD 11.0-RELEASE - fbuf: guest core dump when uptime exceeds 17-20 days

Hi ))

I have four production servers running under bhyve.
Host system: FreeBSD 11.0-RELEASE #0 r306362

But the Windows Server 2012 R2 guest has a problem once its uptime exceeds 17-20 days:
the moment I connect to the server with VNC, bhyve (the guest OS) dumps core.
If I don't use VNC (fbuf), the server stays up for 100-200 days or more and works very well!

Startup script for Windows Server 2012 R2:
Code:
bhyve -c 4 -s 7,fbuf,tcp=192.168.254.120:5920,wait \
-s 0,hostbridge \
-s 3,ahci-hd,winsrv2012r2.hdd \
-s 4,ahci-cd,null-cd.iso \
-s 5,ahci-hd,data_winsrv2012r2.hdd \
-s 6,ahci-hd,data.700G.winsrv2012r2.hdd \
-s 10,virtio-net,tap3 \
-s 31,lpc -l bootrom,BHYVE_UEFI.fd \
-m 8G -H -w winsrv2012wrk


The core dump also happened once with a FreeBSD 11.0 guest under bhyve, but the uptime was 34 days.
It happened twice with a Windows 10 guest under bhyve; the uptime was 27 days.

Summary
If fbuf is in the config, then after 15-20 days of uptime there is a chance that connecting to the server with VNC brings the server down with a core dump.

I usually use UltraVNC on Windows and vncviewer under MATE on FreeBSD :)

Is using fbuf a bad idea for long uptimes?
Please tell me, why does this happen?
 
Are you able to run gdb on the coredump ? (use gdb from ports, and you may need to have the 11.0 debug symbols installed).
 
OK, I will try gdb. I used it once, so I think I can remember or read up on how to use it :)
 
Are you able to run gdb on the coredump ? (use gdb from ports, and you may need to have the 11.0 debug symbols installed).

I tried attaching to the bhyve process.

Check the process ID:
Code:
# ps axu | grep bhyve
root    11494  100,0  7,2 7420436 2403276  2  S+   19:20               3:59,51 bhyve: win10pro_2 (bhyve)
root    11493    0,0  0,0   13184    2460  2  I+   19:20               0:00,00 /bin/sh ./bhyve_win10pro_2
root    11580    0,0  0,0   14836    2444  3  S+   19:33               0:00,00 grep bhyve
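
The PID lookup above can also be scripted. This is a minimal sketch, assuming the process title has the form "bhyve: <vmname>" as in the ps output shown; the function name `bhyve_pid` is made up for illustration.

```shell
#!/bin/sh
# Print the PID of the bhyve process whose title is "bhyve: <vmname>".
# Reads `ps axu`-style lines on stdin; $1 is the VM name.
# bhyve_pid is a hypothetical helper, not part of bhyve or FreeBSD.
bhyve_pid() {
    awk -v vm="$1" '{
        # Look for the "bhyve:" token followed directly by the VM name.
        for (i = 3; i < NF; i++)
            if ($i == "bhyve:" && $(i + 1) == vm) { print $2; exit }
    }'
}

# Usage: ps axu | bhyve_pid win10pro_2
```

Matching on the "bhyve:" token skips both the `grep bhyve` line and the wrapper shell script line.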

Install GDB 7.12:
Code:
# pkg install gdb


Code:
 # gdb712
GNU gdb (GDB) 7.12 [GDB v7.12 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 11494
Attaching to process 11494
Couldn't get registers: Device busy.
Couldn't get registers: Device busy.
(gdb) Reading symbols from /usr/sbin/bhyve...(no debugging symbols found)...done.
[New LWP 100350 of process 11494]
[New LWP 100351 of process 11494]
[New LWP 100352 of process 11494]
[New LWP 100353 of process 11494]
[New LWP 100354 of process 11494]
[New LWP 100355 of process 11494]
[New LWP 100356 of process 11494]
[New LWP 100357 of process 11494]
[New LWP 100358 of process 11494]
[New LWP 100359 of process 11494]
[New LWP 100360 of process 11494]
Reading symbols from /usr/lib/libvmmapi.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libmd.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
[Switching to LWP 100144 of process 11494]
0x000000080121198a in _kevent () from /lib/libc.so.7
Quit
(gdb) quit

Above all, am I doing this right?
You mentioned debugging symbols; please tell me, how do I install them?
 
grehan@
Please take a look at this.
Today a guest OS under bhyve dumped core.

Guest OS under bhyve
The services on this server are not heavy (router for separate WiFi, LAN and VLAN networks).
Uptime was 21-23 days since the last core dump.
Code:
# uname -a
FreeBSD srv-net 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #2 r306932M: Mon Oct 10 02:44:16 MSK 2016     root@srv-net:/usr/obj/usr/src/sys/GENERIC  amd64

Code:
bhyve -c 1 -s 7,fbuf,tcp=192.168.55.200:5922,wait \
-s 0,hostbridge \
-s 3,ahci-hd,router_fbsd.img \
-s 10:0,virtio-net,tap7 \
-s 10:1,virtio-net,tap8 \
-s 31,lpc -l bootrom,BHYVE_UEFI.fd \
-m 512M -H -w router_fbsd

Debugging on the host system
Code:
# uname -a
FreeBSD srv 11.0-RELEASE FreeBSD 11.0-RELEASE #0 r306362: Tue Sep 27 12:20:00 KRAT 2016     root@srv-aero:/usr/obj/usr/src/sys/GENERIC  amd64

Dmesg
Code:
FreeBSD 11.0-RELEASE #0 r306362: Tue Sep 27 12:20:00 KRAT 2016
    root@srv-aero:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(efifb): resolution 1280x1024
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (3997.77-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306c3  Family=0x6  Model=0x3c  Stepping=3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x21<LAHF,ABM>
  Structured Extended Features=0x2fbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33237377024 (31697 MB)



Code:
# gdb bhyve bhyve.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `bhyve'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libvmmapi.so.5...Reading symbols from /usr/lib/debug//usr/lib/libvmmapi.so.5.debug...done.
done.
Loaded symbols for /usr/lib/libvmmapi.so.5
Reading symbols from /lib/libmd.so.6...Reading symbols from /usr/lib/debug//lib/libmd.so.6.debug...done.
done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libz.so.6...Reading symbols from /usr/lib/debug//lib/libz.so.6.debug...done.
done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done.
done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done.
done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done.
done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done.
done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000800c9afa7 in deflate_fast (s=<value optimized out>, flush=2) at /usr/src/lib/libz/deflate.c:1654
1654                INSERT_STRING(s, s->strstart, hash_head);
[New Thread 82460ea00 (LWP 100243/<unknown>)]
[New Thread 801a19c00 (LWP 100238/<unknown>)]
[New Thread 801a19700 (LWP 100237/<unknown>)]
[New Thread 801a19200 (LWP 100236/<unknown>)]
[New Thread 801a18d00 (LWP 100234/<unknown>)]
[New Thread 801a18800 (LWP 100233/<unknown>)]
[New Thread 801a18300 (LWP 100232/<unknown>)]
[New Thread 801a17e00 (LWP 100231/<unknown>)]
[New Thread 801a17900 (LWP 100230/<unknown>)]
[New Thread 801a17400 (LWP 100229/<unknown>)]
[New Thread 801a16f00 (LWP 100228/<unknown>)]
[New Thread 801a16a00 (LWP 100227/<unknown>)]
[New Thread 801a16500 (LWP 100226/<unknown>)]
[New Thread 801a16000 (LWP 100095/<unknown>)]
(gdb) bt
#0  0x0000000800c9afa7 in deflate_fast (s=<value optimized out>, flush=2) at /usr/src/lib/libz/deflate.c:1654
#1  0x0000000800c9a303 in deflate (strm=<value optimized out>, flush=<value optimized out>) at /usr/src/lib/libz/deflate.c:905
#2  0x00000000004338ea in rfb_send_all (rc=0x801a7c000, cfd=6, gc=0x801a370c0) at /usr/src/usr.sbin/bhyve/rfb.c:411
#3  0x00000000004329b3 in rfb_send_screen (rc=0x801a7c000, cfd=6, all=0) at /usr/src/usr.sbin/bhyve/rfb.c:570
#4  0x0000000000432d62 in rfb_wr_thr (arg=0x801a7c000) at /usr/src/usr.sbin/bhyve/rfb.c:719
#5  0x0000000800eb1b55 in thread_start (curthread=<value optimized out>) at /usr/src/lib/libthr/thread/thr_create.c:289
#6  0x00007fffde1f0000 in ?? ()
Cannot access memory at address 0x7fffde3f0000
Current language:  auto; currently minimal
(gdb)
 
Thanks for that. Could I get some additional gdb commands to be run on the core ?

frame 2
p/x *rc

Yes, of course.

Code:
(gdb) frame 2
#2  0x00000000004338ea in rfb_send_all (rc=0x801a7c000, cfd=6, gc=0x801a370c0) at /usr/src/usr.sbin/bhyve/rfb.c:411
411                     err = deflate(&rc->zstream, Z_SYNC_FLUSH);

Code:
(gdb) p/x *rc
$1 = {sfd = 0x5, tid = 0x801a18d00, cfd = 0x6, width = 0x400, height = 0x300, enc_raw_ok = 0x1, enc_zlib_ok = 0x1, enc_resize_ok = 0x1, zstream = {next_in = 0x822806000, avail_in = 0x2fa000, total_in = 0x0,
    next_out = 0x82496f34d, avail_out = 0x7b88c3, total_out = 0x4004, msg = 0x0, state = 0x824615800, zalloc = 0x800ca2700, zfree = 0x800ca2710, opaque = 0x0, data_type = 0x0, adler = 0x1, reserved = 0x0},
  zbuf = 0x824800000, zbuflen = 0x0, conn_wait = 0x0, sending = 0x1, mtx = 0x801a3a6c0, cond = 0x801a370e0, hw_crc = 0x1, crc = 0x801a83a00, crc_tmp = 0x801ad4180, crc_width = 0x400, crc_height = 0x300}
(gdb)
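
The same three commands (bt, frame 2, p/x *rc) can be run against the core in one non-interactive pass using gdb's `-batch` and `-ex` options. This is a sketch only: the helper name `gdb_core_cmd` is made up, and the defaults (`gdb712`, `/usr/sbin/bhyve`, `bhyve.core`) are assumptions taken from the sessions in this thread. The function only prints the command line, so it can be checked before running it.

```shell
#!/bin/sh
# Print a gdb command line that replays the commands from this thread
# against a core file without an interactive session.
# gdb_core_cmd is a hypothetical helper; the defaults are assumptions.
gdb_core_cmd() {
    gdb_bin=${GDB:-gdb712}       # ports gdb binary name as seen earlier in the thread
    prog=${1:-/usr/sbin/bhyve}   # binary that produced the core
    core=${2:-bhyve.core}        # core file to inspect
    printf '%s -batch -ex bt -ex "frame 2" -ex "p/x *rc" %s %s\n' \
        "$gdb_bin" "$prog" "$core"
}

# Usage: gdb_core_cmd /usr/sbin/bhyve bhyve.core
# Review the printed line, then paste it into a root shell.
```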
 
Solution
To prevent the core dump, disable zlib compression when you connect to the server with a VNC client and the uptime is more than 18 days.

Example
Code:
# vncviewer --help
TightVNC Viewer version 1.3.10

Code:
# vncviewer -encodings raw 192.168.55.200:5922


Many thanks to grehan@ for helping solve the problem!
 