Help with a full system freeze

CraigA_UK · Apr 3, 2017

Hi all,

I'm trying to troubleshoot a problem that's been going on for a while on a production FreeBSD server I'm running, and which has recently been getting worse.
Essentially, the problem is that after a certain period of time the server completely freezes. It will ping, but you can't access the web service it's running (served via a Java engine) and you can't SSH the box.

The freezes can occur as infrequently as monthly, to as frequently as daily. The server is racked in a data centre, but putting a keyboard/screen on it reveals nothing. I can't get any input/output to the screen.

A reboot brings everything back to life.

I've tried:
- Hardware checks
- New PSU
- New MB/CPU/RAM (ECC Ram)
- New drives

Everything except swapping out the application itself, or rebuilding the server.

The box has FreeBSD-10.2-RELEASE on it.
It has 16GB of RAM, an 8 drive ZFS array (7 disks in the zpool - RaidZ3 - with 1 spare). It has a dedicated drive for boot OS, and now a dedicated 250GB SATA disk for swap as I thought this was the issue as swap usage seemed to grow and it was originally swapping on ZFS - not ideal.

The server is running a backup utility called Syncrify, which runs via Java and is basically a fancy HTTPS wrapper around Rsync - so when a lot of clients are backing up together there will be a lot of random disk I/O reads and writes, plus high CPU as the blocks on disk are checked.

I had PuTTY running to the box today, with top running when it froze. I'm not sure if this can help any experts shed any light on things?

The machine will never come back to life on its own; it needs a full power off/on.
I checked dmesg (and dmesg.yesterday) earlier today after this afternoon's reboot and also 'messages' and neither had anything resembling any errors or issues at all.

Any troubleshooting steps I should take before I throw this out the window would be most appreciated!

SirDice · Apr 4, 2017

Did you perhaps enable dedup on the ZFS pool? That can 'hang up' a pool completely if you don't have enough RAM.

Another thing to look for is in /var/log/messages, specifically errors like these:

Code:

Jan 19 13:23:03 molly kernel: (da4:mps0:0:7:0): WRITE(10). CDB: 2a 00 21 9f 84 8b 00 00 80 00
Jan 19 13:23:03 molly kernel: (da4:mps0:0:7:0): CAM status: Command timeout
Jan 19 13:23:03 molly kernel: (da4:mps0:0:7:0): Retrying command
Jan 19 13:23:11 molly kernel: (da4:mps0:0:7:0): WRITE(10). CDB: 2a 00 21 9f 84 8b 00 00 80 00 length 65536 SMID 534 terminated ioc 804b scsi 0 state c xfer 0
Jan 19 13:23:11 molly kernel: (da4:mps0:0:7:0): WRITE(10). CDB: 2a 00 21 9f 84 8b 00 00 80 00
Jan 19 13:23:11 molly kernel: (da4:mps0:0:7:0): CAM status: CCB request completed with an error
Jan 19 13:23:11 molly kernel: (da4:mps0:0:7:0): Retrying command

These were due to a faulty disk I had in the system. That resulted in the same behavior you're getting: not being able to log in or use it, but still pingable.
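A minimal sketch of how such messages could be tallied per device, assuming the log format shown in the sample above. The function name `cam_errors_by_device` is made up for illustration and is not a FreeBSD tool.

```sh
#!/bin/sh
# Count CAM error reports per device in a syslog file.
# cam_errors_by_device is a hypothetical helper name, not a FreeBSD tool.
cam_errors_by_device() {
    logfile="${1:-/var/log/messages}"
    # Keep only the "CAM status" lines that report a timeout or a failed request,
    # reduce each line to the device name inside the parentheses (e.g. da4),
    # then count how often each device appears.
    grep -E 'CAM status: (Command timeout|CCB request completed with an error)' "$logfile" |
        sed -E 's/.*\(([a-z]+[0-9]+):[^)]*\).*/\1/' |
        sort | uniq -c
}
```

Calling `cam_errors_by_device /var/log/messages` prints one line per device with its error count; a device that dominates the list would be the first candidate to check.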

Terry_Kennedy · Apr 4, 2017

CraigA_UK said:
Essentially, the problem is that after a certain period of time the server completely freezes. It will ping, but you can't access the web service it's running (served via a Java engine) and you can't SSH the box.
[...]
Any troubleshooting steps I should take before I throw this out the window would be most appreciated!

I go into a general discussion of first-level hang diagnosis here.

Since you've done the first part of that already and posted the screenshot... You're out of memory. Look at the various processes showing 0K in the RES column. You're also using 3.2GB of swap space. It looks like you have 32GB of RAM on that box (even though you said 16GB in your post - the numbers don't add up), so you'll need to find where all that memory is going. Does this problem happen all of a sudden or do things get slower and slower over time and then grind to a halt? The latter would indicate more and more processes being pushed out to swap space, while the former could indicate something like a disk hang where no I/O to the disk completes.

CraigA_UK · Apr 4, 2017

Hi Terry,

Thanks for the feedback - yes, you're right, the RAM is 32GB.
I failed the services onto a secondary server, which has different hardware and the same software (and FreeBSD version) and again within hours the same issue. I've attached the grab of that one - looks to be the same issue.

The only thing running, barring base O/S services, is Java (for the app that runs - which is limited to 1.5GB of RAM maximum per server). The RAM limit seems correct, as the app will show it has 1.5GB of RAM available to it - and is usually using around 300MB-450MB.

There is no ipfw or any other services that I've custom installed. As you can see, there are only 18 processes running on the 1st screenshot.

Both systems have ZFS. No dedupe. Compression (LZ4) is enabled on the pool.
The second system had a zpool iostat running every 2 seconds, as seen on the left hand panel. No read/writes occurred at the last update back to the screen.

I can't really tell if it's slowing down gradually, as I'll log in to the web service and it's fine, come back and check in an hour and it's dead. I suspect it's not slowing down, but is just hitting a point where it bails and freezes up; otherwise I think I'd have noticed.

Now, I'm very much a BSD/*nix novice and don't profess much other than basic sysadmin skills - so I'm wondering how the system allows the RAM to be exhausted. Surely that's what swap is for?
I appreciate things would run VERY slow, while swapping, but the box should at least keep living. I can leave these servers for a week and they will never come back again.

Where should I/we look next? Kernel parameters/ZFS tuning?
How do I see where the memory is being consumed? From the 2nd server, which has less RAM, it looks like Java is grabbing more (from the 'Size' column at least).

Thanks for your help, I'll have a read of your guide now as I'm also running FreeBSD/ZFS at home and on-site for network storage (where it's been working well for a few years so far!)

CraigA_UK · Apr 4, 2017

Oh, and both the top output and zpool iostat stop at the point of failure.
PuTTY doesn't throw a disconnect error, but no more updates. Can't open a new SSH session once it's frozen.

Terry_Kennedy · Apr 4, 2017

CraigA_UK said:
I failed the services onto a secondary server, which has different hardware and the same software (and FreeBSD version) and again within hours the same issue. I've attached the grab of that one - looks to be the same issue.

Looks like it. This one has a lot less installed memory (8GB?) so I'd expect it to happen faster.

There is no ipfw or any other services that I've custom installed. As you can see, there are only 18 processes running on the 1st screenshot.

Try doing a top -S so we can see system processes as well. If you can capture data that's scrolled off the top of your terminal emulator window, running ps -alx will also show some possibly-useful additional information (use a 132-column or wider terminal window).

Both systems have ZFS. No dedupe. Compression (LZ4) is enabled on the pool.

I don't see anything in the top(1) headers to indicate that ZFS is "running away" with memory.

The second system had a zpool iostat running every 2 seconds, as seen on the left hand panel. No read/writes occurred at the last update back to the screen.

Is the system root and/or swap on ZFS, or on a traditional UFS partition? If there are other things besides the ZFS pool, gstat might be more useful than zpool iostat.

I can't really tell if it's slowing down gradually, as I'll log in to the web service and it's fine, come back and check in an hour and it's dead.

You could run a script that repeats date; dd if=some-big-file of=/dev/null; sleep 30 (pick "some big file" to be something on the root partition that's a MB or two and a sleep value that doesn't cause this to hog enough resources to affect your production use of the system). Then run it and see if the dates (actually times) reported get further and further apart, or if things just stop.
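A minimal sketch of that loop, written as functions so the interval and file are easy to change. `PROBE_FILE`, `INTERVAL`, `COUNT` and the function names are illustrative choices, and `/bin/sh` is only a placeholder default for the file to read.

```sh
#!/bin/sh
# Sketch of a periodic read probe. PROBE_FILE, INTERVAL, COUNT and the
# function names are illustrative, not taken from the thread.
PROBE_FILE="${PROBE_FILE:-/bin/sh}"  # substitute a file of a MB or two on the root partition
INTERVAL="${INTERVAL:-30}"           # seconds between probes
COUNT="${COUNT:-0}"                  # number of probes; 0 means run until interrupted

probe_once() {
    # Print the current time, then read the file and discard the data.
    date '+%Y-%m-%d %H:%M:%S'
    dd if="$1" of=/dev/null bs=64k 2>/dev/null
}

probe_loop() {
    n=0
    while [ "$COUNT" -eq 0 ] || [ "$n" -lt "$COUNT" ]; do
        probe_once "$PROBE_FILE"
        sleep "$INTERVAL"
        n=$((n + 1))
    done
}
```

Running `probe_loop` in an SSH session leaves the last timestamp on the remote terminal after a freeze. Repeated reads of the same file may be served from cache rather than disk, so widening gaps between timestamps mostly show how quickly the system is still able to run the commands.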

Now, I'm very much a BSD/*nix novice and don't profess much other than basic sysadmin skills - so I'm wondering how the system allows the RAM to be exhausted. Surely that's what swap is for?

Yes. But at some point it becomes non-recoverable. With top -S, we'll see if (for example) the pager is consuming 100% CPU. If there is a disk I/O hang, then there's no way of paging stuff out or back in. We can probably exclude the classic IBM 370 "Oops! We paged out the paging code!" as that would lead to a panic.

Where should I/we look next? Kernel parameters/ZFS tuning?

Please post the contents of /boot/loader.conf and /etc/sysctl.conf files. /etc/rc.conf might also be useful, but it contains possibly-sensitive information (host IP, etc.) so you should look it over and censor anything like that before posting it - or you can wait on that file for later if we don't find anything elsewhere.

All of the above assumes you're running the GENERIC kernel with nothing in /etc/make.conf. If either of those assumptions is incorrect, please post the contents of your kernel config file and / or /etc/make.conf.

I'm surprised that nobody else has mentioned it already, but FreeBSD 10.2 has been end-of-life (EoL) since the beginning of this year. You might want to try updating to either 10.3 or 11.0. 10.3 will probably have fewer surprises. The easy way to do this is with freebsd-update(8) (which I've personally never used). You'll want to take full backups first and have a plan to get back where you started from, "just in case". You'll want the latest 10.3-Px or 11.0-Px release, not plain old -RELEASE (the -Px have patches for security and errata applied, while -RELEASE are the unpatched original releases).

CraigA_UK · Apr 5, 2017

Thanks Terry - great help!

OK, here is the output from top -S soon after reboot:

Code:

UID PID PPID CPU PRI NI     VSZ    RSS MWCHAN   STAT TT       TIME COMMAND
  0   0    0   0  -8  0       0   4736 -        DLs   -    0:14.13 [kernel]
  0   1    0   0  20  0    9476    816 wait     SLs   -    0:00.09 /sbin/init --
  0   2    0   0 -16  0       0     64 -        DL    -    0:00.00 [cam]
  0   3    0   0  -8  0       0    112 zvol:io  DL    -    0:00.60 [zfskern]
  0   4    0   0 -16  0       0     16 pftm     DL    -    0:00.12 [pf purge]
  0   5    0   0 -16  0       0     16 waiting_ DL    -    0:00.00 [sctp_iterator]
  0   6    0   0 -16  0       0     16 idle     DL    -    0:00.00 [enc_daemon0]
  0   7    0   0 -16  0       0     32 umarcl   DL    -    0:00.06 [pagedaemon]
  0   8    0   0 -16  0       0     16 psleep   DL    -    0:00.00 [vmdaemon]
  0   9    0   0 -16  0       0     16 pollid   DL    -    0:00.00 [idlepoll]
  0  10    0   0 -16  0       0     16 audit_wo DL    -    0:00.00 [audit]
  0  11    0   0 155  0       0    256 -        RL    -  167:48.78 [idle]
  0  12    0   0 -72  0       0    992 -        WL    -    0:07.79 [intr]
  0  13    0   0  -8  0       0     48 -        DL    -    0:00.02 [geom]
  0  14    0   0 -16  0       0     16 -        DL    -    0:00.54 [rand_harvestq]
  0  15    0   0 -68  0       0    384 -        DL    -    0:00.02 [usb]
  0  16    0   0 155  0       0     16 pgzero   DL    -    0:00.00 [pagezero]
  0  17    0   0 -16  0       0     16 psleep   DL    -    0:00.00 [bufdaemon]
  0  18    0   0  16  0       0     16 syncer   DL    -    0:00.19 [syncer]
  0  19    0   0 -16  0       0     16 vlruwt   DL    -    0:00.35 [vnlru]
  0 155    1   0  52  0   12352   1748 pause    Is    -    0:00.00 adjkerntz -i
  0 388    1   0  20  0   13624   4692 select   Is    -    0:00.00 /sbin/devd
  0 525    1   0  20  0   14520   2108 select   Ss    -    0:00.03 /usr/sbin/syslogd -s
  0 683    1   0  20  0   14488   1940 select   Ss    -    0:00.09 /usr/sbin/powerd
  0 817    1   0  20  0   16612   2224 nanslp   Is    -    0:00.00 /usr/sbin/cron -s
  0 873    1   0  20  0   61232   6368 select   Is    -    0:00.00 /usr/sbin/sshd
  0 949  873   0  20  0   86500   6988 select   Is    -    0:00.04 sshd: ssh [priv] (sshd)
 44 952  949   0  20  0   86500   7048 select   S     -    0:00.01 sshd: ssh@pts/0 (sshd)
  0 827    1   0  22  0 3473988 734572 -        R    v0-   2:37.86 /usr/local/openjdk8-jre/bin/java -server -Xmx1536m -cp :lib
  0 919    1   0  52  0   70132   9412 piperd   Is+  v0    0:00.03 /usr/local/bin/php /etc/zfsguru-login.sh -fp root
  0 924  919   0  52  0   17044   2828 select   I+   v0    0:00.01 /usr/local/bin/cdialog --clear --title Welcome to ZFSguru -
  0 920    1   0  52  0   14508   1976 ttyin    Is+  v1    0:00.00 /usr/libexec/getty Pc ttyv1
 44 953  952   0  24  0   17844   3456 wait     Ss    0    0:00.01 -bash (bash)
  0 963  953   0  26  0   47736   2696 wait     S     0    0:00.01 su
  0 964  963   0  20  0   17844   3460 wait     S     0    0:00.00 su (bash)
  0 974  964   0  20  0   18760   2192 -        R+    0    0:00.00 ps -alx

Here is the output of /boot/loader.conf

Code:

#
## loader.conf
#

# mimic TLER behavior
# note: error recovery is useful in cases where you lost your redundancy!
kern.cam.ada.default_timeout="7"
kern.cam.ada.retry_count="1"
kern.cam.da.default_timeout="7"
kern.cam.da.retry_count="1"

## optional power saving settings
# lower kernel frequency from 1000 to 100 times a second (danger!)
#kern.hz="100"
# disable ACPI CPU frequency throttling (better performance, higher idle power)
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"

# do not power devices that have no FreeBSD driver attached to them
# note: to use this effectively, you must recompile a custom kernel with only
# the drivers enabled you want to be functional, leaving the others without
# driver and also without power.
hw.pci.do_power_nodriver="3"

# disable the memory test at boot time. On systems with alot of RAM this can
# slow down the booting by several minutes; use MemTest86 to test RAM memory.
hw.memtest.test="0"

## other tuning
kern.maxfiles="950000"
# fix for error 'swap zone exhausted, increase kern.maxswzone'
kern.maxswzone="512m"

# fix for AOC-SAT2-MV8 (Marvell) controllers
hw.hptrr.attach_generic="0"

# fix for USB Root-on-ZFS that do not shutdown properly
hw.usb.no_shutdown_wait=1

# quicker boot time by shortening boot menu countdown
autoboot_delay="2"

# delay booting to allow some devices to settle (needed for some systems)
#kern.cam.boot_delay="10000"
# disable id labels
kern.geom.label.gptid.enable="0"
kern.geom.label.ufsid.enable="0"

# disable legacy device mappings (ada->ad)
kern.cam.ada.legacy_aliases="0"

# enable vt/Newcons console driver supporting KMS graphics drivers
kern.vty=vt

# Intel GPU power saving when using graphics mode saves about 3W of power:
drm.i915.enable_rc6=7

## mandatory kernel modules (REQUIRED)
zfs_load="YES"
geom_uzip_load="YES"

## recommended kernel modules
# ahci (TRIM capable) driver
ahci_load="YES"
# package filter firewall
pf_load="YES"
# asynchronous I/O kernel module
aio_load="YES"


## optional kernel modules
# silicon image driver
siis_load="YES"

# end #
kern.cam.boot_delay="2000"
printk.time=1

Here is /etc/sysctl.conf

Code:

#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Disable .core files being written when an application crashes
kern.coredump=0

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

vm.defer_swapspace_pageouts=1

Note - I've only just added the bottom line based on this thread - https://forums.freebsd.org/threads/59513/ - just to see if it helps.

I think I'm running the generic kernel, as far as I know. There is no /etc/make.conf file.

CraigA_UK · Apr 5, 2017

Oh, both servers are root-on-ZFS
First server (not the one with the file outputs above) has swap on a dedicated SATA disk
Second server (outputs above) has swap on the ZFS pool (not good, I know)

Terry_Kennedy · Apr 6, 2017

CraigA_UK said:
Thanks Terry - great help!

OK, here is the output from top -S soon after reboot:

ps -alx, actually.

Can you run that in a loop (with a sleep in between) so we can see how it changes when the problem happens?

Here is the output of /boot/loader.conf

yuck!

That looks like a cookbook cut-and-paste that applied to some older FreeBSD release. Try it with an empty file and add anything you need to get your environment working. In particular, a number of drivers you're trying to load are part of the GENERIC kernel anyway, memory testing has been skipped since 10.0, and even in 9.x it printed a message about the test. Make sure you have some sort of console access so you can fix things up if you really need one of the options you've removed.

I think I'm running the generic kernel, as far as I know. There is no /etc/make.conf file.

Do a uname -a. I get something like:

Code:

FreeBSD hostname.example.com 10.3-STABLE FreeBSD 10.3-STABLE #0 r314783: Mon Mar  6 13:32:25 EST 2017     terry@hostname.example.com:/usr/obj/usr/src/sys/GENERIC  amd64

If you're running a -RELEASE or -Px kernel, the builder (user@hostname) should be a freebsd.org site.
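As a rough illustration, the builder and kernel config name can be pulled out of that string with sed. This is a sketch that assumes the layout shown in the sample above; `kernel_builder` and `kernel_config` are hypothetical helper names.

```sh
#!/bin/sh
# Helpers to pull the builder and kernel config name out of a FreeBSD
# version string. Function names are hypothetical.
kernel_builder() {
    # The builder is the user@host token that precedes ":/path/to/build/dir".
    printf '%s\n' "$1" | sed -E 's|.* ([^ @]+@[^ :]+):/.*|\1|'
}

kernel_config() {
    # The config name is the last component of the build directory path.
    printf '%s\n' "$1" | sed -E 's|.*/([^/ ]+)( .*)?$|\1|'
}
```

For example, `kernel_builder "$(uname -a)"` and `kernel_config "$(uname -a)"` print the two values for the running kernel. A config name of GENERIC does not rule out local changes to that file, so the builder is worth checking too.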

Also try running the disk I/O command in a loop with a sleep to see if the system slows down over time or dies all at once.

CraigA_UK · Apr 7, 2017

Hmmm... I've got a custom kernel.
These boxes were built a while ago, but from memory I built them using the ZFSGuru installer.
It seems the kernel is GENERIC with the following modules compiled in: ALTQ, POLLING, OFED

How easy (hah) is it to rebuild a GENERIC kernel in-situ? (Or is it easier just to grab a LiveCD and blow away the base install?)

There is a kernel.old file in /boot - but no idea how to see if that's the original GENERIC kernel. If so, I can try and boot that with a clean loader.conf and then do an in-place patch/upgrade which I'm happy with the process for.
I assume that Root-on-ZFS is supported as standard with no changes to loader.conf?

I'm going to run the grabs you suggested anyway, as it would be good to see what's happening and where the starvation is happening. How often would you like the output from ps -alx to be logged? I was thinking every 30 secs, but not sure if this will be granular enough.

Once again, thanks for all your help. You're a star!

IPTRACE · Apr 7, 2017

Please look at the bug on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213856 .
I've reported the problem with PCIe network card and ALTQ compiled in kernel.

Terry_Kennedy · Apr 8, 2017

CraigA_UK said:
Hmmm... I've got a custom kernel.
These boxes were built a while ago, but from memory I built them using the ZFSGuru installer.
It seems the kernel is GENERIC with the following modules compiled in: ALTQ, POLLING, OFED

If you don't need those options (like if you don't need those /boot/loader.conf or /etc/sysctl.conf options), best to leave them out.

How easy (hah) is it to rebuild a GENERIC kernel in-situ? (Or is it easier just to grab a LiveCD and blow away the base install?)

# cd /usr/src; make clean; make kernel

That should build a new kernel and install it, moving the previous kernel to /boot/kernel.old. Make sure you have an unmodified GENERIC kernel configuration file and kernel source tree first, though.

There is a kernel.old file in /boot - but no idea how to see if that's the original GENERIC kernel. If so, I can try and boot that with a clean loader.conf and then do an in-place patch/upgrade which I'm happy with the process for.

# strings /boot/kernel.old/kernel | tail

I assume that Root-on-ZFS is supported as standard with no changes to loader.conf?

I believe so, but since it is a feature I don't use I can't say for sure. Note that any directives in /boot/loader.conf can be entered interactively on the console during the boot process, so as long as you have console access it shouldn't be possible to "paint yourself into a corner".

I'm going to run the grabs you suggested anyway, as it would be good to see what's happening and where the starvation is happening. How often would you like the output from ps -alx to be logged? I was thinking every 30 secs, but not sure if this will be granular enough.

Start with 30 seconds to keep the system overhead low. If that doesn't catch it, go to 15 / 10 / etc.
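A minimal sketch of such a logging loop, assuming a 30-second default interval. `LOGFILE`, `INTERVAL` and the function names are illustrative, and the log path is only a placeholder.

```sh
#!/bin/sh
# Append timestamped `ps -alx` snapshots to a log.
# LOGFILE, INTERVAL and the function names are illustrative.
LOGFILE="${LOGFILE:-/var/tmp/ps-snapshots.log}"
INTERVAL="${INTERVAL:-30}"   # start at 30 seconds; lower it if the hang is missed

snapshot() {
    # Write a timestamp header followed by the full process listing.
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        ps -alx
    } >> "$1"
}

snapshot_loop() {
    # Runs until interrupted.
    while :; do
        snapshot "$LOGFILE"
        sleep "$INTERVAL"
    done
}
```

If the hang stops disk I/O, the last snapshots may never reach the log file. Following the log with `tail -f` from an SSH session keeps the most recent output visible on the remote terminal, in the same way the top output stayed on screen in PuTTY.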

CraigA_UK · Apr 11, 2017

Quick update for you - since removing almost everything from loader.conf and cleaning up rc.conf to not run some pre-built ZFS auto-tuning scripts, the system has been stable under load for longer than it has been in the last week. It's still running the custom kernel on 10.2 at present, but I plan to re-work it to 10.3-RELEASE this week.

Current output from top:

Code:

last pid: 10601;  load averages:  0.02,  0.01,  0.00    up 3+11:43:59  11:25:32
20 processes:  1 running, 19 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
Mem: 269M Active, 333M Inact, 7120M Wired, 2908K Cache, 195M Free
ARC: 5836M Total, 3398M MFU, 1083M MRU, 16K Anon, 154M Header, 1200M Other
Swap: 2048M Total, 2048M Free

  PID USERNAME     THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  675 root          71  20    0  3404M   574M select  8 282:47   0.00% java
 2316 root           1  20    0 21940K  1608K select  3   2:17   0.00% top
  628 root           1  20    0 14488K   884K select  0   0:31   0.00% powerd
 2312 ssh            1  20    0 86500K  2604K select 11   0:11   0.00% sshd
  479 root           1  20    0 14520K  1016K select 13   0:01   0.00% syslogd
  669 root           1  20    0 16612K   564K nanslp  9   0:01   0.00% cron
  393 root           1  20    0 13624K  4224K select  6   0:00   0.00% devd
 2309 root           1  20    0 86500K  2604K select  7   0:00   0.00% sshd
10596 root           1  21    0 86500K  6452K select 10   0:00   0.00% sshd
  749 root           1  52    0 70132K  2876K piperd  8   0:00   0.00% php
  791 root           1  20    0 17044K  1236K select  0   0:00   0.00% cdialog
10600 ssh            1  20    0 17844K  3132K wait   12   0:00   0.00% bash
10601 ssh            1  20    0 21940K  2452K CPU8    8   0:00   0.00% top
 2314 ssh            1  21    0 47736K     0K wait   10   0:00   0.00% <su>
 2313 ssh            1  20    0 17844K     0K wait    7   0:00   0.00% <bash>
 2315 root           1  20    0 17844K     0K wait    3   0:00   0.00% <bash>

On the other server, I've managed to get the kernel source from subversion (/usr/src was empty) and then rebuild a generic kernel, which booted. I then upgraded up to 10.3-RELEASE without issue.
That server isn't under load, but is also happy.

From watching the servers when they were failing, the issue seemed to be that the large chunk of Wired memory would be pressured into Inact and the ARC would drop as well. Then eventually it would freeze up.
The memory is now staying Wired - which looks good.
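To track those two numbers over time rather than by eye, something like the following could reduce each top header to the Wired and ARC totals. It is a sketch that assumes the header layout shown in the output above; `mem_summary` is a made-up helper name.

```sh
#!/bin/sh
# Extract the Wired and ARC totals from top(1) header text on stdin.
# mem_summary is a hypothetical helper name.
mem_summary() {
    awk '
        /^Mem:/ {
            # Fields come in "value label," pairs after the "Mem:" tag.
            for (i = 2; i < NF; i += 2)
                if ($(i + 1) ~ /^Wired/) wired = $i
        }
        /^ARC:/ { arc = $2 }   # first value on the ARC line is the total
        END { printf "wired=%s arc=%s\n", wired, arc }
    '
}
```

Feeding it one batch-mode display, for example `top -b -d 1 | mem_summary`, gives a single line per sample that can be logged with a timestamp and compared before and after a change.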

I suspect the issue was some of the crud in loader.conf and the auto-tuning scripts that were running from the ZFSGuru base install.

Terry - thanks for all your help, I wouldn't have got it sorted without your guidance, and I've learned some cool stuff along the way. I'm also not scared about building a new kernel from source, or upgrading in-situ. I'm always surprised about just how well FreeBSD "just works" when changing hardware/kernels.

Terry_Kennedy · Apr 11, 2017

CraigA_UK said:
Terry - thanks for all your help, I wouldn't have got it sorted without your guidance, and I've learned some cool stuff along the way.

No problem! I'm glad we got it sorted out for you.
