Random freezes on a FreeBSD VPS

I've been a happy customer of RootBSD since 2010 without any major issues. So when a customer asked for advice on buying a VPS, I recommended RootBSD. They bought and deployed the VPS with a fresh FreeBSD 11.0-RELEASE install, and I configured it for them (running a nameserver, webserver, mailserver and a PostgreSQL instance).

For whatever reason, I have had many issues with this VPS from day one. First, I was not able to log in as root: I had to log in as a regular user and then type su, which asks for the root password and accepts it when I enter it. I tried changing the password using passwd and the result was still the same. Huh, weird! OK, I can tolerate that.

Then, in less than a day, it randomly started spitting out "out of swap" errors, even though it came with a 256MB swap partition by default. I added a 2GB swap file and that issue was gone. At least temporarily.

Shortly after that, the VPS started freezing randomly a few times a day, responding only to pings. No http, no ftp, no ssh, nothing. Even when I access the VNC console, the keyboard does not respond and the OS does not print any new logs on tty1.

These random freezes have been going on for a week and I'm pulling my hair out because I cannot track down the issue. Of all my FreeBSD installations, this is the only VPS that seems unstable for an unknown reason.

Do you have any suggestions where to start?

My suspicion is that the filesystem may be corrupted (due to the forced power-offs during the out-of-swap errors). But I would appreciate your suggestions. What's the first thing you do in such a situation?
 
In FreeBSD root access is not allowed via ssh unless you change that option in /etc/ssh/sshd_config.

Running a mail server, webserver, db server and a name server with only 256MB of swap is a no go unless you don't expect any traffic.
My guess is that your system is running out of memory and it is constantly swapping. Also, I am curious, how exactly did you increase the swap?

Of course those are just guesses, you will need to show us at least the output of top and dmesg.
 
Thank you all for the answers.

I'd delete and start again at this point to rule out a corrupted install.

Well, I'd prefer to keep that option as a last resort since the server is in production at the moment. Furthermore, I would like to know the underlying reason.

What is the output of df? What did RootBSD say when you emailed them?

Here is the output:
Code:
$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p3    116G     20G     86G    19%    /
devfs          1.0K    1.0K      0B   100%    /dev
fdescfs        1.0K    1.0K      0B   100%    /dev/fd
linprocfs      4.0K    4.0K      0B   100%    /compat/linux/proc

And their support said:
This is an unmanaged server, so you will need to troubleshoot what exactly is causing the server to use such a large amount of memory that it has to resort to using swap. The hypervisor this server is on has been experiencing no load issues. ....


In FreeBSD root access is not allowed via ssh unless you change that option in /etc/ssh/sshd_config.

Running a mail server, webserver, db server and a name server with only 256MB of swap is a no go unless you don't expect any traffic.
My guess is that your system is running out of memory and it is constantly swapping. Also, I am curious, how exactly did you increase the swap?

Of course those are just guesses, you will need to show us at least the output of top and dmesg.

I should clarify that I was not talking about an ssh session. It was on the VNC console (tty1) that I was not able to log in. Fortunately, that was my fault. I had an issue similar to this post from the FreeBSD freebsd-questions mailing list. The first thing I always do is enable the root password prompt in single-user mode, and it seems this time I mistakenly touched the wrong line inside /etc/ttys.

I changed the following:
Code:
console none                            unknown off secure
#
ttyv0   "/usr/libexec/getty Pc"         xterm   on  insecure

To:
Code:
console none                            unknown off insecure
#
ttyv0   "/usr/libexec/getty Pc"         xterm   on  secure

And the issue with the root login is resolved now.
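
For anyone wanting to double-check an edit like this, here is a minimal sketch that lists which entries in a ttys(5)-style file carry the insecure flag (root cannot log in directly on such a line, and on the console entry it makes single-user mode ask for the root password). The function name list_insecure_ttys is made up for illustration, and the parsing assumes the usual whitespace-separated layout.

```sh
#!/bin/sh
# list_insecure_ttys (hypothetical helper): print the name of every
# non-comment entry in a ttys(5)-style file that has the "insecure" flag.
# Defaults to /etc/ttys when no file is given.
list_insecure_ttys() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }   # skip comments and blank lines
        {
            # The getty field may contain spaces, so scan every field
            # after the name instead of relying on a fixed column.
            for (i = 2; i <= NF; i++)
                if ($i == "insecure") { print $1; break }
        }
    ' "${1:-/etc/ttys}"
}
```

With the corrected file shown above, calling list_insecure_ttys should report console but not ttyv0.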

I must agree with you that the 256MB swap was never enough. I use a 4GB swap partition for my own VPS. But, this was the way the VPS was deployed originally. It seems that RootBSD does not support installing from ISO anymore and you have a list of options that you select at the build/rebuild time. You select the OS from the web interface, they deploy it automatically with a default partition scheme. I liked their old method as it was more flexible.

This is how I added 2GB more swap space (later 4GB, without any luck):

Code:
$ dd if=/dev/zero of=/var/swap0 bs=1m count=4096
$ chmod 0600 /var/swap0
$ swapon -aL
$ swapctl -l
Device:       1024-blocks     Used:
/dev/ada0p2      262144      8300
/dev/md99       4194304      7704

And /etc/fstab
Code:
# Device        Mountpoint      FStype  Options Dump    Pass#
/dev/ada0p2     none            swap    sw      0       0
/dev/ada0p3     /               ufs     rw,userquota,groupquota      1       1

# More SWAP
md99            none            swap    sw,file=/var/swap0,late 0   0

# for some programs such as kde4
proc /proc procfs rw,noauto,late 0 0

# for some programs such as bash
fdesc /dev/fd fdescfs rw,late 0 0

# linux-emulation
linproc /compat/linux/proc linprocfs rw,late 0 0

I tried to rebuild the world/kernel from source again, this time with devel/ccache disabled. I left a sysutils/htop instance on screen in my ssh session. Then I ran the following on the VNC console to rebuild the world/kernel:

Code:
$ cd /usr/
$ rm -rf src
$ svnlite checkout http://svn.freebsd.org/base/releng/11.0 /usr/src/
$ echo 'include         GENERIC' > /usr/src/sys/amd64/conf/CUSTOM
$ echo 'ident           CUSTOM' >> /usr/src/sys/amd64/conf/CUSTOM
$ echo '' >> /usr/src/sys/amd64/conf/CUSTOM
$ echo '# Quota' >> /usr/src/sys/amd64/conf/CUSTOM
$ echo 'options         QUOTA' >> /usr/src/sys/amd64/conf/CUSTOM
$ cat /usr/src/sys/amd64/conf/CUSTOM
include         GENERIC
ident           CUSTOM

# Quota
options         QUOTA
$ cd /usr/src/ && make clean
$ cd /usr/obj/ && rm -rf *
$ cd /usr/src/ && make buildworld -j5 && make buildkernel -j5 KERNCONF=CUSTOM

This time the freeze happened at the kernel linking stage. This is part of the output from the VNC console (I couldn't see the earlier output because everything was frozen):
Code:
. [NO DATA BEFORE THIS POINT IS AVAILABLE]
ctfconvert -L VERSION -g hptrr_lib.o
ERROR: ctfconvert: rc = -1 No entry found [dwarf_next_cu_header_c(61)]
. (blah)
. (blah)
. (blah)
--- kernel.full ---
linking kernel.full
ctfmerge -L VERSION -g -o kernel.full ...
. [THE OUTPUT AND KEYBOARD WERE FROZEN HERE]

The lines starting with a dot are my own comments. The memory usage shown by sysutils/htop in the disconnected ssh session was as follows:

Code:
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1.56G/1.97G]
Swp[||||||                                                                                            225M/4.25G]

Judging by the sysutils/htop output (1.56G of 1.97G RAM and only 225M of 4.25G swap in use), memory usage does not appear to be behind these random freezes.
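
Since the console is already frozen by the time anyone looks at it, one option is to append memory and swap figures to a file at intervals, so the last entries written before a freeze are still there after the reboot. A rough sketch follows; the function name log_mem_snapshot and the log path in the usage note are placeholders, and swapinfo and vmstat are simply skipped where they are not installed.

```sh
#!/bin/sh
# log_mem_snapshot (hypothetical helper): append one timestamped record of
# swap and VM statistics to the given log file, then ask for a flush to disk.
log_mem_snapshot() {
    log="$1"
    {
        echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
        # swapinfo(8) is FreeBSD-specific; skip it quietly elsewhere.
        if command -v swapinfo >/dev/null 2>&1; then swapinfo -k; fi
        # A bare vmstat prints one summary report and exits.
        if command -v vmstat >/dev/null 2>&1; then vmstat; fi
    } >> "$log" 2>&1
    sync   # request that buffered data be written out before a hard reset
}
```

It could be run in a loop from a root shell, for example `while :; do log_mem_snapshot /var/log/memwatch.log; sleep 10; done`, and the tail of the file read after the next freeze.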

I'll post dmesg and /var/log/messages in another post.
 
Hi NuLL3rr0r,

I have been experiencing something similar and am clueless too. My crashes seem to be linked to network data transfer (I still have this issue with large files within a virtual machine). Do you know if there is a lot of network or disk activity when the machine hangs?

regards,
tcn
 
Hi tcn,

Unfortunately, the issue is still around for me, too. I even wiped the VPS and reinstalled it, and the issue is still the same. I doubt that my crash is related to network or disk activity, since this VPS does not have any load.

But, I'll monitor that and report back if I find anything useful.

Regards
 
As a follow-up, I would like to share my findings on this issue, in case they come in handy for other people.

I reinstalled this VPS twice and unfortunately results were the same. So, I tried other providers such as DigitalOcean and Vultr but I wasn't satisfied with their performance; RootBSD (4 Cores, 2GB RAM) performed 2x to 3x as fast on my PostgreSQL benchmarks compared to Vultr (6 Cores, 8GB RAM) and DigitalOcean (4 Cores, 8GB RAM).

As a workaround, I decided to give 10.3-RELEASE a go. It has been in production almost a week without those freezes or any other issues.

Furthermore, I noticed something else on FreeBSD 11.0-RELEASE which has been on my mind for some time (actually I was too lazy to ask it here :D). I always generate my random passwords using a simple script like this:

Code:
#!/bin/sh
strings /dev/urandom | grep -o "[[:alnum:]]" | head -n 128 | tr -d '\n' | xargs echo

On my own VPS I always ran this and got a 128-character password instantly. After upgrading to 11.0-RELEASE I noticed that this simple script ran so painfully slowly that I was able to leave my desk, get a cup of coffee, come back and still be looking at the screen waiting for my random password to appear.

So, I ran this script on the new VPS with the freezing issues and here are the sample results:

Code:
# 11.0-RELEASE
$ time sh rpwd.sh
4luBW07S02hx4Sc2FBblhadujbENKYhjW9BjTPAwO5gDpTAf02owTyedVETEuZhBxsJLawfUQiFoRhgC5YNBc6Bk1tURITdUX9zuGSycVtTfpSp0jyVIUuN5GJhPHPiC
0.091u 39.385s 1:22.02 48.1%   15+178k 0+0io 202pf+3w

Code:
# 10.3-RELEASE
$ time sh rpwd.sh
sGXjFowDYPkMUqdmx823FQFhY6zaPiLr5X6FbRZ1ej5y6Lr6vQZnzGgllJtG48oyCRgHr0GZQ0YnXgyZOS04tfPGk6RRHj5pHSty4CqB8mJoPnfrnCUnWR9OKuhxqtGT
0.007u 0.014s 0:00.01 100.0%   880+432k 0+0io 0pf+0w

Crazy!! They are not comparable in any way: about 82 seconds of wall-clock time on 11.0-RELEASE versus 0.01 seconds on 10.3-RELEASE! Reading from /dev/urandom should not be that slow! Running that script a few times caused the mentioned VPS to freeze. By the way, the culprit is the strings /dev/urandom part.
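
For comparison, here is a sketch of the same password generator that avoids strings altogether: tr filters the random bytes directly and head stops after a fixed number of characters. The function name rpwd is only illustrative, and whether this sidesteps the slowdown on 11.0-RELEASE is an assumption that has not been tested here.

```sh
#!/bin/sh
# rpwd (illustrative name): print N random alphanumeric characters
# (default 128) followed by a newline.
rpwd() {
    len="${1:-128}"
    # LC_ALL=C makes tr treat the input as plain bytes; -dc deletes
    # everything except the listed characters. head -c ends the pipeline
    # as soon as enough characters have been produced.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
    echo
}
```

Calling rpwd with no argument gives a 128-character password like the original script; rpwd 32 gives a shorter one.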

Although I won't consider the issue resolved, in my estimation there are still some rough edges in 11.0-RELEASE, which makes me believe 10.3-RELEASE is the best candidate for production environments at the moment. Thanks to the FreeBSD team for their policy of supporting multiple releases concurrently.
 
Weird. I ran your script on a Digital Ocean 10.3 x64 VM (512MB RAM) and it completed almost instantly.

FreeBSD centurion 10.3-RELEASE-p18 FreeBSD 10.3-RELEASE-p18 #0: Tue Apr 11 10:31:00 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

I ran it on a Vultr 11.0 x64 (1G RAM, 1G swap file) and it locked up the VM... the apparent reason being that it consumed all the memory and then started chewing its way through the swap file. It wasn't completely locked up for the first few minutes, ssh logged in eventually after a couple of minutes, but no prompt for another couple of minutes. Console access couldn't produce a prompt.

FreeBSD vulture 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

I only found this thread because a simple make clean in a ports directory locked up the VM and it needed a restart. Currently in the process of downgrading the 11.0 VM to 10.3 :(
 
Yes, I noticed that this happens on VirtualBox and even bare-metal installs, too. For this reason alone I am using only 10.3-RELEASE in production until 11.1-RELEASE arrives and I can see whether the issue has gone away.
 