Swap exhaustion linked to jobs in a jail

Dear all,

Every night I receive lots of messages like these in my event log:
Code:
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(4): failed
swap_pager_getswapspace(5): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(2): failed

My investigation showed that the scheduled jobs running at that time are two heavy rsync jobs (over ssh), started by a remote machine to back up data to file space located within a regular FreeBSD jail (rsync -avz of more than 60GB of data).

Network access to this jail goes through the main Internet-facing network interface and an IPinIP tunnel (gif interface) that routes RFC1918 traffic to the jail.

Running top every minute while the jobs are running shows that physical memory is not affected, but available swap space drops quickly as soon as the jobs start.
The free swap space never falls to 0, so the kernel seems to fail to allocate swap space at the rate the two rsync jobs request it.
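
For reference, the per-minute sampling described above could be scripted roughly as follows. This is only a sketch: the function name swap_free and the log path are made up for illustration, and it assumes the "Swap:" summary line has the layout that top prints on this system ("Swap: 513M Total, 185M Used, 328M Free, 36% Inuse").

```sh
#!/bin/sh
# Print the "Free" figure from a top(1)-style "Swap:" summary line read on stdin.
# Assumed layout: "Swap: 513M Total, 185M Used, 328M Free, 36% Inuse".
# swap_free is a hypothetical helper name, not an existing tool.
swap_free() {
    awk '/^Swap:/ {
        # Walk the fields and print the value that precedes the "Free" label.
        for (i = 2; i <= NF; i++)
            if ($i ~ /^Free,?$/) { print $(i - 1); exit }
    }'
}

# Hypothetical use from cron, once a minute (log path is an assumption):
#   top -b -d 1 | swap_free >> /var/log/swap-free.log
```

Logging only the free figure keeps the log small; a timestamp can be added in front of each value if the samples need to be matched against the rsync start times.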

This jail is never updated, while the host is updated regularly.
Currently the host is running FreeBSD 10.1-RELEASE-p25 FreeBSD 10.1-RELEASE-p25 #7 r291865: Sat Dec 5 21:30:52 CET 2015 amd64

Rsync version in the jail: rsync version 3.1.1 protocol version 31

Any idea what could cause this swap issue?

Thanks!
a.
 
Sometimes tmpfs(5) is used for /tmp. This might eat up available memory.

Also see if you can reschedule the jobs so they don't run at the same time. Most of the time it's quicker to sequentially back them up instead of all at once. The idea being that one job sucks up all available I/O, leaving nothing for the other jobs. So they both end up fighting for I/O making everything slow.
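
As a rough sketch of what running the backups one after the other could look like on the machine that starts them: the function name run_sequentially, the host name and the paths below are placeholders, not anything taken from your setup.

```sh
#!/bin/sh
# Run each command string in order and stop at the first one that fails,
# so that two jobs never compete for I/O at the same time.
# run_sequentially is a hypothetical helper name.
run_sequentially() {
    for job in "$@"; do
        sh -c "$job" || return 1
    done
}

# Hypothetical usage (host and paths are placeholders):
#   run_sequentially \
#       "rsync -avz /data/one/ backup@jail.example:/backup/one/" \
#       "rsync -avz /data/two/ backup@jail.example:/backup/two/"
```

Calling this from a single cron entry replaces the two entries that currently start at the same time.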
 
Hi SirDice,

The /tmp within the jail is not mounted on tmpfs(5):
Code:
# df -h
Filesystem  Size  Used  Avail Capacity  Mounted on
/dev/ada0s1a  1.8T  391G  1.2T  24%  /
I can reschedule the jobs, but I'm a bit surprised that the system would lack swap space or I/O: it has 16GB of RAM and is hardly using any of it, even while the rsync jobs are running (see attachment). Most of the RAM (10GB) is marked as inactive all the time on this system.
 

Attachment: Snap 2016-01-04 at 15.53.16.png
 
The /tmp within the jail is not mounted on tmpfs(5):
Code:
# df -h
Filesystem  Size  Used  Avail Capacity  Mounted on
/dev/ada0s1a  1.8T  391G  1.2T  24%  /
Look on the host. Depending on the state of enforce_statfs, the jail may not show any other filesystem besides root.
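
Something along these lines should do for the check on the host. It is a sketch only: tmpfs_mounts is a made-up helper name, and it assumes fstab(5)-style input such as the output of mount -p.

```sh
#!/bin/sh
# Print the mount point of every tmpfs entry found in fstab(5)-style lines
# read on stdin (columns: device, mount point, fstype, options, dump, pass).
# tmpfs_mounts is a hypothetical helper name.
tmpfs_mounts() {
    awk '$1 !~ /^#/ && $3 == "tmpfs" { print $2 }'
}

# Hypothetical usage on the host:
#   mount -p | tmpfs_mounts
# The current enforce_statfs setting can be read with:
#   sysctl security.jail.enforce_statfs
```

No output from the first command means no tmpfs is mounted anywhere, inside or outside the jail.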
 
Nope, it does not appear on the host either:
Code:
% df -h
Filesystem  Size  Used  Avail Capacity  Mounted on
/dev/ada0s1a  1.8T  391G  1.2T  24%  /
devfs  1.0K  1.0K  0B  100%  /dev
procfs  4.0K  4.0K  0B  100%  /proc
fdescfs  1.0K  1.0K  0B  100%  /dev/fd
linprocfs  4.0K  4.0K  0B  100%  /usr/compat/linux/proc
devfs  1.0K  1.0K  0B  100%  /usr/home/jails/*hidden*/dev

% cat /etc/fstab
# Device  Mountpoint  FStype  Options  Dump  Pass#
/dev/ada0s1a  /  ufs  rw  1  1
/dev/ada0s1b  swap  swap  sw  0  0
proc  /proc  procfs  rw  0  0
#For Linux compatibility
linproc  /compat/linux/proc  linprocfs rw,late  0  0
#For BASH
fdesc  /dev/fd  fdescfs rw  0  0
 
Note that after rebooting the machine or running swapoff/swapon, the problem goes away for a few days.
It then reappears and occurs every day until the next reboot or swapoff/swapon.

For example, I rebooted 1 day ago and did not encounter the problem last night:

Code:
# uptime
10:43AM  up 1 day, 14:54, 1 user, load averages: 0.54, 0.40, 0.35

Code:
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 320M Active, 13G Inact, 1801M Wired, 85M Cache, 1592M Buf, 349M Free
Swap: 513M Total, 185M Used, 328M Free, 36% Inuse
 