UFS System hangs when deleting a rsnapshot folder after 10.1 upgrade

I've run into what I think is a bug in FreeBSD 10.1 on hosts using UFS.

We recently upgraded our off-site storage server to FreeBSD 10.1 (from 10.0) and found that the system was unresponsive each morning. Assuming hardware failure, a new host was set up, again running FreeBSD 10.1. The system worked for a few days but then started to become unresponsive in the mornings as well, requiring a restart each day.

I found that the system was hanging during the rsnapshot task, specifically when it was trying to remove the oldest snapshot folder. I spun up a local VM and was able to reproduce the issue.

I performed the rsnapshot task on the VM host while it was still running FreeBSD 10.0:
Code:
root@colleen:/usr/home/admin] pkg version -v | grep rs
rsnapshot-1.3.1_2                  =   up-to-date with index
rsync-3.1.1_3                      =   up-to-date with index
[root@colleen:/usr/home/admin] uname -a
FreeBSD colleen.sandbox.local 10.0-RELEASE-p17 FreeBSD 10.0-RELEASE-p17 #0: Tue Jan 27 13:45:18 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
[root@colleen:/usr/home/admin] du -d 1 -h -A /backup/remote/
150G    /backup/remote/daily.1
8.2G    /backup/remote/daily.0
91M    /backup/remote/daily.2
91M    /backup/remote/daily.3
158G    /backup/remote/
[root@colleen:/usr/home/admin] /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsnapshot.d/remote.conf sync ; /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsnapshot.d/remote.conf daily
mkdir -m 0755 -p /backup/remote/.sync/severA.company.com/
/usr/local/bin/rsync -a --delete --bwlimit=40000 --numeric-ids --relative \
    --delete-excluded --no-owner --no-group \
    --chmod=Du+rwx,Dgo-rwx,Fu+rw,Fgo-rw --rsync-path="/usr/local/bin/sudo \
    /usr/local/bin/rsync" \
    --exclude-from=/usr/local/etc/rsnapshot.d/excludes.txt \
    --rsh="/usr/bin/ssh -i /home/admin/.ssh/id_dsa" \
    --link-dest=/backup/remote/daily.0/severA.company.com/mailserver/ \
    rsync@192.168.23.178:/var/vmail \
    /backup/remote/.sync/severA.company.com/mailserver/
touch /backup/remote/.sync/
/bin/rm -rf /backup/remote/daily.3/
mv /backup/remote/daily.2/ /backup/remote/daily.3/
mv /backup/remote/daily.1/ /backup/remote/daily.2/
mv /backup/remote/daily.0/ /backup/remote/daily.1/
mv /backup/remote/.sync/ /backup/remote/daily.0/
[root@colleen:/usr/home/admin] du -d 1 -h -A /backup/remote/                 
141G    /backup/remote/daily.1
906M    /backup/remote/daily.0
90M    /backup/remote/daily.2
16G    /backup/remote/daily.3
159G    /backup/remote/

Everything worked as it should. So I upgraded the VM to FreeBSD 10.1 and performed the same steps:
Code:
[root@colleen:/usr/home/admin] uname -a
FreeBSD colleen.sandbox.local 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #0: Tue Fe          b 24 19:00:21 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/s          ys/GENERIC  amd64
[root@colleen:/usr/home/admin]  du -d 1 -h -A /backup/remote/
141G    /backup/remote/daily.1
906M    /backup/remote/daily.0
90M    /backup/remote/daily.2
16G    /backup/remote/daily.3
159G    /backup/remote/
[root@colleen:/usr/home/admin]  /usr/local/bin/rsnapshot -v -c /usr/local/etc/            napshot.d/remote.conf sync ; /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsn            shot.d/remote.conf daily
mkdir -m 0755 -p /backup/remote/.sync/severA.company.com/
/usr/local/bin/rsync -a --delete --bwlimit=40000 --numeric-ids --relative \
    --delete-excluded --no-owner --no-group \
    --chmod=Du+rwx,Dgo-rwx,Fu+rw,Fgo-rw --rsync-path="/usr/local/bin/sudo \
    /usr/local/bin/rsync" \
    --exclude-from=/usr/local/etc/rsnapshot.d/excludes.txt \
    --rsh="/usr/bin/ssh -i /home/admin/.ssh/id_dsa" \
    --link-dest=/backup/remote/daily.0/severA.company.com/mailserver/ \
    rsync@192.168.23.178:/var/vmail \
    /backup/remote/.sync/severA.company.com/mailserver/
touch /backup/remote/.sync/
/bin/rm -rf /backup/remote/daily.3/

Here's a copy of the output from ps 2 hours after rsnapshot attempted to remove the folder and just before I lost access to the server:
Code:
[root@colleen:/usr/home/admin] ps aux
USER       PID %CPU %MEM   VSZ   RSS TT  STAT STARTED       TIME COMMAND
root         9 89.0  0.0     0    16  -  RL   10:18PM   90:58.90 [syncer]
root        11 82.9  0.0     0    32  -  RL   10:18PM 1123:52.13 [idle]
root         2 25.0  0.0     0    32  -  DL   10:18PM   19:43.31 [cam]
root         7  2.0  0.0     0    32  -  DL   10:18PM    3:55.53 [bufdaemon]
root        12  1.0  0.0     0   208  -  WL   10:18PM    4:16.58 [intr]
root         0  0.0  0.0     0   160  -  DLs  10:18PM    0:05.09 [kernel]
root         1  0.0  0.0  9428   868  -  ILs  10:18PM    0:00.02 /sbin/init --
root         3  0.0  0.0     0    16  -  DL   10:18PM    0:00.00 [sctp_iterator]
root         4  0.0  0.0     0    16  -  DL   10:18PM    0:00.92 [pagedaemon]
root         5  0.0  0.0     0    16  -  DL   10:18PM    0:00.00 [vmdaemon]
root         6  0.0  0.0     0    16  -  DL   10:18PM    0:00.00 [pagezero]
root         8  0.0  0.0     0    16  -  DL   10:18PM    0:05.41 [vnlru]
root        10  0.0  0.0     0    16  -  DL   10:18PM    0:00.00 [audit]
root        13  0.0  0.0     0    48  -  DL   10:18PM    0:00.16 [geom]
root        14  0.0  0.0     0    16  -  DL   10:18PM    0:09.50 [rand_harvestq]
root        15  0.0  0.0     0    64  -  DL   10:18PM    0:01.41 [usb]
root       129  0.0  0.0 12264  1752  -  Is   10:18PM    0:00.00 adjkerntz -i
root       489  0.0  0.1 14556  2152  -  Is   10:18PM    0:00.00 dhclient: em0 [priv] (dhclient)
_dhcp      525  0.0  0.1 14556  2264  -  Is   10:18PM    0:00.00 dhclient: em0 (dhclient)
root       538  0.0  0.1 13584  4484  -  Is   10:18PM    0:00.03 /sbin/devd
root       649  0.0  0.0 14424  2056  -  Ss   10:18PM    0:00.11 /usr/sbin/syslogd -s
root       795  0.0  0.4 27968 17848  -  Ds   10:18PM    0:10.47 /usr/local/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntp
root       825  0.0  0.2 60816  6772  -  Is   10:18PM    0:00.00 /usr/sbin/sshd
root       840  0.0  0.1 23984  5836  -  Ss   10:18PM    0:00.75 sendmail: accepting connections (sendmail)
root       844  0.0  0.1 16520  2244  -  Ss   10:18PM    0:00.21 /usr/sbin/cron -s
root     86527  0.0  0.2 86476  8544  -  Is    7:51AM    0:00.03 sshd: admin [priv] (sshd)
admin    86530  0.0  0.2 86476  8532  -  S     7:51AM    0:00.04 sshd: admin@pts/0 (sshd)
root     86542  0.0  0.2 86476  8556  -  Is    7:52AM    0:00.03 sshd: admin [priv] (sshd)
admin    86545  0.0  0.2 86476  8628  -  R     7:52AM    0:00.10 sshd: admin@pts/1 (sshd)
root     86598  0.0  0.1 16520  2288  -  I     8:10AM    0:00.00 cron: running job (cron)
operator 86599  0.0  0.1 17064  2632  -  Is    8:10AM    0:00.00 /bin/sh /usr/libexec/save-entropy
operator 86601  0.0  0.0  8236  1904  -  D     8:10AM    0:07.93 unlink saved-entropy.8
root     86612  0.0  0.1 16520  2288  -  I     8:22AM    0:00.00 cron: running job (cron)
operator 86613  0.0  0.1 17064  2628  -  Ds    8:22AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86622  0.0  0.1 16520  2288  -  I     8:33AM    0:00.00 cron: running job (cron)
operator 86623  0.0  0.1 17064  2628  -  Ds    8:33AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86631  0.0  0.1 16520  2288  -  I     8:44AM    0:00.00 cron: running job (cron)
operator 86632  0.0  0.1 17064  2628  -  Ds    8:44AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86641  0.0  0.1 16520  2288  -  I     8:55AM    0:00.00 cron: running job (cron)
operator 86643  0.0  0.1 17064  2628  -  Ds    8:55AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86648  0.0  0.1 16520  2288  -  I     9:00AM    0:00.00 cron: running job (cron)
operator 86651  0.0  0.1 17064  2628  -  Ds    9:00AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86659  0.0  0.1 16520  2288  -  I     9:11AM    0:00.00 cron: running job (cron)
operator 86660  0.0  0.1 17064  2628  -  Ds    9:11AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86667  0.0  0.1 16520  2288  -  I     9:22AM    0:00.00 cron: running job (cron)
operator 86668  0.0  0.1 17064  2628  -  Ds    9:22AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86674  0.0  0.1 16520  2288  -  I     9:33AM    0:00.00 cron: running job (cron)
operator 86675  0.0  0.1 17064  2628  -  Ds    9:33AM    0:00.04 /bin/sh /usr/libexec/save-entropy
root     86681  0.0  0.1 16520  2288  -  I     9:44AM    0:00.00 cron: running job (cron)
operator 86682  0.0  0.1 17064  2628  -  Ds    9:44AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86690  0.0  0.1 16520  2288  -  I     9:55AM    0:00.00 cron: running job (cron)
operator 86692  0.0  0.1 17064  2628  -  Ds    9:55AM    0:00.02 /bin/sh /usr/libexec/save-entropy
root     86696  0.0  0.1 16520  2288  -  I    10:00AM    0:00.00 cron: running job (cron)
operator 86699  0.0  0.1 17064  2628  -  Ds   10:00AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root     86707  0.0  0.1 16520  2288  -  I    10:11AM    0:00.00 cron: running job (cron)
operator 86708  0.0  0.1 17064  2628  -  Ds   10:11AM    0:00.00 /bin/sh /usr/libexec/save-entropy
root       885  0.0  0.1 47656  2800 v0  Is   10:18PM    0:00.02 login [pam] (login)
root       896  0.0  0.1 23492  3504 v0  I+   10:20PM    0:00.04 -csh (csh)
root       886  0.0  0.0 14420  1988 v1  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv1
root       887  0.0  0.0 14420  1988 v2  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv2
root       888  0.0  0.0 14420  1988 v3  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv3
root       889  0.0  0.0 14420  1988 v4  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv4
root       890  0.0  0.0 14420  1988 v5  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv5
root       891  0.0  0.0 14420  1988 v6  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv6
root       892  0.0  0.0 14420  1988 v7  Is+  10:18PM    0:00.00 /usr/libexec/getty Pc ttyv7
admin    86531  0.0  0.1 23572  3560  0  Is    7:51AM    0:00.02 -tcsh (tcsh)
root     86533  0.0  0.1 50316  3476  0  I     7:51AM    0:00.02 sudo su
root     86534  0.0  0.1 47704  2820  0  I     7:51AM    0:00.01 su
root     86535  0.0  0.1 23572  3560  0  I     7:51AM    0:00.03 _su (csh)
root     86575  0.0  0.3 39212 10820  0  I+    7:59AM    0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot -v -c
root     86576  0.0  0.2 16428  9172  0  D+    7:59AM    0:49.12 /bin/rm -rf /backup/remote/daily.3/
admin    86546  0.0  0.1 23572  3560  1  Is    7:52AM    0:00.01 -tcsh (tcsh)
root     86548  0.0  0.1 50316  3476  1  I     7:52AM    0:00.01 sudo su
root     86549  0.0  0.1 47704  2820  1  I     7:52AM    0:00.00 su
root     86550  0.0  0.1 23572  3732  1  S     7:52AM    0:00.03 _su (csh)
root     86710  0.0  0.1 18736  2332  1  R+   10:13AM    0:00.00 ps aux
[root@colleen:/usr/home/admin] ps aux
^C

I ran into the same issues when running the same tests on a fresh 10.1 install.
Running
rm -rf /backup/remote/daily.3/
manually results in the same behaviour (on 10.1).

This issue only seems to effect our systems running UFS.

Does anyone have any ideas of why this is happening or should I be asking on the mailing list?
 
Last edited:
Back
Top