I've run into what I think is a bug in FreeBSD 10.1 on hosts using UFS.
We recently upgraded our off-site storage server to FreeBSD 10.1 (from 10.0) and found that the system was unresponsive each morning. Assuming hardware failure, a new host was set up, again running FreeBSD 10.1. The system worked for a few days but then started to become unresponsive in the mornings as well, requiring a restart each day.
I found that the system was hanging during the rsnapshot task, specifically when it was trying to remove the oldest snapshot folder. I spun up a local VM and was able to reproduce the issue.
I performed the rsnapshot task on the VM host while it was still running FreeBSD 10.0:
Everything worked as it should. So I upgraded the VM to FreeBSD 10.1 and performed the same steps:
Here's a copy of the output from ps 2 hours after rsnapshot attempted to remove the folder and just before I lost access to the server:
I ran into the same issues when running the same tests on a fresh 10.1 install.
Running
manually results in the same behaviour (on 10.1).
This issue only seems to effect our systems running UFS.
Does anyone have any ideas of why this is happening or should I be asking on the mailing list?
We recently upgraded our off-site storage server to FreeBSD 10.1 (from 10.0) and found that the system was unresponsive each morning. Assuming hardware failure, a new host was set up, again running FreeBSD 10.1. The system worked for a few days but then started to become unresponsive in the mornings as well, requiring a restart each day.
I found that the system was hanging during the rsnapshot task, specifically when it was trying to remove the oldest snapshot folder. I spun up a local VM and was able to reproduce the issue.
I performed the rsnapshot task on the VM host while it was still running FreeBSD 10.0:
Code:
root@colleen:/usr/home/admin] pkg version -v | grep rs
rsnapshot-1.3.1_2 = up-to-date with index
rsync-3.1.1_3 = up-to-date with index
[root@colleen:/usr/home/admin] uname -a
FreeBSD colleen.sandbox.local 10.0-RELEASE-p17 FreeBSD 10.0-RELEASE-p17 #0: Tue Jan 27 13:45:18 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
[root@colleen:/usr/home/admin] du -d 1 -h -A /backup/remote/
150G /backup/remote/daily.1
8.2G /backup/remote/daily.0
91M /backup/remote/daily.2
91M /backup/remote/daily.3
158G /backup/remote/
[root@colleen:/usr/home/admin] /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsnapshot.d/remote.conf sync ; /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsnapshot.d/remote.conf daily
mkdir -m 0755 -p /backup/remote/.sync/severA.company.com/
/usr/local/bin/rsync -a --delete --bwlimit=40000 --numeric-ids --relative \
--delete-excluded --no-owner --no-group \
--chmod=Du+rwx,Dgo-rwx,Fu+rw,Fgo-rw --rsync-path="/usr/local/bin/sudo \
/usr/local/bin/rsync" \
--exclude-from=/usr/local/etc/rsnapshot.d/excludes.txt \
--rsh="/usr/bin/ssh -i /home/admin/.ssh/id_dsa" \
--link-dest=/backup/remote/daily.0/severA.company.com/mailserver/ \
rsync@192.168.23.178:/var/vmail \
/backup/remote/.sync/severA.company.com/mailserver/
touch /backup/remote/.sync/
/bin/rm -rf /backup/remote/daily.3/
mv /backup/remote/daily.2/ /backup/remote/daily.3/
mv /backup/remote/daily.1/ /backup/remote/daily.2/
mv /backup/remote/daily.0/ /backup/remote/daily.1/
mv /backup/remote/.sync/ /backup/remote/daily.0/
[root@colleen:/usr/home/admin] du -d 1 -h -A /backup/remote/
141G /backup/remote/daily.1
906M /backup/remote/daily.0
90M /backup/remote/daily.2
16G /backup/remote/daily.3
159G /backup/remote/
Everything worked as it should. So I upgraded the VM to FreeBSD 10.1 and performed the same steps:
Code:
[root@colleen:/usr/home/admin] uname -a
FreeBSD colleen.sandbox.local 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #0: Tue Fe b 24 19:00:21 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/s ys/GENERIC amd64
[root@colleen:/usr/home/admin] du -d 1 -h -A /backup/remote/
141G /backup/remote/daily.1
906M /backup/remote/daily.0
90M /backup/remote/daily.2
16G /backup/remote/daily.3
159G /backup/remote/
[root@colleen:/usr/home/admin] /usr/local/bin/rsnapshot -v -c /usr/local/etc/ napshot.d/remote.conf sync ; /usr/local/bin/rsnapshot -v -c /usr/local/etc/rsn shot.d/remote.conf daily
mkdir -m 0755 -p /backup/remote/.sync/severA.company.com/
/usr/local/bin/rsync -a --delete --bwlimit=40000 --numeric-ids --relative \
--delete-excluded --no-owner --no-group \
--chmod=Du+rwx,Dgo-rwx,Fu+rw,Fgo-rw --rsync-path="/usr/local/bin/sudo \
/usr/local/bin/rsync" \
--exclude-from=/usr/local/etc/rsnapshot.d/excludes.txt \
--rsh="/usr/bin/ssh -i /home/admin/.ssh/id_dsa" \
--link-dest=/backup/remote/daily.0/severA.company.com/mailserver/ \
rsync@192.168.23.178:/var/vmail \
/backup/remote/.sync/severA.company.com/mailserver/
touch /backup/remote/.sync/
/bin/rm -rf /backup/remote/daily.3/
Here's a copy of the output from ps 2 hours after rsnapshot attempted to remove the folder and just before I lost access to the server:
Code:
[root@colleen:/usr/home/admin] ps aux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 9 89.0 0.0 0 16 - RL 10:18PM 90:58.90 [syncer]
root 11 82.9 0.0 0 32 - RL 10:18PM 1123:52.13 [idle]
root 2 25.0 0.0 0 32 - DL 10:18PM 19:43.31 [cam]
root 7 2.0 0.0 0 32 - DL 10:18PM 3:55.53 [bufdaemon]
root 12 1.0 0.0 0 208 - WL 10:18PM 4:16.58 [intr]
root 0 0.0 0.0 0 160 - DLs 10:18PM 0:05.09 [kernel]
root 1 0.0 0.0 9428 868 - ILs 10:18PM 0:00.02 /sbin/init --
root 3 0.0 0.0 0 16 - DL 10:18PM 0:00.00 [sctp_iterator]
root 4 0.0 0.0 0 16 - DL 10:18PM 0:00.92 [pagedaemon]
root 5 0.0 0.0 0 16 - DL 10:18PM 0:00.00 [vmdaemon]
root 6 0.0 0.0 0 16 - DL 10:18PM 0:00.00 [pagezero]
root 8 0.0 0.0 0 16 - DL 10:18PM 0:05.41 [vnlru]
root 10 0.0 0.0 0 16 - DL 10:18PM 0:00.00 [audit]
root 13 0.0 0.0 0 48 - DL 10:18PM 0:00.16 [geom]
root 14 0.0 0.0 0 16 - DL 10:18PM 0:09.50 [rand_harvestq]
root 15 0.0 0.0 0 64 - DL 10:18PM 0:01.41 [usb]
root 129 0.0 0.0 12264 1752 - Is 10:18PM 0:00.00 adjkerntz -i
root 489 0.0 0.1 14556 2152 - Is 10:18PM 0:00.00 dhclient: em0 [priv] (dhclient)
_dhcp 525 0.0 0.1 14556 2264 - Is 10:18PM 0:00.00 dhclient: em0 (dhclient)
root 538 0.0 0.1 13584 4484 - Is 10:18PM 0:00.03 /sbin/devd
root 649 0.0 0.0 14424 2056 - Ss 10:18PM 0:00.11 /usr/sbin/syslogd -s
root 795 0.0 0.4 27968 17848 - Ds 10:18PM 0:10.47 /usr/local/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntp
root 825 0.0 0.2 60816 6772 - Is 10:18PM 0:00.00 /usr/sbin/sshd
root 840 0.0 0.1 23984 5836 - Ss 10:18PM 0:00.75 sendmail: accepting connections (sendmail)
root 844 0.0 0.1 16520 2244 - Ss 10:18PM 0:00.21 /usr/sbin/cron -s
root 86527 0.0 0.2 86476 8544 - Is 7:51AM 0:00.03 sshd: admin [priv] (sshd)
admin 86530 0.0 0.2 86476 8532 - S 7:51AM 0:00.04 sshd: admin@pts/0 (sshd)
root 86542 0.0 0.2 86476 8556 - Is 7:52AM 0:00.03 sshd: admin [priv] (sshd)
admin 86545 0.0 0.2 86476 8628 - R 7:52AM 0:00.10 sshd: admin@pts/1 (sshd)
root 86598 0.0 0.1 16520 2288 - I 8:10AM 0:00.00 cron: running job (cron)
operator 86599 0.0 0.1 17064 2632 - Is 8:10AM 0:00.00 /bin/sh /usr/libexec/save-entropy
operator 86601 0.0 0.0 8236 1904 - D 8:10AM 0:07.93 unlink saved-entropy.8
root 86612 0.0 0.1 16520 2288 - I 8:22AM 0:00.00 cron: running job (cron)
operator 86613 0.0 0.1 17064 2628 - Ds 8:22AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86622 0.0 0.1 16520 2288 - I 8:33AM 0:00.00 cron: running job (cron)
operator 86623 0.0 0.1 17064 2628 - Ds 8:33AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86631 0.0 0.1 16520 2288 - I 8:44AM 0:00.00 cron: running job (cron)
operator 86632 0.0 0.1 17064 2628 - Ds 8:44AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86641 0.0 0.1 16520 2288 - I 8:55AM 0:00.00 cron: running job (cron)
operator 86643 0.0 0.1 17064 2628 - Ds 8:55AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86648 0.0 0.1 16520 2288 - I 9:00AM 0:00.00 cron: running job (cron)
operator 86651 0.0 0.1 17064 2628 - Ds 9:00AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86659 0.0 0.1 16520 2288 - I 9:11AM 0:00.00 cron: running job (cron)
operator 86660 0.0 0.1 17064 2628 - Ds 9:11AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86667 0.0 0.1 16520 2288 - I 9:22AM 0:00.00 cron: running job (cron)
operator 86668 0.0 0.1 17064 2628 - Ds 9:22AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86674 0.0 0.1 16520 2288 - I 9:33AM 0:00.00 cron: running job (cron)
operator 86675 0.0 0.1 17064 2628 - Ds 9:33AM 0:00.04 /bin/sh /usr/libexec/save-entropy
root 86681 0.0 0.1 16520 2288 - I 9:44AM 0:00.00 cron: running job (cron)
operator 86682 0.0 0.1 17064 2628 - Ds 9:44AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86690 0.0 0.1 16520 2288 - I 9:55AM 0:00.00 cron: running job (cron)
operator 86692 0.0 0.1 17064 2628 - Ds 9:55AM 0:00.02 /bin/sh /usr/libexec/save-entropy
root 86696 0.0 0.1 16520 2288 - I 10:00AM 0:00.00 cron: running job (cron)
operator 86699 0.0 0.1 17064 2628 - Ds 10:00AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 86707 0.0 0.1 16520 2288 - I 10:11AM 0:00.00 cron: running job (cron)
operator 86708 0.0 0.1 17064 2628 - Ds 10:11AM 0:00.00 /bin/sh /usr/libexec/save-entropy
root 885 0.0 0.1 47656 2800 v0 Is 10:18PM 0:00.02 login [pam] (login)
root 896 0.0 0.1 23492 3504 v0 I+ 10:20PM 0:00.04 -csh (csh)
root 886 0.0 0.0 14420 1988 v1 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv1
root 887 0.0 0.0 14420 1988 v2 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv2
root 888 0.0 0.0 14420 1988 v3 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv3
root 889 0.0 0.0 14420 1988 v4 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv4
root 890 0.0 0.0 14420 1988 v5 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv5
root 891 0.0 0.0 14420 1988 v6 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv6
root 892 0.0 0.0 14420 1988 v7 Is+ 10:18PM 0:00.00 /usr/libexec/getty Pc ttyv7
admin 86531 0.0 0.1 23572 3560 0 Is 7:51AM 0:00.02 -tcsh (tcsh)
root 86533 0.0 0.1 50316 3476 0 I 7:51AM 0:00.02 sudo su
root 86534 0.0 0.1 47704 2820 0 I 7:51AM 0:00.01 su
root 86535 0.0 0.1 23572 3560 0 I 7:51AM 0:00.03 _su (csh)
root 86575 0.0 0.3 39212 10820 0 I+ 7:59AM 0:00.06 /usr/local/bin/perl -w /usr/local/bin/rsnapshot -v -c
root 86576 0.0 0.2 16428 9172 0 D+ 7:59AM 0:49.12 /bin/rm -rf /backup/remote/daily.3/
admin 86546 0.0 0.1 23572 3560 1 Is 7:52AM 0:00.01 -tcsh (tcsh)
root 86548 0.0 0.1 50316 3476 1 I 7:52AM 0:00.01 sudo su
root 86549 0.0 0.1 47704 2820 1 I 7:52AM 0:00.00 su
root 86550 0.0 0.1 23572 3732 1 S 7:52AM 0:00.03 _su (csh)
root 86710 0.0 0.1 18736 2332 1 R+ 10:13AM 0:00.00 ps aux
[root@colleen:/usr/home/admin] ps aux
^C
I ran into the same issues when running the same tests on a fresh 10.1 install.
Running
rm -rf /backup/remote/daily.3/
manually results in the same behaviour (on 10.1).
This issue only seems to effect our systems running UFS.
Does anyone have any ideas of why this is happening or should I be asking on the mailing list?
Last edited: