FreeBSD 8.2 gets slower over time.

After upgrading to FreeBSD 8.2 I'm seeing strange behaviour: after a few days of uptime the system starts throttling and eventually stops responding.

While it's throttling I can still use the system, though it takes about 10 seconds before what I type shows up on screen. I can't find any process causing heavy load, nor are there any memory leaks. I can't find anything in the logs either.

The really strange thing is that when I connect a screen directly to the server, I see the flying Chuck screensaver, which runs smoothly. The system responds fast, right up until the moment I try to log in. After I've typed in the username and hit enter, the system stops responding.

Has anyone had a similar problem before? What can I do?
 
It's a hardware issue, probably. Keep in mind that logging in causes disk I/O. Have you checked the SMART attributes of your HDD?
 
This sounds a little bit like a deadlock. The strange thing is that the system becomes throttled slowly. Maybe you should build a kernel with debugging options to figure it out.
 
This sounds like ... almost no data. Please post dmesg, logs (if possible), config, etc. Otherwise it's just wild guessing.
 
How did you upgrade your base system? Using buildworld or freebsd-update? Are you sure all system scripts and configuration files in /etc are up to date?
 
Hello, olav!
I've had a similar problem on my desktop with a Radeon (x1950; it's an r500, if I remember correctly) in KDE4 (both with and without compositing), but not in KDE3 or non-DE window managers.
 
If you don't have a monitor connected to the system, disable the console screen savers. All they are doing is wasting CPU/RAM/video resources. There's no point if you can't see them. Just use blank_saver.ko if you really need one, or simply configure the BIOS to turn off the video output after 15 minutes or whatever.

To help diagnose this, you should connect a monitor to the system, disable all screen savers and power saving, then log in on separate virtual consoles and leave running:
  1. nothing, this is to catch console messages
  2. top(1)
  3. gstat(8)
  4. net-mgmt/iftop
  5. tail(1) -f of logs like /var/log/messages
  6. misc/gnu-watch running every 10-15 seconds outputting vmstat -i
  7. anything else that may be helpful
That way, when things slow down, you can just flip through the virtual consoles (ALT+F1 through ALT+F7) to get a snapshot of how the system is running, without having to log in.
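
If misc/gnu-watch is not installed, the periodic vmstat -i snapshot from item 6 can be approximated with a plain sh loop. This is only a rough sketch: the function name snapshot_loop, the log path, and the interval and count values are made up for illustration. It appends timestamped output to a file instead of redrawing the screen, so the history is still there after a reboot.

```sh
#!/bin/sh
# snapshot_loop LOGFILE INTERVAL COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times, INTERVAL seconds apart,
# and appends each run to LOGFILE under a timestamp header.
# Note: plain sh has no local variables, so these names are global.
snapshot_loop() {
    log=$1
    interval=$2
    count=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
            "$@"
        } >> "$log" 2>&1
        i=$((i + 1))
        # Sleep between runs, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (path and numbers are assumptions): one snapshot every 15 seconds
# for an hour.
# snapshot_loop /var/tmp/vmstat-i.log 15 240 vmstat -i
```

The same helper could wrap any of the other commands in the list, as long as they can produce one-shot output.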
 
I used freebsd-update. I don't think there is any special configuration in /etc causing this.
It is a pure server, with no X server.

Hey, I like the flying Chuck screen saver. Every time I see him, I feel proud as a FreeBSD user :)
My server is mostly idling and that screen saver doesn't steal that many CPU cycles :)

I've configured different virtual consoles as you suggested and will come back with more info when it happens again.
 
Maybe not relevant, but if that server motherboard has onboard graphics, there is a *slight* chance the situation will improve if you put in an aftermarket video card.
 
Okay, it happened again just now. gstat showed me that the two mirrored OS disks were at 100% load. I rebooted and now it's fine again. What could be causing this? gmirror status said that the mirror was okay.
 
I don't believe so, as I use two different controllers and the SMART tests don't report anything.
 
As @phenix mentioned: what did gstat report when you hit 100% disk utilization (which FS was busy)? What did the top output say during that time? Did you verify the time when this started (maybe cron or periodic related)?

You can use:
$ ps ax | grep '[f]sck'
to verify whether fsck is running (the brackets keep grep from matching its own process).
 
It's the swap partition which is causing this problem. Should I try to disable it?
 
I would not do that if I were you. Rather check what is actually using your swap. Sort the top output by size:

# top -o size
and check what is eating so much memory.

You can use # ps auxwww | awk '$8 ~ /.W.*/ { print $0}' to check for swapped-out processes (I once found this command on the FreeBSD mailing lists).
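
The awk filter works on the eighth column of ps auxwww, the STAT column, where a W flag marks a swapped-out process. A slightly more readable variant might look like the sketch below. It assumes the usual FreeBSD column order (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND), and the function name swapped_procs is invented for this example.

```sh
#!/bin/sh
# swapped_procs: read `ps auxwww` output on stdin and print
# "PID STAT COMMAND" for every row whose STAT column contains W.
# Assumes STAT is field 8 and the command starts at field 11.
swapped_procs() {
    awk '$8 ~ /W/ {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i
        printf "%s %s %s\n", $2, $8, cmd
    }'
}

# Example:
# ps auxwww | swapped_procs
```

The header row passes through the same filter harmlessly, since the word STAT contains no W.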
 
Is the swap partition being used heavily? If so, you have something (or many things) using more memory than you have in the machine, and you will definitely notice a slowdown.

What does this machine do all day?
 
The thing is, top shows no activity. There are no visible processes causing the swap partition to overload. The server is mostly idle; it runs a few jails: DNS, LDAP, SSH. Only the DNS and SSH jails are exposed to the internet. It also acts as a file server with ZFS. The server has 6 GB of RAM, and I've set vm.kmem_size="9G" in /boot/loader.conf.

I get this output when I check swapped processes
Code:
[olav@zpool ~]$ ps auxwww | awk '$8 ~ /.W.*/ { print $0}'
root    124  0.0  0.0  2804     0  ??  IWs  -         0:00.00 adjkerntz -i
root   1044  0.0  0.0 16652     0  ??  IW   -         0:00.00 /usr/local/sbin/smartd -p /var/run/smartd.pid -c /usr/local/etc/smartd.conf
root   1591  0.0  0.0 38228     0  ??  IWs  -         0:00.00 sshd: olav [priv] (sshd)
smmsp  2602  0.0  0.0 12192     0  ??  IWs  -         0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
root   2609  0.0  0.0  8012     0  ??  SWs  -         0:00.00 /usr/sbin/cron -s

top -o size shows this:
Code:
last pid: 18456;  load averages:  0.05,  0.01,  0.00  up 0+14:25:52  11:11:44
36 processes:  1 running, 35 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 2228K Active, 56K Inact, 1164M Wired, 8640K Cache, 623M Buf, 144M Free
Swap: 4096M Total, 15M Used, 4081M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1600 olav          1  44    0 38228K   548K select  1   0:00  0.00% sshd
 1591 root          1  44    0 38228K     0K sbwait  0   0:00  0.00% <sshd>
17370 root          1  44    0 34332K   704K select  0   0:01  0.00% smbd
 1068 root          1  44    0 34112K   160K select  0   0:00  0.00% smbd
 1072 root          1  44    0 34112K   116K select  0   0:00  0.00% smbd
 2756 root          1  44    0 26336K   132K select  1   0:00  0.00% winbindd
 1114 root          1  44    0 26308K   120K select  1   0:00  0.00% winbindd
18441 root          1  59    0 26260K  1004K select  0   0:00  0.00% sshd
 1073 root          1  44    0 26208K   176K select  0   0:00  0.00% winbindd
 2757 root          1  44    0 26196K   120K select  0   0:00  0.00% winbindd
 1062 root          1  44    0 24108K   608K select  0   0:02  0.00% nmbd
 1044 root          1  44    0 16652K     0K nanslp  0   0:00  0.00% <smartd>
 1601 olav          1  47    0 13356K     0K wait    0   0:00  0.00% <bash>
 2596 root          1  44    0 12192K   540K select  0   0:01  0.00% sendmail
 2602 smmsp         1  44    0 12192K     0K pause   0   0:00  0.00% <sendmail>
18454 olav          1  44    0  9408K   968K CPU0    0   0:00  0.00% top
  888 root          1  44    0  8012K   112K select  1   0:00  0.00% rpcbind
 2609 root          1  53    0  8012K     0K nanslp  0   0:00  0.00% <cron>
  866 root          1  44    0  7084K   156K select  0   0:00  0.00% syslogd
 1195 root          1  76    0  7020K    56K select  1   0:00  0.00% rsync
 1003 root          1  44    0  6952K    72K select  0   0:00  0.00% mountd
 2681 root          1  76    0  6952K    72K ttyin   0   0:00  0.00% getty

This is the information available when the system starts throttling.
I should also mention that I've now noticed that the /usr partition also shows some activity when the system overuses the swap partition.

After a reboot, top shows something interesting:
Code:
last pid:  3277;  load averages:  0.05,  0.01,  0.00   up 0+00:39:05  12:21:15
80 processes:  1 running, 79 sleeping
CPU:  0.0% user,  0.0% nice,  0.4% system,  0.8% interrupt, 98.9% idle
Mem: 71M Active, 40M Inact, 1558M Wired, 428K Cache, 30M Buf, 4187M Free
Swap: 4096M Total, 4096M Free
 
Well, something is truly strange. You have 6 GB of RAM, but your first top output doesn't indicate more than about 2 GB...
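
One way to check this is to add up the figures on top's Mem: line. The sketch below does that with awk, using the two Mem: lines posted above. The function name mem_total_mb is made up, and it simply sums every figure on the line. Whether Buf overlaps with the other categories is not settled here, so treat the total as a rough upper bound.

```sh
#!/bin/sh
# mem_total_mb: read a top(1) "Mem:" line on stdin and print the sum of
# all its figures in whole megabytes (fractions are truncated).
# Assumes every figure carries a K, M, or G suffix.
mem_total_mb() {
    awk '/^Mem:/ {
        total = 0
        # Even-numbered fields hold the values: "2228K", "56K", "1164M", ...
        for (i = 2; i <= NF; i += 2) {
            unit = substr($i, length($i))
            n = substr($i, 1, length($i) - 1) + 0
            if (unit == "K") n /= 1024
            else if (unit == "G") n *= 1024
            total += n
        }
        printf "%d\n", total
    }'
}

# While throttling:
# echo 'Mem: 2228K Active, 56K Inact, 1164M Wired, 8640K Cache, 623M Buf, 144M Free' | mem_total_mb
# prints 1941
#
# After the reboot:
# echo 'Mem: 71M Active, 40M Inact, 1558M Wired, 428K Cache, 30M Buf, 4187M Free' | mem_total_mb
# prints 5886
```

So the first snapshot accounts for a little under 2 GB, while the one after the reboot accounts for nearly all of the 6 GB.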
 
Indeed, it seems you've "lost" some memory between reboots. I bet you have a bloody lot of swapping due to ZFS and very low memory.
Check if your system detects memory correctly each time:

# grep -i "real memory" /var/log/dmesg.*
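
To compare the detected memory across boots at a glance, the grep can be reduced to just the megabyte figure per file. This is a sketch that assumes the dmesg line has the form "real memory  = <bytes> (<N> MB)". The function name real_memory_mb is invented for the example.

```sh
#!/bin/sh
# real_memory_mb FILE...
# For each given dmesg file, print "<file> <MB>" for every line of the
# assumed form: real memory  = 6442450944 (6144 MB)
real_memory_mb() {
    for f in "$@"; do
        sed -n 's/^real memory *= *[0-9][0-9]* *(\([0-9][0-9]*\) MB).*/\1/p' "$f" |
        while read -r mb; do
            printf '%s %s\n' "$f" "$mb"
        done
    done
}

# Example:
# real_memory_mb /var/log/dmesg.*
```

If the figures differ between files, the machine did not detect the same amount of RAM on every boot.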


You can also use sysutils/dmidecode from ports to check how the system sees the memory banks and modules.

e.g. you can use:
# dmidecode --type=16,17
to list memory banks (Physical Memory Array) and their modules (Memory Device).

You should reseat the memory modules and run a Memtest86+ check to verify you have no (further) hardware problem.
 
aragon said:
Well, something is truly strange. You have 6 GB of RAM, but your first top output doesn't indicate more than about 2 GB...
Yes, finally data that was asked for so long ago.

Usually this type of symptom can be resolved by a BIOS update.
 
Well, if they haven't disabled sendfile it's a guarantee. And that patch doesn't resolve all ZFS sendfile issues, so it should still be disabled. There was a recent thread on stable@ for anyone interested. However, that would have nothing to do with the limited amount of RAM made available to the system, which is a separate problem, pretty common on re-purposed Dells but not limited to them.

That's why, when dmesg is requested and not given, it greatly extends the time to resolution.
 