[Solved] What is causing high loads on my server?

Hi,

I have a problem: users complain that the file server is very slow. So, naturally, I checked whether something was wrong. When I run top(1), I see that the load average is indeed very high, but I can't figure out why.
Code:
last pid: 12473;  load averages:  0.94,  0.94,  1.00                                                                                                                                                                  up 1+16:56:48  11:26:13
94 processes:  1 running, 93 sleeping
CPU:  0.9% user,  0.0% nice,  4.3% system,  0.8% interrupt, 94.0% idle
Mem: 212M Active, 157M Inact, 8654M Wired, 1755M Buf, 53G Free
Swap:

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
50667 root             1  24    0   292M 36332K zio->i  3  94:49  3.27% smbd
88348 root             1  20    0 48416K 17440K zio->i  2   1:53  0.39% afpd
74608 root             1  24    0   319M 38988K select  7 212:54  0.29% smbd
51674 root             1  20    0 64800K 34572K select  2   5:02  0.00% afpd
91756 root             1  20    0 40224K 10152K select  7   0:55  0.00% afpd
  218 root             1  20    0 40224K 10732K select  7   0:54  0.00% afpd
92716 root             1  20    0 44320K 11600K zio->i  2   0:52  0.00% afpd
88616 root             1  20    0 40224K  8416K zio->i  1   0:47  0.00% afpd
93286 root             1  20    0 40224K  9784K select  0   0:42  0.00% afpd
 1647 root             1  52    0 32032K  4284K select  4   0:35  0.00% afpd
92328 root             2  20    0   327M 45444K kqread  6   0:17  0.00% smbd
92296 root             2  20    0   331M 43180K kqread  2   0:16  0.00% smbd
The two main daemons running on this server are samba41-4.1.17 and netatalk-2.2.5_5,1. As far as I can see, they are not using much CPU.

How can I figure out what is causing these constant load averages around 1 (and higher), and why?
 
Looking at the CPU is one thing, and smbd is consuming a lot. I'd look at the network too.
And of course at the log files. Try making them more verbose if necessary.
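If it helps, a few stock FreeBSD tools for watching network and disk load from the command line (a sketch; the interface name em0 is a placeholder, substitute your own):
Code:
systat -ifstat 1        # per-interface throughput, refreshed every second (q to quit)
netstat -w 1 -I em0     # packet/byte counters for one interface; em0 is a placeholder
gstat                   # per-disk I/O load; high %busy on the pool disks points at storage
top -m io               # sort processes by I/O activity instead of CPU
These are all interactive; run them while users are actually working with the share so you see the load as it happens.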
 
Ok, this question is probably going to sound very stupid. But... as far as I can see, smbd is using 3.27% CPU. Is that a lot? I figured it wasn't that much. :confused:

And how do I monitor network load on FreeBSD?

Thanks for your help so far :)
 
Your load average is not high! In order to give you some tips, can you please provide the following:
  • Full system specs (including the ZPOOL configuration)
  • FreeBSD version
  • How many users are accessing the server?
  • Are you doing any tuning?
  • What type of service causes the most complaints? (afpd or smbd)
Last but not least: have you tried to copy a file from and to the server yourself?
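One simple way to run such a test from a client, timing a copy in each direction over the mounted share (the paths here are placeholders, adjust to your setup):
Code:
# Write test: time copying a large local file onto the mounted share
time cp ~/testfile.bin /mnt/fileserver/
# Read test: time copying it back
time cp /mnt/fileserver/testfile.bin /tmp/
Dividing the file size by the elapsed time gives you a real throughput number to compare against what the users report.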
 
The server has:
  • 2 Xeon E5-2407 CPUs
  • 64GB DDR3-1600 RAM
  • 2 Seagate Cheetah 15K.7 300GB disks in a gmirror, where FreeBSD is installed
  • 10 Seagate Constellation ES.2 3TB disks in a zpool, where the data is stored.

zpool configuration:
Code:
  pool: Octavo
state: ONLINE
  scan: scrub repaired 0 in 4h8m with 0 errors on Sun Apr  5 04:07:37 2015
config:

    NAME        STATE     READ WRITE CKSUM
    Octavo      ONLINE       0     0     0
     raidz1-0  ONLINE       0     0     0
       da1     ONLINE       0     0     0
       da0     ONLINE       0     0     0
       da4     ONLINE       0     0     0
     raidz1-1  ONLINE       0     0     0
       da11    ONLINE       0     0     0
       da9     ONLINE       0     0     0
       da7     ONLINE       0     0     0
     raidz1-2  ONLINE       0     0     0
       da5     ONLINE       0     0     0
       da10    ONLINE       0     0     0
       da8     ONLINE       0     0     0
    spares
     da6       AVAIL
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
Octavo      4.48T  19.9T    246    167  14.1M  2.61M
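For reference, output like the two blocks above comes from the stock zpool(8) commands (pool name Octavo taken from this post):
Code:
zpool status Octavo       # vdev layout and scrub status
zpool iostat Octavo       # cumulative operations and bandwidth since boot
zpool iostat -v Octavo 1  # per-vdev numbers, refreshed every second
The interval form is the more useful one for chasing a "slow right now" complaint, since the cumulative numbers average over the whole uptime.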

FreeBSD version:
FreeBSD 9.1-RELEASE

Around 20-30 users are accessing the server. About 10 of them use Macs and access the server through Netatalk; the rest use Windows and access it through Samba.

I did try some ZFS tuning in /boot/loader.conf:
Code:
kern.maxfiles="4096000"

zfs_load="YES"
vm.kmem_size="4096M"
vm.kmem_size_max="20480M"
vfs.zfs.arc_min="2048M"
vfs.zfs.arc_max="15360M"
The kern.maxfiles value is the result of a problem I had with Samba41 and ZFS ACLs. That seems to be fixed now, so perhaps I could restore it to the default value.

The Mac users complain the most, but they are also the heaviest users, working with large files. I get complaints from the Windows users too.

This morning I copied a 150GB file from the server to my own PC. It went at a steady 80MB/sec; I couldn't call that very slow. Copying a 4GB file went at 40MB/sec and took around 2 minutes. Also not very slow, if you ask me. Both files were copied over the SMB protocol.
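As a quick sanity check on those numbers (plain shell arithmetic, using 1 GB = 1024 MB):
Code:
# 4 GB at 40 MB/s, in seconds:
echo $((4 * 1024 / 40))        # 102 seconds, i.e. roughly the 2 minutes observed
# 150 GB at 80 MB/s, in seconds:
echo $((150 * 1024 / 80))      # 1920 seconds, i.e. 32 minutes
# 80 MB/s expressed in megabits:
echo $((80 * 8))               # 640 Mbit/s -- most of a gigabit link
So 80MB/sec is already close to what a single gigabit connection can deliver.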
 
Have you actually watched any of these users copy/access files? Users will often complain a file is slow to load/copy when they are moving massive amounts of data around and expect it to be as fast as a file stored locally.

From what you've posted, your load average is actually quite low (under 1.0), your disk wait percentage is low, your CPU usage is low, and you are copying large files in reasonable amounts of time. It sounds as though everything is working fine and your users may have unreasonable expectations about network transfer speeds.

My next step would be to visit the users, time for yourself how long their operations actually take, and then decide whether you have a technical problem or a "managing expectations" problem.
 
I did try some tuning on the ZFS:
Code:
kern.maxfiles="4096000"

zfs_load="YES"
vm.kmem_size="4096M"
vm.kmem_size_max="20480M"
vfs.zfs.arc_min="2048M"
vfs.zfs.arc_max="15360M"

Your tunables are actually "choking" the server: you have 64GB of RAM, but you are limiting the ARC to 15GiB.

Upgrade to 9.3-RELEASE and get rid of:
Code:
vm.kmem_size="4096M"
vm.kmem_size_max="20480M"
vfs.zfs.arc_min="2048M"
vfs.zfs.arc_max="15360M"
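After removing those lines and rebooting, one way to confirm the ARC is actually allowed to grow is to check the ZFS sysctls (these are the stock FreeBSD names; the values you see will of course differ):
Code:
# Configured ARC ceiling -- should now default to most of RAM
sysctl vfs.zfs.arc_max

# Current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.size

# Cache effectiveness: compare hits against misses
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
On a file server with 64GB of RAM you'd expect arcstats.size to climb well past the old 15GiB cap once the working set is cached.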
 
I'll do that tonight.

The upgrade to 9.3 is scheduled too. But not tonight :D

Don't wait that long. Your version is EOL, and there are a lot of ZFS improvements. Be sure to upgrade your pool afterwards too.

If you need tuning, after you upgrade to 9.3, then and only then:

loader.conf: (for 64GB system)
Code:
vfs.zfs.arc_max=57381908480
vm.kmem_size=85820247040
sysctl.conf:
Code:
kern.ipc.maxsockbuf=2097152
net.inet.tcp.delayed_ack=0
net.inet.tcp.recvbuf_max=2097152
net.inet.tcp.sendbuf_max=2097152
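For what it's worth, converting those suggested values into more readable units is plain arithmetic (1 GiB = 1073741824 bytes, 1 MiB = 1048576 bytes):
Code:
# vfs.zfs.arc_max in GiB -- about 53 of the 64 GB goes to the ARC
echo $((57381908480 / 1073741824))
# vm.kmem_size in GiB -- kmem is a virtual limit, so it may exceed physical RAM
echo $((85820247040 / 1073741824))
# The socket and TCP buffer limits all work out to 2 MiB
echo $((2097152 / 1048576))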
 
Wow, thank you for thinking with me. That is so nice! :)

Perhaps you could point me to some info that explains in more detail what those parameters actually mean? And why you chose the values you're suggesting?

I just stripped those ZFS tweaks from loader.conf and rebooted the server. And it actually seems to have a positive result! I immediately got reactions from the users that the server feels faster and more responsive. So, thank you for your advice! :beer:
 
You are welcome!

The tunables regarding vfs.zfs.arc and vm.kmem_size were provided to me by IX Systems for a file-serving system with 64GB of RAM that I bought recently. Actually, after FreeBSD 9.X you should not mess with vm.kmem_size unless you really know what you are doing. My server shares content over NFS, mainly videos and pictures, to 4 web servers. I have managed to saturate 2Gbit with LACP; the load average is around 1.5 and tops out at ~3 during heavy writes.

The other tunables are pretty standard.

That said, I am running FreeBSD 10.1-STABLE. I strongly recommend that you upgrade to 9.3-RELEASE.
 