System responsiveness

Hi,

My FreeBSD system becomes very unresponsive when the hard drive load is high. By unresponsive I mean a long delay before commands execute; for example, running ls can take several minutes to return the directory listing. The system is really close to unusable. I am monitoring the hard drive load with sysutils/atop, and I see this behavior while importing a large dataset (over 300 GB once loaded) into my databases/postgresql93-server database. My server is running
Code:
>uname -a
FreeBSD bilbo.metrico 9.2-RELEASE-p3 FreeBSD 9.2-RELEASE-p3 #0: Sat Jan 11 03:25:02 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
and has four hard drives in total: the system is on a gmirror software RAID 1 and the database data is stored on a gstripe RAID 0. The server has 32 GB of memory and 4 GB of swap, though the swap is never in use during the database import.

Here is the RAID configuration information:
Gmirror:
Code:
>gmirror list -a
Geom name: gm
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 1477567550
Providers:
1. Name: mirror/gm
   Mediasize: 500107861504 (465G)
   Sectorsize: 512
   Mode: r5w5e14
Consumers:
1. Name: ada0
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3286445061
2. Name: ada1
   Mediasize: 500107862016 (465G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 459509412

Geom name: gm.sync
Gstripe:
Code:
>gstripe list -a
Geom name: gs
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 8192
ID: 1042782665
Providers:
1. Name: stripe/gs
   Mediasize: 1200254517248 (1.1T)
   Sectorsize: 512
   Stripesize: 8192
   Stripeoffset: 0
   Mode: r1w1e1
Consumers:
1. Name: ada2
   Mediasize: 600127266816 (558G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: ada3
   Mediasize: 600127266816 (558G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1

When monitoring the load with atop, the highest load is on the ada1 drive of the gmirror RAID; it is usually close to 90% or above. The import program is converters/osm2pgsql, which uses 19.5 GB of the 32 GB of RAM, so there is still plenty left for the database and the system. Any ideas what causes the bad response times and what I can do about them?
 
If your drives are more than 80% busy, your bottleneck is I/O. The system will cache a lot in memory, but the data has to come from the hard drives first, and that is where the bad responsiveness comes from. In short, you're pushing your drives to their limit, and this has a detrimental effect on the entire system.
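To see where the time is going at the GEOM level (mirror versus individual disks), gstat gives a live per-provider busy percentage. A small sketch, using the device names from this thread:

```shell
# gstat shows a live per-GEOM busy percentage (%busy column);
# -f filters providers by regular expression, so this watches only
# the mirror and its two member disks (names from this thread).
gstat -f 'ada[01]|mirror'
```

If ada1 sits near 100% while ada0 is much lower, the round-robin balance of the mirror is worth a look as well.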
 
Thanks for your answer. Is this filesystem related? I don't have any numbers at hand, but I don't recall responsiveness this bad on my other server, which runs mostly on ZFS. I also don't remember such behavior from my Linux systems, though my memory might fail me here. The affected server is formatted with UFS:
Code:
>cat /etc/fstab
# Device                Mountpoint              FStype  Options Dump    Pass#                       
/dev/mirror/gms1a               /               ufs     rw      1       1                           
/dev/mirror/gms1b               none            swap    sw      0       0                           
/dev/mirror/gms1d               /var            ufs     rw      2       2                           
/dev/mirror/gms1e               /pgxlog         ufs     rw,noatime      2       2                   
/dev/mirror/gms1f               /usr            ufs     rw      2       2                           
/dev/stripe/gs                  /pgdata         ufs     rw,noatime      2       2                   
proc                            /proc           procfs  rw      0       0                           
fdesc                           /dev/fd         fdescfs rw      0       0
 
Caching works a little differently between UFS and ZFS; that may be the reason. Have a look with a command like iostat -x -t da 1 100 and specifically keep an eye on %b: if it gets over 80%, your drives are extremely busy. Faster drives and/or a faster controller might help. You can also try to nice(1) the import process; importing may take a little longer, but you should get better responsiveness for everything else that's running.
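As a concrete sketch of both suggestions (the awk filter and the osm2pgsql arguments are illustrative, not from this thread; on FreeBSD, %b is the last column of iostat -x output):

```shell
# Flag any device whose %b (busy) column exceeds 80 in the samples;
# header lines evaluate to 0 and are skipped.
iostat -x -t da 1 100 | awk 'NF && $NF+0 > 80 { print $1, "busy:", $NF "%" }'

# Re-run the import at the lowest scheduling priority (20 = nicest).
# The osm2pgsql arguments here are placeholders, not from the thread.
nice -n 20 osm2pgsql --database gis input.osm.pbf
```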
 
Hi,
I am still examining the bad responsiveness. It seems the slow response times only appear when the RAID 1 is close to its read/write maximum; the RAID 0 doesn't suffer from the issue. Maybe because the disks are different? Or is it the RAID implementation?
Another thing I tried was importing a smaller dataset into my database. The initial data is 23 GB in the Protobuf format (Wiki, Data). The smaller dataset is "only" 12 GB, and with it I never experienced the bad response behavior. Now I wonder whether, due to the large file size, other aspects come into play, like the filesystem cache. My server has 32 GB of RAM, so the 12 GB file might be easier to handle than the larger 23 GB one. Any ideas?
 
How much memory is reported as wired when you do the import?

It may be (though it's unlikely) that the DB is trying to do this in memory and has therefore wired down its memory space (so no swapping) to help keep the import an atomic transaction.
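A quick way to check wired memory from the command line, besides reading the "Wired" line in top(1); the sysctl names are as on FreeBSD 9.x:

```shell
# Wired memory is reported in pages; multiply by the page size
# and convert to MB for a readable figure.
pages=$(sysctl -n vm.stats.vm.v_wire_count)
pagesize=$(sysctl -n hw.pagesize)
echo "wired: $((pages * pagesize / 1024 / 1024)) MB"
```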

And maybe you can do something with a GEOM scheduler, if it is still supported and available. That way you can do some traffic shaping on the I/O.
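A sketch of what that might look like with gsched(8), inserting the round-robin scheduler in front of the busiest disk from this thread; the exact syntax and module availability vary by release, so verify against the man page on your system first:

```shell
# Load the GEOM I/O scheduler module, then transparently insert a
# scheduler in front of ada1 (this creates an ada1.sched provider).
kldload geom_sched
gsched insert -v ada1

# Undo it again when done:
# gsched destroy -v ada1.sched
```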
 
I reduced the cache memory allotted to the import program and that did help a bit, so it seems there is a memory issue involved. But it seems I can't do much about it in the end, except maybe buying faster disks. I'll keep all your answers in mind, so thank you very much for them!
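For reference, osm2pgsql's node cache is sized in MB with its -C/--cache option, so lowering it is a one-flag change; the value and file name below are illustrative, not from this thread:

```shell
# A smaller --cache leaves more RAM for the OS filesystem cache
# and PostgreSQL itself; 4000 MB and the input file are examples.
osm2pgsql --create --database gis --cache 4000 planet.osm.pbf
```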
 
Over the weekend I did some re-installing of my file server at home (8.x -> 10-STABLE), where I also had some weird responsiveness problems. It seems to me that you get into trouble when you have 4K-sector disks combined with a filesystem that uses compression. The block sizes should not pose much of a problem (I hope), as I did not partition the disks but let ZFS roam free on the whole platter. But with a filesystem set to use gzip compression, the responsiveness absolutely tanked whenever it was accessed. Reading or writing huge files there made the machine unusable, to the point where it needed rebooting.

It seems to me (read: this is a guesstimate and needs verification) that the compression code in ZFS is single-core, but it also locks vital data structures, so a lot of threads get blocked out.
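Checking and changing the compression setting is straightforward; a sketch with zfs(8), where "tank" and "tank/data" are placeholder names:

```shell
# List the compression setting for every dataset in the pool.
zfs get -r compression tank

# Switch one dataset from gzip to the much cheaper lz4; note that
# only blocks written after the change use the new algorithm.
zfs set compression=lz4 tank/data
```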

@OP: In your case, buying more memory and/or not letting the DB do all the caching (by limiting it further) can get you further than buying more or faster disks. ZFS is usually very good at knowing what to cache, so to get the DB faster without throttling the rest of the system, I would first try to keep the DB from caching everything itself, and not limit the ARC to too small a number.
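A sketch of the relevant postgresql.conf knobs for that approach, i.e. letting the OS (or the ARC) do most of the caching; the values are illustrative, not tuned for this machine:

```
# Keep PostgreSQL's own buffer cache modest so the OS page cache
# (or ZFS ARC) does the bulk of the caching.
shared_buffers = 2GB

# Memory available to index builds during the import.
maintenance_work_mem = 1GB

# Hint to the planner about how much data the OS is likely caching.
effective_cache_size = 20GB
```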
 