ZFS and memory requirements for good performance

Ok, I currently have ZFS running on one heavily loaded server and some lighter servers.

The heavy server, which I have posted about before, has ongoing ZFS-related issues, and the plan is already to change the filesystem because the problems have lasted for over a year. Before then, however, I am keen to see if the issue is fixable. Only recently I made a discovery regarding the metadata cache and some of the causes of the issues.

It emerged a few days ago that one specific problem is listing directories with many files on ZFS; it doesn't matter whether it's 'ls', 'find' or 'rsync'. An example of real-world usage that creates this scenario is a Maildir setup with large inboxes of unread email. The server gets rsynced to another server which uses UFS, so I can make direct performance comparisons.

This server in particular has 2 large maildirs.

One has 344948 files inside it.

On UFS, listing that directory takes 1 second to start producing output. (Once the listing has started, reading it is limited by output speed, so I am ignoring that part for this discussion.)

On ZFS, if the server has recently been rebooted so the ARC has no memory constraints, it takes about 4-5 seconds to start listing, and during those 4-5 seconds the I/O load is very high. If I relist very shortly after the first listing it takes only 1 second with no I/O load; however, even with no memory constraints the data isn't cached for long, maybe 2 minutes or so.

Now the real problem. If the server has been running for a while, let's say 1 day, the listing time goes through the roof; according to the logs, at 1 day of uptime this morning it took 1 hour 20 minutes to list the directory. During that time the I/O load is extremely high and hurts the response time of everything running on the server. The reason the directory gets listed every day is that the control panel managing the accounts has to run disk tallies daily to calculate usage. This is normal real-world usage, not unusual for hosting servers.
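For anyone who wants to reproduce the measurement, this is roughly how I time it (a minimal sketch; the maildir path and pool name are just placeholders):

Code:
# time the listing; ls -f skips sorting so it starts printing
# as soon as directory entries come back from the filesystem
/usr/bin/time -h ls -f /var/mail/example/Maildir/cur > /dev/null

# in a second terminal, watch pool I/O while the listing runs
zpool iostat tank 1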

So the first recent discovery is that I found what is triggering the large issues (the large directories), and the second is that I found the probable underlying cause.

Here are some current sysctl values.

Code:
vfs.zfs.arc_meta_limit: 4294967296
vfs.zfs.arc_meta_used: 4294928048

As you can see, the limit is essentially saturated, and the server has only been up about two and a half hours.

Code:
7:11AM  up  2:27, 3 users, load averages: 0.44, 0.58, 0.59

Originally the metadata limit was just 800 MB (auto-tuned) and was very severely constrained, with 1.6 GB used. It was bumped to 2 GB, which wasn't enough, and is now bumped to 4 GB and is already saturated after 2 hours. The reason is that after the reboot I ran a manual tally, and that alone was enough to saturate 4 GB of metadata cache. Listing files uses the metadata cache.
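For reference, the bump itself is done with a tunable along these lines (the value here is just the 4 GB figure from above; on the ZFS version I am running arc_meta_limit is a loader tunable, so it needs a reboot to take effect):

Code:
# /boot/loader.conf
vfs.zfs.arc_meta_limit="4294967296"   # 4 GB cap on ARC metadata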

Also note this value.

Code:
kstat.zfs.misc.arcstats.evict_skip: 606

This goes up during listing slowdowns when the metadata cache is saturated. As it happens, 606 is incredibly low, so the 4 GB limit is possibly almost enough; normally it's in the hundreds of thousands.
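To watch it climb while a tally or listing is running, I use a quick loop like this (rough sketch):

Code:
# print evict_skip and current metadata usage once a second
while true; do
    sysctl -n kstat.zfs.misc.arcstats.evict_skip vfs.zfs.arc_meta_used
    sleep 1
done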

However, I then decided to manually list one of the large directories again. It took 45 seconds to start listing, which is very slow, yet the evict value only went from 606 to 608 this time, and of course a repeat listing took barely 1 second.

I did find a few complaints of similar issues from both FreeBSD and Solaris users. The FreeBSD users had no resolution; they either switched to UFS or put up with it. On the Solaris side a developer stated it was a core ZFS issue that would need a major rewrite of code to resolve, but then of course ZFS development stagnated shortly afterwards. I don't own this server and have no control over these users cleaning up their mailboxes. That is a problem in itself, but the point remains that UFS doesn't have the same performance penalty.

I should also point out that this issue is made worse by enabling prefetch.
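For anyone who wants to rule prefetch out, it can be switched off with a tunable (on my version it is a boot-time setting):

Code:
# /boot/loader.conf
vfs.zfs.prefetch_disable="1"   # turn off file-level prefetch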

Things I am considering:

Further increasing the metadata limit, but RAM isn't unlimited.
Setting primarycache to metadata only, which would let me set the metadata limit to the same size as the ARC and ensure no file cache pushes metadata out; I suspect this will have a large performance impact though (see the sketch below).
Toggling the metadata compression setting; currently it's on, which is the default.
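The primarycache idea from the list above would look roughly like this ('tank/mail' is just a placeholder dataset name):

Code:
# cache only metadata in the ARC for the mail dataset
zfs set primarycache=metadata tank/mail

# verify the current value
zfs get primarycache tank/mail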

From what I have gathered, lstat() is the problem; ZFS is slow at it. If I attach truss to processes that are taking a long time to run, they are always sitting in lstat().
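For the record, this is how I have been checking that (the PID is just whichever process is stuck at the time):

Code:
# attach to a slow process and watch its system calls live
truss -p 12345

# or collect per-syscall counts for a while, then hit Ctrl-C for a summary
truss -c -p 12345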

This server also still uses UFS as well; since FreeBSD still has no setting to restrict UFS cache usage, I still have the problem that when the UFS cache grows it makes the ARC shrink.
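The closest thing to a workaround I know of on the ZFS side is giving the ARC a floor so it cannot be squeezed down indefinitely (the value is only an example; this is a boot-time tunable on my version):

Code:
# /boot/loader.conf
vfs.zfs.arc_min="6442450944"   # don't let the ARC shrink below ~6 GB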

What ZFS does have going for it is that I have yet to see a single filesystem-related corrupted file on it, even with all the heavy load it has tolerated, and fsck has never had to run on it either. ZFS snapshots have been very useful as well.
 