ZFS locking up regularly

OK, I have a server with a heavy I/O load. It is one I have posted about a few times before. Recently it has been locking up multiple times a week, usually under heavy I/O load. The problem is that files on ZFS become inaccessible: anything trying to access them just sits there, so the system effectively hangs.

There are no errors reported in the logs; ZFS just stops responding. Today I ran a scrub, which only brought the hang on faster: the system stopped responding in under an hour.

Clearly my experience with ZFS on a production system hasn't been great; there have been several issues to deal with, some resolved and some not. This server is soon to be migrated to CloudLinux, but ideally I would like to get to the bottom of this issue so I know what the problem is: whether it's a ZFS bug I can do nothing about, or something that has a patch or can be tuned.

Various settings have been tuned and reverted with little effect, including toggling AHCI, queue depth, ARC size, metadata cache size, write throttling, prefetch and probably more.

Most of the filesets use the fletcher4 checksum and lzjb compression. It is a two-drive ZFS mirror with ZFS on root, meaning I don't have the flexibility to add new log or cache devices.

I also help manage a second ZFS server, which has none of the issues this server has had, but that server has a much lower I/O load.
 
Posting some more information, such as the FreeBSD version, ZFS version, pool configuration, memory, dedup, etc., would really help us help you.

chrcol said:
Most of the filesets use the fletcher4 checksum and lzjb compression. It is a two-drive ZFS mirror with ZFS on root, meaning I don't have the flexibility to add new log or cache devices.

This is not true. You can add anything you want, even in a ZFS-on-root installation.
 