I thought perhaps it would be useful to start a generic FAQ to help explain some common questions about ZFS in a somewhat non-technical medium ..
Feel free to add your own questions or insight to the thread.
Q: Who should use ZFS?
A: The development focus for ZFS is performance, capacity, and data integrity. The project itself is built around one core concept: the importance of your DATA. However, ZFS is not for EVERY workload. For example, a UFS RAID stripe will almost ALWAYS be faster than a ZFS stripe, simply because it has no overhead for checksumming, copy-on-write, or ZFS permissions. So if raw speed is all you need, UFS may be the better choice.
To find out if ZFS is right for you, you really only need to ask yourself one question:
"Is my data important?" If you answer yes, then you will probably benefit from a zpool.
Some examples of where ZFS shines:
high availability across multiple machines/locations
improved syncing with zfs send/receive
snapshots and recovery
replication
automated interaction with jails and bhyve instances
Q: I have 12x8TB drives in a pool.. should I not have 96TB?
A: No. Drives in a zpool are still based on the formatted capacity of the raw drive. For example, an 8TB drive's formatted capacity, regardless of the file system, is about 7.27TB. The total zpool capacity also factors in the pool type (raidz2 or raidz3), which reserves 2 or 3 drives' worth of capacity per vdev for redundancy.
Example:
Code:
root@abyss:/usr/local/sbin # zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
abyss 87.2T 37.7T 49.6T - - 1% 43% 1.00x ONLINE -
12 x 7.27 = 87.2T ... less 7.27 x 3 (for raidz3) = 65.45T of net usable space after the "tax"
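The arithmetic above can be sanity-checked with a quick awk one-liner (the 8TB drive size and the 3 parity drives for raidz3 are just this example's assumptions):

```shell
# Rough usable capacity for 12x8TB drives in raidz3.
# An "8TB" drive is 8e12 bytes, which is ~7.28 TiB as tools report it.
awk 'BEGIN {
  tib = 8e12 / 2^40                        # one 8TB drive in TiB
  printf "raw pool size         : %.1fT\n", 12 * tib
  printf "usable (9 data drives): %.2fT\n", (12 - 3) * tib
}'
```

This comes out to roughly 87.3T raw and 65.5T usable, matching the figures above within rounding.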
Q: Why is my free space different between zpool list and zfs list?
A: zpool list shows the total raw capacity of the drives AND the required overhead: your redundancy is accounted for, as well as ALL of the pointer data associated with snapshots. zfs list, on the other hand, shows the free space actually available to the pool.
Example:
Code:
zfs list
NAME USED AVAIL REFER MOUNTPOINT
abyss 23.8T 37.6T 256K /zroot
When you add 23.8 + 37.6 you get 61.4T, versus the original 65.45T of usable space; the difference is the raw data required for the ENTIRE pool (so in my case there is about 4T of snapshot and filesystem information).
vs
Code:
root@abyss:/usr/local/sbin # zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
abyss 87.2T 32.8T 54.5T - - 1% 37% 1.00x ONLINE -
The zpool list command, by contrast, reports the total formatted capacity of the entire pool: the ALLOC column includes all of the filesystem overhead (snapshot data, for example), and FREE is whatever remains.
It is best to always refer to zfs list for the most accurate, human-readable free space.
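For a more detailed breakdown of where the space is going, zfs list has a built-in space view (the pool name here is just my example):

```shell
# Per-dataset space accounting: AVAIL plus USED broken down into
# snapshots (USEDSNAP), the dataset itself (USEDDS), refreservations,
# and child datasets (USEDCHILD).
zfs list -o space abyss
```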
Q: I deleted a file, but no space was recovered
A: ZFS never actually deletes files; it deletes the pointers and marks the space as free. That being said, if the file exists within a snapshot, it still consumes space.
Example:
before
Code:
root@abyss:/usr/local/sbin # zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zpool 87.2T 42.6T 44.7T - - 1% 48% 1.00x ONLINE -
root@abyss:/usr/local/sbin # zfs list -H -o name -t snapshot | grep -i '2020-03' | xargs -n1 zfs destroy
AKA - destroy the snapshots for 2020-03 (all March 2020 snapshots)
after
Code:
root@abyss:/usr/local/sbin # zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
abyss 87.2T 37.7T 49.6T - - 1% 43% 1.00x ONLINE -
As you can see, 4.9T of space was released (i.e. marked as free) once the snapshots were actually destroyed.
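If you are about to mass-destroy snapshots like this, zfs destroy supports a dry run; a cautious version of the pipeline above might look like this (the '2020-03' pattern is just this example's naming scheme):

```shell
# Preview first: -n is a dry run, -v prints what would be destroyed,
# and -p reports the reclaimed space in machine-parsable form.
zfs list -H -o name -t snapshot | grep -i '2020-03' | xargs -n1 zfs destroy -nvp
# Once the list looks right, drop the -n to actually destroy them.
```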
Q: What is the deal with memory?
A: As a rule, the more memory the better. This is because ZFS uses a variety of caching methods, all of which can take advantage of system resources. The most common one is the ARC (Adaptive Replacement Cache).
Code:
ARC: 89G Total, 10G MFU, 78G MRU, 600K Anon, 193M Header, 28M Other
85G Compressed, 86G Uncompressed, 1.01:1 Ratio
ARC on its own deserves an entire book; it is confusing and often misunderstood. In short, we care about two components:
MFU - Most Frequently Used
MRU - Most Recently Used
Whenever your pool accesses data, the ARC caches it along with pointers to its location on disk. This cache is updated constantly and allows the system to use the pool efficiently, which is why it is important to have enough memory.
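On FreeBSD you can watch the MFU/MRU split directly via sysctl (the exact stat names can vary between ZFS versions):

```shell
# Current ARC size and its MFU/MRU components, in bytes
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.mfu_size \
       kstat.zfs.misc.arcstats.mru_size
```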
For example, ZFS buffers writes in memory and flushes them to disk in large transaction groups. The more RAM you have, the more data ZFS can buffer per flush, which has many benefits such as larger, more contiguous writes.
Do not work on 40GB RED video files on a zpool with 8GB of RAM and expect good performance. If you are working with large files, make sure you have tons of RAM.
Q: Do I need ECC Memory?
A: The short answer is "yes and no". There are several threads that discuss this topic in depth. The long and the short of it: ECC is always preferred, but non-ECC RAM will work in most cases. As stated above, it is far more important to have MORE RAM.
ECC RAM helps protect against bit flips and other issues that can arise from unchecked values changing in memory. ZFS does hold pointers and cached data in the ARC, but when data is written it is checksummed independently of what is in memory. As some people have noted, a bit flip corrupting a cache pointer is EXCEEDINGLY rare.
I generally live by this rule: if your pool is a 24/7 production environment, you should be using ECC RAM; but if it is for a home user, a NAS, or a Plex server, chances are you will never notice any difference between the two.
Q: HELP!!! ARC Cache is using ALL of my memory!
A: Relax! That's by design. By default the ARC will consume as much system memory as possible, leaving a small reserve (I believe around 5%) for the system at all times. As your system requests memory, the ARC releases it as needed. For the most part you do not want to mess with the settings, but if you want to limit it explicitly, see https://docs.freebsd.org/en/books/handbook/zfs/#zfs-advanced
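As a concrete example, capping the ARC is a one-line boot tunable (16G here is an arbitrary figure; older FreeBSD releases spell the tunable vfs.zfs.arc_max, while newer OpenZFS-based ones also accept vfs.zfs.arc.max):

```shell
# /boot/loader.conf — limit the ARC to 16 GiB (takes effect at boot)
vfs.zfs.arc_max="16G"
```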
Hopefully this helps..
Stay tuned for an update on understanding pool types and workloads in plain English.