I gather that a ZFS filesystem uses a variable block size, between 2^ashift bytes and recordsize, but that the block size of a zvol is fixed at volblocksize.
As such, when sharing a zvol to an iSCSI client, there are four distinct block sizes to consider:
- The volblocksize of the zvol (defaults to 8KB)
- The BlockLength of the iSCSI target (currently recommended 512B)
- The MTU of the iSCSI network layer (commonly ~1500B or ~9000B)
- The block size or allocation unit of the filesystem, as created by the OS of the iSCSI client (e.g. in WinNT/NTFS, this defaults to 4KB)
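To make the mismatch concrete, here is a rough Python sketch of how the four sizes listed above relate to one another. The sizes come from the list; the per-frame header overhead (40 bytes for IPv4 plus TCP without options) and the single 48-byte iSCSI Basic Header Segment per write are simplifying assumptions of mine, and the function names are hypothetical.

```python
import math

# Sizes taken from the list above.
VOLBLOCKSIZE = 8 * 1024   # zvol volblocksize
ISCSI_BLOCK = 512         # iSCSI target block length
FS_CLUSTER = 4 * 1024     # client filesystem allocation unit (NTFS default)

# Assumptions for this sketch only.
IP_TCP_HEADERS = 40       # IPv4 (20 B) + TCP (20 B), no options
ISCSI_BHS = 48            # one iSCSI Basic Header Segment per write


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames needed to carry one write of payload_bytes plus one BHS."""
    per_frame = mtu - IP_TCP_HEADERS
    return math.ceil((payload_bytes + ISCSI_BHS) / per_frame)


def summarize(mtu: int) -> dict:
    """Relate one filesystem cluster to the other three sizes."""
    return {
        "iscsi_blocks_per_cluster": FS_CLUSTER // ISCSI_BLOCK,
        "clusters_per_volblock": VOLBLOCKSIZE // FS_CLUSTER,
        "frames_per_cluster_write": frames_needed(FS_CLUSTER, mtu),
    }


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, summarize(mtu))
```

Under these assumptions, one 4KB cluster covers eight 512B iSCSI blocks, fills half of an 8KB volblock, and needs three frames at MTU 1500 or one frame at MTU 9000.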
It strikes me that disparity between these could lead to a lot of wasted writes, and also wasted space due to metadata. One or more of the following must surely happen:
- The iSCSI client might break up its 4K writes into little 512B I/Os (suboptimal network transfer)
- The iSCSI client might break up its 4K writes into 1500B I/Os (requiring TCP reassembly in the network stack)
- The iSCSI server might write each of its 512B writes to a separate ZFS record (significant bloat from metadata, suboptimal compress/dedupe unit)
- The iSCSI server might batch up its 512B writes into 8K blocks (latency between receipt and commit)
- The iSCSI server might ignore its own block-length and accept a 4K block from the client OS filesystem, and write that as a single ZFS record
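As a way of reasoning about the scenarios above, here is a small Python sketch that estimates the cost of a single write if the zvol always rewrites whole volblocksize-sized blocks. That whole-block rewrite is the assumption being modelled; the sketch ignores compression, write aggregation and the intent log, and `rmw_cost` is a hypothetical helper, not part of any ZFS tooling.

```python
def rmw_cost(offset: int, length: int, volblocksize: int = 8192) -> dict:
    """Estimate blocks touched by one write, assuming whole-block rewrites."""
    if length <= 0:
        raise ValueError("length must be positive")
    end = offset + length
    first = offset // volblocksize
    last = (end - 1) // volblocksize
    blocks = last - first + 1

    # A block is "partial" when the write does not cover all of it, so the
    # old contents would have to be read before the block is rewritten.
    partial = 0
    for b in range(first, last + 1):
        b_start = b * volblocksize
        b_end = b_start + volblocksize
        if offset > b_start or end < b_end:
            partial += 1

    bytes_written = blocks * volblocksize
    return {
        "blocks_touched": blocks,
        "partial_blocks": partial,
        "bytes_written": bytes_written,
        "amplification": bytes_written / length,
    }


if __name__ == "__main__":
    print(rmw_cost(0, 4096))     # aligned 4K write into an 8K block
    print(rmw_cost(6144, 4096))  # 4K write straddling two 8K blocks
    print(rmw_cost(0, 512))      # a single 512B I/O
```

In this model an aligned 4K write rewrites 8K (2x), a 4K write that straddles two blocks rewrites 16K (4x), and a lone 512B write rewrites 8K (16x), which is why the alignment between the client's allocation unit and volblocksize seems to matter.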
Given these:
- Which (if any) of the above best describes the actual I/O operations?
- What is the actual size of the record, written to the ZFS pool, for any arbitrary write (i.e. the one to which ZFS metadata is written and compression/dedupe might be applied)?
- Which values are the "best" ones to "match up"? volblocksize and the FS allocation unit?
- What is the average size of ZFS metadata per written record?