Solved To compress or not to compress a virtual disk?

mikkol

Member

Reaction score: 5
Messages: 77

If the filesystem is ZFS, the host is FreeBSD 12.1-RELEASE, and I run virtual machines with VirtualBox-OSE, does it make sense to compress the dataset in which the virtual drives reside, or will that result in a severe performance drop when new data is written to the disk images?
 

Sebulon

Aspiring Daemon

Reaction score: 142
Messages: 725

It may very well come at a severe performance cost, depending on which compression algorithm you choose! For example, if you choose 'compression=gzip-9', well, it's going to suck, big time. But if you choose 'lz4' instead, you may actually get better performance than without compression, since the host system won't have to spend as much time writing a bunch of pointless zeroes. Does 'lz4' have an impact on compute performance inside the guest? In my opinion, no, not at all. 'lz4' is as cheap as it gets; the compression itself is almost free in terms of CPU time.
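For anyone who wants to try this, here is a minimal sketch of the two commands involved, wrapped in Python so they can be scripted. The dataset name `zroot/vm` is purely hypothetical; substitute the dataset that holds your disk images. The helpers only build the argument lists, and `run()` is what would actually execute them (as root, on a host with ZFS).

```python
import subprocess

# Hypothetical dataset name, used for illustration only.
DATASET = "zroot/vm"


def set_compression_cmd(dataset, algorithm="lz4"):
    """Build the argv for: zfs set compression=<algorithm> <dataset>."""
    return ["zfs", "set", f"compression={algorithm}", dataset]


def get_ratio_cmd(dataset):
    """Build the argv for: zfs get compression,compressratio <dataset>."""
    return ["zfs", "get", "compression,compressratio", dataset]


def run(cmd):
    """Execute a command and return its stdout; raises if the command fails."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) yourself on a ZFS host.
    print(" ".join(set_compression_cmd(DATASET)))
    print(" ".join(get_ratio_cmd(DATASET)))
```

Keep in mind that changing the property only affects data written afterwards; blocks already on disk stay as they were until they are rewritten.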

Hope it helps!
 

mikkol

Member

Reaction score: 5
Messages: 77

Sorry, I was not clear. I'm thinking that if I have a fixed-size VDI or VMDK of 120GB and only 30GB of it is actually used, then what I have in the underlying filesystem is a file that is ostensibly 120GB but really occupies only 30GB and when I start writing data within the virtual computer to its hard drive, more space needs to be allocated for the virtual disk. I was worried about the performance penalty for this allocation of new space on the fly.
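The gap between the ostensible size and the space really occupied can be checked from the host. The sketch below is an illustration under assumptions: it uses a sparse temporary file as a stand-in for a compressed disk image, and it relies on the POSIX `st_blocks` field (counted in 512-byte units), which is not available on every platform. On a compressed ZFS dataset the same comparison should show the post-compression allocation, but I have not verified that against this exact setup.

```python
import os
import tempfile


def make_image(path, apparent_size):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(apparent_size)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file on a POSIX system."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units regardless of filesystem block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")  # hypothetical image name
        make_image(image, 16 * 1024 * 1024)
        apparent, allocated = sizes(image)
        print(f"apparent:  {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

Running it against a real VDI or VMDK before and after the guest writes data shows how much additional space the host had to allocate.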
 

Sebulon

Aspiring Daemon

Reaction score: 142
Messages: 725

I have used a FreeBSD server with ZFS exporting NFS to a VMware cluster and compared no compression vs. 'lz4'. The latter kicked the former's ass, big time, measured with 'bonnie++' from inside a guest whose disks lived on that NAS :)

Sadly, I have since lost the numbers and changed employers, so you'll just have to take my word for it...
 

ralphbsz

Son of Beastie

Reaction score: 2,173
Messages: 3,128

It depends MASSIVELY on the data, whether it's compressible or not. For example, if the data is encrypted, it is very likely not compressible. If it is already compressed (often .bz2 files), it won't compress. Images (.jpg files) and movies can usually only be compressed by specialized compression methods, and even they don't get much compression out of them.
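To make the point about compressibility concrete, here is a small sketch that compares three kinds of data. It is only an illustration: it uses `zlib` from the Python standard library as a stand-in compressor (not lz4, and not what ZFS runs internally), and random bytes as a stand-in for encrypted or already-compressed content.

```python
import os
import zlib

SIZE = 128 * 1024  # one 128K chunk of sample data


def ratio(data, level=6):
    """Return original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


# An untouched region of a disk image: nothing but zero bytes.
zeros = bytes(SIZE)

# Repetitive text, standing in for ordinary compressible files.
sentence = b"The quick brown fox jumps over the lazy dog. "
text = (sentence * (SIZE // len(sentence) + 1))[:SIZE]

# Random bytes, standing in for encrypted or already-compressed data.
random_bytes = os.urandom(SIZE)

if __name__ == "__main__":
    for name, data in (("zeros", zeros), ("text", text), ("random", random_bytes)):
        print(f"{name:>6}: ratio {ratio(data):.2f}")
```

The zero-filled and text samples shrink dramatically, while the random sample does not shrink at all, which is the behaviour to expect from encrypted data.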

And then it depends on the access pattern. If your workload is archival (write once, read rarely, never delete or overwrite), the effort invested in compression is often worth it. If your workload is transactional (lots of little files that only live for seconds, like Hadoop processing, or even worse, database updates in the middle of a file), then compression is likely to kill performance. There are lots of things in between.

And then it depends on the underlying storage. You say you are running on top of VirtualBox. What do you know about the underlying storage? Is it already compressing internally? Compressing data multiple times in a row is not only pointless, it can be performance robbing if done carelessly.

There is no simple answer here.
 

mikkol

Member

Reaction score: 5
Messages: 77

ralphbsz Thank you for your input. In my effort to be efficient, I have explained myself poorly. FreeBSD is the host operating system. The virtual computer is whatever it is and will be run by VirtualBox. My concern was not whether stuff will actually compress or not, but whether creating a fixed-size virtual disk, which will initially be filled mostly with zeros, will later slow down the virtual machine if/when new data is written to it, thus increasing the actual data in the virtual disk (a highly compressible pattern changes to a poorly compressible one, so FreeBSD will need to allocate more space for the virtual disk without its ostensible size changing).
 

Sebulon

Aspiring Daemon

Reaction score: 142
Messages: 725

I have never observed any issues with fixed-size disks together with lz4 compression, since it tests whether or not the data is "worth" trying to compress. I do know, however, that thin-provisioned disks can be problematic: there have been issues in QEMU/KVM where the guest writes data faster than the host can expand its virtual QCOW2 disk beneath it. But that isn't an issue in this case, and has nothing to do with ZFS or the compression.
 

mikkol

Member

Reaction score: 5
Messages: 77

Sebulon Thanks. A note on the logic of the answer, though: LZ4 will have tested and found the disk image worth compressing because it is initially extremely compressible. So when the file is created, it will get compressed. However, when that compressible content is later replaced with something else, more space needs to be allocated for the virtual disk image regardless of what the compression algorithm will decide at that point on the necessity of compression. It's not the speed of compression or the compressibility that are the essence of my question. It's the speed of space allocation.
 

Sebulon

Aspiring Daemon

Reaction score: 142
Messages: 725

mikkol said: "A note on the logic of the answer, though: LZ4 will have tested and found the disk image worth compressing because it is initially extremely compressible. So when the file is created, it will get compressed."
Yes, but ZFS doesn't compress whole "files". It works on individual records (128K is the default recordsize), and each record gets compressed on its own.
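A rough sketch of why that matters for a disk image, with several assumptions stated up front: `zlib` stands in for lz4, the image is a plain in-memory byte array rather than a real ZFS file, and the rule "keep the compressed copy only if it saves at least one-eighth" is my understanding of how ZFS decides, not something taken from this thread. The point is only that overwriting one record changes the footprint of that record and nothing else.

```python
import os
import zlib

RECORD_SIZE = 128 * 1024  # default ZFS recordsize
MIN_SAVING = 0.125        # assumed threshold: keep compression only if it saves >= 12.5%


def stored_size(record):
    """Bytes needed for one record, compressed only when it is worth it."""
    compressed = zlib.compress(bytes(record), 1)
    if len(compressed) <= len(record) * (1 - MIN_SAVING):
        return len(compressed)
    return len(record)  # not worth it: store the record uncompressed


def image_footprint(image):
    """Per-record stored sizes for a disk image held in memory."""
    return [
        stored_size(image[offset:offset + RECORD_SIZE])
        for offset in range(0, len(image), RECORD_SIZE)
    ]


# A tiny "fixed-size disk image" of four zero-filled records.
image = bytearray(RECORD_SIZE * 4)
before = image_footprint(image)

# The guest overwrites the second record with incompressible data.
image[RECORD_SIZE:2 * RECORD_SIZE] = os.urandom(RECORD_SIZE)
after = image_footprint(image)

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

Only the second entry grows to the full record size; the other three records are untouched, so there is no file-wide recompression or reallocation step to worry about.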
 

mikkol

Member

Reaction score: 5
Messages: 77

I should mark this as double-solved. NOW it makes perfect sense.
 