GPLv2 versus CDDLv1

jrm@

Developer
For anyone curious, here is a short attempt to clarify (according to the Software Freedom Conservancy) the incompatibilities between the GPLv2 and the CDDLv1.

https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/

I'm no lawyer, but here is my (overly simplified) interpretation.

According to the GPLv2, if you combine code under GPLv2 with code under some other license, the resulting product must be licensed under the GPLv2.
According to the CDDLv1, if you combine code under CDDLv1 with code under some other license, the resulting product must be under the CDDLv1.
 
As I understand it, the first statement (that the GPL is an infectious license) is true, but the second is not. The CDDL only requires that the license on the covered code remain unchanged, and that is the source of the problem: the GPL requires the GPL for the whole combined work, while the CDDL forbids relicensing its part at the same time.
 
Yeah, after reading the text again that makes sense.

CDDLv1 said:
§3.1 Any Covered Software that You distribute or otherwise make available in Executable form must also be made available in Source Code form and that Source Code form must be distributed only under the terms of this License.
§3.4 You may not offer or impose any terms on any Covered Software in Source Code form that alters or restricts the applicable version of this License

Software Freedom Conservancy said:
CDDLv1 is a weak copyleft license in that it allows you to create a binary work with components under different terms (see CDDLv1§3.6). However, as seen in the text above, with regard to the specific copyrighted material already under CDDLv1, that material must remain licensed only under the terms of the CDDLv1. Furthermore, when redistributing the source code, you cannot alter the terms of the license on that copyrighted material.

In any case, it sounds like Canonical and the Conservancy will sort this one out. We're fortunate that the BSD and CDDL licenses are compatible.
 
The CDDL might not be the biggest licensing problem with ZFS, but this definitely is:

The reason ZFS requires so much memory is that it includes its own separate cache system based on the ARC algorithm. It's a fantastic algorithm, but it certainly violates IBM patents, which is why it was removed from PostgreSQL and omitted from Linux. Perhaps Oracle has a license or a sufficient patent portfolio to protect Solaris and Unbreakable Linux users, but the CDDL does not confer any patent protection for ZoL or FreeBSD users.

and an interesting discussion:

https://lwn.net/Articles/631743/


It looks like Ubuntu 16.04 will come with the ZFS kernel module loaded. Also interesting:

http://www.bsdnow.tv/episodes/2015_01_06-zfs_in_the_trenches

which scared the shit out of me, since I have a few ZFS pools that are almost 80% full. If it really takes a month to resilver an almost-full ZFS pool after an HDD failure, I will regret the day I picked ZFS over hardware RAID.
 

But doesn't hardware RAID take even longer since it resilvers all blocks, not just the blocks that had data?
 
I've read that the compiled ZFS module effectively becomes GPLv2 when distributed with Linux, which means the source code needs to be available. The source code is available, of course, but the GPL also requires that it be licensed under the GPL, which it isn't. I've also seen claims from the people behind this issue that while non-GPL code can be distributed with Linux, it can only make use of non-GPL kernel APIs. Apparently Canonical have actually re-implemented some GPL-only Linux functions in ZoL (by basically copying code) to get around this, so when ZoL is loaded, it can claim it doesn't need any GPL-only exports.

-- Begin Offtopic :/ --


Unfortunately not. If you build a new system with a small amount of data on it, ZFS will resilver in seconds, which is great; a traditional RAID controller might take several hours. However, once you have a decent amount of data and you've been writing to the pool for a few months or years (so the data is spread out all over the place), the hardware RAID will still take several hours, but ZFS may take days, if not longer.
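To make the comparison concrete, here is a toy cost model (my own simplification, not derived from any ZFS source; the per-IO costs are made-up assumptions). A hardware RAID rebuild reads every block sequentially, whether used or not, while a ZFS resilver reads only used records, but on an aged pool those reads are effectively random:

```python
# Toy model: hardware RAID rebuild vs ZFS resilver.
# SEQ_MS and RANDOM_MS are assumed per-IO costs, chosen only to
# illustrate that seek-dominated random reads are far more expensive.

SEQ_MS = 0.1     # assumed cost of one sequential block read, in ms
RANDOM_MS = 8.0  # assumed cost of one random (seek-dominated) read, in ms

def raid_rebuild_ms(total_blocks):
    # Hardware RAID rebuilds block(0)..block(EOF) regardless of usage.
    return total_blocks * SEQ_MS

def zfs_resilver_ms(used_blocks, fragmented):
    # ZFS touches only used records; on a fresh pool they are laid out
    # sequentially, on an aged pool they are scattered all over the disk.
    cost = RANDOM_MS if fragmented else SEQ_MS
    return used_blocks * cost

total = 1_000_000

# Fresh, nearly empty pool: ZFS wins by a huge margin.
print(zfs_resilver_ms(10_000, fragmented=False) < raid_rebuild_ms(total))  # True

# Aged pool at 80% full: the random IO penalty dominates.
print(zfs_resilver_ms(800_000, fragmented=True) > raid_rebuild_ms(total))  # True
```

Under these (invented) numbers the crossover comes quickly: once the pool is both full and fragmented, touching fewer blocks no longer compensates for touching them in a random order.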

As someone who loves ZFS, has used it since FreeBSD 7, and uses it all over the place, I consider this one of the biggest issues with ZFS at the moment. When you resilver a disk, ZFS has to read the metadata for every single record in the pool to find out which records need rebuilding. Those records can be scattered all over the place, and they are read in transaction order, which causes massive amounts of random IO. This hits you at the worst possible time: you're already in a degraded state, and now you have to put the pool under heavy load to get it back online.

Oracle have massively improved this in their ZFS. They now have a two stage resilver process. The first stage reads all the metadata for each record, and builds a list of the records that need to be resilvered. The second stage puts those records in disk offset order, then starts the rebuild, so that all the actual data gets read in sequential order (or as close to sequential as it can get). It still probably isn't as quick as standard RAID for near-full pools, as the standard RAID can just instantly start reading through the disks from block(0)-block(EOF), but it's a lot closer. I'm sure people like Matt Ahrens are aware of this and keep an eye on what Oracle are doing with their ZFS, but I really think something like this needs looking at in OpenZFS.
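The two-stage idea described above can be sketched in a few lines. This is only an illustration of the sorting trick, using a hypothetical record layout (real ZFS walks block pointers, not tuples like these):

```python
# Sketch of a two-stage resilver: stage 1 walks metadata in transaction
# (txg) order and collects the records on the failed device; stage 2 sorts
# them by on-disk offset so the actual data reads are close to sequential.
# The record fields here are hypothetical, for illustration only.

def two_stage_resilver(records, damaged_vdev):
    # Stage 1: metadata walk in txg order, collecting only records
    # that live on the damaged device.
    todo = [r for r in records if r["vdev"] == damaged_vdev]
    # Stage 2: reorder by disk offset before issuing the data reads.
    todo.sort(key=lambda r: r["offset"])
    return [r["offset"] for r in todo]  # rebuild order

records = [
    {"txg": 1, "vdev": 0, "offset": 900},
    {"txg": 2, "vdev": 1, "offset": 50},
    {"txg": 3, "vdev": 0, "offset": 100},
    {"txg": 4, "vdev": 0, "offset": 400},
]
print(two_stage_resilver(records, damaged_vdev=0))  # [100, 400, 900]
```

A single-stage resilver would issue the reads in txg order (900, 100, 400), seeking back and forth; the extra sorting pass turns that into one forward sweep across the disk.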
 

This was very informative. Sorry to continue with the off-topic bits, but I wonder whether adding a separate log device during the initial pool creation might mitigate this to some extent. I've read elsewhere that it is the metadata that gets bounced around a lot in the vdev as it is being written, leading to all sorts of fragmentation. Supposedly, with a separate log device, fragmentation can be mitigated to some extent by having the data arranged in a somewhat more orderly fashion before it is fed to the spinning disks. Is this the case?
 
Yet another pissing contest on what exactly a "derivative work" means.

The funny thing is that the CDDL is more "free" than the GPL, and it's the GPL that's the problem here if you ask me. Not the CDDL.

The position of the FSF/GNU folks is obvious: "Code that is not GPL is an abomination unto the LORD. If thou runs this code, it shall be an abomination unto you."
 