The following is something of a thought exercise, intended to prompt discussion from interested or informed parties.
I've been thinking a lot about the process involved in booting a ZFS-based FreeBSD system, specifically whether it could be made possible to boot from a pool composed of whole disks. I'm a strong believer in using whole disks for pools whenever possible because of the benefits it brings: it eliminates partition alignment issues on devices with sectors larger than 512 bytes and makes device replacement easier after a failure, to name but two. As we are aware, though, there is currently no way of booting a system from a pool that spans whole disks, either in FreeBSD or in Solaris.
I've worked around this limitation on some of my systems by booting from a BSD or GPT partitioned USB stick containing a UFS filesystem with a copy of the /boot directory. The kernel and modules are loaded from the stick, then at the end of kernel initialisation, the root filesystem is mounted from the whole-disk pool. This is an acceptable solution on my system as I'm able to install the USB stick inside the system where it can't easily be removed, but this method might not suit everyone. There's also a slight inelegance in copying the /boot directory to the stick and then having to keep it synchronised with the original in the root filesystem.
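Keeping the stick's copy of /boot in step with the original can at least be checked mechanically. A minimal sketch, assuming the stick's UFS filesystem is mounted at a hypothetical /mnt/usbboot; the function name boot_in_sync is mine and not part of any FreeBSD tool:

```shell
#!/bin/sh
# Succeed (exit status 0) only if two directory trees have identical contents.
# Example call (the mount point is hypothetical):
#   boot_in_sync /boot /mnt/usbboot/boot || echo "stick is out of date"
boot_in_sync() {
    # diff -r recurses into subdirectories; -q only reports that files
    # differ instead of printing the differences. Output is discarded
    # because only the exit status matters here.
    diff -rq "$1" "$2" >/dev/null 2>&1
}
```

This only detects drift; copying the changed files across is still a separate step.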
The first thing I considered is whether it is possible to build a third type of ZFS boot image. The existing two ZFS boot images, zfsboot and gptzfsboot, are clearly designed to probe for zpools based on MBR and GPT partitions respectively. This third type would probe only for whole-disk zpools.
I suspect that such a boot image could be quite a bit simpler and smaller than the others, without needing the ability to interpret partition schemes. It could either be directly dd'able to a USB stick (probably leaving the rest of the device unusable) or, like gptzfsboot, could be written into a freebsd-boot partition on a stick. This would eliminate the need to copy /boot to the stick - there would only be a bare minimum amount of boot code needed on it.
I then considered the possibility of booting from the pool's disks themselves. From experimentation, it appeared that almost all of the first 16KB of a disk used in a whole-disk pool is left unused by ZFS:
Code:
# mdconfig -a -t malloc -s 128M
md0
# hexdump /dev/md0
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
8000000
# zpool create mdpool md0
# hexdump /dev/md0 | head -3
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0003fd0 0000 0000 0000 0000 7a11 b10c da7a 0210
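The same measurement can be made without reading hexdump output by eye. Since hexdump prints 16-bit words, the first non-zero word on the 0003fd0 line sits at byte offset 0x3fd8 (16344), which is 40 bytes short of 16KB. A minimal sketch, assuming only cmp, awk and a readable /dev/zero; first_nonzero_offset is a hypothetical helper name:

```shell
#!/bin/sh
# Print the zero-based byte offset of the first non-zero byte in a file
# or device. Prints nothing if every byte is zero.
# Example call: first_nonzero_offset /dev/md0
first_nonzero_offset() {
    # "cmp -l" lists each differing byte as "<1-based offset> <octal> <octal>".
    # Comparing against /dev/zero makes every non-zero byte a difference.
    # awk converts the first offset to zero-based and exits, which also
    # stops cmp early instead of scanning the whole device.
    cmp -l "$1" /dev/zero 2>/dev/null | awk 'NR == 1 { print $1 - 1; exit }'
}
```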
The ZFS On-Disk Specification document partially confirms the non-use of this 16KB region. ZFS places two 256KB vdev labels at the beginning of a device (and two more at the end). Of each vdev label, the first 8KB is unused by design and the second 8KB is reserved for future "Boot Block Headers".
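The layout described here reduces to a few offsets. A sketch in shell arithmetic using only the sizes quoted above (256KB labels, 8KB unused, 8KB boot block header); the variable names are hypothetical and not taken from the ZFS source:

```shell
#!/bin/sh
# Sizes in bytes, as quoted from the on-disk specification.
LABEL_SIZE=$((256 * 1024))     # one vdev label
BLANK_SIZE=$((8 * 1024))       # first 8KB of a label, unused by design
BOOT_HDR_SIZE=$((8 * 1024))    # second 8KB, reserved for "Boot Block Headers"

# The first label starts at byte 0; the second follows immediately.
L0_START=0
L1_START=$((L0_START + LABEL_SIZE))

# Nominally free space at the very start of the disk.
FREE_AT_START=$((BLANK_SIZE + BOOT_HDR_SIZE))

printf 'first label at %d, second label at %d\n' "$L0_START" "$L1_START"
printf 'nominally free at start of disk: %d bytes (%d sectors of 512 bytes)\n' \
    "$FREE_AT_START" "$((FREE_AT_START / 512))"
```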
Even so, this 16KB region might not be big enough for boot code, judging by the sizes of zfsboot and gptzfsboot, but might simpler whole-disk boot code be small enough to fit there?
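Whether a given image fits is easy to test once one exists. A minimal sketch, taking the byte budget as a parameter; fits_in_region is a hypothetical helper, and /boot/zfsboot and /boot/gptzfsboot in the example calls are the usual installed locations of the existing images:

```shell
#!/bin/sh
# Succeed if FILE exists and is no larger than LIMIT bytes.
# Returns 2 if FILE cannot be read.
# Example calls:
#   fits_in_region /boot/zfsboot 16384 && echo "zfsboot fits"
#   fits_in_region /boot/gptzfsboot 16384 && echo "gptzfsboot fits"
fits_in_region() {
    [ -r "$1" ] || return 2
    # Some wc implementations pad the count with spaces, so strip them.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le "$2" ]
}
```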
If not, the on-disk specification also describes a 3.5MB region of unused space following the initial two 256KB vdev labels. Could a small "zpmbr" be placed in sector zero, akin to GPT's pmbr, containing code that jumps to the 3.5MB region and runs larger second-stage boot code from there?
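For a sense of what such a stub would have to address, the region's position can be worked out from the figures above. A sketch assuming 512-byte sectors and that the region starts directly after the two front labels; the variable names are hypothetical:

```shell
#!/bin/sh
SECTOR=512                                  # assumed sector size in bytes
LABEL_SIZE=$((256 * 1024))                  # one vdev label
REGION_START=$((2 * LABEL_SIZE))            # byte offset just past the two front labels
REGION_SIZE=$((3584 * 1024))                # 3.5MB
REGION_END=$((REGION_START + REGION_SIZE))  # first byte after the region

START_LBA=$((REGION_START / SECTOR))
SECTOR_COUNT=$((REGION_SIZE / SECTOR))

printf 'second stage could occupy LBA %d to %d (%d sectors)\n' \
    "$START_LBA" "$((START_LBA + SECTOR_COUNT - 1))" "$SECTOR_COUNT"
```

On these assumptions the region ends exactly at the 4MB mark, so a stub in sector zero would only need a fixed starting LBA and a sector count.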
I would be interested in hearing anyone's thoughts on this.
References:
ZFS On-Disk Specification - http://hub.opensolaris.org/bin/download/Community+Group+zfs/docs/ondiskformat0822.pdf