HOWTO: FreeBSD ZFS Madness

0. This is SPARTA!

Some time ago I found a good, reliable way of using and installing FreeBSD and described it in my Modern FreeBSD Install [1] [2] HOWTO. Now, more than a year later, I come back with my experiences about that setup and a proposal of a newer and probably better way of doing it.

1. Introduction

Same as a year ago, I assume that You want to create a fresh installation of FreeBSD using one or more hard disks, both with (for laptops) and without GELI based full disk encryption.

This guide was written when FreeBSD 9.0 and 8.3 were available and definitely works for 9.0. I did not try all this on the older 8.3; if You find issues on 8.3, let me know and I will try to address them in this guide.

Earlier, I was not that confident about booting from the ZFS pool, but there is one very neat feature that made me think ZFS boot is now mandatory. If You just smiled, You know that I am thinking about the Boot Environments feature from Illumos/Solaris systems.

In case You are not familiar with the Boot Environments feature, check the Managing Boot Environments with Solaris 11 Express PDF white paper [3]. Illumos/Solaris has the beadm(1M) [4] utility, and while Philipp Wuensche wrote the manageBE script as a replacement [5], it uses the older style from the times when OpenSolaris (and Sun) were still having a great time.
I spent the last couple of days writing an up-to-date, FreeBSD compatible beadm replacement, and with some tweaks from today I just made it available at SourceForge [6] if You wish to test it. Currently it is about 200 lines long, so it should be pretty simple to take a look at it. I tried to make it as compatible as possible with the 'upstream' version, along with some small improvements; it currently supports the basic functions: list, create, destroy and activate.

Code:
# beadm
usage:
  beadm activate <beName>
  beadm create [-e nonActiveBe | -e beName@snapshot] <beName>
  beadm create <beName@snapshot>
  beadm destroy [-F] <beName | beName@snapshot>
  beadm list [-a] [-s] [-D] [-H]
  beadm rename <origBeName> <newBeName>
  beadm mount <beName> [mountpoint]
  beadm { umount | unmount } [-f] <beName>

There are several subtle differences between my implementation and Philipp's. He defines and then relies upon a ZFS property called freebsd:boot-environment=1 for each boot environment; I do not set any additional ZFS properties. There is already the org.freebsd:swap property used for swap on FreeBSD, so we may use org.freebsd:be in the future, but that is just a thought, right now it is not used. My version also supports activating boot environments received with the zfs recv command from other systems (it just updates the appropriate /boot/zfs/zpool.cache file).
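
For illustration only, what happens for such a received environment is roughly equivalent to the manual steps below (the boot environment name received is just an example):

Code:
# beadm mount received /tmp/received
# cp /boot/zfs/zpool.cache /tmp/received/boot/zfs/zpool.cache
# beadm umount received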

My implementation is also style compatible with current Illumos/Solaris beadm(1M) which is like the example below.
Code:
# beadm create -e default upgrade-test
Created successfully

# beadm list
BE           Active Mountpoint Space Policy Created
default      N      /          1.06M static 2012-02-03 15:08
upgrade-test R      -           560M static 2012-04-24 22:22
new          -      -             8K static 2012-04-24 23:40

# zfs list -r sys/ROOT
NAME                    USED  AVAIL  REFER  MOUNTPOINT
sys/ROOT                562M  8.15G   144K  none
sys/ROOT/default       1.48M  8.15G   558M  legacy
sys/ROOT/new              8K  8.15G   558M  none
sys/ROOT/upgrade-test   560M  8.15G   558M  none

# beadm activate default
Activated successfully

# beadm list
BE           Active Mountpoint Space Policy Created
default      NR     /          1.06M static 2012-02-03 15:08
upgrade-test -      -           560M static 2012-04-24 22:22
new          -      -             8K static 2012-04-24 23:40

The boot environments are located in the same place as in Illumos/Solaris, under pool/ROOT/environment.

2. Now You're Thinking with Portals

The main purpose of the Boot Environments concept is to make all risky tasks harmless, to provide an easy way back from possible trouble. Think about upgrading the system to a newer version, updating 30+ installed packages to their latest versions, or testing software and various solutions before making a final decision, and much more. All these tasks are now harmless thanks to Boot Environments, but this is just the tip of the iceberg.

You can now move a desired boot environment to another machine, physical or virtual, and check how it behaves there, check hardware support on that other hardware for example, or make a painless hardware upgrade. You may also clone Your desired boot environment and ... start it as a Jail for some more experiments, or move Your old physical server install into a FreeBSD Jail because it is not that heavily used anymore but still has to be available.

Another good example may be a server freshly created on Your laptop inside a VirtualBox virtual machine. After You finish the creation process and tests, You may move this boot environment to the real server and put it into production, or even move it into a VMware ESX/vSphere virtual machine and use it there.

As You can see, the possibilities with Boot Environments are unlimited.

3. The Install Process

I created 3 possible schemes which should cover most demands; choose one and continue to the next step.

3.1. Server with Two Disks

I assume that this server has 2 disks and that we will create a ZFS mirror across them, so if either of them dies the system will still work as usual. I also assume that these disks are ada0 and ada1. If You have SCSI/SAS drives there, they may be named da0 and da1 accordingly. The procedures below will wipe all data on these disks; You have been warned.

Code:
 1. Boot from the FreeBSD USB/DVD.
 2. Select the 'Live CD' option.
 3. login: root
 4. # sh
 5. # DISKS="ada0 ada1"
 6. # for I in ${DISKS}; do
    > NUMBER=$( echo ${I} | tr -c -d '0-9' )
    > gpart destroy -F ${I}
    > gpart create -s GPT ${I}
    > gpart add -t freebsd-boot -l bootcode${NUMBER} -s 128k ${I}
    > gpart add -t freebsd-zfs -l sys${NUMBER} ${I}
    > gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${I}
    > done
 7. # zpool create -f -o cachefile=/tmp/zpool.cache sys mirror /dev/gpt/sys*
 8. # zfs set mountpoint=none sys
 9. # zfs set checksum=fletcher4 sys
10. # zfs set atime=off sys
11. # zfs create sys/ROOT
12. # zfs create -o mountpoint=/mnt sys/ROOT/default
13. # zpool set bootfs=sys/ROOT/default sys
14. # cd /usr/freebsd-dist/
15. # for I in base.txz kernel.txz; do
    > tar --unlink -xvpJf ${I} -C /mnt
    > done
16. # cp /tmp/zpool.cache /mnt/boot/zfs/
17. # cat << EOF >> /mnt/boot/loader.conf
    > zfs_load=YES
    > vfs.root.mountfrom="zfs:sys/ROOT/default"
    > EOF
18. # cat << EOF >> /mnt/etc/rc.conf
    > zfs_enable=YES
    > EOF
19. # :> /mnt/etc/fstab
20. # zfs umount -a
21. # zfs set mountpoint=legacy sys/ROOT/default
22. # reboot

After these instructions and a reboot, we have the following GPT partitions available; this example is on a 512 MB disk.

Code:
# gpart show
=>     34  1048509  ada0  GPT  (512M)
       34      256     1  freebsd-boot  (128k)
      290  1048253     2  freebsd-zfs  (511M)

=>     34  1048509  ada1  GPT  (512M)
       34      256     1  freebsd-boot  (128k)
      290  1048253     2  freebsd-zfs  (511M)

# gpart list | grep label
   label: bootcode0
   label: sys0
   label: bootcode1
   label: sys1

# zpool status
  pool: sys
 state: ONLINE
 scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        sys           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/sys0  ONLINE       0     0     0
            gpt/sys1  ONLINE       0     0     0

errors: No known data errors

3.2. Server with One Disk

If Your server configuration has only one disk, let's assume it is ada0, then You need different steps 5. and 7.; use these instead of the ones above.

Code:
5. # DISKS="ada0"
7. # zpool create -f -o cachefile=/tmp/zpool.cache sys /dev/gpt/sys*

All other steps are the same.
 
3.3. Road Warrior Laptop

The procedure is quite different for a laptop because we will use the full disk encryption mechanism provided by GELI and then set up a ZFS pool on top of it. It is not currently possible to boot from a ZFS pool on top of an encrypted GELI provider, so we will use a setup similar to the Server with ... one, but with an additional local pool for the /home and /root datasets. It will be password based and You will be asked to type in that password at every boot. The install process is generally the same, with new instructions added for the GELI encrypted local pool.

Code:
 1. Boot from the FreeBSD USB/DVD.
 2. Select the 'Live CD' option.
 3. login: root
 4. # sh
 5. # DISKS="ada0"
 6. # for I in ${DISKS}; do
    > NUMBER=$( echo ${I} | tr -c -d '0-9' )
    > gpart destroy -F ${I}
    > gpart create -s GPT ${I}
    > gpart add -t freebsd-boot -l bootcode${NUMBER} -s 128k ${I}
    > gpart add -t freebsd-zfs -l sys${NUMBER} -s 10G ${I}
    > gpart add -t freebsd-zfs -l local${NUMBER} ${I}
    > gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${I}
    > done
 7. # zpool create -f -o cachefile=/tmp/zpool.cache sys /dev/gpt/sys0
 8. # zfs set mountpoint=none sys
 9. # zfs set checksum=fletcher4 sys
10. # zfs set atime=off sys
11. # zfs create sys/ROOT
12. # zfs create -o mountpoint=/mnt sys/ROOT/default
13. # zpool set bootfs=sys/ROOT/default sys
14. # geli init -b -s 4096 -e AES-CBC -l 128 /dev/gpt/local0
15. # geli attach /dev/gpt/local0
16. # zpool create -f -o cachefile=/tmp/zpool.cache local /dev/gpt/local0.eli
17. # zfs set mountpoint=none local
18. # zfs set checksum=fletcher4 local
19. # zfs set atime=off local
20. # zfs create local/home
21. # zfs create -o mountpoint=/mnt/root local/root
22. # cd /usr/freebsd-dist/
23. # for I in base.txz kernel.txz; do
    > tar --unlink -xvpJf ${I} -C /mnt
    > done
24. # cp /tmp/zpool.cache /mnt/boot/zfs/
25. # cat << EOF >> /mnt/boot/loader.conf
    > zfs_load=YES
    > geom_eli_load=YES
    > vfs.root.mountfrom="zfs:sys/ROOT/default"
    > EOF
26. # cat << EOF >> /mnt/etc/rc.conf
    > zfs_enable=YES
    > EOF
27. # :> /mnt/etc/fstab
28. # zfs umount -a
29. # zfs set mountpoint=legacy sys/ROOT/default
30. # zfs set mountpoint=/home local/home
31. # zfs set mountpoint=/root local/root
32. # reboot

After these instructions and a reboot, we have the following GPT partitions available; this example is on a 4 GB disk.

Code:
# gpart show
=>     34  8388541  ada0  GPT  (4.0G)
       34      256     1  freebsd-boot  (128k)
      290  2097152     2  freebsd-zfs  (1.0G)
  2097442  6291133     3  freebsd-zfs  (3G)

# gpart list | grep label
   label: bootcode0
   label: sys0
   label: local0

# zpool status
  pool: local
 state: ONLINE
 scan: none requested
config:

        NAME              STATE    READ WRITE CKSUM
        local             ONLINE      0     0     0
          gpt/local0.eli  ONLINE      0     0     0

errors: No known data errors

  pool: sys
 state: ONLINE
 scan: none requested
config:

        NAME        STATE    READ WRITE CKSUM
        sys         ONLINE      0     0     0
          gpt/sys0  ONLINE      0     0     0

errors: No known data errors

4. Basic Setup after Install

1. Login as root with empty password.
Code:
login: root
password: [ENTER]

2. Create initial snapshot after install.
# zfs snapshot -r sys/ROOT/default@install

3. Set new root password.
# passwd

4. Set machine's hostname.
# echo hostname=hostname.domain.com >> /etc/rc.conf

5. Set proper timezone.
# tzsetup

6. Add some swap space.
If you used the Server with ... type, then use this to add swap.

Code:
# zfs create -V 1G -o org.freebsd:swap=on \
                   -o checksum=off \
                   -o sync=disabled \
                   -o primarycache=none \
                   -o secondarycache=none sys/swap
# swapon /dev/zvol/sys/swap

If you used the Road Warrior Laptop one, then use the one below; this way the swap space will also be encrypted.

Code:
# zfs create -V 1G -o org.freebsd:swap=on \
                   -o checksum=off \
                   -o sync=disabled \
                   -o primarycache=none \
                   -o secondarycache=none local/swap
# swapon /dev/zvol/local/swap
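
In either case You can verify that the new swap device is in use with swapinfo(8):

Code:
# swapinfo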

7. Create a snapshot called configured or production
After you have configured your fresh FreeBSD system and added the needed packages and services, create a snapshot called configured or production, so if you mess something up, you can always go back in time to a working configuration.

# zfs snapshot -r sys/ROOT/default@configured
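
If something goes wrong later, one possible way back (just a sketch; the restored name is only an example) is to create a new boot environment from that snapshot and activate it:

Code:
# beadm create -e default@configured restored
# beadm activate restored
# shutdown -r now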

5. Enable Boot Environments

Here are some simple instructions on how to download and enable the beadm command line utility for easy Boot Environments administration.

Code:
# fetch -o /usr/sbin/beadm https://downloads.sourceforge.net/project/beadm/beadm
# chmod +x /usr/sbin/beadm
# rehash
# beadm list
BE      Active Mountpoint Space Policy Created
default NR     /           592M static 2012-04-25 02:03

6. WYSIWTF

Now that we have a working ZFS-only FreeBSD system, I will put some examples here of what you can now do with this type of installation and, of course, the Boot Environments feature.

6.1. Create New Boot Environment Before Upgrade

1. Create new environment from the current one.
Code:
# beadm create upgrade
Created successfully

2. Activate it.
Code:
# beadm activate upgrade
Activated successfully

3. Reboot into it.
Code:
# shutdown -r now

4. Mess with it.

You are now free to do anything you like for the upgrade process; even if you break everything, you still have the working default boot environment.
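
If the upgrade does go wrong, the way back is simply to activate the previous environment again and reboot, for example:

Code:
# beadm activate default
Activated successfully
# shutdown -r now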

6.2. Perform Upgrade within a Jail

This concept is about creating a new boot environment from the desired one, let's call it jailed, then starting that new environment inside a FreeBSD jail and performing the upgrade there. After you have finished all tasks related to this upgrade and you are satisfied with the results, shut down that jail, activate the just-upgraded boot environment called jailed and reboot into the upgraded system without any risk.

1. Create new boot environment called jailed.
Code:
# beadm create -e default jailed
Created successfully

2. Create /usr/jails directory.
Code:
# mkdir /usr/jails

3. Set the mountpoint of the new boot environment to the /usr/jails/jailed directory.
Code:
# zfs set mountpoint=/usr/jails/jailed sys/ROOT/jailed

3.1. Make the new jail dataset mountable.
Code:
# zfs set canmount=noauto sys/ROOT/jailed

3.2. Mount new Jail dataset.
Code:
# zfs mount sys/ROOT/jailed

4. Enable FreeBSD Jails mechanism and the jailed jail in /etc/rc.conf file.
Code:
# cat << EOF >> /etc/rc.conf
> jail_enable=YES
> jail_list="jailed"
> jail_jailed_rootdir="/usr/jails/jailed"
> jail_jailed_hostname="jailed"
> jail_jailed_ip="10.20.30.40"
> jail_jailed_devfs_enable="YES"
> EOF

5. Start the jails mechanism.
Code:
# /etc/rc.d/jail start
Configuring jails:.
Starting jails: jailed.

6. Check if the jailed jail started.
Code:
# jls
   JID  IP Address      Hostname                      Path
     1  10.20.30.40     jailed                        /usr/jails/jailed

7. Log in to the jailed jail.
Code:
# jexec 1 tcsh

8. PERFORM ACTUAL UPGRADE.
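
The actual upgrade commands are left to You; as one possible sketch only (the target release is just an example), the base system inside the jailed environment could be upgraded from the host with freebsd-update(8) and its -b option:

Code:
# freebsd-update -b /usr/jails/jailed -r 9.1-RELEASE upgrade
# freebsd-update -b /usr/jails/jailed install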

9. Stop the jailed jail.
Code:
# /etc/rc.d/jail stop
Stopping jails: jailed.

10. Disable Jails mechanism in /etc/rc.conf file.
Code:
# sed -i '' -E s/"^jail_enable.*$"/"jail_enable=NO"/g /etc/rc.conf

11. Activate the just-upgraded jailed boot environment.
Code:
# beadm activate jailed
Activated successfully

12. Reboot into upgraded system.
 
6.3. Import Boot Environment from Other Machine

Let's assume that You need to upgrade or make some major modification to one of Your servers. You will then create a new boot environment from the default one, move it to another 'free' machine, perform these tasks there, and after everything is done, move the modified boot environment back to production without any risk. You may as well transport that environment onto Your laptop/workstation and upgrade it in a Jail as in step 6.2 of this guide.

1. Create new environment on the production server.
Code:
# beadm create upgrade
Created successfully.

2. Send the upgrade environment to test server.
Code:
# zfs send sys/ROOT/upgrade | ssh TEST zfs recv -u sys/ROOT/upgrade
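
Depending on the ZFS version in use, zfs send may require a snapshot rather than a filesystem as its argument; in that case snapshot the environment first and send that snapshot (the @transfer name is only an example):

Code:
# zfs snapshot sys/ROOT/upgrade@transfer
# zfs send sys/ROOT/upgrade@transfer | ssh TEST zfs recv -u sys/ROOT/upgrade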

3. Activate the upgrade environment on the test server.
Code:
# beadm activate upgrade
Activated successfully.

4. Reboot into the upgrade environment on the test server.
Code:
# shutdown -r now

5. PERFORM ACTUAL UPGRADE AFTER REBOOT.

6. Send the upgraded upgrade environment back to the production server.
Code:
# zfs send sys/ROOT/upgrade | ssh PRODUCTION zfs recv -u sys/ROOT/upgrade
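
Note that sys/ROOT/upgrade from step 1 still exists on the production server, so the receive may need -F to force it, or an incremental send from a snapshot both machines share (the snapshot names are only examples):

Code:
# zfs send -i sys/ROOT/upgrade@transfer sys/ROOT/upgrade@upgraded | ssh PRODUCTION zfs recv -F -u sys/ROOT/upgrade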

7. Activate upgraded upgrade environment on the production server.
Code:
# beadm activate upgrade
Activated successfully.

8. Reboot into the upgrade environment on the production server.
Code:
# shutdown -r now


7. References

[1] http://forums.freebsd.org/showthread.php?t=10334
[2] http://forums.freebsd.org/showthread.php?t=12082
[3] http://docs.oracle.com/cd/E19963-01/pdf/820-6565.pdf
[4] http://docs.oracle.com/cd/E19963-01/html/821-1462/beadm-1m.html
[5] http://anonsvn.h3q.com/projects/freebsd-patches/wiki/manageBE
[6] https://sourceforge.net/projects/beadm/

The last part of the HOWTO remains the same as a year ago...

You can now add your users, services and packages as usual on any FreeBSD system, have fun ;)

Added GIT repository: https://github.com/vermaden/beadm
 
As FreeBSD progresses, I thought I would post the updated FreeBSD 10 / 9.2 procedure that I currently use.

The only 'problem' with ZFS now is its fragmentation, which was supposed to be fixed by 'Block Pointer Rewrite', but as we know that did not happen. One of the sources of this fragmentation is that before the data gets written to the pool, ZFS first writes metadata there, then copies the data and finally removes the metadata. That removal of metadata is the main cause of ZFS fragmentation. To eliminate this problem I suggest using a separate ZIL device for each pool. Ideally the ZIL should be mirrored, but if You do a setup for a single disk, then creating a redundant ZIL for a non-redundant pool is useless ...

The ZIL can grow up to half of RAM. While my current box has 16 GB of RAM, I do not think that I will ever see the ZIL filled up to 8 GB, so I have chosen to create a 4 GB ZIL for the 'data' pool and a 1 GB one for the rather small 16 GB 'root' pool.

As GRUB2 becomes more popular in the BSD world (thanks to PC-BSD), You may want to consider using it in the future; that is why I suggest leaving 1 MB of space at the beginning for GRUB2 if needed, in other words the root pool starts after 1 MB.

Code:
       ada0p1  512k  bootcode
       -free-  512k  -free- (total 1 MB in case of GRUB2)
(boot) ada0p2   16g  sys.LZ4
       ada0p3    1g  sys.ZIL
       ada0p4    4g  local.ZIL
       ada0p5     *  local.GELI.LZ4

Here are the commands that I used.

Code:
gpart destroy -F ada0
gpart create -s gpt ada0
gpart add -t freebsd-boot -s   1m -l boot      ada0
gpart add -t freebsd-zfs  -s  16g -l sys       ada0
gpart add -t freebsd-zfs  -s   1g -l sys.zil   ada0
gpart add -t freebsd-zfs  -s   4g -l local.zil ada0
gpart add -t freebsd-zfs          -l local     ada0
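# the 1 MB freebsd-boot partition created above is now deleted and re-created
# with a size of 128k, leaving free space in front of ada0p2 for GRUB2 later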
gpart delete -i 1 ada0
gpart add -t freebsd-boot -s 128k -l boot      ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
geli init -b -s 4096 /dev/gpt/local
geli attach          /dev/gpt/local
zpool create -f local /dev/gpt/local.eli log /dev/gpt/local.zil
zpool create -f sys /dev/gpt/sys log /dev/gpt/sys.zil
zfs set compression=lz4 sys
zfs set compression=lz4 local
zfs set atime=off sys
zfs set atime=off local
zfs set mountpoint=none sys
zfs set mountpoint=none local
zfs create sys/ROOT
zfs create sys/ROOT/default
zpool set bootfs=sys/ROOT/default sys
zfs create local/home
zfs set mountpoint=/mnt sys/ROOT/default
zfs mount sys/ROOT/default
zfs set mountpoint=/mnt/home local/home
zfs mount local/home
cd /usr/freebsd-dist/
tar --unlink -xvpJf base.txz   -C /mnt
tar --unlink -xvpJf src.txz    -C /mnt
tar --unlink -xvpJf lib32.txz  -C /mnt
tar --unlink -xvpJf kernel.txz -C /mnt --exclude '*.symbols'
echo zfs_enable=YES > /mnt/etc/rc.conf
:> /mnt/etc/fstab
cat > /mnt/boot/loader.conf << EOF
zfs_load=YES
aio_load=YES
geom_eli_load=YES
EOF
zfs umount -a
zfs set mountpoint=legacy sys/ROOT/default
zfs set mountpoint=/home local/home
reboot

Code:
pkg (answer 'y' to bootstrap)
pkg add beadm
chmod 1777 /tmp /var/tmp
cp /usr/share/zoneinfo/Europe/Warsaw /etc/localtime
newaliases
passwd
(...)
 
Do you also need to update vfs.root.mountfrom in the new /boot/loader.conf? May want to add a comment about that.
 
Very nice work!

I was dreading scripting the "roll-back" functionality so that I wouldn't make mistakes, and it looks like you've got a huge amount of it done already.

One thing that might be nice in the future would be to provide an option to store the snapshots as read-only and then clone them to activate them. I don't know if the "upstream" version does that. It would allow incremental updating of the snapshots. (I'm planning on keeping the backups on a different machine.)

I may just end up "shipping" a remotely stored read-only snapshot to the target pool for mounting; the incremental approach should work just fine if I do that.
 
jef said:
Very nice work!

I was dreading scripting the "roll-back" functionality so that I wouldn't make mistakes, and it looks like you've got a huge amount of it done already.

Thanks.

jef said:
One thing that might be nice in the future would be to provide an option store the snapshots as read-only and then clone them to activate them. I don't know if the "upstream" version does that. It would allow incremental updating of the snapshots. (I'm planning on keeping the backups on a different machine.)

It may already be possible to do that with the beadm utility.

You can create as many snapshots as you like with the beadm create beName@snapshot command, which is generally the same as the zfs snapshot -r pool/ROOT/beName@snapshot command.

You can then create boot environments from these snapshots with the beadm create -e beName@snapshot beName command (or use zfs clone ...).

You can then activate one of them and reboot into it with beadm activate beName.
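
For example, a minimal sequence (the mybe and test names are only examples) could look like this:

Code:
# beadm create mybe@before-change
# beadm create -e mybe@before-change test
# beadm activate test
# shutdown -r now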

Don't know if you wanted that functionality or something different ;)
 
I'm about to upgrade to 9.0-RELEASE and intend to take the opportunity to start using ZFS for the first time, so I've found beadm to be very useful while trying different ideas, but there's one aspect of my setup that it doesn't manage to deal with.

I initially configured the system with a number of child filesystems as described in the FreeBSD Wiki so I have things like sys/ROOT/usr, sys/ROOT/var etcetera and have the mountpoints defined in /etc/fstab. To get things to work correctly I had to modify the script to update /etc/fstab in each new BE as it is created.

I've managed to produce a script that does what I need with the following changes:
Code:
*** beadm       2012-05-07 17:54:27.000000000 +0100
--- beadm-patched       2012-05-07 21:05:23.000000000 +0100
***************
*** 59,65 ****
--- 59,105 ----
    echo "${1}" | grep -q "@"
  }

+ __be_fstab () {
+ # edit fstab to use the mounts of the BE children
+ MNT="/tmp/BE-$(date +%Y%m%d%H%M%S)"
+ if mkdir ${MNT}
+ then
+    if mount -t zfs ${TARGET_SYSTEM} ${MNT}
+    then
+       if [ $(grep -c ^${SOURCE_SYSTEM} ${MNT}/etc/fstab) != 0 ]
+       then
+          sed -I "" s+^${SOURCE_SYSTEM}+${TARGET_SYSTEM}+ ${MNT}/etc/fstab
+          FSTAB_STATUS=$?
+          if [ ${FSTAB_STATUS} != 0 ]
+          then
+             echo Failed to update ${MNT}/etc/fstab
+          fi
+       else
+          FSTAB_STATUS=0
+       fi
+       umount ${MNT}
+       rmdir ${MNT}
+    else
+       FSTAB_STATUS=1
+       echo "ERROR: Cannot mount ${TARGET_SYSTEM}"
+       rmdir ${MNT}
+    fi
+ else
+    echo "ERROR: Cannot create '${MNT}' directory"
+    FSTAB_STATUS=1
+ fi
+ if [ ${FSTAB_STATUS} != 0 ]
+ then
+    zfs destroy -r ${TARGET_SYSTEM}
+    zfs destroy -r ${SOURCE_SNAPSHOT}
+ fi
+ return ${FSTAB_STATUS}
+ }
+
  __be_new() { # 1=SOURCE 2=TARGET
+   SOURCE_SYSTEM=$(echo ${1} | sed s+@.*++)
+   SOURCE_SNAPSHOT=${1}@${2##*/}
+   TARGET_SYSTEM=${2}
    if __be_snapshot ${1}
    then
      zfs clone ${1} ${2}
***************
*** 94,100 ****
          fi
          zfs clone -o canmount=off ${OPTS} ${FS}@${2##*/} ${DATASET}
        done
!   echo "Created successfully"
  }

  ROOTFS=$( mount | awk '/ \/ / {print $1}' )
--- 134,143 ----
          fi
          zfs clone -o canmount=off ${OPTS} ${FS}@${2##*/} ${DATASET}
        done
!   if __be_fstab
!   then
!      echo "Created successfully"
!   fi
  }

  ROOTFS=$( mount | awk '/ \/ / {print $1}' )
***************
*** 269,290 ****
        (Y|y|[Yy][Ee][Ss])
          if __be_snapshot ${POOL}/ROOT/${2}
          then
!           if ! zfs destroy ${POOL}/ROOT/${2} 1> /dev/null 2> /dev/null
            then
              echo "ERROR: Snapshot '${2}' is origin for other boot environment(s)"
              exit 1
            fi
          else
            ORIGINS=$( zfs list -r -H -o origin ${POOL}/ROOT/${2} )
!           if zfs destroy ${POOL}/ROOT/${2} 1> /dev/null 2> /dev/null
            then
              zfs destroy -r ${POOL}/ROOT/${2} 2>&1 \
!               | grep "${POOL}/ROOT/" \
                | grep -v "@" \
                | while read I
                  do
!                   zfs promote ${I} 2> /dev/null
                  done
            fi
            echo "${ORIGINS}" \
              | while read I
--- 312,334 ----
        (Y|y|[Yy][Ee][Ss])
          if __be_snapshot ${POOL}/ROOT/${2}
          then
!           if ! zfs destroy -r ${POOL}/ROOT/${2} 1> /dev/null 2> /dev/null
            then
              echo "ERROR: Snapshot '${2}' is origin for other boot environment(s)"
              exit 1
            fi
          else
            ORIGINS=$( zfs list -r -H -o origin ${POOL}/ROOT/${2} )
!           if ! zfs destroy -r ${POOL}/ROOT/${2} 1> /dev/null 2> /dev/null
            then
              zfs destroy -r ${POOL}/ROOT/${2} 2>&1 \
!               | grep "^${POOL}/ROOT/" \
                | grep -v "@" \
                | while read I
                  do
!                   zfs promote ${I}
                  done
+             zfs destroy -r ${POOL}/ROOT/${2}
            fi
            echo "${ORIGINS}" \
              | while read I

The main change is the introduction of the __be_fstab function. This works OK if you create a BE from an existing one, but there is a problem creating BEs from snapshots; I need to do a bit more work to sort this out.

The other changes further down the script were to fix problems I came across when deleting BEs. I don't think they are directly related to my fix for fstab; could they be bugs which crept in when the script was recently changed from && {} || {} syntax to if/then/else syntax?
 
rawthey said:
I had to modify the script to update /etc/fstab in each new BE as it is created.

I've managed to produce a script that does what I need with the following changes (...)

The /etc/fstab workaround was already used in manageBE; I wanted to avoid that and I succeeded. Now, while cloning the boot environment (along with its child datasets), I clone their properties (including mountpoints) and manipulate the canmount property to avoid double/unwanted mounts.

Currently I am working on beadm with Bryan Drewery, the latest efforts are available here: https://github.com/vermaden/beadm

If we do not find other issues we will 'brand' that as 0.5 and update the port also.

rawthey said:
The other changes further down the script were to fix problems I came across when deleting BE's, I don't think they are directly related to my fix for fstab, could they be bugs which crept in when the script was recently changed from using && {} || {} syntax to if/then/else syntax?

Yes, there was a bug introduced in the process of 'porting' beadm from &&-|| to if-then-fi; precisely, if ... was used instead of if ! ... It is fixed now.

rawthey said:
(...) sys/ROOT/usr, sys/ROOT/var (...)

These should be sys/ROOT/beName/usr and sys/ROOT/beName/var to properly use boot environments; You can of course 'migrate' your sys/ROOT/* datasets to sys/ROOT/beName/* with ZFS.
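
A rough sketch of such a migration with zfs rename (dataset names are only examples, and it is best done while the system is booted from other media so the datasets are not in use):

Code:
# zfs rename sys/ROOT/usr sys/ROOT/default/usr
# zfs rename sys/ROOT/var sys/ROOT/default/var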
 
vermaden said:
The /etc/fstab workaround was already used in manageBE; I wanted to avoid that and I succeeded. Now, while cloning the boot environment (along with its child datasets), I clone their properties (including mountpoints) and manipulate the canmount property to avoid double/unwanted mounts.

Being very new to ZFS I think I must have messed up the mountpoints when converting to the sys/ROOT/bename structure and ended up using legacy mounts. That was before beadm had been changed to copy properties while cloning. Although I noticed the change I failed to realise the significance of it and continued working on my fixes. I've now reset my ZFS mountpoints correctly, abandoned my fixes and downloaded the latest version.

Everything is working fine with the new version except for an issue with creating BEs from snapshots. Using beadm create be6@snaptest only produced the top level snapshot without any descendants. I managed to create all the descendant snapshots and get rid of a spurious "ERROR: Cannot create 'be6@snaptest' snapshot" message by changing line 173 to

if ! zfs snapshot -r ${POOL}/ROOT/${2} 2> /dev/null

When I tried to create a new BE from an existing snapshot with beadm create -e be6@snaptest fromsnap it failed at line 78 with "cannot open 'sys/ROOT/be6@snaptest': operation not applicable to datasets of this type".

vermaden said:
These should be sys/ROOT/beName/usr, sys/ROOT/beName/var to properly use boot environments, You can of course 'migrate' your sys/ROOT to sys/ROOT/beName with ZFS.
Yes, that was my typo in the post, I did use sys/ROOT/beName/usr in the system.
 
So, there is a difference in file system layout? At first, I used the layout from the wiki, resulting in something like this:
Code:
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
rpool                                       50.2G   241G    22K  none
rpool/HOME                                   235K   241G    33K  /home
rpool/HOME/alvin                             170K   241G   170K  /home/alvin
rpool/ROOT                                  3.01G   241G    22K  none
rpool/ROOT/9.0-RELEASE                      3.01G   241G   349M  legacy
rpool/ROOT/9.0-RELEASE/tmp                   720K   241G   720K  /tmp
rpool/ROOT/9.0-RELEASE/usr                  1.31G   241G   309M  /usr
rpool/ROOT/9.0-RELEASE/usr/local             459M   241G   459M  /usr/local
rpool/ROOT/9.0-RELEASE/usr/ports             573M   241G   269M  /usr/ports
rpool/ROOT/9.0-RELEASE/usr/ports/distfiles   300M   241G   300M  /usr/ports/distfiles
rpool/ROOT/9.0-RELEASE/usr/ports/packages   3.19M   241G  3.19M  /usr/ports/packages
rpool/ROOT/9.0-RELEASE/usr/src                23K   241G    23K  /usr/src
rpool/ROOT/9.0-RELEASE/var                   832M   241G  1.17M  /var
rpool/ROOT/9.0-RELEASE/var/crash            23.5K   241G  23.5K  /var/crash
rpool/ROOT/9.0-RELEASE/var/db                829M   241G   827M  /var/db
rpool/ROOT/9.0-RELEASE/var/db/pkg           1.46M   241G  1.46M  /var/db/pkg
rpool/ROOT/9.0-RELEASE/var/empty              22K   241G    22K  /var/empty
rpool/ROOT/9.0-RELEASE/var/log              1.86M   241G  1.86M  /var/log
rpool/ROOT/9.0-RELEASE/var/mail               86K   241G    86K  /var/mail
rpool/ROOT/9.0-RELEASE/var/run              63.5K   241G  63.5K  /var/run
rpool/ROOT/9.0-RELEASE/var/tmp                36K   241G    36K  /var/tmp

Then I discovered manageBE, and installed like this:
Code:
NAME                        USED  AVAIL  REFER  MOUNTPOINT
rpool                      2.79G   222G   144K  none
rpool/ROOT                  687M   222G   144K  none
rpool/ROOT/9.0-RELEASE      687M   222G   687M  legacy
rpool/home                  352K   222G   152K  /home
rpool/home/alvin            200K   222G   200K  /home/alvin
rpool/tmp                   176K   222G   176K  /tmp
rpool/usr                  1.98G   222G   144K  /usr
rpool/usr/local             351M   222G   351M  /usr/local
rpool/usr/ports             849M   222G   848M  /usr/ports
rpool/usr/ports/distfiles   144K   222G   144K  /usr/ports/distfiles
rpool/usr/ports/packages    144K   222G   144K  /usr/ports/packages
rpool/usr/src               826M   222G   826M  /usr/src
rpool/var                   145M   222G   724K  /var
rpool/var/crash             148K   222G   148K  /var/crash
rpool/var/db                143M   222G   143M  /var/db
rpool/var/db/pkg            292K   222G   292K  /var/db/pkg
rpool/var/empty             144K   222G   144K  /var/empty
rpool/var/log               472K   222G   472K  /var/log
rpool/var/mail              156K   222G   156K  /var/mail
rpool/var/run               224K   222G   224K  /var/run
rpool/var/tmp               152K   222G   152K  /var/tmp

If I understand correctly, for beadm, the first method is needed and all filesystems below rpool/9.0-RELEASE will also be cloned. Is that right?
 
rawthey said:
Everything is working fine with the new version except for an issue with creating BEs from snapshots. Using beadm create be6@snaptest only produced the top level snapshot without any descendants. I managed to create all the descendant snapshots and get rid of a spurious "ERROR: Cannot create 'be6@snaptest' snapshot" message by changing line 173 to

if ! zfs snapshot -r ${POOL}/ROOT/${2} 2> /dev/null

When I tried to create a new BE from an existing snapshot with beadm create -e be6@snaptest fromsnap it failed at line 78 with "cannot open 'sys/ROOT/be6@snaptest': operation not applicable to datasets of this type".

Thanks for finding these bugs. I fixed them, fixed several others as well, and even added a new rename feature; the latest work is available at GitHub/SourceForge.

rawthey said:
Yes, that was my typo in the post, I did use sys/ROOT/beName/usr in the system.
Ok.


serverhamster said:
So, there is a difference in file system layout?

(...)

If I understand correctly, for beadm, the first method is needed and all filesystems below rpool/9.0-RELEASE will also be cloned. Is that right?

That depends on how You want to use boot environments. If You want to clone EVERYTHING, then put all the other mountpoints under $pool/ROOT/$beName/*, but You may want to use Boot Environments on a more basic level and use them only for the base system, keeping /usr or /var aside; it's up to you.
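
For example, on a freshly installed system a per-BE layout could be sketched like this (dataset names and mountpoints are only examples, following the sys/ROOT/default convention from this HOWTO):

Code:
# zfs create -o mountpoint=/usr sys/ROOT/default/usr
# zfs create -o mountpoint=/var sys/ROOT/default/var
# zfs create -o mountpoint=/tmp sys/ROOT/default/tmp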

I personally experimented with many ZFS concepts; for example I tried an approach that I call 'Cloneable ZFS Namespaces', which looks something like this:

Code:
% zfs list -o name
NAME
sys
sys/PORTS
sys/PORTS/current
sys/PORTS/current/compat
sys/PORTS/current/usr
sys/PORTS/current/usr/local
sys/PORTS/current/usr/ports
sys/PORTS/current/var
sys/PORTS/current/var/db
sys/PORTS/current/var/db/pkg
sys/PORTS/current/var/db/ports
sys/PORTS/current/var/db/portsnap
sys/PORTS/release90
sys/PORTS/release90/compat
sys/PORTS/release90/usr
sys/PORTS/release90/usr/local
sys/PORTS/release90/usr/ports
sys/PORTS/release90/var
sys/PORTS/release90/var/db
sys/PORTS/release90/var/db/pkg
sys/PORTS/release90/var/db/ports
sys/PORTS/release90/var/db/portsnap
sys/PORTS/usr/ports/obj
sys/ROOT
sys/ROOT/default
sys/ROOT/default-upgrade
sys/ROOT/jailed
sys/ROOT/upgrade-jailed
sys/SRC
sys/SRC/release90
sys/SRC/release90/usr
sys/SRC/release90/usr/src
sys/SRC/stable90
sys/SRC/stable90/usr
sys/SRC/stable90/usr/src
sys/SRC/current10
sys/SRC/current10/usr
sys/SRC/current10/usr/src
sys/SRC/usr/obj
sys/HOME
sys/HOME/vermaden
sys/SWAP

With these 'Cloneable ZFS Namespaces' You can mix and change various parts of the FreeBSD system on the fly, for example change the source tree You are using without the need to redownload everything, or just keep several source trees around.

You can use Ports/packages from the RELEASE, but at the same time You can have a full set of up-to-date packages that You can switch to and go back again to the RELEASE ones.

It would of course be possible to implement these 'Cloneable ZFS Namespaces' in the beadm utility.

So You would run beadm list -t SRC, for example, to list the source trees on the system, or beadm list -t PORTS to list the available package sets.
 
There still seems to be an issue with beadm create -e beName@snapshot beName which fails if the new BE name doesn't match the name of the source snapshot.

This works
Code:
fbsd9:/root# beadm list
BE    Active Mountpoint Space Policy Created
oldbe -      -          49.5K static 2012-05-06 21:00
be3   -      -          49.5K static 2012-05-06 21:32
be4   -      -           264K static 2012-05-06 21:41
be5   -      -          1.07M static 2012-05-06 21:42
be6   NR     /          7.56G static 2012-05-08 09:34
fbsd9:/root# beadm create be4@snaptest
Created successfully
fbsd9:/root# beadm create -e be4@snaptest snaptest
Created successfully
fbsd9:/root# beadm list
BE       Active Mountpoint Space Policy Created
oldbe    -      -          49.5K static 2012-05-06 21:00
be3      -      -          49.5K static 2012-05-06 21:32
be4      -      -           264K static 2012-05-06 21:41
be5      -      -          1.07M static 2012-05-06 21:42
be6      NR     /          7.56G static 2012-05-08 09:34
snaptest -      -            15K static 2012-05-10 09:53

but this doesn't

Code:
bsd9:/root# beadm create -e be4@snaptest fromsnap
cannot open 'sys/ROOT/be4@fromsnap': dataset does not exist
fbsd9:/root# exit

I was able to get it to handle all cases with this patch

Code:
*** /sbin/beadm 2012-05-10 09:52:32.199568612 +0100
--- /tmp/beadm  2012-05-10 10:16:06.190956035 +0100
***************
*** 101,107 ****
          then
            local OPTS=""
          fi
!         zfs clone -o canmount=off ${OPTS} ${FS}@${2##*/} ${DATASET}
        done
    echo "Created successfully"
  }
--- 101,112 ----
          then
            local OPTS=""
          fi
!       if  __be_snapshot ${1}
!       then
!           zfs clone -o canmount=off ${OPTS} ${FS}@${1##*@} ${DATASET}
!       else
!           zfs clone -o canmount=off ${OPTS} ${FS}@${2##*/} ${DATASET}
!       fi
        done
    echo "Created successfully"
  }
 
I had a bit of a problem when I came to copy my test system from VirtualBox onto a real disk. First I created a minimal system as sys/ROOT/default as described above, then I copied my BE from VirtualBox with zfs receive -u and ended up with this...
Code:
# beadm list

BE      Active Mountpoint Space Policy Created
default NR     /           592M static 2012-05-11 13:15
be6     -      -          7.56G static 2012-05-11 22:05

# zfs list -o name,canmount,mountpoint

NAME                              CANMOUNT  MOUNTPOINT
sys                                     on  none
sys/ROOT                                on  legacy
sys/ROOT/be6                        noauto  legacy
sys/ROOT/be6/tmp                    noauto  /tmp
sys/ROOT/be6/usr                    noauto  /usr
sys/ROOT/be6/usr/ports              noauto  /usr/ports
sys/ROOT/be6/usr/ports/distfiles    noauto  /usr/ports/distfiles
sys/ROOT/be6/usr/ports/packages     noauto  /usr/ports/packages
sys/ROOT/be6/usr/src                noauto  /usr/src
sys/ROOT/be6/var                    noauto  /var
sys/ROOT/be6/var/db                 noauto  /var/db
sys/ROOT/be6/var/db/pkg             noauto  /var/db/pkg
sys/ROOT/be6/var/empty              noauto  /var/empty
sys/ROOT/be6/var/log                noauto  /var/log
sys/ROOT/be6/var/mail               noauto  /var/mail
sys/ROOT/be6/var/run                noauto  /var/run
sys/ROOT/be6/var/tmp                noauto  /var/tmp
sys/ROOT/default                        on  legacy
sys/swap

Then I used beadm to activate the new BE. This completed without any error messages, but also without the expected "Activated successfully" message. The output from beadm list gave the impression that the BE had been activated OK, but further investigation showed that all of the descendant filesystems still had canmount set to noauto.

Code:
# beadm activate be6
# beadm list

BE      Active Mountpoint Space Policy Created
default N      /           592M static 2012-05-11 13:15
be6     R      -          7.56G static 2012-05-11 22:05

# zfs list -o name,canmount,mountpoint

NAME                              CANMOUNT  MOUNTPOINT
sys                                     on  none
sys/ROOT                            noauto  legacy
sys/ROOT/be6                            on  legacy
sys/ROOT/be6/tmp                    noauto  /tmp
sys/ROOT/be6/usr                    noauto  /usr
sys/ROOT/be6/usr/ports              noauto  /usr/ports
sys/ROOT/be6/usr/ports/distfiles    noauto  /usr/ports/distfiles
sys/ROOT/be6/usr/ports/packages     noauto  /usr/ports/packages
sys/ROOT/be6/usr/src                noauto  /usr/src
sys/ROOT/be6/var                    noauto  /var
sys/ROOT/be6/var/db                 noauto  /var/db
sys/ROOT/be6/var/db/pkg             noauto  /var/db/pkg
sys/ROOT/be6/var/empty              noauto  /var/empty
sys/ROOT/be6/var/log                noauto  /var/log
sys/ROOT/be6/var/mail               noauto  /var/mail
sys/ROOT/be6/var/run                noauto  /var/run
sys/ROOT/be6/var/tmp                noauto  /var/tmp
sys/ROOT/default                    noauto  legacy
sys/swap

The problem arises in the loop at the end of the activation section, where zfs promote ${I} 2> /dev/null fails due to sys/ROOT/be6 not being a cloned filesystem. Since errexit is set at the start of the script, it silently bails out without processing the remaining filesystems because the command is not explicitly tested.

This patch seems to fix things:
Code:
*** beadm	2012-05-11 22:36:44.000000000 +0100
--- /tmp/beadm	2012-05-11 22:35:46.000000000 +0100
***************
*** 260,270 ****
            zfs set canmount=noauto ${I}
          done
      # Enable mounting for the active BE and promote it
!     zfs list -H -o name -t filesystem -r ${POOL}/ROOT/${2} \
!       | while read I
          do
            zfs set canmount=on ${I} 2> /dev/null
!           zfs promote ${I} 2> /dev/null
          done
      echo "Activated successfully"
      ;;
--- 260,273 ----
            zfs set canmount=noauto ${I}
          done
      # Enable mounting for the active BE and promote it
!     zfs list -H -o name,origin -t filesystem -r ${POOL}/ROOT/${2} \
!       | while read I ORIGIN
          do
            zfs set canmount=on ${I} 2> /dev/null
!           if [ ${ORIGIN} != "-" ]
!           then
!             zfs promote ${I}
!           fi
          done
      echo "Activated successfully"
      ;;
 
vermaden said:
Merged, thanks again for in-depth testing ;)

It's been a really interesting exercise. As a newcomer to ZFS it's been a good way to learn about it, and discover some more Bourne shell scripting tricks.

I've just discovered that "interesting" things happen if you try to activate a BE while one of its filesystems is already mounted. This might happen if you want to explore the filesystem to confirm that you've chosen the right BE to activate and then forget to unmount it. ZFS remounts the filesystem on its defined mountpoint when canmount is set to on :(
Code:
fbsd9:/root# beadm list
BE  Active Mountpoint Space Policy Created
be6 NR     /          7.56G static 2012-05-08 09:34
be7 -      -           445K static 2012-05-14 14:54

fbsd9:/root# mount | grep be7
sys/ROOT/be7/tmp on /mnt (zfs, local, noatime, nosuid, nfsv4acls)

fbsd9:/root# beadm activate be7
Activated successfully
fbsd9:/root# beadm list
BE  Active Mountpoint Space Policy Created
be6 N      /           537K static 2012-05-08 09:34
be7 R      -          7.56G static 2012-05-14 14:54

fbsd9:/root# mount | grep be7
sys/ROOT/be7/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)

Would it be worth including something along the lines of this patch?
Code:
*** beadm       2012-05-14 17:28:57.967886169 +0100
--- /tmp/beadm  2012-05-14 22:16:41.864615636 +0100
***************
*** 56,61 ****
--- 56,80 ----
    fi
  }

+ __be_is_unmounted() { # 1=BE name
+   local MOUNTED=0
+   mount | awk "/^${POOL}\/ROOT\/${1}/ {print \$1,\$3}" \
+     |{ while read FILESYSTEM MOUNTPOINT
+       do
+         if [ ${MOUNTED} == 0 ]
+         then
+           echo "ERROR: The following filesystem(s) must be unmounted before ${1} can be activated"
+           MOUNTED=1
+         fi
+         echo "     ${FILESYSTEM} on ${MOUNTPOINT}"
+       done
+   if [ ${MOUNTED} != 0 ]
+     then
+     exit 1
+   fi
+   }
+ }
+
  __be_snapshot() { # 1=DATASET/SNAPSHOT
    echo "${1}" | grep -q "@"
  }
***************
*** 207,212 ****
--- 226,235 ----
        echo "Already activated"
        exit 0
      else
+       if [ $2 != ${ROOTFS##*/} ]
+       then
+         __be_is_unmounted ${2}
+       fi
        if [ "${ROOTFS}" != "${POOL}/ROOT/${2}" ]
        then
          TMPMNT="/tmp/BE"

In the process of testing this out I came across another side effect, probably a result of the same ZFS bug/feature with setting canmount. Starting with be6 as the active BE, if I activated be7 and then reactivated be6 without rebooting, I ended up with canmount set to noauto for all the filesystems in both BEs. Deleting the redirection to /dev/null from the line zfs set canmount=on ${I} 2> /dev/null produced the error message "cannot unmount '/': Invalid argument". I suspect the only way around this is to change the test for $2 != ${ROOTFS##*/} to fail with an error if they are equal, in which case I think the subsequent if [ "${ROOTFS}" != "${POOL}/ROOT/${2}" ] test might become redundant.
 