Installation of a FreeBSD 9.2 system with ZFS-on-root over GELI
This article explains how to install a FreeBSD 9.2 system on a remote server known as a dedibox. It is closely related to my previous HOWTO on 8.2 and HOWTO on 8.2/dedibox but with several changes due to FreeBSD 9.1 & 9.2.
!! Work has basically stopped on this one, we are at 11.1 now and I need a different setup !! Please go to this article on the new 11.1-based setup.
Table of contents
- Installation of a FreeBSD 9.2 system with ZFS-on-root over GELI
- Table of content
- Prerequisites
- Installed system
- Hardware
- Notes on ZFS, disks and all that
- Constraints
- Custom mfsbsd images
- Booting off the mfsbsd image
- Creation of the customized distribution
- Actual FreeBSD installation
- Finishing up
- Things to remember
- Resources
- Feedback
- History
- Credits
Prerequisites
As with many dedicated servers in a hosting datacenter, access to a console (such as iLO or iDRAC) is mandatory to manipulate BIOS settings, as is access to some kind of rescue mode where you can upload an ISO image and boot from it. The example used here is the Dedibox rescue system (as described there and used in this howto).
You must have a generated mfsbsd image or the ability to generate one (which means an entire /usr/src tree or the files from a release). See the mfsbsd URL above or below for details.
Installed system
- ZFS-only system with Root-on-ZFS
- two disks are in the machine, running with ZFS/mirror
- the main zpool is encrypted with geli(8) so that if a disk needs to be replaced, the data on it stays secure (like there)
- Zpool v28/5000 along with ZFS v5 to get deduplication and performance fixes (as standard in 9.1-9.2)
- two ZFS pools are defined: an unencrypted one containing the minimal booting system and the real, encrypted one that is mounted as /
Hardware
The hardware of choice is the Dedibox PRO R210 system (See there for reference), a rather powerful system with the following characteristics:
- L3426 Nehalem quad-core CPU running at 1.86 GHz
- 16 GB of RAM
- 2 disks of 2 TB each (Hitachi HUA72202 or WDC WD2003FYYS-1) on a LSI2008/H200 HBA
The mps driver that we will be using for the H200 HBA controller now supports the RAID1 option installed by default by the Online people. Still, I intend to keep using the disks in JBOD mode to get the most benefit from ZFS. The first step is to break that RAID1 setup and configure the BIOS in passthrough mode to get the drives “back” as separate devices (da0 and da1). You will have to open a ticket if you are using the same Online.net hosting provider as I do; the procedure may be completely different elsewhere, if not fully manual.
Do not be confused by the “SCSI” naming of the drives. The H200 controller, made by LSI and also called the LSI SAS 2008, is a SAS controller, but SATA2 drives are compatible and will appear as SAS drives (hence the da name).
NOTE: This is the old dedibox model; they now have the new HP-based DL-120G7 system with new CPUs, but I do not know how compatible they are with FreeBSD yet. From various discussions, the controller used by the HP boxes is a P410/P420 driven by the ciss driver and, if you want to use ZFS, you have to fiddle with the RAID settings. The main point is that the ciss driver is less versatile and you will probably have to create a single-volume RAID for each disk, adding a lot of overhead for nothing.
UPDATE: I got my hands on a new dedibox model and the NOTE above applies. The main interest of this new machine for us is that its CPU (Xeon E3-1220 @ 3.1 GHz), in addition to being faster (Sandy Bridge family), also has the AES-NI instructions! This means we need to load another driver module, called aesni, which allows geli(8) to use the hardware instructions for AES. Expect at least a 2x performance improvement, even more with another patch by jmg (soon to be committed to head).
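A quick way to check that the CPU advertises AES-NI and that the module attached is to look at the boot messages; a minimal sketch (the exact output lines will vary):

# Hedged sketch: confirm AES-NI is present and the aesni(4) driver attached.
grep 'Features2' /var/run/dmesg.boot | grep AESNI
kldload aesni          # fails harmlessly if it is already loaded
dmesg | grep -i aesni  # should show: aesni0: <AES-CBC,AES-XTS> on motherboard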
Here are the characteristics of the new dedibox model:
- HP DL120G7 1U system
- Intel Xeon E3-1220 quad-core CPU running at 3.1 GHz (up to 3.5 GHz in Turbo Boost mode)
- 16 GB of RAM
- 2 disks of 2 TB (no way to know the brand due to the note above)
Notes on ZFS, disks and all that
Please go read this article to find useful information on ZFS, disks and how to use them. It is not specific to FreeBSD's ZFS but most of it applies all the same.
Constraints
The fact that we want to use encryption to protect our data is a major constraint on the on-disk architecture and how we lay out/use partitions. /boot/loader has the ability to ask for encryption keys at boot time, but that also means that it must be unencrypted… The alternative would be to have a more complete system installed in the unencrypted pool and to use ssh to connect to the system and attach the encrypted pool.
The main choice to be made is whether we use a plain UFS boot partition and then mount everything from the ZFS pool, or ZFS pools for both. We choose two separate ZFS pools so we can use the other ZFS features such as snapshots, mirroring and so on.
Custom mfsbsd images
If you choose to build your own mfsbsd image (to add a missing driver or equivalent), please see this tutorial (which was originally part of this howto, but it makes sense to separate the two).
The regular mfsbsd generated from 9.2 should have more things than the one I used before but it will still lack the crypto stuff needed by geli(8).
Installation of the mfsbsd image
Put the dedibox server in “rescue mode”. Typically it will be running some form of Linux system like Ubuntu. Now get the image and install it:
sudo -s
wget -O - <url>/mfsimage.img | dd of=/dev/sda bs=1048576
An alternate way of booting the mfsbsd image uses qemu to run the mfsbsd image through the kvm system:
- boot in rescue mode
- install kvm (through apt-get(8) if you use Debian)
then
kvm -hda /dev/sda -cdrom mfsbsd.iso -boot d -curses
If you have the smaller version of the dedibox (named Dedibox SC), you’ll have to use yet another way because the Ubuntu-based rescue system would crash with plain kvm (suggested by iMil, based on this page):
- boot in rescue mode with Ubuntu
- apt-get update
- apt-get install qemu
then:
qemu-system-x86_64 -no-kvm -hda /dev/sda -cdrom FreeBSD-9.2-RELEASE-amd64-bootonly.iso -curses -boot d
Booting off the mfsbsd image
Reboot your dedibox normally by exiting the rescue mode; it should now boot off the mfsbsd image. A few minutes later you should be able to access the system through ssh.
Creation of the customized distribution
In parallel to what we are going to do on the target machine, you can generate your custom distribution (incl. your modified source and kernel configuration) by visiting the release directory and running the following command, after creating a big enough space somewhere to hold the result:
mkdir /data/work/release
make release EXTSRCDIR=/data/work/freebsd/9 EXTPORTSDIR=/usr/ports \
CHROOTDIR=/data/work/release NODOC=yes NOPORTSATALL=yes NOPORTS=yes
Now, you can find the result in /data/work/release under the snapshot’s name:
ls /data/work/release/R/cdrom
bootonly/ disc1/ disc2/ dvd1/ livefs/
and in the one we are most interested in (dvd1):
ls /data/work/release/R/cdrom/dvd1
.cshrc etc/ sbin/
.profile lib/ stand@
9.2-20130309-SNAP/ libexec/ sys@
COPYRIGHT media/ tmp/
bin/ mnt/ usr/
boot/ proc/ var/
rescue/ dev/ root/
cdrom.inf
The main distribution is located even further below, in the 9.2-* snapshot:
base/ doc/ kernels/ proflibs/
catpages/ games/ lib32/ src/
dict/ info/ manpages/
That will need to be copied over to your shiny new server somewhere.
From what I have been experiencing recently, you can even use a regular distribution disk like the -dvd1 or the -memstick one. Files have been moved around slightly in 9.2, so the base distribution files are now in /usr/freebsd-dist and are in tar/xz (aka .txz) format.
Actual FreeBSD installation
We will more or less follow the instructions there. The things we will change are not essential but reflect our special requirements.
We will later be using the dvd1 ISO image to install the system. For now, the mfsbsd image has a subset of the installation disk with enough commands to get you going.
If you have used the generic mfsbsd image described earlier, you will need to get the kernel modules out of the kernel.txz file mentioned earlier (they live under boot/ inside the archive):
cd /tmp
fetch ftp://ftp.fr.freebsd.org/pub/FreeBSD/releases/amd64/amd64/9.1-RELEASE/kernel.txz
tar xf kernel.txz
Now, inside /tmp/boot/kernel, you have all the modules from a standard 9.1 installation. You can now load the missing modules:
cd boot/kernel
kldload ./zlib.ko
kldload ./crypto.ko
kldload ./geom_eli.ko
kldload ./aesni.ko
NOTE: recent versions of mfsbsd (the so-called Special Edition, named -se) include a script called zfsinstall that will do most of what is explained here. The reason we are not using it is that it does not support encrypted partitions at all. If you do not need encryption, just use the script.
Partitioning the drives
As we will be using the two disks in mirror mode, we will replicate on da1 every command we run on da0. That way, if either disk breaks at some point, the system will still be able to find all the information it needs to boot.
Later on, when finished with the partitioning and encryption phases, we will transfer the dvd1 image into memory and mount it as a directory with the mdconfig command. In the meantime, let’s begin.
Before we install our own GPT partition table, we must wipe out the previous partition table installed by the Dedibox installation system.
dd if=/dev/zero of=/dev/da0 bs=512 count=10
dd if=/dev/zero of=/dev/da1 bs=512 count=10
then install our own:
gpart create -s gpt da0
gpart create -s gpt da1
scoite# gpart show
=> 34 3907029101 da0 GPT (1.8T)
34 3907029101 - free - (1.8T)
=> 34 3907029101 da1 GPT (1.8T)
34 3907029101 - free - (1.8T)
Now create the boot partition, the 1st freebsd-zfs partition, swap, then the 2nd freebsd-zfs partition. The first ZFS partition is not big because we only need what is necessary for booting (but we still need some space because of all the kernel modules and the symbols). As we will also be encrypting the swap (it makes no sense to encrypt the data and not the swap) and mirroring it for safety reasons, swap will be twice the RAM (32 GB in our case).
One issue to look out for: now that we have really big hard drives (2 and 3 TB today, soon more), they come with 4 KB sectors and run really slowly with 512-byte writes, so you want your partitions aligned on 4 KB boundaries. From now on, I will therefore be adding “-a 4k” to the gpart(8) command lines:
gpart add -s 64K -a 4k -t freebsd-boot da0
gpart add -s 2G -a 4k -t freebsd-zfs -l boot0 da0
gpart add -s 32G -a 4k -t freebsd-swap -l swap0 da0
gpart add -a 4k -t freebsd-zfs -l tank0 da0
Here we really only need the alignment part because we are using geli(8) on the disks below ZFS: geli uses 4 KB sectors anyway, so ZFS will pick that up at zpool creation time and set the right ashift value (12 in this case). For a zpool on plain disks, you would have to use the gnop(8) trick to present 4 KB sectors to ZFS regardless of the actual sector size.
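For reference, here is a minimal sketch of that gnop(8) trick on a plain (non-geli) pool, assuming a single-disk pool named tank built on gpt/tank0:

# Hedged sketch: gnop(8) advertises 4 KB sectors so the pool gets ashift=12.
gnop create -S 4096 /dev/gpt/tank0
zpool create tank /dev/gpt/tank0.nop
zpool export tank
gnop destroy /dev/gpt/tank0.nop
zpool import tank
zdb -C tank | grep ashift      # should report ashift: 12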
The -a 4k flag is probably only needed on the first call, as the rest will be aligned due to their sizes. You may also encounter smaller drives that protest when the partitions are not aligned. In these cases (recently seen on a Seagate Barracuda 500 GB), you can specify -b 40, which gives a 4k-aligned layout (the next partition then starts at 40+128 = 168, a multiple of 8 sectors, instead of 34+128 = 162, which is not).
We mirror that configuration on da1:
gpart add -s 64K -a 4k -t freebsd-boot da1
gpart add -s 2G -a 4k -t freebsd-zfs -l boot1 da1
gpart add -s 32G -a 4k -t freebsd-swap -l swap1 da1
gpart add -a 4k -t freebsd-zfs -l tank1 da1
NOTE: You cannot use something like gpart backup da0 | gpart restore -F da1 to copy the entire partition table in one go, because the labels would then be identical on both disks.
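If you prefer that route anyway, a hedged sketch (assuming the layout above) is to restore the table and then rename the duplicated labels with gpart modify:

# Hedged sketch: clone the table, then fix the duplicated labels on da1.
gpart backup da0 | gpart restore -F da1
gpart modify -i 2 -l boot1 da1
gpart modify -i 3 -l swap1 da1
gpart modify -i 4 -l tank1 da1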
You can check that the different partitions and labels now exist in /dev/gpt:
scoite# ls /dev/gpt
boot0 boot1 swap0 swap1 tank0 tank1
As we want to be able to boot from either disk, we mark the 2nd partition as a boot candidate:
gpart set -a bootme -i 2 da0
gpart set -a bootme -i 2 da1
You should end up with something like this:
=> 34 3907029101 da0 GPT (1.8T)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-zfs [bootme] (2.0G)
4194466 67108864 3 freebsd-swap (32G)
71303330 3835725805 4 freebsd-zfs (1.8T)
=> 34 3907029101 da1 GPT (1.8T)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-zfs [bootme] (2.0G)
4194466 67108864 3 freebsd-swap (32G)
71303330 3835725805 4 freebsd-zfs (1.8T)
We will put the bootcode in place on both disks; remember that if the first drive fails, you want to be able to boot from the second one.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
Encrypting the disks
Create the keyfile for the partitions; we will use the same passphrase for both, for convenience.
mkdir /root/keys
dd if=/dev/random of=/root/keys/boot.key bs=128k count=1
Now, you have to choose a passphrase (as usual, not too short, not guessable; remember you only need it at boot time). You could choose a different passphrase for each disk, but I would not recommend it because that would give two ciphertexts for the same cleartext (as the two partitions are going to be mirrored and thus contain the exact same data).
geli init -b -K /root/keys/boot.key -s 4096 -l 256 /dev/gpt/tank0
geli init -b -K /root/keys/boot.key -s 4096 -l 256 /dev/gpt/tank1
You will find backups of the geli metadata in /var/backups; it is a good idea to copy them somewhere else as well, just in case.
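You can also generate metadata backups yourself and copy them off the machine; a minimal sketch (the destination host is a placeholder):

# Hedged sketch: take fresh geli metadata backups and store them off-host.
geli backup /dev/gpt/tank0 /root/keys/tank0.eli.meta
geli backup /dev/gpt/tank1 /root/keys/tank1.eli.meta
scp /root/keys/*.eli.meta you@backuphost:geli-backups/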
Attach both drives:
geli attach -k /root/keys/boot.key /dev/gpt/tank0
geli attach -k /root/keys/boot.key /dev/gpt/tank1
NOTE: remember that we loaded the aesni kernel module? If it is used, the above commands should result in “hardware” crypto being reported for the AES-XTS mode, like the following (run dmesg(8)):
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS> on motherboard
GEOM_ELI: Device gpt/tank0.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: hardware
GEOM_ELI: Device gpt/tank1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: hardware
Pool creation
We will also use a different way to create/mount the datasets to make the last part of the install (switching mountpoints) much easier.
Now, create the 1st ZFS partition in mirror mode for the unencrypted part:
zpool create zboot mirror gpt/boot0 gpt/boot1
and the encrypted also mirrored 2nd ZFS partition:
zpool create -o altroot=/mnt -O mountpoint=none tank mirror gpt/tank0.eli gpt/tank1.eli
NOTE: if you used a regular distribution boot disk (like -dvd1) instead of an mfsbsd one, you will find that /boot is read-only, meaning that you will not be able to create a proper /boot/zfs/zpool.cache file. In this case, add -o cachefile=/tmp/zpool.cache at zpool creation time. You will move this file to its proper place before rebooting.
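A minimal sketch of that variant (same pool layout as above; the copy happens later, once the target /boot is writable):

# Hedged sketch: only needed when /boot on the install media is read-only.
zpool create -o cachefile=/tmp/zpool.cache -o altroot=/mnt -O mountpoint=none \
    tank mirror gpt/tank0.eli gpt/tank1.eli
# ...later, before the first reboot:
cp /tmp/zpool.cache /tank/root/boot/zfs/zpool.cache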
The two pools should appear like this:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 1.78T 1.76G 1.78T 0% 1.01x ONLINE -
zboot 1.98G 92.5K 1.98G 0% 1.00x ONLINE -
When we have created some filesystems on the disks, we will set the bootfs property on both pools. I use a separate root filesystem on my pools; it makes changing the / filesystem much simpler and allows having different ones.
Now that the pools have been created, we switch the algorithm used to checksum disk blocks. “fletcher4” is only slightly slower but better (just like CRC16 vs CRC32).
zfs set checksum=fletcher4 zboot
zfs set checksum=fletcher4 tank
Encrypted swap
Swap is slightly different: we will pass the “onetime” command to geli(8) through the way we declare swap in /etc/fstab. That way we do not need to enter any passphrase, because there is no need to know it again once the partition has been attached (see geli(8) for details).
As I said earlier, we will use encrypted swap and geli has automatic setup for that by adding the .eli suffix in /etc/fstab. So let’s create the gmirror configuration for swap.
gmirror label swap gpt/swap0 gpt/swap1
The /etc/fstab entry will look like the following:
/dev/mirror/swap.eli none swap sw 0 0
In this schema, we do not encrypt each swap partition but the mirror itself (swap over geli over gmirror) to avoid doing things twice.
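After the first boot, you can check that the encrypted, mirrored swap is actually in use; a small sketch (assuming the setup above):

# Hedged sketch: verify the swap mirror and its geli layer after boot.
gmirror status swap   # both gpt/swap0 and gpt/swap1 should be listed
swapinfo              # should show /dev/mirror/swap.eli as the swap device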
Boot Environments (BE)
There is an interesting program called beadm, directly inspired by the Illumos utility of the same name, that is very handy for managing multiple versions of the OS (called boot environments) and upgrades. It will be interesting to see whether using beadm is possible in our two-pool environment.
What we will do is create our datasets according to the naming scheme of beadm, to ease a later migration to BEs.
Filesystems
We will globally follow the filesystem layout we used in HOWTO on 8.2 with more or less the same options.
Compression seems to create issues for kernel loading, so we will avoid it on tank/root. All other filesystems will inherit the compression property though.
As for the compression scheme, we can select different algorithms. lz4 is the fastest available (it replaces lzjb, which is still available) but gzip is better compression-wise. FreeBSD 9.2 has a new version of ZFS, using a feature-based numbering scheme instead of a single version number (see the New features in 10 page).
I have not done any benchmarking yet on this pool-wide compression. It may be too slow to use the default gzip compression (-6); please feel free to experiment there. You may wish to only enable compression on selected filesets. For now, I will use lz4 everywhere and disable it for specific cases like distfiles.
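If you do experiment, a small sketch to compare algorithms on an existing fileset (the choice of fileset is just an example):

# Hedged sketch: switch the algorithm on one fileset and watch the ratio.
zfs set compression=gzip-6 tank/usr/src   # or lz4, gzip-9, ...
# (re)write some data into the fileset, then:
zfs get compressratio tank/usr/src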
If you are installing 9.1 (which I do not really recommend), replace lz4 with lzjb.
zfs set compression=lz4 tank
zfs create -o compression=off tank/root
zfs create -o mountpoint=/tank/root/usr tank/usr
zfs create -o mountpoint=/tank/root/usr/obj tank/usr/obj
zfs create -o mountpoint=/tank/root/usr/local tank/usr/local
The reason why I create a separate /usr fileset is that I want different policies for compression, the atime property, snapshots and all that. You can also create another fileset for /usr/local, once again to be able to snapshot the base system and the ports you will be using separately.
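As an illustration of such per-fileset policies, a hedged sketch (these particular settings are examples, not part of the original setup):

# Hedged sketch: different policies on the filesets created above.
zfs set atime=off tank/usr/obj          # no access-time updates on build output
zfs set snapdir=visible tank/usr/local  # expose .zfs/snapshot on the ports area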
To complete what we want under /usr, we will create /usr/src to hold the system’s sources. We will need these to recompile a new, trimmed-down kernel.
zfs create -o mountpoint=/tank/root/usr/src tank/usr/src
Now /var and a few useful filesets with the special properties we care about, to avoid security issues. Do not set exec=off on /tmp, as it would prevent things like installworld from running properly.
zfs create -o mountpoint=/tank/root/var tank/var
zfs create -o exec=off -o setuid=off tank/var/empty
zfs create -o exec=off -o setuid=off tank/var/named
zfs create -o exec=off -o setuid=off tank/var/run
zfs create -o mountpoint=/tank/root/var/tmp tank/var/tmp
zfs set exec=off tank/var/tmp
zfs set setuid=off tank/var/tmp
chmod 1777 /tank/root/var/tmp
zfs create -o mountpoint=/tank/root/tmp tank/tmp
zfs set setuid=off tank/tmp
chmod 1777 /tank/root/tmp
I would also recommend putting users’ home directories in a separate fileset for the same reason, or even one fileset per user if you want to limit each user’s area to a specific size.
zfs create -o mountpoint=/tank/root/home tank/home
Later, you will want to create tank/usr/ports/{distfiles,packages} without compression as well. Properties like snapdir can be changed later on, so we are not forced to set them right now. If you are planning to use the new pkg(1) command to deal with binary packages (aka pkgng), then /usr/ports is not needed.
zfs create -o mountpoint=/tank/root/usr/ports -o setuid=off tank/usr/ports
zfs create -o mountpoint=/tank/root/usr/ports/distfiles -o compression=off -o exec=off -o setuid=off tank/usr/ports/distfiles
zfs create -o mountpoint=/tank/root/usr/ports/packages -o compression=off -o exec=off -o setuid=off tank/usr/ports/packages
If you plan to have jails on this machine, it is a good idea to create a /jails as well:
zfs create -o mountpoint=/tank/root/jails tank/jails
One of the nice things about ZFS is that for many things, you can use zfs create instead of mkdir. It won’t take that much disk space and will allow you to specify different policies for backups/snapshots/compression for every filesystem.
One thing you want to know about ZFS is that it uses the copy-on-write principle and never overwrites data in place. Any time you rewrite a block, a fresh one is written elsewhere and the pointers are updated (a very condensed summary; see the ZFS docs for more details). The main consequence is that when a fileset is completely full, you cannot remove files to make space, as doing so would itself require some free space. A way to mitigate that is to make sure you never fill up a fileset, and you can reserve some space in the “root” fileset for that purpose.
zfs set reservation=512m tank
Deduplication
ZFSv28 and later support an interesting feature (among many others) called deduplication (see deduplication on WP for more details). It needs to be enabled on every fileset you want deduplicated. Beware though that enabling deduplication will make ZFS use much more memory than before, and that you cannot really go back.
zfs set dedup=on tank/usr/src
Afterwards, when you have put some files in there, you can check the deduplication ratio with zpool list. On a 16 GB system with “only” 2 TB of data, deduplication can be enabled without too much trouble.
1008 [15:05] roberto@centre:~> zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
tank 1.78T 23.0G 1.76T 1% 1.09x ONLINE -
zboot 1.98G 390M 1.60G 19% 1.00x ONLINE -
In v5000 of ZFS, dedup can be achieved through byte-by-byte comparison or through the sha256 hash function; the latter is much faster.
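You can also combine the hash with verification and inspect the dedup table; a hedged sketch (the zdb output is illustrative):

# Hedged sketch: sha256 plus byte-by-byte verification on hash collisions.
zfs set dedup=sha256,verify tank/usr/src
# inspect the dedup table (DDT) statistics for the whole pool:
zdb -DD tank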
Installing the system
There are several ways to extract the various parts of the distribution. You can get the -memstick image, one of the cd9660 images, or just download the *.txz files from an FTP site.
Just like in HOWTO on 8.2, we will extract all distributions manually and fetch everything else from the ‘net.
In the example below, I have retrieved one of the -memstick versions, mounted it as an md device on /mnt, and therefore I can find the *.txz files in the /mnt/usr/freebsd-dist directory. Direct download from where you got the kernel.txz works fine as well, of course.
-rw-r--r-- 1 root wheel 782 May 5 07:34 MANIFEST
-rw-r--r-- 1 root wheel 66513440 May 5 07:34 base.txz
-rw-r--r-- 1 root wheel 1442744 May 5 07:34 doc.txz
-rw-r--r-- 1 root wheel 1117756 May 5 07:34 games.txz
-rw-r--r-- 1 root wheel 81061820 May 5 07:34 kernel.txz
-rw-r--r-- 1 root wheel 12291244 May 5 07:34 lib32.txz
-rw-r--r-- 1 root wheel 35504984 May 5 07:34 ports.txz
-rw-r--r-- 1 root wheel 98151876 May 5 07:34 src.txz
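For completeness, a small sketch of the attach/mount step used to obtain the listing above (the image filename is an example, and the exact partition device may differ):

# Hedged sketch: attach the memstick image and mount its UFS partition.
mdconfig -a -t vnode -f FreeBSD-9.2-RELEASE-amd64-memstick.img
mount /dev/md0a /mnt     # the partition may be md0 or md0a depending on the image
ls -l /mnt/usr/freebsd-dist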
Extract all distributions
By default, root uses /bin/csh as its login shell; you need to type sh now in order to cut&paste the examples below.
cd /mnt/usr/freebsd-dist
for i in base doc games kernel lib32 src; do \
xz -d -c $i.txz | tar -C /tank/root/ -xf - ; \
done
Install configuration variables in the proper places
We need to add variables to several files needed for the boot phase; you can use echo(1) or even vi(1).
In /boot/loader.conf:
# fs modules
zfs_load="YES"
geom_mirror_load="YES"
fdescfs_load="YES"
nullfs_load="YES"
# Crypto stuff
geom_eli_load="YES"
crypto_load="YES"
aesni_load="YES"
cryptodev_load="YES"
# pf stuff
pf_load="YES"
pflog_load="YES"
# tuning
vm.kmem_size="32G"
Using the following should not be necessary anymore, as the loader is able to find automatically which dataset will be used:
vfs.root.mountfrom="zfs:tank/root"
For the first boot, in order to verify everything is correct, you can use the following variable to make the passphrase visible:
kern.geom.eli.visible_passphrase="1"
to be removed in production of course.
Then you can add some tunables for your ZFS installation:
# http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061388.html
vfs.zfs.txg.timeout="5"
You also need to point geli(8) at the right bits for it:
geli_da0p4_keyfile0_load="YES"
geli_da0p4_keyfile0_type="da0p4:geli_keyfile0"
geli_da0p4_keyfile0_name="/boot/keys/boot.key"
geli_da1p4_keyfile0_load="YES"
geli_da1p4_keyfile0_type="da1p4:geli_keyfile0"
geli_da1p4_keyfile0_name="/boot/keys/boot.key"
Current recommendations for ZFS tuning include setting kmem_size to between 1.5x and 2x the available RAM. Be careful to use the right device names in the geli_* lines above or you will not be able to attach the encrypted partitions. Do not use geli_tank0_ with gpt/tank0 for example; that will NOT work.
Do not forget to add the following (or another value) to /etc/sysctl.conf or you will have issues at boot time with vnode depletion:
##-- tuning
kern.maxvnodes=260000
Your /etc/rc.conf should have some variables defined to boot properly:
zfs_enable="YES"
sshd_enable="YES"
hostname="hostname.example.com"
ntpd_enable="YES"
ntpd_sync_on_start="YES"
# or ifconfig_em0="DHCP"
ifconfig_em0="inet a.b.c.d netmask 0xffffff00" # or "DHCP"
geli_swap_flags="-e aes -l 256 -s 4096 -d"
(do not forget things like defaultrouter and all that).
If you were working in a chroot to make editing the previous files easier, exit it now.
Finishing up
There are several steps to follow before even rebooting for the first time (you do remember that every time you reboot, you have to log in on the console and enter the encryption passphrase, right?).
Generate zpool.cache
cd /
mkdir /boot/zfs
zpool export tank && zpool import tank
cp /boot/zfs/zpool.cache /tank/root/boot/zfs/
Copy the /boot bits into place for real boot
cp -pR /root/keys /tank/root/boot/
cd /tank/root
mkdir /zboot/boot
cp -Rp boot/* /zboot/boot/
Configuring the encrypted swap in /tank/root/etc/fstab
scoite# cat /tank/root/etc/fstab
/dev/mirror/swap.eli none swap sw 0 0
Another issue to look out for: by default, you won’t be able to get kernel crash dumps on a gmirror device (see gmirror(8) for the details and the solution). We use two scripts run during the boot process to work around that limitation (as we do not want to always use the prefer balance algorithm for mirrored swap):
echo 'gmirror configure -b prefer swap'>>/tank/root/etc/rc.early
echo 'gmirror configure -b round-robin swap'>>/tank/root/etc/rc.local
Fixing mount points
cd /
zfs umount -a
zfs set mountpoint=legacy tank
zfs set mountpoint=/jails tank/jails
zfs set mountpoint=/tmp tank/tmp
zfs set mountpoint=/var tank/var
zfs set mountpoint=/var/empty tank/var/empty
zfs set mountpoint=/var/named tank/var/named
zfs set mountpoint=/var/run tank/var/run
zfs set mountpoint=/var/tmp tank/var/tmp
zfs set mountpoint=/usr tank/usr
zfs set mountpoint=/usr/local tank/usr/local
zfs set mountpoint=/usr/obj tank/usr/obj
zfs set mountpoint=/usr/ports tank/usr/ports
zfs set mountpoint=/usr/ports/distfiles tank/usr/ports/distfiles
zfs set mountpoint=/usr/ports/packages tank/usr/ports/packages
…and so on for all the other filesets you added above, without forgetting to set the bootfs property on the right fileset:
zpool set bootfs=tank/root tank
Things to remember
When your system is up and working, you will at some point want to update it, either to stay close to the branch or because of a security issue or whatever.
Some things to keep in mind:
- zboot is the main source for the early stage of booting. If you update /boot (which lives in the other, encrypted pool tank), you will need to update zboot/boot with the new binaries, BUT do not replace it without keeping the GELI keys which are in /zboot/boot/keys, or you will not be able to boot (see the sketch after this list).
- Be careful when booting: most KVM devices have issues with some keyboards (like iDRAC6 with a Mac one), so when typing your GELI encryption passphrase, check beforehand that the keys you press are the right ones :)
- you still need to create users, install packages, create your jails (if needed) and so on. Do not forget to also assign a password to the root account (personally I almost never use that account, I prefer using my own Calife or sudo(8) for that).
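The zboot refresh mentioned in the first item above could look like this minimal sketch (paths as used throughout this howto; adapt to your layout):

# Hedged sketch: refresh the unencrypted boot pool after updating /boot,
# keeping the GELI key material it must carry.
cp -Rp /zboot/boot/keys /root/keys.save
cp -Rp /boot/* /zboot/boot/
cp -Rp /root/keys.save/* /zboot/boot/keys/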
Resources
I have begun writing this script to put everything into a single .sh.
NOTE: This is a work-in-progress, please check it regularly for updates.
After discovering Ansible as an automation tool, I’m writing an Ansible playbook to make everything mentioned above easier to configure.
You can find this work in progress on Github. All feedback is welcome, patches or pull requests even more so :)
Feedback
Please send any comments, additions or corrections to my FreeBSD mail or my personal mail. Thanks to all who have already done so.
History
1.0 Creation
1.1 Update for recent mfsbsd images
1.2 Update after recent experiment on 9.1/mfsbsd
1.3 Updated with boot environments and easier way to deal with initial mountpoints by Thomas Quinot
1.4 Mention the Ansible playbook as work in progress.
1.5 Mention the gnop(8) trick
Credits
Thanks to these people for input and corrections.
Stéphane “KingBug” Clodic
Paul Guyot paul@semiocast.com
Thomas Quinot thomas@quinot.org
iMil on Twitter