Complete ZFS-based installation on local FreeBSD 8.2

This article explains how to configure and install a ZFS-only FreeBSD system, either in mirror or in raidz mode. This is for a local installation. If you are interested in installing a remote (hosted) machine and/or are planning to use FreeBSD 9.x see the other HOWTO.

Table of contents

  1. Complete ZFS-based installation on local FreeBSD 8.2
  2. Table of contents
  3. Prerequisites
  4. VMWare setup
  5. Important notice
  6. Architecture
  7. Mirror setup
  8. Raidz setup
  9. Common steps now that we have pool setup
  10. Checking everything is correct
  11. Feedback
  12. History
  13. Credits

Prerequisites

The dvd1 ISO image is enough to get you started; you don’t need the livefs one, as the dvd1 image can itself be used as a “fixit” medium.

NOTE: this HOWTO has only been tested with 8.0 snapshots issued after BETA4, because some essential ZFS fixes were merged there. The latest 7.2 branch has more or less the same version of ZFS (pool v13), so it should be possible to use a 7.2-STABLE snapshot (maybe even 7.2-RELEASE) for the installation.

I’m going the GPT way mostly because I think it is more flexible and more portable (although that may or may not be an issue for you). Note also that MBR slices are limited to 2 TB (again, this may not be an issue).

If you don’t see any added value in GPT, you can still use your usual MBR-based setup; see the FreeBSD Wiki.

I’m not going to swap on a zvol as I am not sure it is mature enough, so a partition will be allocated on each drive for swap. I do not see the advantage of including swap in the zpool anyway: for performance reasons you want swap to be contiguous, which is not guaranteed inside the pool.

In both the mirror and raidz cases you’ll need to compute the various values to feed to the gpart(8) command. I haven’t tried raidz2, though I do not expect it to be different from raidz1. The values for gpart(8) can be inferred by looking at gpart show; it is mostly a matter of finding the remaining free space up to the end of the disk.

Modern gpart(8) is able to find start and end points automatically (for example, using the rest of the disk for the last partition) and can take values such as 1G, which makes it easier to set up the partitions.
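
For example, once a partitioning scheme exists on a disk, the current layout and remaining free space can be inspected with (the device name is illustrative):

gpart show da0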

After booting the dvd1 ISO image from a CD (or, in my VMWare case, attaching the image to the VM), you must stop the installation at the loader menu by pressing 6 at the prompt. That will get you to the /boot/loader prompt. Just issue the following command:

    load zfs

You should see the loader open zfs.ko and opensolaris.ko as a dependency. It is now time to change the default vm.kmem_size loader tunable. On amd64 (aka x86_64), you can set it rather high to give lots of memory for ZFS.
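
For example, for this 768 MB VM, something like the following at the loader prompt should do; the exact value is only an assumption, size it according to your RAM:

    set vm.kmem_size="512M"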

Now complete the boot process by typing

    boot

Now you enter sysinstall, the main installation program. Go to the “Fixit” menu entry and specify that you want to load the fixit data from the CD/DVD drive. You are now at the Fixit# prompt and ready to partition the disks. By the way, don’t forget to change the keyboard type if you are on a non-QWERTY keyboard.
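
One possible way to do that from the Fixit shell, assuming kbdcontrol and the keymap files are reachable in that environment (a French keyboard is shown as an example):

kbdcontrol -l fr.iso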

There is work in progress by Alexander Motin (mav@) to write a CAM interface to ATA/SATA devices. The result is the AHCI driver. With AHCI, these devices are seen through the CAM layer and available the same way as SCSI/SAS devices (with pass devices and so on). Thus we can use camcontrol(8) to talk to the devices.

The first result is that the devices will appear as ada{0,1,2,...} instead of the usual (and not very logical, IMHO) ATA naming of ad{4,6,8}. The nice thing about using GPT and ZFS is that this is entirely transparent: ZFS will happily mount pools even though the device names have changed.

You will need an AHCI-compatible motherboard/chipset and BIOS, and you will have to change the BIOS settings to use AHCI instead of what is generally called “legacy mode” ATA. Another advantage is that many modern disks will offer NCQ (the ATA counterpart of TCQ, command tagging in the SCSI/SAS world).
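
Once booted with AHCI enabled, you can list the devices as seen through the CAM layer with, for instance:

camcontrol devlist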

VMWare setup

I use VMWare for Mac, named Fusion 2.0.something. I use SCSI “disks” as they are more flexible. The current target is a 768 MB amd64 VM.

In the following, da0..da2 means that you have to repeat the command for each device; it is not meant to be typed literally.

Important notice

Booting from ZFS has one important limitation that one must be aware of: the boot pool can only contain one vdev (i.e. one mirror or one raidz). You cannot add another mirror later and turn the pool into a RAID 0+1 setup.

You can have multiple pools of course.
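
So if you need more vdevs later, the usual approach is a second, non-booting pool; a minimal sketch (the pool name and device names are illustrative, not part of this setup):

zpool create data mirror da2p3 da3p3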

Architecture

I’m going to present two main modes of constructing the booting pool:

  1. mirror
  2. raidz

There are also three interesting types of vdev (bear in mind that this is not a full explanation on how ZFS works):

  1. cache
  2. log
  3. spare

Cache vdev

To speed up I/Os, one can add caching devices (disks, partitions, and so on) to a given pool. They cannot be part of a raidz or a mirror, only independent devices. This helps especially if the device is faster than the regular disks (like an SSD). Failures on a cache device are flagged but otherwise ignored (i.e. the I/O will simply take place on the regular vdevs).
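
Adding a cache device to an existing pool is a one-liner; a sketch (the device name is illustrative):

zpool add tank cache da3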

Log vdev

ZFS has an intent log called the ZIL. It can be kept separate from the main devices of the pool, again for example on an SSD. Log devices can (and should, IMO) be mirrored (not raidz). Errors on log vdevs are mostly fatal, so be careful there.
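
A mirrored log can be added along the same lines (device names are illustrative):

zpool add tank log mirror da4 da5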

Spares

You can have hot-spares as well of course. Spares will not be used to replace log vdevs!
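
Same idea for a hot spare (device name illustrative):

zpool add tank spare da6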

Mirror setup

2x 8 GB drives

gpart create -s GPT da0..da1

Now gpart show will show you the entire disk. Take its size, subtract (swapsize + 162) and you have the size argument for the freebsd-zfs partition below.

gpart add -b 34 -s 128 -t freebsd-boot da0..da1            64 KB boot
gpart add -b 162 -s 1G -t freebsd-swap -l swapN da0..da1   1 GB swap
gpart add  -t freebsd-zfs da0..da1                         7.5 GB rest

The -l swapN parameter will be swap0 for da0 and so on.

Installing the boot code can be delayed until it has been rebuilt, but it does not hurt to do it now.

gpart bootcode -b /dist/boot/pmbr -p /dist/boot/gptzfsboot -i 1 da0..da1
zpool create tank mirror da0p3 da1p3

Raidz setup

NEWS ALERT: mostly working now!

It will work in 9.0 (which is CURRENT for now) and will probably be merged into 8.1. There were some 64-bit miscalculations that have now been fixed.

3x 4 GB drives

gpart create -s GPT da0..da2

Now gpart show will show you the entire disk. Take its size, subtract (swapsize + 162) and you have the size argument for the freebsd-zfs partition below.

gpart add -b 34 -s 128 -t freebsd-boot da0..da2             64 KB boot
gpart add -b 162 -s 1G -t freebsd-swap -l swapN da0..da2    1 GB swap
gpart add -t freebsd-zfs da0..da2                           3.5 GB rest

The -l swapN parameter will be swap0 for da0 and so on.

The same remark as above applies: installing the boot code can be delayed, but it does not hurt to do it now.

gpart bootcode -b /dist/boot/pmbr -p /dist/boot/gptzfsboot -i 1 da0..da2
zpool create tank raidz da0p3 da1p3 da2p3

Common steps now that we have pool setup

I am going to have a separate root filesystem; it does not seem a good idea to “abuse” the root of the pool for that. Compression seems to create issues for kernel loading, so we will avoid it on tank/root. All other filesets will inherit the compression property though.

As for the compression scheme, zpool v13 lets us select different compression algorithms. lzjb is the fastest available but gzip compresses better, so pjd recommends using lzjb on things like /usr/obj where speed matters more than the actual compression ratio.

I have not done any benchmarking yet on this pool-wide compression. The default gzip compression level (-6) may be too slow to use, so please feel free to experiment there. You may wish to enable compression only on selected filesets (ZFS’s name for file systems).

We will also change the checksum algorithm from the default fletcher2 to the newer fletcher4. The performance difference is apparently negligible and fletcher4 should be more robust.

zfs set compression=gzip tank
zfs set checksum=fletcher4 tank
zfs create -o compression=off tank/root
zfs create -o mountpoint=/tank/root/usr tank/usr
zfs create -o mountpoint=/tank/root/usr/obj -o compression=lzjb tank/usr/obj
zfs create -o mountpoint=/tank/root/var tank/var
zfs create -o mountpoint=/tank/root/var/tmp -o compression=lzjb  tank/var/tmp
chmod 1777 /tank/root/var/tmp
zfs create -o mountpoint=/tank/root/tmp -o compression=lzjb  tank/tmp
chmod 1777 /tank/root/tmp

The reason why I create a separate /usr fileset is that I want to have different policies for compression, atime property, snapshots and all that.

I would also recommend putting users’ home directories in a separate fileset for the same reason, or even a fileset per user if you want to limit each user’s area to a specific size.
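
As an illustration only (the home fileset, the user name and the quota value are assumptions, not part of the layout above):

zfs create -o mountpoint=/tank/root/home tank/home
zfs create -o quota=10G tank/home/alice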

Later, you will want to create tank/usr/ports/{distfiles,packages} w/o compression as well. Properties like snapdir can be changed later on so we are not forced to set them right now.
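
A hedged sketch of what that could look like once the system is running (the fileset names follow the pool layout used here, the mountpoints will simply be inherited from tank/usr):

zfs create tank/usr/ports
zfs create -o compression=off tank/usr/ports/distfiles
zfs create -o compression=off tank/usr/ports/packages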

Extract all distributions

cd /dist/8.0-*
export DESTDIR=/tank/root
for i in base dict doc games info lib32 manpages; do
   (cd $i && sh ./install.sh)
done
You are about to extract the base distribution into /tank/root - are you SURE
you want to do this over your installed system (y/n)? y
You are about to extract the doc distribution into /tank/root - are you SURE
you want to do this over your installed system (y/n)? y

cd kernels
sh ./install.sh generic
cd /tank/root/boot
cp -Rp GENERIC/* kernel/

cd /dist/8.0-*/src
sh ./install.sh all
Extracting sources into /usr/src...
  Extracting source component: base
...
  Extracting source component: usbin
Done extracting sources.

Install configuration variables in the proper places

You can use echo(1) or even vi(1) for the following:

echo 'zfs_enable="YES"' > /tank/root/etc/rc.conf
echo 'LOADER_ZFS_SUPPORT= yes' > /tank/root/etc/src.conf
echo 'zfs_load="YES"' > /tank/root/boot/loader.conf
echo 'vfs.root.mountfrom="zfs:tank/root"' >> /tank/root/boot/loader.conf

Don’t forget to add any other necessary lines to loader.conf (like vm.kmem_size).
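
For instance, to carry over the tunable set earlier (the value is only an example, size it according to your RAM):

echo 'vm.kmem_size="512M"' >> /tank/root/boot/loader.conf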

For every disk with a swap partition:

echo "/dev/da0p2 none swap sw 0 0">> /tank/root/etc/fstab
echo "/dev/da1p2 none swap sw 0 0">> /tank/root/etc/fstab
...

Something interesting to mention, as it can save time later: one can define labels on GPT partitions. These labels are recognized by glabel (if loaded) and devices get created under /dev/gpt automatically. This is interesting for swap partitions as it gives device-independent pathnames.
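
Assuming the label class is loaded, you can check which labels are currently visible with:

glabel status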

So if you have defined labels as suggested above, this will become

echo "/dev/gpt/swap0 none swap sw 0 0">> /tank/root/etc/fstab
echo "/dev/gpt/swap1 none swap sw 0 0">> /tank/root/etc/fstab
...

Generate zpool.cache

mkdir /boot/zfs
zpool export tank && zpool import tank
cp /boot/zfs/zpool.cache /tank/root/boot/zfs/

Recompile gptzfsboot/loader (chroot)

chroot /tank/root
mount -t devfs devfs /dev
unset DESTDIR
cd /usr/src/sys/boot/
make obj
make depend
make | tee /tmp/loader.msg

check in loader.msg that everything is fine WRT ZFS support

cd i386/loader
make install

umount /dev
exit

NOTE: If you are installing a version of FreeBSD after Dec. 7th, 2009, this step is now obsolete because FreeBSD now installs a ZFS-aware bootloader called zfsloader in /boot.

export LD_LIBRARY_PATH=/dist/lib

Reinstall the newly compiled boot blocks:

gpart bootcode -b /tank/root/boot/pmbr -p /tank/root/boot/gptzfsboot -i 1 da0..daN

Fix mount points

zfs umount -a
zfs set mountpoint=legacy tank
zfs set mountpoint=/tmp tank/tmp
zfs set mountpoint=/var tank/var
zfs set mountpoint=/var/tmp tank/var/tmp
zfs set mountpoint=/usr tank/usr
zfs set mountpoint=/usr/obj tank/usr/obj
zpool set bootfs=tank/root tank
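
An optional sanity check before rebooting (just a verification, not a required step):

zpool get bootfs tank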

    reboot

You don’t need to set mountpoint=legacy for tank/root, the bootfs property does that for you apparently.

Checking everything is correct

You are now rebooting. To check that you have the correct /boot/loader and that everything on disk is correct, break again to the loader prompt by pressing 6. You should see that the zfs & opensolaris modules have been loaded automatically, so there is no need to do it again. Issue:

    lsdev

You should see a zfs0 entry at the bottom of the output along with disk0, disk1 and so on. If you do not see it, your /boot/loader does not have the correct ZFS support.

NOTE: I managed to get raidz booting to work! Apparently there is a remaining problem, because it seems to have issues loading /boot/loader.conf; the error is

    ZFS i/o error
    ZFS: all block copies unavailable

After breaking into the loader, I then set the variables from loader.conf (loading zfs.ko and setting vfs.root.mountfrom=zfs:tank/root) and... lo and behold, it boots off raidz!
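
For reference, the workaround at the loader prompt looks roughly like this (a sketch reconstructed from the description above):

    load zfs
    set vfs.root.mountfrom="zfs:tank/root"
    boot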

For the moment, it is probably safer to use only the mirrored setup until this issue is sorted out.

Feedback

Please send any comments, additions, or corrections to my FreeBSD mail or my personal mail. Thanks to all who have done so already.

History

v1.0 Creation
v1.1 Fix incorrect commands (Thanks to Douglas Berry)
v1.2 More details on early boot process (Thanks to Douglas again), typos (Thanks to Matteo Riondato)
v1.3 Mention keymap change, add compression parameters following discussion with pjd.
v1.4 Booting off raidz works! (see end of page for details)
v1.5 Easier gpart(8) usage, still some raidz troubles and compression-related musings (Thanks to Thomas Backman)
v1.6 Simplifications, comments from Thomas Quinot
v1.7 Mention AHCI, GPT labels
v1.8 Mention the main limitation of booting off a ZFS pool
v1.9 Mention the cache and log vdev types
v1.10 More data on AHCI/TCQ/NCQ with WP links
v1.11 Mention the obsolescence of the loader recompilation; we have it built by default now.

Credits

This HOWTO is inspired by several blog entries by other people (mainly [1][] and [2][]) and by messages from the freebsd-current & freebsd-fs mailing lists (like [3][] and [4][]). Thanks to these people for having started looking into this!

[1]: http://blogs.freebsdish.org/lulf/2008/12/16/setting-up-a-zfs-only-system/ “setting up a zfs-only system”
[2]: http://blog.etoilebsd.net/2009/05/27/Migrer_sur_du_full_ZFS.html “Migrer sur du full ZFS (French)”
[3]: http://lists.freebsd.org/pipermail/freebsd-current/2009-September/011331.html “Message #11331”
[4]: http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052799.html “Message #52799”