A while back, I bought a ReadyNAS device for my network, attracted by the idea of RAID I can grow over time and mirrored disks.
Today I just finished building the same thing “by hand”, using FreeBSD and ZFS. At a fraction of the cost. Here’s how.
First off, I bought this amazing bargain: an HP ProLiant MicroServer. These would be cheap even at list price, but with the current £100 cashback offer, they’re just stupidly cheap. And rather nice.
Since I want to cater for a realistic future, I am assuming that by the time I need to replace a drive I will no longer be able to buy a matching device, so I started from day one with a different second drive (the primary is 250 GB, the secondary is 500 GB – both Seagate, which was not the plan, but I’ll remedy that in the next episode). I also added an extra 1 GB of RAM to the machine (this is important for ZFS, which is apparently not happy with less than 2 GB of system RAM).
I then followed, more or less, Pawel’s excellent instructions for creating a fully mirrored setup. However, I had to deviate from them somewhat, so here’s my version.
The broad overview of the process is as follows:
- Install FreeBSD on the primary disk, using a standard sysinstall.
- Create and populate gmirror and ZFS partitions on the secondary disk.
- Boot from the primary disk, but mount the secondary.
- Create and populate gmirror and ZFS partitions on the primary disk.
- Use excess secondary disk as scratch.
In my case the two disks are ad4 (primary, 250 GB) and ad8 (secondary, 500 GB). Stuff I typed is in italic.
Since the mirror needs identically sized partitions, and the first disk happens to be the smaller one, we simulate the first disk to find out what sizes will fit. Get the disk’s size:
# diskinfo -v /dev/ad4
/dev/ad4
512 # sectorsize
250059350016 # mediasize in bytes (233G)
488397168 # mediasize in sectors
0 # stripesize
0 # stripeoffset
484521 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
9VMQN8T5 # Disk ident.
Create a memory disk the same size. With no suffix, mdconfig -s counts 512-byte sectors, so the sector sizes must match!
# mdconfig -a -t swap -s 488397168
md0
Verify they are the same.
# diskinfo -v /dev/ad4 /dev/md0
/dev/ad4
512 # sectorsize
250059350016 # mediasize in bytes (233G)
488397168 # mediasize in sectors
0 # stripesize
0 # stripeoffset
484521 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
9VMQN8T5 # Disk ident.
/dev/md0
512 # sectorsize
250059350016 # mediasize in bytes (233G)
488397168 # mediasize in sectors
0 # stripesize
0 # stripeoffset
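The sector count handed to mdconfig is just the byte size divided by the sector size. Here is a minimal sketch of that arithmetic, using the figures from the diskinfo output above. The function name is mine and purely illustrative.

```shell
# Illustrative helper (not part of the original procedure): convert a
# media size in bytes into a count of sectors of the given size.
bytes_to_sectors() {
    bytes=$1
    sectorsize=$2
    echo $(( bytes / sectorsize ))
}

# Figures taken from the diskinfo output for ad4.
bytes_to_sectors 250059350016 512
```

If the result differs from the "mediasize in sectors" line, the sector size assumption is wrong and the memory disk will not be a faithful stand-in.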
Now partition the memory disk as we will the first disk later on.
# gpart create -s gpt md0
md0 created
# gpart add -b 34 -s 128 -t freebsd-boot md0
md0p1 added
# gpart add -s 2g -t freebsd-swap -l swap1 md0
md0p2 added
# gpart add -t freebsd-zfs -l systemx md0
md0p3 added
and show the resulting sizes
# gpart show md0
=> 34 488397101 md0 GPT (233G)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-swap (2.0G)
4194466 484202669 3 freebsd-zfs (231G)
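These numbers can also be worked out by hand, which is a useful sanity check on the simulation. The sketch below assumes 512-byte sectors and a standard GPT layout; the variable names are my own.

```shell
# Sketch only: reproduce the sizes gpart reported for the simulated disk.
# Assumes 512-byte sectors and a standard GPT: 34 sectors reserved at the
# start (protective MBR, header, 32 sectors of entries) and 33 at the end
# (backup entries and backup header).
total=488397168                            # "mediasize in sectors" for ad4
usable=$(( total - 34 - 33 ))              # size shown on the "=>" line
boot=128                                   # -s 128 for freebsd-boot
swap=$(( 2 * 1024 * 1024 * 1024 / 512 ))   # -s 2g expressed in sectors
zfs=$(( usable - boot - swap ))            # whatever is left goes to ZFS

echo "usable=$usable swap=$swap zfs=$zfs"
```

The last figure is the one that matters: it is the size the ZFS partition on the larger disk has to be given explicitly.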
Now blow away the memory disk; we don’t need it any more.
# mdconfig -d -u 0
Create the partitions on the second disk.
# gpart create -s gpt ad8
ad8 created
# gpart add -b 34 -s 128 -t freebsd-boot ad8
ad8p1 added
# gpart add -s 2g -t freebsd-swap -l swap1 ad8
ad8p2 added
# gpart add -s 484202669 -t freebsd-zfs -l system8 ad8
ad8p3 added
And eat the rest of the disk as a scratch area (this area will not be mirrored, and so should only be used for disposable stuff).
# gpart add -t freebsd-zfs -l scratch8 ad8
ad8p4 added
Check it matches the md0 simulation
# gpart show ad8
=> 34 976773101 ad8 GPT (466G)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-swap (2.0G)
4194466 484202669 3 freebsd-zfs (231G)
488397135 488376000 4 freebsd-zfs (233G)
And don’t forget to set up the bootloader
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad8
bootcode written to ad8
I realised at this point that I had intended to label everything with an 8, to match the unit number, and had not done so for swap. For completeness, here’s how you fix it:
# gpart modify -i 2 -l swap8 ad8
ad8p2 modified
# gpart show -l ad8
=> 34 976773101 ad8 GPT (466G)
34 128 1 (null) (64K)
162 4194304 2 swap8 (2.0G)
4194466 484202669 3 system8 (231G)
488397135 488376000 4 scratch8 (233G)
Note that the label change is not reflected by the device names in /dev/gpt, which is needed for the next step, so at this point I rebooted.
Now set up the swap mirror.
# gmirror label -F -h -b round-robin swap /dev/gpt/swap8
Create the ZFS storage pool called system, consisting only of our system8 partition.
# zpool create -O mountpoint=/mnt -O atime=off -O setuid=off -O canmount=off system /dev/gpt/system8
And create a dataset – “mountpoint=legacy” stops ZFS from managing it.
# zfs create -o mountpoint=legacy -o setuid=on system/root
Mark it as the default bootable dataset.
# zpool set bootfs=system/root system
Mount it
# mount -t zfs system/root /mnt
# mount
/dev/ad4s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
system/root on /mnt (zfs, local, noatime)
And create the remaining mountpoints according to Pawel’s suggested layout…
# zfs create -o compress=lzjb system/tmp
# chmod 1777 /mnt/tmp
# zfs create -o canmount=off system/usr
# zfs create -o setuid=on system/usr/local
# zfs create -o compress=gzip system/usr/src
# zfs create -o compress=lzjb system/usr/obj
# zfs create -o compress=gzip system/usr/ports
# zfs create -o compress=off system/usr/ports/distfiles
# zfs create -o canmount=off system/var
# zfs create -o compress=gzip system/var/log
# zfs create -o compress=lzjb system/var/audit
# zfs create -o compress=lzjb system/var/tmp
# chmod 1777 /mnt/var/tmp
# zfs create -o canmount=off system/usr/home
And create one for each user:
# zfs create system/usr/home/ben
Now, at a slightly different point from Pawel, I edit the various config files. First /boot/loader.conf. Note that some of these are commented out: although they appear in Pawel’s version, they are already built into my kernel (I use a GENERIC kernel and he uses a stripped-down one). Including them seems to cause problems (particularly geom_part_gpt, which causes a hang during boot if present).
geom_eli_load=YES
#geom_label_load=YES
geom_mirror_load=YES
#geom_part_gpt_load=YES
zfs_load=YES
vm.kmem_size=3G # This should be 150% of your RAM.
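The 150% rule of thumb in that last comment is easy to compute rather than guess. This is a rough sketch, assuming the RAM size is supplied in bytes. The function name is hypothetical, and the result is only a suggestion to round as you see fit.

```shell
# Sketch: suggest a vm.kmem_size value as 150% of physical RAM,
# following the rule of thumb in the loader.conf comment.
# Takes RAM in bytes and prints a whole number of megabytes with an M suffix.
kmem_size_for() {
    ram_bytes=$1
    echo "$(( ram_bytes * 3 / 2 / 1024 / 1024 ))M"
}

# 2 GB of RAM, as in the machine described here.
kmem_size_for $(( 2 * 1024 * 1024 * 1024 ))

# On the FreeBSD box itself the input could come from sysctl (not run here):
#   kmem_size_for "$(sysctl -n hw.physmem)"
```

Note that the figure the system reports for physical memory is usually a little under the nominal size, so the result will not come out as a round number.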
Enable ZFS
# echo zfs_enable=YES >> /etc/rc.conf
Change fstab for the new layout (note: the command below replaces the whole file, so you might want to edit these lines in instead – for example, my system had an entry for CD drives).
# cat > /etc/fstab
system/root / zfs rw,noatime 0 0
/dev/mirror/swap.eli none swap sw 0 0
^D
The .eli extension here is magic: geom_eli finds it at startup and automatically encrypts it.
Set the work directory for ports (so that it uses the faster compression scheme during builds).
# echo WRKDIRPREFIX=/usr/obj >> /etc/make.conf
These need to be done now because the next step is to copy the entire install to the new ZFS filesystem. Note that this particular command pastes completely incorrectly from Pawel’s blog post so be careful!
# tar -c --one-file-system -f - . | tar xpf - -C /mnt/
Tar can’t copy some types of file, so expect an error or two at this point:
tar: ./var/run/devd.pipe: tar format cannot archive socket
tar: ./var/run/log: tar format cannot archive socket
tar: ./var/run/logpriv: tar format cannot archive socket
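Because the tar pipe copies ".", it only does the right thing when run from the root of the filesystem being copied. If you want to convince yourself that the command survived the paste, you can try it on throwaway directories first. This is a sketch: copy_tree is a name I made up, and the demo only touches temporary directories.

```shell
# Sketch: copy one filesystem's contents from src to dst with the same
# tar pipe, making the working directory explicit. copy_tree is an
# illustrative name, not a system tool.
copy_tree() {
    src=$1
    dst=$2
    ( cd "$src" && tar -c --one-file-system -f - . ) | tar -xpf - -C "$dst"
}

# Harmless demonstration on temporary directories.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir "$src/etc"
echo 'zfs_enable=YES' > "$src/etc/rc.conf"
copy_tree "$src" "$dst"
cat "$dst/etc/rc.conf"
```

For the real copy, the equivalent call would use / as the source and /mnt as the destination.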
Just for fun, take a look at the ZFS we’ve created so far…
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
system 1.12G 225G 21K /mnt
system/root 495M 225G 495M legacy
system/tmp 30K 225G 30K /mnt/tmp
system/usr 652M 225G 21K /mnt/usr
system/usr/home 50K 225G 21K /mnt/usr/home
system/usr/home/ben 29K 225G 29K /mnt/usr/home/ben
system/usr/local 297M 225G 297M /mnt/usr/local
system/usr/obj 21K 225G 21K /mnt/usr/obj
system/usr/ports 190M 225G 159M /mnt/usr/ports
system/usr/ports/distfiles 30.8M 225G 30.8M /mnt/usr/ports/distfiles
system/usr/src 165M 225G 165M /mnt/usr/src
system/var 100K 225G 21K /mnt/var
system/var/audit 21K 225G 21K /mnt/var/audit
system/var/log 35K 225G 35K /mnt/var/log
system/var/tmp 23K 225G 23K /mnt/var/tmp
Unmount ZFS
# zfs umount -a
And the one we mounted by hand
# umount /mnt
And set the new ZFS-based system to be mounted on /
# zfs set mountpoint=/ system
And … reboot! (this is the moment of truth)
After the reboot, you should see
$ mount
system/root on / (zfs, local, noatime)
devfs on /dev (devfs, local, multilabel)
system/tmp on /tmp (zfs, local, noatime, nosuid)
system/usr/home/ben on /usr/home/ben (zfs, local, noatime, nosuid)
system/usr/local on /usr/local (zfs, local, noatime)
system/usr/obj on /usr/obj (zfs, local, noatime, nosuid)
system/usr/ports on /usr/ports (zfs, local, noatime, nosuid)
system/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime, nosuid)
system/usr/src on /usr/src (zfs, local, noatime, nosuid)
system/var/audit on /var/audit (zfs, local, noatime, nosuid)
system/var/log on /var/log (zfs, local, noatime, nosuid)
system/var/tmp on /var/tmp (zfs, local, noatime, nosuid)
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
system 1.79G 225G 21K /
system/root 763M 225G 763M legacy
system/tmp 43K 225G 43K /tmp
system/usr 1.04G 225G 21K /usr
system/usr/home 50.5K 225G 21K /usr/home
system/usr/home/ben 29.5K 225G 29.5K /usr/home/ben
system/usr/local 297M 225G 297M /usr/local
system/usr/obj 416M 225G 416M /usr/obj
system/usr/ports 190M 225G 159M /usr/ports
system/usr/ports/distfiles 30.8M 225G 30.8M /usr/ports/distfiles
system/usr/src 165M 225G 165M /usr/src
system/var 106K 225G 21K /var
system/var/audit 21K 225G 21K /var/audit
system/var/log 41.5K 225G 41.5K /var/log
system/var/tmp 23K 225G 23K /var/tmp
$ swapinfo
Device 1K-blocks Used Avail Capacity
/dev/mirror/swap.eli 2097148 0 2097148 0%
Note that system is not actually mounted (it has canmount=off) – it is used to allow all the other filesystems to inherit the / mountpoint. The one that is actually mounted on / is system/root, which is marked as legacy because it is mounted before zfs is up.
Now we’re up on the second disk, time to get the first disk back in the picture (we’re using it for boot but nothing else right now).
First blow away the MBR
# dd if=/dev/zero of=/dev/ad4 count=79
79+0 records in
79+0 records out
40448 bytes transferred in 0.008059 secs (5018970 bytes/sec)
and create the GPT partitions:
# gpart create -s GPT ad4
ad4 created
# gpart add -b 34 -s 128 -t freebsd-boot ad4
ad4p1 added
# gpart add -s 2g -t freebsd-swap -l swap4 ad4
ad4p2 added
# gpart add -t freebsd-zfs -l system4 ad4
ad4p3 added
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4
bootcode written to ad4
No scratch partition on this one: there’s no room. Now the two disks should match
# gpart show
=> 34 976773101 ad8 GPT (466G)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-swap (2.0G)
4194466 484202669 3 freebsd-zfs (231G)
488397135 488376000 4 freebsd-zfs (233G)
=> 34 488397101 ad4 GPT (233G)
34 128 1 freebsd-boot (64K)
162 4194304 2 freebsd-swap (2.0G)
4194466 484202669 3 freebsd-zfs (231G)
apart from the scratch partition, of course.
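Comparing the two tables by eye works, but it can also be checked mechanically. This is a sketch under my own assumptions: the gpart show output for each disk has been saved to a file, only partitions 1 to 3 are meant to match, and both function names are invented for illustration.

```shell
# Sketch: print "index start size" for every partition line in saved
# `gpart show` output, skipping the "=>" summary line and free-space rows.
partition_rows() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $3, $1, $2 }' "$1"
}

# Succeeds only if partitions 1-3 have identical start and size in both files.
same_mirrored_layout() {
    a=$(partition_rows "$1" | awk '$1 <= 3')
    b=$(partition_rows "$2" | awk '$1 <= 3')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Intended use on the real system (not run here):
#   gpart show ad4 > /tmp/ad4.txt
#   gpart show ad8 > /tmp/ad8.txt
#   same_mirrored_layout /tmp/ad4.txt /tmp/ad8.txt && echo "layouts match"
```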
Add the mirrored swap
# gmirror insert -h -p 1 swap /dev/gpt/swap4
And when rebuilding is finished, you should see
# gmirror status
Name Status Components
mirror/swap COMPLETE gpt/swap8
gpt/swap4
Now attach the first disk’s ZFS partition (system4, on ad4) to the pool
# zpool attach system /dev/gpt/system8 /dev/gpt/system4
If you boot from pool 'system', you may need to update
boot code on newly attached disk '/dev/gpt/system4'.
Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
We already did this part, so no need to do anything. Wait for it to finish. Here it is partway through
# zpool status
pool: system
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h0m, 39.20% done, 0h0m to go
config:
NAME STATE READ WRITE CKSUM
system ONLINE 0 0 0
mirror ONLINE 0 0 0
gpt/system8 ONLINE 0 0 0
gpt/system4 ONLINE 0 0 0 718M resilvered
errors: No known data errors
and now done
# zpool status
pool: system
state: ONLINE
scrub: resilver completed after 0h2m with 0 errors on Fri Mar 18 12:13:19 2011
config:
NAME STATE READ WRITE CKSUM
system ONLINE 0 0 0
mirror ONLINE 0 0 0
gpt/system8 ONLINE 0 0 0
gpt/system4 ONLINE 0 0 0 1.79G resilvered
errors: No known data errors
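Rather than re-running zpool status by hand, the wait can be scripted. This is a small sketch that relies only on the "resilver in progress" wording seen in the output above. The function name is mine, and the polling loop is shown as a comment because it needs the real pool.

```shell
# Sketch: succeed (exit 0) while `zpool status` text on stdin reports a
# resilver still running. The function name is illustrative.
resilver_running() {
    grep -q 'resilver in progress'
}

# Intended use on the real system (not run here):
#   while zpool status system | resilver_running; do sleep 10; done
#   zpool status system
```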
And we’re done. Reboot one last time to check everything worked.
One final task not relevant to the mirroring is to mount the scratch disk area.
Create a mountpoint
# mkdir /scratch
And a pool
# zpool create -O mountpoint=/scratch -O atime=off -O setuid=off scratch /dev/gpt/scratch8
This filesystem has no redundancy, as previously mentioned. (edit: I am told that the mkdir and mountpoint are both redundant – zfs will create the directory as needed, and uses the pool name as the mount point by default)
In the next installment I will fail and replace one of the disks.
Edit:
daily_status_zfs_enable="YES"
daily_status_gmirror_enable="YES"
should be added to /etc/periodic.conf so checks are added to the daily mails.
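These lines can be appended from the shell in the same way as the rc.conf and make.conf settings earlier. The sketch below avoids adding a line twice if it is re-run. append_once is a hypothetical helper, and the demo writes to a temporary file rather than the real /etc/periodic.conf.

```shell
# Sketch: append a line to a config file only if that exact line is not
# already present. append_once is an illustrative name, not a system tool.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a temporary file; on the real system the target would
# be /etc/periodic.conf.
conf=$(mktemp)
append_once 'daily_status_zfs_enable="YES"' "$conf"
append_once 'daily_status_gmirror_enable="YES"' "$conf"
append_once 'daily_status_zfs_enable="YES"' "$conf"   # no-op: already there
cat "$conf"
```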