A FreeBSD install script for ZFS root, supporting RAID0/1/5/10 and 4k alignment

This is a FreeBSD install script for ZFS root.
https://code.google.com/p/iceblood/source/browse/FreeBSD_ZFS

Welcome to test.

  1. Boot from the CD-ROM.
  2. Select "Shell".
  3. Set up the network:
    Code:
    ifconfig ETH x.x.x.x netmask 255.255.255.0
    route add default x.x.x.y
    mkdir /tmp/bsdinstall_etc
    echo 'nameserver 8.8.8.8' > /etc/resolv.conf
  4. Download the script:
    Code:
    cd /tmp
    fetch http://iceblood.googlecode.com/svn/FreeBSD_ZFS/freebsd_zfs_inst.sh
    chmod 555 freebsd_zfs_inst.sh
  5. Run the script:
    Code:
    ./freebsd_zfs_inst.sh
    freebsd_zfs_inst.sh {normal|raid1|raid5|raid10}
    normal   <---- stripe mode
    raid1    <---- mirror mode
    raid5    <---- raidz1 mode
    raid10   <---- mirror and stripe mode
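
For example, with a hypothetical em0 interface on a 192.168.1.0/24 network, steps 3-5 for a raid10 install would look roughly like this (addresses and interface name are only placeholders):

Code:
ifconfig em0 192.168.1.10 netmask 255.255.255.0
route add default 192.168.1.1
mkdir /tmp/bsdinstall_etc
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
cd /tmp
fetch http://iceblood.googlecode.com/svn/FreeBSD_ZFS/freebsd_zfs_inst.sh
chmod 555 freebsd_zfs_inst.sh
./freebsd_zfs_inst.sh raid10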

Thanks to Sebulon.
 

Attachments

  • zfs.png
gkontos said:
  • Do not play with vm.kmem_size

In general yes, but in this case I think it falls within best practice, since it is only applied if the system is i386. And it looks like it was basically copied and pasted from the FreeBSD ZFS wiki.
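
For reference, a rough sketch of what such an i386-only guard could look like; the mount point variable and the values are only illustrative, not taken from the actual script:

Code:
# Sketch only: apply kmem tuning to the new system's loader.conf on i386 only.
# MNT and the values are illustrative placeholders.
MNT="/mnt"
if [ "$(uname -p)" = "i386" ]; then
    echo 'vm.kmem_size="512M"'     >> ${MNT}/boot/loader.conf
    echo 'vm.kmem_size_max="512M"' >> ${MNT}/boot/loader.conf
fi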

@iceblood

Nice work, man! A few observations:

  • gpart has a flag to wipe a disk even if there are partitions on it; gpart destroy -F
  • I usually use tmpfs for /tmp, and you'd need an entry in fstab for that (see the example after this list).
  • These tunings are unnecessary for amd64, in my opinion:
    Code:
    vfs.zfs.prefetch_disable=0
    vfs.zfs.vdev.cache.size="10M"
    Leave it to the OS to tune these automatically on amd64.
  • As gkontos said, you need to partition 4k-aligned, and use the "gnop-trick" to get ashift=12 on every vdev in the pool. Otherwise performance will be severely crippled for people with 4k (Advanced Format) drives.
  • You should use labels instead of the partition names when creating the pool.
  • I may be alone on this, but I follow the Solaris way of first creating a root filesystem for / in every pool, like pool/root/usr, pool/root/var, etc. I can't remember exactly where I read it, but it was definitely in a Sun/Oracle ZFS document (maybe the admin guide) that using the top (pool) filesystem for anything should be considered bad practice, though I've forgotten why :). It might have been because there are values that you cannot change, or that are missing, on the top (pool) filesystem.
  • Also, like Solaris, I use a more generic name for the pool, like "system", "rpool", or "pool0".
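
For the tmpfs point above, a minimal fstab entry would look something like this (mode, and an optional size limit, can be adjusted to taste):

Code:
tmpfs    /tmp    tmpfs    rw,mode=1777    0    0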

@vermaden

Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...

/Sebulon
 
Sebulon said:
@vermaden

Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...
For Boot Environments you need this schema: ${POOL}/ROOT/${BENAME}, and you need to boot from that pool with the bootfs property set to ${POOL}/ROOT/${BENAME}. That is of course changed by the beadm script when switching between BEs.

You can of course add this AFTER the installation, even if the root (/) was placed directly on zroot, for example. I have made beadm smart enough that you can zfs send | zfs recv a BE from another system, or from the local system, and after activation it will just work (beadm takes care of the /boot/zfs/zpool.cache thingy).

But IMHO it is far better to set this up from the start. Boot Environments (even in their limited form, without a boot menu) are the best thing since sliced bread; you can do EVERYTHING to the working system and have a time machine that will take you back if you mess something up.
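
As a rough sketch of that schema (pool and BE names here are only examples, not what the script uses):

Code:
# ${POOL}/ROOT/${BENAME} layout, here with pool "zroot" and BE "default";
# run during installation, with the pool imported under an altroot.
zfs create -o mountpoint=none zroot/ROOT
zfs create -o mountpoint=/    zroot/ROOT/default
zpool set bootfs=zroot/ROOT/default zroot
# beadm repoints bootfs when another BE is activated, e.g.:
# beadm create upgrade && beadm activate upgrade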
 
Sebulon said:
In general yes, but in this case I think it falls within best practice, since it is only applied if the system is i386. And it looks like it was basically copied and pasted from the FreeBSD ZFS wiki.
Oh, it was not copied and pasted; that value is only for i386, and comes from my own experience.

Sebulon said:
Nice work man! A few observations;

  • gpart has a flag to wipe a disk even if there are partitions on it; gpart destroy -F
  • I usually use tmpfs for /tmp, and you'd need an entry in fstab for that.
  • These tunings are unnecessary for amd64, in my opinion:
    Code:
    vfs.zfs.prefetch_disable=0
    vfs.zfs.vdev.cache.size="10M"
    Leave it to the OS to tune these automatically on amd64.
  • As gkontos said, you need to partition 4k-aligned, and use the "gnop-trick" to get ashift=12 on every vdev in the pool. Otherwise performance will be severely crippled for people with 4k (Advanced Format) drives.
  • You should use labels instead of the partition names when creating the pool.
  • I may be alone on this, but I follow the Solaris way of first creating a root filesystem for / in every pool, like pool/root/usr, pool/root/var, etc. I can't remember exactly where I read it, but it was definitely in a Sun/Oracle ZFS document (maybe the admin guide) that using the top (pool) filesystem for anything should be considered bad practice, though I've forgotten why :). It might have been because there are values that you cannot change, or that are missing, on the top (pool) filesystem.
  • Also, like Solaris, I use a more generic name for the pool, like "system", "rpool", or "pool0".
Thanks for the observations.
Regarding 4k alignment, I will improve that gradually.
Regarding tmpfs, I do not agree.
 
iceblood said:
now 4k alignment added.

More like "now 4k alignment added?"

Because I could only see it added in one case, called "normal)", which should instead be called "stripe)". There's nothing normal about creating a striped raid; you have no redundancy whatsoever. The slightest error on any of the disks in the pool and you would be toast. Stripe is dangerous, so it should be clear what it is you are choosing.
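
To make that concrete, the argument handling could name the case like this (a hypothetical sketch, not the script's actual code):

Code:
case "$1" in
    stripe)   # previously "normal": no redundancy at all
        echo "WARNING: stripe mode - losing any one disk loses the whole pool" ;;
    raid1|raid5|raid10)
        echo "redundant mode: $1" ;;
    *)
        echo "usage: $0 {stripe|raid1|raid5|raid10}" ;;
esac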

It's good that you used gnop to create a 4k provider, but there are a few more steps that need to be done to have it truly 4k-optimized. And use labels when creating the pool. Let me give you an example:

# gpart create -s gpt da0
# gpart create -s gpt da1
# gpart create -s gpt da2
# gpart create -s gpt da3
# gpart add -t freebsd-boot -s 64k da0
# gpart add -t freebsd-boot -s 64k da1
# gpart add -t freebsd-boot -s 64k da2
# gpart add -t freebsd-boot -s 64k da3
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
# gpart add -t freebsd-zfs -l disk0 -b 2048 -a 4k da0
# gpart add -t freebsd-zfs -l disk1 -b 2048 -a 4k da1
# gpart add -t freebsd-zfs -l disk2 -b 2048 -a 4k da2
# gpart add -t freebsd-zfs -l disk3 -b 2048 -a 4k da3
# gnop create -S 4096 /dev/gpt/disk0
# gnop create -S 4096 /dev/gpt/disk2
# zpool create -o autoexpand=on pool0 mirror gpt/disk0.nop gpt/disk1 mirror gpt/disk2.nop gpt/disk3
# zpool export pool0
# gnop destroy /dev/gpt/disk0.nop
# gnop destroy /dev/gpt/disk2.nop
# zpool import -d /dev/gpt/ pool0

This example was for a RAID10, and I showed you this because you only need to have gnop'ed the first disk in each vdev, since the ashift value is set per vdev. But since you're using a "for" loop it might be easier just to do them all; your call.
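
If the script loops over the disks anyway, the partitioning and gnop steps can be folded into one loop, roughly like this (device names are hypothetical):

Code:
# Partition, install bootcode, label and gnop every disk in one pass.
DISKS="da0 da1 da2 da3"
i=0
for d in ${DISKS}; do
    gpart create -s gpt ${d}
    gpart add -t freebsd-boot -s 64k ${d}
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${d}
    gpart add -t freebsd-zfs -l disk${i} -b 2048 -a 4k ${d}
    gnop create -S 4096 /dev/gpt/disk${i}
    i=$((i + 1))
done
# Then create the pool on the .nop providers, export, destroy the gnops
# and re-import with -d /dev/gpt as shown above.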

/Sebulon
 
It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.

It's bad practice and seriously limits flexibility to change things around later.

Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.
 
jem said:
It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.

It's bad practice and seriously limits flexibility to change things around later.

Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.

Solaris uses BEs, so placing the root filesystem in the top-level dataset is not possible. Do you think there are other limitations to this practice?
 
iceblood said:
But I think "-l" is not a must, because it is a label.

It is very bad practice to use the raw partition names.

Imagine a person has:
  • HDD0 -> ada0
  • HDD1 -> ada1
  • HDD2 -> ada2

Then HDD0 dies and, for whatever reason, the person reboots the server. The pool relation now looks like:
  • HDD0 -> dead
  • HDD1 -> ada0
  • HDD2 -> ada1

Now, in the worst case, ZFS will respond with "WTF just happened!?!?", refuse to boot, and curl up in a fetal position, feeling very sorry for itself.

This is just one scenario.



If the person had used labels (as they should have), it would instead have looked like:
  • HDD0 -> disk0(ada0)
  • HDD1 -> disk1(ada1)
  • HDD2 -> disk2(ada2)

Then HDD0 dies and, for whatever reason, the person reboots the server. But this time, the pool relation is unchanged:
  • HDD0 -> disk0, dead
  • HDD1 -> disk1(ada0)
  • HDD2 -> disk2(ada1)

ZFS is happy :)

Now, ZFS is supposed to have mechanisms to prevent this sort of error, but I've seen them fail. Better safe than sorry.

/Sebulon
 
gkontos said:
Solaris uses BEs, so placing the root filesystem in the top-level dataset is not possible. Do you think there are other limitations to this practice?

It's often cited as good practice to keep your system files and data files logically separate. On a single-pool ZFS system, this means keeping them in different branches of your ZFS hierarchy, but if you're using the top level dataset then you can't do that.

Take the following example of a dataset hierarchy:

Code:
rpool				(container dataset - not mounted)
rpool/ROOT			(container dataset - not mounted)
rpool/ROOT/freebsd		OS root filesystem - mounted at /
rpool/ROOT/freebsd/usr		OS /usr filesystem
rpool/ROOT/freebsd/var		OS /var filesystem
rpool/DATA			(container dataset - not mounted)
rpool/DATA/home			Home directory container, mounted at /home
rpool/DATA/home/joe		Joe's homedir
rpool/DATA/mediafiles		Media files, music, movies etc
rpool/DATA/database		MySQL files
rpool/DATA/www			Webserver content
rpool/SWAP			zvol for swapspace

Here, the top-level rpool dataset isn't used for storing any files. It's just a container for more datasets. Likewise, rpool/ROOT and rpool/DATA are also containers. These containers split the ZFS hierarchy into two main branches and allow you to manage them more independently of each other.

Now if I want to make a recursive snapshot of only my OS files, I can run 'zfs snapshot -r rpool/ROOT@snapname' and my data files aren't touched.

I could also ZFS send all my data files by recursively sending rpool/DATA, without including any OS files.

If I want to install a new version of FreeBSD alongside the existing version and switch between them, I can create a new rpool/ROOT/freebsd10 dataset alongside rpool/ROOT/freebsd and install to that. That wouldn't be possible if I had used the top-level dataset for my OS root filesystem.
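
Putting those three operations into commands (dataset and host names follow the example layout above and are otherwise hypothetical):

Code:
# OS-only recursive snapshot; data is untouched
zfs snapshot -r rpool/ROOT@snapname
# data-only replication to another machine
zfs snapshot -r rpool/DATA@backup
zfs send -R rpool/DATA@backup | ssh backuphost zfs recv -d tank
# a second OS root alongside the first, selected via the bootfs property
zfs create rpool/ROOT/freebsd10
zpool set bootfs=rpool/ROOT/freebsd10 rpool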

In general, it just eases management and flexibility to do things this way, and I suspect it's why Sun did it.
 