A FreeBSD install script for ZFS root, supporting RAID0/1/5/10 and 4k alignment

This is a FreeBSD install script for ZFS root.
https://code.google.com/p/iceblood/source/browse/FreeBSD_ZFS

Welcome to test.

  1. Boot from the CD-ROM.
  2. Select "Shell".
  3. Set up the network:
    Code:
    ifconfig ETH x.x.x.x netmask 255.255.255.0
    route add default x.x.x.y
    mkdir /tmp/bsdinstall_etc
    echo 'nameserver 8.8.8.8' > /etc/resolv.conf
  4. Download the script:
    Code:
    cd /tmp
    fetch http://iceblood.googlecode.com/svn/FreeBSD_ZFS/freebsd_zfs_inst.sh
    chmod 555 freebsd_zfs_inst.sh
  5. Run the script:
    Code:
    ./freebsd_zfs_inst.sh
    freebsd_zfs_inst.sh {normal|raid1|raid5|raid10}
    normal   <---- stripe mode
    raid1    <---- mirror mode
    raid5    <---- raidz1 mode
    raid10   <---- mirror and stripe mode
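
For example, with a hypothetical em0 interface on a 192.168.1.0/24 network, steps 3-5 for a raid10 install would look roughly like this (addresses and interface name are only placeholders):

Code:
ifconfig em0 192.168.1.10 netmask 255.255.255.0
route add default 192.168.1.1
mkdir /tmp/bsdinstall_etc
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
cd /tmp
fetch http://iceblood.googlecode.com/svn/FreeBSD_ZFS/freebsd_zfs_inst.sh
chmod 555 freebsd_zfs_inst.sh
./freebsd_zfs_inst.sh raid10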

Thanks to Sebulon.
 

Attachments

  • zfs.png
gkontos said:
  • Do not play with vm.kmem_size

In general yes, but in this case I think it falls within best practice, since it is only applied if the system is i386. And it looks like it was basically copied and pasted from the FreeBSD ZFS wiki.
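
For reference, a rough sketch of what such an i386-only guard could look like; the mount point variable and the values are only illustrative, not taken from the actual script:

Code:
# Sketch only: apply kmem tuning to the new system's loader.conf on i386 only.
# MNT and the values are illustrative placeholders.
MNT="/mnt"
if [ "$(uname -p)" = "i386" ]; then
    echo 'vm.kmem_size="512M"'     >> ${MNT}/boot/loader.conf
    echo 'vm.kmem_size_max="512M"' >> ${MNT}/boot/loader.conf
fi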

@iceblood

Nice work, man! A few observations:

  • gpart has a flag to wipe a disk even if there are partitions on it; gpart destroy -F
  • I usually use tmpfs for /tmp, and you'd need an entry in fstab for that (see the example after this list).
  • These tunings are unnecessary for amd64, in my opinion:
    Code:
    vfs.zfs.prefetch_disable=0
    vfs.zfs.vdev.cache.size="10M"
    Leave it to the OS to tune these automatically on amd64.
  • As gkontos said, you need to partition 4k-aligned, and use the "gnop-trick" to get ashift=12 on every vdev in the pool. Otherwise performance will be severely crippled for people with 4k (Advanced Format) drives.
  • You should use labels instead of the partition names when creating the pool.
  • I may be alone on this, but I follow the Solaris way of first creating a root filesystem for / in every pool, like pool/root/usr, pool/root/var, etc. I can't remember exactly where I read it, but it was definitely in a Sun/Oracle ZFS document (maybe the admin guide) that using the top (pool) filesystem for anything should be considered bad practice, though I've forgotten why :). It might have been because there are values that you cannot change, or that are missing, on the top (pool) filesystem.
  • Also, like Solaris, I use a more generic name for the pool, like "system", "rpool", or "pool0".
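
For the tmpfs point above, a minimal fstab entry would look something like this (mode, and an optional size limit, can be adjusted to taste):

Code:
tmpfs    /tmp    tmpfs    rw,mode=1777    0    0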

@vermaden

Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...

/Sebulon
 
Sebulon said:
@vermaden

Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...
For Boot Environments you need this schema: ${POOL}/ROOT/${BENAME}, and you need to boot from that pool with the bootfs property set to ${POOL}/ROOT/${BENAME}. That is of course changed by the beadm script when switching between BEs.

You can of course add this AFTER the installation, even if the root (/) was placed directly on zroot, for example. I have made beadm smart enough that you can zfs send | zfs recv a BE from another system, or from the local system, and after activation it will just work (beadm takes care of the /boot/zfs/zpool.cache thingy).

But IMHO it is far better to set this up from the start. Boot Environments (even in their limited form, without a boot menu) are the best thing since sliced bread; you can do EVERYTHING to the working system and have a time machine that will take you back if you mess something up.
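
As a rough sketch of that schema (pool and BE names here are only examples, not what the script uses):

Code:
# ${POOL}/ROOT/${BENAME} layout, here with pool "zroot" and BE "default";
# run during installation, with the pool imported under an altroot.
zfs create -o mountpoint=none zroot/ROOT
zfs create -o mountpoint=/    zroot/ROOT/default
zpool set bootfs=zroot/ROOT/default zroot
# beadm repoints bootfs when another BE is activated, e.g.:
# beadm create upgrade && beadm activate upgrade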
 
Sebulon said:
In general yes, but in this case I think it falls within best practice, since it is only applied if the system is i386. And it looks like it was basically copied and pasted from the FreeBSD ZFS wiki.
Oh, it was not copied and pasted; that value is only for i386, and comes from my own experience.

Sebulon said:
Nice work man! A few observations;

  • gpart has a flag to wipe a disk even if there are partitions on it; gpart destroy -F
  • I usually use tmpfs for /tmp, and you'd need an entry in fstab for that.
  • These tunings are unnecessary for amd64, in my opinion:
    Code:
    vfs.zfs.prefetch_disable=0
    vfs.zfs.vdev.cache.size="10M"
    Leave it to the OS to tune these automatically on amd64.
  • As gkontos said, you need to partition 4k-aligned, and use the "gnop-trick" to get ashift=12 on every vdev in the pool. Otherwise performance will be severely crippled for people with 4k (Advanced Format) drives.
  • You should use labels instead of the partition names when creating the pool.
  • I may be alone on this, but I follow the Solaris way of first creating a root filesystem for / in every pool, like pool/root/usr, pool/root/var, etc. I can't remember exactly where I read it, but it was definitely in a Sun/Oracle ZFS document (maybe the admin guide) that using the top (pool) filesystem for anything should be considered bad practice, though I've forgotten why :). It might have been because there are values that you cannot change, or that are missing, on the top (pool) filesystem.
  • Also, like Solaris, I use a more generic name for the pool, like "system", "rpool", or "pool0".
Thanks for the observations.
Regarding 4k alignment, I will improve that gradually.
Regarding tmpfs, I do not agree.
 
iceblood said:
now 4k alignment added.

More like "now 4k alignment added?"

Because I could only see it added in one case, called "normal)", which should instead be called "stripe)". There's nothing normal about creating a striped raid; you have no redundancy whatsoever. The slightest error on any of the disks in the pool and you would be toast. Stripe is dangerous, so it should be clear what it is you are choosing.
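
To make that concrete, the argument handling could name the case like this (a hypothetical sketch, not the script's actual code):

Code:
case "$1" in
    stripe)   # previously "normal": no redundancy at all
        echo "WARNING: stripe mode - losing any one disk loses the whole pool" ;;
    raid1|raid5|raid10)
        echo "redundant mode: $1" ;;
    *)
        echo "usage: $0 {stripe|raid1|raid5|raid10}" ;;
esac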

It's good that you used gnop to create a 4k provider, but there are a few more steps that need to be done to have it truly 4k-optimized. And use labels when creating the pool. Let me give you an example:

# gpart create -s gpt da0
# gpart create -s gpt da1
# gpart create -s gpt da2
# gpart create -s gpt da3
# gpart add -t freebsd-boot -s 64k da0
# gpart add -t freebsd-boot -s 64k da1
# gpart add -t freebsd-boot -s 64k da2
# gpart add -t freebsd-boot -s 64k da3
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3
# gpart add -t freebsd-zfs -l disk0 -b 2048 -a 4k da0
# gpart add -t freebsd-zfs -l disk1 -b 2048 -a 4k da1
# gpart add -t freebsd-zfs -l disk2 -b 2048 -a 4k da2
# gpart add -t freebsd-zfs -l disk3 -b 2048 -a 4k da3
# gnop create -S 4096 /dev/gpt/disk0
# gnop create -S 4096 /dev/gpt/disk2
# zpool create -o autoexpand=on pool0 mirror gpt/disk0.nop gpt/disk1 mirror gpt/disk2.nop gpt/disk3
# zpool export pool0
# gnop destroy /dev/gpt/disk0.nop
# gnop destroy /dev/gpt/disk2.nop
# zpool import -d /dev/gpt/ pool0

This example was for a RAID10, and I showed you this because you only need to have gnop'ed the first disk in each vdev, since the ashift value is set per vdev. But since you're using a "for" loop it might be easier just to do them all; your call.
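
If the script loops over the disks anyway, the partitioning and gnop steps can be folded into one loop, roughly like this (device names are hypothetical):

Code:
# Partition, install bootcode, label and gnop every disk in one pass.
DISKS="da0 da1 da2 da3"
i=0
for d in ${DISKS}; do
    gpart create -s gpt ${d}
    gpart add -t freebsd-boot -s 64k ${d}
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${d}
    gpart add -t freebsd-zfs -l disk${i} -b 2048 -a 4k ${d}
    gnop create -S 4096 /dev/gpt/disk${i}
    i=$((i + 1))
done
# Then create the pool on the .nop providers, export, destroy the gnops
# and re-import with -d /dev/gpt as shown above.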

/Sebulon
 
It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.

It's bad practice and seriously limits flexibility to change things around later.

Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.
 
jem said:
It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.

It's bad practice and seriously limits flexibility to change things around later.

Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.

Solaris uses BEs, so placing the root filesystem in the top-level dataset is not possible. Do you think there are other limitations to this practice?
 
iceblood said:
But I think "-l" is not a must, because it is a label.

It is very bad practice to use the raw partition names.

Imagine a person has:
  • HDD0 -> ada0
  • HDD1 -> ada1
  • HDD2 -> ada2

Then HDD0 dies and, for whatever reason, the person reboots the server. The pool relation now looks like:
  • HDD0 -> dead
  • HDD1 -> ada0
  • HDD2 -> ada1

Now, in the worst case, ZFS will respond with "WTF just happened!?!?", refuse to boot, and curl up in a fetal position, feeling very sorry for itself.

This is just one scenario.



If the person had used labels (as they should have), it would instead have looked like:
  • HDD0 -> disk0(ada0)
  • HDD1 -> disk1(ada1)
  • HDD2 -> disk2(ada2)

Then HDD0 dies and, for whatever reason, the person reboots the server. But this time, the pool relation is unchanged:
  • HDD0 -> disk0, dead
  • HDD1 -> disk1(ada0)
  • HDD2 -> disk2(ada1)

ZFS is happy :)

Now, ZFS is supposed to have mechanisms to prevent this sort of error, but I've seen them fail. Better safe than sorry.

/Sebulon
 
gkontos said:
Solaris uses BEs, so placing the root filesystem in the top-level dataset is not possible. Do you think there are other limitations to this practice?

It's often cited as good practice to keep your system files and data files logically separate. On a single-pool ZFS system, this means keeping them in different branches of your ZFS hierarchy, but if you're using the top level dataset then you can't do that.

Take the following example of a dataset hierarchy:

Code:
rpool				(container dataset - not mounted)
rpool/ROOT			(container dataset - not mounted)
rpool/ROOT/freebsd		OS root filesystem - mounted at /
rpool/ROOT/freebsd/usr		OS /usr filesystem
rpool/ROOT/freebsd/var		OS /var filesystem
rpool/DATA			(container dataset - not mounted)
rpool/DATA/home			Home directory container, mounted at /home
rpool/DATA/home/joe		Joe's homedir
rpool/DATA/mediafiles		Media files, music, movies etc
rpool/DATA/database		MySQL files
rpool/DATA/www			Webserver content
rpool/SWAP			zvol for swapspace

Here, the top-level rpool dataset isn't used for storing any files. It's just a container for more datasets. Likewise, rpool/ROOT and rpool/DATA are also containers. These containers split the ZFS hierarchy into two main branches and allow you to manage them more independently of each other.

Now if I want to make a recursive snapshot of only my OS files, I can run 'zfs snapshot -r rpool/ROOT@snapname' and my data files aren't touched.

I could also ZFS send all my data files by recursively sending rpool/DATA, without including any OS files.

If I want to install a new version of FreeBSD alongside the existing version and switch between them, I can create a new rpool/ROOT/freebsd10 dataset alongside rpool/ROOT/freebsd and install to that. That wouldn't be possible if I had used the top-level dataset for my OS root filesystem.
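
Putting those three operations into commands (dataset and host names follow the example layout above and are otherwise hypothetical):

Code:
# OS-only recursive snapshot; data is untouched
zfs snapshot -r rpool/ROOT@snapname
# data-only replication to another machine
zfs snapshot -r rpool/DATA@backup
zfs send -R rpool/DATA@backup | ssh backuphost zfs recv -d tank
# a second OS root alongside the first, selected via the bootfs property
zfs create rpool/ROOT/freebsd10
zpool set bootfs=rpool/ROOT/freebsd10 rpool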

In general, it just eases management and flexibility to do things this way, and I suspect it's why Sun did it.
 