Appending more storage space to the file system

Hi,

I have a 250 GB HDD in my home server. I need more space so I have just bought a new 4 TB HDD.

However, I am not sure how to add a new volume to the existing file system. Most of the space on the existing HDD is occupied by the home directory, so I figured I would just mount the new HDD at /usr/home. Is that a good idea? What's the best way to copy the existing home directory to a partition on the new HDD? Is just using cp(1) a good idea?

Another issue is how to add yet another HDD in the future. I would like to avoid mounting it at a different path and having two subtrees of the file system split between two HDDs. Is it better to use software RAID 0 or ZFS for this purpose? Will it be possible to create a RAID/ZFS volume out of two HDDs without losing the data on one of them? Or is it a better idea to create such a volume with just one HDD now and add another when it's needed?

Thanks.
 
Unless there is a good reason to keep using the old drive, I would just copy everything to the new drive. To set up a new drive, see Disk Setup On FreeBSD. To back up or copy the old drive onto the new drive, see Backup Options For FreeBSD.

Some other observations:
  • ZFS is software RAID.
  • RAID0 is faster than a single drive but at least twice as likely to fail. An SSD is much faster.
  • ZFS can probably grow a single-drive pool by adding another single drive (untested). That would be similar to RAID0, where there is no redundancy. A three-drive RAID-Z arrangement is better, allowing any single drive to fail without data loss.
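
For what that growth would look like, here is a minimal, untested sketch with zpool(8), assuming the new 4 TB drive is ada1 and a future drive shows up as ada2 (again, no redundancy: losing either disk loses the pool):
Code:
zpool create tank ada1    # single-disk pool on the new drive
zpool add tank ada2       # later: add a second disk; the pool then spans both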
 
wblock@ said:
Unless there is a good reason to keep using the old drive, I would just copy everything to the new drive. To set up a new drive, see Disk Setup On FreeBSD. To back up or copy the old drive onto the new drive, see Backup Options For FreeBSD.
The 250 GB HDD is a mid-range drive. That's why I would like to keep it and use the cheaper 4 TB HDD only for home directories.
wblock@ said:
Some other observations:

ZFS is software RAID.
Is there any ZFS-independent software RAID implementation?
wblock@ said:
RAID0 is faster than a single drive but at least twice as likely to fail. An SSD is much faster.
I don't care about speed. The most important thing to me is to have one logical drive built on top of a few HDDs. I would like to avoid having to distribute files between HDDs manually.
wblock@ said:
ZFS can probably grow a single-drive pool by adding another single drive (untested). That would be similar to RAID0, where there is no redundancy. A three-drive RAID-Z arrangement is better, allowing any single drive to fail without data loss.
If I have 3 x 4 TB HDDs, how much usable storage can I obtain in a 3-drive setup?

Thanks.
 
RAID-Z is pretty slow unless you can stripe together multiple RAID-Z vdevs, which would mean at least six disks to be efficient and redundant enough. If I were you, I would get one more disk and do two 2-disk mirror vdevs. Those would give you 8 TB of storage and the performance would be more than acceptable.
 
pietrasm said:
The 250 GB HDD is a mid-range drive. That's why I would like to keep it and use the cheaper 4 TB HDD only for home directories.

Is there any ZFS-independent software RAID implementation?

Yes: gmirror(8), gstripe(8), gconcat(8).
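
As a rough sketch of the non-ZFS route with gconcat(8), assuming the drives show up as ada1 and ada2 (the names and mount point are examples only):
Code:
gconcat label -v data ada1 ada2    # join the disks end-to-end into /dev/concat/data
newfs -U /dev/concat/data          # one filesystem spanning both disks
mount /dev/concat/data /mnt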

I don't care about speed. The most important thing to me is to have one logical drive built on top of a few HDDs. I would like to avoid having to distribute files between HDDs manually.

The risk is that a single drive failure could make data on the other drives inaccessible and unrecoverable.

If I have 3 x 4 TB HDDs, how much usable storage can I obtain in a 3-drive setup?

RAIDZ with three drives gives 2/3 the total amount of space for data, so 8 TB.
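
Creating such a pool would look something like this (disk names are just placeholders); one drive's worth of capacity goes to parity, leaving roughly 2 x 4 TB for data:
Code:
zpool create tank raidz ada1 ada2 ada3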
 
kpa said:
RAID-Z is pretty slow unless you can stripe together multiple RAID-Z vdevs, which would mean at least six disks to be efficient and redundant enough. If I were you, I would get one more disk and do two 2-disk mirror vdevs. Those would give you 8 TB of storage and the performance would be more than acceptable.
I don't get it. Do you mean to build two ZFS vdevs per HDD and then use all four of them as one logical drive without any redundancy?

wblock@ said:
Yes: gmirror(8), gstripe(8), gconcat(8).

The risk is that a single drive failure could make data on the other drives inaccessible and unrecoverable.
I am aware of this fact.

What about extending either ZFS or RAID 0 with more HDDs in the future, without having to back up all data to another drive, recreate the array, and restore the data from a backup? I read that it's possible for ZFS but not implemented yet. What about software RAID 0?
 
I meant using two disks for each vdev in a mirror configuration and then striping them together into a single pool. This would give RAID 1 redundancy within each vdev. This is how it would be done with zpool(8):

zpool create tank mirror ada0 ada1 mirror ada2 ada3

This assumes that ada0 through ada3 are the four individual disks. There are, however, some issues with newer disks that use 4096-byte sectors but don't tell the OS about it. With those drives, and I'm quite sure that your 4 TB disks are such drives, it is necessary to use proper alignment and sector size when creating the ZFS pool. Search the forums for details; I don't have a good link right now.
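
To give a rough idea, the workaround usually posted on the forums is the gnop(8) trick: create temporary providers that report 4096-byte sectors, build the pool on those so ZFS picks the larger block size, then re-import the pool on the real devices. A minimal sketch for a single mirror (device and pool names are only examples, untested here):
Code:
# temporary providers that report 4096-byte sectors
gnop create -S 4096 ada0
gnop create -S 4096 ada1
# build the pool on the .nop devices so it is created 4K-aligned
zpool create tank mirror ada0.nop ada1.nop
# re-import the pool using the real disks
zpool export tank
gnop destroy ada0.nop ada1.nop
zpool import tank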
 
kpa said:
I meant using two disks for each vdev in a mirror configuration and then striping them together into a single pool. This would give RAID 1 redundancy within each vdev. This is how it would be done with zpool(8):

zpool create tank mirror ada0 ada1 mirror ada2 ada3

This assumes that ada0 through ada3 are the four individual disks. There are, however, some issues with newer disks that use 4096-byte sectors but don't tell the OS about it. With those drives, and I'm quite sure that your 4 TB disks are such drives, it is necessary to use proper alignment and sector size when creating the ZFS pool. Search the forums for details; I don't have a good link right now.

As far as I understand, you suggest a solution that uses four 4 TB HDDs. I have only one 4 TB HDD.

After some more reading, I would go for gconcat(8). It seems like a single drive failure causes only the loss of the files on that drive. How are files spread between drives when using concat? Can I rely on most files not being split across more than one drive? Is it possible to extend a concat volume by appending more HDDs without losing data?

Thanks.
 
If one drive in a gconcat(8) array fails, it's effectively the same as a portion of a single hard drive failing. It may not just be file contents that are lost. If the failure happens in an important part of the filesystem, directory indexes and pointers to the files could be lost. So the files are still there, there's just no way to locate them.
 
wblock@ said:
If one drive in a gconcat(8) array fails, it's effectively the same as a portion of a single hard drive failing. It may not just be file contents that are lost. If the failure happens in an important part of the filesystem, directory indexes and pointers to the files could be lost. So the files are still there, there's just no way to locate them.

It doesn't sound too bad for my needs. I think I will go for it. Is there any way to locate and back up the directory indexes and pointers to files, or are they spread around the file system?

What about appending more HDDs?

Finally, what's the best way to copy the /usr/home directory to the new HDD? Is using cp(1) sufficient to ensure that everything is copied exactly, including access rights, symlinks, etc.?

Thanks.
 
UFS has some backup directory information that might help in recovering after a failure. Too often, it does not. The failure rate of big hard drives is one of the things driving the acceptance of ZFS and RAID-Z.

cp(1) might be enough, with enough options. dump(8)/restore(8) as shown in the link in post #2 is better.
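
As a sketch of what the dump(8)/restore(8) route looks like, assuming the new filesystem is already mounted at /mnt (check the linked thread for the options that suit your setup):
Code:
# -0 full dump, -L snapshot a live filesystem, -a autosize, -f - write to stdout;
# restore -r rebuilds the tree in the current directory.
dump -0Laf - /usr/home | (cd /mnt && restore -rf -)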
 
wblock@ said:
UFS has some backup directory information that might help in recovering after a failure. Too often, it does not. The failure rate of big hard drives is one of the things driving the acceptance of ZFS and RAID-Z.

cp(1) might be enough, with enough options. dump(8)/restore(8) as shown in the link in post #2 is better.

Thanks for all the help guys.

I have just created a UFS filesystem on the new drive and got just 3.5 TB of space:
Code:
root@Server:/dev # df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2    224G    203G    3.3G    98%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ada1p1    3.5T    8.0k    3.2T     0%    /mnt
root@Server:/dev #
Why is that? How can I fix it?

Thanks.
 
That drive started with 4,000,000,000,000 bytes of space, or about 3.6T. A typical newfs reserves 8% of space, leaving 3.3T by my calculations. I don't know why it shows 3.5T.
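
If the reserve bothers you on a data-only drive, it can apparently be lowered with tunefs(8); a minimal sketch, assuming /dev/ada1p1 is the new partition and it is unmounted first (going below 8% trades away some of the space/time optimization UFS does):
Code:
umount /mnt
tunefs -m 5 /dev/ada1p1   # lower the reserved space from the default 8% to 5%
mount /dev/ada1p1 /mnt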

If you did not align the first partition on that drive, writes will be slow.
 
wblock@ said:
That drive started with 4,000,000,000,000 bytes of space, or about 3.6T. A typical newfs reserves 8% of space, leaving 3.3T by my calculations. I don't know why it shows 3.5T.
It shows 3.2 TB as available space. I guess that is correct.
If you did not align the first partition on that drive, writes will be slow.
I did it exactly as described in the Handbook:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-adding.html
What are the benefits of aligning the first partition? How can I do this?

Thanks.
 
pietrasm said:
It shows 3.2 TB as available space. I guess that is correct.

I did it exactly as described in the Handbook:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-adding.html

That is correct as far as it goes. I actually rewrote that section recently, and left alignment out because it would confuse the issue.

What are the benefits of aligning the first partition? How can I do this?

The benefit is that writes will go as fast as the drive can go. If partitions are not aligned, writes can take twice as long. (Write one 4K block to an aligned partition, and it writes a single block on the drive. Misaligned, part of that data is in one disk block, and part in another. So the drive has to read the first block, modify it, then write it back out, then repeat for the second.)

Please show the output from gpart show ada1. To be aligned, the data partition must start at an even multiple of 4K. Drives usually pretend to have 512-byte sectors even when they really use 4K. The primary GPT table takes 34 512-byte blocks. The next aligned spot is at block 40. I suggest starting the first data partition at 1M, or block 2048, for compatibility with other operating systems.
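
For example, on an empty disk something like this should produce an aligned partition (the device name and label are only examples; -a 1m rounds the partition start and size to 1M boundaries):
Code:
gpart create -s gpt ada1
gpart add -t freebsd-ufs -a 1m -l bigdata ada1
newfs -U /dev/gpt/bigdata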

Seagate has a patented method to avoid the need for alignment. Other brands do not.
 
wblock@ said:
That is correct as far as it goes. I actually rewrote that section recently, and left alignment out because it would confuse the issue.



The benefit is that writes will go as fast as the drive can go. If partitions are not aligned, writes can take twice as long. (Write one 4K block to an aligned partition, and it writes a single block on the drive. Misaligned, part of that data is in one disk block, and part in another. So the drive has to read the first block, modify it, then write it back out, then repeat for the second.)

Please show the output from gpart show ada1. To be aligned, the data partition must start at an even multiple of 4K. Drives usually pretend to have 512-byte sectors even when they really use 4K. The primary GPT table takes 34 512-byte blocks. The next aligned spot is at block 40. I suggest starting the first data partition at 1M, or block 2048, for compatibility with other operating systems.

Seagate has a patented method to avoid the need for alignment. Other brands do not.
Code:
pietrasm@Server /u/h/pietrasm> gpart show
=>       34  488397101  ada0  GPT  (232G)
         34        128     1  freebsd-boot  (64k)
        162  478150528     2  freebsd-ufs  (228G)
  478150690    8388608     3  freebsd-swap  (4.0G)
  486539298    1857837        - free -  (907M)

=>        34  7814037101  ada1  GPT  (3.7T)
          34           6        - free -  (3.0k)
          40  7814037088     1  freebsd-ufs  (3.7T)
  7814037128           7        - free -  (3.5k)

pietrasm@Server /u/h/pietrasm>
It seems to be correct for the new HDD as it starts at block 40. However, it looks like partitions on the main HDD are not aligned correctly.

The first drive is a Seagate as well (Seagate Barracuda VB0250EAVER HPG7). Does that mean alignment doesn't matter?

Thanks.
 
Excellent! As far as I know, only drives 1T or larger use 4K blocks, meaning partitions on the smaller drive do not need any special alignment.
 
wblock@ said:
Excellent! As far as I know, only drives 1T or larger use 4K blocks, meaning partitions on the smaller drive do not need any special alignment.

That's great, thanks. How can I check the block size just to be sure?

Thanks.
 
wblock@ said:
diskinfo -v ada0 | grep stripesize is a quick way.
Code:
pietrasm@Server /u/h/pietrasm> diskinfo -v ada0 | grep stripesize
	0           	# stripesize
pietrasm@Server /u/h/pietrasm> diskinfo -v ada1 | grep stripesize
	4096        	# stripesize
pietrasm@Server /u/h/pietrasm>

What does zero mean? Does it mean that no alignment is necessary?

By the way, there is a huge amount of free space at the end of the first HDD. I have no idea why it's like that. Is there any way to shift the swap partition and expand the UFS partition?

Thanks again.
 
Zero would mean the stripesize is not any different than the sectorsize. Look at the full output of diskinfo -v ada0.

907M at the end of that disk is not really that much, or not enough to make it worth juggling partitions around.
 
wblock@ said:
Zero would mean the stripesize is not any different than the sectorsize. Look at the full output of diskinfo -v ada0.

907M at the end of that disk is not really that much, or not enough to make it worth juggling partitions around.

Thanks, it's clear to me now.

I know it's not worth doing just to recover 907 MB, but I would like to try it to learn something new. Can I just delete the swap partition and then recreate it at the end of the HDD, then expand the root partition?
 
Yes, that can be done with growfs(8). The result is usable but not exactly what would be there if it had been created at that size. I would back up, repartition, and restore. The links in post #2 show both.
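
For the in-place route, the rough shape of it would be something like the following (partition indexes taken from your gpart show output; untested here, and older releases may refuse to modify partitions that are in use, so keep that backup regardless):
Code:
swapoff /dev/ada0p3       # stop using the swap partition
gpart delete -i 3 ada0    # delete it
gpart resize -i 2 ada0    # grow the UFS partition over the freed space
growfs /dev/ada0p2        # grow the filesystem to fill the partition

Note that this grows the root partition over the entire tail of the disk; putting a swap partition back at the very end would need an explicit -b start block on gpart add, computed from the disk size.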
 
wblock@ said:
Yes, that can be done with growfs(8). The result is usable but not exactly what would be there if it had been created at that size. I would back up, repartition, and restore. The links in post #2 show both.

I don't think it's a good idea to do this on a live system. Can I reboot to single-user mode or some other mode in order to perform the repartitioning?

Thanks.
 