[FreeNAS] Data Drive with 4096B sectors

sub_mesa said:
- If you write 512 bytes, the HDD would have to read 3.5KiB, then recalculate checksum and write 4KiB+ECC. Much slower than just writing 512 bytes without reading anything; like regular 512-byte sector drives do. Same thing your filesystem has to do when it wants to write 1 byte or change 1 byte; it has to read the sector size (512 bytes) containing that single byte; then update it by writing 512 bytes instead.

This is a good explanation of what is going on here.
 
reily_tump said:
Thanks for the detailed tutorial. I'm facing the same issue with a new WD drive with Advanced Format. However, I use geli to encrypt the device. Do I have to deal with the problem at this layer, or is it sufficient to align the partition/label inside the geli container?

regards - reily

I am not familiar with geli at all, but if the normal procedure is to use geli on a drive that is already partitioned and formatted, then 4096-byte sectors are less likely to be a problem. If geli uses 512-byte sectors internally, though, there may be a performance hit. That is a big "may", because I know nothing about geli. Hopefully others with more understanding of geli can help here. The only other issue I can think of is with 64-bit integers, due to the size of these drives. Drives of 1.5 terabytes and up report more 512-byte sectors than a signed 32-bit integer can hold, so a 64-bit integer is needed. Several FreeBSD (and Unix in general) applications and utilities cannot represent 64-bit integers. For example, when I tried partitioning my 1.5 TB drive, which reported 2930277168 512-byte sectors, fdisk saw only 2147483647, the upper limit for a signed 4-byte (32-bit) integer. I am guessing this would only be a problem if geli used 512-byte sectors internally. Again, hopefully someone with more knowledge of the internal workings of geli can shed more light on this.
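To make the integer limit concrete, here is a minimal sketch in plain sh. The function name fits_int32 is made up for illustration; it only compares a sector count against the signed 32-bit maximum and says nothing about how fdisk itself stores the value.

```shell
# Largest value a signed 32-bit integer can hold.
INT32_MAX=2147483647

# fits_int32 N: succeed (exit status 0) if N fits in a signed 32-bit integer.
# Hypothetical helper, not part of any FreeBSD tool.
fits_int32() {
    [ "$1" -le "$INT32_MAX" ]
}

sectors=2930277168   # 512-byte sector count reported for the 1.5 TB drive
if fits_int32 "$sectors"; then
    echo "$sectors fits in a signed 32-bit integer"
else
    echo "$sectors needs a wider integer"
fi
```

With 512-byte sectors, 2147483647 sectors is 1099511627264 bytes, or just under 1 TiB, so any drive larger than that runs into this limit in tools that use a signed 32-bit sector count.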
 
You can make geli use 4KiB sectors instead. This would increase encryption performance (encryption is done per sector, so bigger sectors mean less overhead). It would also make sure you are aligned whatever partitioning you use, because GELI turns your EARS disk, which *exposes* a 512-byte sector size, into a provider with a 4096-byte (4KiB) sector size. So now you're using 4KiB sectors both internally and 'externally'; problem solved instantly.

Now you can partition, disklabel them, etc. Whatever - offset does not matter anymore. Any offset will be in 4KiB increments, so any offset will be aligned with the HDD's bigger sectors.
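A rough sketch of what that could look like, assuming the disk is ada2 and that you are happy to lose whatever is on it. The helper geli_4k_commands is hypothetical and only prints the commands instead of running them, because geli init writes new metadata to the provider. The -s option of geli init sets the sector size of the encrypted provider.

```shell
# Dry run: print the commands rather than execute them.
# geli_4k_commands is a made-up helper; ada2 is only an example device name.
geli_4k_commands() {
    dev=$1
    echo "geli init -s 4096 /dev/$dev"   # -s: sector size of the encrypted provider
    echo "geli attach /dev/$dev"         # creates the .eli provider for this device
    echo "diskinfo -v /dev/$dev.eli"     # check the sectorsize line afterwards
}

geli_4k_commands ada2
```

Once the .eli provider reports a 4096-byte sector size, partitions and labels created on it are addressed in 4KiB units.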

I wonder if there is a geom layer that can do the same thing (change provider sector size) without actually doing anything such as GELI; perhaps geom_nop?
 
# diskinfo -v ada2
Code:
ada2
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        3876021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WCAZA0385160 # Disk ident.
# gpart create -s mbr ada2
# gpart add -b 63 -s 3907029105 -t freebsd ada2
# gpart create -s bsd ada2s1
# gpart add -b 1 -s 3907029104 -t freebsd-ufs ada2s1
# gpart show
Code:
=>        63  3907029105  ada2  MBR  (1.8T)
          63  3907029105     1  freebsd  (1.8T)

=>         0  3907029105  ada2s1  BSD  (1.8T)
           0           1          - free -  (512B)
           1  3907029104       1  freebsd-ufs  (1.8T)

Have I correctly done the 64-sector alignment for an 'Advanced Format' disk?
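One way to check a layout like this is to add the slice start and the partition offset inside the slice, then test whether the result is a multiple of 8 (8 x 512 B = 4096 B). A small sketch in sh; is_4k_aligned is a made-up helper name, and the numbers are taken from the gpart show output above.

```shell
# is_4k_aligned SECTOR: succeed if a 512-byte sector number lies on a
# 4096-byte boundary, i.e. is a multiple of 8. Hypothetical helper.
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

slice_start=63    # start of ada2s1 on the disk
part_offset=1     # start of the freebsd-ufs partition inside the slice
abs_start=$(( slice_start + part_offset ))

if is_4k_aligned "$abs_start"; then
    echo "partition starts at sector $abs_start: aligned"
else
    echo "partition starts at sector $abs_start: NOT aligned"
fi
```

This only checks the start of the partition; it does not look at what the filesystem does inside it.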
 
turb013, your arithmetic is completely incomprehensible.
Code:
63 * 512 = 32256
32256 + 512 = 32768
2930245632 - 32768 = 2930212864 <-- sector size of my 1.5TB drive

Ok let's check

Code:
63 sectors * 512 bytes = 32254 bytes
32254 bytes + 512 bytes = 32766 bytes (63 sectors + 1 sector = 64 sectors)

Ok here is offset in bytes and sectors.

Next
I choose to use a block size of 32768
because it is evenly divisible by both 512 and 4096

Ok little bit strange but ok. Block size 32768 bytes. Yes?

Next
Code:
/dev/ad4
   512         # sector size
   1500301910016      #media size in bytes (1.4T)
   2930277168      #media size in sectors
   2907021         #Cylinders according to firmware   
   16         #Heads according to firmware
   63         #Sectors according to firmware
   ad:WD-WMAVU1303392   #diskident

Code:
2930277168 sectors / 32768 bytes = 89424.96240234375 ???

What is this? I don't know. And all of your following steps are wrong. Go to http://wdc.custhelp.com/app/answers/detail/a_id/5655 and read it very carefully.

Make sure that all partitions start on a multiple of 8 sectors (8x 512B = 4KB) and that partition sizes are multiples of 8 sectors. Make sure that there is space left at the start of partitions as required. For example on a boot drive, do not start at sector 0 as there needs to be space for the boot code. Sector 64 is a good start point or even 2048 which would be a 1MB boundary. Also extended partitions will need a gap between their start point and the first logical partition contained within them.

That's all! For your one partition you only need:

2930277168 sectors - 64 sectors = 2930277104 sectors (media size without the offset).
2930277104 sectors / 8 = 366284638, so the partition is already aligned.
OK, but we need space for the backup GPT.
2930277104 sectors - 64 sectors = 2930277040 sectors
Double check:
2930277040 sectors / 8 = 366284630 (evenly!)
And finally:
Code:
gpt create ad4   <--- replace ad4 with your drive identifier
Code:
gpt add -b 64 -s 2930277040 -t ufs /dev/ad4   <--- replace ad4 with your drive identifier
Code:
newfs -S 4096 -b 32768 -f 4096 -O 2 -U -m 8 -o space -L datadrive /dev/ad4p1   <--- replace ad4 with your drive identifier

Finish. :beer
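The sector arithmetic in this post can be sketched as a small sh function. The name aligned_gpt_size and its three arguments are made up for illustration; it subtracts the start offset and the space kept free at the end, then rounds down to a multiple of 8 sectors. It needs a shell with 64-bit arithmetic.

```shell
# aligned_gpt_size TOTAL START RESERVE
# Prints a partition size in 512-byte sectors for a partition that starts at
# sector START, leaves RESERVE sectors free at the end of the disk, and is
# rounded down to a multiple of 8 sectors (4096 bytes). Hypothetical helper.
aligned_gpt_size() {
    total=$1; start=$2; reserve=$3
    size=$(( total - start - reserve ))
    echo $(( size / 8 * 8 ))
}

aligned_gpt_size 2930277168 64 64   # prints 2930277040
```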
 
konstantin said:
63 sectors * 512 bytes = 32254 bytes
32254 bytes + 512 bytes = 32766 bytes (63 sectors + 1 sector = 64 sectors)

Your math is wrong here: 63 * 512 = 32256 not 32254

I am not sure what part of my calculations you are talking about. Why do I use 32768 for the start sector? Because we are trying to align to physical sectors, and the alignment is on 4K boundaries. Although there are other places we could have started, even Western Digital suggests sector 64 as a good start point. In section 3 of my tutorial I first divide the total sector count for the drive, which diskinfo gave as 2930277168 sectors, by 32768, which gives me 89424.96240234375. I then took the integer part of this result, which represents the number of whole 32768-byte blocks. And, 89424 * 32768 = 2930245632. This is the largest size in sectors that can be evenly divided by 32768. But to get the actual size of our partition we need to subtract the 32768 bytes included in sectors 0 to 64. As stated, sector 64 is our starting sector. This gives a total size for our partition in sectors of 2930212864.

As far as choosing a block size of 32768, it is just a convenient size when using 4k sectors. I am sure there are other sizes that will work.

I chose 2930212864 as the partition size in sectors not just because it is divisible by 32768 (my block size) but also because it is evenly divisible by 4096, which is the physical sector size of the drive. My guess is that using a partition size of 2930277040 you will get misalignment at the end of the drive, since 2930277040 is not evenly divisible by 4096. This is probably not a big issue though.
 
turb013 said:
konstantin said:

I am not sure what part of my calculations you are talking about. Why do I use 32768 for the start sector? Because we are trying to align to physical sectors. And, the alignment is on 4K boundaries. Although there are other places we could have started even Western Digital suggests sector 64 as a good start point. In section 3 of my tutorial I first divide my total sector size for the drive which was given by diskinfo as 2930277168 sectors by 32768 which gives me 89424.96240234375.

I also think that the problem is that you divide sectors by bytes. 32768 is the number of Bytes. You can't divide apples (sectors) by oranges (bytes). Your 32768 implies a blocksize of 16MByte (32768 * 512).
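To keep the units apart, convert explicitly before dividing. A minimal sh sketch, assuming 512-byte logical sectors; the helper names bytes_to_sectors and sectors_to_bytes are made up for illustration.

```shell
SECTOR_BYTES=512   # logical sector size reported by diskinfo

# Hypothetical helpers to convert between bytes and 512-byte sectors.
bytes_to_sectors() { echo $(( $1 / SECTOR_BYTES )); }
sectors_to_bytes() { echo $(( $1 * SECTOR_BYTES )); }

bytes_to_sectors 32768   # prints 64: a 32768-byte block spans 64 sectors
sectors_to_bytes 32768   # prints 16777216: 32768 sectors would be 16 MiB
```

So a 32768-byte block size corresponds to dividing the sector count by 64, not by 32768.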

/agni
 
JFYI, I prepared patches for test:
1. http://people.freebsd.org/~ae/gpart_align.diff
2. http://people.freebsd.org/~ae/gpart_align_stable8.diff

The first one is for FreeBSD 9.0, the second for 8-STABLE. The patch adds an "-a alignment" option to the gpart(8) utility. You can apply the patch with these commands:

Code:
# cd /usr/src/sbin/geom/class/part
# patch < /path/to/gpart_align.diff
# make all install

After that you will be able to use gpart add with -a, e.g.:
Code:
# gpart add -t freebsd-ufs -a 4k -s 5G ada0
 
CORRECTED - [Solved] [FreeNAS] Data Drive with 4096B sectors

Thank you to konstantin and agni for pointing out the errors in my tutorial. When I put together my tutorial I copied my notes incorrectly. I had been testing along the way to be sure that the number I was using for the partition size was divisible by my block size, but I actually used bytes in my calculations. For those who used the original tutorial: you should not see any performance problems with your partitions. They are still valid and lie on 4k boundaries, but they are 3 blocks smaller. I have corrected it below. This is what I actually used for my drive. My drives are data drives only; I do not boot from them. Sorry for the long post.

How to align, partition and format a drive on 4k boundaries using gpt and newfs

This tutorial is for increasing performance of the new hard drives with native 4k sectors. Specifically, it illustrates how to set up a data drive with only one GPT partition that is NOT bootable.

1) Get the physical characteristics of your drive from the drive's firmware:

Code:
diskinfo -v ad4   <--- replace ad4 with your drive identifier

My drive is the 1.5 TB Western Digital Green, model WD-15EARS

Here is the output for my drive using diskinfo:

Code:
/dev/ad4
   512                  #sector size
   1500301910016        #media size in bytes (1.4T)
   2930277168           #media size in sectors
   2907021              #Cylinders according to firmware   
   16                   #Heads according to firmware
   63                   #Sectors per track according to firmware
   ad:WD-WMAVU1303392   #diskident

2) Find the start sector which will fall on a 4k boundary:

The slice/partition does not start at the beginning of the drive. With no offset (i.e. an offset of 0) the slice would start at byte 63 * 512 = 32256, which is not on a 4k boundary. But when you add an offset of 1 you get:

Code:
(63 (from fdisk) + 1 (here)) * 512 = 32768, which is on 4k boundary

3) Find the size of the disk in bytes that is evenly divisible into 32768-byte blocks and will fall on a 4k boundary. Then subtract the starting offset from above:

The drive reports 1500301910016 total bytes. I decided to use 32768B blocks because they are evenly divisible by 512B and 4096B. Unfortunately 32768 does not go into 1500301910016 evenly:

Code:
1500301910016 / 32768 = 45785580.75

The largest whole number of 32768-byte blocks that fits is 45785580, which gives this many bytes:

Code:
45785580 * 32768 = 1500301885440

Now we need to subtract the 32768 from 1500301885440 to get actual size of partition in bytes less the starting sector (which is 64 and 64 * 512 is 32768):

Code:
1500301885440 - 32768 =  1500301852672

So, 1500301852672 is the total number of bytes for the partition. But we need to convert that to 512-byte sectors for GPT.

Code:
1500301852672 / 512 = 2930277056

So, 2930277056 is the total size of the partition in 512 byte sectors.
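The whole of step 3 can be sketched in sh as follows. The function name partition_sectors is made up for illustration, and the shell must support 64-bit arithmetic, since the byte counts do not fit in 32 bits.

```shell
# partition_sectors MEDIA_BYTES BLOCK_BYTES START_SECTOR
# Rounds the media size down to whole blocks, removes the space in front of
# the start sector, and prints the result in 512-byte sectors.
# Hypothetical helper that mirrors the arithmetic of step 3.
partition_sectors() {
    media=$1; block=$2; start=$3
    usable=$(( media / block * block ))   # round down to whole blocks
    usable=$(( usable - start * 512 ))    # drop the bytes before the start sector
    echo $(( usable / 512 ))
}

partition_sectors 1500301910016 32768 64   # prints 2930277056
```

For a different drive, replace the first argument with the "media size in bytes" value from diskinfo.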

4) I recommend clearing sector zero before creating your partition. WARNING: This will destroy all data on your drive. Be sure to unmount the drive if you have it mounted.

If you did not use GPT to create your current partition, use this to clear the start of the drive, including sector zero:

Code:
dd if=/dev/zero of=/dev/ad4 bs=1M count=10

If you used GPT to create your partition(s) use this:

Code:
gpt destroy ad4   <--- replace ad4 with your drive identifier

5) Create an empty GPT partition table:

Code:
gpt create ad4   <--- replace ad4 with your drive identifier

6) Align and create your partition:

Code:
gpt add -b 64 -s 2930277056 -t ufs /dev/ad4   <--- replace ad4 with your drive identifier

Note: If you get an error like:

Code:
The secondary GPT table is corrupt or invalid. Using the primary only -- recovery suggested

This may occur if you have previously attempted to partition your drive unsuccessfully, or possibly for other reasons. To fix this, type:

Code:
gpt recover ad4  <--- replace ad4 with your drive identifier

7) Format and label your drive:
Code:
newfs -S 4096 -b 32768 -f 4096 -O 2 -U -m 8 -o space -L datadrive /dev/ad4p1   <--- replace ad4 with your drive identifier

Note: The newfs example above uses UFS2 as the file system ("-O 2"). If you want to use UFS1 then use "-O 1".
 