HW RAID and partition alignment

Hi,

I'm testing a server with a 3ware 9550SXU-8LP hardware RAID controller. The arrays created are either a 5-disk RAID5 or a 6-disk RAID10, both with a 64k stripe size.

I want to align the partitions the right way. I'm following this tutorial: http://forums.freebsd.org/showpost.php?p=76148&postcount=38.

Is this tutorial suitable for a RAID array like mine?

Reading the tutorial, I'm unsure about some steps.

I don't see the fdisk step. Isn't it necessary?

About the bsdlabel step, the tutorial says:
Code:
bsdlabel -R  /dev/ad4 datadrive.cfg

Shouldn't it be?

Code:
bsdlabel -R  /dev/ad4s1 datadrive.cfg

And the last one says:

Code:
newfs -S 4096 -b 32768 -f 4096 -O 2 -U -m 8 -o space -L datadrive /dev/ad4

Shouldn't it be?

Code:
newfs -S 4096 -b 32768 -f 4096 -O 2 -U -m 8 -o space -L datadrive /dev/ad4s1a

I'm asking because I usually add a drive to a FreeBSD system this way: fdisk, bsdlabel and newfs. Am I wrong?
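
For reference, this is roughly what I mean by "fdisk, bsdlabel and newfs" (a minimal sketch only; da0 stands in for the array's device and the partition offsets still need adjusting for alignment):

Code:
# fdisk -BI da0          # example device; initialize the MBR with one slice covering the disk
# bsdlabel -w da0s1      # write a standard label inside that slice
# bsdlabel -e da0s1      # edit the 'a' partition's offset/size by hand
# newfs -U /dev/da0s1a   # create the filesystem on the 'a' partition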

Thanks,
 
There are two ways to use disks: a dedicated mode and a slice mode.

"Using Slices: This setup will allow your disk to work correctly with other operating systems that might be installed on your computer and will not confuse other operating systems' fdisk utilities. It is recommended to use this method for new disk installs. Only use dedicated mode if you have a good reason to do so!"

http://www.freebsd.org/doc/en/books/handbook/disks-adding.html

Search for "Dangerously Dedicated Disk" in this forum for more information.
 
Yes, I've read about dedicated mode. I've also read that dedicated mode will be removed from future releases.

The array I'm adding will be used only by a FreeBSD system. I'm not against using it in dedicated mode, but if it won't be supported in the future, shouldn't I stay with the sliced one?

Is there a difference in terms of performance between sliced and dedicated mode?
 
kisscool-fr said:
Yes, I've read about dedicated mode. I've also read that dedicated mode will be removed from future releases.

This means to me that only slices are safe to work with if I want to upgrade the system later.

kisscool-fr said:
The array I'm adding will be used only by a FreeBSD system. I'm not against using it in dedicated mode, but if it won't be supported in the future, shouldn't I stay with the sliced one?

I never used the dedicated mode. Just set up your array with the slice mode, it will work fine.

kisscool-fr said:
Is there a difference in terms of performance between sliced and dedicated mode?

I've never heard of anything like that.
 
I'm trying to understand the different steps of the howto from turb013 and I'm stuck at this:

turb013 said:
Before we can get a final size we need to determine the offset. When "BSDlabel"
creates a slice/partition it does not start it at the first sector; it starts it
at sector 63.
Code:
63 * 512 = 32256

So in order to align the start of the slice/partition to a 4K boundary you need
an offset of 1 (512B sector).
Code:
32256 + 512 = 32768 <-- 4K boundary since 32768 can be evenly divided by 4096

We can read in the bsdlabel manual:

Code:
offset  The offset of the start of the partition from the beginning of
	the drive in sectors, ...
	... The first partition should start at offset 16,
	because the first 16 sectors are reserved for metadata.

I don't know what to believe. Does the first partition start at sector 63, as turb013 says, or at sector 16, as the manual says? Or does it start at sector 63+16=79?

turb013 said:
So now that we have the major parameters for "BSDlabel" we need to make a
configuration file. I called mine datadrive.cfg. It is a simple text file. But,
it requires the parameters to be presented in the following format (Lines starting
with "#" are comments):
Code:
# datadrive.cfg

8 partitions:
#         size      offset     fstype     [fsize     bsize    bps/cpg]
a:  2930212864           1     4.2BSD       4096     32768

Isn't the metadata overwritten in this case? That might explain some problems I had during my test phase.

Is the offset value correct? Shouldn't the offset be set to a value of 64 (63+1), or at least 16? Or maybe to 63+16, or 63+16+1?
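
For my 64k stripe, my own reasoning goes like this (just a sketch of the arithmetic, please correct me if it's wrong):

Code:
65536 / 512 = 128   <-- one 64k stripe is 128 sectors of 512 bytes
3 * 128     = 384   <-- a multiple of 128 that also clears the 63-sector MBR area
                        and the 16 sectors reserved for label metadata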

I want to understand the different values and adapt them to my case, so if anyone has a suggestion, I'm open to hearing it.

Thanks :)
 
After a few days of trying to make it work correctly, I have to say I'm not completely satisfied.

I've continued in dedicated mode because it looks simpler.

I aligned the partition to sector 384 (128 for each mirrored pair of disks).
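
For what it's worth, the label file I passed to bsdlabel -R looked more or less like this (reconstructed from the bsdlabel output posted further down, so treat it as a sketch):

Code:
# labelda0r10bis (sketch reconstructed from the output below)
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 2929624704      384    unused        0     0
  c: 2929625088        0    unused        0     0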

I can see a good read performance improvement, but I don't see any write performance improvement.

I have to add that I used the default newfs parameters (bsize 16384 and fsize 2048). How important are these parameters?
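
My thinking is that on a 64k stripe, 32k blocks and 4k fragments divide the stripe evenly, so the next thing I'd try is something like this (a sketch; da0a is the dedicated-mode partition):

Code:
# newfs -b 32768 -f 4096 -U -L datas /dev/da0a   # 32k blocks / 4k frags for the 64k stripe (sketch)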
 
I will take a look at your link to see if there is something interesting.

But concerning the performance you can get with correct parameters, I don't completely agree with you.

I have done some (trivial) benchmarking and can see a difference.


Default parameters with a non-aligned partition:

Code:
# bsdlabel -w da0
# bsdlabel da0
# /dev/da0:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 2929625072       16    unused        0     0
  c: 2929625088        0    unused        0     0         # "raw" part, don't edit
# newfs -U -L datas /dev/da0a > /dev/null
# sync
# tw_cli flush
Flushing write-cache on unit /c0/u0 ...Done.

# mount /datas
# mkdir -p /datas/tmp
# chmod 777 /datas/tmp/
# bonnie++ -u 10013 -d /datas/tmp -q -n 128
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hostname         8G   628  99 188811  27 53285   8  1125  97 179628  17 555.9   5
Latency             13508us     151ms     259ms   55877us     109ms     848ms
Version  1.96       ------Sequential Create------ --------Random Create--------
hostname            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                128 21565  41 111209  99 71442  97 21929  43 94554  99 75275  97
Latency               195ms      61us   13628us     195ms     216us      80us


And the same, this time with the partition aligned:

Code:
# bsdlabel -R da0 labelda0r10bis
# bsdlabel da0
# /dev/da0:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 2929624704      384    unused        0     0
  c: 2929625088        0    unused        0     0         # "raw" part, don't edit
# newfs -U -L datas /dev/da0a > /dev/null
# sync
# tw_cli flush
Flushing write-cache on unit /c0/u0 ...Done.

# mount /datas
# mkdir -p /datas/tmp
# chmod 777 /datas/tmp/
# bonnie++ -u 10013 -d /datas/tmp -q -n 128
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hostname         8G   631  99 244874  35 55298   9  1131  98 182728  17 654.0   6
Latency             13291us     117ms     202ms   47744us     108ms     167ms
Version  1.96       ------Sequential Create------ --------Random Create--------
hostname            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                128 35626  68 110211  99 70597  97 35050  70 93471  99 76241  99
Latency               195ms    1145us    2177us     195ms      49us      30us


We can see a difference of about 30%, and I'm sure that with better parameters it could approach a 35%, 40% or 45% improvement.

I say that because the array tested was a 6-disk RAID10 with WD RE3 drives, and the values from bonnie++ seem a bit poor to me.
 
fio is a much better tool, and it can produce output in graph format too. Under certain conditions I'm getting around 0.5 GB/s with 4 disks (73 GB x 15K SAS on an Adaptec 5xxxZ card with 512 MB of memory) in RAID10.
 
I've found sysutils/fio in the ports tree and just wanted to give it a try. After reading the man page and some links on the web, it looks like a very complex load generator. Maybe too difficult for me.
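
In case I come back to it, a minimal job file seems to be enough to get started (a rough sketch based on the man page; the directory, sizes and job names are only placeholders), run with fio seq.fio:

Code:
; seq.fio - 4 sequential writers, then 4 sequential readers (placeholder values)
[global]
directory=/datas/tmp
size=2g
bs=32k
ioengine=psync
numjobs=4
group_reporting

[seq-write]
rw=write

[seq-read]
stonewall
rw=read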

I haven't stopped my current (simple) benchmarking, and there is one thing I don't understand. Whatever I do, I am not able to achieve better write speed except by modifying the block and fragment size when formatting the array. With these parameters, I can get something like 290-300 MB/s in both directions.

Code:
# bsdlabel -R da0 labelda0r10bis
# bsdlabel da0
# /dev/da0:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a: 2929624704      384    unused        0     0
  c: 2929625088        0    unused        0     0         # "raw" part, don't edit
# newfs -b 32768 -f 4096 -U -L datas /dev/da0a > /dev/null
# sync
# tw_cli flush
Flushing write-cache on unit /c0/u0 ...Done.

# mount /datas
# mkdir -p /datas/tmp
# chmod 777 /datas/tmp/
# bonnie++ -u 10013 -d /datas/tmp -q -n 128
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hostname         8G   611  99 296330  39 67413  11  1131  98 295119  30 637.8   8
Latency             14159us     131ms     521ms   35174us     110ms   61138us
Version  1.96       ------Sequential Create------ --------Random Create--------
hostname            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                128 42269  86 110087  99 72328  99 38682  77 94696  99 77430  99
Latency               195ms    2514us     118us     195ms     746us      26us

I don't know if I'm right to do it like this. I'm not sure what doubling these sizes implies at different levels. I have to read more about that.

The server specs are a Tyan Toledo 3210 motherboard with a Core 2 Duo E7400, 4 GB of RAM, and a 6-disk RAID10 on a 3ware 9550SXU-8LP RAID card in a PCI-X slot. OK, these are not the latest components, but I thought I'd get better performance.

Or maybe I'm doing something wrong, I don't know.
 
Yep, I have a BBU connected to the controller.
I created the array with the "protection" profile and switched it to "performance" just after the creation.
I have a feeling that if I create it directly with the "performance" profile, it will be marked "performance" but not actually behave that way.

I wonder if it wouldn't be wise to disable the disks' cache and just keep the controller's cache. The server is plugged into an APC UPS with apcupsd.

I've stopped the benchmarks for the moment because of the summer break. I will certainly continue after the 16th of August.
 
I'm back at work and continuing my experiments.

I followed your advice and tried iozone, but the results are quite similar with 4k and 32k record sizes.

Code:
time iozone -t 4 -s 2g -F /datas/tmp/iotest1 /datas/tmp/iotest2 /datas/tmp/iotest3 /datas/tmp/iotest4 -i 0 -i 1
        Iozone: Performance Test of File I/O
                Version $Revision: 3.327 $
                Compiled for 64 bit mode.
                Build: freebsd

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Tue Aug 17 19:14:49 2010

        File size set to 2097152 KB
        Command line used: iozone -t 4 -s 2g -F /datas/tmp/iotest1 /datas/tmp/iotest2 /datas/tmp/iotest3 /datas/tmp/iotest4 -i 0 -i 1
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 4 processes
        Each process writes a 2097152 Kbyte file in 4 Kbyte records

        Children see throughput for  4 initial writers  =  174375.52 KB/sec
        Parent sees throughput for  4 initial writers   =  168166.39 KB/sec
        Min throughput per process                      =   41715.78 KB/sec
        Max throughput per process                      =   44642.87 KB/sec
        Avg throughput per process                      =   43593.88 KB/sec
        Min xfer                                        = 1959612.00 KB

        Children see throughput for  4 rewriters        =   55821.24 KB/sec
        Parent sees throughput for  4 rewriters         =   55801.40 KB/sec
        Min throughput per process                      =   10999.88 KB/sec
        Max throughput per process                      =   21706.55 KB/sec
        Avg throughput per process                      =   13955.31 KB/sec
        Min xfer                                        = 1062976.00 KB

        Children see throughput for  4 readers          =  211771.26 KB/sec
        Parent sees throughput for  4 readers           =  211676.48 KB/sec
        Min throughput per process                      =   39061.23 KB/sec
        Max throughput per process                      =   81475.50 KB/sec
        Avg throughput per process                      =   52942.82 KB/sec
        Min xfer                                        = 1005632.00 KB

        Children see throughput for 4 re-readers        =  243934.09 KB/sec
        Parent sees throughput for 4 re-readers         =  243667.07 KB/sec
        Min throughput per process                      =   41258.00 KB/sec
        Max throughput per process                      =  114817.17 KB/sec
        Avg throughput per process                      =   60983.52 KB/sec
        Min xfer                                        =  753840.00 KB



iozone test complete.
0.636u 26.821s 3:21.78 13.6%    267+2645k 115021+108354io 7pf+0w

Code:
time iozone -t 4 -s 2g -r 32k -F /datas/tmp/iotest1 /datas/tmp/iotest2 /datas/tmp/iotest3 /datas/tmp/iotest4 -i 0 -i 1
        Iozone: Performance Test of File I/O
                Version $Revision: 3.327 $
                Compiled for 64 bit mode.
                Build: freebsd

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Tue Aug 17 19:25:15 2010

        File size set to 2097152 KB
        Record Size 32 KB
        Command line used: iozone -t 4 -s 2g -r 32k -F /datas/tmp/iotest1 /datas/tmp/iotest2 /datas/tmp/iotest3 /datas/tmp/iotest4 -i 0 -i 1
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 4 processes
        Each process writes a 2097152 Kbyte file in 32 Kbyte records

        Children see throughput for  4 initial writers  =  179054.95 KB/sec
        Parent sees throughput for  4 initial writers   =  173683.09 KB/sec
        Min throughput per process                      =   44275.37 KB/sec
        Max throughput per process                      =   45704.97 KB/sec
        Avg throughput per process                      =   44763.74 KB/sec
        Min xfer                                        = 2031648.00 KB

        Children see throughput for  4 rewriters        =  169592.08 KB/sec
        Parent sees throughput for  4 rewriters         =  169495.14 KB/sec
        Min throughput per process                      =   41365.84 KB/sec
        Max throughput per process                      =   43276.71 KB/sec
        Avg throughput per process                      =   42398.02 KB/sec
        Min xfer                                        = 2004512.00 KB

        Children see throughput for  4 readers          =  224340.68 KB/sec
        Parent sees throughput for  4 readers           =  224145.58 KB/sec
        Min throughput per process                      =   39897.63 KB/sec
        Max throughput per process                      =   90925.13 KB/sec
        Avg throughput per process                      =   56085.17 KB/sec
        Min xfer                                        =  921408.00 KB

        Children see throughput for 4 re-readers        =  197130.70 KB/sec
        Parent sees throughput for 4 re-readers         =  196913.48 KB/sec
        Min throughput per process                      =   42818.64 KB/sec
        Max throughput per process                      =   58523.18 KB/sec
        Avg throughput per process                      =   49282.68 KB/sec
        Min xfer                                        = 1536768.00 KB



iozone test complete.
0.343u 25.067s 2:47.07 15.2%    255+2533k 96357+130340io 3pf+0w

I will try the automatic test and see tomorrow if I have something interesting.
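
For the automatic test I plan to run something along these lines (a sketch; the file and record size limits are only there to keep the run time reasonable):

Code:
time iozone -a -n 64m -g 4g -y 4k -q 64k -f /datas/tmp/iotest1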
 
I just realized that I was wrong in my benchmark interpretation: I swapped the read and write speeds :r

So the write speeds are good, and better when the partition is aligned.
I will now tweak the system a little with the sysctl tunables I've seen on the 3ware KB, if I remember correctly, to improve the read speeds.

I've found another article on the KB which explains how to enable the disks' cache when the controller's write cache is disabled. It seems that when you enable the controller's write cache, it also activates the disks' cache. I don't know if it is possible to disable the disks' cache without disabling the controller's cache.
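
From what I gather, the unit write cache itself can be checked and toggled from tw_cli like this (c0/u0 as in the output above; the per-drive cache command from the KB article is a separate setting that I'm not reproducing here):

Code:
# tw_cli /c0/u0 show           # unit details, including the current cache state
# tw_cli /c0/u0 set cache=on   # enable the unit write cache (BBU or UPS advised)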
 
The selection of the storsave policy affects you in real time; it does not matter what you used when creating the array. This setting can be changed on the fly -- it affects cache usage.
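
For example (a sketch, assuming the same controller/unit numbering as the output posted earlier):

Code:
# tw_cli /c0/u0 set storsave=perform   # switch the storsave policy on the fly
# tw_cli /c0/u0 set storsave=protect   # and back again, no array re-creation needed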

You did not mention which configuration (RAID5 or RAID10) your test results are for. You can expect much worse write performance with RAID5. Unless you intend to mostly store files (rare writes, frequent reads), RAID5 is a bad idea.
 
kisscool-fr said:
The server specs are a Tyan Toledo 3210 motherboard with a Core 2 Duo E7400, 4 GB of RAM, and a 6-disk RAID10 on a 3ware 9550SXU-8LP RAID card in a PCI-X slot.

All the results posted here were made with this configuration.

I tried other RAID configurations to find out where the problem comes from. For example, a quick test on a 6-disk RAID0 gives me 350-400 MB/s write and 160-170 MB/s read. It really shows a read performance problem.

I've found the 3ware KB article explaining how to tune the system. I'll test and see what happens.
 
Finally, I have found acceptable values for read and write.

Following the 3ware recommendations, I tried different values for vfs.read_max and vfs.hirunningspace.
I just doubled vfs.read_max from 8 to 16 and got a very good improvement in read speed (before: 160-170 MB/s, now something like 270 MB/s). Higher values give slightly better speeds, but the first doubling brings most of the improvement.
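
Concretely, this is all it took (values as described above; adding the line to /etc/sysctl.conf keeps it across reboots):

Code:
# sysctl vfs.read_max                          # was 8 here
# sysctl vfs.read_max=16                       # doubled read-ahead
# echo 'vfs.read_max=16' >> /etc/sysctl.conf   # make it persistent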

These values are good enough for me and I don't need to tune any further.
 