Tuning a system for speed - help out

dvl@

Developer
I recently put together a new server running FreeBSD 9.1; eventually it will host 8x2TB drives in a raidz2 ZFS array. What I'd like to do is run benchmarks along the way, document the process as I go, and let others suggest changes and help out.

Of course, this is blatantly self-serving, but the community as a whole will benefit from the learning process. My goal is to create a practical example of how to methodically, or magically, tune a new system.

This is not a rush job. We can take our time. Weeks, not days.

The list of hardware appears at the above URL. I have posted the dmesg output for reference.

Here are some points to consider:

  • a GENERIC kernel is running
  • the 8x2TB HDD are yet to be installed
  • there is an optional 128GB SSD waiting in the wings
  • the ZFS disks will be partitioned to leave at least 2MB free at the start and end of the drives

Good idea? Right forum? Suggestions? Ideas?
 
I would be interested to know if there was a difference if you removed "makeoptions DEBUG=-g" and a few other debug options (like stack(9) and ktrace(1)) from GENERIC.
 
It was pointed out: what is the workload of this machine? Let me start that by explaining what is running on the machine it will replace:

* It will be my main development server and will contain a lot of disk space. It will run FreeBSD 9.1. The base OS will be installed on two gmirror’d HDD. The rest of the storage will be based on a ZFS raidz2 array.
* It will contain development copies of all my websites (about 20) and databases (about 16 in PostgreSQL and 7 in MySQL), including FreshPorts.
* It'll run a Nagios server and my Bacula server (with a Bacula database), have some samba shares, and run the smokeping server and the munin server.
* Yep, Apache is on there
* It also contains the cvs repository for my websites

This server isn't usually under high load, but when I'm working on a database problem, I want it to be fast. When I'm checking in code... yeah, fast. When I'm doing an svn up on the FreeBSD ports tree, I want it fast.
 
nginx for speedup

Surprised to see nginx not appearing on your list, to speed up Apache. I understand not a lot of people will visit your dev server, but a tuned nginx config might be used as a drop-in replacement in your live setup.
 
I tend to switch products to solve an issue. So far, Apache has served me well.
 
One issue: how to reliably reproduce the benchmarks, and which ones to run?
 
Yeah, recently I found the way of installing FreeBSD on a gmirror myself by luck, almost exactly as Dan shows in his screenshots.

One thing I still don't quite understand, though, is these disk alignment parameters. How do I verify that things are optimal on a server?
 
I suspect the biggest wins for your workload are likely to come from tuning storage and network performance. For most of what you're doing, I doubt CPU will be a bottleneck at all.


But, if the point of this exercise is to benchmark and improve benchmarks... I'm keen to see results.
 
frijsdijk, there are two ways to check alignment. The first is just by determining the starting block of a filesystem. With MBR, this takes a little work.

This is a real example:
Code:
=>       63  226492352  mirror/gm0  MBR  (108G)
         63  226492308           1  freebsd  [active]  (108G)
  226492371         44              - free -  (22k)

The FreeBSD slice starts at block 63, thanks to ancient CHS values that no longer apply. That is not aligned with 4K blocks: (63*512)/4096= 7.875. But that is not a problem, because the filesystems do not start there. They are inside BSD partitions in that slice.

Code:
=>        0  226492308  mirror/gm0s1  BSD  (108G)
          0          1                - free -  (512B)
          1    4194304             1  freebsd-ufs  (2.0G)
    4194305    8388608             2  freebsd-swap  (4.0G)
   12582913    4194304             4  freebsd-ufs  (2.0G)
   16777217   16777216             5  freebsd-ufs  (8.0G)
   33554433  192937872             6  freebsd-ufs  (92G)
  226492305          3                - free -  (1.5k)

The first partition starts at block 1 inside the slice. Adding the starting block of the slice (63), that makes it block 64 on the disk. (64*512)/4096= 8. So that filesystem is aligned for 4K blocks. So are the others; just add the starting block of the slice to the starting block of the FreeBSD partition. The math can be simplified by realizing that a 4K block is just 8 512-byte blocks. So if the starting block of the filesystem on the disk can be evenly divided by 8, it is aligned for 4K blocks.
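
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `is_4k_aligned` and the constants are made up for the example, and the block numbers are copied from the gpart output shown earlier.

```python
SECTOR_SIZE = 512      # logical sector size in bytes
PHYSICAL_SIZE = 4096   # physical (4K) sector size in bytes


def is_4k_aligned(slice_start, partition_start):
    """Return True if the absolute byte offset falls on a 4K boundary.

    slice_start: first block of the MBR slice on the disk
    partition_start: first block of the BSD partition inside that slice
    """
    absolute_block = slice_start + partition_start
    return (absolute_block * SECTOR_SIZE) % PHYSICAL_SIZE == 0


# Values copied from the mirror/gm0 and mirror/gm0s1 listings above.
SLICE_START = 63
PARTITION_STARTS = [1, 4194305, 12582913, 16777217, 33554433]

results = {start: is_4k_aligned(SLICE_START, start) for start in PARTITION_STARTS}

for start, aligned in results.items():
    print(f"partition at {start}: disk block {SLICE_START + start}, aligned={aligned}")

# The slice itself starts at block 63, which is not on a 4K boundary.
slice_aligned = is_4k_aligned(SLICE_START, 0)
print(f"slice at {SLICE_START}: aligned={slice_aligned}")
```

Checking `absolute_block % 8 == 0` would give the same answer, since eight 512-byte blocks make one 4K block.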

The size of the partitions is also a multiple of 4K. (Except the last one, don't know what happened there.)

When the FreeBSD partitions are added with gpart(8), the -a4k option adjusts each partition's offset for alignment. Without it, the filesystem partitions will almost certainly be misaligned. bsdlabel(8) offers no way to fix alignment.

The Handbook RAID1 section shows examples. These also work for single drives, the only difference would be the device. ada0 instead of mirror/gm0, for instance.

The second way to test alignment is with benchmarks. A misaligned 4K-block disk might only go half as fast as the same disk aligned correctly.
 
wblock@ said:
The Handbook gmirror instructions were updated very recently, with the best practices we could find. That includes alignment.

For benchmarks, use diskinfo -tv for raw read speed and benchmarks/bonnie++ for realistic filesystem speed, although in a difficult-to-read format.

None of the instructions I read at that URL are designed for a fresh install. They all assume FreeBSD is ALREADY installed.

I see no mention of alignment at that URL. :/
 
As taken from http://dan.langille.org/2013/01/25/aligned-versus-not-aligned/

I first worked only with dd, before realizing that sequential speeds are not affected by alignment issues. DOH.

First, ada1. Aligned partitions are shown first. Then unaligned.

Code:
ada1

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
floater.unix 66000M   519  99 109704  20 41924  33  1028  98 114528  18 120.0   7
Latency             16477us     448ms   15903ms   58855us     902ms    4473ms
Version  1.96       ------Sequential Create------ --------Random Create--------
floater.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31825  70 +++++ +++ +++++ +++ 24005  57 +++++ +++ +++++ +++
Latency               111ms      50us      41us     252ms      34us      41us


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
floater.unix 66000M   532  99 113838  20 40954  36  1049  99 116269  19 132.0   5
Latency             16232us     755ms   15295ms   19001us    1624ms    4518ms
Version  1.96       ------Sequential Create------ --------Random Create--------
floater.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31292  58 +++++ +++ +++++ +++ 23477  46 +++++ +++ +++++ +++
Latency               158ms      72us      38us     125ms      35us      37us


Now for ada2, first, with aligned partitions, then unaligned.

Code:
ada2

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
floater.unix 66000M   519  99 109704  20 41924  33  1028  98 114528  18 120.0   7
Latency             16477us     448ms   15903ms   58855us     902ms    4473ms
Version  1.96       ------Sequential Create------ --------Random Create--------
floater.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31825  70 +++++ +++ +++++ +++ 24005  57 +++++ +++ +++++ +++
Latency               111ms      50us      41us     252ms      34us      41us


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
floater.unix 66000M   524  99 121793  23 39647  43  1053  99 120336  20 160.2  12
Latency             16407us     795ms   10873ms   22076us    1623ms     579ms
Version  1.96       ------Sequential Create------ --------Random Create--------
floater.unixathome. -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             62188us      36us      61us   62548us      37us      41us

Anyone care to interpret the results for us?
 
From http://dan.langille.org/2013/01/27/diskinfo-tests/:

First unaligned, then aligned:

Code:
$ sudo diskinfo -tv ada1
ada1
    512             # sectorsize
    250059350016    # mediasize in bytes (232G)
    488397168       # mediasize in sectors
    4096            # stripesize
    0               # stripeoffset
    484521          # Cylinders according to firmware.
    16              # Heads according to firmware.
    63              # Sectors according to firmware.
    9VYJ9HY5        # Disk ident.
 
Seek times:
    Full stroke:      250 iter in   7.052209 sec =   28.209 msec
    Half stroke:      250 iter in   5.173003 sec =   20.692 msec
    Quarter stroke:   500 iter in   8.200960 sec =   16.402 msec
    Short forward:    400 iter in   3.161057 sec =    7.903 msec
    Short backward:   400 iter in   3.111896 sec =    7.780 msec
    Seq outer:   2048 iter in   0.157592 sec =    0.077 msec
    Seq inner:   2048 iter in   0.160382 sec =    0.078 msec
Transfer rates:
    outside:       102400 kbytes in   0.744580 sec =   137527 kbytes/sec
    middle:        102400 kbytes in   0.829667 sec =   123423 kbytes/sec
    inside:        102400 kbytes in   1.331500 sec =    76906 kbytes/sec

Code:
$ sudo diskinfo -tv ada1
ada1
    512             # sectorsize
    250059350016    # mediasize in bytes (232G)
    488397168       # mediasize in sectors
    4096            # stripesize
    0               # stripeoffset
    484521          # Cylinders according to firmware.
    16              # Heads according to firmware.
    63              # Sectors according to firmware.
    9VYJ9HY5        # Disk ident.
 
Seek times:
    Full stroke:      250 iter in   7.061921 sec =   28.248 msec
    Half stroke:      250 iter in   5.189909 sec =   20.760 msec
    Quarter stroke:   500 iter in   8.184480 sec =   16.369 msec
    Short forward:    400 iter in   3.161515 sec =    7.904 msec
    Short backward:   400 iter in   3.103441 sec =    7.759 msec
    Seq outer:   2048 iter in   0.166287 sec =    0.081 msec
    Seq inner:   2048 iter in   0.165270 sec =    0.081 msec
Transfer rates:
    outside:       102400 kbytes in   0.746159 sec =   137236 kbytes/sec
    middle:        102400 kbytes in   0.829952 sec =   123381 kbytes/sec
    inside:        102400 kbytes in   1.331916 sec =    76882 kbytes/sec
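
For anyone who wants to compare the two runs without squinting, here is a rough Python sketch that pulls the transfer rates out of the two diskinfo outputs above and prints the change. The regular expression and the helper names (`transfer_rates`, `percent_change`) are my own assumptions for the example; only the numbers come from the posted output.

```python
import re

# "Transfer rates" sections copied from the two diskinfo runs above.
UNALIGNED = """\
    outside:       102400 kbytes in   0.744580 sec =   137527 kbytes/sec
    middle:        102400 kbytes in   0.829667 sec =   123423 kbytes/sec
    inside:        102400 kbytes in   1.331500 sec =    76906 kbytes/sec
"""
ALIGNED = """\
    outside:       102400 kbytes in   0.746159 sec =   137236 kbytes/sec
    middle:        102400 kbytes in   0.829952 sec =   123381 kbytes/sec
    inside:        102400 kbytes in   1.331916 sec =    76882 kbytes/sec
"""

# Matches lines such as "outside: 102400 kbytes in 0.74 sec = 137527 kbytes/sec".
RATE_LINE = re.compile(
    r"^\s*(\w+):\s+\d+ kbytes in\s+[\d.]+ sec =\s+(\d+) kbytes/sec", re.M
)


def transfer_rates(text):
    """Map each disk zone (outside/middle/inside) to its rate in kbytes/sec."""
    return {zone: int(rate) for zone, rate in RATE_LINE.findall(text)}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


unaligned = transfer_rates(UNALIGNED)
aligned = transfer_rates(ALIGNED)
changes = {zone: percent_change(unaligned[zone], aligned[zone]) for zone in unaligned}

for zone, change in changes.items():
    print(f"{zone:8s} {unaligned[zone]:7d} -> {aligned[zone]:7d} kbytes/sec ({change:+.2f}%)")
```

On these figures every zone differs by well under half a percent between the two runs.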
 
dvl@ said:
None of the instructions I read at that URL are designed for a fresh install. They all assume FreeBSD is ALREADY installed.

Figure out a way to get bsdinstall(8) to install to a mirror, and that can be added. Of course, if bsdinstall(8) could create and install to a mirror in the first place, that would simplify things greatly.

I see no mention of alignment at that URL. :/

Discussion of alignment is out of scope of the article, which is already bigger than I would prefer. But the commands create correct alignment.
 
There's an important detail hidden in the dmesg(8): these are Seagate drives. Seagate has a thing called SmartAlign which claims, through some technique I have not seen fully described, to eliminate the need to align partitions. Only on their drives, of course. My guess is that it reworks transfers, transferring misaligned bits at the start and end separately, and doing the rest aligned. Kind of the same thing that happens using dd(1) to read or write. Found the patent, maybe: http://www.google.com/patents?printsec=abstract&zoom=4&id=GxbLAAAAEBAJ&output=text&pg=PA3.

Your benchmark numbers suggest that it works. 109704K/sec writes aligned, 113838K/sec unaligned, 114528K/sec reads aligned, 116269K/sec unaligned. Those numbers are within about 4% of each other (3.8% for writes, 1.5% for reads), and the difference might disappear entirely over multiple tests.
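
The gap between those figures can be computed directly. A small Python sketch, using only the ada1 block write and block read numbers quoted above; the function and dictionary names are invented for the example.

```python
def percent_difference(aligned, unaligned):
    """How much faster (positive) or slower (negative) the unaligned run was."""
    return (unaligned - aligned) / aligned * 100.0


# (aligned, unaligned) throughput in K/sec from the ada1 bonnie++ runs.
ADA1 = {
    "block write": (109704, 113838),
    "block read": (114528, 116269),
}

differences = {
    name: percent_difference(aligned, unaligned)
    for name, (aligned, unaligned) in ADA1.items()
}

for name, diff in differences.items():
    print(f"{name}: {diff:+.1f}%")
```

A single run per configuration says little about run-to-run variation, so these percentages are best read as a rough indication only.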

So is aligning those drives a waste of time? No. Align all drives from now on; it's a good protective habit that does no harm on 512-byte block drives.
 
wblock@ said:
Discussion of alignment is out of scope of the article, which is already bigger than I would prefer. But the commands create correct alignment.

Oh, then I misunderstood what you meant by:

The Handbook gmirror instructions were updated very recently, with the best practices we could find. That includes alignment.
 
dvl@ said:
FYI, they are not both Seagate drives.

One is a Seagate: ST250DM000-1BD141

The other is a Western Digital: WD2500AAKX

Hmm. The WD docs say that drive is Advanced Format, but there's not much point in using it on relatively small drives like that. The output of gpart list on that drive should verify:

Code:
1. Name: ada0
   Mediasize: 256060514304 (238G)
   Sectorsize: 512   <- logical sector size
   Stripesize: 4096  <- physical sector size
 
Code:
$ gpart list ada2
Geom name: ada2
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 488397134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada2s1
   Mediasize: 250059309056 (232G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 20480
   Mode: r0w0e0
   rawuuid: f9276f63-68ef-11e2-b888-00259082215a
   rawtype: 516e7cb4-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 250059309056
   offset: 20480
   type: freebsd
   index: 1
   end: 488397127
   start: 40
Consumers:
1. Name: ada2
   Mediasize: 250059350016 (232G)
   Sectorsize: 512
   Mode: r0w0e0
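
As a cross-check on the listing above, the partition's start block and byte offset can be tested for 4K alignment. This is a hedged sketch: the `field` helper is hypothetical, the regular expression assumes the "name: value" layout seen in this particular output, and only the relevant lines of the listing are reproduced.

```python
import re

# Relevant lines copied from the "gpart list ada2" output above.
GPART_LIST = """\
1. Name: ada2s1
   Mediasize: 250059309056 (232G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 20480
   offset: 20480
   start: 40
"""


def field(text, name):
    """Return the integer value of the first 'name: value' line in text."""
    match = re.search(rf"^\s*{re.escape(name)}:\s+(\d+)", text, re.M)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


sector_size = field(GPART_LIST, "Sectorsize")
start = field(GPART_LIST, "start")
offset = field(GPART_LIST, "offset")

# The byte offset should equal the start block times the logical sector size.
consistent = offset == start * sector_size
# A partition is 4K-aligned when its byte offset is a multiple of 4096.
aligned = offset % 4096 == 0

print(f"start block {start}, offset {offset} bytes, consistent={consistent}, aligned={aligned}")
```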
 