ZFS slow on FreeBSD 9

phoenix said:
Note also, that just handing the entire drive to ZFS and using gnop is not enough to get proper 4K alignment. Even if you give the whole drive to ZFS, it doesn't use the entire disk starting at LBA 0.
This is very interesting and certainly not what I would have expected. So what would be the point, then, of handing the complete device to ZFS?
 
phoenix said:
I wouldn't qualify a Core i3 and 2 GB of RAM as "a fast motherboard". I'd barely qualify it as "a usable desktop". :) Especially if you are sticking SAS drives onto it.

There most certainly is something wrong with your setup. My home NAS (...)

You know, you should perhaps apply to the Vatican, your mindset might be of use there. In religion you attack or demean anyone who dares to go against your world view; facts couldn't be less important. In science we have respect for the facts, and that's what makes the world a better place: a little humility in the face of reality. You ignore what I am saying and go on nitpicking about what is fast or slow. When I said fast, I said it in the context of the task at hand: ZFS. Perhaps I should have said 'fast like > 300 MB/s when Solaris is installed on it and "barely a usable desktop" when FreeBSD is running.' Perhaps then you would not have had so much opportunity for nitpicking and would have paid more attention to the facts. Besides, I never said it was a desktop; I don't know where you got that impression. But never let facts (or the absence of them) get in the way of a good disdainful post to put n00bies in their place. Administering Solaris boxes, I would not qualify myself as a newbie, but I certainly am a FreeBSD newbie, so you could have a point there :)

phoenix said:
If my horribly under-powered setup can do 60 MBps with ancient SATA controllers and harddrives, then you're doing something wrong if you can only get 60 MBps out of SAS drives and controller.

According to your logic I must be doing something wrong when FreeBSD is installed, but as my barely usable desktop smokes yours (and I didn't start the e-Penis comparison, you did, and for the record, I don't have one) when Solaris runs on it, I must be doing something right! We arrive at a contradiction. I think this is all the answer you will get from me, especially when you say what you say here - can you be held to your words? Then why the bickering in your reply?
 
lucinda said:
In science, we have respect for the facts and that's what makes the world a better place:

Link: http://en.wikipedia.org/wiki/Scientific_method

lucinda said:
Wow. This situation is so unprofessional of the FreeBSD devs. I was about to create a thread about another slowness on FreeBSD 9.0 where SXCE is going four times as fast, same hardware, same disks, 512-byte sectors by the way, good old fast SAS disks. Now I won't even bother and I'll keep using OpenSolaris or Solaris Express. The numbers don't add up regarding ZFS on FreeBSD, which by any other measure is pretty good and stable. I use it for a number of things and it's been good, but back to my complaint (and I don't mean to hijack the thread): I wonder why FreeBSD has been trumpeting ZFS as 'stable' on http://www.freebsd.org since 8.2 when it's obviously incomplete, demonstrably slow, and now panics too. Just DON'T lie! ZFS on FreeBSD is not ready for production, so don't lie on the front page; it hurts credibility terribly.

You cannot PROVE a hypothesis with a single experiment, because there is a chance that you made an error somewhere along the way.

Best Regards,
George
 
@lucinda

I have set up many (more than I have fingers) FreeBSD systems; NAS and others, that have the same performance as when running Solaris. It's not that FreeBSD is incapable, but it can demand more fine-tuning and experience to reach the same level of performance. That's why I try my best to test and document as much as possible so that others can read it and use that information. But it does require you to search for information that you perhaps didn't know beforehand you should have searched for, and that may be part of why Solaris can feel "better": because it requires less experience. Personally, I instead feel "confined" by Solaris, and that's why I choose FreeBSD any day.

But because there are plenty of people who can account for FreeBSD systems with equal or better performance than this particular system, with "similar" hardware, it is hard to just say:
FreeBSD = BAD
Solaris = GOOD

Still, until the official documentation reflects how to achieve this performance in FreeBSD from the beginning, we as a community can only do our best to come up with suggestions on how you can improve your current situation.

As always, no one will ever force you to use FreeBSD. So if you feel happier using Solaris, then go ahead.

And lucinda, it's pretty easy to figure out your gender from your alias, and frankly, I don't care. I hope this stays a place where *beep* size is only measured in IOPS :)

/Sebulon
 
Sebulon said:
I have set up many (more than I have fingers) FreeBSD systems; NAS and others, that have the same performance as when running Solaris.

During what? The last year? Then it would have been 8.2 at most. Can you point to a single benchmark on the Internet that substantiates your claim, namely FreeBSD ZFS as fast as Solaris ZFS?

Sebulon said:
It's not that FreeBSD is incapable, but it can demand more fine-tuning and experience to reach the same level of performance. That's why I try my best to test and document as much as possible so that others can read it and use that information.

I also like to do this, to some extent. Can you share some insight? My disks advertise 512-byte sectors, physical and logical (Hitachi Ultrastar 15K450, HUS154545VLS300), which means no 4K strangeness. I handed them raw to zpool, which means they are 4K unaligned, but that doesn't matter on 512-byte-sector disks, as we all know.

(If anyone is going to say "no, it matters, sometimes", please show some data with your claim)
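If anyone wants to double-check what the drives actually report to the OS, something along these lines should do it; a rough sketch only (the device name is just an example, and older releases may not report a stripesize at all):

Code:
# A native 512-byte drive should report sectorsize 512 and stripesize 0 (or 512);
# a 4K drive emulating 512-byte sectors typically reports stripesize 4096.
diskinfo -v /dev/aacd0 | egrep 'sectorsize|stripesize'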

Sebulon said:
But it does require you to search for information that you perhaps didn't know beforehand you should have searched for, and that may be part of why Solaris can feel "better": because it requires less experience. Personally, I instead feel "confined" by Solaris, and that's why I choose FreeBSD any day.

Well, not exactly. I appreciate your point. I am not frightened by complexity when it buys me flexibility. I like FreeBSD; it's just that ZFS is very subpar compared to Solaris. When running FreeBSD, no part of the system looks stressed (iostat/vmstat/gstat), yet it's slow. As for complexity vs. banging your head against a wall: the difference is very clear.

Sebulon said:
Still, until the official documentation reflects how to achieve this performance in FreeBSD from the beginning, we as a community can only do our best to come up with suggestions on how you can improve your current situation.

This is what I miss: documentation. Short of reading the source code, what's left? The FreeBSD ZFS wiki and the Handbook both say amd64 is auto-tuning. Maybe it is, but then again, it's so slow. Even if it were not, there are only three sysctls listed and none apply to my case. Benchmarks everywhere show it's far slower. I don't want to be unjust; I know it's a complicated, evolving piece of software. But on the wiki, just say it's slower. Every release being called 'significantly better' than the last gives no real information.
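For what it's worth, the kernel itself exposes far more ZFS knobs than the three the wiki lists; a quick way to see them all, with their descriptions, is roughly this (assuming a stock kernel with ZFS loaded):

Code:
# list every vfs.zfs tunable with a one-line description instead of its value
sysctl -d vfs.zfs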
 
@lucinda

lucinda said:
Can you point to a single benchmark on the Internet that substantiates your claim, namely FreeBSD ZFS as fast as Solaris ZFS?

I'll have to get back to you on the Solaris part. I know I put them somewhere around newfs /dev/null, the storage space is unbelievable, so that's going to take a while to find again :)

But I can happily link to my latest shenanigan: GELI Benchmarks. A smaller but powerful NAS, aimed at around 24 TB at most, but encrypted in real time with geli, and it scored 400 MB/s write with bonnie++. Speaking of bonnie, there are more people posting on the matter - myself included, with performance results from another of my systems: My ZFS V28 benchmarks. I would give more examples, but I have to run...

/Sebulon
 
OK, calm down a bit, the lot of you, will you? No, I am not a moderator and I think I would not like to be, but I do not think this is heading in the right, or polite, direction.

@lucinda:
Maybe phoenix should have written that something is going wrong, not that you are doing something wrong.

You mention that iostat/vmstat/gstat look low, so CPU load should not be an issue there, should it? May I ask how much bandwidth the drives deliver raw when all of them are streaming to /dev/null? Just so we can rule out other factors like drivers and PCIe lanes, and be content arguing about ZFS on FreeBSD ;)
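Something like the following would answer that; a sketch only (the device names are placeholders for the four raw drives, block size and count are arbitrary):

Code:
# stream all four raw devices to /dev/null in parallel, bypassing ZFS entirely,
# and watch gstat/iostat in another terminal to see what the data path can do
for d in aacd0 aacd1 aacd2 aacd3; do
    dd if=/dev/$d of=/dev/null bs=1M count=4096 &
done
wait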
 
Lucinda, like any one of us, is free to believe whatever she chooses.

I have not seen Phoenix make fun of / belittle anyone in any of his posts since I have been here; allegedly "controversial" post included.
 
Crivens said:
This is very interesting and certainly not what I would have expected. So what would be the point, then, of handing the complete device to ZFS?

Works fine for non-4K drives, and makes management simpler, as you just label the drive. No partitioning needed.

But, with 4K drives, especially ones that emulate 512B sectors, you need to do some manual twiddling to get things properly optimised.
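For reference, the manual twiddling usually boils down to the gnop trick mentioned earlier in the thread; a rough sketch (device and pool names are only examples):

Code:
# create a transient provider that reports 4096-byte sectors so ZFS picks ashift=12
gnop create -S 4096 /dev/ada0
zpool create tank /dev/ada0.nop
# the nop layer is only needed at pool creation time
zpool export tank
gnop destroy ada0.nop
zpool import tank
zdb | grep ashift    # should now show ashift: 12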
 
lucinda said:
Qaz, for the record I also have been considering 9.0-RELEASE as a replacement for a Solaris box. My setup is a fast motherboard with a Core i3, 2 GB of RAM, an Adaptec 3085, and 4x 450 GB SAS disks. With an old release of Solaris Express Community Edition, build snv_113, I get > 300 MB/s read performance out of that setup. FreeBSD 9.0 cuts these speeds to a quarter, 60-80 MB/s - I've done no tuning and I'll open another post asking for tips here, but it's frustrating that the same hardware is so slow. If it were my home NAS I'd be worried about backups taking 24 hours instead of 6, but there's no way I can substitute a busy Solaris box at the office with FreeBSD with these numbers. The FreeBSD Handbook says amd64 is largely auto-tuning (Solaris is), but if this is the best FreeBSD can do I wonder why Solaris is 4 times as fast on the same hardware.

I would make a guess. Since your system has only 2 GB of RAM, prefetch would be disabled by default in FreeBSD. This may be the cause of the perceived slowness. Could you check the output of

$ zpool iostat -v

Look at the read bandwidth column. Is the bandwidth of the whole pool just a fraction of the sum of the bandwidths of the individual drives? If possible, also run the above command on Solaris and compare the two results.
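If prefetch does turn out to be the culprit, checking and re-enabling it is a one-liner; roughly (the loader.conf line takes effect at the next boot):

Code:
# 1 means prefetch is disabled (the default with less than 4 GB of RAM), 0 means enabled
sysctl vfs.zfs.prefetch_disable
# enable it on the running system
sysctl vfs.zfs.prefetch_disable=0
# and make the choice permanent
echo 'vfs.zfs.prefetch_disable="0"' >> /boot/loader.conf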
 
@Qaz

Comparing the output of zpool iostat, you can see that before the change the read speed is around 1.5 MB/s. After the change, it increases to 8 MB/s. Also, writing is more consistent at 16 MB/s. I guess that with your original machine, an average write speed of 30 MB/s may be achievable.
 
t1066 said:
I would make a guess. Since your system has only 2 GB of RAM, prefetch would be disabled by default in FreeBSD. This may be the cause of the perceived slowness. Could you check the output of

$ zpool iostat -v

Look at the read bandwidth column. Is the bandwidth of the whole pool just a fraction of the sum of the bandwidths of the individual drives? If possible, also run the above command on Solaris and compare the two results.

I'll post all the data in one place, and your suggestion improved the situation a lot. I'll discuss below:
(Using code tags since nothing else preserves leading space)

Code:
# camcontrol devlist
<HITACHI HUS154545VLS300 A570>     at scbus0 target 0 lun 0 (pass0)
<HITACHI HUS154545VLS300 A570>     at scbus0 target 1 lun 0 (pass1)
<HITACHI HUS154545VLS300 A570>     at scbus0 target 2 lun 0 (pass2)
<HITACHI HUS154545VLS300 A570>     at scbus0 target 3 lun 0 (pass3)

dmesg (some, hopefully relevant)

Code:
CPU: Intel(R) Core(TM) i3 CPU         530  @ 2.93GHz (2942.49-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x20652  Family = 6  Model = 25  Stepping = 2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 2147483648 (2048 MB)

(...)

aac0: <Adaptec RAID 3085> mem 0xfb600000-0xfb7fffff irq 18 at device 14.0 on pci3
aac0: Enabling 64-bit address support
aac0: Enable Raw I/O
aac0: Enable 64-bit array
aac0: New comm. interface enabled
aac0: [ITHREAD]
aac0: Adaptec 3085, aac driver 2.1.9-1
aacp0: <SCSI Passthrough Bus> on aac0
aacp1: <SCSI Passthrough Bus> on aac0
aacp2: <SCSI Passthrough Bus> on aac0

(...)

aacd0: <Volume> on aac0
aacd0: 429056MB (878706688 sectors)
aacd1: <Volume> on aac0
aacd1: 429056MB (878706688 sectors)
aacd2: <Volume> on aac0
aacd2: 429056MB (878706688 sectors)
aacd3: <Volume> on aac0
aacd3: 429056MB (878706688 sectors)

(...)

pass0 at aacp0 bus 0 scbus0 target 0 lun 0
pass0: <HITACHI HUS154545VLS300 A570> Fixed Uninstalled SCSI-5 device 
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 scbus0 target 1 lun 0
pass1: <HITACHI HUS154545VLS300 A570> Fixed Uninstalled SCSI-5 device 
pass1: 3.300MB/s transfers
pass2 at aacp0 bus 0 scbus0 target 2 lun 0
pass2: <HITACHI HUS154545VLS300 A570> Fixed Uninstalled SCSI-5 device 
pass2: 3.300MB/s transfers
pass3 at aacp0 bus 0 scbus0 target 3 lun 0
pass3: <HITACHI HUS154545VLS300 A570> Fixed Uninstalled SCSI-5 device 
pass3: 3.300MB/s transfers

pool (note it's simply striped)

Code:
# zpool status
  pool: lykke2
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        lykke2      ONLINE       0     0     0
          aacd0     ONLINE       0     0     0
          aacd1     ONLINE       0     0     0
          aacd2     ONLINE       0     0     0
          aacd3     ONLINE       0     0     0

the default setting for < 4 GB of RAM (prefetch disabled)

Code:
# sysctl -a | grep vfs.zfs.prefetch_disable
vfs.zfs.prefetch_disable: 1

and our friend ashift

Code:
# zdb | grep ashift
            ashift: 9
            ashift: 9
            ashift: 9
            ashift: 9

reading 4 GB, this happens:

cores are mostly idle

Code:
# vmstat -P 5
 procs      memory      page                    disks     faults         cpu0     cpu1     cpu2     cpu3     
 r b w     avm    fre   flt  re  pi  po    fr  sr ad12 aa0   in   sy   cs us sy id us sy id us sy id us sy id
 0 0 0    773M   456M     0   0   0   0   201   0   0 241  976 67079 9966  0  3 97  0  6 94  0  4 96  0  3 97
 0 0 0    773M   456M     0   0   0   0   187   0   0 197  795 58149 8144  0  5 95  0  3 97  0  2 98  0  3 97
 0 0 0    773M   456M     0   0   0   0  1670   0   0 240  970 66496 9840  0  7 93  0  5 95  0  5 94  0  4 96
 0 0 0    746M   457M     7   0   0   0  3806   0   0  71  311 21995 3400  0  2 98  0  3 97  0  3 97  0  8 92

disks very lightly used

Code:
# iostat -d -n5 5
            ad12            aacd0            aacd1            aacd2            aacd3 
KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s 
  0.00   0  0.00  113.41  58  6.38  104.43  65  6.59  101.04  65  6.43  101.86  68  6.80 
  0.00   0  0.00  126.92 252 31.18  125.99 253 31.07  126.05 251 30.94  126.14 250 30.79 
  0.00   0  0.00  125.56 224 27.51  125.34 226 27.65  124.10 223 26.97  125.67 227 27.90 
 16.00   0  0.00  125.17 261 31.92  125.06 256 31.26  125.26 257 31.48  125.05 259 31.62 
  0.00   0  0.00  127.66 237 29.57  127.93 240 29.93  127.85 239 29.83  127.84 238 29.76 
  0.00   0  0.00  127.97 195 24.42  128.00 195 24.35  127.98 196 24.54  128.00 196 24.50

zpool iostat
Code:
# zpool iostat 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
lykke2      1.08T   562G    132      0  10.9M      0
lykke2      1.08T   562G  1.00K      0   126M      0
lykke2      1.08T   562G    887      0   108M      0
lykke2      1.08T   562G  1.02K      0   128M      0
lykke2      1.08T   562G    955      0   119M      0
lykke2      1.08T   562G    787      0  98.4M      0

and gstat shows 19-20% utilization

Now, doing

Code:
# sysctl vfs.zfs.prefetch_disable=0
vfs.zfs.prefetch_disable: 1 -> 0

the situation is somewhat improved, though there's high variability: I do a find on a directory with 30 1 GB files and cat them to /dev/null; sometimes the numbers add up to what they should be, sometimes they fall short. This filesystem was created by zfs send/recv and I don't know whether files that were fragmented at the origin are fragmented at the destination; I also don't know whether these files are fragmented at the origin, or how much, and I don't know how to check that in ZFS. A raw dd follows further down, so if it's not fragmentation, something (probably a lot) is going on:

Code:
# zpool iostat 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
lykke2      1.08T   562G  2.92K      0   371M      0
lykke2      1.08T   562G  3.58K      0   454M      0
lykke2      1.08T   562G  3.51K      0   446M      0
lykke2      1.08T   562G  1.91K      0   241M      0
lykke2      1.08T   562G    696      0  83.6M      0
lykke2      1.08T   562G    753      0  90.8M      0
lykke2      1.08T   562G    812      0  98.0M      0
lykke2      1.08T   562G    709      0  85.4M      0
lykke2      1.08T   562G    727      0  87.6M      0
lykke2      1.08T   562G    698      0  84.2M      0
lykke2      1.08T   562G    753      0  90.5M      0
lykke2      1.08T   562G    748      0  89.9M      0
lykke2      1.08T   562G    961      0   116M      0
lykke2      1.08T   562G    858      0   104M      0
lykke2      1.08T   562G   1003      0   122M      0
lykke2      1.08T   562G    989      0   120M      0
lykke2      1.08T   562G    827      0   100M      0
lykke2      1.08T   562G  1.10K      0   137M      0
lykke2      1.08T   562G    860      0   104M      0
lykke2      1.08T   562G  1.02K      0   127M      0
lykke2      1.08T   562G  2.34K      0   296M      0
lykke2      1.08T   562G  3.69K      0   469M      0
lykke2      1.08T   562G  1.29K      0   162M      0
lykke2      1.08T   562G  1.03K      0   128M      0
lykke2      1.08T   562G  1.07K      0   133M      0
lykke2      1.08T   562G  1.06K      0   132M      0
lykke2      1.08T   562G  1.04K      0   130M      0
lykke2      1.08T   562G  1.08K      0   134M      0
lykke2      1.08T   562G  1.14K      0   142M      0
lykke2      1.08T   562G  1.02K      0   126M      0
lykke2      1.08T   562G    855      0   103M      0
lykke2      1.08T   562G    713      0  86.0M      0
lykke2      1.08T   562G    712      0  85.3M      0

The raw devices can do better:

Code:
# dd if=/dev/aacd1 of=/dev/null bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes transferred in 26.147740 secs (164257686 bytes/sec)

That's 157 MB/s, admittedly at the beginning of the disk. The average over the whole disk is probably more like 100 MB/s, times 4 drives, which I don't dare to calculate.

In fact the data path (disks + controller + OS) can handle the four devices at the same time: launching a dd for each disk, gstat shows 96-97%/disk and iostat shows this nice view:

Code:
# iostat -d -n5 5
            ad12            aacd0            aacd1            aacd2            aacd3 
KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s 
 16.00   0  0.00  128.00 1219 152.37  128.00 903 112.88  128.00 1018 127.25  128.00 1121 140.17 
 16.00   0  0.00  128.00 1225 153.14  128.00 1212 151.49  128.00 1220 152.54  128.00 1228 153.44 
  0.00   0  0.00  128.00 1228 153.52  128.00 1225 153.07  128.00 1227 153.35  128.00 1225 153.17 
  0.00   0  0.00  128.00 1220 152.51  128.00 1219 152.36  128.00 1226 153.23  128.00 1228 153.53 
 16.00   0  0.00  128.00 1225 153.10  128.00 1227 153.33  128.00 1227 153.40  128.00 1229 153.68 
  0.00   0  0.00  128.00 425 53.18  128.00 767 95.92  128.00 634 79.28  128.00 520 65.06

The write tests will have to wait for another day, as I have nothing that can push anywhere near 200-400 MB/s to keep the board busy, and this is a pool without redundancy, just to test the state of the art; lots of two-way mirrors (my preferred setup) will perform worse.

So, it's a nice improvement and I'll keep prefetch enabled; still not perfect, but if this speed does not lead to instability I am much happier. Thanks to everyone for the tips, and sorry for my grumpiness at the beginning!
 
@lucinda

Glad to be helpful.

Since FreeBSD and Solaris handle memory differently, as a safeguard you may want to add
Code:
vfs.zfs.arc_max="1G"
to /boot/loader.conf. Rerun the above test to see what impact this change has.
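After a reboot with that line in place, it is easy to confirm that the limit took effect and to see how full the ARC actually gets; a sketch, assuming the usual sysctl names on 8.x/9.x:

Code:
# configured ceiling (in bytes) and the current ARC size
sysctl vfs.zfs.arc_max
sysctl kstat.zfs.misc.arcstats.size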
 
Sebulon said:
If you could install benchmarks/bonnie++, I would like to see the output of:
# bonnie++ -d /zfs/dir -u 0 (if root) -s 8g

There you go:

Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
bsdginz2.example 8G   157  99 491934  79 349396  66   412  99 887696  71  1039  16
Latency             68512us     162ms     304ms   54978us   34702us     111ms
Version  1.96       ------Sequential Create------ --------Random Create--------
bsdginz2.example.co -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20879  70 +++++ +++ 22373  99 31393  99 +++++ +++ 28047  99
Latency             91972us     124ms     686us   15007us      99us     133us
 
t1066 said:
Since FreeBSD and Solaris handle memory differently, as a safeguard you may want to add vfs.zfs.arc_max="1G" to /boot/loader.conf. Rerun the above test to see what impact this change has.

At 1 G the results are very similar: it starts at 450 MB/s for 15 seconds (iostat and zpool iostat agree; there is very little compression) and then it quickly goes down to 100, hitting 250 once or twice but staying mostly around 100 MB/s.

I looked into this filesystem and I thought all files were 1 GB but only about 10 % of the files are really 1 GB while the rest are at 700-800 MB, incomplete, because this is a torrent that was caught in a snapshot before finishing, which is what I have been using for testing.

So there is a bit of sparseness here; might this explain the drop in performance? I have noticed abysmal performance also in two filesystems that contain VMware ESXi vmdks, heavily sparse (like 10 to 1). The rest of the filesystem (950 GB, 370,000 files) takes 1 hour to md5, while these two (50 GB, only 280 files) take another hour... the disks are not loaded at all, save 40 MB/s every minute or two, and I see a big increase in the number of freed pages per second in vmstat, from 10k-200k over the 950 GB of non-sparse files to 0.9-1.2M when going over the sparse filesystems, which I don't understand at all. All the time there are about 100-170 MB free (top).

If I leave out the md5 and just cat a 21 GB file (1.2 GB on disk) to /dev/null, it takes three minutes: 110 MB/s. The 'real' files go twice as fast! Has anyone else observed this slowdown when touching sparse files? Why are so many pages being freed at the same time?
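In case it helps anyone reproduce this, the easiest way to see how sparse a given file is is to compare its apparent length with what is actually allocated; a sketch (the path is made up):

Code:
# ls shows the logical length, du shows the blocks actually allocated on disk;
# a large ratio between the two means the file is heavily sparse (or well compressed)
ls -lh /lykke2/vm/disk-flat.vmdk
du -h  /lykke2/vm/disk-flat.vmdk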
 
lucinda said:
There you go:

Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
bsdginz2.example 8G   157  99 491934  79 349396  66   412  99 887696  71  1039  16
Latency             68512us     162ms     304ms   54978us   34702us     111ms
Version  1.96       ------Sequential Create------ --------Random Create--------
bsdginz2.example.co -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20879  70 +++++ +++ 22373  99 31393  99 +++++ +++ 28047  99
Latency             91972us     124ms     686us   15007us      99us     133us

Have you enabled compression on the file system? The speed seems faster than what the hardware can deliver.
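The quickest way to answer that is to ask ZFS itself; a sketch (the dataset name is only an example, substitute the one bonnie++ ran on):

Code:
# shows whether compression is enabled and how much it is actually saving
zfs get compression,compressratio lykke2/bench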

lucinda said:
At 1 G the results are very similar: it starts at 450 MB/s for 15 seconds (iostat and zpool iostat agree; there is very little compression) and then it quickly goes down to 100, hitting 250 once or twice but staying mostly around 100 MB/s.

I looked into this filesystem and I thought all files were 1 GB but only about 10 % of the files are really 1 GB while the rest are at 700-800 MB, incomplete, because this is a torrent that was caught in a snapshot before finishing, which is what I have been using for testing.

So there is a bit of sparseness here; might this explain the drop in performance? I have noticed abysmal performance also in two filesystems that contain VMware ESXi vmdks, heavily sparse (like 10 to 1). The rest of the filesystem (950 GB, 370,000 files) takes 1 hour to md5, while these two (50 GB, only 280 files) take another hour... the disks are not loaded at all, save 40 MB/s every minute or two, and I see a big increase in the number of freed pages per second in vmstat, from 10k-200k over the 950 GB of non-sparse files to 0.9-1.2M when going over the sparse filesystems, which I don't understand at all. All the time there are about 100-170 MB free (top).

If I leave out the md5 and just cat a 21 GB file (1.2 GB on disk) to /dev/null, it takes three minutes: 110 MB/s. The 'real' files go twice as fast! Has anyone else observed this slowdown when touching sparse files? Why are so many pages being freed at the same time?

Setting the limit to 1G is mainly for stability and to leave some memory for other programs. But maybe FreeBSD's memory management has improved enough that this is no longer necessary. As for how FreeBSD deals with sparse files, you had better ask on the mailing lists.
 
t1066 said:
Have you enabled compression on the file system? The speed seems faster than what the hardware can deliver.

Yes; these are the results with compression off:

Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
bsdginz2.example 8G   165  99 354348  59 165725  29   405  96 485177  34 498.4   5
Latency             58438us     495ms    1274ms     150ms     119ms     198ms
Version  1.96       ------Sequential Create------ --------Random Create--------
bsdginz2.example.co -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 22161  73 +++++ +++ 20936  99 23220  87 28977  93 14834  92
Latency             87640us     111ms     257us     110ms   29366us   61132us


t1066 said:
Setting the limit to 1G is mainly for stability and to leave some memory for other programs. But maybe FreeBSD's memory management has improved enough that this is no longer necessary. As for how FreeBSD deals with sparse files, you had better ask on the mailing lists.

Thanks, I'll do that
 
@lucinda,

if this is not a production server and you are just experimenting I would suggest an upgrade to FreeBSD 9.0-STABLE.

Perform the same tests as before with only the following options in your /boot/loader.conf:

First test:
Code:
vfs.zfs.arc_max="1024M"

Second test:
Code:
vfs.zfs.prefetch_disable=0

The first one should give you a more stable system with regard to your amount of RAM. The second one should give you much better read performance.
 
@gkontos,

Thanks for the suggestions. I tried 9.0 in the beginning, but 8.3 was recommended as a more stable option, which is what I installed. Whatever I end up using, I should remind the thread that this pool is just a stripe and thus the speed is a bit unrealistic; everything should be divided by at least 2 for a pool with redundancy, so it's pointless to keep testing. I did a stripe to have enough room instead of destroying the real backup machine, which is also Solaris. I like speed, but the main point is stability, and for the moment I'll unload the filesystem here to an old machine running 8.3, or maybe 9.0, that I will dedicate to backups if all goes well. After some months of testing, adding snapshots, checksumming, comparing the FreeBSD and the Solaris backups, making sure it survives failures, and whatever else I can come up with, if I see no problems I will trust ZFS on FreeBSD more... doing more tests now will show little. But thanks for all the help. FreeBSD lets me reuse old machines, old disks (and old Promise ATA controllers!) that Solaris wouldn't even look at, freeing the fast hardware for more play.
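For the record, the replication being described boils down to snapshots plus zfs send/receive; a minimal sketch (host, dataset and snapshot names are made up):

Code:
# initial full copy from the source box to the backup host
zfs snapshot tank/data@2012-05-01
zfs send tank/data@2012-05-01 | ssh backuphost zfs receive -d backup
# later runs only ship the delta between two snapshots
zfs snapshot tank/data@2012-05-08
zfs send -i tank/data@2012-05-01 tank/data@2012-05-08 | ssh backuphost zfs receive -d backup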

I should now go and understand things like how the ARC behaves, whether its memory is given back to the OS under pressure from other programs (like a 'normal' cache, or not), what was going on with what I saw earlier about sparse files, and other things that will come up as I go.
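A cheap way to watch whether the ARC gives memory back under pressure is to poll the arcstats sysctls while something memory-hungry runs; a sketch:

Code:
# print the current ARC size and its target every 5 seconds
while true; do
    sysctl -n kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c
    sleep 5
done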
 