I've just deployed two FreeBSD 10.1-R (fully updated to -p5 as of this post) servers which are using two Crucial 960M500 SSDs (running the latest firmware, MU05) in a GMIRROR. Partitions are aligned to 1 MiB boundaries. I'm using UFS with TRIM enabled.
When I run bonnie++ on the volume, the write/read tests start out fast but degrade. By the time it hits the file-creation phase I can see under gstat that I/O "busy" is maxed out with very little throughput. I would expect this to some degree, since the file-creation phase is very random reads/writes, but this busyness can stretch a test that should take 5-10 minutes into hours. What's even more interesting is that the "busy" state lasts past the bonnie++ run, sometimes for hours.
What's even odder: originally, before GMIRRORing these disks, there was an Adaptec 6504 RAID card and I had the SSDs in a hardware RAID 1 with it, and I saw the same issue. There are two servers configured identically and both show the problem.
Here is a typical example of gstat during this period:
Code:
dT: 1.020s w: 1.000s filter: ada[0-9]$
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
29 56 0 0 0.0 0 0 0.0 101.0| ada0
25 53 0 0 0.0 0 0 0.0 99.2| ada1
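For what it's worth, a back-of-envelope check of that gstat sample using Little's law (W = L / λ; my own estimate, not something gstat reports): with roughly 29 requests queued (the L(q) column) and about 56 ops/s completing on ada0, each request is sitting in the queue for about half a second.

```shell
# Little's law: average wait W = queue length L / completion rate lambda
# (numbers taken from the ada0 row of the gstat sample above)
awk 'BEGIN { printf "~%.2f s average queue wait per request\n", 29 / 56 }'
```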
Additionally while this is happening the system can freeze when any other I/O is required, even causing the occasional console message like this:
Code:
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 120335, size 28672
Load also climbs to 4-5.0 during this time (as I'd expect if I/O was backed up).
We've used bonnie++ for years and it's standard for us on new deployments to test the hardware before going into production; I've never seen something like this.
Here is the bonnie++ output. For an SSD, write performance is very low (meaning this issue is very write-specific, in my view) and read performance is on par with what I'd expect (roughly 2x a single SSD -- yay GMIRROR!).
Read: 960 MiB/sec
Write: 172 MiB/sec
Code:
Version 1.97 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
db1.michigan.ls 64G 796 99 172216 10 99444 7 1313 99 960844 39 3428 27
Latency 31966us 1347ms 15310ms 18280us 9752us 1991ms
Version 1.97 ------Sequential Create------ --------Random Create--------
db1.michigan.ls.pri -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 1 0 115 0 +++++ +++ 2 0 +++++ +++ +++++ +++
Latency 9118s 141s 36us 5462s 9us 19us
1.97,1.97,db1.michigan.ls.privsub.net,1,1423046069,64G,,796,99,172216,10,99444,7,1313,99,960844,39,3428,27,16,,,,,1,0,115,0,+++++,+++,2,0,+++++,+++,+++++,+++,31966us,1347ms,15310ms,18280us,9752us,1991ms,9118s,141s,36us,5462s,9us,19us
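The headline numbers come straight from the block-transfer columns of that output: "Sequential Output, Block" is the write path and "Sequential Input, Block" is the read path (here I'm dividing the K/sec figures by 1000 to match my rounding above).

```shell
# bonnie++ "Sequential Output, Block" = writes; "Sequential Input, Block" = reads
echo "write: $((172216 / 1000)) MB/s"   # every write goes to both mirror halves
echo "read: $((960844 / 1000)) MB/s"    # reads are balanced across both disks
```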
System specs:
FreeBSD 10.1-RELEASE-p5 64-bit; Kernel GENERIC
Intel® Xeon® CPU E5-1650 v2 @ 3.50GHz; Hyper-Threaded; 64-bit; 6x Physical Cores
32.0 GiB RAM: 2.7 GiB used / 327.0 MiB cache / 29.0 GiB free
Intel Patsburg AHCI SATA controller: Channel 0 (ahci0:ahcich0)
Things I have tried to resolve the issue:
- Disabling power management entirely in BIOS
- Disabling powerd
- Disabling SATA aggressive link power management
- Switching SATA from AHCI to IDE mode
- Disabling VT-d and other virtualization options
If I do a dd to test throughput I can get 400 MiB/sec writes, and under "normal" use the system seems fine -- however, it's not in production yet, so it's not seeing real load. I worry that something is broken and that when it does start doing its job it's going to start freezing (these will become DB servers).
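For anyone who wants to reproduce the sequential-write check, this is the general shape of the dd run, not my exact command; the path, block size, and count here are placeholders (note FreeBSD's dd takes lowercase size suffixes, e.g. bs=1m).

```shell
# Sketch of a simple sequential-write throughput test; the target path,
# block size, and count are illustrative, not the exact command used
TESTFILE=/tmp/dd-write-test
dd if=/dev/zero of="$TESTFILE" bs=1048576 count=16
ls -l "$TESTFILE"
rm -f "$TESTFILE"
```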
This may very well be some kind of artifact of bonnie++, but the state it puts the system in (sometimes for HOURS) is very worrying, and I feel like it should not be happening.
Any suggestions/questions/etc. welcome. I've got time to continue testing things easily for now.