3ware JBOD and ZFS, controller settings

Howdy,

I have a 3Ware 9550SX (4-port) RAID controller and I'm using it with four 1TB WD RE3 drives in an 8.0 box. I'm using ZFS with two mirrors in a pool. Performance seems OK on benchmarks, but not quite what I'd expect, and the system gets quite laggy during writes - for example, tab completion in the shell lags by 1-3 seconds if there is a continuous write operation happening on the pool.

When I set this box up, I simply configured the 3Ware to pass through all the drives in "JBOD" mode. I just added 3DM and tw_cli to see if there's anything worth setting up in the controller. I see a few options there regarding write caching (the card has 256MB cache, but no BBU) and various "performance" settings. This is where I get lost - I'm not sure whether, if I enable the write cache, the ZFS layer will be aware of this and take it into consideration. I'm also not sure how to verify whether this controller is enabling NCQ for the drives - they support it, and I know the controller uses NCQ when it is handling RAID tasks, but for JBOD mode I'm finding conflicting info. camcontrol claims a queue depth of 254 on the drives, which is well beyond SATA NCQ's 32-tag limit.
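For what it's worth, here's roughly how I was checking the queueing (da0 is just one of the exported drives, as an example):

Code:
# camcontrol devlist
# camcontrol tags da0 -v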

I'm now seeing some info suggesting that the controller should not be set up in JBOD mode, but should instead export each drive as a single-drive RAID unit...

Can anyone familiar with these controllers and ZFS comment on what the best practice is for this type of setup?

Thanks.
 
You should configure them as SingleDrive instead of JBOD.

Code:
# tw_cli
//> info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    SINGLE    OK             -       -       -       931.312   ON     OFF    
u1    SINGLE    OK             -       -       -       931.312   ON     OFF    
u2    SINGLE    OK             -       -       -       931.312   ON     OFF    
u3    SINGLE    OK             -       -       -       931.312   ON     OFF    
u4    SINGLE    OK             -       -       -       931.312   ON     OFF    
u5    SINGLE    OK             -       -       -       931.312   ON     OFF    

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     931.51 GB   1953525168    STF604MR
p1     OK               u1     931.51 GB   1953525168    STF604MR   
p2     OK               u2     931.51 GB   1953525168    STF607MH    
p3     OK               u3     931.51 GB   1953525168    STF604MR      
p4     OK               u4     931.51 GB   1953525168    STF604MR      
p5     OK               u5     931.51 GB   1953525168    STF604MR   
p6     NOT-PRESENT      -      -           -             -
p7     NOT-PRESENT      -      -           -             -

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       0      xx-xxx-xxxx

The "performance" setting would be insecure because you dont have bbu (iam not sure if "balanced" is ok for you). The "write cache" option in the bios means the cache on the disks itself. Without these cache the perfomance of the connected drives will have like 10% of normal throughput.
 
If you have good power coming into the building and a UPS, then enable the Performance profile and the write cache on the drives. The performance profile puts the cache on the controller itself into write-back mode (meaning the controller tells ZFS that data has been written to disk as soon as it's written to cache). Without the performance profile, the controller cache is put into write-through mode, where the controller waits for data to hit the disk before telling ZFS that it's on disk (IOW, it's a read cache).

You also don't want to use JBOD mode. In JBOD mode, all of the controller stuff is turned off (onboard cache, for example) and the controller becomes a plain SATA controller.

Not sure if you can switch 1 disk at a time from JBOD to Single. You may have to backup, flip the controller out of the JBOD mode, create the 4 SingleDisk "arrays", recreate the pool, and restore your data.

That's how we run all our 3Ware controllers with ZFS (write cache enabled on drives, performance profile, queueing, etc ... everything set to max performance).

Oh, and disable all verification tasks. Let ZFS handle that as well (zpool scrub).
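From memory, the per-unit tw_cli commands for those settings look something like the following (the StorSave profile only exists on the 9550SX/9650SE generation, and keywords can differ between firmware revisions, so verify with "tw_cli /c0/u0 show" before relying on this):

Code:
# tw_cli /c0/u0 set cache=on          # controller write cache for the unit
# tw_cli /c0/u0 set qpolicy=on        # NCQ / command queueing
# tw_cli /c0/u0 set storsave=perform  # StorSave profile (protect/balance/perform)
# tw_cli /c0/u0 set autoverify=off    # let zpool scrub handle verification

Repeat for u1 through u3 (or whatever units you end up with).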
 
Thanks all.. I gave this a try today.

I was feeling a bit giddy, so I tried to do this while the host was running. I used "zpool offline" to take one disk out of the pool, and then used 3dm to "delete" the drive and re-add it as a "Single Drive". This almost worked... Two things happened. The real deal killer is that the drive shrinks by about 30MB. So, you know, that's a real no-no. The second issue I ran into is that the 3Ware does seem to wipe something, because the GPT label disappeared. Things got real squirrelly on a reboot since da2 and da3 somehow switched places and ZFS tried resilvering both drives at once, which seemed... odd.
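For the record, the sequence I attempted looked more or less like this (the pool name "tank" is just a stand-in for whatever yours is called):

Code:
# zpool offline tank da2
  ... delete da2's JBOD unit in 3DM, re-add the disk as a Single Drive ...
# zpool online tank da2
  ... and this is where the ~30MB size difference and missing GPT label bite ...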

Anyhow, I wiped and did another install (I'm down to about 15 minutes for a rootonzfs install), and all is well. I'm waiting on bonnie++ to tell me more now.
 
Interesting. I'm actually seeing slower read speeds in my bonnie tests. Does this seem a bit off for 4 1TB WD RE3 drives? It's two mirrors. WC is enabled. I believe that when I was on JBOD I was closer to 100MB/s on sequential reads:

Code:
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
h22 6G    38  83 69606  30 34679  15   104  94 65569  10 102.8   2
Latency              7467ms    2622ms    2250ms     956ms     147ms   79632ms
Version  1.96       ------Sequential Create------ --------Random Create--------
h22    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 10099  95 +++++ +++  8601  95  9828  95 +++++ +++  8965  95
Latency             26673us     154us     247us   61542us      67us     169us

This is 8.0/i386, 3GB RAM, kmem_max @ 1GB, zfs arc_max @400MB.
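For reference, those tunables live in /boot/loader.conf, roughly like this (I set kmem_size and kmem_size_max to the same value; adjust to taste):

Code:
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="400M"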
 
spork said:
Interesting. I'm actually seeing slower read speeds in my bonnie tests. Does this seem a bit off for 4 1TB WD RE3 drives? It's two mirrors. WC is enabled. I believe that when I was on JBOD I was closer to 100MB/s on sequential reads:
I'm seeing very good performance on a 3Ware 9650SE w/ 16 2TB RE4 drives. I haven't done a lot of tuning yet, but here's the zfs layout and bonnie++ results:

Code:
# zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da10    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da11    ONLINE       0     0     0
            da12    ONLINE       0     0     0
            da13    ONLINE       0     0     0
            da14    ONLINE       0     0     0
            da15    ONLINE       0     0     0
        logs        ONLINE       0     0     0
          da0       ONLINE       0     0     0
        spares
          da16      AVAIL   

errors: No known data errors

da1-16 are the 2TB RE4's. da0 is a 256GB PCI Express SSD.
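For anyone wanting to replicate the layout, it's the equivalent of something like this (device names as shown above):

Code:
# zpool create data \
    raidz1 da1 da2 da3 da4 da5 \
    raidz1 da6 da7 da8 da9 da10 \
    raidz1 da11 da12 da13 da14 da15 \
    log da0 \
    spare da16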

Code:
# bonnie++ -d /data -s 100g -u root
[snip]
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
new-rz1        100G   116  99 362360  82 186155  51   357  99 576826  81 325.4   8
Latency             71939us    4867ms    4882ms   26216us    2079ms     647ms
Version  1.96       ------Sequential Create------ --------Random Create--------
new-rz1             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 22364  99 +++++ +++ 20827  99 21995  99 +++++ +++ 19682  99
Latency             13484us     128us     151us   27468us     128us      92us
1.96,1.96,new-rz1,1,1277117797,100G,,116,99,362360,82,186155,51,357,99,
576826,81,325.4,8,16,,,,,22364,99,+++++,+++,20827,99,21995,99,+++++,+++,19682,99,
71939us,4867ms,4882ms,26216us,2079ms,647ms,13484us,128us,151us,27468us,128us,92us

In real-world usage, the system achieves > 450Mbyte/sec reads over a 24-hour period.
 
Thanks for the data Terry... You have a few more spindles than me though. :) How does the SSD drive for logs (ZIL?) work out?

I'm a little stumped as to why my reads are slower than writes, that's the main thing bugging me.
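I'll try watching per-vdev throughput while the benchmark runs to see whether both mirrors are actually being read from - something like this (substitute your own pool name):

Code:
# zpool iostat -v tank 5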
 
phoenix said:
If you have good power coming into the building and a UPS, then enable the Performance profile and the write cache on the drives. The performance profile puts the cache on the controller itself into write-back mode (meaning the controller tells ZFS that data has been written to disk as soon as it's written to cache). Without the performance profile, the controller cache is put into write-through mode, where the controller waits for data to hit the disk before telling ZFS that it's on disk (IOW, it's a read cache).

You also don't want to use JBOD mode. In JBOD mode, all of the controller stuff is turned off (onboard cache, for example) and the controller becomes a plain SATA controller.

Not sure if you can switch 1 disk at a time from JBOD to Single. You may have to backup, flip the controller out of the JBOD mode, create the 4 SingleDisk "arrays", recreate the pool, and restore your data.

That's how we run all our 3Ware controllers with ZFS (write cache enabled on drives, performance profile, queueing, etc ... everything set to max performance).

Oh, and disable all verification tasks. Let ZFS handle that as well (zpool scrub).

My machine has a PCI-X 3ware 8500 controller, and the web GUI only allows me to create drives as JBOD or spare. How do I "flip the controller out of the JBOD mode"?
 
ytotk said:
My machine has a PCI-X 3ware 8500 controller, and the web GUI only allows me to create drives as JBOD or spare. How do I "flip the controller out of the JBOD mode"?

For clarity, the system is 8.1, dual Opteron, 16GB RAM, 3ware 8506, 20x WD20EADS. To install 3dm I executed:
Code:
cd /usr/ports/sysutils/3dm/ && make install clean
pkg_add -r 3dm
The 3ware RAID card has few options in the BIOS other than array creation; nothing related to single, legacy, or JBOD.

In the 3dm web GUI, only 'JBOD' or 'spare' are available options when activating an individual drive.

I would attempt to flash the cards to the newest firmware, but the 3ware website forwards to lsi.com, which at the moment poorly documents and supports older 3ware cards. I can't find any firmware-related info for these cards on the web.

I will test 9500-series and newer cards very soon to identify the performance impact of JBOD versus single-drive units.

Am I correct to assume that for strictly sequential reads, prefetch and a tuned ARC are more relevant than controller cache, write cache on drives, performance profile, queueing, etc.?
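By prefetch and ARC I mean the loader.conf tunables along these lines (the values here are only placeholders, not recommendations):

Code:
vfs.zfs.prefetch_disable="0"
vfs.zfs.arc_max="8192M"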
 
The 3ware 8xxx series doesn't support single-drive units.

Only the 3ware 9xxx series supports single-drive units.
 
Sorry to resurrect this post. I have an old 3ware controller that supports single-drive units, which I used to create a ZFS volume. I'm wondering: if my 3ware controller ever dies, can I replace it with another JBOD controller to save the volume, or am I just shit out of luck? Please advise.
 
Sorry to resurrect this post. I have an old 3ware controller that supports single-drive units, which I used to create a ZFS volume. I'm wondering: if my 3ware controller ever dies, can I replace it with another JBOD controller to save the volume, or am I just shit out of luck? Please advise.
If you export the raw drives (an option on some 3ware controllers) then they should be usable on other controllers that also support bare drives. Be sure to use position-independent naming (for example, # glabel label ...) instead of relying on /dev/da0, etc. Single unit volumes tend to have metadata on the drive which may or may not cause problems (if the metadata is at the beginning of the disk, "sector 0" will be shifted; if it is at the end of the disk you should be OK). I don't know which method 3Ware uses.

One thing to be aware of with volumes on 3Ware controllers is that at least some older controllers (9500) did a persistent lock on the component drives, so the volume needs to be deleted on the 3Ware controller before the drives will be accessible on other controllers. Deleting the volume will likely make the user data inaccessible. See (for example) here.
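For example, labeling the drives and then building the pool on the labels goes roughly like this (the label names, pool name, and mirror layout here are just an illustration):

Code:
# glabel label disk0 /dev/da0
# glabel label disk1 /dev/da1
# zpool create tank mirror label/disk0 label/disk1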
 
Sorry to resurrect this post. I have an old 3ware controller that supports single-drive units, which I used to create a ZFS volume. I'm wondering: if my 3ware controller ever dies, can I replace it with another JBOD controller to save the volume, or am I just shit out of luck? Please advise.

Yes, if you create "Single Drive" arrays on a 3Ware controller, you can move those disks to other controllers. We've done this using 3Ware 9550SXU-4LP and 9560SE-8LP controllers in Linux using mdadm-based software RAID. The original controller/drives were on Tyan motherboards with "broken"/bad onboard SATA controller; they were migrated to a system with SuperMicro motherboard and working onboard SATA controller (no 3Ware). mdadm just picked up and carried on as per normal.

No reason it shouldn't be the same with FreeBSD.
 
Can anyone familiar with these controllers and ZFS comment on what the best practice is for this type of setup?

Thanks.

Don't use ZFS with hardware RAID controllers, period. By the way, those are bad proprietary hardware RAID controllers anyway, but some idiots like myself still have them.



Code:
root@neill-backup:~ # tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       698.481   ON     OFF
u1    SINGLE    OK             -       -       -       1862.63   ON     OFF
u2    RAID-5    OK             -       -       64K     1862.61   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u2     465.76 GB   976773168     5QG00XF0
p1     OK               u2     465.76 GB   976773168     5QG00WTH
p2     OK               u2     465.76 GB   976773168     5QG00WSS
p3     OK               u2     465.76 GB   976773168     5QG00WXC
p4     OK               u2     465.76 GB   976773168     5QG03BNV
p5     OK               u1     1.82 TB     3907029168    MN1270F33AEPMD
p6     OK               u0     698.63 GB   1465149168    3QD09B7M
p7     OK               u0     698.63 GB   1465149168    3QD08EP2

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       0      xx-xxx-xxxx


Code:
root@neill-backup:~ # zpool list
no pools available


Code:
root@neill-backup:~ # mount
/dev/da0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/da1p1 on /backup (ufs, local, journaled soft-updates)
/dev/da2p1 on /attic (ufs, local, journaled soft-updates)
 
Don't use ZFS with hardware controllers period.
I assume you meant "with hardware RAID controllers"?
By the way those are bad proprietary hardware RAID controllers anyway but some idiots like myself still have them.
Why do you dislike them? I have a number of systems with them and they seem fine to me. I particularly like the 3-LED-per-drive-bay status indicators (I²C between the controller and the backplane) and their web-based management utility.

Having said that, they are an older generation of controller and the world has moved on, mostly to LSI controllers. Those work fine, too.
 
Why do you dislike them? I have a number of systems with them and they seem fine to me. I particularly like the 3-LED-per-drive-bay status indicators (I²C between the controller and the backplane) and their web-based management utility.
For starters, they are difficult to monitor. A good hardware RAID controller should have a simple, non-proprietary API that you can easily use to poll the device, get its status, and do hot replacement. See the OpenBSD man pages for Areca controllers. I have to be able to see the status of the device from a central monitoring tool. This is a very good first read for anybody trying to learn about hardware RAID:

http://www.openbsd.org/papers/bio.pdf
 
tw_cli is the command-line tool that provides all the same information as the web GUI (3dm2).

And there are Nagios plugins that allow you to monitor the status of the arrays (check_3ware). We use that. It does require a simple patch to make "VERIFYING" a "known good" state, as the current check_3ware script lists that as "UNKNOWN".

tw_cli is available on FreeBSD and Linux, and the check_3ware script runs on both OSes.

The 3ware/LSI/Areca/whoever-buys-them-next RAID controllers are decent. The main reason we stopped using them is that they don't support online capacity expansion of RAID arrays (meaning you can't create a RAID6 array from 250 GB disks, replace all the disks with 1 TB drives, and have the array grow into the extra space). You can migrate between RAID types, but you can't change the total amount of space in an array without destroying it and building a new one. :( Plus, software RAID (ZFS on FreeBSD and mdadm on Linux) using onboard SATA ports or LSI controllers is actually faster these days.
 