New Disk Problems

OK,
I've been reading these forums, the mailing list, and anything else I can find for about 3 weeks now. No matter what I do, I cannot get anything above 1 MB/sec on any of the 4 drives I just purchased. All 4 drives are the same: Seagate ST2000DL003 in a SansDigital TR8M enclosure. The controller is a HighPoint 622 dual eSATA.
I've tried both FreeBSD 8.2 and now I'm on 9.0 RC1. I can create a partition and format the drive, or use gstripe or zfs (yes, the ashift is 12), and I'll have a filesystem on the drives, yet when I run bonnie or dd I get absolutely awful performance no matter how I create things. Even if I just partition with gpart, run newfs, and mount a single drive, the performance is about 700 KB/sec.
I can take the same box and controller and move it to a CentOS or Windows system, and performance is where it should be.

Here's info from one drive:
Code:
alexandria# gpart list ada7
Geom name: ada7
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Consumers:
1. Name: ada7
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

alexandria# diskinfo -v ada7
ada7
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        3876021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        6YD0G9RG        # Disk ident.

I even tried just doing 'newfs -U -f 4096 /dev/ada7' and mounting it after that with no partition. No difference.
Any help would be appreciated.

Thanks!
 
Hi,

you might be overworking your partitioning and filesystem settings. I would suggest that you destroy all partitioning, then begin with testing IO directly towards a raw device, like how fast is:
# dd if=/dev/zero of=/dev/ada7 bs=1m count=1000
to determine where your bottleneck is. Is it partition/filesystem settings or is it hardware/driver settings?

/Sebulon
 
Great idea.
Here we go:

Code:
alexandria# dd if=/dev/zero of=/dev/ada7 bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 10.021938 secs (104628066 bytes/sec)
104 megabit/sec which is about what I'd expect for this drive.

So, what's the next step?

EDIT - I wiped the partition table with:
# dd if=/dev/random of=/dev/ada7 bs=512 count=1024
 
Bits and bytes man, watch it;) 104 megabytes
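
To make the unit difference concrete, here is a small sketch that converts the bytes/sec figure dd printed into both megabytes and megabits per second. The helper name rate_units is made up for illustration, and decimal units (1 MB = 1,000,000 bytes) are assumed.

```shell
#!/bin/sh
# rate_units: hypothetical helper, not part of dd.
# $1 = throughput in bytes/sec, as printed on dd's summary line.
# Prints the same rate in MB/s and Mbit/s (decimal units).
rate_units() {
    awk -v bps="$1" 'BEGIN {
        printf "%.1f MB/s = %.1f Mbit/s\n", bps / 1000000, bps * 8 / 1000000
    }'
}

rate_units 104628066
```

A byte is eight bits, so the same transfer rate is eight times larger when expressed in megabits.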

Cool, so then you have to start wondering about things like drivers and stuff, since we've ruled out a hardware issue.

What's your output of:
# uname -a
# more /boot/loader.conf
# more /etc/sysctl.conf
(if you have fiddled with sysctls)

Is AHCI enabled in BIOS?

What's your computer's hardware?

/Sebulon
 
Sorry about that.. You're right. MB/sec

Here's the info you asked for:

Code:
alexandria# uname -a
FreeBSD alexandria.naebunny.net 9.0-RC1 FreeBSD 9.0-RC1 #0: Wed Oct 26 04:35:36 EDT 2011     
derwood@alexandria.naebunny.net:/usr/obj/usr/src/sys/ALEXANDRIA  i386
alexandria# cat /boot/loader.conf
ahci_load="YES"
geom_stripe_load="YES"
vfs.zfs.prefetch_disable=1
vm.kmem_size="330M"
vm.kmem_size_max="340M"
vfs.zfs.arc_max="40M"
vfs.zfs.vdev.cache.size="6M"
alexandria# cat /etc/sysctl.conf
# $FreeBSD: src/etc/sysctl.conf,v 1.8.40.1 2011/09/23 00:51:37 kensmith Exp $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0

alexandria# zpool status
  pool: dvds
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        dvds        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors


The system is pretty basic: Biostar TF720+ motherboard with an AMD Athlon 4850e dual-core CPU, 2 gigs of DDR2, one 160 gig drive for the OS, and six 1TB ST31000333AS drives on the on-board SATA. The on-board SATA is set for AHCI. I did not try to use the RAID in the BIOS at all. The six 1TB drives are in a raidz1 array. Video is on-board through a KVM. I've added the SansDigital box to the system in the hopes of eventually adding 4 more drives for a total of 16TB.
Performance with the existing 6 x 1TB array has been fine all along. Never had a problem with it.
I am considering adding another 2 gig and recompiling it all as 64 bit to improve speed even more.
I have geom_stripe loaded because I was testing with it to see how it performed compared to ZFS stripe.
 
Code:
alexandria# kldstat
Id Refs Address    Size     Name
 1   13 0xc0400000 ea4318   kernel
 3    1 0xc6abc000 173000   zfs.ko
 4    1 0xc6c2f000 3000     opensolaris.ko
 5    1 0xc92aa000 7000     geom_stripe.ko
 6    1 0xc937a000 4000     geom_nop.ko

The kernel is 100% stock.. No changes at all.
 
According to:
http://www.sansdigital.com/towerraid/tr8mb.html
...the host adapter card is based on Silicon Image chipset...
I looked through the manual and found that it is a SiI3132R5 which is supported by the siis driver. However you have said:
The controller is a HighPoint 622 dual eSATA
To that I found this from mav@
Both 62x and 64x RocketRAIDs seems to be based on same Marvell 6Gbps SATA chips. ahci(4) driver can work with them if you add their PCI IDs to the list of supported. ID's of 62x was already added to 8.2-RELEASE
So that seems to be in order. But to rule out any issue with the Highpoint controller together with the SATA port multiplier in the JBOD, you could also try installing the HBA that came in the package and add:

/boot/loader.conf
Code:
siis_load="YES"
But since it already gave you 100+MB/s when using dd, it would seem less likely that that would be the cause.

You should definitely switch to an amd64 install. When doing that, you should also comment out the kmem and arc values in loader, like:
Code:
#vm.kmem_size="330M"
#vm.kmem_size_max="340M"
#vfs.zfs.arc_max="40M"
#vfs.zfs.vdev.cache.size="6M"
Because amd64 can tune stuff like that fine on its own.

Most important is the way you set up your pool afterwards. This is how I would do it:
# zpool destroy pool
# touch /usr/local/bin/cleandrives
# ee /usr/local/bin/cleandrives
Paste in:
Code:
#!/bin/sh
# Zero the first and last 10 MB of each named drive to wipe partition
# tables and filesystem labels.

if [ -z "$1" ]
  then
    echo "Usage: `basename $0` drive1 drive2 ..."
    exit 1
fi

drives="$*"

# Abort unless every named drive has a device node in /dev.
verifydrives()
  {
    for drive in $drives
      do
        if [ ! -e "/dev/$drive" ]
          then
            echo "Drive $drive does not exist. Aborting."
            exit 1
          else
            echo "Drive $drive verified."
        fi
      done
  }

# Print the offset, in MB, that lies 10 MB before the end of drive $1.
# The size in 512-byte sectors is taken from the drive's probe line in dmesg.
seeksector()
  {
    blocksize=`dmesg | grep -w "$1" | grep -oe '[0-9]\{8,\}' | tail -n 1`
    mbsize=`echo "$blocksize / 2048" | bc`
    echo "$mbsize - 10" | bc
  }

# Overwrite the first 10 MB and the 10 MB starting at the computed offset.
cleandrives()
  {
    for drive in $drives
      do
        dd if=/dev/zero of=/dev/$drive bs=1M count=10 >/dev/null 2>&1
        dd if=/dev/zero of=/dev/$drive bs=1M count=10 seek=`seeksector $drive` >/dev/null 2>&1
      done
  }

verifydrives

echo ""
echo "This will irreversibly destroy partition and filesystem data on drive(s):"
echo "$drives"
echo ""
echo "USE WITH EXTREME CAUTION!"
read -r -p 'Do you confirm "yes/no": ' choice
case "$choice" in
  yes) cleandrives
       echo ""
       echo "Drive(s) cleaned." ;;
    *) echo ""
       echo "Cleaning cancelled." ;;
esac
Save and then make executable with:
# chmod 755 /usr/local/bin/cleandrives
# rehash
# cleandrives ada7 ada8 ada9 ada10
It will erase the first and last 10 MB on each of them. Be careful to type the correct disks; it doesn't have any undo ;)
# gnop create -S 4096 ada7
# zpool create pool raidz ada7.nop ada{8,9,10}
# zpool export pool
# gnop destroy ada7.nop
# zpool import pool
# zdb pool | grep ashift
(It's supposed to show 12)
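
For reference, ashift is the base-2 logarithm of the sector size ZFS assumes for the vdev, so 12 means 4096-byte sectors and 9 would mean 512-byte sectors. A minimal sketch of that arithmetic follows; the helper name ashift_to_bytes is hypothetical.

```shell
#!/bin/sh
# ashift_to_bytes: hypothetical helper, not a ZFS command.
# $1 = ashift value as shown by zdb; prints the sector size in bytes (2^ashift).
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

ashift_to_bytes 12
ashift_to_bytes 9
```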

Then you can try to copy stuff over from your old pool to the new to test performance.

/Sebulon
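
As a sanity check, the seek offset that the cleandrives script computes for its second dd can be worked out by hand. With bs=1M, seek counts 1 MB blocks, and one such block is 2048 sectors of 512 bytes. The sector count used below is the one diskinfo reported for ada7 earlier in the thread; the helper name tail_seek_mb is only for illustration.

```shell
#!/bin/sh
# tail_seek_mb: hypothetical helper mirroring the script's arithmetic.
# $1 = drive size in 512-byte sectors.
# Prints the 1 MB block offset that starts 10 blocks before the end.
tail_seek_mb() {
    echo $(( $1 / 2048 - 10 ))
}

tail_seek_mb 3907029168
```

Because the division rounds down, any remainder (176 sectors here, about 88 KB) lies past the last zeroed block, so the very end of the disk is not touched. On a GPT disk that tail includes the last sector, where the backup header is stored.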
 
Actually, the RocketRaid 622 came in the box with the SansDigital.. I have since purchased a Silicon Image 3132 card to see if the drives would work any better with it. But, the Highpoint was definitely in the box with the enclosure. Here is the page from NewEgg: http://www.newegg.com/Product/Product.aspx?Item=N82E16816111168

I followed your instructions and cleaned the 4 drives.
I then built the array exactly as you listed.

Here's output from zpool iostat:
Code:
alexandria# zpool iostat pool 5 100
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool         245M  7.28T      0     23    641   728K
pool         245M  7.28T      0      9      0  1.20M
pool         257M  7.28T      0     21      0  1.05M
pool         265M  7.28T      0     26      0  1.49M
pool         275M  7.28T      0     34      0  1.25M
pool         282M  7.28T      0     43      0  1.24M
pool         287M  7.28T      0     12      0   597K
pool         287M  7.28T      0      6      0   845K
pool         287M  7.28T      0      1      0   256K
I was running
Code:
dd if=/dev/zero of=/pool/zerofile bs=1m count=500
Which is exactly what I've been encountering all along.

ashift is set to 12:
Code:
alexandria# zdb pool | grep ashift
                ashift: 12
                ashift: 12

I've been a FreeBSD user since 4.10, and I've never had an issue like this. I don't know if it's the drive or the controller or the OS. It's baffling.
 
I had problems with the deeper C states, where the wake-up latency hurt performance a lot. Could you check the cx values of the CPUs and maybe try again without powerd for a test run? I had to disable ACPI throttling on my AMD machine to keep it responsive when powerd was running; it was also prone to freezing up, maybe due to missed interrupts.
 
@derwood

You're always testing with dd from /dev/zero. Also try just using regular cp together with time, like:
# time cp /oldpool/something.iso /newpool/
Or use a file from your old pool together with dd, like:
# dd if=/oldpool/something.iso of=/newpool/something.iso bs=1m
Also test read speed afterwards:
# dd if=/newpool/something.iso of=/dev/null bs=1m
Do they differ?
# zfs snapshot pool@test
# zfs send pool@test | zfs recv -d pool/test
How much IO does that generate?
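
The cp test above could be wrapped in a small helper that reports the average rate directly. This is only a sketch: the name timed_copy is made up, it assumes date +%s and sync are available, and it has whole-second resolution, so it is only meaningful with a file large enough to take several seconds.

```shell
#!/bin/sh
# timed_copy: hypothetical helper. Copies SRC to DST and prints the
# average throughput in decimal MB/s.
# Usage: timed_copy /oldpool/something.iso /newpool/something.iso
timed_copy() {
    src=$1
    dst=$2
    bytes=$(wc -c < "$src")
    start=$(date +%s)
    cp "$src" "$dst" || return 1
    sync                                # flush buffered writes before stopping the clock
    end=$(date +%s)
    awk -v b="$bytes" -v s="$((end - start))" 'BEGIN {
        if (s < 1) s = 1                # avoid dividing by zero on very fast copies
        printf "%.0f bytes in %.0f s = %.1f MB/s\n", b, s, b / s / 1000000
    }'
}
```

Running it once against the old pool and once against the new one gives two numbers that can be compared side by side.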

You could install gkrellm to monitor with, or watch disk IO with gstat. I would also look at top at the same time to see if something is putting too much load on the CPU while copying.

To exclude the multipliers in the JBOD, you could connect two disks directly to the two eSATA ports, set up a smaller pool with just those two, and test again to see where your bottleneck is.

/Sebulon
 
@Sebulon

I've tried copying files and the performance is no different whether dd, cp, or anything else is being used to write to these new Seagate disks.
CPU is mostly idle. Top never shows anything much over about 2 percent. I've also been using gstat to monitor things and performance shows the same as zpool iostat. Nothing over about 1.5 MB/sec.
I tried taking one of the drives and putting it in a Dell GX620 that had 9.0 BETA3 running and there was no difference in performance. The SATA chip in that system was an ICH7 if I remember right.
I've also tried the SansDigital box with both the RocketRaid 622 and the Silicon Image 3132. No difference.
Read speed is just as slow as write speed.

I've ordered 4 gigs of memory so I can switch this thing over to 64 bit. It should get here next week some time. Maybe that will change things. Who knows?



@Crivens

Code:
alexandria# sysctl dev.cpu |grep cx
dev.cpu.0.cx_supported: C1/0
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% last 6498us
dev.cpu.1.cx_supported: C1/0
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% last 41996us

I disable as much power management as I can in the BIOS. I don't like it getting involved.
Plus, this AMD CPU is low power by default. Only 45 watts. It's one of the reasons I bought it. Good idea though.
 
derwood said:
I've ordered 4 gigs of memory so I can switch this thing over to 64 bit. It should get here next week some time. Maybe that will change things. Who knows?
That is worth a try.
I see that you have set the controller to AHCI, maybe you want to retry normal IDE mode.
Using AHCI on my bigger box showed problems with the drives' firmware, after going back to pre-ahci all is well.

I disable as much power management as I can in the BIOS. I don't like it getting involved.
Plus, this AMD CPU is low power by default. Only 45 watts. It's one of the reasons I bought it. Good idea though.
The C't had some interesting articles about the sleep states and performance of external disks. Maybe you should search there a bit.

Being low power is also the reason I like to buy them, but I also like the system to consume as little energy as possible. What I would like to disable in the BIOS is the (buggy) ACPI. One has only to check the Linux kernel source for comments in the ACPI parts to get a good idea of what they think about some of the BIOS vendors in total and some system vendors in particular. Note to self: next purchase, check the proximity of $VENDOR to swearwords in *BSD and Linux kernel sources before buying.
 
Yea.. ACPI is a pain.. The Dell system I ran FreeBSD 9 BETA 3 on would not boot unless ACPI was disabled, and there was no option to do that in the BIOS.

I really doubt that it's a sleep state issue. I'm wondering if it's the drive needing a new firmware. I doubt sleep states because if there's no partition on the drive and I use dd to write to the raw device, speeds are in excess of 100MB/sec. As soon as there's a partition on the drive everything slows down to 1MB/sec even if it's only one drive. I'll try all over again when the new memory arrives.
 
derwood said:
I doubt sleep states because if there's no partition on the drive and I use dd to write to the raw device, speeds are in excess of 100MB/sec. As soon as there's a partition on the drive everything slows down to 1MB/sec even if it's only one drive.

Please show the partition layout that's being created. The Seagate ST2000DL003 is an Advanced Format drive with 4K sectors. When using gpart(8), use -a 4096 to get things aligned correctly, or just create partitions on boundaries evenly divisible by 4K.
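
As a quick way to check the arithmetic: gpart reports offsets in 512-byte sectors, and 4096 / 512 = 8, so a partition is 4K-aligned when its start sector is a multiple of 8. A minimal sketch, with the helper name is_4k_aligned made up for illustration:

```shell
#!/bin/sh
# is_4k_aligned: hypothetical helper, not part of gpart.
# $1 = partition start offset in 512-byte sectors.
# Succeeds (exit 0) when the start falls on a 4096-byte boundary.
is_4k_aligned() {
    [ $(( $1 % 8 )) -eq 0 ]
}

for start in 34 2048 4096; do
    if is_4k_aligned "$start"; then
        echo "$start: aligned"
    else
        echo "$start: NOT aligned"
    fi
done
```

Sector 34, the first usable sector on a GPT disk, is not on a 4K boundary, which is why an explicit alignment or start offset matters.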

That drive also has the mysterious SmartAlign, which
dynamically manage each individual miss-alignment condition within the hard drive firmware and without the host computer even knowing

Don't know what that means, exactly, or whether it might be a problem with a RAID controller.
 
@Wblock

Here it is:
Code:
alexandria# gpart show ada7
=>        34  3907029101  ada7  GPT  (1.8T)
          34        4062        - free -  (2M)
        4096  3907022848     1  freebsd  (1.8T)
  3907026944        2191        - free -  (1.1M)

I thought I was doing that all along. Is this not right?


Here's how I partitioned and formatted the drive:

Code:
alexandria# gpart create -s gpt /dev/ada7
ada7 created
alexandria# gpart add -t freebsd -a 4096 ada7
ada7s1 added
alexandria# newfs -U -f 4096 /dev/ada7s1
Reduced frags per cylinder group from 189440 to 189432 to enlarge last cyl group
/dev/ada7s1: 1907726.0MB (3907022848 sectors) block size 32768, fragment size 4096
        using 2579 cylinder groups of 739.97MB, 23679 blks, 47360 inodes.
        with soft updates

As far as the RocketRaid controller goes, I've flashed the firmware on the card to an AHCI version of the firmware, so it's no longer a RAID card unless I re-flash it. Also, I've tried a Silicon Image 3132 eSATA card that is non-RAID by default with similar results.
 
There's a mistake in the command to add a partition. It should be
# gpart add -t freebsd-ufs -a 4096 ada7
See % man gpart | less -p freebsd

Since you're still testing, try
# gpart add -t freebsd-ufs -b 1M -a 4k
That starts at the standard 1M position and should go as near to the end of the disk as possible, subject to 4096-byte alignment.
 
@Wblock
OK.. Here's the output from the changes you suggested:

Code:
alexandria# gpart create -s gpt ada7
ada7 created
alexandria# gpart add -t freebsd-ufs -b 1M -a 4k ada7
ada7p1 added
alexandria# newfs -U -f 4096 /dev/ada7p1
Reduced frags per cylinder group from 189440 to 189432 to enlarge last cyl group
/dev/ada7p1: 1907728.1MB (3907027080 sectors) block size 32768, fragment size 4096
        using 2579 cylinder groups of 739.97MB, 23679 blks, 47360 inodes.
        with soft updates


alexandria# gpart show ada7
=>        34  3907029101  ada7  GPT  (1.8T)
          34        2014        - free -  (1M)
        2048  3907027080     1  freebsd-ufs  (1.8T)
  3907029128           7        - free -  (3.5k)

Here's output from gstat while trying to copy an ISO file onto the newly-formatted drive:

Code:
dT: 1.001s  w: 1.000s  filter: ada7
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   13      5      0      0    0.0      5    639   1426  110.0| ada7
   13      5      0      0    0.0      5    639   1426  110.0| ada7p1

I'm really wishing the drive manufacturers hadn't come up with this advanced format garbage.
 
Advanced Format was kind of necessary with drives getting so large. It would be nice if the firmware didn't try to pretend it had 512-byte sectors.

The test above seems to show that sector alignment isn't the problem. -f 4096 to newfs(8) isn't default, and the man page cautions against changing the blocksize/fragment ratio. That's worth a try, just
# newfs -U /dev/ada7p1

Other than that, I'm out of ideas.
 
Just tried it and still no change.

I still have memory on the way that should arrive this week. I'll try adding the memory and installing the AMD64 version to see if there is any difference.
I know that if I use the SansDigital box with either controller in Linux, performance is normal, so I guess that's where I'll have to place it if the added memory doesn't work out.

Thanks everyone for your time.

BTW - Here's raw device writes:

Code:
alexandria# dd if=/dev/zero of=/dev/ada7 bs=1m count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 10.022792 secs (104619151 bytes/sec)

gstat output
dT: 1.001s  w: 1.000s  filter: ada7
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    801      0      0    0.0    801 102553    1.1   89.5| ada7
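
One way to compare this gstat sample with the earlier one from the filesystem copy is to divide kBps by w/s, which gives the average size of each write. The sketch below does that for both samples; the helper name avg_write_kb is made up for illustration.

```shell
#!/bin/sh
# avg_write_kb: hypothetical helper, not part of gstat.
# $1 = writes per second (w/s), $2 = write throughput (kBps), as printed by gstat.
# Prints the average size of one write in kB.
avg_write_kb() {
    awk -v ops="$1" -v kbps="$2" 'BEGIN { printf "%.1f\n", kbps / ops }'
}

avg_write_kb 5 639          # filesystem copy sample
avg_write_kb 801 102553     # raw dd sample
```

Both come out at roughly 128 kB per write, so the request size is the same in the two cases. What differs is the time per write (1426 ms versus 1.1 ms in the ms/w column), which suggests the slowdown is in how long each request takes to complete rather than in how the filesystem sizes its writes.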
 
OK. I finally figured it out.
I took the RR622 card out and put the Silicon Image 3132 card in.
I partitioned, formatted and mounted the drives and I get
Code:
1048576000 bytes transferred in 7.774410 secs (134875315 bytes/sec)

I made a stripe up with GEOM Stripe and I get pretty much the same speed. At this point I really don't care.

I had used the Silicon Image card when I was still running FreeBSD 8.2 and had not tried it with 9.0 RC1.
So, just in case anyone else encounters this: get rid of the HighPoint RocketRaid 622.

I really would like to thank wblock, Crivens, and Sebulon for the ideas and help.
 
Thanks for following up on this. Please enter a PR about the RocketRaid card, that's something others should at least be warned about.
 
@derwood

Did you ever try this with the Highpoint controller?
To exclude the multipliers in the JBOD, you could connect two disks directly to the two eSATA ports, set up a smaller pool with just those two, and test again to see where your bottleneck is.

Because from what I've read, mixing and matching port multipliers is generally not something you get away with. You tried the Sil controller with 8.2-RELEASE, which doesn't have the siis driver; only 8-STABLE and up has that.
Then you tried the Sil controller again with 9.0-RC1 and presumably still had:

/boot/loader.conf
Code:
ahci_load="YES"
which pulled the siis driver in and then the controller and multiplier played together nicely. Therefore, it would be very interesting to know exactly where the Highpoint controller starts acting up. Could you please humor me by testing two drives connected directly to the Highpoint controller as well?

Or perhaps, if one wants to use a Highpoint controller together with Sil multipliers, you have to have both:

/boot/loader.conf
Code:
ahci_load="YES"   # takes care of the controller
siis_load="YES"   # takes care of the multipliers

/Sebulon
 
@Sebulon

I don't have a single eSATA enclosure to test with and the RocketRaid card does not have internal connectors. It's eSATA only. I'll see if I can get a cable to convert eSATA to internal SATA and try it that way. I'll see what I can find.

Darin -
 