How to get the greatest performance from hardware (drives)?

Greetings,
I hope the title was concise; I didn't want to make a long sentence, but still get the point across.
Here's my situation: I'm almost exclusively running SATA (PATA) drives. They are all capable of at least 6GB/s, but some of the motherboards they're on aren't up to that speed (6GB/s). So I was wondering what kernel additions/tweaks, or sysctl(8) tuning, I might be able to do to get the most speed. I am not [currently] interested in ZFS, or other file systems, as I [currently] prefer, and am sometimes [obligated], to use UFS2.

Thank you for all your time, and consideration.

--Chris

P.S. These are at least RELENG_9 systems.
 
I will give you two answers.

First: What is your goal? You must have a certain workload or application in mind. People don't tune their computers for the sake of tuning, but to get something concrete done. Say, for example, you are running an application that involves a web server and a database. Are you serving as many users as you need to? If yes, don't change anything: you can't improve "good enough", and there is a risk of breaking something. If you are not serving as many users as you need to: what is the bottleneck? What measurements have you taken to identify the bottleneck?

We won't be able to help you unless you tell us what concrete application or use case you have in mind, and unless you tell us what your goals are. Do you want a system that is easy to manage and upgrade, without the risk of your tuning getting undone? Do you want to size and then buy the most cost-effective (cheapest) system that can do a certain workload? What factors enter into your cost (for example, do you have to pay for the time of sysadmins and analysts as they do performance tuning)? Is risk of instability an important factor for you?

Second: Disks are characterized by many metrics, and interface speed is just one of them, and usually not the important one. Modern SATA (not PATA!) disks can transmit 6 GBit per second (not GByte) over the SATA interface. In practice, that works out to the wire being able to do about 600 MByte/s. For spinning disks, the hardware itself (the head and platter) is much slower, typically 150 to 170 MByte/s, for large sequential IOs with deep queueing on the outside edge of the platter. Which means that it doesn't actually matter whether you use 3 GBit/s or 6 GBit/s SATA; the limiting factor is the mechanism itself. Even with SSDs, you are unlikely to get into a situation where the drive itself is limited by the interface (unless you are buying some pretty expensive SSDs). Obviously the situation is different with shared bus-style interfaces (SAS, InfiniBand, Fibre Channel), where many drives have to share one link, but on SATA, that 3 or 6 GBit/s link is solely dedicated to one drive.
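To make that concrete, here's the back-of-the-envelope arithmetic (the platter figure is an assumed typical value, not a measurement of any particular drive):

```shell
#!/bin/sh
# SATA wire speed vs. platter speed, back of the envelope.
# SATA uses 8b/10b encoding, so each data byte costs 10 bits on the wire.
link_gbit=6                                # "6 GBit/s" SATA
wire_mbyte=$(( link_gbit * 1000 / 10 ))    # ~600 MByte/s usable on the wire
platter_mbyte=160                          # assumed outer-edge sequential rate
echo "Wire limit:    ~${wire_mbyte} MByte/s"
echo "Platter limit: ~${platter_mbyte} MByte/s"
echo "Wire headroom: $(( wire_mbyte / platter_mbyte ))x the mechanism"
```

The 3 GBit/s link gives the same answer: roughly 300 MByte/s on the wire, still about twice what the mechanism can deliver.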

And even then, that 150-170 MByte/s figure is only for one particular class of IO, which is unlikely to occur in practice. Once you put a file system on top (which means extra seeks and small IOs for metadata, such as inodes or the moral equivalent), performance tends to go down. Furthermore, there are few applications that are really optimized to get the most out of a file system either; typical real-world applications use short IOs (short compared to tracks on the disk), and are typically unable to maximally exploit the hardware; but then, file system prefetching and write-behind tends to make up for that.

If you are interested in file system throughput, here's my rule of thumb: If you get to 70% or 80% of the hardware throughput (and remember to correctly average the throughput on the outside and inside edge of the platter), for real-world multi-user or multi-application workloads, you are doing excellently. For single-workload cases that are well tuned (synthetic micro-benchmarks, which typically don't resemble real-world use), you can often get to 90% of the hardware limit, or even more.

So, what is your hardware, and what are its specifications? What are you trying to accomplish? What have you measured so far?
 
WOW, ralphbsz. That's really a mouthful. Quite a bit more than expected. Thanks for such a thorough breakdown. I'm afraid I may not have articulated my query as well as I might (should?) have. I was seeking things that the kernel itself provides, that I may not have enabled or thought of, that might give the drive better, or perhaps more accurately, as much performance as the hardware itself is capable of. Meaning: just because the SATA ports are "seen" and the drive is "recognized" does not mean that I have chosen the best options the driver in use has to offer -- assuming it's even the correct driver, and not just a generic one. For example, enabling AHCI (where available) does provide better performance (when the hardware is capable/exists). But blindly enabling it has some not-so-nice side effects -- the device names (drive/slice names in /dev) change. Which, if you don't know better and don't adjust the entries in fstab(5), will leave you with an unbootable system. :)
Anyway, I hope my intentions are a bit clearer. Thank you very much, ralphbsz, for such an informative analysis.
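(For reference, the switch in question is a one-line setting; a minimal sketch, with example device numbers that will vary per machine:)

```shell
# /boot/loader.conf -- load ahci(4) at boot (minimal sketch).
ahci_load="YES"
# Side effect: drives then attach via ada(4) instead of ad(4), so fstab(5)
# entries like /dev/ad4s1a become /dev/ada0s1a (numbers here are examples).
```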

--Chris
 
The ad(4) and ada(4) drivers have different behavior with device names, but that is not due to AHCI. As always, use GPT or filesystem labels, which remain static no matter how the device is connected.
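A sketch of what using GPT labels looks like (device name, label, and mountpoint are all made-up examples):

```
# Label a GPT partition when creating it; it then appears under /dev/gpt/
# no matter which driver or port the disk attaches through:
gpart add -t freebsd-ufs -l mydata ada0
# Or add a label to an existing partition (index 2 is an example):
gpart modify -i 2 -l mydata ada0
# fstab(5) then refers to the label instead of a raw device name:
# /dev/gpt/mydata  /data  ufs  rw  2  2
```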

There are filesystem parameters that can be tuned. For example, if a filesystem is going to have lots of small files, it might be useful to have more inodes. Benchmarking this sort of thing is tedious and mostly unrewarding.
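As a sketch of one such knob (the label name here is hypothetical):

```
# newfs(8) -i sets bytes of data space per inode; a smaller value means
# more inodes. One inode per 4 KiB, for a filesystem full of small files:
newfs -U -i 4096 /dev/gpt/smallfiles
# dumpfs(8) shows the resulting inodes per cylinder group if you want to verify.
```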

Some people make smaller partitions on the faster part of a drive, the beginning blocks or outside edge. These partitions can be quite a bit faster due to the higher sector density.
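A sketch of that layout (device name and sizes are made up):

```
# The first partition gets the outermost, fastest tracks of the disk:
gpart create -s gpt ada1
gpart add -t freebsd-ufs -l fast -s 40G ada1   # fast zone: first 40 GB
gpart add -t freebsd-ufs -l bulk ada1          # remainder: bulk storage
```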

My favorite performance improvement involves replacing a hard drive with an SSD. They are night and day. Much of this is due to access time, not maximum transfer speed.
 
The ad(4) and ada(4) drivers have different behavior with device names, but that is not due to AHCI. As always, use GPT or filesystem labels, which remain static no matter how the device is connected.
Really? A month ago I turned on AHCI on a drive, and after bouncing the box, I was sent to single-user mode, with the kernel indicating it couldn't find/mount the boot device. No matter how hard I tried, I couldn't get it to boot correctly, even though I could manually mount the drive slices. I simply figured I wasn't correctly defining the /dev/ names. Ultimately, it was due for an upgrade. So I simply wiped the drive, repartitioned using gpart(8), creating GPT partitions as opposed to MBR slices, and installed (restore(8)d) RELENG_9 from an image (dump(8)) I had created for the upgrade.
I enabled AHCI in loader.conf(5), booted to the new system, and all was fine. So I just assumed it must have been something in the naming. Now I guess I don't know. :confused:

wblock@ said:
There are filesystem parameters that can be tuned. For example, if a filesystem is going to have lots of small files, it might be useful to have more inodes. Benchmarking this sort of thing is tedious and mostly unrewarding.
Yeah. I considered that, and it looked like it would likely be a long and arduous process, for an unknown reward, if any.

wblock@ said:
Some people make smaller partitions on the faster part of a drive, the beginning blocks or outside edge. These partitions can be quite a bit faster due to the higher sector density.

My favorite performance improvement involves replacing a hard drive with an SSD. They are night and day. Much of this is due to access time, not maximum transfer speed.
Couldn't agree more. I have a couple, and am looking forward to replacing everything with SSD's. I'm only waiting, because I'd like to get some more numbers. Where longevity, and speed, are concerned.

Thanks for taking the time to reply, wblock@.

--Chris
 
Really? A month ago I turned on AHCI on a drive, and after bouncing the box, I was sent to single-user mode.

Yeah. I considered that, and it looked like it would likely be a long and arduous process, for an unknown reward, if any.

That was exactly my point. Performance tuning is hard work. On a file system on a single disk drive for common desktop/development workloads, there is little performance benefit. The amount of work is large, and the risk is high. In the case of turning AHCI on, you experienced one particular risk.

Another risk comes from the fact that some "tuning" operations may require moving the data off and back on, which for the average home user (with just a few drives) is tedious, and has the risk of a mistake wiping everything out.

Now, if this were a large professional-strength file system product, where tunables could be changed online, which could handle multiple drives, supported online migration, and where there was considerable industry experience in tuning, then it would make sense. If you are a large bank, supercomputer center, or secretive government agency, and have spent tens or hundreds of millions on your storage and file system, you will be very willing to spend a few hundred thousand on consultants to tune it and get 10% or 20% better performance out of it. For the single-disk desktop user, default settings tend to do pretty well.

I have a couple, and am looking forward to replacing everything with SSD's. I'm only waiting, because I'd like to get some more numbers. Where longevity, and speed, are concerned.

I have two SSDs in my home system: I boot from one, and use the other for backups, or as an alternate boot during upgrades. The speed difference compared to a hard disk is astonishing. The way I like to put it: in engineering, a 10% improvement is a big deal; in astrophysics, you look for factors of 10. In that sense, SSDs are like astrophysics: they are an order of magnitude faster. And that's using a cheap consumer-grade SSD (an Intel device with a SATA port); the high-end enterprise SSDs are even more astonishingly fast.

BUT: You have to be careful with write endurance. The high-capacity SSDs that are shipping now are sometimes spec'ed to be overwritten on average once per day, and then are expected to give you a useful life (on average! no guarantees!) of 5 years. Since their interfaces are typically capable of delivering about 500 MByte/s, or 43 TB per day, if you are careless you can overwrite, say, a 1TB SSD about 43 times a day, and just that simple back-of-the-envelope calculation reduces the expected life span to a little over a month. And this is even before write amplification, which in the case of random 512-byte (sector) writes can be another huge factor. (In reality, many drive firmwares will balk, slow down incoming writes, and maintain a reasonable lifespan, and many modern SSDs expose 4K sectors.)

So when planning your file system layout, if you have a write-intensive part, it might be a good idea to segregate that onto a separate file system that uses either a spinning disk, or RAIDed and disposable SSDs, if you can afford that. I thought about it, and decided that the only write-intensive part of my root disk is /var/, and the total write traffic to it is a few MB per day, so on a multi-GB SSD the overall write rate is very little. But for other workloads (DVR for example), the answer might be radically different.
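That back-of-the-envelope calculation, spelled out (all figures are the assumed ones from above, not specs of any particular drive):

```shell
#!/bin/sh
# Worst-case SSD wear, using the assumed figures above:
# 1 TB drive, ~500 MByte/s sustained writes, rated 1 drive-write/day for 5 years.
capacity_mb=$(( 1000 * 1000 ))        # 1 TB expressed in MByte
rate_mbs=500                          # sustained write rate, MByte/s
daily_mb=$(( rate_mbs * 86400 ))      # MByte written per day, flat out (~43 TB)
dwpd=$(( daily_mb / capacity_mb ))    # full drive writes per day at that rate
rated_total=$(( 1 * 365 * 5 ))        # rated total: 1 write/day for 5 years
echo "Drive writes/day at full speed: ${dwpd}"
echo "Expected life at that rate: ~$(( rated_total / dwpd )) days"
```

Which turns "5 years" into roughly six weeks, before write amplification makes it worse.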

Also to consider: In terms of raw bandwidth (in units of MByte/s per $), spinning rust hard disks are still impossible to beat at large capacities, if your workload is suitable for the head/platter interface (large IOs and long sequential streams). So if you have something bandwidth-intensive, it might be cheaper to buy a half-dozen discounted traditional disks and stripe them together, rather than one expensive SSD. But beware of the overall reliability when you do that.

Again, all this really depends on the workload. For a normal desktop system (with e-mail, web, some documents, a few baby pictures being stored, and some development work), a single spinning disk with a stock file system is usually sufficient, and a good compromise between cost, convenience, and performance.
 
Greetings, ralphbsz, and thank you for such a thoughtful reply.
I'm working with production servers, for the most part, which is what I'd be applying any "tweaks" or additional hardware to. Most of your assertions regarding SSD's match my current line of thinking and understanding. Just like with their predecessors (spinning platters), the manufacturers are spinning all kinds of webs (lies and misinformation). Take the advertised size of a hard drive vs. its actual capacity: by tweaking the numbers, they were able to tell you the drive was larger than it actually was. Not much has changed where SSD's are concerned. Some are pulling tricks to limit your write abilities so they can meet their guarantee/warranty. Others are failing drives prematurely so you'll purchase a new one earlier, raising their "bottom line". If you're clever, you can wipe the write count, effectively "resetting" the SSD as new.
Anyhow, these and several other reasons keep me from heavily investing in SSD's just yet. I'll bide my time until I get enough stats and numbers to make an "enlightened/informed" investment, rather than just a purchase.

Thanks again, ralphbsz. For your thoughtful, and informative reply.
--Chris
 
Maybe not the answer you're looking for, but there are PCI cards that provide SATA3 connection points, and they don't cost that much AFAIK (just do an image search for "pci sata3"). With such a card you get 6 Gb/s capability to the extent the PCI bus is able to support it (it's usually more than capable). A nice way to extend the service life of your older motherboards.
 
Maybe not the answer you're looking for, but there are PCI cards that provide SATA3 connection points, and they don't cost that much AFAIK (just do an image search for "pci sata3"). With such a card you get 6 Gb/s capability to the extent the PCI bus is able to support it (it's usually more than capable). A nice way to extend the service life of your older motherboards.
Right. Good advice. I hadn't considered that, dunno why. :rolleyes:
My books (financial records) indicate my next major purchase period, will be around the end of the first quarter of 2015. Which is when I had planned to upgrade most of the server motherboards/CPU's. As well as get any more boards to create additional servers, if needed. But I think your suggestion would be a good investment. It would give me an opportunity to compare against what ever is already provided on the new boards I'll be purchasing. Which will have to have STATA3, and USB3, anyway. But if the card provides better performance, than those available onboard. That'd be the way to go. I normally try to find boards that already have the "features" I'm looking for onboard. Because the BUS lines are more closely tied to the CPU. But often the type, or brand provided on those boards, aren't as good as those provided on the boards that are created for that feature itself. I should really perform tests myself. So I can get hard, meaningful numbers.

Thanks for the suggestion, Beeblebrox! I'm on it! :)

--Chris
 