Solved What *exactly* does gpart -a do?

Greetings,
OK given that most of the information provided by drive manufacturers is largely manipulated to make them all seem better than they actually are. I'll not get into too much of that, but rather attempt to dispel any misconceptions I may have regarding what gpart -a actually does.

To that end: on platter (PATA) drives, SECTORS/BLOCKS are like slices of a pie. The BLOCKs must all be of the same size. But that size can often be chosen. Most frequently; 512 bytes, or 4 KiloBytes. It is my current understanding that I may use gpart(8) to pick this size (gpart -a). Ideally, the chosen BLOCK/SECTOR size reflects the size of the majority of the files that will occupy it. A 4k SECTOR/BLOCK size is best suited for larger files, where 512b is best for many small files. Drive manufacturers of more recent models, assume that most consumers use their drives primarily to store porn, and ripped MP3s. So they manufacture them with a default SECTOR/BLOCK size of 4k. Whereas in the "olden" days, porn, and ripped MP3s were harder to come by. So they sent them out primarily with a 512b SECTOR/BLOCK size.

Now that that's out of the way. Let's get to the point. :) Can I accomplish the same with gpart(8) across an entire drive? As in
gpart destroy -F da0 (this is a PATA drive, on a USB==>IDE converter)
gpart create -s GPT da0
gpart add -t freebsd-ufs -a 512b -l gptdrive da0
leaving me with 1 slice/partition across the entire drive, aligned at 512 bytes?
Followed by newfs -U /dev/gpt/gptdrive as a general rule.

This is the way I've done it -- of course not aligned @512b.

But on a recent partition/newfs on a USB SSD I attempted to use -a 1M. Thinking that I'd have a drive aligned on a One MegaByte boundary. But it appeared to end up on a 512b boundary.

So, do I really have this right? Or have I just been lucky all these years? :rolleyes:

Thank you for all your time, and consideration.

--Chris
 
-a just sets the start of the partition to a multiple of the size given. It has nothing to do with selecting the sector size (that's set by the drive manufacturer).

Meaning, -a 1M will start the partition at the next 1 MB boundary.

For example, a generic GPT partition table would have a freebsd-boot partition, a freebsd-ufs partition, and a freebsd-swap partition. The boot partition is generally 128K but can go up to 512K without issues. You can add that partition without -a and it will use the first available sector after the GPT header and table and whatnot (I believe that's sector 34, a byte offset of 17408 on a 512B-sector disk, which is not a multiple of 4K or 1M).

Use -a 1M for the UFS partition, and it will start at whichever sector sits at 1 MB into the disk (sector 2048 for a 512B disk; sector 256 for a 4K disk; sector (1 MB / sector size) in general).
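To make the arithmetic concrete, here's a small sketch (hypothetical values, not tied to any particular drive) that computes the 1 MiB start sector for the two common logical sector sizes:

```shell
# start sector of the 1 MiB boundary = 1 MiB / logical sector size
for secsize in 512 4096; do
    echo "sector size $secsize -> start sector $(( 1048576 / secsize ))"
done
```

With 512-byte sectors that lands on sector 2048; with 4K sectors, sector 256.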

The reason for using -a 1M is that regardless of whether the disk presents 512B sectors (physical or logical), 4K sectors (physical or logical), 128K sectors (some SSDs), or 1 MB sectors (some SSDs), they'll all be aligned correctly and optimally, as each of those sizes divides 1M evenly.

Makes sense?
 
-a is just a rounding value. If the values given are not an integer multiple of the -a value, they are rounded up to the next multiple. There is some interconnection between -a and the partitioning scheme: GPT can put partitions anywhere, but the MBR specification says that partitions must land on CHS boundaries. So if you try to use -a 1M with MBR, the first partition will end up at block 2079 instead of 2048 (1M).
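The CHS rounding can be sketched with a few lines of arithmetic, assuming the classic 63-sectors-per-track geometry:

```shell
# MBR rounds the requested start up to the next track (CHS) boundary:
# the first multiple of 63 at or after sector 2048 is sector 2079.
spt=63
want=2048
start=$(( (want + spt - 1) / spt * spt ))
echo "requested sector $want, CHS-rounded start $start"
```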

There is now a sysctl to override that, kern.geom.part.mbr.enforce_chs.
 
phoenix, wblock@
Thank you both, for taking the time to clarify/correct this for me!
So it would appear there is no way to set the block/sector size, and that whatever the drive says is what you get. OK, maybe there are possibilities where zfs(8) is concerned. But even then, it's all at a higher layer. So, if I finally have all this correct, I'll limit my use of -a to setting up the initial partition/slice (the end of it), and omit the -a switch for any further actions on the drive.

Thank again, to you both, for all your time, and trouble!

--Chris
 
gpart(8) will only accept -a where it is allowed. There is no harm in using it any time partitions are being added or resized.

Of course, it's not required. If you calculate the values out by hand, -a is not necessary. But check them in the gpart show output afterward, because partitioning schemes like MBR can cause surprises.
 
OK given that most of the information provided by drive manufacturers is largely manipulated to make them all seem better than they actually are.
Actually, that's false. Disk drive manufacturers do indeed use terms that make their drives look good. For example, they have the habit of measuring capacity in metric gigabytes or terabytes rather than in binary, which makes a 7-10% difference. Their statements about capacity are true, but they set unrealistic expectations: if I buy a 1 terabyte disk drive, I might expect to be able to store 1024 files, each of which is 1024x1024x1024 bytes long, but the drive will fill up when I have written 10^12 bytes. The drive vendor is not being dishonest, and their statements about drive capacity are true.

Similarly, their statements about performance, reliability, and error rates are nearly always true, and when they are false, they tend to get into big trouble with large customers. I've seen disk drive manufacturers (name withheld to protect the guilty) have to refund the purchase price on tens of thousands of drives at a time, and then pay their customer millions of dollars to buy drives from the competition. For this reason, manufacturers work really hard to be honest in verifiable statements.

Now, their glossy sales literature sometimes has beautiful pictures of green meadows, with cute blond girls holding flowers and deer contentedly munching grass, and statements like "enterprise storage reliability and performance at commodity prices". Anybody who believes that you can get all three of good/fast/cheap is as dumb as someone who thinks that their new disk drive comes shipped with Bambi or Barbie. And the marketing departments of disk drive vendors hope to exploit dumb customers.
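The metric-vs-binary gap is easy to check with a line of arithmetic: a "1 TB" drive holds 10^12 bytes, which is only about nine tenths of a binary TiB (2^40 bytes).

```shell
# 10^12 bytes expressed in TiB, to three decimal places
bytes=1000000000000
tib_milli=$(( bytes * 1000 / (1024 * 1024 * 1024 * 1024) ))
echo "1 TB = 0.${tib_milli} TiB (roughly)"
```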

what gpart -a actually does.
See above in the answers by others. It simply tells gpart where to put partition boundaries.

But that size can often be chosen. Most frequently; 512 bytes, or 4 KiloBytes.
That's 99% wrong, and 1% right.

I'll refer to what you're talking about as the sector size. I know that the term is inaccurate, and standards documents call it the logical or physical block size, but the term sector is ingrained in the language, and at a hardware level, the recorded data is actually partitioned into something that we'll call sectors. Reading and writing a sector is typically an atomic operation, which is why the sector size is relevant to correctness. Consumer drives come pre-formatted with a sector size of 512 or 4096 bytes (4096 is more recent), although some drives use 4096 bytes in the hardware of the platter but expose 512-byte sectors on the interface (and fix the difference with some magic, which usually involves read-modify-write and the judicious use of NVRAM or flash; such drives are often known as 512e, or emulated, drives).

Most drives allow the user to change the sector size, within limits. Sector sizes of 520 and 528 bytes are not uncommon in the enterprise market. With the advent of PI, a.k.a. T10 DIF, and similar data integrity mechanisms, physical sector sizes have been increasing. The user is free to use a format command to change the sector size of the drive. On FreeBSD, camcontrol format does not directly support format commands that change the sector size; the user has to implement that with camcontrol cmd and assemble the command from the drive documentation (a bit tedious). On Linux, or after installing the sg3_utils package on FreeBSD, this can be easily done with sg_format. But the sector size can usually only be changed in a narrow range. On 512-byte-sector disks you can go up to 528 (I've never seen anyone use a higher number), not to 700 or 800. On 4K-sector disks, it gets even more tricky.

Ideally, the chosen BLOCK/SECTOR size reflects the size of the majority of the files that will occupy it. A 4k SECTOR/BLOCK size is best suited for larger files, where 512b, is best for many small files.
That depends on the definition of small files. There are systems where everything below 1GiB is considered a small file. I know of no serious production file system where 4K is considered a large file. From a raw disk performance point of view, the performance threshold is often the cylinder: IOs are considered large if they involve multiple cylinders, because then a whole cylinder can be read or written independent of rotational position, whereas small IOs will incur significant rotational delay. With typical cylinder sizes on modern disks being a quarter or a half MB, that shows that the boundary for small IOs is in the MB range. From a blocking point of view, the performance threshold is having to perform read-modify-write operations on an atomic block. The block size varies widely; most commodity file systems have at least a 4KiB VM page as the smallest accessible unit, but once RAID gets involved, block sizes can be as large as 16 MiB. And RAID always gets involved on "large" systems, as anything larger than a few drives becomes too unreliable for production use if you want your file system to actually be persistent in the long run.

You have to remember: Sensible file systems are able to store small files in the metadata (typically inside the inode). So a 512 byte file may actually not be a file at all, as seen from the on-disk IOs.

Drive manufacturers of more recent models, assume that most consumers use their drives primarily to store porn, and ripped MP3s.

You are off by a few orders of magnitude there. And you are focusing on a part of the market that is visible to the amateur, but not seriously relevant to the industry as a whole.

To begin with, a typical ripped MP3 file is multiple megabytes in size. I just checked for fun on my home server: Of the 3567 ripped tracks (I'm a classical and band music fan with a large CD collection), only 120 or 3% are smaller than 1MiB. Unfortunately, I have no porn on my server (sadly), but looking at home movie clips (band performances, my kid's soccer games), the typical file size seems to be hundreds of MB. This is between 3 and 5 (decimal) orders of magnitude away from the 512 byte / 4 KB sector size transition.

Second, the market of consumers storing data on their desktop is beginning to be irrelevant to the industry. Today, most data is stored by enterprises (whether they're called "Blue Cross" or "Facebook"). And while in the end they also store documents (scanned paper documents, photos, and videos stored by youtube.com), they tend to use file and storage systems that are much more complex than what the individual end user sees (which is a commodity file system, whether it's called NTFS or UFS, using a single drive).

So they manufacture them with a default SECTOR/BLOCK size of 4k. Whereas in the "olden" days, porn, and ripped MP3s were harder to come by. So they sent them out primarily with a 512b SECTOR/BLOCK size.

The transition to 4K sectors was not really driven by the change in file sizes, and in particular not by the internet-delivered workloads of porn and music. For many years, file systems have been dealing in 4K VM pages anyhow, so the 512 bytes have not been relevant for a while. The transition was driven by disk hardware, in particular the much higher data densities (due to new heads, platter materials, and magnetics), which make short sectors terribly inefficient. Do I like the new 4K sectors? No, for me it is a hassle having to adapt to them. But they are a necessity if you want to store a half-dozen terabytes for a small stack of $20 bills.

But on a recent partition/newfs on a USB SSD I attempted to use -a 1M. Thinking that I'd have a drive aligned on a One MegaByte boundary. But it appeared to end up on a 512b boundary.

Your statement is hard to parse. If you align to a 1MiB boundary, it will also align to a 512b boundary, as 1MiB is an integer multiple of 512 bytes.

And: all I wrote above is relevant to what I lovingly call "spinning rust", namely traditional disk drives. You are using SSDs or memory sticks. While they also have physical block sizes, their hardware is very different and has other characteristics. A very important consideration for them is "write amplification", which for serious use should probably be the driving factor in block size and alignment decisions: other than read-modify-write operations on their physical blocks (which are usually far larger than the 512 bytes or 4K they expose on their interface), their performance is not heavily dependent on sequential versus random IO.
 
To extend this already-long discussion a little farther, alignment merely means making sure the hardware blocks on the disk (512 or 4096 bytes, usually) map directly to filesystem blocks. It is entirely possible--and regrettably easy--to have a disk with 4K blocks and a filesystem with 4K blocks, but the two do not map one-to-one. The filesystem writes what it thinks is a single 4K block, and the drive puts half of it in one 4K hardware block, and the rest in another. That is why misalignment is a big deal. Performance can be cut in half for writes, or worse.
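The straddling case is easy to see with some offset arithmetic (hypothetical numbers: a 4 KiB filesystem block written at the old default start, sector 34 of a 512-byte-sector disk):

```shell
# A 4 KiB write at byte offset 34*512 = 17408 is not 4K-aligned,
# so it lands partly in one 4K hardware block and partly in the next.
off=$(( 34 * 512 ))
first=$(( off / 4096 ))
last=$(( (off + 4096 - 1) / 4096 ))
echo "byte offset $off touches 4K hardware blocks $first through $last"
```

Two hardware blocks for one filesystem block: that is the read-modify-write penalty misalignment causes.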

Small files are not a huge deal, because most filesystems use some type of fragment or block suballocation. A single filesystem block can be divided up to store several small files or leftover pieces of files.
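As a toy illustration of suballocation, assuming UFS's traditional 32 KiB block / 4 KiB fragment layout, one filesystem block can hold pieces of up to eight small files:

```shell
# fragments available per filesystem block = block size / fragment size
block=32768
frag=4096
echo "fragments per $block-byte block: $(( block / frag ))"
```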
 
ralphbsz.
Thank you for such a meaningful reply.
While I largely agree, and am familiar with what you had to say. I would like to first point out that some of my derogatory remarks were off-the-cuff, and somewhat tongue-in-cheek (see; porn/ripped mp3's) -- I enjoyed your response to those BTW. ;)
I would also like to clarify that I was speaking quite broadly and generally, in the most *basic* of implementations. The topic was intended to attain clarification on what I perceived as being a misconception about gpart(8) on my part. Which I clearly had. :p
As to drive manufacturers: yes, what they state is technically true. But I don't think it's coincidence that the nomenclature they chose, and the way they chose to present it, gives one a false impression of the drive. Sure, fair enough, it's nothing new in any advertising campaign. Nonetheless, it's deceiving. You decide if deceiving is truth, or lie. :)
In all fairness, there is so much translation going on within the drive's mechanics and electronics that my (possibly oversimplified) overview doesn't really apply very well these days. But back in the old days you had pretty much 2 choices, MFM or RLL drives, and it was all pretty much as I presented it above. These days, the cache on the drives is larger than what an entire computer could manage back then.
Sadly, there is no real rule-of-thumb, at least where PATA drives are concerned. They all vary quite widely in the way they manage the data read and written. RAID, zfs(8), etc. create yet another "layer" of complexity. So, while it's wonderful to think that the software abstraction layer helps to simplify the process, in the end it's just another formula you need to apply to the already complex math equation to get the *hopefully* best performance. I guess, in the end, the best rule-of-thumb is: know your hardware. In other words, get to know a drive brand/model really well, so that you can be assured you're squeezing out the most performance it's capable of providing.
I rather like the SSD technology. I think it's quite a bit better than all its predecessors, and should remove much of the ridiculous antique methodology still clung to from the olden days. :)

Thanks for putting so much thought, and effort into your reply, ralphbsz.

--Chris
 