Single SSD zpool ashift?

I've just bought my first SSD and intend to use it as the boot drive in a FreeBSD system, in the form of a single-disk zpool.

I'm already aware of the need to align my freebsd-zfs partition and will start it at 1 MiB. Would it also be a good idea to use gnop(8) to trick ZFS into using a different ashift value for the pool? The SSD reports a 512-byte sector size.
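For reference, the gnop(8) trick I have in mind is roughly the following (the partition name is just an example, and I haven't actually run this yet):
Code:
# create a fake 4K-sector provider on top of the real partition
gnop create -S 4096 /dev/ada0p3
# create the pool on the .nop device so ZFS picks ashift=12
zpool create tank /dev/ada0p3.nop
# drop the gnop layer again; the pool keeps the ashift it was created with
zpool export tank
gnop destroy /dev/ada0p3.nop
zpool import tank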

Ta.
 
Not worth the trouble for just the operating system. Where the correct block size would help is write speed, and with just the base OS there's not much to gain. Align the partition that holds the pool at 1 MiB and that's it.
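With gpart(8) that alignment is just the -a flag; a minimal example, assuming the SSD is ada0 and the GPT partition table already exists:
Code:
# start (and size) the ZFS partition on a 1 MiB boundary
gpart add -t freebsd-zfs -a 1m -l ssd-zfs ada0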
 
Sector size is not the real block size. That shows up in diskinfo(8) as stripesize. The SSDs I have show a block size of 4K:
Code:
% diskinfo -v ada0 | grep stripesize
	4096        	# stripesize

That suggests setting ashift=12 would be a good idea. No idea how much of a difference it makes in performance. Benchmarks would be welcome.
 
diskinfo(8) reports a stripesize of zero for this SSD:

Code:
root@filer:/root # dmesg | grep ada4
ada4 at ahcich5 bus 0 scbus5 target 0 lun 0
ada4: <SanDisk SDSSDH120GG25 365A13F0> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad14
root@filer:/root # diskinfo -v ada4
ada4
        512             # sectorsize
        120034123776    # mediasize in bytes (111G)
        234441648       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        232581          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        122032300309    # Disk ident.

On a different note, a post to the freebsd-fs list by PJD last year mentioned committing TRIM support for ZFS. I'm not sure whether this made it into 9.1-RELEASE, but 'sysctl -a' output doesn't include 'kstat.zfs.misc.zio_trim'.
 
Didn't ZFS change to a default of ashift=12 in 9.0 (or was it 9.1)? No idea where I got that from... thanks @kpa. :)
 
No it didn't. The ashift detection is based on the reported block size of the disk(s).

It would be a very bad idea to default to ashift=12, because the larger block size can waste a lot of space when most of the data is in very small files: with ashift=12 the smallest allocation is 4 KiB, so a pool full of sub-1 KiB files could take several times the space it would with ashift=9.
 
I think it depends on the quality of the SSD controller. I've read something from the Intel team saying they spent a lot of time making sure their SSDs could emulate 512-byte sectors without (much) performance loss. Some of the cheaper SSDs probably don't do as good a job of it.
 
Old thread, but timeless question...

I was curious about this myself, so I googled the topic and turned up this thread. I noticed somebody asked for benchmarks but didn't see any.

A quick test on a new pool, INTEL_SSDMCEAC180A3 SSD:
Code:
root@ubuntu:/dev/disk/by-id# zdb | grep ashift
            ashift: 9
root@ubuntu:/dev/disk/by-id# cd /rpool
root@ubuntu:/rpool# dd if=/dev/zero of=ashift_9_testfile count=1 bs=5G
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 9.68508 s, 222 MB/s

# zpool create -o ashift=12 rpool /dev/disk/by-id/ata-INTEL_SSDMCEAC180A3_CVLI4082006X180C
root@ubuntu:/# zdb | grep ashift
            ashift: 12
root@ubuntu:/# dd if=/dev/zero of=ashift_12_testfile count=1 bs=5G
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 1.81676 s, 1.2 GB/s

Not sure I trust that result, so I tested it again:
Code:
root@ubuntu:/# dd if=/dev/zero of=ashift_12_testfile count=1 bs=5G
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 1.92221 s, 1.1 GB/s

Does anyone know what to make of that? That speed doesn't seem possible. The SSD does report 512-byte sectors in fdisk.
 
Never use /dev/zero to "benchmark" ZFS. ZFS is smart enough to turn runs of zeroes into sparse files or to compress them away.

Instead, use /dev/urandom to first create a file of random (hopefully incompressible) data. Stick that on non-ZFS media. Then reboot to clear out all caches.

Then copy (and time) that file into the ZFS pool. Then delete it and reboot.

Repeat those steps, changing a single variable with each run.

Failure to reboot between tests will just be benching the ARC.
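A rough sketch of that procedure in shell, assuming /scratch is non-ZFS (e.g. UFS) and the pool is mounted at /tank (GNU dd syntax, as in the tests above):
Code:
# 1. create ~2 GiB of incompressible data on the non-ZFS filesystem
dd if=/dev/urandom of=/scratch/random.bin bs=1M count=2048
# 2. reboot to empty the ARC, then copy the file into the pool and time it
time cp /scratch/random.bin /tank/random.bin
# 3. clean up, reboot again, change one variable (e.g. ashift) and repeat
rm /tank/random.bin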
 
In addition to what Phoenix said: to test the speed of actual writes, first write the file, then fsync(2) it. At that point you are guaranteed the data really is on disk (on the SSD). The write throughput is the size of the file divided by the time for the writes (like the dd you are using) plus the time for the fsync.

Still, that test is probably very unrealistic. For most server-class machines, the workload is not writing a single large file sequentially. In many real-world scenarios, performance is dominated by many small files, or metadata. Sadly, finding or creating a file system benchmark that really resembles your workload would be a lot of work. So take your benchmarks with a grain of salt.
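With GNU dd (as used in the tests above) the fsync can be folded into the timing via conv=fsync; the file names here are just placeholders:
Code:
# the reported time includes the final fsync, so the data really is on the SSD when dd exits
time dd if=/scratch/random.bin of=/tank/random.bin bs=1M conv=fsync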
 