ZFS SAS drives and non-native block size

Hi, this is my first time installing FreeBSD on an old HP server with 3G SAS drives (previously I've only used normal desktop HDDs). I used the Auto ZFS option of the FreeBSD 10.2-RELEASE installer with the 2 disks mirrored. Afterwards, the zpool status command gives this unwelcome message:

Code:
$ zpool status
  pool: zroot
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da0p3   ONLINE       0     0     0  block size: 8192B configured, 1048576B native
            da1p3   ONLINE       0     0     0  block size: 8192B configured, 1048576B native

errors: No known data errors

At this point I'm not sure what to do. Are there really problems if I just ignore it, or is there some way to work around it?

Thanks for your input.
 
This is a very similar situation to a post from a few days ago.

Can you post the output of the following:
Code:
diskinfo -v da0
The controller is reporting a block size (or possibly stripe size) of 1MB, which clearly isn't the block size of the actual disks. The maximum ZFS will pick automatically is 8k, which is why it has that number configured. It will work, but I don't know if there will be an impact on performance. Even with an 8k block size you may notice quite a lot of storage overhead, as all data, including metadata, will take up at least an 8k block. (Edit - I'm not 100% certain whether the overhead problem affects mirrors; most reports seem to be related to raidz.)
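If you want to double-check what ZFS actually chose, zdb can show the pool's ashift (assuming the pool is in the default cache file, which it should be for an installer-created zroot). An ashift of 13 means 2^13 = 8192 bytes, matching the 8192B in your zpool status output:
Code:
# zdb -C zroot | grep ashift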

Obviously the ideal would be to configure the hardware (if possible) to just pass the disks through raw. Anything that sits in the middle and modifies the interaction between the disks and ZFS isn't ideal.

It would be interesting to see what other people think, and if possible for you to actually test different block sizes. If the disks are advanced format (4K), then it may actually be beneficial to force it to use a 4k block size instead of 8k.
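One traditional way to experiment with different sector sizes (before committing to a reinstall) is to put a gnop(8) provider with a forced sector size on top of a spare disk or partition and create a throwaway pool on it; da2 below is just a placeholder for whatever scratch device you can afford to wipe:
Code:
# gnop create -S 4096 /dev/da2        # present the device with a 4K sector size
# zpool create testpool /dev/da2.nop  # ZFS sizes its blocks from the .nop provider
# zdb -C testpool | grep ashift       # should report ashift: 12 (2^12 = 4096)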

Generally speaking, unless you can pass the disks through raw or change hardware, you're probably going to have to live with the non-native block size warning.
 
Code:
# diskinfo -v /dev/da0p3
/dev/da0p3
        512             # sectorsize
        144629039104    # mediasize in bytes (135G)
        282478592       # mediasize in sectors
        1048576         # stripesize
        0               # stripeoffset
        34617           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.
        QL77MU5850      # Disk ident.

Each disk was configured as a single-disk RAID-0 array in the hardware controller settings.
 
Yeah, it's as I expected. The sectorsize is actually being reported as 512B by the disk, which is the standard size for traditional, non-advanced format disks. However, the stripesize is listed as 1MB, which is probably coming from the controller.

As you may know, there have been problems in recent years because Advanced Format (4K) disks have been advertising a sectorsize of 512B in order to stay compatible with systems that don't support 4K disks. We had to build up a database of known 4K disks so that FreeBSD could correctly use a 4K sector size for disks that really were 4K, even when they advertised themselves with sectorsize=512.

In recent years quite a few manufacturers have started making use of the stripesize attribute, which is usually 0 on a plain disk, to hold the real sector size. See this output from one of my WD Red disks:
Code:
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        4096            # stripesize
It's advertising sectorsize=512 in order to maximise compatibility, but it has also put 4K in the normally unused stripesize field to let newer systems know that it's really a 4K disk. FreeBSD now actually uses that stripesize attribute when creating ZFS pools.
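As an aside, if you want to see both of those values together for any disk, geom shows the same numbers diskinfo does, just with labels:
Code:
# geom disk list da0 | grep -E 'Sectorsize|Stripesize'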

Unfortunately because your disk is behind a RAID controller, the controller is overriding the stripesize to be 1M (I don't *think* this is coming from the disk) so ZFS wants to use a 1M sector size even though the disks will be 512B or 4K.

Ideally I would try and confirm the real sector size of the disks, and test using that, although that would probably require doing a manual install (I can give some instructions for this if you want). Apart from possibly lower performance and greater overhead, the wrong sector size in ZFS shouldn't actually cause any real problems.

Hopefully that all makes some sense...
 
It's not always possible but can't you configure them as JBOD? If not, a single-disk RAID-0 should also work, but make sure you set the block size to 8K (or whatever size the physical disks are).
 

Thanks for your helpful information. I was about to install the box manually but decided to post here first while looking for documentation on the partition layout. Anyway, if I get it correctly, I must destroy the pool, re-create it manually with an adjusted block size, and then install FreeBSD again. But before I do that, how can I find the actual disk sector size?

It's not always possible but can't you configure them as JBOD? If not, a single-disk RAID-0 should also work, but make sure you set the block size to 8K (or whatever size the physical disks are).

Yeah, it's pretty obvious I should configure JBOD, but my RAID controller only supports RAID levels 0, 1, and 5, so I have no other choice.
 
If you can get the model number of the disk, you might be able to find a spec sheet online. If it's an older disk, and only ~140GB, it's likely to be a standard 512B sector disk.
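If you're not sure of the model, camcontrol at least shows what the controller is presenting to the OS; many RAID controllers only expose the logical volume here rather than the physical drives, in which case the label on the drive itself or the controller's BIOS utility is where to look:
Code:
# camcontrol devlist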

In order to stop ZFS trying to auto select a block size, you'll probably have to force it to use the block size you want before creating the pool.
Code:
# sysctl vfs.zfs.min_auto_ashift=9
# sysctl vfs.zfs.max_auto_ashift=9
(9 for 512B, 12 for 4k)
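To make that concrete: after setting those two sysctls from the installer's live shell, re-creating the mirror by hand would look roughly like this (device names are the ones from this thread, 9 assumes the disks really are 512B, and the gpart/bootcode steps of a full root-on-ZFS install are left out):
Code:
# zpool create -f -o altroot=/mnt zroot mirror da0p3 da1p3
# zpool status zroot    # note: the controller still advertises 1M, so the
                        # non-native block size warning itself may remain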
 
Each disk was configured as a single-disk RAID-0 array in the hardware controller settings.
It looks like your controller is being a bit too "helpful" when reporting the drives to FreeBSD. I would assume that the controller has onboard cache of at least (stripe size * number of drives) and probably more, so there won't be as much performance impact (or any) as the ZFS message indicates. However, the controller is probably also lying to the driver and saying that writes have completed when in fact they're just in the controller's onboard memory. Unless the controller is equipped with [working] battery backup for its cache, this can result in ZFS inconsistencies in the event of an unexpected system shutdown. It might be possible to export each drive as a JBOD instead of a RAID 0 volume, though of course that would involve backing up the data, reconfiguring the drives on the controller, and recreating the ZFS pool.
 
If you actually "meet" some drive using non-512k format then utility scu can help you out. Of course, you would either need HBA/IT-firmware card. Following does not work from behind RAID controller. I once had to figure it out when I accidentally bought myself EMC Clarion 520byte formatted drive. Took over a week. I hope it helps someone else as well.
http://www.scsifaq.org/RMiller_Tools/scu.html

Pick the appropriate version from there, start it with something like scu -f /dev/devicename in a shell, and enter the following commands:

Code:
set bypass on
set device block-length 512
format


Depending on the disk size, such a format might take hours.
Once the Working... text finishes and you are returned to the scu prompt, enter
Code:
stop
exit


and reboot your machine.

And now you are going to notice the next issue: the drive is much smaller than it should be. Identical drives in Advanced Format have "fewer" sectors than drives using 512-byte sectors of the same size. Figure out the difference and resize the drive using sysutils/sg3_utils.

Example:
Code:
# sg_format --resize --count=0x22ecb25c /dev/da0
--count=0x.. is the total sector count in hex representation.
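For what it's worth, 0x22ecb25c works out to 585,937,500 sectors × 512 bytes = 300 GB, i.e. that example was a 300 GB drive. bc can do the decimal-to-hex conversion for whatever capacity your own drive should report (the 300000000000 bytes here is just that same example):
Code:
$ echo "obase=16; 300000000000 / 512" | bc
22ECB25C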
 