ZFS Correct disk partitioning depending on sector size

I have a Samsung 980 Pro NVMe SSD, a very popular model. It is the only system drive so far. The system boots in UEFI mode and the file system is ZFS.

Before partitioning the disk, I had the following preconditions:

1.) The size of the EFI partition should be 260M because the installer does this in Auto ZFS mode:
Code:
# gpart show

=>        40  1953525088  nvd2  GPT  (932G)
          40      532480     1  efi  (260M)
      532520        1024     2  freebsd-boot  (512K)
      533544         984        - free -  (492K)
      534528     4194304     3  freebsd-swap  (2.0G)
     4728832  1948794880     4  freebsd-zfs  (929G)
  1953523712        1416        - free -  (708K)

=>        40  1953525088  diskid/DISK-S5GXNX0T975357Z  GPT  (932G)
          40      532480                            1  efi  (260M)
      532520        1024                            2  freebsd-boot  (512K)
      533544         984                               - free -  (492K)
      534528     4194304                            3  freebsd-swap  (2.0G)
     4728832  1948794880                            4  freebsd-zfs  (929G)
  1953523712        1416                               - free -  (708K)

2.) The sector size should be 512 because that's what the smartctl command shows:
Code:
smartctl -a /dev/nvme2
[...]
Namespace 1 Formatted LBA Size:     512
[...]
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 [...]

Therefore, when creating partitions, gpart should be run without any -a option, and the ZFS pool should be created with the default vfs.zfs.min_auto_ashift=9, since I saw no reason to use 12.

So I partitioned the disk as follows:
Code:
# create gpt partitions
gpart create -s gpt /dev/diskid/DISK-S5GXNX0T1111111
gpart add -i 1 -l S5GXNX0T1111111-efi -s 260M -t efi /dev/diskid/DISK-S5GXNX0T1111111
gpart add -i 2 -l S5GXNX0T1111111-sys -t freebsd-zfs /dev/diskid/DISK-S5GXNX0T1111111

# configure efi
newfs_msdos /dev/diskid/DISK-S5GXNX0T1111111p1
mount -t msdosfs /dev/diskid/DISK-S5GXNX0T1111111p1 /mnt
mkdir -pv /mnt/efi/boot
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt

# configure zfs
zpool create -m none scc /dev/diskid/DISK-S5GXNX0T1111111p2
[...]

As a result, I got the following partitioning of the disk:
Code:
# gpart show
=>        40  1953525088  diskid/DISK-S5GXNX0T975357Z  GPT  (932G)
          40      532480                            1  efi  (260M)
      532520  1952992608                            2  freebsd-zfs  (931G)

I like this layout because it is simple and follows from the reasoning above. But...

3.) There is a recommendation in the "Absolute FreeBSD" book to use gpart -a 1m "to change that partition to support UEFI if necessary" and "always align partitions on even megabyte boundaries". Do I understand correctly that this applies if I don't have an EFI partition yet, but want to leave the option to add it later? From what I understand, the parted -a optimal command that I remember from the Gentoo Handbook also used the default size of 1M.

4.) The installer in "Auto ZFS" mode sets the sector size to 4K by default. This suggests that even if such an alignment is not optimal, it is good enough for most cases. Finally, almost every how-to I've come across recommends gpart add -a 4K, as well as vfs.zfs.min_auto_ashift=12 when creating a zpool, as if everyone had a disk with 4K sectors.

Do I understand correctly that:

a) Now my filesystem is misaligned with the disk, because 532520 / 512 = 1040.078125? The second (ZFS) partition should start at 534528 (as the installer did), and could this be achieved with -a 1m?

b) Does it make sense to redo partitioning from scratch with the gpart add -a 1m (or -a 4K) option for each partition?

c) What do you think are the correct values for gpart add -a ... and vfs.zfs.min_auto_ashift=... ?
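
As a rough sanity check for question a), alignment can be tested by multiplying the start sector by the logical sector size and taking the remainder against the boundary. The sketch below is not from the thread; the helper names is_aligned and describe are made up for illustration, and it assumes the 512-byte logical sectors that smartctl reports. Under that assumption, sector 532520 sits on a 4K boundary (532520 / 8 = 66565) but not on a 1M boundary, while 534528 sits on both.

```shell
#!/bin/sh
# Sketch: report whether a partition's start sector is aligned to a boundary.
# Assumes 512-byte logical sectors, as smartctl reports for this drive.
SECTOR_BYTES=512

# is_aligned START_SECTOR BOUNDARY_BYTES -> exit status 0 if aligned
# (hypothetical helper, not part of gpart)
is_aligned() {
    [ $(( ($1 * SECTOR_BYTES) % $2 )) -eq 0 ]
}

# describe START_SECTOR -> prints alignment against 4K and 1M boundaries
describe() {
    for boundary in 4096 1048576; do
        if is_aligned "$1" "$boundary"; then
            echo "sector $1: aligned to $boundary bytes"
        else
            echo "sector $1: NOT aligned to $boundary bytes"
        fi
    done
}

describe 532520   # start of the ZFS partition in the manual layout
describe 534528   # start of the swap partition in the installer layout
```

The same function can be pointed at any start value from gpart show.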
 
I would not worry about the performance impact of the current layout.
For future setups, just follow a few basic rules.
PS: Some disks lie and present themselves as using 512-byte sectors when they work internally with 4K blocks.
 
Can you post the entire output of smartctl -a /dev/nvme2?
Code:
smartctl 7.3 2022-02-28 r5338 [FreeBSD 13.1-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 980 PRO 1TB
Serial Number:                      S5GXNX0T1111111
Firmware Version:                   5B2QGXA7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            63,513,001,984 [63.5 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b921be77bd
Local Time is:                      Tue Jan 24 22:07:32 2023 MSK
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.49W       -        -    0  0  0  0        0       0
 1 +     4.48W       -        -    1  1  1  1        0     200
 2 +     3.18W       -        -    2  2  2  2        0    1000
 3 -   0.0400W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        50 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    38,526 [19.7 GB]
Data Units Written:                 173,890 [89.0 GB]
Host Read Commands:                 323,519
Host Write Commands:                2,373,901
Controller Busy Time:               1
Power Cycles:                       30
Power On Hours:                     38
Unsafe Shutdowns:                   0
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               50 Celsius
Temperature Sensor 2:               66 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
 
I was looking for Supported LBA Sizes (NSID 0x1) in the output of your smartctl.
Anyway, keep an eye on "Media and Data Integrity Errors", at least while the drive is under warranty.

Edit:
Also, can you post the output of nvmecontrol identify -n 1 nvme2 to check whether only one LBA format is supported?
 
Code:
Size:                        1953525168 blocks
Capacity:                    1953525168 blocks
Utilization:                 124049536 blocks
Thin Provisioning:           Not Supported
Number of LBA Formats:       1
Current LBA Format:          LBA Format #00
Data Protection Caps:        Not Supported
Data Protection Settings:    Not Enabled
Multi-Path I/O Capabilities: Not Supported
Reservation Capabilities:    Not Supported
Format Progress Indicator:   0% remains
Deallocate Logical Block:    Read 00h
Optimal I/O Boundary:        0 blocks
NVM Capacity:                1000204886016 bytes
Globally Unique Identifier:  00000000000000000000000000000000
IEEE EUI64:                  002538b921be77bd
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
 
My understanding is that the filesystem block size (ashift for ZFS) should match the physical block size, but there is really no harm in making the filesystem block size a multiple of the physical one.

Alignment when creating partitions is there to prevent a filesystem block from spanning multiple physical blocks. This is more of an issue if you have, say, 4K physical sectors, use 4K logical blocks, and your first partition is boot code of, say, 512 bytes. If you put your next partition at byte 513, filesystem blocks wind up crossing two physical blocks, which is just extra work. So a rule of thumb is to align on at least 4K; some consider 1M better. I think alignment covers both the start and the end of a partition, rounding your size up or down so that it ends on a boundary.
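
To make the rounding concrete, here is a minimal sketch of the arithmetic, assuming 512-byte sectors (so 1 MiB is 2048 sectors) and assuming the start is rounded up and the size rounded down. The functions round_up and round_down are hypothetical helpers, not gpart commands. The input numbers are taken from the gpart show output in the first post.

```shell
#!/bin/sh
# Sketch: round a start sector up, and a size down, to an alignment boundary.
# Assumes 512-byte sectors, so 1 MiB = 2048 sectors.

# round_up VALUE ALIGN -> smallest multiple of ALIGN that is >= VALUE
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# round_down VALUE ALIGN -> largest multiple of ALIGN that is <= VALUE
round_down() {
    echo $(( $1 / $2 * $2 ))
}

ALIGN_1M=2048   # sectors per MiB at 512 bytes per sector

# First free sector after the 260M EFI partition (40 + 532480).
start=$(round_up 532520 "$ALIGN_1M")
echo "aligned start: $start"

# Sectors left up to the end of the usable area (40 + 1953525088).
last=$(( 40 + 1953525088 ))
size=$(round_down $(( last - start )) "$ALIGN_1M")
echo "aligned size:  $size"
```

With these inputs the aligned start comes out as 534528, the same sector the installer layout uses for the partition after freebsd-boot, and the rounded size leaves a small unused tail at the end of the disk.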

For general performance reasons, at least with ZFS, ashift=12 (4K blocks) is the usual default, for example in the installer; it works even if the physical device has 512-byte sectors. A smaller ashift, I think, has implications for storing lots of little files (the old fragmentation argument).
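
For reference, ashift is just the base-2 logarithm of the block size ZFS assumes, so 9 means 512 bytes and 12 means 4096 bytes. A small sketch of that mapping follows; ashift_for is a made-up helper name and the input is assumed to be a power of two.

```shell
#!/bin/sh
# Sketch: ashift is the base-2 logarithm of the block size ZFS assumes.

# ashift_for BLOCK_BYTES -> prints log2(BLOCK_BYTES)
# (hypothetical helper; input must be a power of two)
ashift_for() {
    bytes=$1
    shift_count=0
    while [ "$bytes" -gt 1 ]; do
        bytes=$(( bytes / 2 ))
        shift_count=$(( shift_count + 1 ))
    done
    echo "$shift_count"
}

echo "512-byte sectors -> ashift=$(ashift_for 512)"
echo "4K sectors       -> ashift=$(ashift_for 4096)"
```

On FreeBSD the minimum can be raised with sysctl vfs.zfs.min_auto_ashift=12 before creating the pool, or the value can be set explicitly with zpool create -o ashift=12. As far as I know, ashift is fixed per vdev at creation time, so it is worth deciding before the pool is built.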
 