Solved Optane M.2 nvme 4k format failing with error

I have (a pair of) INTEL SSDPEL1K100GA Intel® Optane™ SSD DC P4801X Series (100GB, M.2 110MM PCIe x4, 3D XPoint™), pretty cool little devices and tragic that Intel dropped the Optane project. My system is running FreeBSD 14.1-RELEASE-p3 GENERIC amd64 on HP/Intel. The cards are installed in a bifurcating RIITOP M.2 NVMe SSD to PCI-e 3.1 x8/x16 Card.

diskinfo -v /dev/nvme0ns1
returns a sector size of 512 and
nvmecontrol identify -n 1 nvme0
reports that the Current LBA Format: LBA Format #00 and the device supports 7 (seven!) LBA formats:
Code:
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Good
LBA Format #01: Data Size:   512  Metadata Size:     8  Performance: Good
LBA Format #02: Data Size:   512  Metadata Size:    16  Performance: Good
LBA Format #03: Data Size:  4096  Metadata Size:     0  Performance: Best
LBA Format #04: Data Size:  4096  Metadata Size:     8  Performance: Best
LBA Format #05: Data Size:  4096  Metadata Size:    64  Performance: Best
LBA Format #06: Data Size:  4096  Metadata Size:   128  Performance: Best
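
For anyone scripting this, here is a minimal sketch of pulling the format index out of that listing. It assumes the line layout shown above; the helper name `find_lbaf` is made up for illustration, and the sample lines are pasted from this post rather than read from a device.

```shell
#!/bin/sh
# find_lbaf SIZE: read "LBA Format #NN: Data Size: ... Metadata Size: ..."
# lines on stdin and print the index of the first format with that data
# size and zero metadata. (Hypothetical helper; assumes the layout above.)
find_lbaf() {
    awk -v want="$1" '
        /^LBA Format #[0-9]+:/ {
            # Fields: LBA Format #NN: Data Size: <size> Metadata Size: <meta> ...
            idx = $3
            sub(/^#/, "", idx)
            sub(/:$/, "", idx)
            if ($6 == want && $9 == 0) { print idx + 0; exit }
        }'
}

# Sample input copied from the identify output in this post.
find_lbaf 4096 <<'EOF'
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Good
LBA Format #01: Data Size:   512  Metadata Size:     8  Performance: Good
LBA Format #02: Data Size:   512  Metadata Size:    16  Performance: Good
LBA Format #03: Data Size:  4096  Metadata Size:     0  Performance: Best
LBA Format #04: Data Size:  4096  Metadata Size:     8  Performance: Best
LBA Format #05: Data Size:  4096  Metadata Size:    64  Performance: Best
LBA Format #06: Data Size:  4096  Metadata Size:   128  Performance: Best
EOF
```

On a live system the same function could presumably be fed from `nvmecontrol identify nvme0ns1`; the number it prints is what would go after `-f`.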

I'm pretty sure I want Format #03 (who wouldn't want the extra 10% of Performance: Best?), so I tried
nvmecontrol format -f 3 nvme0ns1
which works for 60.79 seconds and returns nvmecontrol: format request returned error. I get the same results with -f 03 and with nvme0. Weirdly, the other LBA Format options (including -f 0, which should be current) return an error instantly. Every time I try -f 3 it thinks for 60 seconds, but no love.

Any hints or should I learn acceptance of Performance: Good?

Is it worth trying sysutils/nvme-cli?
 
I got no data, just the interwebs saying +10% and the firmware admonishments about "good" and "best" performance. I admit feeling some pressure to "be best." I started full of optimism that it was a simple command away, going by the other posts reporting success at 4k formatting here and there. I feel a little left out. :-(
 
Sadly, I'm very remote and extracting the device is a bit of an ask for hands-on. Unless there's something obvious I'm missing (always a pretty likely possibility) on the command line, I'll run them at 512 until I can get to them physically. They're small (capacity and physically) and it won't be a big deal to reformat them and reconfig later. I have some doubts about the RIITOP card; it's a bit cheesy, and I have an AOC-SLG3-2M2 I'd try, though it only supports one M.2 in this box as the only PCI slot that supports bifurcation is occupied and tightly cabled. Then maybe give Linux a try if that doesn't work. OTOH, 10% isn't going to change any real world results, it's just.. you know... 10% better.
 
There are some reports of timeouts, and it does time out at exactly 60s. I think the nvme-cli program supports an extended timeout, so I'll give that a try in a few and report back.
 
sysutils/nvme-cli hasn't added anything to the mix. There seems to be an oddity in connecting: it gets some data successfully, but other data comes back null. nvmecontrol is more successful, despite the format fail.

However, I did get some new data - sorry for the screen shot, this is written to console after executing
nvmecontrol format -f 3 -m 0 -p 0 -l 0 nvme0

[Screenshot of the console output: 1735919766301.png]


which validates the above - I don't see how to change the timeout with nvmecontrol and I'm not finding anything useful googling. Anyone know?
 
nvmecontrol seems to have an undocumented "-t" argument for timeout, but only for the passthrough commands.

You could try messing with the timeouts in
/usr/src/sys/dev/nvme/nvme_private.h
presumably with NVME_ADMIN_TIMEOUT_PERIOD, which is 60 by default.
 
Tried
sysctl dev.nvme.0.timeout_period=120
dev.nvme.0.timeout_period: 30 -> 120
but it still timed out (same error) and again in exactly 60.33 real seconds.
 
Is it enough to simply edit the file or do I need to recompile? I did edit
#define NVME_ADMIN_TIMEOUT_PERIOD (120) /* in seconds */
and got the same result (and time)
 
dev.nvme.0.timeout_period is a different variable. NVME_ADMIN_TIMEOUT_PERIOD has no sysctl at this time.
 
Editing the file is not enough; you need to recompile the kernel. And I would set it to 3600 for this experiment.
 
Sadly, I'm not sure I can destroy the namespace:
nvmecontrol ns active nvme0
nvmecontrol: controller does not support namespace management
 
And I just recently moved from custom kernels to default for easier management! I'm gonna try booting to server 2016 and check with the intel tools. Hardware problems could still be an issue and it's certainly easier to do firmware updates with Intel's tools. Lemmie see what that tells me before I go the modded kernel path.
 
Well, you would only have to run a modified kernel for the duration of the format. Afterwards you would operate on the 4096-byte block SSD with a normal one.

If this makes it work I would seek to commit at least a sysctl to change the timeout during format commands.
 
OK, lemmie try that path. Odd result. I made two changes and rebuilt the kernel:
Code:
#define NVME_ADMIN_TIMEOUT_PERIOD       (3600)    /* modified from 60 in seconds */
#define NVME_DEFAULT_TIMEOUT_PERIOD     (120)    /* modified from 30 in seconds */
#define NVME_MIN_TIMEOUT_PERIOD         (5)
#define NVME_MAX_TIMEOUT_PERIOD         (120)

I reran the command:
Code:
# time nvmecontrol format -f 3 -m 0 -p 0 -l 0 nvme0
nvmecontrol: format request returned error
      120.06 real         0.00 user         0.00 sys

So I'll try this:
Code:
#define NVME_ADMIN_TIMEOUT_PERIOD       (3600)    /* in seconds def 60 */
#define NVME_DEFAULT_TIMEOUT_PERIOD     (3600)    /* in seconds def 30 */
#define NVME_MIN_TIMEOUT_PERIOD         (5)
#define NVME_MAX_TIMEOUT_PERIOD         (3600)    /* in seconds def 120 */
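
For anyone repeating this, the edit can be scripted. A rough sketch, assuming the four `#define` lines look like the stock values quoted in this thread (60/30/5/120); it works on a scratch copy rather than the real /usr/src/sys/dev/nvme/nvme_private.h, and the helper name `bump_timeouts` is made up for illustration.

```shell
#!/bin/sh
# bump_timeouts FILE SECONDS: rewrite the ADMIN, DEFAULT and MAX timeout
# defines in FILE to SECONDS, leaving NVME_MIN_TIMEOUT_PERIOD untouched.
# (Hypothetical helper for illustration only.)
bump_timeouts() {
    sed -E "s/^(#define[[:space:]]+NVME_(ADMIN|DEFAULT|MAX)_TIMEOUT_PERIOD[[:space:]]+)\([0-9]+\)/\1($2)/" "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Scratch copy holding the stock values mentioned in the thread (assumed layout).
hdr=$(mktemp)
cat > "$hdr" <<'EOF'
#define NVME_ADMIN_TIMEOUT_PERIOD       (60)    /* in seconds */
#define NVME_DEFAULT_TIMEOUT_PERIOD     (30)    /* in seconds */
#define NVME_MIN_TIMEOUT_PERIOD         (5)
#define NVME_MAX_TIMEOUT_PERIOD         (120)
EOF

bump_timeouts "$hdr" 3600
grep TIMEOUT_PERIOD "$hdr"   # show the four defines after the edit
rm -f "$hdr"
```

After changing the real header, the kernel still has to be rebuilt and installed (typically `make buildkernel` and `make installkernel` from /usr/src, then a reboot) before the new values take effect.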
 
Success!!!

Code:
# time nvmecontrol format -f 3 -m 0 -p 0 -l 0 nvme0
      316.68 real         0.00 user         0.00 sys
(no errors)

# nvmecontrol identify nvme0ns1 | grep 'LBA Format'
Number of LBA Formats:       7
Current LBA Format:          LBA Format #03
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Good
LBA Format #01: Data Size:   512  Metadata Size:     8  Performance: Good
LBA Format #02: Data Size:   512  Metadata Size:    16  Performance: Good
LBA Format #03: Data Size:  4096  Metadata Size:     0  Performance: Best
LBA Format #04: Data Size:  4096  Metadata Size:     8  Performance: Best
LBA Format #05: Data Size:  4096  Metadata Size:    64  Performance: Best
LBA Format #06: Data Size:  4096  Metadata Size:   128  Performance: Best

W00t
 
I assume since you didn't use the -E flag that your data remained and was not erased?

I've been afraid to use nvme format without -E in case it erased my drive.
 
There was no data on the drive. I was being very brave with possible bricking, but not with data!
 
Surprisingly not helpful, definitely not what I would expect given the "it's worth doing for the 10% better performance" and the internal code, but The More You Know!

# diskinfo -wS -i -t /dev/nvd0 (4k), firmware E2010485
# diskinfo -wS -i -t /dev/nvd1 (512), firmware E2010600
(In each pair of blocks below, the first is nvd0 and the second is nvd1.)
Code:
    4096            # sectorsize
    100,030,242,816    # mediasize in bytes (93G)
    24,421,446        # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    INTEL SSDPEL1K100GA    # Disk descr.
    PHKM926100FA100D    # Disk ident.
    nvme0           # Attachment
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM
Code:
    512             # sectorsize
    100,030,242,816    # mediasize in bytes (93G)
    195,371,568       # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    INTEL SSDPEL1K100GA    # Disk descr.
    PHKM2035006K100D    # Disk ident.
    nvme1           # Attachment
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM
Seek times:
Code:
    Full stroke:      250 iter in   0.012063 sec =    0.048 msec
    Half stroke:      250 iter in   0.013677 sec =    0.055 msec
    Quarter stroke:      500 iter in   0.017809 sec =    0.036 msec
    Short forward:      400 iter in   0.016441 sec =    0.041 msec
    Short backward:      400 iter in   0.016047 sec =    0.040 msec
    Seq outer:     2048 iter in   0.054341 sec =    0.027 msec
    Seq inner:     2048 iter in   0.055266 sec =    0.027 msec
Seek times:
Code:
    Full stroke:      250 iter in   0.012181 sec =    0.049 msec
    Half stroke:      250 iter in   0.011424 sec =    0.046 msec
    Quarter stroke:      500 iter in   0.017620 sec =    0.035 msec
    Short forward:      400 iter in   0.014526 sec =    0.036 msec
    Short backward:      400 iter in   0.015139 sec =    0.038 msec
    Seq outer:     2048 iter in   0.050303 sec =    0.025 msec
    Seq inner:     2048 iter in   0.051479 sec =    0.025 msec
Transfer rates:
Code:
    outside:       102,400 kbytes in   0.055566 sec =  1,842,854 kbytes/sec
    middle:        102,400 kbytes in   0.054990 sec =  1,862,157 kbytes/sec
    inside:        102,400 kbytes in   0.055301 sec =  1,851,684 kbytes/sec
Transfer rates:
Code:
    outside:       102,400 kbytes in   0.061920 sec =  1,653,747 kbytes/sec
    middle:        102,400 kbytes in   0.054934 sec =  1,864,055 kbytes/sec
    inside:        102,400 kbytes in   0.054950 sec =  1,863,512 kbytes/sec
Asynchronous random reads:
Code:
    sectorsize:   1,103,370 ops in    3.000056 sec =   367,783 IOPS
    NA
    32 kbytes:     217,820 ops in    3.001553 sec =    72,569 IOPS
    128 kbytes:     53,749 ops in    3.007121 sec =    17,874 IOPS
    1024 kbytes:     6,827 ops in    3.055959 sec =     2,234 IOPS
Asynchronous random reads:
Code:
    sectorsize:   1,128,521 ops in    3.000055 sec =   376,167 IOPS
    4 kbytes:     1,177,215 ops in    3.000127 sec =   392,388 IOPS
    32 kbytes:     217,791 ops in    3.001751 sec =    72,555 IOPS
    128 kbytes:     53,761 ops in    3.007155 sec =    17,878 IOPS
    1024 kbytes:     6,820 ops in    3.055809 sec =     2,232 IOPS
Synchronous random writes:
Code:
       NA
       NA
       NA
       4 kbytes:     19.7 usec/IO =    197.9 Mbytes/s
       8 kbytes:     24.6 usec/IO =    317.0 Mbytes/s
      16 kbytes:     33.2 usec/IO =    470.6 Mbytes/s
      32 kbytes:     62.2 usec/IO =    502.7 Mbytes/s
      64 kbytes:    119.7 usec/IO =    522.3 Mbytes/s
     128 kbytes:    182.9 usec/IO =    683.4 Mbytes/s
     256 kbytes:    300.6 usec/IO =    831.6 Mbytes/s
     512 kbytes:    532.8 usec/IO =    938.4 Mbytes/s
    1024 kbytes:    994.4 usec/IO =   1005.7 Mbytes/s
    2048 kbytes:   1910.6 usec/IO =   1046.8 Mbytes/s
    4096 kbytes:   3750.6 usec/IO =   1066.5 Mbytes/s
    8192 kbytes:   7429.2 usec/IO =   1076.8 Mbytes/s
Synchronous random writes:
Code:
     0.5 kbytes:     19.9 usec/IO =     24.5 Mbytes/s
       1 kbytes:     20.9 usec/IO =     46.7 Mbytes/s
       2 kbytes:     21.5 usec/IO =     90.7 Mbytes/s
       4 kbytes:     19.7 usec/IO =    198.5 Mbytes/s
       8 kbytes:     23.8 usec/IO =    328.0 Mbytes/s
      16 kbytes:     31.0 usec/IO =    503.3 Mbytes/s
      32 kbytes:     63.1 usec/IO =    495.0 Mbytes/s
      64 kbytes:    115.4 usec/IO =    541.6 Mbytes/s
     128 kbytes:    186.5 usec/IO =    670.3 Mbytes/s
     256 kbytes:    306.6 usec/IO =    815.3 Mbytes/s
     512 kbytes:    534.8 usec/IO =    934.9 Mbytes/s
    1024 kbytes:    993.0 usec/IO =   1007.1 Mbytes/s
    2048 kbytes:   1909.5 usec/IO =   1047.4 Mbytes/s
    4096 kbytes:   3745.6 usec/IO =   1067.9 Mbytes/s
    8192 kbytes:   7387.3 usec/IO =   1082.9 Mbytes/s

I have the intel firmware update disk queued up and will return with data post-update, but here's the data post reboot.
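
To put a number on the difference, here is a small sketch that computes the relative gap between two readings from the tables above. The helper name `pct_diff` is made up; the figures are the 4 KiB rows from this post, and keep in mind the two drives are different units that were on different firmware at this point, so this is only a rough comparison.

```shell
#!/bin/sh
# pct_diff A B: percentage by which A differs from B, to two decimals.
# (Hypothetical helper for illustration.)
pct_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / b * 100 }'
}

# 4 KiB asynchronous random reads, IOPS:
# nvd0 (4k format, "sectorsize" row) vs nvd1 (512 format, "4 kbytes" row)
pct_diff 367783 392388

# 4 KiB synchronous random writes, Mbytes/s: nvd0 vs nvd1
pct_diff 197.9 198.5
```

With these inputs it prints -6.27 and -0.30, i.e. the 4k-formatted drive is not ahead in either of these two rows.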
 
A lot of work for ~20MB/sec more.....

But I appreciate the learning experience.

You wanted best performance and you got it.

Make sure you have trim enabled on FS.
 
Updated both to E2010650 (current from Intel) and repeating the tests we get:

# diskinfo -wS -i -t /dev/nvd0 (4k), firmware E2010650
# diskinfo -wS -i -t /dev/nvd1 (512), firmware E2010650
(In each pair of blocks below, the first is nvd0 and the second is nvd1.)
Code:
    4096            # sectorsize
    100030242816    # mediasize in bytes (93G)
    24421446        # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    INTEL SSDPEL1K100GA    # Disk descr.
    PHKM926100FA100D    # Disk ident.
    nvme0           # Attachment
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM
Code:
    512             # sectorsize
    100030242816    # mediasize in bytes (93G)
    195371568       # mediasize in sectors
    0               # stripesize
    0               # stripeoffset
    INTEL SSDPEL1K100GA    # Disk descr.
    PHKM2035006K100D    # Disk ident.
    nvme1           # Attachment
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM
Seek times:
Code:
    Full stroke:      250 iter in   0.012673 sec =    0.051 msec
    Half stroke:      250 iter in   0.013921 sec =    0.056 msec
    Quarter stroke:      500 iter in   0.020303 sec =    0.041 msec
    Short forward:      400 iter in   0.016671 sec =    0.042 msec
    Short backward:      400 iter in   0.015128 sec =    0.038 msec
    Seq outer:     2048 iter in   0.054943 sec =    0.027 msec
    Seq inner:     2048 iter in   0.052470 sec =    0.026 msec
Seek times:
Code:
    Full stroke:      250 iter in   0.012268 sec =    0.049 msec
    Half stroke:      250 iter in   0.012846 sec =    0.051 msec
    Quarter stroke:      500 iter in   0.016932 sec =    0.034 msec
    Short forward:      400 iter in   0.015693 sec =    0.039 msec
    Short backward:      400 iter in   0.015192 sec =    0.038 msec
    Seq outer:     2048 iter in   0.053300 sec =    0.026 msec
    Seq inner:     2048 iter in   0.051753 sec =    0.025 msec
Transfer rates:
Code:
    outside:       102400 kbytes in   0.055557 sec =  1,843,152 kbytes/sec
    middle:        102400 kbytes in   0.054831 sec =  1,867,557 kbytes/sec
    inside:        102400 kbytes in   0.055199 sec =  1,855,106 kbytes/sec
Transfer rates:
Code:
    outside:       102400 kbytes in   0.060901 sec =  1,681,417 kbytes/sec
    middle:        102400 kbytes in   0.054846 sec =  1,867,046 kbytes/sec
    inside:        102400 kbytes in   0.055227 sec =  1,854,166 kbytes/sec
Asynchronous random reads:
Code:
    sectorsize:   1146834 ops in    3.000052 sec =   382,271 IOPS
    NA
    32 kbytes:     217939 ops in    3.001765 sec =    72,604 IOPS
    128 kbytes:     53774 ops in    3.007148 sec =    17,882 IOPS
    1024 kbytes:     6827 ops in    3.055068 sec =     2,235 IOPS
Asynchronous random reads:
Code:
    sectorsize:   1128207 ops in    3.000055 sec =   376,062 IOPS
    4 kbytes:     1182602 ops in    3.000056 sec =   394,193 IOPS
    32 kbytes:     217950 ops in    3.001757 sec =    72,607 IOPS
    128 kbytes:     53780 ops in    3.007108 sec =    17,884 IOPS
    1024 kbytes:     6822 ops in    3.055894 sec =     2,232 IOPS
Synchronous random writes:
Code:
       NA
       NA
       NA
       4 kbytes:     19.5 usec/IO =    200.7 Mbytes/s
       8 kbytes:     24.4 usec/IO =    320.8 Mbytes/s
      16 kbytes:     32.8 usec/IO =    475.7 Mbytes/s
      32 kbytes:     64.5 usec/IO =    484.2 Mbytes/s
      64 kbytes:    118.5 usec/IO =    527.5 Mbytes/s
     128 kbytes:    182.0 usec/IO =    687.0 Mbytes/s
     256 kbytes:    300.7 usec/IO =    831.4 Mbytes/s
     512 kbytes:    533.8 usec/IO =    936.7 Mbytes/s
    1024 kbytes:    993.7 usec/IO =   1006.4 Mbytes/s
    2048 kbytes:   1913.4 usec/IO =   1045.2 Mbytes/s
    4096 kbytes:   3747.4 usec/IO =   1067.4 Mbytes/s
    8192 kbytes:   7424.9 usec/IO =   1077.4 Mbytes/s
Synchronous random writes:
Code:
     0.5 kbytes:     21.4 usec/IO =     22.8 Mbytes/s
       1 kbytes:     21.5 usec/IO =     45.5 Mbytes/s
       2 kbytes:     21.7 usec/IO =     89.9 Mbytes/s
       4 kbytes:     19.7 usec/IO =    197.8 Mbytes/s
       8 kbytes:     24.4 usec/IO =    320.7 Mbytes/s
      16 kbytes:     33.0 usec/IO =    472.9 Mbytes/s
      32 kbytes:     61.8 usec/IO =    505.7 Mbytes/s
      64 kbytes:    121.5 usec/IO =    514.3 Mbytes/s
     128 kbytes:    181.8 usec/IO =    687.4 Mbytes/s
     256 kbytes:    298.6 usec/IO =    837.2 Mbytes/s
     512 kbytes:    532.6 usec/IO =    938.8 Mbytes/s
    1024 kbytes:    994.0 usec/IO =   1006.1 Mbytes/s
    2048 kbytes:   1909.8 usec/IO =   1047.2 Mbytes/s
    4096 kbytes:   3746.3 usec/IO =   1067.7 Mbytes/s
    8192 kbytes:   7417.2 usec/IO =   1078.6 Mbytes/s

That was an involved process but at least the results were underwhelming. I'm gonna # nvmecontrol format -f 0 -m 0 -p 0 -l 0 nvme0 back to 512.
 