FreeBSD Bugzilla – Bug 289698 meta-bug for 15.0-RELEASE

As the 15.0 release date approaches, I'm curious about this bug. It depends on many others, a few of them regressions from previous releases.

I'm pretty scared because I'm one of the reporters of a (pretty nasty IMHO) regression from 14.3 that will definitely stop me from updating to 15.0. Even worse, IMHO, is that I reported the issue three weeks ago and I see no progress, or at least no one has contacted me asking for details, tests and so on.

Please don't consider this a rant; it's my major release upgrade and I genuinely don't know how these situations are handled in the FreeBSD world. As of today there are still 12 bugs in the list.
 
> As the 15.0 release date approaches, I'm curious about this bug.
Mark Johnston
freebsd_committer
freebsd_triage
2025-09-19 15:37:42 UTC
A metabug to allow marking of individual bugs as blockers for the release.

> I genuinely don't know how these situations are handled in the FreeBSD world.
I don't know either. I would say this is the standard process, with more or less trouble than the previous release.
 
PR 290265
I think it's stopped because the reporter is ignoring the response.
> To make progress, the first step is going to be to see what power state the NVMe drive is in, and whether or not the links to it are up.
> If they are all OK, then we need to maybe look at other things (does this specific drive need additional time to reset maybe?) Also need to make sure that the BARs are saved/restored correctly.
 
I'm the reporter.

Maybe it was just a language barrier, as I'm not a native English speaker, but I didn't realize that his answer was for me. There were no instructions, no commands to execute, no info whatsoever, so I don't really know what to do to help debug the issue (especially as I think I explained quite clearly that this is a pretty dangerous issue).
 
> I think it's stopped because the reporter is ignoring the response.
> > To make progress, the first step is going to be to see what power state the NVMe drive is in, and whether or not the links to it are up.
> > If they are all OK, then we need to maybe look at other things (does this specific drive need additional time to reset maybe?) Also need to make sure that the BARs are saved/restored correctly.
I clearly explained that the first resume is always fine and the second always fails. To me this means that the NVMe's power states and BARs are saved and restored correctly at least once. Moreover, I clearly explained that this is a regression compared to 14.3, which always works fine.

I really don't know what else I could do...
 
I don't understand what Warner Losh said, but maybe D53140 is related: he mentions something about BARs, and the revision was created the day you posted the bug. Can you check?
I posted the bug when running B1 and tested up to B3 and STABLE. I may upgrade again as soon as I'm back home and report back. Or do you mean I should test an earlier kernel on my production machine without more details?
 
> Or do you mean I should test an earlier kernel on my production machine without more details?
No, as Charlie_ already pointed out: you should give Warner Losh, who is trying to help you, the information he requested:
> To make progress, the first step is going to be to see what power state the NVMe drive is in, and whether or not the links to it are up.

If you don't know how to get the information, you could just ask, but you just went on about which versions you are or were running. Since all developers (and pretty much everyone else involved with FreeBSD) have limited time and work on the project in their free time, one will usually start at a high level when asking for more information rather than writing a lengthy essay covering each and every step. That would cost valuable time on both sides and, at worst, might be a bit offensive to a knowledgeable user (like a hotline droid asking "have you plugged in the device?").

Given this is an NVMe-related issue, nvmecontrol(8) is the first place to look for useful information. (Hint: nvmecontrol power -l and nvmecontrol logpage -p0x01 and -p0x02 might include some clues as to what is going wrong.)
Given this is a Samsung drive, firmware quirks/bugs are a definite possibility, as is simply a dying drive. So take a look at the SMART and logpage info of the drive and maybe check for newer firmware and/or known firmware issues for this drive.
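To make the hint above easier to act on, here is a minimal sketch of a helper that saves the output of those three nvmecontrol invocations to one file per snapshot, so the state after a fresh boot and after each resume can be attached to the PR. The function name, the label argument and the file naming are my own assumptions, not anything the developers asked for; the commands need root (e.g. via doas).

```sh
#!/bin/sh
# Sketch only: snapshot_nvme and its file naming are illustrative assumptions.
# The three nvmecontrol invocations are the ones hinted at in the post above.

# snapshot_nvme DEVICE LABEL [OUTDIR]
# Writes OUTDIR/LABEL-DEVICE.txt and prints the path of the file written.
snapshot_nvme() {
    dev="$1"
    label="$2"
    outdir="${3:-.}"
    out="$outdir/$label-$dev.txt"
    {
        echo "### power states"
        nvmecontrol power -l "$dev"
        echo "### error information log (page 0x01)"
        nvmecontrol logpage -p0x01 "$dev"
        echo "### SMART/health log (page 0x02)"
        nvmecontrol logpage -p0x02 "$dev"
    } > "$out" 2>&1          # keep error messages in the snapshot too
    echo "$out"
}
```

Something like `snapshot_nvme nvme0 fresh-boot /var/tmp` after booting and `snapshot_nvme nvme0 resume1 /var/tmp` after the first resume would give two files that can be compared or uploaded.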
 
It is perfectly all right to directly email the committers and release engineers involved with follow-up questions such as "I want to provide the info you requested, but can you tell me how I gather it?".

Even if all else fails it signals that you care and that you communicate.

A suspend bug always comes with the problem that a developer has to be able to reproduce it on the hardware available to them.
 
> No, as Charlie_ already pointed out: you should give Warner Losh, who is trying to help you, the information he requested:
>
> If you don't know how to get the information, you could just ask, but you just went on about which versions you are or were running. Since all developers (and pretty much everyone else involved with FreeBSD) have limited time and work on the project in their free time, one will usually start at a high level when asking for more information rather than writing a lengthy essay covering each and every step. That would cost valuable time on both sides and, at worst, might be a bit offensive to a knowledgeable user (like a hotline droid asking "have you plugged in the device?").
>
> Given this is an NVMe-related issue, nvmecontrol(8) is the first place to look for useful information. (Hint: nvmecontrol power -l and nvmecontrol logpage -p0x01 and -p0x02 might include some clues as to what is going wrong.)
> Given this is a Samsung drive, firmware quirks/bugs are a definite possibility, as is simply a dying drive. So take a look at the SMART and logpage info of the drive and maybe check for newer firmware and/or known firmware issues for this drive.
I still don't understand how I can get the power level from an unresponsive NVMe drive. Isn't it clear that the whole system has crashed and that the root zpool is missing after the second resume?
 
> I still don't understand how I can get the power level from an unresponsive NVMe drive. Isn't it clear that the whole system has crashed and that the root zpool is missing after the second resume?

Maybe you can boot from USB and hence would be able to look at things after a resume that way?
 
> It is perfectly all right to directly email the committers and release engineers involved with follow-up questions such as "I want to provide the info you requested, but can you tell me how I gather it?".
>
> Even if all else fails it signals that you care and that you communicate.
Yeah, as I mentioned earlier it was just a language barrier problem. I'll see what I can do.
 
Yeah, better try to get it in before the release is finalized. Once the release is out the door, release engineering won't be responsible for it any more. It'll be handed over to the security team. And they are typically not too keen on releasing bug fixes.
 
> anything but RELEASE is not supported here
STABLE is supported as well:

> Topics about unsupported FreeBSD versions
> 2. FreeBSD versions that are ahead of the currently supported versions, a.k.a. "HEAD", "-CURRENT", or "bleeding edge"
> ...
> ... If you want support on these forums, run either a supported version of the -RELEASE branch (for proven, stable, solid installations) or of the -STABLE branch (a slightly more experimental, but still very stable version that incorporates some of the newer developments of the -CURRENT branch).

Now and then support on CURRENT is also tolerated.
 
> Given this is an NVMe-related issue, nvmecontrol(8) is the first place to look for useful information. (Hint: nvmecontrol power -l and nvmecontrol logpage -p0x01 and -p0x02 might include some clues as to what is going wrong.)
> Given this is a Samsung drive, firmware quirks/bugs are a definite possibility, as is simply a dying drive. So take a look at the SMART and logpage info of the drive and maybe check for newer firmware and/or known firmware issues for this drive.
After a fresh boot in BETA4:

Code:
[17:52][fmc000@tu45b-freebsd ~]$ doas nvmecontrol power -l nvme0

Power States Supported: 5

 #   Max pwr  Enter Lat  Exit Lat RT RL WT WL Idle Pwr  Act Pwr Workload
--  --------  --------- --------- -- -- -- -- -------- -------- --
 0:  5.2400W    0.000ms   0.000ms  0  0  0  0  0.0000W  0.0000W 0
 1:  4.4900W    0.000ms   0.000ms  1  1  1  1  0.0000W  0.0000W 0
 2:  2.1900W    0.000ms   0.500ms  2  2  2  2  0.0000W  0.0000W 0
 3:  0.0500W*   0.210ms   1.200ms  3  3  3  3  0.0000W  0.0000W 0
 4:  0.0050W*   1.000ms   9.000ms  4  4  4  4  0.0000W  0.0000W 0
[17:53][fmc000@tu45b-freebsd ~]$ doas nvmecontrol logpage -p0x01 nvme0
Error Information Log
=====================
No error entries found
[17:53][fmc000@tu45b-freebsd ~]$ doas nvmecontrol logpage -p0x02 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    304 K, 30.85 C, 87.53 F
Available spare:                100
Available spare threshold:      10
Percentage used:                2
Data units (512,000 byte) read: 18421732
Data units written:             20609985
Host read commands:             114047645
Host write commands:            324513111
Controller busy time (minutes): 839
Power cycles:                   2222
Power on hours:                 1324
Unsafe shutdowns:               138
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 1:           304 K, 30.85 C, 87.53 F
Temperature Sensor 2:           309 K, 35.85 C, 96.53 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0
[17:53][fmc000@tu45b-freebsd ~]$
 
After the first (successful) resume:

Code:
[17:54][fmc000@tu45b-freebsd ~]$ doas nvmecontrol power -l nvme0     

Power States Supported: 5

 #   Max pwr  Enter Lat  Exit Lat RT RL WT WL Idle Pwr  Act Pwr Workload
--  --------  --------- --------- -- -- -- -- -------- -------- --
 0:  5.2400W    0.000ms   0.000ms  0  0  0  0  0.0000W  0.0000W 0
 1:  4.4900W    0.000ms   0.000ms  1  1  1  1  0.0000W  0.0000W 0
 2:  2.1900W    0.000ms   0.500ms  2  2  2  2  0.0000W  0.0000W 0
 3:  0.0500W*   0.210ms   1.200ms  3  3  3  3  0.0000W  0.0000W 0
 4:  0.0050W*   1.000ms   9.000ms  4  4  4  4  0.0000W  0.0000W 0
[17:56][fmc000@tu45b-freebsd ~]$ doas nvmecontrol logpage -p0x01 nvme0
Error Information Log
=====================
No error entries found
[17:56][fmc000@tu45b-freebsd ~]$ doas nvmecontrol logpage -p0x02 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    304 K, 30.85 C, 87.53 F
Available spare:                100
Available spare threshold:      10
Percentage used:                2
Data units (512,000 byte) read: 18421757
Data units written:             20610144
Host read commands:             114047941
Host write commands:            324515449
Controller busy time (minutes): 839
Power cycles:                   2223
Power on hours:                 1324
Unsafe shutdowns:               138
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 1:           304 K, 30.85 C, 87.53 F
Temperature Sensor 2:           308 K, 34.85 C, 94.73 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0
[17:56][fmc000@tu45b-freebsd ~]$
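
The two dumps above are easier to compare mechanically than by eye. As a rough sketch, assuming each dump has been saved to its own text file, the following helper prints every "name: value" line whose value changed between the two files. The function name is hypothetical and it only relies on the "name: value" layout seen in the output above.

```sh
#!/bin/sh
# Sketch only: smart_changes is an illustrative helper, not part of nvmecontrol.

# smart_changes BEFORE AFTER
# Prints "name: old -> new" for each "name: value" line whose value differs.
smart_changes() {
    awk -F': +' '
        NF < 2 { next }                      # skip headings, prompts, blank lines
        { key = $1 }                         # leading spaces kept, so the indented
                                             # warning flags stay distinct from the
                                             # top-level fields of the same name
        FNR == NR { before[key] = $2; next } # first file: remember each value
        (key in before) && before[key] != $2 {
            printf "%s: %s -> %s\n", key, before[key], $2
        }
    ' "$1" "$2"
}
```

Run against the two dumps in this post, it should list only the read/write counters, the power cycle count (2222 to 2223) and Temperature Sensor 2; the error log, media errors and unsafe shutdowns are identical in both.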
 
Can someone please tell the Bugzilla maintainers that Anubis seems to fail on Brave?

It'd be nice to also change this weird furry anime.

You see the Anubis mascot from FreeBSD's bugzilla?

I see it on other websites, but with FreeBSD's I see "Making sure you aren't a bot" with no logo above it, and a quick forward to the bugzilla (no "Success" report); I use Firefox but wouldn't think the browser would affect the mascot showing?
 