Solved Please, assist me in correctly wording a PR to get a 12-year-old kernel module bug fixed!

Snurg


Since its introduction in 2009, the vesa.ko kernel module has contained a bug, presumably of the category "use of uninitialized data structures".

This bug prevents a correct resume on systems that have an Nvidia graphics card and use the sc console in text mode.
Building and installing a custom kernel without the vesa kernel module is the easiest workaround I know of to make resuming after S3 suspend ("zzz") succeed.

Fixing the bug would require just two lines of code.

However, I am afraid I will not get the wording right, or will miss some important detail.
I already filed a PR in 2017.
However, my original interpretation of the bug was incorrect, as the very detailed technical discussion that followed revealed.
So this distraction/confusion might have contributed to the inaction following my PR.

So I kindly ask you for your assistance.
I want to make sure that this bug gets fixed, so that I no longer need to build custom kernels just to have suspend/resume work with my Nvidia cards.
Please tell me what I should add to or leave out of the PR draft below.

Title draft:
vesa.ko: Invalid BIOS call when resuming from S3 suspend/sleep causes nvidia driver hang
Extra information in the bugs' page:
OS Version: all (is this correct? all versions since the introduction of vesa.ko are affected, i.e. all since 2009)
Ports/Packages: Kernel (is this correct? or do I have to enter port/package xorg?)
Platform: amd64 (there is reason to assume i386 is affected too, but I didn't test it)
Users affected: Many (i.e. all users who use an Nvidia card/driver and would like to use sleep/S3 suspend)
Text draft:
In vesa.ko there is a function that gets called when resuming from sleep, e.g. when resuming after an S3 suspend triggered via 'zzz'.
This function makes a BIOS call that is related to restoring the state the graphics card had before powering off.

On Nvidia cards this BIOS function seems to be implemented in a different way than on most other cards.
For this reason, calling this BIOS function causes the Nvidia graphics driver to hang, failing to resume.
(For technical background, read my discussion with jkim in PR 224069: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224069 )

Reproducing the bug is easy:
- Install FreeBSD (e.g. using the GENERIC kernel).
- Enable sc (kern.vty="sc" in /boot/loader.conf).
- Install xorg. Install and configure the nvidia driver. Reboot and start xorg via startx.
- Enter "zzz" in an xterm.
- Watch the system/driver hang, with keyboard (PS/2) and mouse becoming unresponsive, when it attempts to switch back to graphics mode.
- Hitting the power button results in no visible change until, after a timeout, a message about an unresponsive stop job (presumably the nvidia driver) appears shortly before powering off.

To reproduce the bug it is essential to use the GENERIC kernel!
This is because building and installing a custom kernel without "options VESA" works around the hang after suspend/resume. It is also important not to have vt and its helper modules (vt_efifb etc.) in the kernel, as these pull in the vesa.ko showstopper module.

Already back in 2017 I found that skipping (commenting out) the Nvidia BIOS call fixes the issue, making resume work reliably.

So I believe the proper fix would be:
1. check whether the graphics card is Nvidia
2. if it is Nvidia, skip that BIOS call in /usr/src/sys/dev/fb/vesa.c line 520.

Pseudocode for a patch might look like this:

+if (!nvidia_card_is_installed) {    /* placeholder: no such variable exists yet */
     x86bios_intr(&regs, 0x10);
+}
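
The vendor check from step 1 could be sketched as below. This is only a minimal, stand-alone illustration under stated assumptions, not a tested kernel patch: `vesa_should_restore_state` and `PCI_VENDOR_NVIDIA` are hypothetical names that do not exist in vesa.c, and how vesa.c would obtain the card's PCI vendor ID is deliberately left open. The only outside fact used is that 0x10de is the PCI vendor ID assigned to NVIDIA.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI vendor ID assigned to NVIDIA Corporation. */
#define PCI_VENDOR_NVIDIA 0x10de

/*
 * Hypothetical helper (not part of vesa.c): returns true when the
 * state-restoring BIOS call may be issued on resume, and false when
 * it should be skipped because the card is an NVIDIA one.
 */
static inline bool
vesa_should_restore_state(uint16_t pci_vendor_id)
{
	return (pci_vendor_id != PCI_VENDOR_NVIDIA);
}
```

In the resume path, the existing x86bios_intr(&regs, 0x10) call would then be wrapped in a test of this helper, with the vendor ID coming from whatever PCI lookup the maintainers prefer.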


Side note:
All my systems use the sc console, not the vt console.
So I do not know the system behaviour when using vt.
Thus please use sc in text mode when reproducing the bug!

Thank you very much for any suggestion or advice!
 

tingo


Title: drop the word 'critical' (the people doing triage on the bug will assign it a severity level), and if possible, reword the title further.

The text of the PR looks fine to me.

Of course, adding details about what version / versions of operating system and relevant ports / packages you have tested this under helps too (historical versions aren't relevant, only the ones that are currently supported).
 
Snurg

Good advice! I updated the draft title in the first post!
Thank you very much, tingo !
Will still wait a while before posting the PR, in case more people with good suggestions pass by.

Edit:
Still not sure about the title, is it already too long? What could still be improved?
My title draft atm: vesa.ko: Invalid BIOS call when resuming from S3 suspend/sleep causes nvidia driver hang

Edit 2:
Regarding the details you suggested adding, I am not sure at all what is actually necessary and what would be irrelevant.
Could it be beneficial to add something like this to the PR text?
Reproducing the bug is easy:
Install FreeBSD (e.g. using the GENERIC kernel), enable sc (kern.vty="sc" in /boot/loader.conf), install xorg, install and configure the nvidia driver, reboot and start xorg, enter "zzz" in an xterm, and then watch the system hang when trying to resume after sleep.
To reproduce the bug it is essential to use the GENERIC kernel, because building and installing a custom kernel without "options VESA" works around the hang after suspend/resume.

Further, I added a section "Extra information in the bugs' page" for the form fields like version, arch etc., as even there I am not 100% sure what to enter.
(Thank you again for pointing at this!)
 
Snurg

Having integrated instructions on how to reproduce the bug, I feel the information is generally easier to comprehend. Thanks again, tingo!

Last bump before posting the PR, hoping for possibly more potentially helpful suggestions :)
 

debguy


if it supports your card: pkg install x11/nvidia-driver

when it installs it displays a message (use pkg info -D x11/nvidia-driver), to use "nvidia-modeset" if you need to

(in other words, for cards that nvidia-driver supports the mode-set issue is fixed)

i have an nvidia gtx 1050 card and it works well and "easily" both with vesa and with nvidia-driver (nvidia-modeset). vesa worked before i installed "nvidia". i also have an older nvidia card (agp?) someone gave me, but have not tested it on freebsd 12.2

many older AGP cards (all - not just nvidia) HAVE SERIOUS PROBLEMS IN THE SILICON regarding vesa - many don't support vesa at all! others refuse to switch to vesa once in text mode (that's where special procedures might work). some support VGA text, no vesa, but refuse to go to graphics mode once in VGA mode. it's a mess. do yourself a favor and get an updated card? :) not worth what it will put you through - just not worth it.

try the simple things nvidia suggests (such as their newest drivers and help), and if that doesn't work: i suggest not pursuing it
 
Snurg

debguy, I am not sure whether you understood the problem here.

The bug report addresses a hang that specifically affects people using Nvidia cards together with sc console.
I have read (though not tested) that the bug was allegedly fixed for vt users in 2019, probably by making sure from inside vt that this call gets skipped, but I do not use vt.

The affected hardware here is various GeForce and Quadro cards, all PCI-E.
But older Nvidia hardware is probably affected as well, as it is a BIOS issue common to all Nvidia cards.
 

debguy


successful as a government grant seeking and awarding process?
 
Snurg

debguy said: "successful as a government grant seeking and awarding process?"
Well, I'd just be happy if this time the PR does not lead to inaction, as the bug does a lot of reputational damage to FreeBSD as a desktop.

If one seeks a grant from the government here, one at least gets a reply within 6 months saying whether it is granted or rejected, so this is not really comparable with the hit-or-miss nature of PRs.
 