Intermittent bug in 14.0-RELEASE DRI/crocus driver?

I upgraded 13.2-RELEASE to 14.0-RELEASE a couple of days ago. Hardware is a ThinkPad X201 with Intel integrated graphics. I am loading i915kms.ko from rc.conf. I'm running a simple setup using windowmaker as the window manager. My uid is in the video group.
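For reference, loading the KMS module from rc.conf is done with the kld_list variable; a sketch of the relevant line on this setup:

```shell
# /etc/rc.conf -- load the Intel KMS driver at boot
kld_list="i915kms"
```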

From cold start, running a glxgears performance test gives the following result:-

glxgears-perf-test:-
Code:
#!/bin/sh
# Disable vsync so glxgears reports the raw render rate
vblank_mode=0 glxgears -samples 0
Code:
$ glxgears-perf-test
ATTENTION: default value of option vblank_mode overridden by environment.
8605 frames in 5.0 seconds = 1720.976 FPS
8980 frames in 5.0 seconds = 1795.903 FPS
9018 frames in 5.0 seconds = 1802.085 FPS
The bug is that at some later time, after running some combination of mplayer, smplayer, mpv, firefox, the FPS figure suddenly halves:-
Code:
$ glxgears-perf-test
ATTENTION: default value of option vblank_mode overridden by environment.
4795 frames in 5.0 seconds = 958.933 FPS
4825 frames in 5.0 seconds = 964.869 FPS
4790 frames in 5.0 seconds = 957.887 FPS
Once it has changed to the lower FPS figure, all subsequent tests report the lower figure. Restarting X sometimes restores full performance, whereas a reboot consistently restores full performance.

Another variant: glxgears sometimes prints an error message saying it has failed to load the crocus driver.

I have tried running various X clients one at a time and re-testing glxgears after each one, but have been unable to find a consistent way of recreating the bug. All I can observe at this stage is that something happens intermittently which results in the FPS reported by glxgears halving, sometimes accompanied by an error message saying crocus could not be loaded.

Xorg.0.log shows that X is running the modesetting driver, loading glamor and crocus (see attached).

kldstat reports:-

Code:
$ cat /tmp/kldstat.out
Id Refs Address Size Name
1 58 0xffffffff80200000 1d345d8 kernel
2 1 0xffffffff81f35000 36c8 coretemp.ko
3 1 0xffffffff81f39000 af50 cuse.ko
4 1 0xffffffff82620000 3558 fdescfs.ko
5 1 0xffffffff82624000 181c70 i915kms.ko
6 1 0xffffffff827a6000 73e80 drm.ko
7 1 0xffffffff8281a000 22b8 iic.ko
8 2 0xffffffff8281d000 1100 linuxkpi_gplv2.ko
9 3 0xffffffff8281f000 6350 dmabuf.ko
10 3 0xffffffff82826000 3080 linuxkpi_hdmi.ko
11 1 0xffffffff8282a000 12e08 fusefs.ko
12 1 0xffffffff8283d000 2200 acpi_dock.ko
13 1 0xffffffff82840000 3390 acpi_wmi.ko
14 1 0xffffffff82844000 3250 ichsmb.ko
15 1 0xffffffff82848000 2178 smbus.ko
16 1 0xffffffff8284b000 b6e0 if_lagg.ko
17 1 0xffffffff82857000 20c0 if_infiniband.ko
18 1 0xffffffff8285a000 2260 pflog.ko
19 1 0xffffffff8285d000 4d038 pf.ko
20 1 0xffffffff828ab000 2a68 mac_ntpd.ko

The list of relevant packages installed on the system (i.e. the result of the upgrade from 13.2 to 14.0) is:-

drm-510-kmod-5.10.163_7 DRM drivers modules
drm-kmod-20220907_1 Metaport of DRM modules for the linuxkpi-based KMS components
drm_info-2.5.0 Small utility to dump info about DRM devices
gpu-firmware-kmod-20230210_1,1 Firmware modules for the drm-kmod drivers
libdrm-2.4.116,1 Userspace interface to kernel Direct Rendering Module services
libva-2.20.0_1 VAAPI wrapper and dummy driver
libva-intel-driver-2.4.1_2 VAAPI legacy driver for Intel GMA 4500 (Gen4) to UHD 630 (Gen9.5)
libva-intel-hybrid-driver-1.0.2_3 Hybrid VP8 encoder and VP9 decoder for Intel GPUs
libva-intel-media-driver-22.4.3_1 VAAPI driver for Intel HD 5000 (Gen8) or newer
libva-vdpau-driver-0.7.4_10 VDPAU-based backend for VAAPI
libvdpau-va-gl-0.4.2_5 VDPAU driver with OpenGL/VAAPI backend
mesa-dri-22.3.7_3 OpenGL hardware acceleration drivers for DRI2+
xdriinfo-1.0.6_4 Query configuration information of DRI drivers
xf86-input-evdev-2.10.6_7 X.Org event device input driver
xf86-input-keyboard-1.9.0_5 X.Org keyboard input driver
xf86-input-libinput-1.3.0 X.Org libinput input driver
xf86-input-mouse-1.9.3_4 X.Org mouse input driver
xf86-input-synaptics-1.9.1_10 X.Org synaptics input driver
xf86-video-scfb-0.0.7_1 X.Org syscons display driver
xf86-video-vesa-2.5.0_2 X.Org vesa display driver
xorg-drivers-7.7_7 X.org drivers meta-port

I noticed the upgrade to 14.0 has pulled in drm-510-kmod and not drm-515-kmod, which is reported broken; so it's not that.

Another clue:-

Before the bug occurs, mpv runs cleanly:-
Code:
$ mpv mosasaur.webm
 (+) Video --vid=1 (*) (vp9 1920x1080 23.979fps)
 (+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
AO: [oss] 48000Hz stereo 2ch s32
VO: [gpu] 1920x1080 yuv420p
AV: 00:00:37 / 00:07:53 (8%) A-V: -0.000

Exiting... (Quit)
After that, starting firefox was enough to make the bug occur (on this occasion, but it doesn't happen consistently).

After the bug has happened and glxgears reports the slower FPS rate, mpv SOMETIMES (but not always, perhaps there is more than one bug?) produces the following error messages:-
Code:
$ mpv mosasaur.webm
 (+) Video --vid=1 (*) (vp9 1920x1080 23.979fps)
 (+) Audio --aid=1 --alang=eng (*) (opus 2ch 48000Hz)
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
libEGL warning: DRI3: Screen seems not DRI3 capable
libEGL warning: DRI2: failed to authenticate
[vo/gpu/opengl] Suspected software renderer or indirect context.
[vo/gpu/drm] VT_GETMODE failed: Inappropriate ioctl for device
[vo/gpu/drm] Failed to set up VT switcher. Terminal switching will be unavailable.
WARNING: Kernel has no file descriptor comparison support: No such file or directory
AO: [oss] 48000Hz stereo 2ch s32
VO: [gpu] 1920x1080 yuv420p
AV: 00:00:13 / 00:07:53 (3%) A-V:  0.000

Exiting... (Quit)
Attached Xorg.0.log (zipped).

As a quick "am I going crazy" sanity check, I ran the same tests on Ubuntu 20.04 on a different Intel box, also using the crocus driver, and I didn't see these problems.

Does anyone have any idea what might be causing this? Or suggest how I can debug it further?
 

Attachments

  • Xorg.0.log.zip
    7.2 KB
A bit more investigation:-

drm_info reports correct device and driver:-

───Driver: i915 (Intel Graphics) version 1.6.0 (20200917)
───Device: PCI 8086:0046 Intel Corporation Core Processor Integrated Graphics Controller

and dmesg appears to show drm being set up OK, although I don't know what the line saying it is unable to create a private tmpfs mount is about:

vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
[drm] Unable to create a private tmpfs mount, hugepage support will be disabled(-19).
[drm] Got stolen memory base 0xbe000000, size 0x2000000
lkpi_iic0: <LinuxKPI I2C> on drmn0
lkpi_iic1: <LinuxKPI I2C> on drmn0
lkpi_iic2: <LinuxKPI I2C> on drmn0
lkpi_iic3: <LinuxKPI I2C> on drmn0
lkpi_iic4: <LinuxKPI I2C> on drmn0
lkpi_iic5: <LinuxKPI I2C> on drmn0
lkpi_iic6: <LinuxKPI I2C> on drm4
[drm] Initialized i915 1.6.0 20200917 for drmn0 on minor 0
name=drmn0 flags=0x0 stride=5120 bpp=32
 
After a cold start when the system is working at full speed, xdriinfo reports

me3@eep3:~ $ xdriinfo
Screen 0: crocus


After the failure mode has occurred, xdriinfo reports

me3@eep3:~ $ xdriinfo
libGL error: failed to authenticate magic 1
libGL error: failed to load driver: crocus
Screen 0: swrast

In this state, restarting the X server (logging out then back in) without rebooting the box is enough to make xdriinfo report Screen 0: crocus again, but the glxgears test still shows the half-speed FPS rate.

I think there is more than one bug here.
 
I'm a little hesitant to make a suggestion because of lack of knowledge of the graphics (kernel) subsystems.

As your CPU/GPU is seasoned, you could try x11-drivers/xf86-video-intel instead of the modesetting driver. I see in your Xorg log:
Code:
[    38.687] (II) LoadModule: "intel"
[    38.688] (WW) Warning, couldn't open module intel
[    38.688] (EE) Failed to load module "intel" (module does not exist, 0)
Have you specified the intel driver in your X conf settings, or is the attempted loading sequence of the intel driver first and then the modesetting driver a result of having an intel GPU?
With the intel driver you could perhaps find a different error pattern.

___
P.S. I don't quite understand:
Code:
[    38.718] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
as you have only the GPU inside the CPU and no other graphics unit, right?
 
After trawling the internet I finally found the bug, or at least one of them. It's an open bug, as here:-


To recreate: cold start the hardware and log into an X session. Open a terminal and run xdriinfo; check it reports "Screen 0: crocus". Then run glxgears and check it shows full speed, which is about 1800 FPS on my X201.
Then press Ctrl-Alt-F4 to switch to another VT, and switch back to VT 9 for X. Now running xdriinfo says

libGL error: failed to authenticate magic 1
libGL error: failed to load driver: crocus
Screen 0: swrast

And this bug is reliably recreatable.
Running glxgears with the software driver (swrast) shows about half the FPS.

So this at least appears to explain some of what I have been seeing. It looks like a VT switch crashes the crocus driver, and the system then switches to using swrast. Or maybe it just forces a switch to swrast which then never gets switched back to crocus when you return to X. In my earlier testing, when I was trying to isolate it to running other X clients, I was doing VT switches as well, not realising that those might be relevant, haha!

I will do some more testing to see if I ever see the glxgears performance reduction when I make no VT switches.

Perhaps I need to switch back to the intel driver for the time being, as ericchans suggested, if that is viable in 14.0. Or just live with no VT switching, but that's a major limitation; I use VTs all the time.

I don't remember this happening on 13.2... but perhaps it was happening without me noticing. After upgrade to 14.0 I did some more careful testing.
 
P.S. I don't quite understand:
Code:
[    38.718] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
as you have only the GPU inside the CPU and no other graphics unit, right?
Interesting, I didn't spot that! Yes I only have one GPU in the system, it's the integrated gpu in the northbridge. It's only a warning so perhaps it's spurious, but well spotted, I'll see if I can find anything out.
 
I installed xf86-video-intel. Xorg.0.log now shows the intel driver being used; modesetting is unloaded. The maximum performance I get from glxgears is only about 1100 FPS with the intel driver, compared to 1800 with the modesetting driver.

However the intel driver itself loads the crocus module as its DRI driver, and the bug on VT switch still occurs, so I'm no further forward in solving the bug, and good-path performance has gone backwards 🥴

So I think for the time being I'll take xf86-video-intel back off, use modesetting, and work around the problem when I need to do VT switches (usually when I need to do something as root, since I've enabled the security feature that stops normal users su'ing to root). It's a bit of a drag. Hopefully the developers will get a look at this bug and fix it; it appears to be in the crocus driver itself, as far as I can tell. It's kind of a shame, as everything else appears to be working so well and the upgrade to 14.0-RELEASE was so smooth.
 

Attachments

  • Xorg.0.log.zip
    6.4 KB
PS I ran this same test on my Ubuntu 20.04 box (i.e. xdriinfo/glxgears, check crocus/good performance, do a VT switch, then back to X and retest). No bug; everything works, at least on that system, which suggests that a fix for this exists somewhere.
 
As your CPU/GPU is seasoned, you could try x11-drivers/xf86-video-intel instead of the modesetting driver.
"As your CPU/GPU is seasoned" 😄
 
Just for completeness I did a suspend-resume cycle. Sadly that involves a VT switch, which recreates the bug. In this case, restarting the X server after the resume appeared to reload crocus with full performance, although my guess is I got lucky this time.
 
This bug appears to have been seen before:-

Although that thread is marked 'solved' I cannot see that any solution was found.

So: it's not new in 14.0-RELEASE. It's not a regression.
 
I noticed the upgrade to 14.0 has pulled in drm-510-kmod and not drm-515-kmod, which is reported broken; so it's not that.
Good to know, going with "510" seems appropriate for the time being.

I installed xf86-video-intel. Xorg.0.log now shows the intel driver being used, modesetting is unloaded. The maximum performance I get from glxgears is only about 1100 FPS with the intel driver compared to 1800 with the modesetting driver.
I realise that the intel driver is a legacy one, but there is still a significant FPS improvement using modesetting. From your X logs I noticed that with modesetting, glamor acceleration is enabled (no glamor is loaded with intel); that may be one aspect of the performance difference.
 
I got a reply from a developer on the bugzilla report. He says "It's not a mesa problem, it's due to a rewriting of the vt intergration, and the last rewrite didn't fixed it." From the bugzilla history, it appears that the bug has been present since 13.1-RELEASE or possibly earlier.

So it's not the crocus driver or Xorg that's at fault, it's the FreeBSD VT infrastructure. That explains why I didn't see the problem on Linux, which also uses modesetting with glamor and crocus. Sadly the developer says it's unlikely to get fixed. I guess fixing something like this isn't going to be a priority for corporate users who are focused on servers.

For anyone else reading this, in a nutshell: the bug is that switching to another VT once the GUI is running kills GPU hardware acceleration and forces the system to fall back to an unaccelerated software driver. In my testing I found this has various consequences depending on what program is being run. It is pathological when watching a video fullscreen with mpv, in which case you are locked out of the desktop until the video ends, with no way to recover; that is the most severe adverse consequence I have seen so far. Of course my testing is not exhaustive.

The bug also happens on suspend-resume, at least on my machine. I found that after a resume, restarting the X server by logging out and back in was enough to get hardware acceleration working again; whereas following a manual VT switch, the only reliable way to get acceleration working was a full reboot. But maybe I just got lucky with the suspend-resume case.

The developer says he uses Wayland, never switches VT, and doesn't see the loss of hardware acceleration on suspend-resume with Wayland, so there is a workaround of sorts if you accept the limitation of never using VTs and run Wayland.

As things stand for users running X11, it's basically broken. All you can do is never do a VT switch and remember to restart X after suspend-resume, which is a bit of a pain; or avoid running any programs that use GPU hardware acceleration, which is also a major pain for a desktop. It's a real shame finding this bug, as the rest of the system seems pretty solid and the release upgrade went very smoothly. This was a nasty, subtle bug that took a lot of tracking down: switching VTs is such a fundamental operation, one that has worked for decades, that you don't suspect there would be problems with it in a stable release.

Based on what the developer says, I assume the bug is likely to be present on all hardware, rather than being due to my hardware being "seasoned" (an older ThinkPad).

I think I might give puffy a try, or maybe go back to slack.
 
Good to know, going with "510" seems appropriate for the time being.


I realise that the intel driver is a legacy one, but there is still a significant FPS improvement using modesetting. From your X logs I noticed that with modesetting, glamor acceleration is enabled (no glamor is loaded with intel); that may be one aspect of the performance difference.
Yes, I wondered that. I was pleased to see the significant performance improvement with modesetting/glamor/crocus over the intel driver; the guys who did that work have done a good job :) Maybe I could have tweaked the performance higher on the intel driver, but as the VT switch bug still occurred I didn't see much point in experimenting further.
 
To check if your system has this bug, run this simple test (assuming X11, not Wayland):-

Boot machine, start X server, log into an X session.

You can find the name of your DRI accelerated driver in /var/log/Xorg.0.log; look for a "DRI driver" line like this:-
[ 16.864] (II) modeset(0): [DRI2] DRI driver: crocus
In this case we are using the modesetting driver.

Next, run xdriinfo in a terminal and note that the output correctly shows "Screen 0: crocus".

Here 'crocus' is the name of the accelerated GPU driver for Intel graphics; if you have non-Intel graphics the driver name may be different. Either way, this confirms that GPU acceleration is active.

Do a VT switch to a different VT, then switch back to VT 9 (the X11 VT).

Repeat the xdriinfo test and note the following erroneous output:-

$ xdriinfo
libGL error: failed to authenticate magic 1
libGL error: failed to load driver: crocus
Screen 0: swrast

This shows the system has failed to load the accelerated DRI driver and has fallen back to the software rasterizer 'swrast' following the VT switch, confirming the bug is present.
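A minimal sketch of scripting this check (the check_accel function name is my own invention; it just looks for 'swrast' in whatever xdriinfo prints):

```shell
#!/bin/sh
# check_accel: read xdriinfo output on stdin and report whether the
# accelerated DRI driver is still active or swrast has taken over.
check_accel() {
    if grep -q swrast; then
        echo "BUG: software rasterizer (swrast) in use"
        return 1
    fi
    echo "OK: accelerated driver in use"
}

# Typical use (xdriinfo prints the libGL errors on stderr):
#   xdriinfo 2>&1 | check_accel
```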
 
… I have been told by development is not going to be fixed. …

Where exactly is that stated?

Let's make things clear, so that the magazine article by piizog can paint the truest possible picture. Related:

 
I got a reply from a developer on the bugzilla report. He says "It's not a mesa problem, it's due to a rewriting of the vt intergration, and the last rewrite didn't fixed it." From the bugzilla history, it appears that the bug has been present since 13.1-RELEASE or possibly earlier.

So it's not the crocus driver or Xorg that's at fault, it's the FreeBSD VT infrastructure. That explains why I didn't see the problem on Linux, which also uses modesetting with glamor and crocus. Sadly the developer says it's unlikely to get fixed. I guess fixing something like this isn't going to be a priority for corporate users who are focused on servers.
That has nothing to do with "corporate". Servers are fun: one can take things apart, everything builds logically on everything else, and when assembled correctly it will certainly work. This graphics beast however is one unintelligible moloch, and one cannot even look at it, because - well, obviously it is the thing that looks.
Those were wonderful times when there was an in-kernel drm2 that just worked. It has got ever worse since then, the moloch bigger and bigger, and why? Because people want support for ever newer GPUs.
I was definitely not on VT, and the thing was lost nevertheless:
failed to authenticate magic 1
failed to load driver: crocus
I mostly don't notice it anymore - that's a 12-year-old machine and it runs fine in fullscreen (WQHD) even so. Probably the newer ones can't do that anymore.
 
It seems that this "VT switching severely impacts graphics acceleration" problem is found in at least the AMD and Intel graphics drivers working with KMS devices. Those are "home grown"/open source. Does the problem also come up with proprietary drivers like Nvidia's?
 
Where exactly is that stated?

Let's make things clear, so that the magazine article by piizog can paint the truest possible picture. Related:

Hi Graham, I'm referring to the reply I got from a developer on the bugzilla for this problem. I quote:-

"It's not a mesa problem, it's due to a rewriting of the vt intergration, and the last rewrite didn't fixed it.
SO yeah that's something that someone (tm) should work on but since I don't have the problem with wayland on suspend/resume and I never switch VT I don't have the motivation right now."

I guess, to be fair, he does say he doesn't have the motivation "right now", so it leaves open the possibility it may be fixed at some time in the future! But this bugzilla report has been open for over a year (since 11/2022, going by its history), and the reply I got indicates not to expect a fix any time soon. I was a bit disappointed with that response, tbh; I did offer my time to test any patches they wanted testing. Of course he may not be the author of the faulty code.

That amdgpu bug link you've provided appears to be the same bug. Interesting that one of the posts says he can't reproduce it on 14.0-RELEASE, which is what I have been testing on an Intel GPU.

Some good news: a workaround for the suspend-resume case was found this morning (thanks to smithi!), and I have added a note to the bugzilla. The workaround is to set
sysctl kern.vt.suspendswitch=0
which suppresses the VT switch on suspend-resume when running an X11 desktop. This prevents the loss of GPU hardware acceleration following a suspend-resume cycle when using X11.
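To make the workaround persist across reboots, the usual place for it is /etc/sysctl.conf:

```shell
# /etc/sysctl.conf -- suppress the VT switch on suspend/resume,
# preserving GPU acceleration for the running X session
kern.vt.suspendswitch=0
```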

However the underlying problem remains: a manual VT switch from the X11 (or Wayland) GUI session to a different VT results in loss of GPU hardware acceleration when the user switches back to the GUI session. The developer's suggested workaround for this is never to use VTs, which I think is a major limitation. From the point of view of FreeBSD used purely as a desktop, perhaps this is not a major issue; speaking for myself, as a software developer, using VTs is standard practice, of course.
 
That has nothing to do with "corporate". Servers are fun: one can take things apart, everything builds logically on everything else, and when assembled correctly it will certainly work. This graphics beast however is one unintelligible moloch, and one cannot even look at it, because - well, obviously it is the thing that looks.
Those were wonderful times when there was an in-kernel drm2 that just worked. It has got ever worse since then, the moloch bigger and bigger, and why? Because people want support for ever newer GPUs.
I was definitely not on VT, and the thing was lost nevertheless:

I mostly don't notice it anymore - that's a 12-year-old machine and it runs fine in fullscreen (WQHD) even so. Probably the newer ones can't do that anymore.
Yeah, all I meant was that the corporates who use FreeBSD in large-scale server deployments (Netflix et al, who contribute code and funding) are probably not affected by this bug, so they probably won't be phoning up the new marketing lady to get it fixed ;-)

Did you try it from a cold boot? Try the test before you do any suspend or switch to another VT, and then after a VT switch. I found that straight after a cold boot I get full hardware acceleration, and video playback (e.g. in mpv and mplayer, and YouTube videos in firefox) is nice and fast. Following the VT switch and the change to the swrast driver it's a lot slower, because of course it's unaccelerated. I've only got a couple of machines here, so I am unable to test a wide range of hardware.

On my system at least, playing a video fullscreen with mpv in the unaccelerated state results in being locked out of the system: there's no way to kill mpv, it blocks switching to another VT, and all you can do is sit there and wait until the end of the video to regain control. Maybe it would be possible to ssh in and kill mpv manually, but that requires another computer and I didn't try it. I didn't see the same problem with mplayer; perhaps using an unaccelerated GPU exposes a bug in mpv. However, I focused on solving the GPU acceleration problem rather than looking into mpv itself.
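For what it's worth, the untried ssh escape route would just be something like this from another machine on the LAN (user and host names as in my shell prompts above):

```shell
# Kill the stuck fullscreen player remotely
ssh me3@eep3 'pkill mpv'
```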
 
It seems that this "VT switching severely impacts graphics acceleration" problem is found in at least the AMD and Intel graphics drivers working with KMS devices. Those are "home grown"/open source. Does the problem also come up with proprietary drivers like Nvidia's?
I don't know without testing it; I don't have any Nvidia hardware. However, the reply I got on the bugzilla says the bug is actually in the VT integration rather than the video driver itself. It would be interesting to know, though.
 
Did you try it from a cold boot? Try the test before you do any suspend or switch to another VT, and then after a VT switch. I found that straight after a cold boot I get full hardware acceleration, and video playback (e.g. in mpv and mplayer, and YouTube videos in firefox) is nice and fast. Following the VT switch and the change to the swrast driver it's a lot slower, because of course it's unaccelerated. I've only got a couple of machines here, so I am unable to test a wide range of hardware.
I have known about it for a long time already, certainly more than a year. Once I started to wonder why my fan was suddenly so loud when playing videos, so I figured out what the problem was.
Then I found some notices here and there showing that others well know there is an issue. And since it is a BIG effort to start debugging this structure, and since there was a lot of talk and many people involved back when the graphics driver went into ports, I think there are people with a much better starting point than me to debug and fix this - so I assumed it would happen at some time.

Let's calculate: my i5-3570T can run WQHD (2560x) at full load. So the real trouble will probably appear with either 4K or with only two cores.

On my system at least, playing a video with mpv full screen in the unaccelerated state results in being locked out of the system, there's no way to kill mpv, it blocks switching to another vt, and all you can do is sit there and wait till the end of the video to regain control.
There certainly is something you can do about that:
Code:
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
63367 pmc          27  41    0   700M   301M sigwai   3   3:08 207.20% vlc
 5692 root          5  34    0   460M   165M select   0  70:47  22.76% Xorg
59668 pmc          20  22    0  2475M   150M select   1   1:25   3.97% firefox

It is apparently the user process that eats CPU. So you can do all the fancy things:
- limit the number of CPUs to use with cpuset
- tell it to stay modestly behind with idprio
- limit the max CPU consumption with rctl
I would limit the user display session to some 75% of CPU - better the video gets distorted than the machine unresponsive.
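Hedged sketches of those three options (the PID and user name come from the top output above; the 75% cap is the figure suggested here, and rctl needs racct enabled first):

```shell
# 1. Pin the player to the first two CPUs (PID from the top(1) output)
cpuset -l 0-1 -p 63367

# 2. Start the player at idle priority so interactive work wins
idprio 10 vlc video.mkv

# 3. Cap the user's total CPU usage at 75% with rctl
#    (requires kern.racct.enable=1 in /boot/loader.conf and a reboot)
rctl -a user:pmc:pcpu:deny=75
```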
 
I have known about it for a long time already, certainly more than a year. Once I started to wonder why my fan was suddenly so loud when playing videos, so I figured out what the problem was.
Then I found some notices here and there showing that others well know there is an issue. And since it is a BIG effort to start debugging this structure, and since there was a lot of talk and many people involved back when the graphics driver went into ports, I think there are people with a much better starting point than me to debug and fix this - so I assumed it would happen at some time.

Let's calculate: my i5-3570T can run WQHD (2560x) at full load. So the real trouble will probably appear with either 4K or with only two cores.


There certainly is something you can do about that:
Code:
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
63367 pmc          27  41    0   700M   301M sigwai   3   3:08 207.20% vlc
 5692 root          5  34    0   460M   165M select   0  70:47  22.76% Xorg
59668 pmc          20  22    0  2475M   150M select   1   1:25   3.97% firefox

It is apparently the user process that eats CPU. So you can do all the fancy things:
- limit the number of CPUs to use with cpuset
- tell it to stay modestly behind with idprio
- limit the max CPU consumption with rctl
I would limit the user display session to some 75% of CPU - better the video gets distorted than the machine unresponsive.
Em, all good points! Actually, with the workaround to maintain hardware-accelerated graphics across suspend-resume events, everything now works well (so long as I don't switch to another VT!); it was only with the swrast driver that I saw a problem.

I've done a bit of belt-and-braces stress testing this afternoon. I ran a parallelised code compilation job in the background (in a continuous loop) to give the CPU and SSD something to do, so all CPUs were constantly running at 100% (per htop) and at 100% C0 (per i7z) ("I can't let you do that, Flynn; I'm going to have to bring you down here on the game grid..."). Then I played a 1080p video fullscreen using mpv (in a different virtual desktop), which forces mpv to scale it down to the X201's 1280x800 display panel.

Using the hardware accelerated driver, it was pleasing to see the video playing smoothly throughout with lipsync maintained despite the high load; the human brain is pretty good at detecting loss of lipsync in video playback. CPU temp went up from 40 deg C at idle to 83 deg C, which is quite hot, so I only left it cooking like that for 5 minutes; it's only got a little fan and I don't want to damage this old laptop. But the system remained responsive: I was able, for example, to browse the web with firefox while the background compilation job was running and the fullscreen video was playing, switching between virtual desktops regularly.

I'm quite impressed with the performance on this old hardware; I was expecting the video playback to be jerky or to lose sync, but it didn't miss a beat. Nothing crashed either :) I didn't do any playing around with nice-ing processes etc. in this test, just the default priorities. The key is to use the hardware accelerated crocus DRI driver; swrast has no chance of delivering this performance. The modesetting Xorg driver with glamor and crocus is pretty good on old Intel hardware. Of course I haven't tried actual GL 3D yet :)
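The background load described above can be reproduced with something like the following; the make target is whatever build you have handy, this is just a sketch of a loop that keeps every core busy:

```shell
#!/bin/sh
# Continuous parallel build loop to keep all CPUs at 100%
NCPU=$(sysctl -n hw.ncpu)
while :; do
    make -j"$NCPU" clean all
done
```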

As for the VT switch bug, I think for the time being I'll just have to accept not using VTs, which is a bummer but not a total show-stopper. It's probably true that for use purely as a desktop PC, having other VTs available is not very important. Hopefully it will get fixed at some point, though.
 