Xorg nVidia driver causing page faults

byuu · Jun 1, 2014

Running 10.0-RELEASE (amd64), with a GTX 760 OC 2GB from PNY. I find the 304 driver to be the most stable, but with it, I sporadically get this kernel panic upon starting Xorg:

Code:

NVRM: GPU at 0000:01:00.0 has fallen off the bus.
NVRM: RmInitAdapter failed! (0x26:0xffffffff:1200)
nvidia0: NVRM: rm_init_adapter() failed!


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x20
fault code      = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff820d4f8f
stack pointer           = 0x28:0xfffffe04692f9430
frame pointer           = 0x28:0xfffffe000a34cfe0
code segment      = base 0x0, limit 0xfffff, type 0x1b
         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 3
current process      = 1115 (Xorg)
trap number      = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff80c8e692 at trap_fatal+0x3a2
#3 0xffffffff80c8e969 at trap_pfault+0x2c9
#4 0xffffffff80c8e0f6 at trap+0x5e6
#5 0xffffffff80c75392 at calltrap+0x8
Uptime: 26s
Dumping 746 out of 16324 MB:..3%..11%..22%..33%..41%..52%..63%..71%..82%..93%

Testing with the 319 driver revealed sporadic reboots while running Xorg and seemed less stable overall.

The official nVidia packages seem to have started supporting FreeBSD 10.0 as of 331.67. I have tried both non-beta drivers and the beta 337 driver that support 10.x, but they have a different problem: they sporadically crash upon closing or logging out of Xorg. Worse, this seems to happen more often than the 304 crashes upon starting Xorg.

So I've exhausted every driver I can possibly run. nouveau is no longer available, nv is hopelessly slow, and vesa is worse than just using a text-based terminal.
My mainboard doesn't support my i7-2600K's internal video, so Intel is out. And it sounds like there are a lot of known issues running an AMD card with KMS.
And I'm guessing no one is going to know of a good fix for the 304 kernel panic from above.
So it would seem I have no choice but to run the nVidia 304 drivers anyway.

Are there any steps I can take to try and protect my system against the nvidia.ko driver causing page faults?

From what I can find, it sounds like it's possible to build a kernel that doesn't reboot on page faults, but it only dumps you into a debugger where you can't recover. But if there is a way to only kill Xorg and nvidia.ko, and return to a prompt after a page fault, that would be perfect.

I am very worried about not unmounting my ZFS drives properly after a kernel panic. What are the realistic chances of data corruption occurring due to the kernel crashing during Xorg startup? And would a ZFS scrub always catch any errors caused by this?

retrogamer · Jun 1, 2014

For what it's worth, I think this is a specific issue with your card. I'm using the 331.67 x11/nvidia-driver with a 660 Ti and have had no issues, even running games/xonotic. It might be worth mentioning this over on the NVIDIA Developer Forums (https://devtalk.nvidia.com/). My experience with their non-Windows drivers is that you're better off running a card that's maybe a year old or so, as they're not as quick to fix bugs with them.

byuu · Jun 2, 2014

Well, my previous card is a huge step down. GeForce GTX 760 OC -> Quadro FX 580. But ... I suppose stability is paramount.

304.88 appears to be rock solid (so far) with it. Tested by a shell script loop to unload nvidia.ko, load it, startx, kill Xorg, repeat, 200 times. Although I only triggered the GTX crash on this driver version once, 1 > 0 and > 0 is unacceptable.
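A loop of this kind could be sketched roughly as follows. This is not the exact script used for the test above; the function names, the `DRY_RUN` and `SETTLE` variables, and the sleep durations are all assumptions made for illustration. `DRY_RUN` defaults to 1 so that nothing touches the kernel until asked to.

```shell
#!/bin/sh
# Sketch of a driver load/unload stress loop (not the original script).
# DRY_RUN and SETTLE are hypothetical knobs added for this example.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them
SETTLE="${SETTLE:-10}"    # assumed: seconds to let Xorg come up before killing it

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One pass: reload the module, start X, wait, kill X.
cycle() {
    run kldunload nvidia                  # may fail on the first pass; ignored
    run kldload nvidia || return 1
    run sh -c 'startx >/dev/null 2>&1 &'  # start X in the background
    run sleep "$SETTLE"
    run killall Xorg
    run sleep 2                           # assumed grace period for Xorg to exit
    return 0
}

# Repeat cycle N times (default 200), stopping at the first failed kldload.
stress_loop() {
    count="${1:-200}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "pass $i of $count"
        cycle || { echo "kldload failed on pass $i" >&2; return 1; }
        i=$((i + 1))
    done
}
```

To use it, source the file as root and call `stress_loop 200` with `DRY_RUN=0` set; with the default `DRY_RUN=1` it only prints what it would do. If the driver panics the machine, the last "pass" line on the console shows how far the loop got.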

331.79 seems especially unstable on the Quadro, though. Takes down the whole system upon loading nvidia-settings once, same _nv000233 function as with the GTX on logout. Not that I have any need to run bleeding edge drivers.

The Quadro + 304.88 is what I was running on Debian for the last year or two, so I think this will just have to do. At least until I can devise a way for an nvidia.ko crash to not take down my entire system. My kingdom for a modern microkernel OS with decent OpenGL 3D acceleration :/

Thank you for the suggestion on trying an older card.
