[Solved] Serious bug - X11 killed my system! - The FreeBSD Forums

#1
October 6th, 2011, 19:08
Ruler2112

Just a warning to those who may be inclined to follow the line in the handbook that reads:

Code:
As of version 7.3, Xorg can often work without any configuration file by simply typing at prompt:

startx

DON'T DO IT!!! I installed my system, then spent the better part of a week compiling and installing various ports. There were issues, such as LibreOffice and the KDE screensaver crashing, and I was really hoping that a standalone video card would help clear them up. I shut down the system, installed an nVidia PCI video card, and booted back up. Everything was going well until I decided to believe that line in the handbook...

Knowing my old /etc/X11/xorg.conf wouldn't work because the onboard video was based on a radeon chipset, I deleted the file. Then I foolishly decided to see if I could bypass the X11 configuration that I went through last time by simply typing startx from a user account, figuring the worst that could happen was that X/KDE would fail to start and that I'd have to break out from a different console. How wrong I was!

I got a fast clicking/buzzing noise from my speakers after the screen went black. I knew something had gone wrong, but figured it was no big deal - I'll just Control-Alt-F1 to get to the root shell I had open on the first console and kill the process from there. No such luck. Nothing I did was able to get the screen to do anything but stay black (not no signal, but black) and nothing aside from unplugging the speaker would stop that annoying buzz. Tried switching to different consoles, tried using control-alt-delete to reboot, tried accessing over the network (before realizing I hadn't set that up yet), and still couldn't get any response. Tapping scroll lock twice to switch to another machine on my KVM didn't even work. I let it sit for about 20 minutes in the hope it'd eventually respond. It never did and I ended up having to hold down the power button 4 seconds to get it to shut off.

Powered back on and what a surprise - the file system check complained that / wasn't properly dismounted and started a check. After the file system check completed (with the below lines resulting), it continued booting. Unfortunately, it went extremely slowly and eventually stalled out, refusing to boot any farther. Again the keyboard was frozen and I had to do a hard power-off.

Code:
INCORRECT BLOCK COUNT I=1885000 (4 should be 0) (CORRECTED)
INCORRECT BLOCK COUNT I=2802692 (4 should be 0) (CORRECTED)
INCORRECT BLOCK COUNT I=2826427 (4 should be 0) (CORRECTED)
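
As a side note, the `I=` values in these fsck messages are inode numbers, so the affected files can in principle be tracked down afterwards. A minimal sketch, assuming the messages are piped in or were saved to a file; the function name `fsck_inodes` is made up for illustration:

```shell
# Print the inode number from every fsck message of the form "... I=<number> ...".
# Lines without an I= field are skipped.
fsck_inodes() {
    sed -n 's/.*I=\([0-9][0-9]*\).*/\1/p'
}

# Example with one of the lines above:
echo 'INCORRECT BLOCK COUNT I=1885000 (4 should be 0) (CORRECTED)' | fsck_inodes
```

Each number can then be handed to `find / -xdev -inum <number>` to see which file on the root filesystem the message referred to.
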
The same process repeated itself twice more. I then tried booting from both other drives in the mirror, with the same result. One interesting point is that it doesn't freeze up at the same point every time: one time the last line shown will be the start of inetd, the next time it's hald starting, etc. There's no pattern that I can determine. I even left the system on overnight in the hope that fsck was still running and somehow blocking the boot process, and hence would eventually complete and allow the boot to continue, but to no avail. It was still frozen in the same spot when I got in this morning.

I can get into single user mode, but because the boot process doesn't stop/lock up in any consistent way, don't really know what to do in order to repair whatever damage was done. The file system check completes at boot time, so I don't see where that would help. Since the mirror was active at the time this happened, all 3 copies of my system were taken down. (Already made a mental note to have one drive be a static backup that is not in the mirror; I'll write something that will mirror the root file system once a week or so if/when the system gets back up & running.) I thought that disabling one or more startup daemons might help, but it doesn't seem to. At this point, I'm about ready to wipe it and start over, which I REALLY hate to do after putting so much time & work in on it, but I really don't see what other option I have.



I guess the point of this post is to not believe what it says in the handbook about startx working without configuration first. This is exactly what I did as a normal user and it nuked the system somehow. If anybody has ideas of how I could troubleshoot or isolate or repair what was screwed up, I'd appreciate them. If a developer reads this, you might want to look at how something that a normal user does makes a system unbootable. (I'm willing to provide any information you ask in the way of log files and such, but unless I figure something out, the system's getting wiped and reinstalled so it can compile over the weekend.)
__________________
This message is made up of not less than 90% recycled electrons.

#2
October 6th, 2011, 22:38
Mormegil

I've had no trouble running X without a config on multiple machines. I also don't know whether I'd blame X for the system being unbootable. It sounds like you ran into an issue specific to your setup, and your filesystem got hosed by the hard reboot.

Sucks

#3
October 7th, 2011, 01:38
redw0lfx

I've been running Xorg without an Xorg config file and haven't had issues. I really doubt starting X without a config file hosed your system... but it could have contributed to the original lockup, when it was trying to probe and load the correct video driver and monitor settings.

The error you posted looks like something was probably writing to your filesystem and the hard reboot hosed it. For the slowness experienced during start up, I have noticed that FreeBSD actually runs a filesystem check while the system is still starting up (someone correct me if I am wrong). For example, after hard rebooting my system, it will normally boot up and run slow for a few minutes (with the HDD light showing activity) while /usr is being scanned/checked. Maybe this is what is happening to you?

I would try booting into single user mode, check the filesystem and then proceed and see if it locks up.

#4
October 7th, 2011, 11:24
SirDice

Quote:
Originally Posted by redw0lfx View Post
For the slowness experienced during start up, I have noticed that FreeBSD actually runs a filesystem check while the system is still starting up (someone correct me if I am wrong).
Correct. The fsck(8) gets run in the background. Traditionally you'd have to wait for it to finish before it continued to boot.
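
For anyone who prefers the traditional behaviour, background checking can be switched off so that the boot waits for fsck to finish. A minimal fragment for /etc/rc.conf, assuming the stock rc scripts (`background_fsck` is described in rc.conf(5) and defaults to YES):

```shell
# /etc/rc.conf
# Run fsck in the foreground at boot instead of in the background.
background_fsck="NO"
```
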
__________________
Senior UNIX Engineer at Unix Support Nederland
Experience is something you don't get until just after you need it.

#5
October 7th, 2011, 14:04
Zare

Quote:
I guess the point of this post is to not believe what it says in the handbook about startx working without configuration first
Sorry, but your point is invalid.

Your system is broken somewhere, X has nothing to do with it. The X crash was unrecoverable because Xorg directly accesses hardware. It's a hardware problem.

X autoconfiguration is a pretty normal thing: it uses PCI IDs to autoload the graphics driver, EDID for display configuration, etc. It's not running "without configuration", it's compiling the configuration on the fly. It's what we used to do with X -configure, just done at run time.

You'd have the same configuration if you executed
# X -configure
and then copied the file to /etc/X11/xorg.conf.
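
Spelled out, the manual route looks roughly like this. It is only a sketch: it assumes it is run as root with no X server already running, that `Xorg -configure` writes its result to xorg.conf.new in root's home directory, and the function name is made up for illustration:

```shell
# Generate a static config skeleton and print the follow-up steps (run as root).
generate_xorg_conf() {
    Xorg -configure || return 1   # probes the hardware, writes $HOME/xorg.conf.new
    echo "Test it with:   Xorg -config $HOME/xorg.conf.new -retro"
    echo "If that works:  cp $HOME/xorg.conf.new /etc/X11/xorg.conf"
}
```

The test step is left manual on purpose, since the test server has to be looked at and then stopped by hand before the file is copied into place.
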

People who used to autodetect settings and then run X can now run X without generating the static configuration file. People who use 3rd-party drivers or tweaks can't. Simple as that.

Last edited by DutchDaemon; October 7th, 2011 at 18:01.

#6
October 7th, 2011, 15:35
Dies_Irae

I think that the timeline of your apocalypse is:

Quote:
Originally Posted by Ruler2112 View Post
I shut down the system, installed an nVidia PCI video card, and booted back up.
<cut>
Knowing my old /etc/X11/xorg.conf wouldn't work because the onboard video was based on a radeon chipset
Maybe there is a conflict between the PCI NVidia and the onboard Radeon? Or a bug in X autoconfiguration with this type of video card mix? I don't know, but the result is that X crashed leaving you with a black screen.

Quote:
Originally Posted by Ruler2112 View Post
I got a fast clicking/buzzing noise from my speakers after the screen went black.
This sounds like a kernel panic with data written to unknown/random addresses.

Quote:
Originally Posted by Ruler2112 View Post
I'll just Control-Alt-F1 to get to the root shell
When the kernel panics you have 15 seconds to hit a key to stop the countdown - otherwise the system reboots.
But you stopped the countdown by trying to get to a console, so now you are in a deadlock: you are waiting for your system, and your system is waiting for you. The only way out is a reset.

Quote:
Originally Posted by Ruler2112 View Post
Powered back on and what a surprise - the file system check complained that / wasn't properly dismounted and started a check. After the file system check completed (with the below lines resulting), it continued booting. Unfortunately, it went extremely slowly and eventually stalled out, refusing to boot any farther. Again the keyboard was frozen and I had to do a hard power-off.
After the reset (and even after the kernel panic) your filesystems need checking (remember that they were not cleanly unmounted), and while fsck runs any disk access is slow, even though the check runs in the background. Moreover, you reset the machine while fsck was trying to fix your already-dirty filesystems.

You should let the system boot and check the filesystems - it may take some time. When this happens to me I usually log in at the console, run top(1) and wait for fsck to finish its work.
Next, you could try removing the PCI card and using only the onboard one, with the VESA driver.

Hope this helps.
__________________
O quam contempta res est homo, nisi supra humana surrexerit. (Seneca)

#7
October 7th, 2011, 16:07
SirDice

Quote:
Originally Posted by Dies_Irae View Post
You should let the system boot and check the filesystems - it may take some time. When this happens to me I usually log in at the console, run top(1) and wait for fsck to finish its work.
Even better would be to boot to single user mode and run fsck there. Remember that fsck cannot fix certain filesystem errors if the filesystem is mounted.
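
A minimal sketch of that procedure, assuming UFS filesystems listed in /etc/fstab and a shell reached by choosing single user mode at the boot menu (or `boot -s` at the loader prompt); the function name is hypothetical:

```shell
# Full check of every filesystem in /etc/fstab, to be run from single user mode.
single_user_fsck() {
    fsck -fy || return 1   # -f: check even if marked clean, -y: answer yes to every repair prompt
    mount -u /             # remount root read-write
    mount -a -t ufs        # mount the remaining UFS filesystems
}
```

Once a run finishes without reporting further corrections, typing `exit` continues the boot into multi-user mode.
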

#8
October 7th, 2011, 17:12
roddierod

Quote:
Originally Posted by Dies_Irae
I think that the timeline of your apocalypse is:

Quote:
Originally Posted by Ruler2112
I shut down the system, installed an nVidia PCI video card, and booted back up.
<cut>
Knowing my old /etc/X11/xorg.conf wouldn't work because the onboard video was based on a radeon chipset
Maybe there is a conflict between the PCI NVidia and the onboard Radeon? Or a bug in X autoconfiguration with this type of video card mix? I don't know, but the result is that X crashed leaving you with a black screen.
I have a setup like this. I had to disable the onboard video in the BIOS or all heck breaks loose.

#9
October 10th, 2011, 21:53
Ruler2112

Thanks for all the input, guys.



Quote:
Originally Posted by redw0lfx View Post
The error you posted looks like something was probably writing to your filesystem and the hard reboot hosed it.
Quote:
Originally Posted by Mormegil View Post
...your filesystem got hosed by the hard reboot.

Sucks
I believe this is what happened. Went back and discovered I hadn't enabled soft updates or journalling when installing initially, so the hard reboot is what probably killed something.

I did notice that journalling isn't even available from the menu without going into the newfs options and manually adding -J there... Isn't journalling what saves you in the event of a power loss/unexpected reboot? (UPDATE: The system complains about a journal provider not being found on reboot with this config.)


Quote:
Originally Posted by roddierod View Post
I have a setup like this. I had to disable the onboard video in the BIOS or all heck breaks loose.
Quote:
Originally Posted by Dies_Irae View Post
Maybe there is a conflict between the PCI NVidia and the onboard Radeon?
From past nightmares trying to mix onboard & standalone video cards, I disabled the onboard video in the BIOS before booting the first time after slapping the PCI card in, so there should be no problem there.


Quote:
Originally Posted by redw0lfx View Post
For the slowness experienced during start up, I have noticed that FreeBSD actually runs a filesystem check while the system is still starting up (someone correct me if I am wrong). For example, after hard rebooting my system, it will normally boot up and run slow for a few minutes (with the HDD light showing activity) while /usr is being scanned/checked. Maybe this is what is happening to you?
Quote:
Originally Posted by Dies_Irae View Post
You should let the system boot and check the filesystems - it may take some time. When this happens to me I usually log in at the console, run top(1) and wait for fsck to finish its work.
Next, you could try removing the PCI card and using only the onboard one, with the VESA driver.
It's more than slow when starting up - it's slow and then stops completely. Left it trying to boot overnight and when I came back ~18 hours later, it was still sitting at the same exact place just as unresponsive as when I left.

The onboard video is quite slow and I was hoping that a standalone video card would eliminate the KDE screen saver crashing, the bug report about which was linked in my original post on the subject. (Obviously not everybody with KDE 3.5.10 has the problem...)


Quote:
Originally Posted by redw0lfx View Post
I would try booting into single user mode, check the filesystem and then proceed and see if it locks up.
Quote:
Originally Posted by SirDice View Post
Even better would be to boot to single user mode and run fsck there. Remember that fsck cannot fix certain filesystem errors if the filesystem is mounted.
Just tried that - fixed a bunch of stuff, but made no difference. Rebooted and froze up during the boot sequence again, so I did a binary wipe and am reinstalling now.


Quote:
Originally Posted by Dies_Irae View Post
Or a bug in X autoconfiguration with this type of video card mix? I don't know, but the result is that X crashed leaving you with a black screen.
That was my thought. I wouldn't think that a PCI nVidia GeForce6200 is all that uncommon of a video card though...

If X were to have just crashed and returned me to a prompt, it wouldn't have been a problem. Being stuck at a black screen and unable to do anything to affect the system is ridiculous IMO. (Out of curiosity, do the Num Lock, Caps Lock, and Scroll Lock lights flash on a kernel panic in FreeBSD like they do in Linux? Never had a kernel panic in BSD before... I ask because the lights were not flashing, but neither did the associated keys toggle the on/off status of the light.)


Quote:
Originally Posted by Dies_Irae View Post
When the kernel panics you have 15 seconds to hit a key to stop the countdown - otherwise the system reboots.
But you stopped the countdown by trying to get to a console, so now you are in a deadlock: you are waiting for your system, and your system is waiting for you. The only way out is a reset.
This doesn't sound like the type of good design I've come to expect from BSD. Basically what this means is that if the system crashes in such a way that you don't know what's happened and you try to do anything to figure out what's happened, your only recourse is to do a hard reboot.


Quote:
Originally Posted by Zare View Post
Your system is broken somewhere, X has nothing to do with it. The X crash was unrecoverable because Xorg directly accesses hardware. It's a hardware problem.
I don't understand - the system would boot, I tried running startx as a normal user and it had a problem, then the system wouldn't boot. How does X have nothing to do with it???

You obviously know more about X than I probably ever will (or care to), so I'm not going to argue the point. However, while I understand that a hard reset without having soft updates/journalling enabled is what was most likely the ultimate cause of the system dying, the auto-config of X causing a hard lockup is what precipitated the whole mess.

We'll never know if going through the configuration steps would have prevented the hard lockup X caused because the system wouldn't boot and is now wiped, but I know I would appreciate somebody posting a warning to not bypass those steps if this had happened to them, so this is exactly what I did.


Quote:
Originally Posted by Zare View Post
X autoconfiguration is a pretty normal thing: it uses PCI IDs to autoload the graphics driver, EDID for display configuration, etc. It's not running "without configuration", it's compiling the configuration on the fly. It's what we used to do with X -configure, just done at run time.

You'd have the same configuration if you executed
# X -configure
and then copied the file to /etc/X11/xorg.conf.

People who used to autodetect settings and then run X can now run X without generating the static configuration file. People who use 3rd-party drivers or tweaks can't. Simple as that.
Don't have any tweaks or 3rd party drivers that I'm aware of - just followed directions in the handbook. The configuration steps in the handbook worked for the onboard video, but I decided to be lazy and try to skip it with the standalone nvidia card because the handbook said it would be fine. It'll be interesting (to me at least) to see if the configuration steps in the handbook work for the standalone as they did for the onboard video the first time...

I'll post back once the system is installed and X/KDE compiled/installed with the results.

#10
October 10th, 2011, 22:14
Ruler2112

Quote:
Originally Posted by Ruler2112 View Post
I did notice that journalling isn't even available from the menu without going into the newfs options and manually adding -J there... Isn't journalling what saves you in the event of a power loss/unexpected reboot? (UPDATE: The system complains about a journal provider not being found on reboot with this config.)
Never mind about this. Some reading on the subject has made me realize that only soft updates are needed to protect the system in the event of a power loss. (From what I've found, gjournal eliminates the need to fsck after a hard reboot at the cost of writing everything twice and consuming considerable drive space.)
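
For reference, soft updates can also be turned on for an existing UFS filesystem with tunefs(8), without reinstalling. A sketch, assuming the filesystem is unmounted or mounted read-only (for example in single user mode); the function name and the device name in the usage comment are made up:

```shell
# Enable soft updates on one filesystem and print its settings afterwards.
enable_softupdates() {
    tunefs -n enable "$1" || return 1   # -n enable: turn soft updates on
    tunefs -p "$1"                      # -p: print the current tuneable values
}

# Usage (hypothetical device):
#   enable_softupdates /dev/ad4s1a
```
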

#11
October 11th, 2011, 10:00
Dies_Irae

Quote:
Originally Posted by Ruler2112 View Post
This doesn't sound like the type of good design I've come to expect from BSD. Basically what this means is that if the system crashes in such a way that you don't know what's happened and you try to do anything to figure out what's happened, your only recourse is to do a hard reboot.
It's not a matter of good or bad design - when you have a kernel panic the game is over.
We are not talking of the crash of a simple app, but of the system itself.

When a kernel panics you have two choices: let the system reboot automatically after 15 seconds, or stop the countdown.

If you stop the countdown and your kernel is compiled with
Code:
options DDB
you could use the kernel debugger to investigate the crash - but in the end you have to reboot, you cannot go back from a panic.
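
Whether the running kernel already has the debugger can be checked without rebuilding anything, since `sysctl -n debug.kdb.available` lists the compiled-in debugger backends. A small sketch; the helper name `has_ddb` is made up:

```shell
# Succeeds if the given list of KDB backends (as printed by
# `sysctl -n debug.kdb.available`) includes ddb.
has_ddb() {
    case " $1 " in
        *" ddb "*) return 0 ;;
        *)         return 1 ;;
    esac
}

# On the FreeBSD machine itself:
#   has_ddb "$(sysctl -n debug.kdb.available)" && echo "DDB is compiled in"
```
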

You are in a critical situation, beyond the point of no return, but despite that the kernel will give you the opportunity to investigate the problem.

Compare this with the (in)famous BSOD of Windows

For more info, see here and ddb(4)

#12
October 11th, 2011, 15:16
andyzammy

Probably too late now as you reinstalled, but I had a problem where I tried to start Xorg after an unclean exit that also left me with a blank screen. This was solved by removing the .Xauthority* files in my home directory.

#13
October 11th, 2011, 00:15
adamk

Something to consider... The "open source" nv driver is actually highly obfuscated. It was written by developers at nvidia, and they even announced a while back that they were going to stop development on it. So you essentially left your computer in the hands of a nearly closed-source "nv" driver that, frankly, doesn't get the same attention from nvidia as their actual closed-source "nvidia" drivers, and doesn't get the peer review that actual open source drivers receive.

BTW, you may want to confirm that the BIOS really did disable the on-board GPU by checking the output of 'pciconf'.
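
One way to do that check is to filter the `pciconf -lv` listing for display-class devices (PCI class code 0x03). A sketch; the function name is made up, and it assumes the usual listing format in which each device starts with a `name@pciN:...` header line carrying a `class=0x...` field:

```shell
# Read `pciconf -lv` output on stdin and print only the display-class devices.
display_devices() {
    awk '
        /@pci[0-9]+:/ { show = ($0 ~ /class=0x03/) }  # header line: start of a new device record
        show          { print }                        # print header and detail lines of display devices
    '
}

# On the FreeBSD machine itself:
#   pciconf -lv | display_devices
```

If the onboard GPU really is disabled in the BIOS, only the nVidia card should show up in the output.
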

Adam

All times are GMT +1.