[Solved] help! system freezes

so my "drone" server had been running fine for 20+ days...then it froze today. I rebooted and it froze a few more times...

when I tried "apachectl restart" it said "segmentation fault (coredump)".

here is one screen shot: (sorry about the poor quality, it is from my cell phone)

Photo_050710_002.jpg


I tried to boot the "cell" machine as drone, but named won't work...it worked fine as a named slave, but it won't resolve anything when rebooted as drone...(ok, possibly I forgot to point resolv.conf at itself, again!).

anyway, I took one disk from drone and booted it in cell, and it also panicked once and I had to reboot. I have now run "gmirror forget gm0" to see if it quiets down.

back on the drone machine, I ran a memory test and got errors again (but it passed last time! I had to exchange one stick of ddr3).

so is it possible that 1) the drone memory errors caused the segmentation fault, and somehow 2) that messed up something on the HD, so now 3) the drone HD also panics when booted in another machine?

the error message says "automatic reboot in 15 secs" but it just hangs there...doing nothing (no response to ping)...
 
the drone HD in another machine panicked again, message:
Code:
dev=mirror/gm0s1f, block=1, fs=/usr
panic: ffs_blkfree: freeing free block
cpuid=2

I did a "fsck" and it had tons of errors...

I tried "reboot now" and it has been generating pages of numbers (13, 14, 5, 8 etc) for over 5 min now...usually it is 3-5 pages (when FreeBSD does not like the acpi) and then it starts over...but this time it does not...
 
it appears to be a memory problem + an HD problem...how can this happen at the same time?
I took one stick out of the box, and at least now the machine is not panicking any more...one table in mysql was messed up, but it was not an important one. I dropped it, and backed up and transferred to another one successfully.

it looks like even 2 servers and 2 mirrors are not enough...

does gmirror copy an HD error to the "slave"? it appears so.
 
so drone HD crashes in 2 different machines
and another HD inside drone hardware also crashes (do not know the message since I left at 8 pm and it was running but then it died around 9 pm).

so it appears to be a hardware issue in the drone machine, which somehow affected the HD, which causes the panics.
 
one stick of ddr3 became bad again...somehow the bad ram caused many file errors (up to a hundred when I did fsck in single user mode), in both the working HD and the mirror.

this explains why the HD in another machine keeps panicking.

and before I removed the bad ram, it would also cause crashes with a good HD from another computer...

I do not trust the fsck-ed disk, so I am now making mirrors from the good one and sticking with the main server, with the one bad stick of DDR3 removed.
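
for reference, the re-mirroring step can be sketched roughly like this. it is only a sketch under assumptions: the mirror is still called gm0, the replacement disk shows up as ada1 (a placeholder, not my actual device name), and the good disk is already the only active component. the run helper and the DO_IT switch are made up for illustration; by default the script only prints the gmirror commands instead of running them.

```shell
#!/bin/sh
# Sketch only: by default this PRINTS the gmirror commands, it does not run them.
# MIRROR and NEW_DISK are placeholders (assumptions), not real device names from this box.
MIRROR="${MIRROR:-gm0}"
NEW_DISK="${NEW_DISK:-ada1}"

# run: hypothetical helper. Executes the command only when DO_IT=1 is set,
# otherwise echoes it, so a typo cannot touch a disk.
run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

rebuild_mirror() {
    run gmirror status "$MIRROR"                   # see which components are still attached
    run gmirror forget "$MIRROR"                   # drop components that are no longer connected
    run gmirror insert "$MIRROR" "/dev/$NEW_DISK"  # add the new disk; it syncs from the good one
    run gmirror status "$MIRROR"                   # check the synchronization progress
}

rebuild_mirror
```

running it with DO_IT=1 as root would actually execute the commands, so it is worth checking the printed device names against "gmirror status" first.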
 