Realtek LAN Controller on FreeBSD 8.2-STABLE, ZFS-V28

Hi

I had a serious system lock-up while deleting a large number of large files from my deduplicated and compressed ZFS pool (v28).

After struggling for 2 days, the only way I could recover and import the pool again was with the LAN cable disconnected. I have an Asus P8P67 (REV 3.1) motherboard which has a Realtek 8111E Gigabit LAN controller.

Is anyone aware of issues with this LAN controller? The system has been running well for the last 2 months; the only 2 times I had these issues were while deleting large files.

regards
Malan
 
How much RAM do you have? Does the system completely lock up, or does it kernel panic? Dedup and compression are both RAM intensive, and I have seen systems simply run out of swap space and fall over. What is odd is that removing the network cable allows the system to boot.

I run this very board with a 21 TB pool (RAIDZ2, so 27 TB of raw disk) and 16 GB of RAM, and have never experienced any issues, even though I deal with very large (25 GB+) log files. Does dmesg show anything unusual relating to IRQs?
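As a rough illustration of why dedup is so RAM hungry: every unique block written with dedup enabled gets an entry in the dedup table (DDT), and deletes and writes stay fast only while that table fits in memory. The minimal Python sketch below estimates the in-core DDT size. It assumes the often-quoted rule of thumb of about 320 bytes per DDT entry and a 128 KiB average block size; both figures are assumptions, not measurements from this pool, and the function name `ddt_ram_bytes` is hypothetical.

```python
# Rough estimate of the RAM needed to keep a ZFS dedup table (DDT) in memory.
# ASSUMPTION: ~320 bytes of RAM per DDT entry (a commonly quoted rule of thumb,
# not an exact figure for ZFS v28).
BYTES_PER_DDT_ENTRY = 320


def ddt_ram_bytes(unique_data_bytes: int, avg_block_bytes: int = 128 * 1024) -> int:
    """Hypothetical helper: estimate DDT RAM for a given amount of unique data."""
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # Number of unique blocks, rounded up (one DDT entry per unique block).
    blocks = -(-unique_data_bytes // avg_block_bytes)
    return blocks * BYTES_PER_DDT_ENTRY


if __name__ == "__main__":
    GiB = 1024 ** 3
    TiB = 1024 ** 4
    # Hypothetical pool: 4 TiB of unique data at the default 128 KiB recordsize.
    need = ddt_ram_bytes(4 * TiB)
    print(f"Estimated DDT size: {need / GiB:.1f} GiB")
```

Under these assumptions, 4 TiB of unique data already needs about 10 GiB just for the DDT, before the ARC caches any file data. Smaller average block sizes increase the estimate proportionally.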

Let me know and I'll see if I can help
 
crankyadm1n said:
How much RAM do you have?
I had 8 GB, and thought that might be the reason, so I installed another 8 GB two nights ago. Copying to the server is definitely much faster: previously the throughput alternated between peaks and valleys about once a second, and now it is very even. But this morning I found that the large test copy I ran during the night had failed x(

crankyadm1n said:
Does the system completely lock up or kernel panic?
No kernel panic. With a keyboard and monitor directly connected, I could still use Alt+F1/F2 to switch between the virtual consoles, but could no longer type any input. Unplugging and replugging the USB keyboard produces the standard console messages. However, no remote access is possible (e.g. Apache, telnet), although the machine still responds to ping. My theory is that the kernel itself is still running fine, since I can see the periodic flush to disk still happening every few seconds.

crankyadm1n said:
Dedup and compression are both RAM intensive, and I have seen systems simply run out of swap space and fall over. What is odd is that removing the network cable allows the system to boot. I run this very board with a 21 TB pool (RAIDZ2, so 27 TB of raw disk) and 16 GB of RAM, and have never experienced any issues, even though I deal with very large (25 GB+) log files. Does dmesg show anything unusual relating to IRQs?
dmesg shows nothing unusual. I have used several Asus motherboards in different systems and never had a problem. The server was running fine for 2 months before this happened. However, during the two months I did not have the intensive disk operations I have now.

crankyadm1n said:
Let me know and I'll see if I can help
Thanks for the offer. I am a little bit irritated, but can think of at least 4 possible next troubleshooting steps:
  1. Check whether one of the memory modules is faulty by going back to 8 GB, using only the new memory I bought.
  2. This morning the smartctl test flagged one of the drives as a problem. I will have to test this and maybe replace the drive. However, if this is true it kind of defeats the purpose of RAIDZ :)
  3. Get a PCI-E LAN card and use it in place of the on-board NIC.
  4. Recompile the kernel. I am currently on
    Code:
     FreeBSD 8.2-STABLE #0: Sat Jul  9 00:34:27 GST 2011

Any other ideas are welcome.

thanks
Malan
 
Goose997 said:
This morning the smartctl test flagged one of the drives as a problem. I will have to test this - maybe replace the drive. However, if this is true it kind of defeats the purpose of RAIDZ :)

I see that during the night the disk was taken offline and the pool is now degraded. I brought the disk back online and the pool is now scrubbing. I will check again this evening when I get home.
 
hi

I have done 2 things that could have solved this:
  1. Expanded the RAM from 8GB to 16GB;
  2. Replaced a faulty HDD.

Last night I deleted a large number of 2GB files again and everything is still running.

Malan
 