Computer not responding under heavy HDD usage

Hello,

I've tried to solve this problem as best I could, and now I'm lost. The problem is that my computer stops responding whenever the HDDs are under heavy load.

Examples:

If I run dd, or any kind of copy involving a large amount of data, everything just gets stuck. SSH connections don't respond and services don't respond; for example, snmpd no longer answers queries. However, when I run ping <computer address>, the machine does respond, so something is still alive there.

All programs get stuck, though. For example, the irssi IRC client freezes during file copying and loses its connections. So far, I've seen this problem with dd, cp, and rrdtool handling massive .rrd files.

While copying a big file (the destination doesn't seem to matter, because this happens on every disk), the computer occasionally comes up for air. Sometimes I can type a word or two on the command line, but then it gets stuck again for anywhere from thirty seconds to five minutes. Then I can type a few more letters, and bang, it's stuck again.

I installed this computer a few months ago but only noticed this problem a few days ago, so I'm not sure whether it has always been there.

dmesg doesn't show anything, there is nothing related to this in the logs, and smartctl says everything is OK with all disks. Temperatures are fine everywhere. I've managed to check load averages and CPU usage while the computer is stuck, and there's nothing special there. CPU usage is typically very low, around 0-10%, and the load averages never go above 1.00.
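For reference, observations like the ones above could be captured over time with something like the following. It is only a minimal sketch: the log path, the function names, and the sampling interval are assumptions for illustration, not something I actually ran. The per-disk statistics are collected only on FreeBSD, using iostat and zpool iostat on the pool named pakka.

```shell
#!/bin/sh
# Sketch: append timestamped snapshots of load and disk activity to a log file.
# LOGFILE and all function names here are assumptions chosen for illustration.

LOGFILE="${LOGFILE:-/tmp/stall-watch.log}"

# Run a command only if it exists on this system; otherwise note its absence.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not available)"
    fi
}

# Print one snapshot: a timestamp header, load averages, and disk statistics.
sample_once() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    run_if_present uptime
    if [ "$(uname -s)" = "FreeBSD" ]; then
        # Extended per-device statistics: two reports, one second apart.
        run_if_present iostat -x -w 1 -c 2
        # Pool-level I/O for the ZFS pool: two reports, one second apart.
        run_if_present zpool iostat pakka 1 2
    fi
}

# Usage: sample_loop COUNT INTERVAL_SECONDS
sample_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        sample_once >> "$LOGFILE"
        i=$((i + 1))
        sleep "$2"
    done
}
```

Calling `sample_loop 120 5` at the end of the script while a large copy is running would take 120 snapshots with a five-second pause between them. Because the script itself may stall along with everything else, gaps between consecutive timestamps in the log give a rough measure of how long each freeze lasted.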

Now, what's the next step in investigating this problem? What should I do? The computer is a typical Xeon workstation with three SATA disks in a mirror for the main system and two SATA disks of different sizes in a ZFS pool.

My system:

uname -a
Code:
FreeBSD mannerheim 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16 22:34:59 UTC 2014     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

gmirror status
Code:
       Name    Status  Components
mirror/boot  COMPLETE  ada0p1 (ACTIVE)
                       ada2p1 (ACTIVE)
                       ada3p1 (ACTIVE)
mirror/swap  COMPLETE  ada0p2 (ACTIVE)
                       ada2p2 (ACTIVE)
                       ada3p2 (ACTIVE)
mirror/root  COMPLETE  ada0p3 (ACTIVE)
                       ada2p3 (ACTIVE)
                       ada3p3 (ACTIVE)

zpool status pakka
Code:
  pool: pakka
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pakka       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0

errors: No known data errors

camcontrol devlist
Code:
<SAMSUNG SP0812C SU100-27>         at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD5000AADS-00S9B0 01.00A01>   at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD080HJ ZH100-41>         at scbus2 target 0 lun 0 (ada2,pass2)
<ST3808110AS 3.AAE>                at scbus3 target 0 lun 0 (ada3,pass3)
<WDC WD15EARS-19MVWB0 51.0AB51>    at scbus4 target 0 lun 0 (ada4,pass4)
<WDC WD20EZRX-00D8PB0 80.00A80>    at scbus5 target 0 lun 0 (ada5,pass5)

sysctl hw
Code:
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           W3505  @ 2.53GHz
hw.ncpu: 2
hw.byteorder: 1234
hw.physmem: 12843880448
hw.usermem: 6546685952
hw.pagesize: 4096
hw.floatingpoint: 1
hw.machine_arch: amd64
hw.realmem: 12886999040

Thanks for your advice.

PS. I used tags for the first time. Please let me know if I used them badly.
 
Do you have access to iLO / DRAC, so you can check whether anything is reported there (the controller battery, for example)?

And, if possible, maybe enable remote logging.
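As a rough sketch of what remote logging could look like: the stock syslogd forwards messages to another machine when syslog.conf contains a rule whose action is @hostname. The helper below adds such a rule to a given file if it is not there yet. The function name and the host loghost.example.org are placeholders, not something from this thread.

```shell
#!/bin/sh
# Sketch: add a "forward everything" rule to a syslog.conf-style file.
# add_remote_log_rule and loghost.example.org are hypothetical names.

# Usage: add_remote_log_rule CONF_FILE LOG_HOST
add_remote_log_rule() {
    conf="$1"
    host="$2"
    if grep -qF "@${host}" "$conf" 2>/dev/null; then
        # Avoid adding the same forwarding rule twice.
        echo "rule for ${host} already present in ${conf}"
    else
        # Selector *.* matches all facilities and levels; @host forwards via syslog.
        printf '*.*\t@%s\n' "$host" >> "$conf"
        echo "added forwarding rule for ${host} to ${conf}"
    fi
}
```

On the affected machine this would be run as root, for example `add_remote_log_rule /etc/syslog.conf loghost.example.org`, followed by `service syslogd restart`. Check that the appended line does not end up inside a program-specific (`!name`) block, and note that the receiving host has to be set up to accept remote messages.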
 