lost device freezes my system

Once or twice a month, I have the same issue on my server: it freezes after this message:

Code:
ugen4.2: <vendor 0x05e3> at usbus4 (disconnected)
umass0: at uhub4, port 2, addr 2 (disconnected)
(da0:umass-sim0:0:0:0): lost device - 0 outstanding, 1 refs

Nothing looks wrong in the last top output, apart from this message.

"Freeze" is perhaps not the right word, as I can still switch to another console with CTRL-ALT-F2, but I cannot log in (nothing typed on the keyboard appears). I cannot log in by ssh, and the web server no longer responds, but the machine still answers ping.

My server is a little exotic: an eeePC 900 running FreeBSD 8.3, with root on UFS. The jails live on two ZFS-mirrored 250 GB partitions on two external drives plugged into USB ports. The two drives are powered from the USB ports, but the eeePC's battery works and the machine is on a UPS. I followed the Handbook's advice on setting up ZFS on 32-bit FreeBSD.

Does anyone have any idea how to fix it?
 
Looks like one of your USB drives lost power. Are you sure the machine can handle multiple drives on USB power? It is a low-power machine to begin with.
 
Could be the problem indeed. I bought a self-powered usb hub. We'll see whether it is better.
 
Unfortunately, the self-powered USB hub didn't solve this issue. The problem is still here, with the same frequency.
Does anyone have another idea?
 
Could some process be slowly growing the amount of memory it uses? I once fixed (temporarily) a hosed system (too many details to remember) with the freecolor[1] utility until I could pare down the debugging stuff in the kernel...
[1] which sometimes frees up memory...
Can you install, say, atop, run it from cron every hour, save the results, and then, after the reboot, check which programs were using memory before the freeze?
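In case it helps, here is a minimal sketch of the "run it from cron and save the results" idea in Python 3. It is only an illustration: the function name `snapshot` is made up for this example, and `ps aux` stands in for whatever monitoring command you prefer. The output is flushed and fsync'ed so that the last snapshot has a better chance of reaching the disk before a freeze.

```python
"""Append a timestamped process snapshot to a log file (meant to be run from cron)."""
import os
import subprocess
import sys
import time

# Assumption: "ps aux" is just an example command; swap in any monitoring tool.
DEFAULT_CMD = ["ps", "aux"]


def snapshot(cmd, log_path):
    """Run cmd once and append its output, under a timestamp header, to log_path."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = result.stdout.decode("utf-8", "replace")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("=== %s ===\n%s\n" % (stamp, output))
        log.flush()
        os.fsync(log.fileno())  # ask the OS to write the data out now
    return output


# Only runs when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    snapshot(DEFAULT_CMD, sys.argv[1])
```

Saved under a name of your choice and started hourly from cron with the log path as its only argument, it leaves one block per hour in the log. After the reboot, the last few blocks show what was running before the freeze. Keeping the log on the UFS root rather than on the USB drives is probably wise here, since those are the devices that disappear.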
 
I have top running permanently on a remote console, but I have never observed anything particularly significant when it freezes.

I just installed atop; we'll see if it gives more useful information...
 
A slightly off-topic problem, but I'll ask here. I cannot make atop work. I compiled it from ports without problems, but when I launch it, manually or as a daemon, it fails with:
Code:
Bad system call(core dumped)
#dmesg
pid 8885 (atop), uid 0: exited on signal 12 (core dumped)
Does anybody know whether it needs some particular features in the kernel? I compiled my own kernel.
 
An example of what stays displayed on a remote console after freezing:
Code:
last pid: 46034;  load averages: 52.81, 52.63, 50.85    up 8+15:13:35  05:33:00
729 processes: 50 running, 676 sleeping, 3 zombie
CPU:  1.7% user,  0.0% nice, 98.3% system,  0.0% interrupt,  0.0% idle
Mem: 380M Active, 219M Inact, 384M Wired, 110M Buf, 4232K Free
Swap: 2048M Total, 108K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
68270 jd          1  44    0  4720K  3036K RUN     12:12  1.17% top
48274 jd          1  44    0  5744K  3584K RUN     26:50  0.98% top
45410 smmsp       1  44    0  6100K  3144K RUN      1:10  0.78% sendmail
15541 root        1  44    0  3360K  1196K RUN      0:32  0.29% syslogd
42441 root        1  44    0  3360K  1164K RUN      0:30  0.29% syslogd
44126 root        1  44    0  3360K  1164K RUN      0:29  0.29% syslogd
42041 root        1  44    0  3360K  1164K RUN      0:40  0.20% syslogd
  909 root        1  44    0  3360K  1196K RUN      0:29  0.20% syslogd
42849 root        1  44    0  3360K  1164K RUN      0:52  0.10% syslogd
55771 root        1  44    0  3360K  1192K RUN      0:31  0.10% syslogd
43302 root        1  44    0  3360K  1164K RUN      0:31  0.10% syslogd
44533 root        1  44    0  3360K  1164K RUN      0:30  0.10% syslogd
 6449 clamav      1  76    0 15800K  4532K RUN     15:07  0.00% freshclam
42895    242      1  45    0 13828K  9028K RUN      3:00  0.00% lua
 1804 root        1  44    0  4912K  1764K select   1:00  0.00% ntpd
56136     88     22  44    0   464M 76004K ucond    0:35  0.00% mysqld
68266 jd          1  44    0  9532K  3412K select   0:31  0.00% sshd

Syslogd in several jails seems to be trying to write to disk but cannot. Is it filling up the memory? In what order are things happening?
Does anybody have an idea about what is wrong?
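To make a snapshot like the one above easier to read, here is a rough Python sketch that tallies processes by command and state from saved top output. The names `ROW` and `tally` are invented for this example, and the pattern assumes the eleven-column layout shown above (no per-CPU "C" column), so it may need adjusting for other top versions.

```python
import re
from collections import Counter

# One process row: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
# Assumption: same column layout as the snapshot in this thread.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
    r"\s+([\d.]+)%\s+(.+?)\s*$"
)


def tally(top_text):
    """Count process rows per (command, state); non-process lines are skipped."""
    counts = Counter()
    for line in top_text.splitlines():
        match = ROW.match(line)
        if match:
            state, command = match.group(8), match.group(11)
            counts[(command, state)] += 1
    return counts


# A few rows copied from the snapshot above.
SAMPLE = """\
729 processes: 50 running, 676 sleeping, 3 zombie
  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
68270 jd          1  44    0  4720K  3036K RUN     12:12  1.17% top
15541 root        1  44    0  3360K  1196K RUN      0:32  0.29% syslogd
42441 root        1  44    0  3360K  1164K RUN      0:30  0.29% syslogd
 1804 root        1  44    0  4912K  1764K select   1:00  0.00% ntpd
56136     88     22  44    0   464M 76004K ucond    0:35  0.00% mysqld
"""

for (command, state), n in sorted(tally(SAMPLE).items()):
    print(command, state, n)
```

Run over the full snapshot, a tally like this shows at a glance that every visible syslogd is in the RUN state, which fits the picture of the jails' syslogd instances spinning while the disks are gone. It does not say what happened first, though.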
 