Hi guys,
I have a Sun Fire X4540 running:
Code:
uname -a
FreeBSD <hostname> 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
The problem: the zpool "data" stops responding after a while and stalls completely. The system itself keeps working, and the second pool "rpool" works just fine, but any command accessing the "data" pool never finishes:
Code:
ps ax
4644 0 D+ 0:00.00 ls -GF /data
zpool iostat also shows no I/O activity on the pool after the first iteration:
Code:
zpool iostat 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        71.9T  8.75T  1.15K     33  79.9M   220K
rpool       7.38G  7.50G      5      0   143K  4.30K
----------  -----  -----  -----  -----  -----  -----
data        71.9T  8.75T      0      0      0      0
rpool       7.38G  7.50G      0      9      0  36.4K
----------  -----  -----  -----  -----  -----  -----
data        71.9T  8.75T      0      0      0      0
rpool       7.38G  7.50G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data        71.9T  8.75T      0      0      0      0
rpool       7.38G  7.50G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data        71.9T  8.75T      0      0      0      0
rpool       7.38G  7.50G      0      0      0      0
I have attached zpool_status.txt, where you can see that it even hangs during a scrub.
While examining the problem further, I discovered that disk da73 is reported in state "CORRUPT" by gpart list. Note that in the zpool status output da73 still has status "ONLINE": the disk has not been taken offline, and its raidz vdev is not listed as "DEGRADED", as would normally happen when a disk fails. Running camcontrol inquiry da73 also never completes:
Code:
ps ax
4868 1 DL+ 0:00.00 camcontrol inquiry da73
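For reference, both hung commands above sit in state D (uninterruptible wait) in the STAT column. Below is a minimal sketch for listing such processes, assuming the ps ax column layout shown above (PID, TT, STAT, TIME, COMMAND). The function name stuck_procs is made up for illustration:

```shell
# Print only the lines of `ps ax` output whose STAT column (field 3) starts
# with "D", i.e. processes in uninterruptible (usually disk) wait.
# Assumes the column order PID TT STAT TIME COMMAND; the header line is
# dropped automatically because "STAT" does not start with "D".
stuck_procs() {
    awk '$3 ~ /^D/'
}

# Usage (not run here): ps ax | stuck_procs
```

This only filters text, so it should be safe to run even while the pool is hung, since it never touches /data itself.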
Any ideas on dealing with this problem?