Drive disappeared when copying.

Hello,

I'm trying to copy my data from /data to /bkp.
The command is a simple cp -Rp * /bkp/, run from /data.

Code:
/dev/ad8s1d on /data (ufs, local, soft-updates)
/dev/ad10s1d on /bkp (ufs, local, soft-updates)

Before posting the logs, I want to point out that the mount information above is correct, even though the logs contain this message:
"Device ad8s1d went missing before all of the data could be written to it; expect data loss."
ad8s1d is the disk I'm actually READING from, not writing to.


First, I get these:

Code:
...
g_vfs_done():ad8s1d[READ(offset=75790696448, length=2048)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75790698496, length=1024)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75790700544, length=3072)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75790704640, length=1024)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75790706688, length=512)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75790708736, length=3072)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75791710208, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75791736832, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75797850112, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75797688320, length=1024)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75797690368, length=1024)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
g_vfs_done():ad8s1d[READ(offset=75967510528, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75798018048, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75920769024, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75920783360, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75798013952, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75969163264, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968409600, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968411648, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968788480, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75961075712, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75961573376, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968851968, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968882688, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968909312, length=2048)]error = 6
g_vfs_done():ad8s1d[READ(offset=75968759808, length=1536)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
...
Followed by these:

Code:
...
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
Device ad8s1d went missing before all of the data could be written to it; expect data loss.
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 1513 (cp)
After that, I can no longer query the drive.
On many attempts, the drive even disappeared from /dev.
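The "error = 6" in the g_vfs_done() lines is an errno value. On FreeBSD, 6 is ENXIO ("Device not configured"), which fits a device that dropped off the bus rather than a plain bad-sector read error (that would usually show up as EIO, 5). Below is a minimal Python sketch, assuming the kernel messages are in a text file such as /var/log/messages, that groups these lines by device, operation and errno and reports the affected byte range. The function names summarize_g_vfs_errors and print_report are hypothetical helpers, not part of any FreeBSD tool.

```python
import errno
import re

# Matches GEOM messages such as:
#   g_vfs_done():ad8s1d[READ(offset=75790696448, length=2048)]error = 6
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[\s]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<err>\d+)"
)


def summarize_g_vfs_errors(lines):
    """Map (device, op, errno) -> (count, lowest offset, highest end offset)."""
    summary = {}
    for line in lines:
        m = G_VFS_RE.search(line)
        if m is None:
            continue  # vnode_pager / vm_fault lines carry no offset, skip them
        key = (m.group("dev"), m.group("op"), int(m.group("err")))
        start = int(m.group("offset"))
        end = start + int(m.group("length"))
        count, low, high = summary.get(key, (0, start, end))
        summary[key] = (count + 1, min(low, start), max(high, end))
    return summary


def print_report(path="/var/log/messages"):
    """Print one summary line per (device, op, errno) found in a log file."""
    with open(path, errors="replace") as log:
        summary = summarize_g_vfs_errors(log)
    for (dev, op, err), (count, low, high) in sorted(summary.items()):
        # errno names come from the numbering of the machine running the script
        name = errno.errorcode.get(err, "unknown")
        print(f"{dev} {op} errno {err} ({name}): {count} failed requests, "
              f"bytes {low}..{high} (~{low / 2**30:.1f} GiB into the provider)")
```

Run print_report() on the affected machine (or on a copy of its log). If every failure is ENXIO and they all start at the same moment, that points more towards the disk or its link going away than towards scattered unreadable sectors, though it doesn't say which component is at fault.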

Both disks are SATA.
I ran long self-tests with smartmontools, and they didn't report any particular issue with either drive.
I tried another destination drive (IDE) and could copy my files (it's the source disk that disconnects, not the destination!).
I tried on another system (different motherboard, different SATA chipset) and got the same issue.

I'm running FreeBSD 8.2-RELEASE with the GENERIC kernel.


Any help please?

Thank you.
 
Maybe not relevant:
1. Is there an onboard graphics controller you could swap out for a PCI-e card?
In any case:
2. For a more permanent fix, search the forum for "bwlimit" and you should come across a long rsync command line that you can put in a .sh script. It runs much slower than cp, about 1/8 the speed, but I've never had it fail, and it does not overload any SATA controller that I know of. (It's the 7th post in a Nov 2009 thread, to be exact.)
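The command from that 2009 thread isn't reproduced here, so this is only a sketch of the idea: a small Python wrapper that builds and runs an rsync command with --bwlimit, so the copy is throttled. It assumes rsync is installed (it is not in the FreeBSD base system; it comes from ports/packages). A --bwlimit value without a suffix is interpreted by rsync as units of 1024 bytes per second; how strictly it throttles a purely local copy may depend on the rsync version. The helper names and the 10000 KiB/s default are arbitrary placeholders.

```python
import subprocess


def build_rsync_cmd(src, dst, kbps=10000):
    """Build an rsync argv that copies src into dst at no more than kbps KiB/s.

    -a preserves permissions, times, symlinks and (as root) owners, roughly
    like cp -Rp; the trailing slash on src copies the *contents* of src.
    """
    if kbps <= 0:
        raise ValueError("kbps must be positive")
    return ["rsync", "-a", f"--bwlimit={int(kbps)}", src.rstrip("/") + "/", dst]


def throttled_copy(src, dst, kbps=10000):
    """Run the throttled copy; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_cmd(src, dst, kbps), check=True)
```

For example, throttled_copy("/data", "/bkp", kbps=5000) would copy the contents of /data into /bkp at roughly 5 MiB/s at most.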
 
aragon said:
Any other errors?

Looks like a hardware issue. Maybe faulty SATA cables?

I tried with other cables, same issue.
It works fine at the beginning. I think it's more likely a bug; I believe I've seen a discussion heading this way.

This froze my computer a few times, and I had kernel panics on a simple copy (I couldn't reproduce them today; they seem to happen when running cp in verbose mode). Sounds like it's worth a PR, no?
 
pierreact said:
I tried with other cables, same issue.
It works fine at the beginning...

Perhaps it is a temperature issue. Some years ago, I experienced occasional disk failures on another system, not FreeBSD. I decided it would be better to transfer everything to a new disk. At the beginning the transfer went fine, but after 30 minutes the source disk went south; it recovered some time later.

It was a very hot summer day, so I took the source disk out of its enclosure and put it in front of a portable fan. With that, the transfer at least went through and only a few files were unreadable, although the disk was already showing some problems.

Perhaps you can use smartmontools to check that the temperature of your source disk stays within its limits during lengthy operations.
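A minimal Python sketch of that check, assuming smartctl is on the PATH and that the drive reports its temperature as attribute 194 Temperature_Celsius in the smartctl -A attribute table (common, but the raw-value format varies by vendor). The function names, the /dev/ad8 default, the polling interval and the 50 C warning threshold are placeholders, not values from the drive's datasheet.

```python
import re
import subprocess
import time


def parse_temperature(smartctl_output):
    """Return the raw Temperature_Celsius value from smartctl -A output, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
        #                 UPDATED WHEN_FAILED RAW_VALUE [extra text...]
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            match = re.match(r"\d+", fields[9])  # raw may be "34" or "34/45"
            return int(match.group()) if match else None
    return None


def watch_temperature(device="/dev/ad8", interval=60, warn_at=50, rounds=None):
    """Poll smartctl -A every interval seconds and print the temperature."""
    done = 0
    while rounds is None or done < rounds:
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        temp = parse_temperature(out)
        stamp = time.strftime("%H:%M:%S")
        if temp is None:
            print(f"{stamp} {device}: no Temperature_Celsius attribute found")
        elif temp >= warn_at:
            print(f"{stamp} {device}: {temp} C  <-- at or above {warn_at} C")
        else:
            print(f"{stamp} {device}: {temp} C")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Started in a second terminal just before the cp, watch_temperature("/dev/ad8") gives a timestamped temperature log that can be lined up with the time the drive drops out.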

pierreact said:
I tried another destination drive (IDE) and could copy my files (it's the source disk that disconnects, not the destination!).

If you copy data to a much slower disk, the temperature may not rise as high.

Best regards

Rolf
 
jb_fvwm2 said:
Maybe not relevant:
1. Is there an onboard graphics controller you could swap out for a PCI-e card?
In any case:
2. For a more permanent fix, search the forum for "bwlimit" and you should come across a long rsync command line that you can put in a .sh script. It runs much slower than cp, about 1/8 the speed, but I've never had it fail, and it does not overload any SATA controller that I know of. (It's the 7th post in a Nov 2009 thread, to be exact.)

Thank you, but lowering the I/O rate, although a good idea for the copy itself, might not be a long-term solution.

The fact is, I'm copying these files because I want to recreate my software RAID1 (GEOM mirror), and the source disk is slightly bigger. So I copy the data over and will then recreate the GEOM mirror.

If I get this behavior on a simple cp, what will happen when synchronizing the disks?

Should I submit a PR for this, or does anyone have another suggestion?

Thank you.
 
rolfheinrich said:
Perhaps it is a temperature issue. Some years ago, I experienced occasional disk failures on another system, not FreeBSD. I decided it would be better to transfer everything to a new disk. At the beginning the transfer went fine, but after 30 minutes the source disk went south; it recovered some time later.

It was a very hot summer day, so I took the source disk out of its enclosure and put it in front of a portable fan. With that, the transfer at least went through and only a few files were unreadable, although the disk was already showing some problems.

Perhaps you can use smartmontools to check that the temperature of your source disk stays within its limits during lengthy operations.

If you copy data to a much slower disk, the temperature may not rise as high.

Best regards

Rolf

Thanks.
Smartmontools didn't report any abnormal temperature (34 °C).
 
von_Gaden said:
Hmmm... This sounds so similar to my problem described here:
http://forums.freebsd.org/showthread.php?t=25812
What is your MB, HDD, SATA controller?

ad8:

Model Family: Hitachi Deskstar 7K1000.B
Device Model: Hitachi HDT721032SLA360
Serial Number: STF204ML0UKLVP
Firmware Version: ST2OA31B
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4


ad10:

Model Family: Hitachi Deskstar 7K1000.B
Device Model: Hitachi HDT721032SLA360
Serial Number: STF204ML0UMZ3P
Firmware Version: ST2OA31B
User Capacity: 320,071,851,520 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
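For reference, the two "User Capacity" figures above differ by a little over 1 MiB rather than a few bytes. A quick Python check of that arithmetic, assuming 512-byte logical sectors (the sector size is an assumption, not something shown in the smartctl excerpts):

```python
AD8_BYTES = 320_072_933_376   # "User Capacity" of ad8 (source) from smartctl
AD10_BYTES = 320_071_851_520  # "User Capacity" of ad10 (destination) from smartctl
SECTOR_BYTES = 512            # assumed logical sector size

diff_bytes = AD8_BYTES - AD10_BYTES
diff_sectors, remainder = divmod(diff_bytes, SECTOR_BYTES)
diff_mib = diff_bytes / 2**20

print(f"ad8 is larger by {diff_bytes} bytes = {diff_sectors} sectors "
      f"(remainder {remainder}) = {diff_mib:.2f} MiB")
```

The difference works out to a whole number of 512-byte sectors, so it is a genuine size difference between the two drives rather than a rounding artifact.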



Attempt 1:

Motherboard: Gigabyte GA-VM900M
SATA controller: onboard VIA VT8237A

Attempt 2:

Motherboard: Gigabyte GA-VM900M
SATA controller: Silicon Image SIL3512ECTU128 PCI SATA card

Attempt 3:

Motherboard: Chaintech 9VIL3
SATA controller: Silicon Image SIL3512ECTU128 PCI SATA card

Same symptoms each time.
 
pierreact said:
I reproduced the kernel panic; it seems to be caused by the destination drive, maybe because it doesn't have time to write the data?
Possibly.
In general (based on personal experience), too many hardware errors while reading from or writing to a UFS filesystem will end in a panic.

pierreact said:
Could reading from the source faster than writing to the target cause a buffer overflow that kills the whole system?

No, not very likely: the system will not write faster than it can, so the copy simply slows down to the speed of the slower disk.
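The reason a faster source can't simply overflow a buffer is backpressure: when a bounded buffer is full, the reader blocks until the writer drains it. The Python toy model below illustrates that principle with a bounded queue.Queue between a reader thread and a deliberately slow writer thread. It is only an illustration of the idea, not how the FreeBSD buffer cache is implemented; the function name, buffer size and delay are arbitrary.

```python
import queue
import threading
import time


def copy_with_bounded_buffer(blocks, buffer_slots=4, write_delay=0.001):
    """Copy blocks through a bounded buffer; return (written, peak buffer use)."""
    buf = queue.Queue(maxsize=buffer_slots)  # put() blocks while the buffer is full
    written = []
    peak = 0
    done = object()  # sentinel marking the end of the input

    def reader():
        nonlocal peak
        for block in blocks:
            buf.put(block)  # waits here whenever the writer lags behind
            peak = max(peak, buf.qsize())
        buf.put(done)

    def writer():
        while True:
            block = buf.get()
            if block is done:
                break
            time.sleep(write_delay)  # simulate a slow destination disk
            written.append(block)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written, peak
```

However slow the writer is made, every block arrives in order and the buffer never holds more than buffer_slots entries; the reader just spends more time waiting.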

It seems that something is wrong with one or more of your drives, cables, or disk controller.
 
WRITE_DMA / READ_DMA errors are generally indicative of bad hardware (cable, port, drive, controller, etc). Putting a different filesystem on top of bad hardware won't make it work better. :)
 
Agreed, but a different driver might give additional information that helps pinpoint the actual faulty hardware.
As you can see earlier in the thread, I already tried different hardware.
 