I'm having a pretty repeatable problem on FreeBSD 9.1 with my servers either hanging or rebooting during a ZFS send/receive, or under heavy ZFS load.
I have a few OpenSolaris (Nexenta) servers, a few NAS4Free servers (based on FreeBSD 9.1), and a few plain FreeBSD 9.1 servers, all running ZFS with lots of disks behind them.
The issue is that when I send a ZFS snapshot over SSH to or from one of the FreeBSD-based machines, that machine either hangs or reboots partway through the transfer. I'm using current LSI SAS controllers in IT mode with commercial Supermicro SAS expanders, and a mixture of SAS and SATA drives. The problem has shown up both on systems with SAS drives and on systems with SATA drives. Of course, the files under /var/log contain nothing that would help. What should my next step be in troubleshooting this? How do I go about collecting crash information? I can generally crash a machine within a day or two.
I'm also getting the following on one of the larger machines, which crashes when doing write I/O out to the disks:
Code:
Jun 6 03:59:46 hostname kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 0 20 8a ed d5 0 0 d3 0 length 108032 SMID 737 terminated ioc 804b scsi 0 state c xfer 0
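For anyone reading that line, here is a minimal sketch of how the CDB bytes break down, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, 4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 to 8) and a 512-byte logical block size. The function name `decode_write10` and the `block_size` parameter are my own, for illustration only; the result agrees with the "length 108032" field in the message.

```python
def decode_write10(cdb_hex, block_size=512):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    block_size=512 is an assumption about the drive's logical block size.
    """
    cdb = [int(b, 16) for b in cdb_hex.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Bytes 2-5: logical block address, big-endian.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7-8: transfer length in blocks, big-endian.
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# CDB copied from the kernel message above.
info = decode_write10("2a 0 20 8a ed d5 0 0 d3 0")
print(info)
```

So the terminated command was a 211-block write, and 211 × 512 = 108032 bytes, which is the length the driver reports.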
Thanks!