ctld + zvol + Windows iSCSI initiator + NTFS = data loss issue?

Hello,

I have a problem and I am not sure whether I should open a bug for it. I couldn't find anyone reporting the same issue, and I don't know where the problem comes from, so I am writing here in the hope of getting some direction.

Architecture:

A ZFS zvol is exported by ctld as an iSCSI target (LUN). Configuration:
Code:
portal-group pg0 {
    discovery-auth-group no-authentication
    listen 10.0.80.1
    listen [::]
}
target iqn.hostname.domain.lan:zfs-mirror {
        auth-group no-authentication
        portal-group pg0
        lun 0 {
                path /dev/zvol/disk_d/iscsi_target
        }
}
The Windows 7 iSCSI initiator connects to it. On top of it there is an MBR (fdisk) partition table and an NTFS filesystem. The iSCSI connection uses a Mellanox 10 Gb interface. There was an issue with the Mellanox driver in FreeBSD 10.0, described here: https://forums.freebsd.org/viewtopic.php?f=32&t=47685, and I solved it by using the vendor-provided Mellanox driver as described in that topic.

Issue description:
iSCSI runs just fine, but at one point I had to run chkdsk (the Windows equivalent of fsck) on the NTFS filesystem, and that is where the problem appeared. Up to then I hadn't noticed any data corruption; there was about 600 GB of data on that NTFS partition. In more than three months of use I didn't notice any problems. There may have been some silent data corruption, but I didn't see any. The filesystem check should have taken less than five minutes, but it still hadn't finished after two hours. Then syslog started flooding with messages:
Code:
Sep 28 16:26:31 sentinel kernel: (0:2:0:0): SCSI Status: Check Condition
Sep 28 16:26:31 sentinel kernel: (0:2:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
Sep 28 16:26:31 sentinel kernel: (0:2:0:0): Command byte 0 is invalid
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): VERIFY(16). CDB: 8f 00 00 00 00 00 00 04 cd 88 00 00 00 08 00 00
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): Tag: 0x9ab0000, Type: 1
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): CTL Status: SCSI Error
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): SCSI Status: Check Condition
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
Sep 28 16:26:32 sentinel kernel: (0:2:0:0): Command byte 0 is invalid
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): VERIFY(16). CDB: 8f 00 00 00 00 00 00 05 3b 28 00 00 00 08 00 00
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): Tag: 0xbdb80000, Type: 1
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): CTL Status: SCSI Error
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): SCSI Status: Check Condition
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
Sep 28 16:26:33 sentinel kernel: (0:2:0:0): Command byte 0 is invalid
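The rejected commands in the log can be decoded to see what chkdsk was asking for. Below is a rough sketch in Python, assuming the standard SBC-3 VERIFY(16) CDB layout (operation code 0x8F in byte 0, an 8-byte big-endian LBA in bytes 2-9, a 4-byte verification length in bytes 10-13, control in byte 15). The helper names `decode_verify16` and `cdbs_from_log` are hypothetical, just for illustration.

```python
import re

VERIFY_16 = 0x8F  # SCSI operation code for VERIFY(16)

# Captures the space-separated hex bytes that follow "CDB:" at the end of a log line.
CDB_RE = re.compile(r"CDB:\s*((?:[0-9a-f]{2}\s*)+)$")


def decode_verify16(cdb: bytes) -> dict:
    """Decode a 16-byte VERIFY(16) CDB, assuming the SBC-3 field layout."""
    if len(cdb) != 16 or cdb[0] != VERIFY_16:
        raise ValueError("not a VERIFY(16) CDB")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:10], "big"),      # bytes 2-9: logical block address
        "blocks": int.from_bytes(cdb[10:14], "big"),  # bytes 10-13: verification length
        "control": cdb[15],
    }


def cdbs_from_log(lines):
    """Yield a decoded dict for every VERIFY(16) CDB found in kernel log lines."""
    for line in lines:
        m = CDB_RE.search(line.strip())
        if m:
            cdb = bytes.fromhex(m.group(1))  # fromhex skips the spaces between bytes
            if len(cdb) == 16 and cdb[0] == VERIFY_16:
                yield decode_verify16(cdb)


if __name__ == "__main__":
    log = [
        "Sep 28 16:26:32 sentinel kernel: (0:2:0:0): VERIFY(16). CDB: "
        "8f 00 00 00 00 00 00 04 cd 88 00 00 00 08 00 00",
        "Sep 28 16:26:33 sentinel kernel: (0:2:0:0): VERIFY(16). CDB: "
        "8f 00 00 00 00 00 00 05 3b 28 00 00 00 08 00 00",
    ]
    for cmd in cdbs_from_log(log):
        print(f"VERIFY(16) lba={cmd['lba']} blocks={cmd['blocks']}")
```

For the two lines above this prints LBA 314760 and LBA 342824, 8 blocks each: short VERIFY requests that the target rejected with ILLEGAL REQUEST (asc 20,0).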
After waiting for a while I stopped ctld and found that the partition table on the LUN was corrupted; Windows saw the disk as RAW. After I recreated the partition table on FreeBSD, Windows still didn't recognize the NTFS filesystem. It was pure luck that I was able to mount it with ntfs-3g on my FreeBSD system and extract the data. It seems that chkdsk damaged not only the filesystem but also the partition table, which is strange :D

This may be a Windows issue that doesn't belong here, but I am not sure, so I am describing both sides (FreeBSD and Windows) to give a full description of the issue.

After recovering my data I created a brand-new LUN (zvol), presented it to Windows, mounted it, and transferred about 200 GB of data to it with no issues. I ran chkdsk from Windows again and hit the same issue. This time I had a ZFS snapshot, which made recovery easy ;)
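The snapshot-before-chkdsk routine can be scripted so it isn't forgotten. A minimal Python sketch follows, assuming the zvol dataset is `disk_d/iscsi_target` (derived from the `path` in the ctld configuration above) and using a hypothetical snapshot name `before-chkdsk`. It only wraps the standard `zfs snapshot`, `zfs rollback` and `service ctld stop/start` commands. ctld is stopped before the rollback, as in the recovery above, so the LUN is not exported while it changes underneath the initiator.

```python
import subprocess
from typing import List

DATASET = "disk_d/iscsi_target"  # zvol backing LUN 0 (assumed from the ctld path)
SNAP = "before-chkdsk"           # hypothetical snapshot name


def snapshot_cmds(dataset: str, snap: str) -> List[List[str]]:
    """Commands to run on FreeBSD before starting chkdsk on the initiator."""
    return [["zfs", "snapshot", f"{dataset}@{snap}"]]


def rollback_cmds(dataset: str, snap: str) -> List[List[str]]:
    """Commands to stop the target, roll the zvol back, and export it again."""
    return [
        ["service", "ctld", "stop"],               # stop exporting the LUN first
        ["zfs", "rollback", f"{dataset}@{snap}"],  # discard writes made since the snapshot
        ["service", "ctld", "start"],
    ]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run(snapshot_cmds(DATASET, SNAP))  # dry run: only prints the command
```

Note that `zfs rollback` only returns to the most recent snapshot unless `-r` is given, and `-r` destroys any newer snapshots.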

What makes me think the problem is in FreeBSD? After rolling back to the ZFS snapshot, I presented the same LUN via istgt instead of ctld and ran chkdsk from Windows, and it completed without any issues at all. Data transfers also work fine. So the problem may be in ctld, but I can't tell for sure.

Please advise: could this be a configuration issue, or is it a bug?

Thank you.
 
Which exact FreeBSD version are you using? CTL is under active development now. The messages you posted show that the VERIFY SCSI command is not supported by your version, but I implemented it a couple of months ago, together with many other features. Please update your system and try again before submitting a bug report. Feedback after updating is welcome. :)
 
Hello,

I am using FreeBSD 10.0-RELEASE amd64 (x86-64, x64). I will update the system next weekend and report back on whether the issue is solved.

Thank you.
 