Solved 3ware: only 16 of the 21 LUNs are detected / tws driver

jcatrysse

New Member

Reaction score: 1
Messages: 15

FreeBSD 10.3-RELEASE-p7 (GENERIC)

I have a 3Ware 9750-8i controller on a Supermicro systemboard. I have one RAID6 array of about 40TB split into 21 smaller volumes, but only 16 of them are detected by the system.

The system does not recognize more than 16 LUNs on the same BUS and the same TARGET. It seems to me that the LUN numbering is not correct.
I think LUNs should be numbered 0 to 15 (and upwards) and not 0 to F.

CAMCONTROL DEVLIST
Code:
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 0 (pass0,da0)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 1 (pass1,da1)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 2 (pass2,da2)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 3 (pass3,da3)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 4 (pass4,da4)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 5 (pass5,da5)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 6 (pass6,da6)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 7 (pass7,da7)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 8 (pass8,da8)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun 9 (pass9,da9)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun a (pass10,da10)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun b (pass11,da11)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun c (pass12,da12)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun d (pass13,da13)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun e (pass14,da14)
<LSI 9750-8i    DISK 5.12>         at scbus0 target 0 lun f (pass15,da15)

DMESG
Code:
(probe0:tws0:0:0:10): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:tws0:0:0:10): CAM status: Invalid Lun
(probe0:tws0:0:0:10): Error 22, Unretryable error
(probe0:tws0:0:0:11): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:tws0:0:0:11): CAM status: Invalid Lun
(probe0:tws0:0:0:11): Error 22, Unretryable error
(probe0:tws0:0:0:12): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:tws0:0:0:12): CAM status: Invalid Lun
(probe0:tws0:0:0:12): Error 22, Unretryable error
(probe0:tws0:0:0:13): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:tws0:0:0:13): CAM status: Invalid Lun
(probe0:tws0:0:0:13): Error 22, Unretryable error
(probe0:tws0:0:0:14): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:tws0:0:0:14): CAM status: Invalid Lun
(probe0:tws0:0:0:14): Error 22, Unretryable error

How can I fix this issue? I contacted 3Ware support and they claim it is an OS or systemboard issue... I am clueless.

Thank you,
Jan
 

Oko

Daemon

Reaction score: 796
Messages: 1,620

What happens if you download the Springdale DVD (a Red Hat clone from Princeton University, which I use) and boot it in live/rescue mode

http://springdale.princeton.edu/data/springdale/7.2/x86_64/iso/Springdale Linux-7.2-x86_64-DVD.iso

and try to see the HDDs? Unfortunately, it is very likely that the DVD doesn't come with megacli or storcli installed. You will probably need to build your own DVD rescue image with one of these tools installed. IIRC megacli is now forked open source. If Red Hat can't see all the drives then you have a real problem.
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

I will give it a try tomorrow. At 3ware they told me this could be an INT13 BIOS issue… but I don't see how…
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

I'm still thinking about the bizarre numbering of the LUNs. Is it a coincidence that it stops after 0xF? Maybe not enough bytes are reserved somewhere?

In the meantime I contacted SuperMicro support (systemboard) and they assure me this is not a BIOS or systemboard issue.

MegaCli did not work because this is a "3Ware" type adapter. I installed http://www.freshports.org/sysutils/tw_cli instead, and there everything seems to be fine.

Does this make sense to anyone? I cannot find any straightforward issues. (See next post; the message was too long.)

//nasbezoom03/c0/u0> show all
Code:
/c0/u0 status = OK
/c0/u0 is not rebuilding, its current state is OK
/c0/u0 is not verifying, its current state is OK
/c0/u0 is initialized.
/c0/u0 Write Cache = on
/c0/u0 Read Cache = Intelligent
/c0/u0 volume(s) = 21
/c0/u0 name = nasbezoom03
/c0/u0 serial number = F07682438D01A80096BB
/c0/u0 Ignore ECC policy = on
/c0/u0 Auto Verify Policy = on
/c0/u0 Storsave Policy = balance
/c0/u0 Command Queuing Policy = on
/c0/u0 Rapid RAID Recovery setting = disable
/c0/u0 Parity Number = 2

Unit     UnitType  Status         %RCmpl  %V/I/M  VPort Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-6    OK             -       -       -     256K    40978.1
u0-0     DISK      OK             -       -       p8    -       3725.28
u0-1     DISK      OK             -       -       p9    -       3725.28
u0-2     DISK      OK             -       -       p10   -       3725.28
u0-3     DISK      OK             -       -       p11   -       3725.28
u0-4     DISK      OK             -       -       p15   -       3725.28
u0-5     DISK      OK             -       -       p13   -       3725.28
u0-6     DISK      OK             -       -       p14   -       3725.28
u0-7     DISK      OK             -       -       p17   -       3725.28
u0-8     DISK      OK             -       -       p18   -       3725.28
u0-9     DISK      OK             -       -       p19   -       3725.28
u0-10    DISK      OK             -       -       p21   -       3725.28
u0-11    DISK      OK             -       -       p22   -       3725.28
u0-12    DISK      OK             -       -       p20   -       3725.28
u0/v0    Volume    -              -       -       -     -       250
u0/v1    Volume    -              -       -       -     -       2048
u0/v2    Volume    -              -       -       -     -       2048
u0/v3    Volume    -              -       -       -     -       2048
u0/v4    Volume    -              -       -       -     -       2048
u0/v5    Volume    -              -       -       -     -       2048
u0/v6    Volume    -              -       -       -     -       2048
u0/v7    Volume    -              -       -       -     -       2048
u0/v8    Volume    -              -       -       -     -       2048
u0/v9    Volume    -              -       -       -     -       2048
u0/v10   Volume    -              -       -       -     -       2048
u0/v11   Volume    -              -       -       -     -       2048
u0/v12   Volume    -              -       -       -     -       2048
u0/v13   Volume    -              -       -       -     -       2048
u0/v14   Volume    -              -       -       -     -       2048
u0/v15   Volume    -              -       -       -     -       2048
u0/v16   Volume    -              -       -       -     -       2048
u0/v17   Volume    -              -       -       -     -       2048
u0/v18   Volume    -              -       -       -     -       2048
u0/v19   Volume    -              -       -       -     -       2048
u0/v20   Volume    -              -       -       -     -       1816.08

Regards,
Jan
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,346
Messages: 38,863

I'm still thinking about the bizarre numbering of the LUNs. Is it a coincidence that it stops after 0xh?
You may be right if it shows 0xh, that's not a valid hexadecimal number. Hexadecimal digits are 0 to F (decimal 0 to 15).
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

I tried that Springdale DVD and this is the error I get:
Code:
scsi 1:0:0:0: lun17 has a LUN larger than allowed by the host adapter
scsi 1:0:0:0: lun18 has a LUN larger than allowed by the host adapter
scsi 1:0:0:0: lun19 has a LUN larger than allowed by the host adapter
scsi 1:0:0:0: lun20 has a LUN larger than allowed by the host adapter

Any ideas on that one?
 

tingo

Son of Beastie

Reaction score: 638
Messages: 2,544

Hmm, is the maximum LUN number (or something like that) a parameter you can set or change via tw_cli or another tool?
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

Well, I checked the 3ware BIOS, the web interface, tw_cli and the driver boot options… I could not find such a setting in any of them.

I reopened a ticket at 3ware… hoping to get a different technical support agent.
 

Terry_Kennedy

Aspiring Daemon

Reaction score: 340
Messages: 970

Well, I checked the 3ware BIOS, the web interface, tw_cli and the driver boot options… I could not find such a setting in any of them.

I reopened a ticket at 3ware… hoping to get a different technical support agent.
You have a 3Ware 9750 8-port internal controller for SAS / SATA drives. Your output shows 13 physical drives, so you have a SAS / SATA expander somewhere in that system. If you have SATA drives, that can cause you serious headaches down the road (search for "SATA on SAS expander problems").

Is this a volume with data on it that you're trying to import from some other operating system? If not, you might want to try some native FreeBSD solutions instead of creating a single large volume and slicing it up into 2TB chunks on the controller.

First, it looks like your controller has the "auto-carve 2TB" option set. That is for older operating systems that don't support volumes larger than 2TB. There's no reason you couldn't export the whole 40TB u0 to FreeBSD in a single chunk. Note that with "classic" filesystems like UFS, a fsck(8) is likely to be painfully slow.

You might want to consider exporting the individual drives to FreeBSD and then building a ZFS pool out of them. ZFS lets you specify mount points, quotas, etc. so you could replicate your "lots of 2TB volumes" if that's what you really want to do.

Please bear in mind my caution about SATA drives on SAS expanders, though - no matter what system you choose, this can cause problems when you need them least. [Executive summary: When a SATA drive behind a SAS expander experiences a problem, the expander usually goes "resets for everybody!" which causes multiple other drives to drop out of the RAID set / pool. Not Good.]
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

Hi Terry,

Thank you for your point-of-view. We created it that way because we have some other older servers and older legacy software that depends on it. But we should indeed move on, your info is very interesting.

We have all SAS drives, on this server anyway ;-)

At this time I am trying to locate enough drives to move all my data and to rebuild on another, more modern, type of volume.

But no other solution at this time and no answer from 3ware ... yet... bummer.

Regards,
Jan
 

mav@

Aspiring Daemon
Developer

Reaction score: 220
Messages: 704

In tws(4) driver sources I see:
if (ccb_h->target_lun >= TWS_MAX_NUM_LUNS) {
...
ccb_h->status |= CAM_LUN_INVALID;
}
where TWS_MAX_NUM_LUNS is 16. So it is clear why it does not work: any LUN numbered 16 or higher is rejected as invalid. The only question is whether higher LUNs should really work, but that should be addressed to the vendor as the driver's author.
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

Hi guys,

Could anyone tell me whether it would be possible to simply change the value to, let's say, 32 and recompile the driver? I'm not really into that, but it can't be that difficult, can it?

tws.h
Code:
#define TWS_MAX_NUM_LUNS 16

Thnx
 

mav@

Aspiring Daemon
Developer

Reaction score: 220
Messages: 704

Looking at the code after that check, higher LUNs (at least up to 255) seem to be supposed to work, but it is difficult to say for sure without having the specs.
 
OP
J

jcatrysse

New Member

Reaction score: 1
Messages: 15

I've set the value to 32 and I am recompiling. I will keep you posted on the result.
 