iSCSI Multipath target / Network multipath

Hi,

We are in the process of setting up a new iSCSI target based on FreeBSD, ZFS and istgt(1).

The virtual hosts are running ESXi, and we have a redundant storage network using MPIO from the hosts to our current storage.

istgt(1) supports MPIO, but the FreeBSD network stack has no kernel support for multipathing other than setfib(8). We get round-robin on incoming traffic, but outgoing traffic always leaves on the default interface.

Has anyone got a working solution for iSCSI MPIO/multipath on a FreeBSD target?

//JO
 
Hi onob,

Were you able to get MPIO working with istgt? I am setting up a similar system with 4 NICs and see the same thing: round-robin on writes, but a single interface on reads.

Were you able to get lagg(4) to work?
 
We are up and running with MPIO and lagg(4). It works very well.

The lagg setup is in failover mode with two ports in each lagg interface, using two lagg interfaces (lagg0 and lagg1) for MPIO. Each lagg has its own subnet.
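
For reference, the relevant part of rc.conf looks roughly like this (interface names and addresses below are just illustrative, not necessarily our exact setup):

Code:
cloned_interfaces="lagg0 lagg1"
ifconfig_igb0="up"
ifconfig_igb1="up"
ifconfig_igb2="up"
ifconfig_igb3="up"
# two failover laggs, one per storage subnet, used as two MPIO paths
ifconfig_lagg0="laggproto failover laggport igb0 laggport igb1 172.16.68.10/24"
ifconfig_lagg1="laggproto failover laggport igb2 laggport igb3 172.16.69.10/24"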

Our other SAN from "the big hardware provider" can put all ports on the same subnet, and we tried to do the same thing in FreeBSD using tools like policy routing and setfib(8). It turns out this is not doable when it comes to daemons like istgt: incoming traffic is no problem, but everything leaving the FreeBSD box takes the route for the destination subnet.
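
To give an idea, this is roughly the kind of setfib(8) experiment we tried (gateway address and the rc.d path are placeholders); it works for per-process routing, but it does not help a single daemon answer correctly on multiple paths:

Code:
# /boot/loader.conf -- enable a second routing table
net.fibs="2"

# populate FIB 1 with its own default route, then start the daemon under it
setfib 1 route add default 172.16.69.1
setfib 1 /usr/local/etc/rc.d/istgt onestart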

//JO
 
Hello,

If you are running on the same network and same switch, you can use lagg.

But my experience with VMware and iSCSI is that if you have a few separate networks, you can use multipathing (usually round robin) to get better performance and higher fault tolerance.
If an entire switch goes down, not just a single port, you are otherwise in trouble.
This only applies if you have more than one network and more than one switch, though.

How did I do this? Just make istgt listen on all interfaces and allow initiators from all of your networks. Then, in the iSCSI adapter of your ESX/ESXi, manually add all the IPs of your SAN on all the networks and multipathing will be set up. The initiator just checks the LUN device name, serial number and all that stuff; if the LUN characteristics are the same at every IP listed there, it is automatically recognized as a single device with many paths to it.
I know there is a dynamic discovery possibility, because when I used FreeNAS and set istgt up through the web configuration, I only gave ESX a single IP of the SAN and it discovered all of them automatically. When I switched to FreeBSD and configured istgt manually, dynamic discovery didn't work, so I just listed all the IP addresses of the SAN by hand. Failover still works, aggregation still works, so I didn't worry about the dynamic discovery.
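
For those who prefer the command line on the ESXi side, the same thing can be done with esxcli; the adapter name, addresses and IQN below are only placeholders:

Code:
# one static target entry per SAN IP (repeat for every path)
esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=172.16.68.10:3260 --name=iqn.2012-12.com.example:target0
esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=172.16.69.10:3260 --name=iqn.2012-12.com.example:target0
# or a single send-targets (dynamic) entry, then rescan
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=172.16.68.10:3260
esxcli storage core adapter rescan --adapter=vmhba33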
However, if you are using a single switch and a single network, lagg may be the better way to get higher performance, because it works at OSI layer 2 and the aggregation is largely offloaded to the switches.

Hope this was useful. If you need an istgt config file, I can provide one. As for the ESX/ESXi configuration, it is all done through the GUI, and you can't go wrong there.

Good luck.
 
Hello,

This is what I have got. Please note that this is a slightly risky configuration, because it allows everything from everywhere; I do the restrictions with my firewall, not with the built-in restrictions of istgt. So consider adding authentication or restrictions on the initiators: there are comments in the sample config file on how to do that, and a sketch of what that might look like follows the config below.
Code:
san root >cat /usr/local/etc/istgt/istgt.conf | grep -v \#
[Global]
  Comment "Global section"
  NodeBase "san.mysystem.com"
  PidFile /var/run/istgt.pid
  LogFacility "local7"
  Timeout 30
  NopInInterval 20
  DiscoveryAuthMethod Auto
  MaxSessions 64
  MaxConnections 16
  MaxR2T 256
  MaxOutstandingR2T 16
  DefaultTime2Wait 2
  DefaultTime2Retain 60
  FirstBurstLength 262144
  MaxBurstLength 1048576
  MaxRecvDataSegmentLength 262144
  ImmediateData Yes
  DataPDUInOrder Yes
  DataSequenceInOrder Yes
  ErrorRecoveryLevel 0
[UnitControl]
  Comment "Internal Logical Unit Controller"
  AuthMethod Auto
  AuthGroup AuthGroup10000
  Portal UC1 127.0.0.1:3261
  Netmask 127.0.0.1
[PortalGroup1]
  Comment "ANY IP"
  Portal DA1 0.0.0.0:3260
[InitiatorGroup1]
  Comment "Initiator Users"
  InitiatorName "ALL"
[LogicalUnit1]
  Comment "My LUN"
  TargetName Name-of-LUN
  TargetAlias "alias-name-of-lun"
  Mapping PortalGroup1 InitiatorGroup1
  AuthMethod Auto
  AuthGroup AuthGroup1
  UseDigest Auto
  UnitType Disk
  QueueDepth 255
  LUN0 Storage /dev/zvol/datacore/istgt.block.device Auto

san root >cat /usr/local/etc/istgt/auth.conf | grep -v \#
  Comment "Auth Group1"
san root >
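
As mentioned above, a more locked-down variant might look roughly like this (the initiator IQN, subnet and CHAP credentials are placeholders):

Code:
# istgt.conf -- only accept a specific initiator from the storage subnet
[InitiatorGroup1]
  Comment "Only our ESXi hosts"
  InitiatorName "iqn.1998-01.com.vmware:esx-host-01"
  Netmask 192.168.10.0/24

# ...and in [LogicalUnit1] change "AuthMethod Auto" to "AuthMethod CHAP"

# auth.conf -- CHAP credentials referenced by AuthGroup1
[AuthGroup1]
  Comment "Auth Group1"
  Auth "chapuser" "chapsecret123456"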

Hope this was useful.
 
Thank you.

I will try things out.

We currently use NFS to share our virtual machines, but we have disabled sync on the dataset; ZFS, NFS and ESXi together are really slow when sync is enabled.
Maybe it is wiser to use iSCSI for the ESXi datastore.

regards
Johan
 
I like iSCSI mostly because of the multipathing: you get fault tolerance and aggregation at the same time. For example, at midnight when the backups are running, I have a cron job that brings down the interface connected to the switch that carries the backups. After the backups are finished, the script brings the interface up again, and within a few minutes all HBAs are rescanned automatically and the path is restored.
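A minimal sketch of that cron job (the interface name and times are made up for illustration):
Code:
# /etc/crontab
# take the path through the backup switch down before the backup window...
0   0   *   *   *   root    /sbin/ifconfig igb1 down
# ...and bring it back afterwards; ESXi rescans the HBAs and restores the path
0   4   *   *   *   root    /sbin/ifconfig igb1 up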
Best practice says the LUN should sit directly on a block device, not on a file inside an already mounted filesystem on your SAN.
Also, if you are using ESXi 5.1, it supports iSCSI over jumbo frames with MTU 9000. istgt has no trouble with MTU 9000 either, so this is a real network performance gain.
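For example, something like this sets and verifies jumbo frames (interface and peer address are placeholders; every NIC and switch port in the path must be set to MTU 9000):
Code:
ifconfig igb0 mtu 9000           # or add "mtu 9000" to the ifconfig_* line in rc.conf
ping -D -s 8972 172.16.68.1      # don't-fragment; 8972 = 9000 - 20 (IP) - 8 (ICMP)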
 
The one thing I did like about NFS was the fact that I can see the files on my filesystem, so it was quite easy to insert files into the datastore from FreeBSD itself.

But the fact that sync is disabled gives me more and more the shivers.

Regards
Johan
 
Has anyone experienced very poor performance with iSCSI (istgt) on FreeBSD 9.0?

It seems like I cannot get good performance even with sync=disabled. This is especially true for smaller block sizes (< 16k).
I am using zvols as backing for iSCSI. Is it better to use a file?
 
Sylhouette said:
Thank you.

I will try things out.

We currently use NFS to share our virtual machines, but we have disabled sync on the dataset; ZFS, NFS and ESXi together are really slow when sync is enabled.
Maybe it is wiser to use iSCSI for the ESXi datastore.

regards
Johan

Disabling sync is never the answer. Period. You are facing a real danger of corrupting your data, and I sincerely hope for your sake that you have good backups for when that time comes. I advise you to shell out a couple of bucks on two Vertex 4 256GB MLC drives that you configure as a mirrored SLOG, so that I can sleep better at night, not worrying on your behalf :)
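
A minimal sketch of what that would look like (pool, dataset and device names are placeholders):
Code:
zpool add tank log mirror ada1 ada2   # mirrored SLOG on the two SSDs
zfs set sync=standard tank/nfs        # put sync back to its default on the NFS dataset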

/Sebulon
 
Hi Sebulon,
Thank you for your reply.

I know that disabling sync (the ZIL) is a bad thing; this was done only to test the iSCSI connection.

My setup is 24x Intel 520 480GB SSDs, so there is no need for a separate SSD SLOG; a RAM disk might give me more performance, but I am banking on my all-SSD array having enough performance.

The FreeBSD setup is just a test before I take it into production using Nexenta. My plan was to benchmark FreeBSD for possible future installations. However, I am seeing awful iSCSI performance using istgt.

I share iSCSI to Windows 7 over 1GbE; in production this will be MPIO over 10GbE.

At 4k random writes I get about 8MB/s on FreeBSD, measured using IOMeter in Windows.
With Nexenta the same test gives 76MB/s at 100% random 4k writes.

The pool is configured as a plain stripe (RAID0), again only for testing :) The production setup will be mirrored.

Any ideas on what is causing the bad performance on FreeBSD? Have you seen successful setups like this one before on FreeBSD?
 
bjwela said:
Hi Sebulon,
Thank you for your reply.

I know that disabling sync (the ZIL) is a bad thing; this was done only to test the iSCSI connection.

My setup is 24x Intel 520 480GB SSDs, so there is no need for a separate SSD SLOG; a RAM disk might give me more performance, but I am banking on my all-SSD array having enough performance.

The FreeBSD setup is just a test before I take it into production using Nexenta. My plan was to benchmark FreeBSD for possible future installations. However, I am seeing awful iSCSI performance using istgt.

I share iSCSI to Windows 7 over 1GbE; in production this will be MPIO over 10GbE.

At 4k random writes I get about 8MB/s on FreeBSD, measured using IOMeter in Windows.
With Nexenta the same test gives 76MB/s at 100% random 4k writes.

The pool is configured as a plain stripe (RAID0), again only for testing :) The production setup will be mirrored.

Any ideas on what is causing the bad performance on FreeBSD? Have you seen successful setups like this one before on FreeBSD?

Wow, a bit different budget than my tinker systems usually have had, haha! In that case you may want to look up STEC's ZeusRAM SSD, it's a real killer ;)

We are using istgt very successfully in our organisation over 1GbE, so it is definitely possible. Some things to keep in mind though: 1) the documentation is horrible, so you're basically just guessing your way through, and so did we; and 2) be very careful how you design your pool (partitioning/ashift), benchmark performance locally first, then configure and test performance remotely, so you know whether you're on the right path at all.

When creating the zvol it is very important to specify the same block size that the remote file system will be using. NTFS defaults to 4k, which you can set with:
# zfs create -b 4k -o sync=always -o compress=on -s -V 1t pool/lun
If you are going to store larger files, you can set the NTFS/zvol block size all the way up to 64k to get even better performance. istgt does not sync by default, which is the reason you get "better" throughput over iSCSI than you would normally get over NFS, but it is dangerous with regard to data corruption. I usually specify "-s" to have the zvol thin provisioned. I'd also recommend configuring the network interfaces for jumbo frames to lower the per-packet overhead; that can give you better performance, especially if your switches are already stressed as it is.
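For instance, a larger-block variant plus a quick check of what the zvol actually got (the dataset name is just an example):
# zfs create -b 64k -o sync=always -o compress=on -s -V 1t pool/lun64
# zfs get volblocksize,sync,compression pool/lun64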

/Sebulon
 
Aloha,
Has anyone experienced very poor performance with iSCSI (istgt) on FreeBSD 9.0?

It seems like I cannot get good performance even with sync=disabled. This is especially true for smaller block sizes (< 16k).
I am using zvols as backing for iSCSI. Is it better to use a file?

I had this problem and it is discussed in this topic.
In short, it seems that no matter how good your hardware is, GEOM RAID will always be faster than ZFS. You can buy very expensive hardware and get great performance with ZFS, but even then GEOM will be faster. Of course you lose great features, and from a data integrity point of view ZFS is always the best choice.
 
Hi Sebulon,

Thanks for your reply.

I have set up all the disks with gpart(8) and gnop(8):

Code:
gpart create -s GPT /dev/$drive
gpart add -t freebsd-zfs -l disk$i -b 2048 -a 4k /dev/$drive   # 4k-aligned, labelled partition
gnop create -S 4096 /dev/gpt/disk$i                            # 4k-sector nop provider so the pool gets ashift=12

Then I created the zpool using the .nop devices.

Code:
zpool create tank1 disk1.nop disk2.nop ....
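
To double-check that the pool really picked up ashift=12 from the .nop providers, something like this should do:

Code:
zdb -C tank1 | grep ashift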

I will try setting up the zvols with a 4k block size and sync=always.

Are there any obvious settings in istgt that could cause really bad performance?

istgt.conf:

Code:
[Global]
  Comment "Global section"
  NodeBase "iqn.2012-12.com.adq"
  PidFile /var/run/istgt.pid
  AuthFile /usr/local/etc/istgt/auth.conf
  MediaDirectory /var/istgt
  LogFacility "local7"
  Timeout 30
  NopInInterval 20
  DiscoveryAuthMethod Auto
  MaxSessions 16
  MaxConnections 4
  MaxR2T 32
  MaxOutstandingR2T 16
  DefaultTime2Wait 2
  DefaultTime2Retain 60
  FirstBurstLength 262144
  MaxBurstLength 1048576
  MaxRecvDataSegmentLength 262144

  # NOTE: not supported
  InitialR2T Yes
  ImmediateData Yes
  DataPDUInOrder Yes
  DataSequenceInOrder Yes
  ErrorRecoveryLevel 0

[UnitControl]
  Comment "Internal Logical Unit Controller"
  AuthMethod CHAP Mutual
  AuthGroup AuthGroup10000
  Portal UC1 127.0.0.1:3261
  Netmask 127.0.0.1

[PortalGroup1]
  Comment "igb0"
  Portal DA1 10.30.0.165:3260

[InitiatorGroup1]
  Comment "Initiator Group1"
  InitiatorName "ALL"
  Netmask 10.30.0.0/24
  
[LogicalUnit1]
  Comment "zvol1"
  TargetName zvol1
  TargetAlias "DiskTarget zvol1"
  Mapping PortalGroup1 InitiatorGroup1
  AuthMethod Auto
  AuthGroup AuthGroup1
  UseDigest Auto
  UnitType Disk
  QueueDepth 64
  LUN0 Storage /dev/zvol/tank/zvol1 Auto

Are there any ZFS tunables that need to be set properly?
 
@bjwela

That looks textbook, nicely done. Tuning is mostly evil: if you have any tunables set, remove them and see whether that actually gives you better performance than you had with them set. The next step would be to install benchmarks/bonnie++ to verify your performance locally first:
# bonnie++ -d /foo/bar -u 0 (if running as root)

/Sebulon
 
Hi all,

Our setup works fine when it comes to throughput, but worse when it comes to latency (as in too much of it):

Hardware:
Xeon E3-1270 V2 @ 3.50GHz
32GB RAM
Supermicro Motherboard and chassis
2xSSD for ZIL
1xSSD for cache
2xSATA mirrored vdev

The network uses two lagg interfaces in failover mode; it should be able to do 2x1Gbit/s using MPIO and round-robin in ESXi 5.

Code:
[root@storage2 ~]# zpool status
  pool: tank1
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Thu Nov 22 13:57:59 2012
config:

        NAME          STATE     READ WRITE CKSUM
        tank1         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da2       ONLINE       0     0     0
            da3       ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da4       ONLINE       0     0     0
            da5       ONLINE       0     0     0
        logs
          mirror-2    ONLINE       0     0     0
            ada1      ONLINE       0     0     0
            ada2      ONLINE       0     0     0
        cache
          ada0        ONLINE       0     0     0

errors: No known data errors

When we put some load on the target, our monitoring software (Veeam One) starts alarming on high latency. Watching the system, the network interfaces burst up to ~45MB/s and latency then rises to about 200ms, sometimes as high as 500ms. Nothing strange shows up in top, gstat or netstat.
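
For reference, this is roughly what we watch while the load is on (pool and interface names are ours); zpool iostat -v is the interesting one, since it shows whether the log mirror is actually absorbing the writes:

Code:
zpool iostat -v tank1 1     # per-vdev operations and bandwidth, including log and cache devices
gstat                       # per-provider busy% and I/O times
netstat -w 1 -I lagg0       # packets/bytes per second on each lagg interface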

istgt.conf:

Code:
[Global]
Comment "Global section"
# node name (not include optional part)
NodeBase "storage2.x.istgt"

# files
PidFile /var/run/istgt.pid
AuthFile /usr/local/etc/istgt/auth.conf

# syslog facility
LogFacility "local7"

# socket I/O timeout sec. (polling is infinity)
Timeout 30
# NOPIN sending interval sec.
NopInInterval 20

# authentication information for discovery session
DiscoveryAuthMethod None
#DiscoveryAuthGroup AuthGroup9999

# reserved maximum connections and sessions
# NOTE: iSCSI boot is 2 or more sessions required
MaxSessions 32
MaxConnections 8

# iSCSI initial parameters negotiate with initiators
# NOTE: incorrect values might crash
FirstBurstLength 262144
MaxBurstLength 262144
MaxRecvDataSegmentLength 262144

[UnitControl]
Comment "Internal Logical Unit Controller"
#AuthMethod Auto
AuthMethod CHAP Mutual
AuthGroup AuthGroup10000
# this portal is only used as controller (by istgtcontrol)
# if it's not necessary, no portal is valid
#Portal UC1 [::1]:3261
Portal UC1 127.0.0.1:3261
# accept IP netmask
#Netmask [::1]
Netmask 127.0.0.1

[PortalGroup1]
Comment "esx-grp1 lagg0"
Portal DA1 172.16.68.10:3260
Portal DA2 172.16.69.10:3260

[InitiatorGroup1]
Comment "Initiator Group esx-grp1"
InitiatorName "ALL"
Netmask 172.16.68.0/24
Netmask 172.16.69.0/24

[LogicalUnit1]
Comment "esx-grp1-sata2"
TargetName esx-grp1-sata2
TargetAlias "esx-grp1-sata2"
# use initiators in tag1 via portals in tag1
Mapping PortalGroup1 InitiatorGroup1
# accept both CHAP and None
AuthMethod None
AuthGroup AuthGroup1
UnitType Disk
# Queuing 0=disabled, 1-255=enabled with specified depth.
QueueDepth 128
#QueueDepth 16
LUN0 Storage /tank1/sata2/sata2 3TB

[LogicalUnit2]
Comment "esx-grp1-sata3"
TargetName esx-grp1-sata3
TargetAlias "esx-grp1-sata3"
# use initiators in tag1 via portals in tag1
Mapping PortalGroup1 InitiatorGroup1
# accept both CHAP and None
AuthMethod None
AuthGroup AuthGroup1
UnitType Disk
# Queuing 0=disabled, 1-255=enabled with specified depth.
QueueDepth 128
#QueueDepth 16
LUN0 Storage /tank1/sata3/sata3 3TB

Does anyone have an idea how to decrease latency under a "not that heavy" load? Peaks of 2x45MB/s should not be enough to cause this, should they?

//JO
 
@onob

As I stated earlier, it is very important how you design your pool with regard to partitioning and optimizing for 4k, and the log and cache devices are just as important. bjwela posted an accurate setup you can follow. Benchmark your performance locally first to know whether you are on the right path; you cannot expect better values remotely than you get locally.

Code:
LUN0 Storage /tank1/sata2/sata2 3TB

That looks to me like a file called "sata2" inside a ZFS filesystem that is also called "sata2". I suggest you try exporting a zvol instead:
# zfs create -b 4k -o sync=always -o compress=on -s -V 1t pool/lun
Code:
LUN0 Storage /dev/zvol/pool/lun 1TB
And be sure to format the LUN with the same block size on the client (initiator) as you specified for the LUN, in this case 4096 (4k); with e.g. NTFS you can raise that all the way up to 64k, depending on your application's use case.

/Sebulon
 
Hello,

I was wondering if I could also get better performance by switching istgt to UDP.
Code:
iscsi-target    3260/tcp   # iSCSI port
iscsi-target    3260/udp   # iSCSI port
But I couldn't find an option for the TCP/UDP protocol anywhere in the configuration files, and Google didn't do much better.

Does anybody know if istgt can work over UDP, or should I try another iSCSI target solution?

Thank you.
 
gnoma said:
Does anybody know if istgt can work over UDP, or should I try another iSCSI target solution?

Please see RFC 3720:

Code:
   - Connection: A connection is a TCP connection.  Communication
     between the initiator and target occurs over one or more TCP
     connections.  The TCP connections carry control messages, SCSI
     commands, parameters, and data within iSCSI Protocol Data Units
     (iSCSI PDUs)
Anything working differently simply breaks the standard.
 