Hello,
As it's my first post here on this board, a little background on myself:
I'm quite new to BSD, but I've been a Linux user for about 14 years and a network/sysadmin for about 5. I mainly used Debian since the sarge/etch days; thanks to systemd I have now switched to Devuan. Over the years I glanced at the different BSDs several times, but never found the time to do it properly.
Thanks to ZFS and the pfSense box I've been running on my network for about 2 months, my interest in FreeBSD got quite a boost, and over the last few weeks I finally found the time to dive in a little deeper - and I really like it.
I currently have FreeBSD 10.3 installed on a diskless test machine that boots from FC targets. The system is an old-ish Intel S5000 board with a single Xeon L5410, 8GB of FB-DIMM ECC RAM and a QLogic QLE2462 4Gbit FC HBA.
The targets are zvols on my Devuan storage server running ZFS on Linux: 16GB RAM, 3 mirror vdevs, and ZIL + L2ARC on 2 SSDs. The mirror members and SSDs are spread across 2 SAS controllers, so the pool's performance is rather overkill for my small home network...
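For reference, the layout corresponds to roughly the following pool creation - the device names here are just placeholders, not my actual disks:
Code:
# zpool create tank \
      mirror disk1 disk2 \
      mirror disk3 disk4 \
      mirror disk5 disk6 \
      log mirror ssd1p1 ssd2p1 \
      cache ssd1p2 ssd2p2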
After a little research, setting up FreeBSD on a multipathed FC target with ZFS went really smoothly - even without much BSD experience it was actually much faster and more stable than on Linux, where dm-multipath keeps blowing up...
The only issue I ran into with FreeBSD is that the secondary GPT table overwrites the gmultipath labels (or the labels overwrite the secondary GPT - both want the last sector of the disk...). As a workaround I had to set up the multipath labels on each partition individually.
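In rough terms the workaround looks like this, with the labels going onto the individual partitions instead of the whole disk (da0/da1 being the two paths to the same LUN; the names are just examples):
Code:
# gmultipath label -v mp0p2 /dev/da0p2 /dev/da1p2
# gmultipath label -v mp0p3 /dev/da0p3 /dev/da1p3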
Sadly, the performance is quite bad. And by bad I mean only ~30% of what I get with a Devuan Jessie install on the same box. (Well, at least until dm-multipath shoots itself in the foot again and I'm left with only the individual paths...)
For the test I created a new 20GB zvol and exported it to the client, so I'm not benchmarking the LUN the system itself lives on.
For I/O measurement I'm using this tool: https://github.com/cxcv/iops
Performance on Devuan Jessie with active/active multipath was quite good out of the box:
Code:
# ./iops /dev/mapper/mpathc
/dev/mapper/mpathc, 21.47 GB, 32 threads, random:
512 B blocks: 138146.1 IO/s, 70.7 MB/s (565.8 Mbit/s)
1 kB blocks: 139215.5 IO/s, 142.6 MB/s ( 1.1 Gbit/s)
2 kB blocks: 137167.2 IO/s, 280.9 MB/s ( 2.2 Gbit/s)
4 kB blocks: 140778.5 IO/s, 576.6 MB/s ( 4.6 Gbit/s)
8 kB blocks: 89767.0 IO/s, 735.4 MB/s ( 5.9 Gbit/s)
16 kB blocks: 49945.3 IO/s, 818.3 MB/s ( 6.5 Gbit/s)
32 kB blocks: 25116.8 IO/s, 823.0 MB/s ( 6.6 Gbit/s)
65 kB blocks: 12702.6 IO/s, 832.4 MB/s ( 6.7 Gbit/s)
131 kB blocks: 6203.5 IO/s, 813.1 MB/s ( 6.5 Gbit/s)
262 kB blocks: 2008.3 IO/s, 526.5 MB/s ( 4.2 Gbit/s)
524 kB blocks: 1193.9 IO/s, 625.9 MB/s ( 5.0 Gbit/s)
1 MB blocks: 418.8 IO/s, 439.1 MB/s ( 3.5 Gbit/s)
2 MB blocks: 161.9 IO/s, 339.6 MB/s ( 2.7 Gbit/s)
4 MB blocks: 90.3 IO/s, 378.9 MB/s ( 3.0 Gbit/s)
8 MB blocks: 46.8 IO/s, 392.5 MB/s ( 3.1 Gbit/s)
16 MB blocks: 23.2 IO/s, 389.4 MB/s ( 3.1 Gbit/s)
This looks like the HBA's PCIe 1.1 x4 interface being maxed out at just over 800MB/s, given a theoretical maximum throughput of ~1000MB/s before protocol overhead. So absolutely nothing to complain about here.
But with FreeBSD the numbers are rather bad:
Code:
# ./iops /dev/multipath/test
/dev/multipath/test, 21.47 GB, 32 threads, random:
512 B blocks: 30555.5 IO/s, 15.6 MB/s (125.2 Mbit/s)
1 kB blocks: 29949.8 IO/s, 30.7 MB/s (245.3 Mbit/s)
2 kB blocks: 29921.4 IO/s, 61.3 MB/s (490.2 Mbit/s)
4 kB blocks: 29601.1 IO/s, 121.2 MB/s (970.0 Mbit/s)
8 kB blocks: 24925.1 IO/s, 204.2 MB/s ( 1.6 Gbit/s)
16 kB blocks: 17865.4 IO/s, 292.7 MB/s ( 2.3 Gbit/s)
32 kB blocks: 10089.4 IO/s, 330.6 MB/s ( 2.6 Gbit/s)
65 kB blocks: 5067.6 IO/s, 332.1 MB/s ( 2.7 Gbit/s)
131 kB blocks: 2535.4 IO/s, 332.3 MB/s ( 2.7 Gbit/s)
262 kB blocks: 1271.0 IO/s, 333.2 MB/s ( 2.7 Gbit/s)
524 kB blocks: 634.1 IO/s, 332.4 MB/s ( 2.7 Gbit/s)
1 MB blocks: 316.7 IO/s, 332.1 MB/s ( 2.7 Gbit/s)
2 MB blocks: 153.9 IO/s, 322.8 MB/s ( 2.6 Gbit/s)
4 MB blocks: 63.0 IO/s, 264.2 MB/s ( 2.1 Gbit/s)
8 MB blocks: 30.0 IO/s, 251.7 MB/s ( 2.0 Gbit/s)
Changing the multipath mode from active/active to active/read decreases performance even further.
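(For reference, the mode can be checked and switched on the fly with gmultipath(8); -A selects Active/Active, -R Active/Read:)
Code:
# gmultipath list test
# gmultipath configure -A test
# gmultipath configure -R test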
Not only is the throughput much lower, the IOPS are just hopelessly bad - ~140k vs ~30k at 4kB block size...
During the test the CPU load is ~20% on both OSes. Local caching shouldn't affect the measurement, as the tool syncs everything directly to disk. RAM usage stays low on both installs and only fluctuates by a few MB, so apparently neither OS is "cheating" here.
One thing I noticed is that camcontrol reports a block size of 512 bytes for the LUNs:
Code:
# camcontrol readcap /dev/da4 -b
Block Length: 512 bytes
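A quick way to cross-check what the disk layer sees (sector size and a possible stripe size hint), on both a raw path and the multipath device:
Code:
# diskinfo -v /dev/da4
# diskinfo -v /dev/multipath/test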
Does the block size detected by CAM affect any internal command queuing, or how the system handles I/O to/from the device? If the system is issuing 512-byte accesses, that might add just enough overhead to explain the bad performance...
Is it possible to manually set the block size of a device? camcontrol seems to only offer reading it (or I overlooked something in the man page...)
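As far as I understand it, the logical block size is dictated by the target, not by the initiator, so if it can be changed at all it would be on the storage server: the zvol's volblocksize (which is fixed at creation time) plus whatever block-size attribute the FC target software (SCST, LIO, ...) exposes for the exported LUN. A rough sketch with a hypothetical pool/zvol name - whether the reported Block Length actually changes depends on the target stack:
Code:
# zfs create -V 20G -o volblocksize=4K tank/test4k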
Are there any other places to look at or knobs to tweak?
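A few things I'm planning to check myself while the benchmark runs, in case they're relevant: gstat to confirm that both paths are actually busy in active/active mode, and camcontrol tags to see what command queue depth CAM has negotiated for the LUN.
Code:
# gstat -f 'da|multipath'
# camcontrol tags da4 -v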
Thanks,
Sebastian