[Solved] KVM over NFS

I need to migrate a few (8) KVM virtual machines to central ZFS storage. So far I have used iSCSI for that, and the I/O results were good.

In this case I can't, because the virtual machines already exist as .img files and my only option is to use NFS. The idea is to shut down the VMs, copy them to the storage, and then boot them from their new location.
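
Concretely, the move per VM would look something like this, assuming the guests are managed with libvirt and the share ends up mounted at /mnt/vmstore (that mount point is just an example):

    virsh shutdown vm1        # graceful shutdown of the guest
    rsync -a --progress /var/lib/libvirt/images/vm1.img /mnt/vmstore/
    virsh edit vm1            # point the disk <source file='...'/> element at the new path
    virsh start vm1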

So, my question is: does this sound good or bad performance-wise? Any tips?

Thanks
 
I manage a 40 TB search engine over NFS that serves up content in under 25 ms.

You will see the same read performance, but could see up to a 25-30% write penalty with NFS.

That is the case for NFSv3; I'm not sure how it is for v4.
 
So, as I understand it, the issue would not be a result of more or less disk performance, but rather the NFS protocol itself over the network.

However, before you could attribute the write penalty to NFS, you would have to eliminate other possible bottlenecks.

How is the network set up, specifically between the KVM host and the NFS server?

Is it gigabit, 10G, bonded lagg(4) LACP, etc.? How many concurrent KVM clients are there, and of those clients, how many would see concurrent writes?
 

I think that my bottleneck will be the protocol. The storage has six gigabit NICs configured as three LACP pairs. The active KVM server will also use two gigabit NICs in LACP.

Right now, the storage (86 TB) is sharing videos over NFS to a web server farm. I get very good speeds when there is demand, and the load average is always less than 1.
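
For anyone curious, a two-port LACP lagg(4) on FreeBSD is configured roughly like this in /etc/rc.conf; the interface names and address below are placeholders, not my actual setup:

    ifconfig_igb0="up"
    ifconfig_igb1="up"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 192.168.10.10/24"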
 
Check this out:

Benchmarks

In your situation, the limiting factor will probably be the network in general, not iSCSI or NFS. Let's say you had 4 bonded copper Cat 6 connections in an LACP lagg(4) over a good gigabit switch, and your disks can write 266 MB/s in this scenario. You have 3 KVM hosts, each connected with a single gigabit connection. You start writing from one host, using iperf for example; you would see around 120 MB/s. If you had a second host start writing at the same time, you would see around 120 MB/s there also. If you add a third host, all three hosts would drop to around 80 MB/s.

Now let's imagine I added a second NIC to each KVM host in a bonded lagg(4) and repeated the experiment. Your first host would still only write at the speed that can travel over one NIC, so the bonded NICs on the KVM host would not improve speed to or from the NFS server.

If you wanted your NFS server to be able to send and receive at 4 Gbps to a single host, I have been told by Cisco engineers that in my scenario this would only be possible by using some kind of virtual switch between the NFS server and the KVM client, like Open vSwitch for example, or the vSwitch in ESXi; otherwise each connection would always be limited to 1 Gbps.

iperf would be a great way to see this for yourself.
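
A minimal version of that test, with nfs-server standing in for whatever your storage host is called:

    # on the storage box
    iperf -s

    # on each KVM host, started at the same time
    iperf -c nfs-server -t 60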

I have built out such a scenario.
 
In my case, and for the needs that we have, utilizing the 2-port LACP NICs is more than adequate. My concern with NFS is that in tests I have performed, when transferring data to the server, I could easily saturate 1 Gbit with rsync over SSH, but when using rsync over NFS the performance was about 70% of that.
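
Roughly speaking, the two runs were of this form, so the only real variable was the transport (paths and host name are placeholders):

    # over ssh
    rsync -a --progress bigfile.img storage:/tank/test/

    # through the NFS mount
    rsync -a --progress bigfile.img /mnt/nfs/test/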
 
Right, but in both those cases you were still capped by the speed of one NIC.

I suspect your disk performance is more than 1 Gbps.

With virtual switches you could see the combined speed of two NICs, or in your case with NFS, 70% of the combined speed of two NICs.

Would not the combined speed of two NICs at 70% be better than 100% of one NIC?

Virtual switches set up right will give you better speeds with iSCSI or NFS, and if you have to go with NFS you will still be faster than the performance you previously saw with iSCSI over a 1 gigabit connection.

Check out this drawing, this is about what you want to do:

HowToDoIt.jpg
 
Yes

I recall that when I was looking to get this working on FreeBSD about 6 months ago, there was some issue after I installed the port that caused segmentation faults, but I didn't spend much time trying to resolve it.

In your scenario, where your KVM host has the 2-port lagg, the NFS server to KVM host link would be a 2 Gbps virtual connection, and you could only set up a 2 Gbps connection from the virtual switch to each VM in the KVM host.

Open vSwitch is well supported on newer Linux kernels; it would be nice to see someone figure this out on FreeBSD.
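
On the Linux side, bonding two ports into an Open vSwitch bridge with LACP looks roughly like this (bridge, bond and interface names are just examples):

    ovs-vsctl add-br br0
    ovs-vsctl add-bond br0 bond0 eth0 eth1 lacp=active
    ovs-vsctl set port bond0 bond_mode=balance-tcp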
 
I decided not to use Open vSwitch, mainly because I do not want to test software on the production storage. So far I have installed 2 VMs and the performance is really excellent, so I am marking this as solved.
 
Hey gkontos, I was wondering how this was going for you.

I have recently set up iSCSI and NFS, and have an ESXi server using both.

What I found in my home FreeBSD setup was that NFS had quite terrible write latency with ESXi.

The iSCSI, however, was spectacular.

NFS write latency averaged 482 milliseconds, whereas iSCSI write latency averaged 0.45 ms.

I have been looking into the issue and think this article describes it well.

I am not willing to use his first suggestion, and the latency I see is with a ZIL log device already in place.

I can only imagine how much worse it might be without the log device.

https://www.ateamsystems.com/tech-b...ith-freebsd-zfs-backed-esxi-storage-over-nfs/

What kind of write latency are you seeing?

I turned off atime, but was wondering what the implications or benefits might be of disabling sync?
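
For reference, these are the per-dataset properties I mean (tank/vmstore is a placeholder); my understanding is that sync=disabled acknowledges writes before they reach stable storage, so a crash can lose the last few seconds of writes:

    zfs get atime,sync tank/vmstore
    zfs set atime=off tank/vmstore
    zfs set sync=disabled tank/vmstore    # fast, but risky for VM images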

Currently I just use NFS for snapshots.

When I set up a CloudStack cluster backed by KVM, it would be more convenient to use NFS stores, but not if it sees similar issues.

Here is a screenshot:

The peaks are all NFS; the iSCSI is nearly flat.

I am not currently using the NFS store for virtual machines, as the latency was unacceptable.

latency.png


Also, how have you been monitoring the performance characteristics? nfsstat, or something better?
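
On the FreeBSD side, the tools I know of for this kind of thing are nfsstat, gstat and zpool iostat, along these lines (tank is a placeholder pool name):

    nfsstat -e -s           # extended server-side NFS statistics
    nfsstat -w 1            # NFS operation counts, refreshed every second
    gstat -p                # per-disk busy %, ms/read and ms/write
    zpool iostat -v tank 1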
 
Matthew Dresden, I currently run only 2 VMs from a KVM host. I don't have much experience with VMware in HA. My plan is to use Proxmox in HA, but that is going to be delayed a bit until the developers provide some usable code and I can mark the VMs as production.

In any case, the performance that I have now is really much better than when I was hosting the machines on local storage.

EDIT: I am not sure if this helps, but this is the disk throughput from an NFS-hosted mail server sending around 1K messages per day.

diskstats_throughput-day.png
 
I don't know if you will be interested later, but I am going to find some ways to really dig into server-side and client-side metrics with iSCSI and NFS backed by ZFS.

I suppose I will also have to compare ESXi and KVM to see where problems may be ESXi-specific rather than ZFS-specific.

I will post back some ways to collect the data when I get around to it.
 
Yes, of course. Any information is very valuable. Like I mentioned before, I don't want to install any other software, and I try to keep the storage running with only a limited set of monitoring clients.
 

You need to set sync=always instead of sync=standard on the zvol behind iSCSI, because with the standard setting the zvol + iSCSI combination won't safely commit data from the write-back cache to disk, whereas NFS does.

I have simulated a power-loss incident on both NFS and iSCSI + zvol. The iSCSI + zvol setup with sync=standard suffered data loss when power to the storage was cut, but that can be resolved by changing to sync=always.

NFS with a ZIL had no problem, even with a sudden power cut on the storage.
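
In case it helps, that is a one-liner per zvol; the dataset name here is just an example:

    zfs get sync tank/iscsi/vm0
    zfs set sync=always tank/iscsi/vm0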
 
With iSCSI I use a sparse file in a dataset, as file I/O is supposed to provide better performance than using a device.
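
Roughly, the backing file is created sparse and then exported as a file-backed LUN; with ctld, for example, the relevant pieces look like this (names and size are placeholders):

    truncate -s 200G /tank/iscsi/esxi0.img    # sparse backing file

    # /etc/ctl.conf excerpt, portal and auth settings omitted
    target iqn.2014-01.lan.example:esxi0 {
        lun 0 {
            path /tank/iscsi/esxi0.img
            size 200G
        }
    }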

Regarding sync, I have read that this is an acceptable risk when the iSCSI server and host are UPS-protected, which would allow time for proper scripted shutdowns.

In this situation, do you have an additional recommendation?

I also have a mirrored ZIL, which acts as a write log so that uncommitted writes can be replayed after a reboot.
 