Bhyve: Investigating poor guest performance when host is busy

Problem intro:
- Started a make -j 20 buildworld on the host to peg the CPUs.
- Started 2-vCPU and 4-vCPU FreeBSD guests and timed how long each takes to boot.
- Started a 1-vCPU FreeBSD guest and timed how long it takes to boot. It is always much faster than the 2- or 4-vCPU guests.

My suspicion:
I suspect that the ULE scheduler makes some poor scheduling decisions because it is not aware that bhyve threads are actually running vCPUs on top of them. The classic way to solve this is gang scheduling, which ensures that all vCPUs of a guest are running simultaneously.

1. Can anyone tell me what the cause might be? What may be happening?
2. Do you know if there is currently any work on investigating this problem? Or anything related?
3. Is Gang Scheduling or Coscheduling implemented in FreeBSD?
4. Do you know of any other solution to this kind of problem?
5. Can you recommend any papers/videos/links in any way related to this?

I'm sorry if the questions seem vague, but I don't understand the problem very well myself, and I'm relatively new to FreeBSD.
 
To attempt to answer the questions:

1. The main issue is 'lock holder preemption': a vCPU that is holding a spinlock gets preempted by the host scheduler, so the other vCPUs that are trying to acquire that lock spin for full quanta.

Booting is a variant of this for FreeBSD, since each AP (application processor) spins on a memory location waiting for the BSP (bootstrap processor) to start up.

2. There's some minor investigation going on.

3. No.

4. I don't know that 'classic' gang scheduling is the answer (see 5). What has been thought of for bhyve at least is to a) have the concept of vCPU 'groups' in the scheduler, b) provide metrics to assist the scheduler in trying to spread out threads associated with a vCPU group so they don't end up on the same physical CPU (avoidance of lock-holder preemption), and c) implement pause-loop exits (see the Intel SDM, 24.6.13) in the hypervisor and provide that information to the scheduler so it can give a temporary priority boost to vCPUs that have been preempted and aren't currently running.

5. The classic reference on this is VMware's scheduler paper: www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf
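
To make the lock-holder preemption effect from answer 1 concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not a description of what ULE or bhyve actually do: time advances in abstract ticks, vCPU 0 takes the lock at tick 0, every other vCPU spins on it, and the host (optionally) deschedules vCPU 0 after one tick for a full quantum. The function name and all parameters are made up for illustration.

```python
def spin_ticks_wasted(n_vcpus, quantum, critical_section, preempt_holder):
    """Return the total ticks the waiting vCPUs spend spinning on one lock.

    Toy model (all values are hypothetical, in abstract ticks):
      - vCPU 0 holds the lock and needs `critical_section` ticks of actual
        run time before it can release it.
      - The other n_vcpus - 1 vCPUs spin until the lock is released.
      - If `preempt_holder` is true, the host deschedules vCPU 0 after its
        first tick and keeps it off the CPU for one full `quantum`.
    """
    waiters = n_vcpus - 1
    remaining = critical_section
    spin_ticks = 0
    tick = 0
    while remaining > 0:
        # The holder only makes progress while it is actually on a CPU.
        holder_running = not (preempt_holder and 1 <= tick < 1 + quantum)
        if holder_running:
            remaining -= 1
        # Every waiter burns this tick spinning, whether or not the
        # holder made progress.
        spin_ticks += waiters
        tick += 1
    return spin_ticks


if __name__ == "__main__":
    for vcpus in (1, 2, 4):
        print(vcpus,
              spin_ticks_wasted(vcpus, 100, 5, preempt_holder=False),
              spin_ticks_wasted(vcpus, 100, 5, preempt_holder=True))
```

In this model a 1-vCPU guest never wastes time spinning because there is nobody to wait, while the waste in a multi-vCPU guest grows with both the number of waiters and the length of the host quantum, which matches the shape of the boot-time observation in the original question.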
 
Another solution I have found is the one used by Xen (a bare-metal hypervisor) to prevent unnecessary active waiting. The waiting vCPU issues a hypercall when it has been waiting longer than a certain threshold. On receiving the hypercall, the VMM schedules another vCPU of the same guest, in the hope that it will eventually schedule the lock holder.
Does anyone know if bhyve supports hypercalls? Any information about it would be welcome.
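
The spin-then-yield idea described above can be sketched roughly as follows. This is only an illustration in Python of the control flow, not Xen's or bhyve's actual interface: `lock_is_free` and `yield_to_hypervisor` are hypothetical callbacks standing in for the guest's lock check and for whatever hypercall the hypervisor would expose, and `spin_threshold` is an arbitrary number.

```python
def acquire_with_yield(lock_is_free, yield_to_hypervisor, spin_threshold=1000):
    """Spin on a lock, but give up the CPU after too many failed polls.

    lock_is_free        -- hypothetical callback, returns True once the
                           lock can be taken
    yield_to_hypervisor -- hypothetical callback standing in for a
                           "schedule another vCPU of my guest" hypercall
    spin_threshold      -- failed polls tolerated before yielding

    Returns the number of times the caller yielded to the hypervisor.
    """
    yields = 0
    spins = 0
    while not lock_is_free():
        spins += 1
        if spins >= spin_threshold:
            # Stop burning the time slice; let the hypervisor run a
            # sibling vCPU, which is hopefully the lock holder.
            yield_to_hypervisor()
            yields += 1
            spins = 0
    return yields
```

The point of the threshold is that short waits still resolve by plain spinning, and only waits long enough to suggest a preempted lock holder pay the cost of a hypercall.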

Thank you.
 