calomel.org has a nice writeup about how they achieved nearly 10Gbit/s with PF on a single FreeBSD box (a few years ago with FreeBSD 9.1!):
https://calomel.org/network_performance.html
Also a short article about network tuning on FreeBSD in general:
https://calomel.org/freebsd_network_tuning.html
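For reference, that kind of tuning mostly boils down to a handful of sysctl knobs raising buffer limits. A minimal sketch of what such a config might look like (the values here are illustrative, not taken from the article - tune for your own hardware and workload):

```
# /etc/sysctl.conf - illustrative FreeBSD network tuning sketch
kern.ipc.maxsockbuf=16777216         # allow larger socket buffers overall
net.inet.tcp.sendbuf_max=16777216    # ceiling for TCP send buffer auto-tuning
net.inet.tcp.recvbuf_max=16777216    # ceiling for TCP receive buffer auto-tuning
net.inet.tcp.mssdflt=1460            # default MSS matching a 1500-byte MTU
```

Larger buffer ceilings mainly matter for high bandwidth-delay-product links; on a low-latency LAN the defaults may already be close to optimal.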
So achieving the targeted throughput with PF on a single system shouldn't be a problem nowadays - at least on bare metal.
But I suspect the limiting factor in your scenario might be the virtualization layer - even with paravirtualized network hardware, the general overhead and performance penalty from virtualization might be just too high to achieve the desired throughput.
OTOH, with a network of this size you should eliminate the single point of failure regardless of what throughput can be achieved with one system/VM (though that would be rather pointless if you run both on the same virtualization host...).
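In case it helps: the usual FreeBSD approach to removing that SPOF is a pair of PF boxes with CARP for the shared address and pfsync for state synchronization. A minimal rc.conf sketch for the master (interface names, VHID, password, and addresses are all made up):

```
# /etc/rc.conf on the master - assumed interfaces: em0 (LAN), em1 (dedicated sync link)
pf_enable="YES"
ifconfig_em0="inet 10.0.0.2/24"
ifconfig_em0_alias0="inet vhid 1 pass mekmitasdigoat alias 10.0.0.1/32"
pfsync_enable="YES"
pfsync_syncdev="em1"
```

The backup gets its own address plus the same vhid/pass with a higher advskew, so it only takes over the shared 10.0.0.1 when the master goes away; pfsync keeps the state tables in sync so existing connections survive the failover.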
I just provisioned a new storage system (FreeBSD 11.0-RELEASE), which is connected to a SmartOS host via 10GbE and 8G FC. So for the upcoming week I'm planning some performance tests between hosts/jails/zones/VMs - mainly iSCSI vs. FC performance will be of interest, but I'm also curious about networking performance and the impact of KVM overhead on both.
Both systems and their ZFS pools are quite beefy, so (hopefully) there shouldn't be any hardware-induced bottlenecks.
I'd be happy to share results, and as these systems are not yet in production I could carry out some tests that might be of interest to others.