I believe you are underestimating what these nvme technologies can do today. They can give you 800 Mb/s of sustained random reads, today, at commodity hardware prices. This is literally swapping from RAM to slower RAM. The latency response at high queue depths does not resemble what you are used to for storage at all. They behave like RAM.
The limiting factor is not only the transfer rate but also the access time. Looking for numbers, I found access times in the single-digit µs range for those modern drives, which is awesome, but still a factor of 1000 away from modern RAM (single-digit ns range). So the performance loss from swapping is still substantial compared to just accessing physical RAM.
The other question would be whether this could still be "acceptable" in practice with this kind of modern hardware. A definitive answer to this is only possible by testing/benchmarking, but let's do the maths for a hypothetical scenario anyway:
Let's assume this (uncommon) scenario of a single application working in a huge allocation, and let's assume it needs 1GB of pages that are currently swapped out. In this scenario, we'll probably have many "superpages" of size 2M (on amd64), but as FreeBSD splits superpages up into regular 4k pages again under memory pressure, we'll also have some regular pages.
A lot of assumptions are needed here. I'll further assume we find 128MB in 32768 regular pages and 896MB in 448 superpages (and make a similar assumption for the pages that must be swapped out to make room). With this, we must transfer a total of 66432 pages to/from swap in order to swap in this 1GB of memory. With an access time of 8µs, we'd end up with 0.53s in total just waiting for transfers to start. Even assuming some of the pages could be arranged contiguously on the swap device, say it's 0.4s. Add the transfer itself for 2GB at 4GB/s (0.5s), and this makes for a total of 0.9s.
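For anyone who wants to play with these numbers, here is a rough sketch of the same estimate in Python. The function names are my own, and the model is deliberately crude: it assumes every page transfer pays the full access time and that exactly as much is swapped out as is swapped in. The access times in the loop are just sample values from the single-digit µs range.

```python
# Back-of-the-envelope model of swapping in 1GB, using the assumptions above.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE = 4 * KIB        # regular amd64 page
SUPERPAGE = 2 * MIB   # amd64 superpage


def page_transfers(regular_bytes, superpage_bytes):
    """Page transfers needed to swap this much in, plus the same amount out."""
    pages_in = regular_bytes // PAGE + superpage_bytes // SUPERPAGE
    return 2 * pages_in  # assumption: an equal set of pages is swapped out


def access_wait_s(transfers, access_time_s):
    """Worst case: every single transfer pays the full access time."""
    return transfers * access_time_s


def transfer_s(total_bytes, bandwidth_bytes_per_s):
    """Pure transfer time, ignoring access time."""
    return total_bytes / bandwidth_bytes_per_s


transfers = page_transfers(128 * MIB, 896 * MIB)
print("page transfers:", transfers)

# Sample access times in the single-digit microsecond range.
for access_time_us in (2, 5, 8):
    wait = access_wait_s(transfers, access_time_us * 1e-6)
    print(f"{access_time_us} us access time: {wait:.2f} s waiting")

# 1GB in plus 1GB out at 4GB/s.
print("transfer time:", transfer_s(2 * GIB, 4 * GIB), "s")
```

The wait scales linearly with both the page count and the access time, so the split between regular pages and superpages matters far more than the raw bandwidth figure.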
Now, if our application has 300GB to process and can organize its accesses in a way that every piece of memory has to be swapped only once, the additional processing time due to swapping will be ~4.5 minutes. With random and repeated accesses causing every piece to be swapped, say, 5 times, you're already at 22.5 minutes…
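The scaling step is simple enough to check in a few lines. This is only a sketch: `swap_overhead_minutes` is a name I made up, and it assumes the 0.9s per GB estimate holds uniformly across the whole data set.

```python
def swap_overhead_minutes(data_gb, seconds_per_gb, passes=1):
    """Extra time spent swapping, if every GB is swapped in `passes` times."""
    return data_gb * seconds_per_gb * passes / 60


# 300GB, each piece swapped in once.
print(swap_overhead_minutes(300, 0.9))

# Same data, but random and repeated accesses swap every piece 5 times.
print(swap_overhead_minutes(300, 0.9, passes=5))
```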
Yes, this is nowhere near the catastrophic figures with old drives. But it's still substantial, although it does look "acceptable", depending on the use case.
Side note: your "typical" memory pressure scenario, caused by just lots of processes, will still be a lot worse, because you can expect far more regular 4k pages, and most of the time you'll pay the access time for each and every one of them.
—
If you achieved one thing, it's making me wish for new hardware, although I don't need it for my private desktop. I stand by my point that swap can't "replace" RAM, but these speeds are awesome of course.