Thanks for the information!
My original reasoning for going CPU-only was the quality-per-bitrate ratio. Several years back (5+, actually), I tested the NVIDIA, AMD and Intel GPU/iGPU ASIC encoders at a fixed bitrate with maximum quality settings, and compared the results to the much slower and more energy-intensive x265. My goal was to reach maximum quality within a given bitrate limit.
In terms of quality per bitrate, x265 easily won: Intel QuickSync came in second, NVIDIA NVENC third, and AMD VCE dead last. Not sure how it is today, but back then the GPU ASICs were all way worse than x265 on that metric.
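If anyone wants to redo that kind of comparison today, here is roughly how I'd script it with ffmpeg and its libvmaf filter. This is a minimal sketch, not what I actually ran back then: the encoder names depend on your ffmpeg build and hardware, libvmaf has to be compiled in, and the source file, bitrate and presets are placeholders.

```python
import subprocess

# Hedged sketch: encode the same source at a fixed bitrate with several
# encoders, then score each result against the source with VMAF.
# "source.mkv" and the 4000k target are placeholders; encoder availability
# depends on your ffmpeg build and hardware.
SOURCE = "source.mkv"
BITRATE = "4000k"
ENCODERS = {
    "libx265":    ["-preset", "slow"],      # software x265
    "hevc_nvenc": ["-preset", "p7"],        # NVIDIA NVENC, slowest/best on recent builds
    "hevc_qsv":   ["-preset", "veryslow"],  # Intel QuickSync
    "hevc_amf":   ["-quality", "quality"],  # AMD VCE/VCN
}

for enc, quality_args in ENCODERS.items():
    out = f"out_{enc}.mkv"
    # Encode at the same target bitrate for every encoder.
    subprocess.run(
        ["ffmpeg", "-y", "-i", SOURCE, "-c:v", enc,
         "-b:v", BITRATE, *quality_args, "-an", out],
        check=True,
    )
    # Score the encode against the source; the libvmaf filter takes the
    # distorted file first, the reference second, and prints the score.
    subprocess.run(
        ["ffmpeg", "-i", out, "-i", SOURCE,
         "-lavfi", "libvmaf", "-f", "null", "-"],
        check=True,
    )
```

Each encoder is deliberately set to its slowest/highest-quality mode, since the whole point of the exercise is best quality at the fixed bitrate.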
But I guess it depends on what you want to achieve. The GPU ASICs are many times faster than x265; the speedup factor can easily reach three or even four digits. If you don't care about disk space, you can always just double or triple the bitrate and still get good-quality output. But given that I'm a compression-efficiency fetishist, I went with x265.
To be clear: the reason for wanting fast storage was the (de)multiplexing stage. Sometimes you will see media containers around 100 GiB in size, and doing (de)multiplexing operations on those A/V containers to get at the elementary streams can be quite tedious if the underlying I/O is super slow for whatever reason. And NVMe DEALLOCATE (the NVMe counterpart to ATA TRIM) making things even worse right after you had just deleted another 300 GiB of data was... discouraging.
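For the record, the demux step itself is just stream copies, so it's pure I/O and the disk is the bottleneck. A minimal sketch, assuming ffmpeg; the file names and stream indices are made up, and the output extensions have to match whatever codecs are actually in the container (ffprobe tells you):

```python
import subprocess

# Hedged sketch: pull the elementary streams out of a big container with
# plain stream copies (no re-encode), which makes the operation I/O-bound.
# "input.mkv" and the .hevc/.flac extensions are placeholders.
CONTAINER = "input.mkv"

# First video stream -> raw HEVC elementary stream.
subprocess.run(
    ["ffmpeg", "-i", CONTAINER, "-map", "0:v:0", "-c", "copy", "video.hevc"],
    check=True,
)
# First audio stream -> its own file, again without re-encoding.
subprocess.run(
    ["ffmpeg", "-i", CONTAINER, "-map", "0:a:0", "-c", "copy", "audio.flac"],
    check=True,
)
```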
But it's all great now, simply because I increased the stripe block size by a factor of 64 and gave real-time priority to certain kernel processes.
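In case it helps anyone else: the stripe block size is the --chunk parameter at mdadm --create time (assuming it's an mdadm array at all), and the priority part can be done with chrt or a small script like the sketch below. The thread name and priority here are assumptions, not my exact values, and it needs root:

```python
import os

# Hedged sketch: give a kernel RAID worker thread SCHED_FIFO (real-time)
# priority. The thread name "md0_raid5" is an assumption -- actual names
# depend on the array and RAID level (md0_raid1, md127_raid6, ...).
# Needs root (CAP_SYS_NICE). Equivalent to: chrt --fifo -p 10 <pid>
TARGET = "md0_raid5"
RT_PRIO = 10  # FIFO priorities run 1..99; keep it modest

def find_kthread(name):
    """Scan /proc for a task whose comm (thread name) matches `name`."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    return int(entry)
        except OSError:
            continue  # task vanished while we were scanning
    return None

pid = find_kthread(TARGET)
if pid is None:
    raise SystemExit(f"no kernel thread named {TARGET} found")
os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(RT_PRIO))
print(f"{TARGET} (pid {pid}) -> SCHED_FIFO priority {RT_PRIO}")
```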