UFS Question about software RAID-0 with SSDs

Ah, same locale as here. And for the most part the same language 😉

Yes, that would be some good performance, but two GB/sec is still not to be sneezed at.

But it might run into the memory bandwidth limit, maybe.
 
[...] But it might run into the memory bandwidth limit, maybe.
That, at least, wouldn't be likely. The machine in question is running quad-channel DDR4-3200, which works out to a theoretical maximum transfer rate of 102.4 GB/s. Of course there are DRAM latencies and bursting and all that to consider, but it wouldn't be bothered by a block device performing at "just" 2 GB/s. Sadly, I can't really verify that claim, as I'm lacking said 2 GB/s block device. 😇
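
The back-of-the-envelope math, in case anyone wants to check it (eight bytes per transfer per 64-bit channel):

Code:
# quad-channel DDR4-3200: 3200 MT/s x 8 bytes per transfer x 4 channels
echo "$((3200 * 8 * 4)) MB/s"   # prints 102400 MB/s, i.e. 102.4 GB/s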
 
I did just that, for giggles. I created a 16 GiB md device backed by malloc and formatted it with UFS at a 64 KiB block size, then ran a dd read benchmark on it, also with a 64 KiB block size. The result is mighty confusing. So here's what I did:

Code:
# mdconfig -a -t malloc -o reserve -o nocompress -s 16g -S 512 -L ramdisktest
# newfs -L testdisk -S 512 -U -b 65536 -j /dev/md0
# time dd bs=65536 if=/dev/md0 of=/dev/null

But the result is just shy of 730 MiB/s at a load value of around 110, so it's slower than my SSDs? That shouldn't be possible. Then again, I have zero experience with md devices under FreeBSD; today was the first time I ever created one, I think. :)

Edit: Ah... I should probably have mounted it before reading to even involve the UFS driver.

Edit 2: Alright, mounted it, wrote a file to it using a 1 MiB block size, netting about 1.6 GiB/s write speed, then unmounted it to get rid of the cache. Remounted it, then ran another read test on that file with a 64 KiB block size. Result: about 897 MiB/s at a similar load. Not sure how to explain this, though. ;)
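
For completeness, that second round boiled down to roughly this; the mount point, the zero-filled source and the file size are just examples, not a record of my exact session:

Code:
# mount the fresh UFS, write a test file, remount to drop the cache, then re-read it
mount /dev/md0 /mnt
dd if=/dev/zero of=/mnt/testfile bs=1M count=8192
umount /mnt
mount /dev/md0 /mnt
dd if=/mnt/testfile of=/dev/null bs=65536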
 
There seems to be a performance pig in the md driver. What speed does it read from /dev/mem? That is the raw system memory, with nothing in between.
 
I tried to dump 100 GiB of data from /dev/mem to /dev/null at a block size of 1 MiB:

Code:
# time dd bs=1M count=102400 if=/dev/mem of=/dev/null

After a short while the machine became unresponsive, and I immediately heard the fans spin down, indicating it had dropped all its compute jobs. And then: ssh: connect to host hostnamehere port 22: No route to host.

It appears that wasn't the smartest of ideas. The last thing I could see was significant kernel load on one CPU core, which would make some sense:

[Attached screenshot: read-from-devmem.png, showing the kernel load sitting on CPU 44]

I'll wait for a while to see whether it recovers and then look at the local console. But I guess it crashed or froze up?

Edit: Yup, it froze up. Hooked up a USB keyboard: no Num Lock LED. Hooked up an interrupt-capable PS/2 keyboard (I boot the machine with one connected, so PS/2 gets initialized): no Num Lock LED there either.

I guess dumping 100 GiB of RAM was just too much somehow? There are no error messages on the local terminal either, no panic or anything.
 
I think you ran past the end of the physical memory range and it started reading PCI registers - some of which may be strobe registers. There's no way to say whether that is what happened or not - but I am deeply sorry that my suggestion led to such an outcome.
 
Ah well, it's not like I double-checked or read up on what I was doing before launching that as root, so the blame's on me. ;) A few weeks of work were lost, but that's not much of a problem, actually. No deadlines are involved, and I'm allowed to use my employer's electricity for it, so no money lost either.

At least I learned something: reading from /dev/mem can be genuinely dangerous. By the way, the machine has 128 GiB of physical memory, but I have no idea which part of the address space I was messing around in...
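
For next time, at least the size figures are easy to check; which physical ranges are actually RAM versus device space only shows up with a verbose boot, so the grep below assumes the SMAP lines made it into the boot dmesg:

Code:
# physical memory as the kernel reports it (installed vs. usable)
sysctl hw.physmem hw.realmem
# with a verbose boot, the firmware memory map (SMAP) lines end up here
grep -i smap /var/run/dmesg.boot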
 
I use x265, which is a pure CPU encoder. To be honest, I wouldn't even know how to use a GPU's H.265/HEVC ASIC for encoding on FreeBSD. Is that even possible? With what hardware? GeForces? Radeons?

My machine usually runs headless and doesn't even have a graphics card installed. But last time I forgot to remove the small GeForce I had plugged in for debugging, which is why I had a local console to access this time around. Sadly, no IPMI on my mainboard.
 
An NVIDIA Turing or later chip (GTX 1650 and newer), ffmpeg and VAAPI. If you have some home computer with a GPU, you can test it on that.
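
Roughly like this; an untested sketch, and the render node path, bitrate and file names are only placeholders:

Code:
# hardware HEVC encode through VAAPI; adjust the render node and bitrate to taste
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mkv \
    -vf 'format=nv12,hwupload' -c:v hevc_vaapi -b:v 2500k output.mkv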

 
Thanks for the information!

My original reasoning for staying CPU-only was the quality-per-bitrate ratio. I tested NVIDIA, AMD and Intel GPU/iGPU ASICs several years back (more than five, actually), at a preset bitrate and maximum quality settings, and compared the results to the much slower and more energy-hungry x265. My goal was to reach maximum quality within a given bitrate limit.

In terms of quality per bitrate, x265 easily won. Back then, Intel QuickSync came in second, NVIDIA NVENC third and AMD VCE dead last. Not sure how it is today, but the GPU ASICs were all way worse than x265 in terms of quality per bitrate.

But I guess it depends on what you want to achieve. The GPU ASICs are many times faster than x265; the factor can easily be in the three- or even four-digit range. If you don't care about disk space, you could always just double or triple the bitrate and still get good-quality output. But given that I'm a compression-efficiency fetishist, I went for x265. ;)

To be clear: the reason for wanting fast storage was the (de)multiplexing stage. Sometimes you'll see media containers around 100 GiB in size. Running (de)multiplexing operations on those A/V containers to get at the elementary streams can be quite tedious if the underlying I/O is super slow for whatever reason. And NVMe DEALLOCATE making it even worse, because you had just deleted another 300 GiB of data, was... discouraging.
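
For reference, such a demux is just a stream copy; the file names here are made up, the video line assumes an HEVC track, and the audio goes into a Matroska shell so its codec doesn't matter:

Code:
# pull the elementary video stream out of a big container without re-encoding
ffmpeg -i big-source.mkv -map 0:v:0 -c copy -bsf:v hevc_mp4toannexb video.h265
# first audio track, stream-copied into its own small container
ffmpeg -i big-source.mkv -map 0:a:0 -c copy audio.mka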

But it's all great now, simply because I increased the stripe block size by a factor of 64 and gave real-time priority to certain kernel processes.
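
In case anyone wants to reproduce the idea, the shape of it is below; this is only a sketch using GEOM_STRIPE, and the provider names, the 1 MiB stripe size and the pid picked for rtprio are placeholders rather than my exact setup:

Code:
# RAID-0 with a large stripe size via GEOM_STRIPE (names and sizes are examples)
gstripe label -v -s 1048576 fast /dev/nvd0 /dev/nvd1
newfs -U /dev/stripe/fast
# bump a kernel process to real-time priority (replace <pid> with the real one)
rtprio 8 -p <pid>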
 
Note that the hardware video encoders have quality differences of their own, even at the same bitrate. From what I've heard, software encoding on the CPU still gives the highest quality at any given bitrate.
 
It's not all that bad, though. We compared the hardware encoders at their maximum quality settings (there isn't much to tweak) against x265 running with the "veryslow" preset, all of them at 2500 kbit/s for 1080p animated video, so a rather homogeneous image with no overly complex content.
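
The software side of that comparison was roughly the following; a sketch only, with placeholder file names, and the real test may well have used the standalone x265 CLI or a two-pass run instead:

Code:
# x265 "veryslow" at a 2500 kbit/s target, video only, via ffmpeg's libx265
ffmpeg -i animated-1080p.mkv -c:v libx265 -preset veryslow -b:v 2500k -an x265-reference.mkv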

To be honest, it was hard to spot the differences while the test video was in motion. When comparing still frames, certain blocking artefacts did become visible, yes, but once you raise the bitrate to, say, 10,000 kbit/s, I'd say you wouldn't be able to spot the difference at all. We didn't compare live-action content though; it was a very brief test.

The cards/chips tested back then were an AMD Radeon RX Vega 64, an NVIDIA GeForce GTX 1080 and some Intel iGPU, I forgot the exact model.

Speed was crazy though. I don't remember the exact runtimes, but I think the hardware ASIC encoders were about 100 times faster than x265 and consumed very little power while encoding! If electricity weren't free in my case, I couldn't do this on the CPU; it would be cost-prohibitive, especially these days, when electricity costs about three times what it did five years ago.
 