AMD TRX40 and Ryzen Threadripper 3960X/3970X support?

Hello,

After skipping all Threadripper chips from AMD so far, I have decided that now is the right time to build my new headless encoding machine based on the newly released AMD Ryzen Threadripper 3970X processor and the TRX40 chipset. My mainboard of choice would be the ASRock TRX40 Taichi for now.

Since the predecessor (a now-dead AMD FX-9590 on 990FX) was running FreeBSD, I would like to use it for this machine as well, because I just enjoyed using the operating system. :)

But after looking at the hardware notes for 12.1-RELEASE, I'm a bit concerned. Not only do they not mention any Ryzen processors or modern AMD chipsets, but other things like the 2.5Gbit network chip on my board of choice are missing from that list as well. That would be a Realtek RTL8125AG. The other one, an Intel I211AT, is supported by the em(4) driver, I assume. Also, I cannot find any definitive information about whether FreeBSD supports AMD Turbo Core on the Zen 2 architecture. It would be rather bad if the CPU couldn't use its turbo under FreeBSD.

On top of that, I'm going to use a PCIe 4.0 NVMe SSD for storage. I looked around on the web, and it seems FreeBSD has supported NVMe for a while now. But will it also work with a TRX40 chipset that the kernel doesn't yet know? I can't find any information about NVMe in the hardware notes - or I'm not searching correctly, not sure.

That's why I thought I better ask around here: Will this work? Or should I wait a little and maybe use Linux in the meantime?

Thank you!
 
Phoronix did a test a week ago on a TRX40 motherboard and a TR 3970X. They used an ASUS board instead of the ASRock you linked to, but everything worked out of the box for them, better than their Linux distributions did!

https://www.phoronix.com/scan.php?page=article&item=freebsd-amd-3970x&num=1

You might want to consider the ASRock TRX40 Creator. Instead of the Intel I211-AT, it, like the ASUS board reviewed by Phoronix, uses the Aquantia AQC107, which is fully supported on FreeBSD 12.1. It is currently listed for $50 less than the Taichi on NewEgg, so if you need two network interfaces, you could put the cost difference into a NIC.
 
Thank you very much! I will read that test later today.

(Un?)fortunately, the ASRock TRX40 Taichi is already here, together with a Noctua cooler. ;) The other components - CPU included - should arrive by next week. Also, I need only one network interface. This is going to be kind of a headless workstation, so all I need is the capability to reach it via SSH2.

And since the network it's going to be hooked up to is limited to 1Gbps, the faster 2.5Gbps interface doesn't make any sense anyway.

Is the I211-AT somehow problematic? Just asking because you're saying the AQC107 would be "fully" supported.

I will report back, once everything is assembled. I have prepared three operating systems for performance comparisons using my typical software (x265 with high resolution input) as well: Windows 10 1909, Fedora Linux 31, and FreeBSD 12.1 UNIX. For Windows, the software will be compiled with VisualC++ 2017, for Linux with GCC (probably 9.2.0) and for FreeBSD with the latest available clang/LLVM from the package repository.

Edit: Ok, I read the Phoronix article. Looks pretty good, I'm looking forward to testing it myself!
 
The I211-AT should not be a problem at all. I was referencing the AQC107 vs. the RTL8125AG. The ASRock TRX40 Creator has both the AQC107 and the RTL8125AG. I saw that Realtek has some old drivers for FreeBSD that do not list the 2.5Gb speeds, and I do not know of any current work on open source drivers for that interface.

The Phoronix article I listed did tests similar to what you are doing. I look forward to your results!
 
I started running my tests, and everything looked pretty nice with x264. Windows 10 was slightly faster than Linux (which Phoronix also reported), but FreeBSD beat them both.

I've hit a serious problem with x265 though, and that's the actually important piece of software. While it works marvelously under Linux (and has problems loading the CPU with Win32 Threads on Win10), it behaves horribly under FreeBSD. It does fully load the CPU, but more than two thirds of that is just kernel load. It's running horribly slow. So I disabled SMT, because some people were observing this with Intel Hyperthreading before, and it's a bit faster now, but there is still far too much kernel load.

It's absolutely not competitive at all, as it's 2-3 times slower than on Linux, which just can't be right. x265 on my old AMD FX-9590 with FreeBSD 12.0-RELEASE behaved nothing like this, and it was compiled with clang just like the version I compiled now.

Maybe I'll try GCC to see whether the compiler makes some difference.

Edit: Recompiled with GCC 9.2.0, same behavior.. uh...

Edit 2: I have conducted further tests and believe that this is at least partially my benchmark being at fault; it's based on a patched version of x265 to allow for very high resolutions like 8192×4608 for better scaling. It appears that some part of my patches or the age of the x265 version bundled with the benchmark (2.5+48) is at fault here, because a modern stock x265 3.2.1 works as expected. Crap. :( Didn't show up on Linux... Maybe my version only needs some small fix..

But aaaanyway: The system works. The only thing that I expected to work but found to be not working is the amdtemp driver. I loaded it, but sysctl won't report any chiplet temperatures at all. It appears official support will be coming with FreeBSD 12.2-RELEASE according to PR 239607, if I understand this correctly.

Edit 3: I hope it's okay to link to an external website (There's no ads or anything like that). While not perfectly meaningful, my x264 results can be found [on this list]. Unfortunately, that benchmark was designed to be binary-locked, hence all custom builds are "invalid" and marked in red. But it's still good enough to compare the Threadripper chip across three operating systems. It's still not very meaningful, because that benchmark is almost 10 years old and doesn't scale well across so many cores and threads.

Since my x265 benchmark is weirdly broken on this machine when using FreeBSD, I can only provide Windows and Linux results for it. :( Well, you can find them at the top of [this list].
 
That's great to hear. I will probably not be getting a Threadripper, but the performance and usability should scale down fine to Ryzen and X570, which I will most likely be getting with the next tax refund.

Just to verify in my mind - your x265 benchmark appears to be broken, but the actual work being done is functioning fine, correct? If so, it would be very nice to see actual benchmarks once that is fixed!
 
I can only fix it once I understand what's broken (which I currently do not). Hints as to how I can debug this issue are welcome!

And yes, an "almost" stock x265 3.2.1 appears to work absolutely fine, so I can do my work! I compiled it yesterday with just a very small fix that makes it identify FreeBSD properly instead of showing an "Unknown OS" for the platform. Then I fed it some 4K videos, and it was happily crunching away with very little kernel load, and with impressive performance! It cannot run my benchmark though, because the benchmark needs the extended resolution patch.

I will have to try two things: First, recompile a stock x265 2.5+48, which is the version my benchmark is based on. If this works fine, then it has to be either my patches or my input data that are at fault somehow. This.. may take a while.

In the meantime, I'm just happy to see how well FreeBSD works on this very new platform. As I said, I can live without the Realtek RTL8125AG. Fedora 31 Linux also didn't detect that chip. And amdtemp will be patched soon I guess.
 
Good to hear. Working on putting together a desktop box with 3700X/X570, expecting it to work well with FreeBSD.

That's a top of the line system you have, a mid-range system should be plenty for me. Amazing how powerful these latest gen systems are, crazy. The laptop I'm currently using is an i7 six core job with NVMe drives and it makes my previous desktop system look like a slug.

I do a little encoding with HandBrake ripping blu-rays into my library and for sure there's no such thing as too much CPU power for that. I'll just be happy to speed things up a bit with the desktop system. Not something I do all the time, only to add a title to my library here and there.
 
Absolutely. The PCIe 4.0 NVMe SSD I got for this system (because if I have PCIe 4.0, why not use it) is crazy fast. I can demultiplex a 20GiB A/V container file in less than 10 seconds, where my old machine would take several minutes. And that's sync! Couldn't believe my eyes when I saw it. Not to mention the insane CPU. It's crazy that AMD's going to release a processor with double the core count of this monster in Q1/2020 already..

Oh, and I can confirm that FreeBSD boots without issues on the new Threadrippers. For Linux, I had to turn off machine check exception support in the kernel (parameter "mce=off"), or it would just lock up.
 
That's pretty radical having that many cores. I'll be happy with eight cores and sixteen threads. Should give me a nice boost over the 6/12 core performance of my laptop. Plus the laptop throttles for thermal and power so I'll get full core speed with a desktop.

Yeah the disk speed is almost shocking even with 3.0x4. My laptop actually came with a SATA M.2 for some odd reason even though it supports NVMe. I changed that drive out for a Samsung Evo Plus unit (those are the fastest 3.0 drives). I thought there was something wrong with Crystal Disk when I benchmarked that disk initially, getting over 3k on both seq read and write. That has to be the most pleasantly surprising benchmark result I've experienced.

The 4.0 drives are definitely worth the trouble getting an additional third in speed. Though one thing I don't like about them is they tend to run pretty warm. So that's one thing good about the 3.0 drives is you don't have to be so concerned with heat sinks and ventilation.
 
In the meantime, I've been trying to write my own clock frequency measurement script for the Threadripper 3970X, relying on AMD's [Open-Source Register Reference] for family 17h processors. According to the information in point "2.1.2 Effective Frequency", one has to reset the CPU's machine-specific registers Core::X86::Msr::MPERF (0xE7) and Core::X86::Msr::APERF (0xE8), and then let them both count up for a while. After that, it should be possible to calculate the average effective clock speed over that measurement time window like this: content of APERF / content of MPERF × P0 clock frequency.

For this, I used the cpuctl driver and the cpucontrol command as well as the basic calculator bc. For testing, I only played around with logical CPU 0:

Bash:
#!/usr/bin/env sh

### Information about this script
# This script requires the cpuctl driver to be loaded:
#   # kldload cpuctl
#
# It also needs the 'printf', 'cut', 'tr', 'bc', 'sleep' and 'cpucontrol'
# commands present on your system or it will fail.


### User-configurable part

# Set P0 (reference / base, not maximum turbo) clock frequency of your
# processor in MHz:
P0freq=3700


### Non-user-configurable part (don't change anything below this line)

# Set Core::X86::Msr::MPERF (0xE7) & Core::X86::Msr::APERF (0xE8) to zero
# to start measurement
( cpucontrol -m 0xE7=0x0 /dev/cpuctl0 & )
( cpucontrol -m 0xE8=0x0 /dev/cpuctl0 & )

# Wait for a very short bit for measurement data to accumulate. If we wait
# for too long we'll just get an average over all clock speeds achieved
# over the measurement time window, but we rather want the current clock
# speed instead. The sleep timer is currently set to 1s (subject to
# change).
sleep 1

# Read back Core::X86::Msr::MPERF (0xE7) and Core::X86::Msr::APERF (0xE8),
# convert the resulting numbers from hexadecimal to human-readable decimal
# and compute the effective clock rate from them
E7raw=$(cpucontrol -m 0xE7 /dev/cpuctl0)
E8raw=$(cpucontrol -m 0xE8 /dev/cpuctl0)

# Pass the raw text as an argument, never as the printf format string
E71h=$(printf '%s\n' "${E7raw}" | cut -d'x' -f3 | cut -d' ' -f1)
E72h=$(printf '%s\n' "${E7raw}" | cut -d'x' -f4)
E7h=$(printf '%s' "${E71h}${E72h}" | tr '[:lower:]' '[:upper:]')
E7=$(printf 'obase=10; ibase=16; %s\n' "${E7h}" | bc)

E81h=$(printf '%s\n' "${E8raw}" | cut -d'x' -f3 | cut -d' ' -f1)
E82h=$(printf '%s\n' "${E8raw}" | cut -d'x' -f4)
E8h=$(printf '%s' "${E81h}${E82h}" | tr '[:lower:]' '[:upper:]')
E8=$(printf 'obase=10; ibase=16; %s\n' "${E8h}" | bc)

# Output
printf 'CPU clock rate in MHz is: '
printf "scale=16; ${E8} / ${E7} * ${P0freq}\n" | bc

The problem is that the clock speed seems "about right" with this timer (1 second), but I'm sceptical about the correctness of the output. Especially when setting the waiting time to low values like 100ms with sleep 0.1, the values tend to be off by quite a wide margin, even though 100ms is pretty long in terms of CPU time. My assumption is that either the spawning of the two cpucontrol commands for resetting the registers or, more likely, the ones for re-reading them aren't as simultaneous as I'd like them to be. Due to this, the two values obtained from the registers will always be a bit off, which can only be mitigated by widening the measurement time window.

But this is another problem, because if I widen it too much (measure for too long), I'll get a smooth clock frequency average over the whole time window rather than a "current clock frequency" value.
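One variation that might reduce the error is to skip the reset entirely and sample both counters twice, then work with the differences. The following is only a rough sketch under a few assumptions: the helper names (msr_to_dec, eff_mhz, sample_cpu0) are made up for illustration, the cpucontrol output format is inferred from the cut-based parsing in the script above, and the result is truncated to whole MHz. It does not solve the problem that the APERF and MPERF reads are still separate process launches.

```shell
#!/bin/sh
# Sketch only: msr_to_dec, eff_mhz and sample_cpu0 are hypothetical helper
# names, not part of cpucontrol or of the script above.

# Convert one line of `cpucontrol -m` output to a decimal number.
# Assumed format (inferred from the cut-based parsing above):
#   MSR 0xe7: 0x00000012 0x34567890    <- high 32 bits, then low 32 bits
msr_to_dec() {
    set -- ${1#*:}                 # drop "MSR 0xe7:", split the two words
    echo $(( ($1 << 32) | $2 ))    # sh arithmetic accepts 0x... constants
}

# eff_mhz APERF_start MPERF_start APERF_end MPERF_end P0_MHz
# Effective clock = delta(APERF) / delta(MPERF) * P0, in whole MHz.
eff_mhz() {
    echo $(( ($3 - $1) * $5 / ($4 - $2) ))
}

# Take two samples on logical CPU 0 without writing to any MSR.
# Needs root and a loaded cpuctl driver; defined here but not called.
# Usage: sample_cpu0 [seconds] [P0_MHz]
sample_cpu0() {
    a0=$(msr_to_dec "$(cpucontrol -m 0xE8 /dev/cpuctl0)")
    m0=$(msr_to_dec "$(cpucontrol -m 0xE7 /dev/cpuctl0)")
    sleep "${1:-1}"
    a1=$(msr_to_dec "$(cpucontrol -m 0xE8 /dev/cpuctl0)")
    m1=$(msr_to_dec "$(cpucontrol -m 0xE7 /dev/cpuctl0)")
    eff_mhz "$a0" "$m0" "$a1" "$m1" "${2:-3700}"
}
```

Reading APERF before MPERF at both sample points keeps the skew between the two reads roughly the same at the start and at the end, so it should partly cancel out in the differences. Whether that is good enough for short windows like 100ms would still have to be tested.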

I suspect that the MSR reset and re-read would need to be done programmatically (=not in a shell script) to be fast enough for correct "on the spot" clock frequency measurement. AMD's documentation also seems to suggest that timing is critical here. They're even suggesting the use of assembly language to write to and read from the registers as fast as possible. Unfortunately, I am a lousy programmer. Like, really lousy.

Any suggestions?

Or should I just rely on tools like x86info? Though its output seems sketchy sometimes as well...

Thanks!

Edit: Maybe I should elaborate on why I want this. sysctl dev.cpu.0.freq_levels only shows 3700/4070 2800/2800 2200/1980, which I guess means the CPU can reduce its clock frequency, but not use AMD turbo core? After all, the maximum turbo core frequency should be 4500MHz. Under full load (even when only on one core), sysctl dev.cpu.0.freq always shows 3700. This is the exact same behavior I've seen with FreeBSD 12.0 on my old AMD FX-9590, which would always report 4.7GHz, but never its turbo core frequency of up to 5GHz.
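For readability, the freq_levels string can be split into one level per line. This is a small sketch with a made-up helper name (print_freq_levels); it assumes each entry is a "frequency/power" pair in MHz and milliwatts, which is how cpufreq(4) describes this sysctl.

```shell
#!/bin/sh
# print_freq_levels is a hypothetical helper name used for illustration.
# Each entry is "frequency/power"; per cpufreq(4) the units are MHz and mW.
print_freq_levels() {
    for pair in $1; do
        echo "${pair%/*} MHz at ${pair#*/} mW"
    done
}

# Sample value taken from the sysctl output quoted above; on a live system
# one could pass "$(sysctl -n dev.cpu.0.freq_levels)" instead.
print_freq_levels "3700/4070 2800/2800 2200/1980"
```

Laid out like this, it is at least obvious that no level above the 3700 MHz P0 entry is listed.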

I'm not sure whether the CPU really doesn't use its turbo frequency or whether it's just a display / readout issue, and the CPU does use its turbo without FreeBSD noticing. I'm running powerd btw. I've tried setting its parameters to -a hiadaptive -n hiadaptive -r 10% -m 2200 -M 4500 to no avail.

I've been confused by this back then, and I'm still confused by it on this new system, so I'd like to use AMD's suggested method to determine the core's "real" clock speeds.
 
Just to report back: The machine has now been running under heavy 24/7 load for about three weeks with no issues to report. Everything is stable and fast. When I say "heavy load", it's like this:
BEAST-Mark-II-load-6jobs.png

In the meantime, I attempted to get amdtemp to report temperatures for the chip, but no luck. I tried a newer amdtemp driver with Zen 2 patches from r350624, but even that won't create the dev.cpu.*.temperature OIDs and won't report any temperatures.

Seems the driver will have to be patched for this to work, so I've submitted a bug report: PR 243406

Edit: Fixed, works as intended. I made a stupid mistake, loading the old versions of the kernel modules instead of the newly compiled ones.
 
My apologies for all the multi-posting, but I just wanted to show off the finalized machine. The pointless graphics card has finally been removed, and the missing fans and fan grills were installed. The machine is now booting headless. FreeBSD 12.1 with its UEFI bootloader required no additional configuration to enable headless booting. It just works out of the box, which is very nice.

Here are some pictures of this newly assembled FreeBSD "remote workstation", SilverStone Primera PM02 steel case, Seasonic Prime TX-650 power supply, ASRock TRX40 Taichi board, Ryzen Threadripper 3970X, 64GiB of Kingston HyperX RAM and that 2TB Phison-based M.2 PCIe 4.0 NVMe SSD, all cooled by Noctua (whom I prefer not just for build quality, but also because it's a company from my home country):

BEAST-MarkII-angleshot1.jpg BEAST-MarkII-frontopen.jpg BEAST-MarkII-rear.jpg BEAST-MarkII-inside-from-front.jpg BEAST-MarkII-frontstickers.jpg

I've got to say, I'm really pleased with this build! Works marvellously well and looks beautiful while doing so!
 