upgrade old FreeBSD home server

I'm not knowledgeable about ECC, so don't know about that....

In this case, I will help, and teach you everything you need to know about it: YOU WANT IT, REALLY BAD.

Here's what happens. Computers have been getting bigger, and have more and more memory. Today's file systems use that memory very efficiently, as buffer caches: data that was read or written recently is likely to be re-read soon, so we keep it in memory; and data that is about to be written can be held in memory for a while before actually going to disk.

The problem is that memory hasn't gotten any more reliable, but we have more of it, and keep data in it for longer: more potential for data corruption. Data on disk, by contrast, is quite reliable: disk interfaces have good error detection/correction while the data is on the wire, and disks themselves have reasonably good error detection (with a published rate of about one bit error per 10^17 bits, which I believe).

This makes memory the weak link, since the better file systems (such as ZFS) now protect data on disk and over the wire with extra checksums, while data in memory is completely unprotected: if a bit flips there, the corrupted data will be written to disk, or served to an application.
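To make that exposure window concrete, here is a toy write-back cache in Python (my illustration, not any real file system's code): dirty data sits in RAM until a flush, and whatever is in RAM at flush time, flipped bits included, is what reaches the disk.

```python
# Toy write-back cache: writes land in RAM first, the disk sees them later.
class WriteBackCache:
    def __init__(self):
        self.dirty = {}                    # block id -> data held only in RAM

    def write(self, blkid, data):
        self.dirty[blkid] = data           # fast: no disk I/O yet

    def flush(self, disk):
        for blkid, data in self.dirty.items():
            disk[blkid] = data             # whatever RAM holds *now* gets written
        self.dirty.clear()

cache, disk = WriteBackCache(), {}
cache.write(7, bytearray(b"ledger row 42"))
cache.dirty[7][0] ^= 0x01                  # simulate a bit flip while cached
cache.flush(disk)                          # the corruption is now "durable"
```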

Enter ECC = Error Correcting Code memory: we store a few extra bits for every word in memory, and that allows the memory controller to (automatically and transparently) correct any single-bit error (one bit in a word damaged), and detect any two-bit error and do something sensible about it. Crashing the OS is not a bad response here, as double-bit errors are extremely rare, and recovering from them is not easy.

Relatively cheap insurance, and it fixes what is today probably the largest source of silent data corruption.
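To make the correct-one/detect-two mechanics concrete, here is a toy SECDED (single-error-correct, double-error-detect) code in Python. This is my sketch of the principle, not what the hardware literally runs: real ECC DIMMs use a Hamming-style (72,64) code, 8 check bits per 64-bit word, but the idea is the same.

```python
def encode(data4):
    """4 data bits -> 8-bit SECDED codeword (index 0 = overall parity)."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data4
    c[1] = c[3] ^ c[5] ^ c[7]      # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]      # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]      # parity over positions with bit 2 set
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]  # overall parity
    return c

def decode(c):
    """Return (status, data bits), correcting a single flipped bit."""
    syndrome = (c[1] ^ c[3] ^ c[5] ^ c[7]) \
             | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1 \
             | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2
    overall = c[0] ^ c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    if syndrome and overall:         # one bit flipped: fix it silently
        c[syndrome] ^= 1
        status = "corrected single-bit error at position %d" % syndrome
    elif syndrome:                   # two bits flipped: detect, don't guess
        status = "uncorrectable double-bit error detected (time to panic)"
    elif overall:                    # the overall parity bit itself flipped
        status = "corrected error in the parity bit"
    else:
        status = "no error"
    return status, [c[3], c[5], c[6], c[7]]

word = encode([1, 0, 1, 1])
word[6] ^= 1                         # simulate a cosmic-ray bit flip
print(decode(word))                  # the data comes back intact
```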
 
I'm not knowledgeable about ECC, so don't know about that....
It's a weird story. Somebody came up with the stance that one must use ECC when using ZFS. It boils down to the claim that ZFS would somehow "amplify" data corruption from memory errors. But that appears to be bogus.
What remains is that a cosmic ray can and will flip memory cells - and that means data corruption, which, without ECC, will go undetected. But a cosmic ray hits only about once a year (unless you're at high altitude), and many memory cells aren't critical - except when you're running ZFS on a fileserver, where most memory is used as file cache.
So you can calculate the odds.
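A sketch of that calculation in Python. All numbers are illustrative assumptions, using the figure above of roughly one flip per machine per year:

```python
# Back-of-envelope odds of a cosmic-ray flip hitting data that matters.
flips_per_year = 1.0     # assumed: ~1 flip per machine per year (see above)
ram_gb   = 16.0          # assumed machine size
cache_gb = 12.0          # on a ZFS fileserver, most RAM ends up as file cache

# A flip only corrupts data if it lands in cached file data.
p_hits_cache = cache_gb / ram_gb
print("expected silent corruptions/year: %.2f" % (flips_per_year * p_hits_cache))
# -> 0.75 on the fileserver; far lower on a desktop, where most flips
#    land in cells nobody cares about
```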
 
From this paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/35162.pdf
“We find that DRAM error behavior in the field differs in many key aspects from commonly held assumptions. For example, we observe DRAM error rates that are orders of magnitude higher than previously reported, with 25,000 to 70,000 errors per billion device hours per Mbit and more than 8% of DIMMs affected by errors per year. We provide strong evidence that memory errors are dominated by hard errors, rather than soft errors, which previous work suspects to be the dominant error mode. We find that temperature, known to strongly impact DIMM error rates in lab conditions, has a surprisingly small effect on error behavior in the field, when taking all other factors into account. Finally, unlike commonly feared, we don’t observe any indication that newer generations of DIMMs have worse error behavior.”

Note that bit flips due to cosmic rays are soft errors.
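To put the paper's units into something tangible, here is a rough conversion (my arithmetic; the 1 GB DIMM size is an assumption, typical of the study's era):

```python
# The paper: 25,000-70,000 errors per 1e9 device-hours per Mbit.
fit_per_mbit = 25_000                # low end of the reported range
dimm_mbit    = 1 * 1024 * 8          # one 1 GB DIMM, in Mbit

errors_per_hour = fit_per_mbit * dimm_mbit / 1e9
print("mean errors per DIMM per year: %.0f" % (errors_per_hour * 24 * 365))
# -> ~1800. The *mean* is that high because a few DIMMs with hard errors
#    dominate it; per the paper, just over 8% of DIMMs see any errors at
#    all in a given year.
```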
 
Are there issues with built-in video on AMD Ryzen motherboards? So many people seem to think I need a GPU, but I don't know what they think I'm doing... just ordinary web browsing (& text stuff!). I did note the issue with the 2.5G ethernet drivers.
I use an AMD Ryzen 3 3300X 4-Core Processor and AMD X570 motherboard with FreeBSD 12.2 without problems.

The 3300X has no integrated GPU, and I just use a really old ATI Radeon PCIe card. So I have no experience with integrated graphics.

You don't need a dedicated GPU unless you play high-end video games or mine cryptocurrency.

However you still need graphics for basic web browsing, and a lot of the Ryzen CPUs come with an integrated GPU, which may be a cost effective way to get video.

Some of the Ryzen 3 CPUs still seem to be in short supply. If you want one with an integrated GPU, pick a model and ask here whether anyone has it running.
 
... but I don't know what they think I'm doing... just ordinary web browsing (& text stuff!).
Laptop.

How much money are you going to spend upgrading the box you've got now with all the nice things suggested in this thread to have a desktop you can do "just ordinary web browsing (& text stuff!)" with?

I have 2 T61 Thinkpads that I paid $50 each for a couple years ago and posted a shot of my ebay page showing the price of one. You can see them both still at work last week in the screenshot thread.

I have the Gateway full-sized tall gaming tower that came with Win98SE, a 500MHz Katmai PIII (reported to be NSA-backdoored) and 1GB RAM in my closet. By the time I upgraded it to the specs of the W520 I'm running now, I'm pretty sure the cost would exceed the $286 I paid for this one.

The W520 has a footprint of approx. 15" x 10". The tower, monitor and keyboard would take up the space the two Thinkpads to my left (the T43 and T400) are sitting in now.
 
From this paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/35162.pdf

Note that bit flips due to cosmic rays are soft errors.
If I get that right, then a "hard error" is something I would simply call a "broken device".

But there is one thing strange with this: if this were true, I should see occasional failures that cannot be tracked down to a cause. This is obviously accepted in the consumer market: if your (Windows etc.) system somehow fails, just reboot. But I am running Unix because I do not accept that. And, in practice, I do not experience such errors. I afford myself the luxury of (almost) always doing a root cause analysis, and it is usually successful - i.e. the failure can be tracked to a software defect, a configuration flaw, or a defective component (where replacement then fixes it).

OTOH, I am always running my stuff "on the safe side", not driving it to the limits. And I am indeed a fan of ECC memory, but mostly of registered ECC (regECC), which is slightly slower, but the memory is driven via latches, and so runs under less extreme conditions.

Looking closer into that paper.
They come to more or less the same figure as me: about one error per year (per machine/device). And they don't really know where these come from:
Conclusion 7: Error rates are unlikely to be dominated by soft errors.

We observe that CE rates are highly correlated with system utilization, even when isolating utilization effects from the effects of temperature. In systems that do not use memory scrubbers this observation might simply reflect a higher detection rate of errors. In systems with memory scrubbers, this observation leads us to the conclusion that a significant fraction of errors is likely due to mechanisms other than soft errors, such as hard errors or errors induced on the datapath. The reason is that in systems with memory scrubbers the reported rate of soft errors should not depend on utilization levels in the system. Each soft error will eventually be detected (either when the bit is accessed by an application or by the scrubber), corrected and reported. Another observation that supports Conclusion 7 is the strong correlation between errors in the same DIMM. Events that cause soft errors, such as cosmic radiation, are expected to happen randomly over time and not in correlation.

Putting it shorter: some chips are crap.
 
Heh, well, my "web browsing" example is indeed the right reference for my graphic & computational needs. However this thing will be running 24x7 with various devices behind it, and what I mostly do *at* the desktop is my own writing work, which is neither graphical nor computationally intensive, but I do want good equipment that will last me 10 years. "Pennies a day" as they say....
 
I'd like to toss out another option instead of a laptop:
NUC format. If you like a more traditional (monitor, keyboard, etc.) layout better than a laptop, a NUC works very nicely as a general-purpose desktop. I've got one running an i3 with 16G RAM and a 250G mSATA device that works very nicely. The integrated i915 graphics works fine, and there's plenty of room on the "disk". Mine is fanless (it looks like a big heatsink), so it's very quiet. Almost makes me want to replace all the desktops with NUCs and put in something like an iXsystems mini for storage. Oh, a NUC may cost a tad more upfront, but I think you recoup the difference quickly.

Oh, the ECC/cosmic rays discussion: it happens in the real world. I've been on the receiving end of it at work a few times. Pain in the butt to find: take a core file, start walking stack frames, and after much eyestrain you find the one stack variable that has a single bit flipped between two stack frames.
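For flavor, the smoking gun at the end of such a hunt looks something like this (the values are hypothetical):

```python
# Two copies of a value that should be identical, one bit apart.
frame_a = 0x7FFFDE4031C0    # saved pointer as seen in frame N (made up)
frame_b = 0x7FFFDE4031C4    # the "same" pointer as seen in frame N+1

diff = frame_a ^ frame_b
if diff and bin(diff).count("1") == 1:
    print("single flipped bit: bit %d" % (diff.bit_length() - 1))
```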
 
Well, the best I can show is 306 days of uptime on a Thinkpad X61 that came with Vista and served as my dedicated .mp3 player till it threw a fan error and gracefully shut itself down:

[screenshot: my_toy.png]

This one and the rest came with Win7 so who knows how much longer any of them will last. But I have 8 of them.
 
To begin with, I actually run ZFS without ECC. Unfortunately, that's not by choice: if I take all the requirements for my motherboard and add ECC, then there is no model that will do the job (or at least there wasn't when I bought it, about 4 to 6 years ago). I would like to use ECC, but I can't get there with reasonable effort. The machine also has only 3 usable GByte of memory.

It's a weird story. Somebody came up with the stance that one must use ECC when using ZFS. It boils down to the claim that ZFS would somehow "amplify" data corruption from memory errors. But that appears to be bogus.
If you phrase it that way, it is indeed bogus.

But: ZFS is very good at using memory as a buffer cache. We always hear about that from the bad side, when people complain that ZFS and its ARC have eaten too much memory. Look at it from the other side: ZFS keeps more data in memory, as file data buffers, and for longer - so more of your data is exposed to memory errors.

The error rate of RAM is much higher than the cosmic-ray numbers would make you believe. And I think a lot of "inexplicable" errors, and a lot of the whining about "low quality Microsoft software", are actually caused by (undetected) memory errors.

But here's the real difference: ZFS is the first free, mass-production file system to have implemented checksum protection for disks, which also protects the path to/from the disk against undetected data corruption. It used to be that disks were the #1 cause of data corruption (and with it things like file system errors and crashes). With that problem gone, on a ZFS system it makes sense to look at the #2 cause, which is memory errors.
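A minimal sketch of that end-to-end idea in Python (mine, not ZFS's actual code; ZFS keeps the checksum in the parent block pointer and uses fletcher4 or SHA-256):

```python
import hashlib

storage = {}   # stand-in for the disk: block id -> payload

def write_block(blkid, payload):
    storage[blkid] = payload
    return hashlib.sha256(payload).digest()   # kept *outside* the block

def read_block(blkid, expected):
    payload = storage[blkid]
    if hashlib.sha256(payload).digest() != expected:
        raise IOError("checksum mismatch: corruption on disk or on the wire")
    return payload

cksum = write_block(1, b"important data")
storage[1] = b"important dat\x00"             # simulate silent on-disk bit rot
try:
    read_block(1, cksum)
except IOError as err:
    print(err)                                # ZFS would now repair from a mirror

# Note: a bit flipped in RAM *before* write_block computed the checksum
# would verify as "good" forever - the disk checksum can't catch that.
# That is exactly the gap ECC memory closes.
```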

The other thing is that there is so much more memory around, which means that a lot more data is held in memory for the buffer cache, and for longer. I used to run Linux on a 386 with 4 MByte (including XWindows and scientific data analysis), but it was painfully slow, so I upgraded to 16 MByte. Today, consumer laptops are sold with 16 GByte (1024 times more), and servers are measured as 1/4 or 1/2 Terabyte. That makes for a gigantic target for memory errors. Thence the need for protection.
 
The other thing is that there is so much more memory around, which means that a lot more data is held in memory for the buffer cache, and for longer. I used to run Linux on a 386 with 4 MByte (including XWindows and scientific data analysis), but it was painfully slow, so I upgraded to 16 MByte. Today, consumer laptops are sold with 16 GByte (1024 times more), and servers are measured as 1/4 or 1/2 Terabyte. That makes for a gigantic target for memory errors. Thence the need for protection.
somehow equivalent to "The faster the computer, the faster it fscks up" :)
 
To begin with, I actually run ZFS without ECC. Unfortunately, that's not by choice: if I take all the requirements for my motherboard and add ECC, then there is no model that will do the job (or at least there wasn't when I bought it, about 4 to 6 years ago). I would like to use ECC, but I can't get there with reasonable effort. The machine also has only 3 usable GByte of memory.
Why is it so hard to find reasonably-priced boards that support ECC? The memory market seems to be driven by ever-higher clock frequencies, though I betcha you need very sophisticated benchmarks to notice any difference in speed with faster RAM.

The error rate of RAM is much higher than cosmic rays make you believe. And I think a lot of "inexplicable" errors and people whining about "low quality Microsoft software" are actually caused by (undetected) memory errors.
I realized this when I started playing World of Warcraft on my Windows 98SE desktops at the time it came out. I noticed the one machine with ECC would never crash while playing the game, but the other one did. Clean install made no difference. After some trial and error, I found several bad sticks of RAM that would "work" just fine until I played the game for a while.

Edit: All the bad sticks passed every memtest I could find with flying colors. They had to be stressed for hours before they became flaky.
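That matches the pattern that only long, whole-memory stress shakes such sticks out. A sketch of the idea in Python (illustrative only; for real testing use memtest86+ or similar, run for hours):

```python
import os
import time

def stress(mb=1024, hours=4.0):
    """Fill a large buffer with a random pattern, re-verify it in a loop."""
    deadline = time.time() + hours * 3600
    size = mb * 1024 * 1024
    while time.time() < deadline:
        pattern = os.urandom(64)
        buf = bytearray(pattern * (size // 64))   # touch a lot of RAM
        for off in range(0, size, 64):            # ...then read it all back
            if buf[off:off + 64] != pattern:
                raise RuntimeError("bit flip near offset %d" % off)

stress(mb=256, hours=0.01)   # small, short demo run
```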
 
Why is it so hard to find reasonably-priced boards that support ECC? The memory market seems to be driven by ever-higher clock frequencies, though I betcha you need very sophisticated benchmarks to notice any difference in speed with faster RAM.


I realized this when I started playing World of Warcraft on my Windows 98SE desktops at the time it came out. I noticed the one machine with ECC would never crash while playing the game, but the other one did. Clean install made no difference. After some trial and error, I found several bad sticks of RAM that would "work" just fine until I played the game for a while.

Edit: All the bad sticks passed every memtest I could find with flying colors. They had to be stressed for hours before they became flaky.
It's not? Most (not all) ASRock and ASUS AM4 boards using B550 or X570 support ECC.
 
It's not? Most (not all) ASRock and ASUS AM4 boards using B550 or X570 support ECC.
Those are very new AMD core-logic chipsets. Also, I avoid ASRock motherboards; the only two that I've bought have flaky BIOSes. I love ASUS boards, but they tend to be expensive.
 
The last time I checked, AMD leaves the ECC componentry in their Ryzen CPUs and chipsets, but doesn't test it.

It was left to the individual motherboard manufacturers to do that.

I don't know if that policy has changed recently, but I would be reluctant to believe it has without seeing a statement from AMD.

There's certainly clear evidence that Ryzen ECC could be problematic, on at least some motherboards.

I would not assume that a Ryzen motherboard would perform ECC correctly unless I saw a clear statement from the manufacturer (beyond the assertion that ECC DIMMs will fit into the memory slots).
 
Oh, and... most motherboard manufacturers publish a Qualified Vendors List (QVL) of memory modules that they have had tested with the motherboard.

If you can find ECC memory in the QVL, then you have some basis for confidence.
 
Thank you for that. I see that it's quite recent, and relates only to B550 and X570 chipsets, and only to some CPUs. But it is certainly good news.
 
I use an AMD Ryzen 3 3300X 4-Core Processor and AMD X570 motherboard with FreeBSD 12.2, without problems, on my ZFS server.

The old AM3 motherboard died suddenly, and had to be replaced quickly, last October. My ECC research on AMD X570 back then was inconclusive. I really hoped it would work, but was not sure.

The new motherboard (ASUS ROG Strix X570-F Gaming) has PCIe 4.0, two M.2 slots, eight SATA headers, external USB 3.2 (Gen 1 and Gen 2), and (I now know) ECC capability!

It's a little expensive, but well suited to a ZFS server, since, with 2 x M.2 slots and 8 x SATA ports, I don't need an additional disk controller. Plus I have PCIe 4.0 moving forward.

The USB 3.2 Gen 2 ports allow me to export ZFS snapshots to external USB disks. I use USB 3.1 Gen 2 to SATA adapters with free standing 3.5" SATA disks for this.
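For reference, roughly how such an export can be scripted (a sketch: the pool/dataset names `tank/data` and `usbpool/data` are made up, but `zfs snapshot` and `zfs send | zfs receive` are the real commands):

```python
import datetime
import subprocess

# Snapshot, then stream it into a pool living on the external USB disk.
snap = "tank/data@usb-" + datetime.date.today().isoformat()
subprocess.run(["zfs", "snapshot", snap], check=True)

send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
subprocess.run(["zfs", "receive", "-F", "usbpool/data"],
               stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```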

The re-build is still in progress. I used some non-ECC memory, and still use the very old LSI SAS2008 disk controller for the ZFS tank, and the root is still on a pair of 2.5" SSDs (separate ZFS mirror). I need to buy some M.2 SSDs for the root mirror and ECC DDR4, plus retire the LSI SAS2008 to complete the rebuild.
 
My only reservation is that I have not verified that FreeBSD will boot from an M.2 SSD. Also note that:
  • the AMD Ryzen 3 3300X is currently a pretty sweet spot in CPU price/performance, but has no GPU;
  • so you have to use one of those PCIe slots for a graphics card; and
  • not all PCIe slots on the ASUS ROG Strix X570-F Gaming are necessarily usable (read the fine print very carefully).
 
I use an AMD Ryzen 3 3300X 4-Core Processor and AMD X570 motherboard […]
Just a few weeks ago I decided not to catch the PCIe 4 ride, and stayed on the "old" chipsets. The only part that would make use of it would be an NVMe disk, and all my research told me: no one really benefits from transfer rates above 2.5 GB/s in home scenarios (and even PCIe 3 goes up to nearly 4 GB/s). On the other hand, those chipsets need bigger heatsinks (and some even fans!), which means more power turned into heat. A closer look at the available NVMe disks also tells me that PCIe 4 models have to be cooled, while my PCIe 3 NVMe disk always stays below 40°C (at the moment, and normally: 31°C). Comparing the official power-consumption figures of these components didn't reflect those facts, but I think that comes down to the test scenarios… Reality:

I'm using two NVMe disks: a Corsair MP510 1.92 TB (PCIe 3, without heatsink), and a Corsair MP600 2 TB (PCIe 4, with heatsink). The newer MP600 should use less power, but on the same chipset it is always ~8°C warmer than the MP510 - and that despite its heatsink. And so far I have noticed no benefit at all from the fact that the MP600 can go beyond what PCIe 3 offers.

So I still wouldn't go with an X570 (or any other PCIe 4) mainboard. And when the word "home" is used, I would go for the lowest heat output and the quietest fans possible (or even fanless where possible). For ~10 years now I haven't bought the fastest components I can get for my money, but the most useful ones.
 
PCIe 4.0 support is troublesome, although AMD is working on fixing it. However, I highly doubt you'll notice a difference in performance in the end.
 