ECC or non-ECC

Terri_Kennedy · Dec 26, 2017

Snurg said:
Memory safety was always held in high esteem in "serious" computing. Before the advent of the IBM PC, microcomputers were in fact toys for enthusiasts. Maybe except for the S-100 bus based CP/M and MP/M systems, which ran the killer app "WordStar" and were widely used professionally even though belittled by the "real computing" mainframe world. And none of these 8-bit systems had memory parity checking.

Not true. I was designing and my company was selling MP/M systems with 512KB of parity-protected 70ns static RAM. And if you think it is still a toy system, it ran 8 copies of WordStar simultaneously when the average S-100 system had trouble running one. Clarke's 2010 was typeset (original US hardcover) after being input on one of my 8-terminal MP/M systems by Fisher Composition's Arkville, NY office.

Enough for now - I have to run to dinner...

vermaden · Dec 27, 2017

If your platform supports ECC, use it; if not, then you will not be able to use it ... as simple as that.

Snurg · Dec 27, 2017

Terry_Kennedy said:
Not true. I was designing and my company was selling MP/M systems with 512KB of parity-protected 70ns static RAM. And if you think it is still a toy system, it ran 8 copies of WordStar simultaneously when the average S-100 system had trouble running one. Clarke's 2010 was typeset (original US hardcover) after being input on one of my 8-terminal MP/M systems by Fisher Composition's Arkville, NY office.

I have serious doubt that this is true.

Clarke's 2010 was released in 1982.
At that time 6 MHz Z80 was the top end processor for CP/M, but most computers ran at 4 MHz. (Cromemco introduced the first 10 MHz 16-bit MP/M 68k stuff not before 1982)

70 ns SRAMs (cache RAMs) were introduced around 1982, and were very, very expensive.
They did not see widespread use until the arrival of the 386 boards in 1987.
I seriously doubt it made sense to use such fast and expensive SRAMs as main memory for 2, 4 or at most 6 MHz processors.

Imagine the sheer size of installed memory boards with almost 300 16kbit SRAM memory chips in large DIL packages.
In 1980 the memory price was around $10 per kilobyte for DRAM and about $40 per kilobyte for SRAM.
Using SRAM instead of DRAM back then would have meant an extra cost of about $20000 for a 512k system (four times the number of chips, which were also more expensive and larger, at least 5-6 sq. ft more PCB, more casing, more power supply, ...).
This is a lot when you consider that a typical office system (computer, terminal, daisywheel printer, OS, software) cost about $10-15k at that time.
In short, the cost issue is the reason why practically nobody uses SRAM as main memory for large installations.
And Cromemco, a technology leader back then, afaik never produced parity-protected memory boards.
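
The arithmetic behind these figures can be sketched in a few lines of Python. The per-kilobyte prices are the rough 1980 figures quoted above; the assumption that parity adds a ninth bit to every byte, and the function names, are illustrative and not from the post.

```python
# Rough check of the chip-count and cost figures quoted above.
# Assumption (not from the post): parity adds a 9th bit to every byte.

def chips_needed(capacity_kbytes, chip_kbits, bits_per_byte=8):
    """Number of memory chips needed for a given capacity."""
    total_kbits = capacity_kbytes * bits_per_byte
    return -(-total_kbits // chip_kbits)  # ceiling division

def memory_cost(capacity_kbytes, dollars_per_kbyte):
    """Chip cost only; ignores PCB, casing and power supply."""
    return capacity_kbytes * dollars_per_kbyte

plain_chips = chips_needed(512, 16)        # 512 KB from 16 kbit chips
parity_chips = chips_needed(512, 16, 9)    # same, with one parity bit per byte

dram_cost = memory_cost(512, 10)           # about $10 per KB
sram_cost = memory_cost(512, 40)           # about $40 per KB
extra_cost = sram_cost - dram_cost

print(plain_chips, parity_chips, extra_cost)
```

With a parity bit per byte the chip count comes to 288, which matches "almost 300". The chip price difference alone is a little over $15000; the figure of about $20000 in the post also counts the extra PCB area, casing and power supply.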

As I cannot remember having seen any memory board for 8-bit computers that has parity in 40 years, I would appreciate very much if you can show me any proof, for example pics, schematics, or the like.

As the 8-bit era ended over 35 years ago, I guess you were just confusing some things.
Otherwise I would kindly ask you for more technical detail, as I am sure you can tell things from that time that are interesting to a lot of the forum's readers.

P.S. Regarding Wordstar:
It was well suited to multi-user use, as it is an application that idles most of the time.
So it was quite possible to run multiple Wordstar terminals on a 2 MHz Z80 system.
Of course it would get laggy if all 8 users printed at the same time, for example.
But people knew this and adjusted their workflows accordingly.

ralphbsz · Dec 28, 2017

Fast SRAM chips were available much earlier. We had them on CAMAC-to-Unibus converters called "MBD2" in the early 80s; those things were able to run at 40MHz (they were sold at 20 or 25MHz, but we upgraded them by optimizing the circuits, which were completely wire-wrapped), so that must have been 25ns memories (or somewhat faster). Now I'm not sure of the density of those chips, but it wasn't very high.

I can not remember whether 8-bit Z80 computers with parity existed or not; I only started using 8-bit machines in the era of 4K DRAM memory chips (4kBit each, so 64kByte of memory required 128 chips, which is why many computers shipped with 16 or 32kByte). The 16K chips were easier to live with, and the first 64K chips were a godsend; for the first time we could put over 64KByte onto a single Eurocard.

P.S.: The first paragraph above talks about the speed of SRAM; the second paragraph about density of DRAM. Sorry if that causes confusion.

Snurg · Dec 28, 2017

ralphbsz
Yes, overclocking was way easier back then.

If a 50% higher clock didn't work, it was usually only a matter of finding the one marginal chip that was too close to the minimum spec. (OK, I am exaggerating a bit, but it was totally different from nowadays...)

Iirc 16k DRAMs were introduced in 1976 or 77, but the 4k DRAMs were sold in computers until about 1980. Their access times were as slow as 350ns. The 64k DRAMs were introduced around 1980. These were substantially faster, too, down to 150ns. Then ~1984 came 256kbit, down to ~100ns, and ~'88 1Mbit, down to 60ns. (tCAS->Data ready)

I know because memory hardware was one of my special interests back then.

recluce · Dec 28, 2017

For me, it is ECC wherever possible. Even my new laptop will come with ECC memory, and I would never consider building/buying/using a server without it. Also a good argument for Ryzen CPUs, which mostly support ECC (I believe only Ryzen 3 does not).

Terri_Kennedy · Jan 5, 2018

Snurg said:
I have serious doubt that this is true.

I had been working for Expertype in NYC doing support on the Computer Composition International phototypesetter front-end systems (Data General Novas). Expertype and CCI came to some agreement where Expertype would market and support CCI systems in the NY region. One of the first customers was Fisher Composition, just up Park Ave from Expertype. After a round of layoffs at Expertype, I was approached by Steve Fisher to design a "front-end for the front-end", as additional terminal licenses on the CCI hardware were incredibly expensive.

The first system I did for Fisher was an Intertec Superbrain which replaced a primitive entry system with a 1-line display that punched on paper tape (which was then fed to the tape reader on the CCI Nova). The Superbrain allowed local editing and more than a line of text, and the output from the Superbrain went directly into the CCI Nova, initially by pretending to be a fast paper tape reader (since there was no support for paper tape on the Superbrain).

Fisher wanted something much more customizable, so I proposed a multi-user (MP/M) system with 8 Ann Arbor Ambassador terminals. The system communicated with the terminals at 38.4Kbaud - 4 times the "industry standard" of 9600. And Rob Barnaby implemented some esoteric WordStar features for us - in addition to relinquishing the CPU to the MP/M scheduler instead of spinning in an idle loop, a lot of the "optional performance improvement feature" stuff like line insert with control of scroll direction were added to support the very advanced features of the Ambassador terminals.

The chassis was a rackmount 22-slot TEI chassis with a constant current power supply, with a custom front cover which replaced the power on/off toggle and reset button with an ACE cylinder lock. We upgraded the chassis with Torin fans. The CPU and some of the supporting cards were from Ithaca Audio, with a custom CPU daughterboard of my own design to provide a line-time-clock and (somewhat) High Precision Event Timer for MP/M. I went with static memory because of its great compatibility - an Altair dynamic memory board wouldn't work in an IMSAI system and vice versa - only front panels were less compatible with foreign systems. I put the bank select and parity logic on the card edge and left side areas, leaving the rest of the card for the memory chips. Parity errors lit up LEDs on the memory card as well as sending an interrupt via my LTC / HPET CPU daughterboard. The system would report the card and bank of the error to the maintenance terminal (A Volker Craig VC404 - that's the TTL one, not the VC4404 ("chat") which was microprocessor-controlled)
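
The general idea of per-byte parity with error reporting can be sketched as follows. This is a minimal software model, not the actual hardware logic of the board described above: the even-parity convention, the 64 KB bank size, and all names are assumptions made for illustration.

```python
BANK_SIZE = 64 * 1024  # assumed bank size, for illustration only

class ParityError(Exception):
    """Raised when a stored byte no longer matches its parity bit."""

def parity_bit(byte):
    """Even parity: the stored bit makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") & 1

class ParityMemory:
    def __init__(self, size):
        self.data = bytearray(size)
        self.parity = bytearray(size)  # one parity bit per byte, stored as 0/1

    def write(self, addr, value):
        self.data[addr] = value & 0xFF
        self.parity[addr] = parity_bit(value)

    def read(self, addr):
        value = self.data[addr]
        if parity_bit(value) != self.parity[addr]:
            # Real hardware would light an LED and raise an interrupt here.
            bank, offset = divmod(addr, BANK_SIZE)
            raise ParityError(f"parity error in bank {bank}, offset {offset:#06x}")
        return value

mem = ParityMemory(2 * BANK_SIZE)
mem.write(BANK_SIZE + 5, 0x41)
mem.data[BANK_SIZE + 5] ^= 0x08      # simulate a single flipped bit

try:
    mem.read(BANK_SIZE + 5)
    detected, message = False, ""
except ParityError as exc:
    detected, message = True, str(exc)

print(detected, message)
```

Parity of this kind detects any odd number of flipped bits in a byte but cannot correct them, and an even number of flips in the same byte passes unnoticed.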

Everything that could possibly be interrupt-driven was. I re-used a lot of that technology on later single-user systems. You might be interested in https://www.glaver.org/cpmstuff/pc8cpm.txt - that's the manual for the CP/M we shipped for single-user boxes. Note the TOD support and background, interrupt-driven print spooler (which used threading support we added to CP/M, since it had to be able to do filesystem reads independently of any running user program).

Clarke's 2010 was released in 1982.

This was not the first book input through my systems - "The Islamic Bomb" by Weissman & Krosney (1981) was the first. The second was "Licence Renewed" by Gardner (the first authorized James Bond continuation novel).

If you still don't believe me, go contact Dave Burstein on LinkedIn - https://www.linkedin.com/in/daveburstein - he's the actual source of the "handles 8 WordStar users faster than most computers handle 1" quote as well as being the mover who delivered the system in question to Fisher Composition in Arkville.

tingo · Jan 7, 2018

I enjoyed this post - thanks!

Snurg · Jan 8, 2018

Terry_Kennedy
Thank you very much for your very enjoyable post, too!
It vividly brings back memories from long ago.

The Superbrain

An all-in-one computer, ready to be placed onto the desktop! No more separate units, like CPU, drives, terminal.
But if I remember correctly, when I saw it back then, I disliked that its display's dot matrix was very coarse. I mean, you could clearly see every char being composed of seven or so lines, like the IBM logo split into horizontal lines.

And as the whole stuff was way more expensive than today, and way less standardized, there was much custom manufacturing, as ralphbsz' also very interesting report confirms.
If I understand correctly, you basically designed a custom system for Fisher, designed to be well-adapted for publishing, which they sold as a complete hw/sw package to publishing customers.
But it was no mainstream system like Cromemco, NorthStar, Altos, etc, whose components were advertised in BYTE etc and of which I do not remember having seen more than 8 chips per byte.

The reliability issues you mentioned regarding the fickle DRAM timing and its potential to cause instability and mangle data are a good reason to use SRAM, especially when high-value work is involved.
To be honest, I had been curious which chips you used (I guess 6167 or the like). I guess you will then have put 64k (or at most 128k if tightly packed) onto a memory board and had 8 or 4 memory cards in that 22-slot monster rack, maybe filling only every second slot to maintain good ventilation?

Anyway, it is a bit sad that there seem to be no photos etc. They would nicely document what design quality is possible when there is no need to sell individual components as cheaply as possible.
Thus, despite the fact that I believe you, I can only say that I never saw parity-protected memory systems on 8 bit. I can say I know from an anecdotal source that apparently some existed in some high-end systems manufactured on a small scale, but I cannot prove it.

Anyway, your story and the ECC topic point to the crucial questions:
How cheap is sensible?
How much quality and security can we afford, or do we want to afford, to sacrifice?

Highly topical in view of Meltdown and Spectre, as these reportedly originated in an attempt to cut back on reliability and safety in order to lower manufacturing costs.
In the end, the extra cost per chip would have been minor, considering the whole system cost.

But isn't it crazy if, for example, a nuclear plant blows up, causing trillions in damages, only because safety logic was omitted from a processor, when including it would have increased the whole system cost by, say, $10 at most?

ralphbsz · Jan 8, 2018

And that's why nuclear power plants (or life safety applications) don't rely on a single computer, much less on a consumer-grade one, for critical functions.

Snurg · Jan 8, 2018

I remember the Apollo missions: they usually had three computers, and if one deviated from the other two, it was deactivated and repaired. This happened quite a few times.

By the way, regarding nuclear incidents, the main causes are not failing computers, but human error, often in conjunction with usage of safety override mechanisms (intentional and unintentional ones).
Chernobyl is an example of this. The old shift leader pressured the very young reactor operator to override the emergency shutdown program of the SKALA computer, and what happened then is well known.

The problem of counterfeit parts has also already led to some incidents in the nuclear industry.
And I think it's hard to sell counterfeit modules with bad memories if there is parity checking.
But even ECC can be counterfeited using parity generator chips.

Datapanic · Jan 8, 2018

It's as simple as this: If you are running FreeBSD with ZFS, then you WILL use ECC memory or else regret the BIT ROT that comes up later.

tingo · Jan 8, 2018

Datapanic said:
It's as simple as this: If you are running FreeBSD with ZFS, then you WILL use ECC memory or else regret the BIT ROT that comes up later.

That is your opinion, and you are entitled to have one.

However, it is not necessarily a fact, nor "a truth" or something like that. I (and probably others) have many years of experience showing that zfs works nicely and trouble-free even without ECC memory. There is nothing "special" inside zfs that makes it require / depend on ECC memory any more than any other filesystem. One of the people behind zfs has stated that in writing (you can google it if you are interested.)
ECC memory is nice, but it is certainly not required.

ralphbsz · Jan 8, 2018

I think what Datapanic is trying to say (*) is: If you use ZFS and configure it to have both checksums and RAID redundancy, the biggest source of bit rot in a computer (which is the disk drives) is eliminated. Thereby main memory bit rot becomes the dominant source of corruption, which ECC memory cures. Now, whether your particular usage needs to worry about bit rot at that level or not, and whether using ECC is a good cost/benefit tradeoff for your particular usage pattern is still a different question. For certain uses, not having ECC, and even not having checksums in your file system is perfectly acceptable.

(* Footnote: I'm deliberately putting things in Datapanic's mouth here, to try to get to a synthesis of opinions.)
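
The combination of checksums and redundancy described above can be illustrated with a toy model. This is not ZFS code: the two-way mirror, the use of SHA-256 from Python's hashlib as the checksum, and all names are assumptions chosen to keep the sketch short.

```python
import hashlib

def checksum(block: bytes) -> bytes:
    # SHA-256 is a stand-in here; the real filesystem's algorithm may differ.
    return hashlib.sha256(block).digest()

class ToyMirror:
    """Two copies of each block, plus a checksum kept separately from the data."""

    def __init__(self):
        self.copies = [{}, {}]   # two simulated disks
        self.sums = {}           # checksum per block key

    def write(self, key, block):
        for disk in self.copies:
            disk[key] = bytes(block)
        self.sums[key] = checksum(block)

    def read(self, key):
        expected = self.sums[key]
        for disk in self.copies:
            block = disk[key]
            if checksum(block) == expected:
                # Found a good copy: rewrite any copy that fails verification.
                for other in self.copies:
                    if checksum(other[key]) != expected:
                        other[key] = block
                return block
        raise IOError(f"all copies of {key!r} fail their checksum")

pool = ToyMirror()
pool.write("blk0", b"important data")

# Simulate bit rot on the first disk only.
rotten = bytearray(pool.copies[0]["blk0"])
rotten[0] ^= 0x01
pool.copies[0]["blk0"] = bytes(rotten)

result = pool.read("blk0")
print(result, pool.copies[0]["blk0"] == result)
```

In this model the disk-side error is both detected and repaired, which is why, once such protection is in place, errors that happen in main memory become the larger remaining share.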

poorandunlucky · Jan 19, 2018

Datapanic said:
It's as simple as this: If you are running FreeBSD with ZFS, then you WILL use ECC memory or else regret the BIT ROT that comes up later.

Would you mind elaborating a bit?

SlySven · Jan 19, 2018

Might I make the observation that for some other Operating Systems, the robustness of the core memory is not going to be the most significant cause of data/bit rot or other catastrophic failure?

LVLouisCyphre · Feb 25, 2020

Use ECC whenever possible. I have a pair of HP Microserver G7 N54Ls. I made sure they had ECC for ZFS. Anything that's a memory hog should have ECC. There are a few horror stories on the FreeNAS forum of people losing their RAIDZ2 simply by not using ECC. Fortunately I found out that the HP MS G7s use the same memory as a Lenovo Thinkserver TS430. Just buy a pair of 8 GB ECC UDIMMs for one of those and you're set.

LVLouisCyphre · Feb 25, 2020

poorandunlucky said:
Would you mind elaborating a bit?

It's well documented on the FreeNAS forum. If you don't use ECC with ZFS, which is a memory hog, you are putting your data in a noose, waiting for the chair to be kicked out from under it.

shkhln · Feb 25, 2020

LVLouisCyphre said:
It's well documented on the FreeNAS forum. If you don't use ECC with ZFS, which is a memory hog, you are putting your data in a noose, waiting for the chair to be kicked out from under it.

Anything the FreeNAS forum has to say on ECC is complete bullshit. (It's the birthplace of non-ECC-mem-will-break-your-entire-ZFS-pool FUD.)

PMc · Feb 25, 2020

shkhln said:
Anything the FreeNAS forum has to say on ECC is complete bullshit. (It's the birthplace of non-ECC-mem-will-break-your-entire-ZFS-pool FUD.)

Thanks. I was wondering how this came up. Probably too many people running a computer who shouldn't.

shkhln · Feb 25, 2020

Well, there was (is?) a certain moderator named Cyberjock (cyberj0ck) with a very strong opinion on ZFS reliability in a non-ECC scenario. He's mildly infamous for that reason.

6502 · Feb 25, 2020

Unfortunately, it is not easy to find an ECC motherboard. In addition, in some cases a board may claim ECC support but not actually provide it. How to test? There is no easy way. Some manufacturer would have to produce special RAM modules with an "ECC failure mode" switch, to verify that ECC will catch the error.

Vadim_Mkk · Feb 25, 2020

RAM
It’s not surprising that Sun’s documentation said you needed ECC RAM to use ZFS well. Sun sold high-end servers. But according to Matt Ahrens, “ZFS on a system without ECC is no more dangerous than any other filesystem on a system with ECC.” ZFS’ built-in error correction compensates for most but not all memory-induced errors. The generic arguments in favor of ECC RAM are still valid, of course. A machine with non-ECC memory can suffer memory corruption, and it’s possible for some of those errors to get to disk. That would happen regardless of the filesystem you’re using, however, and ZFS checksums offer a hope of identifying the problem. If you’re running a high-availability service, you want ECC for all of the usual reasons. But your ZFS laptop or home movie server will function just fine with normal RAM.
© FreeBSD Mastery: Storage Essentials by Michael W Lucas & Allan Jude
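
The point that some memory errors can reach disk while others are caught can be illustrated with a small sketch. It assumes a simplified write path in which a checksum is computed over the in-memory buffer and then stored alongside the data; SHA-256 and the helper names are stand-ins for illustration, not ZFS internals.

```python
import hashlib

def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted (simulated RAM error)."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)

def verify(data: bytes, digest: bytes) -> bool:
    return hashlib.sha256(data).digest() == digest

original = b"family photos"

# Case 1: the bit flips AFTER the checksum was computed.
digest = hashlib.sha256(original).digest()
stored = flip_bit(original, 3)
caught_after = not verify(stored, digest)

# Case 2: the bit flips BEFORE the checksum is computed.
damaged = flip_bit(original, 3)
digest_of_damaged = hashlib.sha256(damaged).digest()
caught_before = not verify(damaged, digest_of_damaged)

print(caught_after, caught_before)
```

In the first case the mismatch is noticed on the next read. In the second case the checksum faithfully describes already-damaged data, so no filesystem can notice; that window is what ECC memory narrows.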

ralphbsz · Feb 25, 2020

The answer is not black and white, it is complicated. What is the biggest danger to data? To data becoming inaccessible temporarily or permanently?

We used to think it was disk failure. That was actually somewhat true. We fixed that with RAID, and many other technologies. We fixed many other things too, for example accidental deletion; one attempt is Microsoft Windows asking "are you sure" if you type "del *.*": that was a good starting point. Another smaller cause of data loss is corruption of memory content. There are many ways to combat that. One is to use ECC, which protects against single bit errors (alpha particles, cosmic ray air showers). Another is to use checksums on in-memory data structures. Yet another is software design that prevents things like wild pointers and accidental memory overwrites.
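
How an error-correcting code repairs a single flipped bit can be shown with the classic Hamming(7,4) code. This is a teaching sketch only: ECC memory typically uses a wider code over 64 data bits that also detects double errors, implemented in hardware, and the function names here are made up for illustration.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit codeword (list of bits, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                                # index 0 is unused
    code[3], code[5], code[6], code[7] = d        # data bit positions
    code[1] = code[3] ^ code[5] ^ code[7]         # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]         # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]         # parity over positions 4,5,6,7
    return code[1:]

def hamming74_decode(bits):
    """Return (nibble, corrected_position); position 0 means no error seen."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1          # the syndrome names the flipped position
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome

word = hamming74_encode(0b1011)
word[4] ^= 1                         # flip position 5 (list index 4)
value, fixed_at = hamming74_decode(word)
print(bin(value), fixed_at)
```

Two flipped bits in the same word would be miscorrected by this 7-bit code; adding one more overall parity bit lets double errors at least be detected.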

The answer to "does ZFS need ECC" is multi-faceted. Compared to what? ZFS on a non-ECC system is no less safe than UFS or ext4 on a non-ECC system. It is actually considerably safer (because of checksums, both at rest and in flight). On the other hand, with ZFS in normal installations having solved the disk reliability problem (with RAID and on-disk checksums), the next biggest hardware source of data problems is memory, so it is important to fix.

In reality, the largest danger to data is humans. Accidental deletion and mis-administration are far worse than anything ECC can fix. Instead of arguing over whether ECC is important or not, we should all sit down with the man pages for ZFS commands and learn how it works.

6502 · Feb 25, 2020

My first desktop with 386sx16 had ECC. I think at that time all RAM modules (SIMM) had ECC.
