Tried it for fun and found there's actually a caveat about that: CP437 codepoints 0x00 to 0x1f are context-dependent. They can mean either control characters or printable symbols, and you obviously want the latter. But iconv(3) interprets them as control characters, and I found no way to change that.
...this can e.g. be done using iconv(3).
#include <stdint.h>
#include <stdlib.h>
#include <iconv.h>
#include <unistd.h>
static const uint16_t nonprint[] = {
0x0020,0x263A,0x263B,0x2665,0x2666,0x2663,0x2660,0x2022,
0x25D8,0x25CB,0x25D9,0x2642,0x2640,0x266A,0x266B,0x263C,
0x25BA,0x25C4,0x2195,0x203C,0x00B6,0x00A7,0x25AC,0x21A8,
0x2191,0x2193,0x2192,0x2190,0x221F,0x2194,0x25B2,0x25BC
};
static char ucs2[32*4];
static char cp437[224*2];
static char utf8[256*2*4];
int main(void)
{
char *up = ucs2;
for (int i = 0; i < 32;)
{
*up++ = nonprint[i] >> 8;
*up++ = nonprint[i];
*up++ = 0;
*up++ = ++i % 16 ? ' ' : '\n';
}
up = ucs2;
size_t us = sizeof ucs2;
char *o = utf8;
size_t os = sizeof utf8;
/* the buffer above is filled high byte first, so name the byte order explicitly */
iconv_t cd = iconv_open("UTF8", "UCS-2BE");
iconv(cd, &up, &us, &o, &os);
iconv_close(cd);
char *p = cp437;
for (int i = 32; i < 256;)
{
*p++ = i;
*p++ = ++i % 16 ? ' ' : '\n';
}
p = cp437;
size_t ps = sizeof cp437;
cd = iconv_open("UTF8", "437");
iconv(cd, &p, &ps, &o, &os);
iconv_close(cd);
write(STDOUT_FILENO, utf8, sizeof utf8 - os);
}
No modern OS would ever use this old 8-bit vendor-specific encoding for its text representation, so this is expected.
(the sad part is that this works neither on FreeBSD, nor on Linux, nor on Windows)
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static const uint16_t cp437low[] = {
'\n', 0x263A,0x263B,0x2665,0x2666,0x2663,0x2660,0x2022,
0x25D8,0x25CB,0x25D9,0x2642,0x2640,0x266A,0x266B,0x263C,
0x25BA,0x25C4,0x2195,0x203C,0x00B6,0x00A7,0x25AC,0x21A8,
0x2191,0x2193,0x2192,0x2190,0x221F,0x2194,0x25B2,0x25BC
};
static const uint16_t cp437high[] = {
0x2302,
0x00C7,0x00FC,0x00E9,0x00E2,0x00E4,0x00E0,0x00E5,0x00E7,
0x00EA,0x00EB,0x00E8,0x00EF,0x00EE,0x00EC,0x00C4,0x00C5,
0x00C9,0x00E6,0x00C6,0x00F4,0x00F6,0x00F2,0x00FB,0x00F9,
0x00FF,0x00D6,0x00DC,0x00A2,0x00A3,0x00A5,0x20A7,0x0192,
0x00E1,0x00ED,0x00F3,0x00FA,0x00F1,0x00D1,0x00AA,0x00BA,
0x00BF,0x2310,0x00AC,0x00BD,0x00BC,0x00A1,0x00AB,0x00BB,
0x2591,0x2592,0x2593,0x2502,0x2524,0x2561,0x2562,0x2556,
0x2555,0x2563,0x2551,0x2557,0x255D,0x255C,0x255B,0x2510,
0x2514,0x2534,0x252C,0x251C,0x2500,0x253C,0x255E,0x255F,
0x255A,0x2554,0x2569,0x2566,0x2560,0x2550,0x256C,0x2567,
0x2568,0x2564,0x2565,0x2559,0x2558,0x2552,0x2553,0x256B,
0x256A,0x2518,0x250C,0x2588,0x2584,0x258C,0x2590,0x2580,
0x03B1,0x00DF,0x0393,0x03C0,0x03A3,0x03C3,0x00B5,0x03C4,
0x03A6,0x0398,0x03A9,0x03B4,0x221E,0x03C6,0x03B5,0x2229,
0x2261,0x00B1,0x2265,0x2264,0x2320,0x2321,0x00F7,0x2248,
0x00B0,0x2219,0x00B7,0x221A,0x207F,0x00B2,0x25A0,0x00A0
};
static char inbuf[1024];
static char outbuf[4*sizeof inbuf];
static void toutf8(size_t *outsz, uint16_t c)
{
if (c < 0x80)
{
outbuf[(*outsz)++] = c;
return;
}
unsigned char lb = c & 0xff;
unsigned char hb = c >> 8;
if (c < 0x800)
{
outbuf[(*outsz)++] = 0xc0U | (hb << 2) | (lb >> 6);
outbuf[(*outsz)++] = 0x80U | (lb & 0x3fU);
}
else
{
outbuf[(*outsz)++] = 0xe0U | (hb >> 4);
outbuf[(*outsz)++] = 0x80U | ((hb & 0x0fU) << 2) | (lb >> 6);
outbuf[(*outsz)++] = 0x80U | (lb & 0x3fU);
}
}
int main(int argc, char **argv)
{
int printlower = 0;
if (argc > 1 && !strcmp(argv[1], "-p")) printlower = 1;
size_t insz;
while ((insz = fread(inbuf, 1, sizeof inbuf, stdin)))
{
size_t outsz = 0;
for (size_t inpos = 0; inpos < insz; ++inpos)
{
unsigned char c = inbuf[inpos];
if (printlower && c < 0x20) toutf8(&outsz, cp437low[c]);
else if (c >= 0x7fU) toutf8(&outsz, cp437high[c-0x7fU]);
else outbuf[outsz++] = c;
}
size_t outpos = 0;
while (outpos < outsz)
{
size_t outwr = fwrite(outbuf + outpos, 1, outsz - outpos, stdout);
if (!outwr) exit(EXIT_FAILURE);
outpos += outwr;
}
}
return EXIT_SUCCESS;
}
-p
to display them, in which case it will interpret 0 (NUL or zero) as a newline, because this is the only codepoint that doesn't have any glyph associated in CP437.
This can never work. Let me give you two simple examples: If you print the character for i=10, the console will end the current line and go to the next line. That's because char(10) is newline, a.k.a. "\n". Similarly, if you print the character for i=27 (a.k.a. escape or char(0x1B)), what happens depends on the next few characters: If they happen to be "[31m", then everything will be printed in red from here on.
So, what do I have to do to set up the text console in a way that
for (int i=0; i<=255;i++) printf ("%c",i);
will display all of codepage 437 glyphs correctly?
Actually, this is just another property of the encoding used (there's a reason the C standard doesn't require a fixed encoding for the newline character, for example). US-ASCII put all control characters in the range 0x00-0x1f. Nowadays, every 8-bit encoding (or 8-bit representation, like UTF-8) is fully ASCII-compatible, therefore the control characters are in this range.
This can never work. Let me give you two simple examples: If you print the character for i=10, the console will end the current line and go to the next line. That's because char(10) is newline, a.k.a. "\n". Similarly, if you print the character for i=27 (a.k.a. escape or char(0x1B)), what happens depends on the next few characters: If they happen to be "[31m", then everything will be printed in red from here on.
And it seems that CP437 is not an encoding that people care about.
True in general, but there are certain areas where it survives. Emulation and preservation of old software would be one of them ... graphics characters were used a lot for designing user interfaces. Also other "artwork" ("(extended) ASCII Art", "PETSCII Art", ...) used one of these historic encodings, and while it can be displayed almost everywhere by translating it to Unicode, the original files will keep their original encoding ... back in those days, the set of available glyphs on a machine made for its "character": there are iconic PETSCII works on the C64, EXTASCII works on the PC, and so on, and looking at them, you'll immediately recognize the machine they were made for. An example still in use sometimes is the NFO file that comes with PC crack releases ... to this day, CP437 is one of the most used encodings in these files.
I see your point, but with Unicode around, what's the point of using any 8-bit encoding (offering just a very limited set of characters) on a modern machine? Many still support the ISO-8859-* family of 8-bit encodings (at least they are standardized), but CP437 is also proprietary...
Point number 2: it wasn't such a bad encoding that it deserves obliteration, or to be made nearly impossible to use on my own computer if I wish so.
Elaborating on this a bit as well:
ralphbsz:
Thanks for clarifying. Yes, it's true, the terminal is not a graphics ROM rendering machine; it is much more, as you explained well.
but...
I'd like to think about it as if it were. Among the multiple functionalities it carries out, it ought to be also the most basic thing: a text rendering machine that uses the built-in hardware capabilities, or at least lets you configure it to do so. And it falls short of this...
CHROUT KERNAL function of a Commodore OS does some translation, even statefully considering whether "reverse mode" was currently turned on.
Yep. Well, whether there's more "joy" in it outside of a work context, I don't know ... I can see that "feeling", but on the other hand, I also see a lot of people at work who find this joy in coming up with elegant and simple solutions there ... anyway, simplicity is also an important engineering objective.
For me the greatest thing in programming and computers, besides the practical side of it (that it gives a solution to a problem), is that it gives me the joy of creation, the joy of seeing some idea of mine being materialized (i.e. I wrote a program, and I see it working). But not only that: when you write a program not because it's your job and you have to, but out of joy, then you appreciate more how elegant, how beautiful, how well done the code itself is.
And beauty lies in simplicity.
I think you have an error in your chain of thought here, by not considering the context. The IBM PC was designed as a typical microcomputer of its time, very much like the "home computers", just with a focus on business use. Still, it was meant to be used "as is", as a single system. It featured an "open system architecture", yes, but the available components were pretty tightly coupled, and for a standalone microcomputer, this wasn't an issue at all. In that setting, deciding on one(!) encoding and shipping a "hardware font" with glyphs organized by that encoding certainly was the simplest solution and a good overall match.
With all that said, in this setting it makes no sense to think "it is pointless to use 8bit 1:1 text representation" or "I do not adhere to this standard or that".
The "sanest" place to play with that would be a machine running the original OS (PC-DOS / MS-DOS) of the original IBM PC. Or nowadays of course, a suitable emulator, like e.g. emulators/dosbox or emulators/dosbox-x.
My toy is FreeBSD. I sort of like CP437, I am used to it, it is simple to use and to program, and it has all the symbols I need for my play.
I don't see any really useful purpose, but it certainly sounds like a lot of fun! I guess you could even do it in a pretty portable way by implementing a "virtual terminal" on top of another one "speaking" UTF-8?
The little devil inside me says I still want a terminal that displays CP437 for me. Maybe I'll write it some day.
The reason I asked was to prevent an XY problem, which happens quite frequently on support forums.
Why do I need this?
Short answer: because.
They're still there, just typically not as dedicated hardware any more, but a piece of software instead.
But as you said, those hardware terminals are history by now; most of us know of them only from history books.
Actually, you do assume wrong here.
What remained is individual computers, laptops, mainframes, interconnected by networks and the internet, every one of them having some sort of graphics device, the graphics card. I assume I am not wrong if I say that all graphics cards today share the text mode as a common standard, which still uses an 8-bit representation with 16 colors, and as heritage all use cp437 as the default.
This couldn't be farther from the truth.
All I say is that I find it a bit antagonistic that all modern Unix-like operating systems are still embracing this old concept of terminal capabilities and whatnot, when those devices are long gone. Instead, what we have is a fairly standard text device (aka the text mode) which is the same on every computer. The terminal capabilities are always the same; you can safely assume that a program written for text mode will run and look alike on any system, even on Raspberry Pis.
MGA/CGA/EGA/VGA is "grampa" here. Not in terms of actual age, but in terms of being outlived by changing requirements (see distributed systems and interoperability). So, answering the rest of your post would be just repetition on my part, therefore I'll skip it.
I do not even say that it's a bad thing; let historical devices be supported, why anger grandpa?
So it is historical, only kept alive by software.
They're still there, just typically not as dedicated hardware any more, but a piece of software instead.
Can we agree on the fact that the vast majority (if not all) of the computers today that do support hardware text mode also support 8-bit 1:1 cp437 as default?
This is only true for "IBM PC compatibles", and only for historic reasons. There are different architectures around, most notably most tablets and mobile phones based on some embedded ARM platforms.
Shall I understand this to mean that you think a program assuming plain VGA text mode capabilities on modern hardware that supports hardware text mode is mistaken?
This couldn't be farther from the truth.
"Hardware text mode" is a historic leftover by itself, so this might or might not be true, but it is just irrelevant for modern systems.
Can we agree on the fact that the vast majority (if not all) of the computers today that do support hardware text mode also support 8-bit 1:1 cp437 as default?
Yes.
Shall I understand this to mean that you think a program assuming plain VGA text mode capabilities on modern hardware that supports hardware text mode is mistaken?
You're mixing up the concept of a text terminal (or even console) with the hardware offering a dedicated "text mode". The latter is dead for good, and the Unix-style terminals survived that very well because they were designed in a flexible way (abstracting from the concrete hardware right from the beginning). Back in the days when hardware text modes were a common thing, there were a lot of machines (microcomputers and home computers) hardwiring their "terminal" to that hardware, and CP437 was just the encoding and character set of one of those machines, so there was no way this could ever be interoperable.
VGA may be the grandpa, but then the piece of software we call a terminal, and what it supports, is the great-great-grandpa, and it is far more outlived, even by VGA.