CP437 display on console

I see we can't get on the same page. As I already said a few posts back, maybe I'll just do what you also suggested above...
 
Well, my tool just reached a first milestone: I converted some old MS-DOS/ANSI.SYS file and the result looks correct to me. So far it assumes hardcoded values: width 80 columns, input encoding CP-437, input EOL MS-DOS, output encoding UTF-8, output EOL Unix...
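
For fellow coders, those hardcoded defaults can be sketched in a few lines of Python. This is only an illustration under assumptions, not how dos2ansi (which is written in C) does it: the function name dos_to_unix_utf8 is made up here, and Python's built-in cp437 codec leaves the control-character range (0x00-0x1F) as plain control characters.

```python
def dos_to_unix_utf8(data: bytes) -> bytes:
    """Convert CP-437 text with DOS line endings to UTF-8 with Unix ones."""
    # Python's cp437 codec defines all 256 byte values, so this cannot fail.
    text = data.decode("cp437")
    # MS-DOS EOL (CR LF) -> Unix EOL (LF)
    text = text.replace("\r\n", "\n")
    return text.encode("utf-8")
```

Fed the three bytes 0xC9 0xCD 0xBB plus CR LF, this returns the UTF-8 encoding of a box-drawing top edge followed by a single LF.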

Give it a try if you want: https://github.com/Zirias/dos2ansi .... try something like dos2ansi <dosfile> | less -r to view the result paged :cool:

Edit: new tool in action:
[Attached image: 1705839298522.png]


Edit 2: the screenshot uses konsole (KDE's terminal emulator) because good old xterm seems to get a few colors wrong ... weird ... it DOES work on vt(4) as well, but I obviously can't screenshot that.
 
Can we agree on the fact that the vast majority (if not all) of the computers today that do support hardware text mode also support 8-bit 1:1 CP437 as default?
Unfortunately, no, not for most recent hardware.
UEFI firmware does not support "text mode" by default; it requires the CSM, an optional part of the firmware, to support it. And I heard some motherboards have already stopped providing a CSM at all, if I recall correctly.

Just an FYI: the ancient NEC PC-9801 series of computers, which was already beaten by the IBM PC architecture, had a way too slow BIOS for text rendering. So almost all commercial full-screen applications on it didn't use ansi.sys and/or the BIOS to render text; they wrote directly to the hardware text VRAM themselves. Imagine a 600 to 1200 bps serial console: using the BIOS or ansi.sys for text rendering on the PC-9801 series was about that slow.
ansi.sys called the BIOS for text rendering, so it was way too slow, too.
 
Unfortunately, no, not for most recent hardware.
I wouldn't call it "unfortunately" though ... it's just not the right hardware to boot and run MS-DOS any more, kind of about time ...

UEFI firmware does not support "text mode" by default; it requires the CSM, an optional part of the firmware, to support it. And I heard some motherboards have already stopped providing a CSM at all, if I recall correctly.
Still I think it WOULD be possible to access the "text mode" with lots of trickery ... you'll find (assembler) code around to switch from protected mode to real mode (which is, well, very very involved already) and from there, it should be possible to directly access the VGA hardware, bypassing the (here nonexistent) BIOS. Not that it would make any sense to attempt all that, of course.
 
I looked at vgl, it seems to be a simple graphics library (which does not use X).
VGL was broken when FreeBSD switched console infrastructure (sc -> vt).

I am in the process of remapping the vgl API on top of libdrm (the same system underneath most Xorg / Wayland drivers).

But frankly, using libdrm directly is quite easy. I have a port of a Game Boy emulator to it here. In particular you can see the screen driver here.

It was originally for OpenBSD but a few minor changes (outside the keyboard) can get it working on FreeBSD if you wanted to go that route.
 
I'll now add something just for entertainment ....

well, how to introduce it ... I was curious, so tried to find info about how much sense it would make for a converter tool (this thing I just started coding) to even attempt to map the "control character range" to those printable characters CP-437 has there as well, and found answers:
  • The short one: No sense at all. Reason: these characters were only meant to be used by applications directly writing to the memory-mapped "text screen", while any "text output" method of the IBM PC was meant to interpret/execute control characters there.
  • The slightly longer one: Well, a lot of sense for some of these characters, no sense at all for some others. Reason: not all control characters defined in ASCII had some actual meaning on the IBM PC. For those that weren't implemented, "text output" indeed resulted in the printable character shown.
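
To make the "slightly longer" answer concrete, here is a rough Python sketch. The glyph string is the well-known CP-437 assignment for bytes 0x01..0x1F; which control codes count as "implemented" is an assumption of mine for illustration (BEL, BS, TAB, LF, CR, ESC), and the names CP437_LOW_GLYPHS, EXECUTED and decode_cp437_visible are made up, not taken from any tool mentioned in this thread.

```python
# Glyphs CP-437 shows for bytes 0x01..0x1F when written straight to video memory.
CP437_LOW_GLYPHS = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
assert len(CP437_LOW_GLYPHS) == 31

# Assumption: the control codes that "text output" actually executes.
# Everything else in the control range is shown as its glyph.
EXECUTED = {0x07, 0x08, 0x09, 0x0A, 0x0D, 0x1B}

def decode_cp437_visible(data: bytes) -> str:
    """Decode CP-437, mapping non-executed control bytes to their glyphs."""
    out = []
    for byte in data:
        if 0x01 <= byte <= 0x1F and byte not in EXECUTED:
            out.append(CP437_LOW_GLYPHS[byte - 1])
        elif byte == 0x7F:
            out.append("\u2302")  # HOUSE, the glyph CP-437 has at 0x7F
        else:
            out.append(bytes([byte]).decode("cp437"))
    return "".join(out)
```

So a smiley byte (0x01) comes out as a printable character, while CR and LF stay line-ending controls.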
Add to that the fact that Windows to this day still supports CP-437 (really? REALLY???) and the Windows terminal ("conhost", whatever it's called) developed into a huge pile of somewhat smelly cruft (ok, not so surprised here), and the stage is set for:

Microsoft Terminal - Issue #166

I think it's really a fun read! The poor guys assigned the task of revamping the mess of conhost are confronted with a report that their new implementation "breaks" the obscure display of SOME "printing characters" from the control character range of CP-437, and they really get into detail with explanations :D

You'll probably have the most fun if you know C and especially the win32 C API, also something about character encodings, including Unicode and its representations, and Microsoft's early adoption of Unicode as UCS-2 in 16-bit wide characters, which later proved "too small", so they were forced to use UTF-16 instead and are still struggling with UTF-8 ....
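
For anyone not familiar with why 16 bits proved too small: UCS-2 can only represent codepoints up to U+FFFF in one 16-bit unit, while UTF-16 splits anything above that into a surrogate pair. A small Python sketch of that arithmetic (the function name utf16_units is made up for illustration):

```python
def utf16_units(codepoint: int) -> list:
    """Return the 16-bit code units UTF-16 uses for one Unicode codepoint."""
    if codepoint < 0x10000:
        return [codepoint]            # fits in one unit, same as in UCS-2
    offset = codepoint - 0x10000      # 20 bits remain
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate: upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate: lower 10 bits
    return [high, low]
```

Everything CP-437 maps to lives in the BMP, so a converter for DOS codepages never needs surrogates; the pain starts with codepoints like U+1F600.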

But even without that, you might have some fun learning about some of the mess ... including stuff that's at the same time politically incorrect AND technical nonsense.
 
I forgot to mention:

The sc(4) driver on FreeBSD, which uses text mode, doesn't work when booted via UEFI. It needs legacy BIOS boot to work.
Only the vt(4) driver can be used with UEFI boot. It uses graphics mode and is now the default for both UEFI boot and legacy boot.

And possibly, in the future, video chip/card vendors could drop support for text mode in the video chips and/or their BIOSes once almost all brand-new motherboards drop the CSM. Do you remember VL-Bus and EISA? Do recent video chips/cards support them? It hasn't happened yet, but dropping text mode could happen.
 
Sometimes I really do feel sorry for the Windows Terminal developers.
Years of weirdness and compat hacks that either need to be catered for or whose breakage needs to be justified.

They dropped the ball for a couple of decades by erroneously thinking that the CLI wasn't the future.
 
They dropped the ball for a couple of decades by erroneously thinking that the CLI wasn't the future.
Well, it certainly isn't "the future" (still depends on the task to be done), but it seems they expected it to become obsolete, which won't happen any time soon.

But I'd say that's just ONE of the reasons for the mess they have to clean up now. Others include the backwards compatibility fetishism (like here, why exactly do you need CP437 support AT ALL? I mean, it DID make some sense as long as you could run most ancient DOS software in vm86 mode, but that's also been dead for quite a while now). And then there are quite a few bad decisions taken earlier as well (like here, adding Unicode support with the now-dead UCS-2 representation) ...

Yep, maybe we should feel sorry for the "Windows Terminal" team ... but OTOH, if their sense of humor touches the "sarcastic" area, they might even have some good fun :cool:
 
Does anyone remember this specific kind of ASCII art misrendering?
[Attached image: 1706033852536.png]

Then I guess you used some MS-DOS PC in Western Europe :cool:.

Yep, "codepage" selection is now implemented. But the real stupid work still needs to be done: adding the translation tables for ALL these MS-DOS codepages (so far, we only have CP-437, CP-850 and CP-858).

It sure will be tedious. Starting my search for references, I found e.g. https://learn.microsoft.com/en-us/previous-versions/cc195061(v=msdn.10)

Yes, I guess that's a scan of some old paper someone found in the MS archives.
 
Deep down this rabbit hole now, I added a "test mode" using some 8-bit "encoding table" as the input instead of a real file, which allowed me to find and fix a few bugs ....

.... and also do some final tuning of the mapping to better match the appearance on original VGA, e.g. I just added some code to replace the normal pipe bar (|) with a broken bar (¦) by default, because that's just how it looked on most machines.
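
The idea of such a test mode is easy to play with in Python, which ships codecs for several DOS codepages (cp437 and cp850 among them). This is only a sketch of the concept, not dos2ansi's actual implementation; the function name codepage_table and its broken_bar parameter are invented for this example.

```python
def codepage_table(encoding: str = "cp437", broken_bar: bool = True) -> str:
    """Render bytes 0x20..0xFF of an 8-bit codepage as a 16-column table."""
    rows = []
    for high in range(0x2, 0x10):
        cells = []
        for low in range(16):
            char = bytes([high << 4 | low]).decode(encoding, errors="replace")
            if broken_bar and char == "|":
                char = "\u00a6"   # broken bar, mimicking the look described above
            if char == "\x7f":
                char = " "        # DEL is a control character; leave it blank
            cells.append(char)
        rows.append(f"{high:X}x " + " ".join(cells))
    return "\n".join(rows)
```

Calling print(codepage_table("cp850")) gives a quick visual check of a codepage in whatever font the terminal uses.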

Here's the result for CP-437 and CP-858 using Microsoft's "Consolas" font:

[Attached image: 1706292682929.png]
 
I see I inspired great work
At least partially :cool: -- as you can see from the code I previously posted here, "just" converting CP-437 to UTF-8 is pretty simple and if it wasn't for the symbols "hidden" in the control-character area, iconv(3) would already be enough.

But then, some other guy I know saw what I'm playing with and told me "hey, this could help me with my old collection of DOS ANSI art" -- and having a look at these files, I found lots of other stuff that needed special handling, e.g. relying on a fixed terminal width of 80 columns (assuming an implicit line wrap there), or even using embedded "cursor movement" sequences ... plus a few other things. So, a somewhat more complex conversion tool was needed there. And once you're doing that, there's suddenly this engineering motivation to cover as much as possible, which includes other MS-DOS "codepages", and these come with other interesting issues, e.g. Arabic letters having many forms, all with their own Unicode codepoint ....
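
Why the implicit wrap and the cursor movement need more than a byte-by-byte translation is maybe easiest to see in code: the converter has to keep a virtual screen and place characters on it. Here's a heavily simplified Python sketch, assuming only the four relative cursor moves (ESC [ n A/B/C/D) and ignoring colors, absolute positioning and everything else real files use. The function name render is made up; this is not how dos2ansi is structured.

```python
import re

CSI = re.compile(r"\x1b\[(\d*)([ABCD])")   # only the four relative cursor moves

def render(text: str, width: int = 80) -> list:
    """Place characters on a virtual screen; return its rows as strings."""
    rows = [[]]
    row = col = 0

    def put(ch):
        nonlocal row, col
        if col >= width:                     # implicit wrap at the right edge
            row, col = row + 1, 0
        while len(rows) <= row:
            rows.append([])
        line = rows[row]
        line.extend(" " * (col - len(line) + 1))   # pad up to the cursor
        line[col] = ch
        col += 1

    pos = 0
    while pos < len(text):
        m = CSI.match(text, pos)
        if m:
            n = int(m.group(1) or 1)         # a missing count means 1
            if m.group(2) == "A":
                row = max(0, row - n)
            elif m.group(2) == "B":
                row += n
            elif m.group(2) == "C":
                col = min(width - 1, col + n)
            else:
                col = max(0, col - n)
            pos = m.end()
        elif text[pos] == "\n":
            row, col = row + 1, 0
            pos += 1
        elif text[pos] == "\r":
            col = 0
            pos += 1
        else:
            put(text[pos])
            pos += 1
    return ["".join(line) for line in rows]
```

With this, 81 characters without any line break end up as one full row of 80 plus one more row, which is exactly what a file relying on the 80-column wrap expects.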

I just released v0.4 handling a pretty large set of codepages: https://github.com/Zirias/dos2ansi/releases/tag/v0.4

Currently refactoring a lot: I decided to separate the different conversion steps in a pluggable "stream-writer" scheme, which allows flexibly combining output formats (UTF-8, UTF-16, UTF-16LE) with methods to colorize (simple ANSI, terminfo-based sequences, even native Windows Console, and maybe RGB colors for enforcing an exact CGA-like palette ...)
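
The general shape of such a stackable writer scheme, sketched in Python rather than C just to keep it short. All class names here (Utf8Writer, Utf16LeWriter, AnsiColorWriter) are invented for illustration and don't correspond to anything in the dos2ansi sources; the point is only that the color layer doesn't need to know which encoder sits below it.

```python
import io

class Utf8Writer:
    """Bottom of the stack: encodes text and writes it to a binary sink."""
    def __init__(self, sink):
        self.sink = sink

    def write(self, text):
        self.sink.write(text.encode("utf-8"))

class Utf16LeWriter(Utf8Writer):
    """Same position in the stack, different output format."""
    def write(self, text):
        self.sink.write(text.encode("utf-16-le"))

class AnsiColorWriter:
    """Upper layer: turns a color attribute into a plain ANSI SGR sequence."""
    def __init__(self, inner):
        self.inner = inner
        self.current = None

    def write(self, text, fg=None):
        if fg != self.current:
            # 30..37 are the eight standard foreground colors; 39 is "default".
            code = 39 if fg is None else 30 + fg
            self.inner.write(f"\x1b[{code}m")
            self.current = fg
        self.inner.write(text)

# Usage: stack the color layer on top of either encoder.
demo_sink = io.BytesIO()
demo = AnsiColorWriter(Utf8Writer(demo_sink))
demo.write("red", fg=1)
demo.write(" plain")
```

Swapping Utf8Writer for Utf16LeWriter changes the byte output without touching the color logic.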
 
dos2ansi v0.8 is approaching a "feature complete" state. Some polish here and there and it's probably ready for the 1.0 ....

It now supports:
  • terminfo/curses output (optional at compile time)
  • legacy Windows Console output (for old Windows versions)
  • an "exact colors" mode (for both generic ANSI and terminfo) using colors from the standard 256-color ANSI palette to exactly match CGA/VGA, with the option to have this dirty "dark yellow" (instead of the adjusted brown)
  • reading and parsing SAUCE metadata and, if available, automatically configuring the display width, the codepage and whether blink means bright background from that
  • a "test mode" just rendering a codepage table plus color palette instead of real input
  • etc. pp. ...
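
For the curious, here's roughly how CGA/VGA colors can be mapped onto the 256-color palette, as a Python sketch. Assumptions on my side: the usual xterm-style 6x6x6 color cube (indices 16..231 with levels 0, 95, 135, 175, 215, 255) and picking the nearest cube entry per channel; which indices dos2ansi really emits isn't stated in this thread. The names CGA, CUBE_LEVELS, cga_to_256 and fg_sequence are made up.

```python
# The 16 CGA/VGA text-mode colors as (R, G, B); index 6 is the adjusted brown.
CGA = [
    (0x00, 0x00, 0x00), (0x00, 0x00, 0xAA), (0x00, 0xAA, 0x00), (0x00, 0xAA, 0xAA),
    (0xAA, 0x00, 0x00), (0xAA, 0x00, 0xAA), (0xAA, 0x55, 0x00), (0xAA, 0xAA, 0xAA),
    (0x55, 0x55, 0x55), (0x55, 0x55, 0xFF), (0x55, 0xFF, 0x55), (0x55, 0xFF, 0xFF),
    (0xFF, 0x55, 0x55), (0xFF, 0x55, 0xFF), (0xFF, 0xFF, 0x55), (0xFF, 0xFF, 0xFF),
]

# Assumed channel levels of the 6x6x6 cube in the 256-color palette.
CUBE_LEVELS = (0, 95, 135, 175, 215, 255)

def nearest_level(value: int) -> int:
    """Index (0..5) of the cube level closest to an 8-bit channel value."""
    return min(range(6), key=lambda i: abs(CUBE_LEVELS[i] - value))

def cga_to_256(index: int, dark_yellow: bool = False) -> int:
    """Map a CGA color index (0..15) to a 256-color palette index."""
    r, g, b = CGA[index]
    if index == 6 and dark_yellow:
        g = 0xAA                  # unadjusted color 6: "dark yellow", not brown
    return 16 + 36 * nearest_level(r) + 6 * nearest_level(g) + nearest_level(b)

def fg_sequence(index: int, dark_yellow: bool = False) -> str:
    """SGR sequence selecting the mapped color as foreground."""
    return f"\x1b[38;5;{cga_to_256(index, dark_yellow)}m"
```

The brown vs. dark yellow difference is just the green channel of color 6, which lands on a different cube entry.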

Does anyone want it as a port?

[Attached image: 1707244715958.png]
 
I'm gonna wait to see how it pans out first, in case you randomly add auto-paging to the project again ;)
I won't change my mind that piping long(!) help/manual texts to a pager if output is a tty and $PAGER is set makes a lot of sense. But it's also quite some code to implement it correctly, which certainly isn't worth the hassle in some standalone utility.
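
Just to illustrate the rule (page only if output is a tty AND $PAGER is set), here's a Python sketch. It's deliberately naive, and the function names pager_command and show_help are made up; the "quite some code" part in C is the fork/pipe/exec and error handling that subprocess hides here.

```python
import os
import shlex
import subprocess
import sys

def pager_command(is_tty, environ):
    """Return the pager argv to use, or None to print directly."""
    pager = environ.get("PAGER", "").strip()
    if not is_tty or not pager:
        return None
    return shlex.split(pager)     # allow values like "less -R"

def show_help(text):
    """Print help text, paging it only when the rule above says so."""
    argv = pager_command(sys.stdout.isatty(), os.environ)
    if argv is None:
        sys.stdout.write(text)
        return
    try:
        subprocess.run(argv, input=text, text=True, check=False)
    except OSError:
        sys.stdout.write(text)    # pager missing or not executable
```

Keeping the decision in a separate function makes it testable without a real terminal.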

I joke. I think the work is pretty cool and so long as people know it exists in the ports collection, it could see some usage.
Just refactored my Stream implementation, which can now use 3 different backends for FileStreams (stdio, POSIX and win32), nicely separated and abstracted. This might have developed into something to pull out to a library later.

Well, some more polish (separate help and version output) and testing/fixing (e.g. the command line parser does a surprising thing, have to analyze that) left to do.
 
I consider this tool "finished" now with v1.1: https://github.com/Zirias/dos2ansi/releases/tag/v1.1 ... mainly because there are no "missing" features left I can think about :cool:

It's now really far more than "display CP437" (although it can of course do that), and it was an interesting journey for sure.

On the bright side, I improved my personal "toolbox" again along the way. I now have a flexible "Stream" abstraction (with stackable readers and writers) in C. This seemed the sanest way to decouple different aspects of the tool, like encoding color information, translating 16-bit BMP codepoints to UTF-8/UTF-16/UTF-16LE, and buffering, while not having to care whether this happens in memory, on stdin/stdout or on some actual file. I also have a new feature in my own little "pet build system" relying on nothing but GNU make: it can finally substitute "tokens" in files (without calling sed).

On the "interesting" (as in WTF) side, well ... I even rerolled the latest release for a tiny change in the build system, fixing some misbehavior that only manifested when trying to build a FreeBSD port; just refer to the commit fixing it for "amusement". Also interesting is how FreeBSD's stdio behaves in the absence of a buffer ... and another thing that hit me by surprise was the always WTF-inducing win32 API: FlushFileBuffers() fails when called on a HANDLE that refers to a "Console" (because this doesn't have buffers ... ah, sure!). Of course I should have read the whole docs first to avoid that trap. Not that it makes any sense though: if I have to know in my code whether I'm dealing with a Console or some other I/O HANDLE, the whole point of abstraction goes down the drain.

Well, writing all this just for interested fellow coders.

If you want to test the tool, here's a port you could apply with git am:
 
Ok, there was additional stuff to fix, so there is a v1.2 (updated port for now here: https://github.com/Zirias/zfbsd-ports/tree/local/textproc/dos2ansi)

And then, I found fonts. Nice bitmap(!) fonts, so you can get pixel-perfect display: https://github.com/farsil/ibmfonts

Unfortunately, I had to patch them, adding (fake/empty) glyphs for some vt100 drawing chars ... otherwise xterm finds some of them missing and renders ALL of them itself, which is pretty fatal for original looks.

Here's a port for these fonts: https://github.com/Zirias/zfbsd-ports/tree/local/x11-fonts/ibmfonts -- I guess it makes sense to at least add this to the official ports tree, AFAIK there are no IBM-PC fonts available so far? ;-)

I certainly like the result:

[Attached image: 1707926587420.png]
 
So, the final(?) result of this "just for fun let's display some CP-437 stuff in a FreeBSD terminal" exercise is a software package with a feature-rich converter and an "ANSI-art viewer" shell script using that (with xterm, less, and fonts).

I added a port now that I'm happy with the quality: converters/dos2ansi. The default "x11" flavor depends on xterm and two font packages to work out of the box and display most ANSIart in a nice way. The "nox11" flavor has no dependencies and by default excludes the script.

You can read the manpages online: https://zirias.github.io/dos2ansi/
 