C Ambiguous ESC key

ogogon · May 21, 2025

Colleagues, please help me solve one problem.

Many years ago, I wrote my own command line editor because the existing ones lacked the functionality I needed.
The editor requests the necessary sequences of keyboard codes and screen control sequences from terminfo. Then it builds a sorted list from the keyboard codes.
The keyboard codes received during the work are collected in a certain buffer and constantly checked against the sorted list. After a match, it interprets the received key and performs the necessary actions.

Now I needed to add a handler for the ESC key and I discovered a paradoxical thing. The key does not send some unambiguous sequence that does not conflict with others, but the ESC code itself (0x1b).

For example, this is what the F1 key sends

Code:

ogogon@devel# ./kprint
{0x1b:'ESC'}{0x4f:'O'}{0x50:'P'}
ogogon@devel#

And this is ESC key itself.

Code:

ogogon@devel# ./kprint
{0x1b:'ESC'}
ogogon@devel#

A huge logical ambiguity arises. Almost all keyboard code sequences begin with the ESC symbol (0x1b). I still don't understand how to distinguish a "single" ESC key from the beginning of a sequence.

Please tell me how to solve my problem.

Thank you for the answers to the point,
Ogogon.

Maturin · May 21, 2025

A more practical idea, from similar issues I once faced when detecting button presses while programming 8-bit µCs:
wait.
Either the rest of a key's sequence is delivered within a very short time (less than a few milliseconds), or the next thing to arrive is another Esc starting a new sequence. Either way, the time you need to press two keys is far longer than the gap between the Esc and the rest of a key's sequence.

ogogon · May 21, 2025

Maturin said:
A more practical idea, from similar issues I once faced when detecting button presses while programming 8-bit µCs:
wait.
Either the rest of a key's sequence is delivered within a very short time (less than a few milliseconds), or the next thing to arrive is another Esc starting a new sequence. Either way, the time you need to press two keys is far longer than the gap between the Esc and the rest of a key's sequence.

Thanks for the advice. I've read about this method somewhere, but in my opinion, it's a nightmare.
It turns out that having received the ESC I must remember the microtime of this event, and having received the next symbol - another microtime. If their difference allows us to judge that it is not the output of the operating system, but the result of mechanical actions, then should we recognize the ESC as a separate key?
Isn't there a more elegant way? For example, in the early 90s, on the VAX computer running BSD 4.7, it was necessary to use the stty command to switch the value of the backspace key code so that it corresponded to the entry in the termcap. But can't the ESC key be modified like that?

Maturin · May 21, 2025

ogogon said:
I must remember the microtime of this event, and having received the next symbol - another microtime.

I don't think you need to fetch a timestamp for that. That would indeed be a nightmare.

Simply add a sleep/NOP that starts automatically every time an Esc code is detected. Either it is interrupted because the rest of the key's sequence arrives, or the time runs out, which means the Esc key was pressed.

You need to figure out the shortest time anyone can achieve between two key presses, which will be about a couple of milliseconds (if even that). And you need to know how long the longest key sequence takes to be sent (approximately a few microseconds, so significantly shorter).
Edit: You don't even need to know that. The time gap between the first Esc and whatever follows it is enough.
Nobody can press keys faster than ~20...50ms apart, and your whole key sequence is sent within something like ~0.5...2µs.
I would simply try ~5...10µs. If nothing but the Escape code has been detected by then, the Esc key was pressed.

Otherwise, see whether there is some keyboard module/device you can get access to.

ogogon · May 21, 2025

Maturin said:
I don't think you need to fetch a timestamp for that. That would indeed be a nightmare.

Simply add a sleep/NOP that starts automatically every time an Esc code is detected. Either it is interrupted because the rest of the key's sequence arrives, or the time runs out, which means the Esc key was pressed.

Sorry, I didn't quite understand what a sleep/NOP is.

I have done everything very simply, and one might even say, ingenuously.
The input terminal has echo and line input disabled. I left the input wait, since it didn't bother me in any way. Then I read the input character by character, collect multi-octet UTF symbols (if necessary) and put the result in the buffer. For many years this was enough for me.

What am I supposed to do now because of one key with a poorly thought out code?
1. Rewrite everything to select/poll?
2. After receiving each ESC, set VTIME and then remove it? So ESC symbols are always coming in there, and mostly as part of keyboard sequences. If I were to call tcsetattr all the time, the system would go crazy.
3. Call timer interrupt every time?
There may be a simpler and more elegant way, but it doesn't occur to me. Please tell me.

Maturin said:
Otherwise, see whether there is some keyboard module/device you can get access to.

I'm sorry, I didn't quite understand what you meant.

Maturin · May 22, 2025

Sorry, maybe I was the one who misunderstood in the first place.
Even though I used the term "interrupted", I didn't mean CPU interrupt handling.

sleep/NOP = No Operation, "do nothing"

I simply meant:

Code:

limit = a large number to be reached by iteration within ~5...10µs;

[outer loop detecting key pressed]
    if [first buffer's content = Esc]  // or simply 'not empty', since Esc always comes first
        // start inner wait-loop:
        i = 0;
        while [i < limit]
            if [second buffer is still empty]
                i++; // wait
                [maybe some additional empty for-loops doing nothing here but killing time]
                if [i == limit] the Esc-key was pressed;
            else
                i = limit // that's what I meant with "interrupt": "break" the loop; sorry for poor choice of word
                any other key was pressed;
        end;
    else
        wait for buffer fill
[end outer loop]

Most primitive. No timestamps, no interrupt handling, nothing: simply wait a fixed number of clock cycles and see whether something arrives after the Esc code within that time, which means another key was pressed, or the time expires without anything more arriving, which means the Esc key was pressed.
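
On a POSIX system, the same wait can be done without a busy loop by letting poll() do the waiting. This is a swapped-in technique rather than the loop above: a minimal sketch, assuming the terminal is already in character-at-a-time mode and that a lone 0x1b has just been read. The name read_after_esc and the timeout_ms parameter are hypothetical, and the right timeout value is left open.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper, called right after a 0x1b byte was read from fd.
 * Returns 1 and stores the follow-up byte in *next if one arrives within
 * timeout_ms; returns 0 on timeout (the ESC key itself was pressed);
 * returns -1 on error or end of input. */
int read_after_esc(int fd, int timeout_ms, unsigned char *next)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(next != NULL && timeout_ms >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);   /* the process sleeps here */
    } while (rc < 0 && errno == EINTR);   /* a signal restarts the full wait */
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                         /* nothing followed the ESC */
    return read(fd, next, 1) == 1 ? 1 : -1;
}
```

The rest of the input code can stay as it is: only the byte after an ESC goes through this function, so there is no need to rewrite everything around select/poll or to change the terminal settings on each key.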

Since nobody can type much faster than one key per tenth of a second (~100ms), and 30ms is roughly the human limit of neurological perception (anything faster than that isn't even noticed as having happened), you can distinguish the two cases by waiting significantly less than that time, but long enough to see whether the rest of a sequence follows.
Since the code that follows the Esc code arrives over ten thousand times faster than any human can type, this time difference distinguishes the two cases pretty clearly. It doesn't matter whether this fixed do-nothing-but-wait loop takes 20ms with a slow keyboard on a 100MHz core, or 5µs on a 5GHz core with a fast keyboard (interface). As long as you stay below those 30ms, nobody will notice that there is a wait every time a key is pressed.
The only important thing is to pick the wait period as short as possible: longer than, but close to, the time it takes for the second code of a sequence to be delivered (e.g. the {0x4f:'O'} from your example), to avoid lags caused by the operating system's scheduling.

I agree, this is not a very elegant approach, but a most primitive one. This is the lowest level.
These are the things you do when there is no "infrastructure" to draw upon: you are working directly on the core's "bare metal" and need to create the "infrastructure" first, for example to distinguish a normal button press from a long press or a double-click when, for whatever reason, you cannot use a timer or interrupt handling for such things.

So sorry, I didn't mean to spend three posts elaborating on this and filling your thread.
I just wanted to drop this idea here as one more possibility that may be of some use among the others you want to collect here.
I thought what I meant was clear from my first post. Sorry.

Andriy · May 22, 2025

I know of only one way to deal with that: a timeout.
All software that I know about uses that technique, e.g. vim.
Look for keyseq-timeout here https://www.gnu.org/software/bash/manual/html_node/Readline-Init-File-Syntax.html

Andriy · May 22, 2025

You can also search for fun stories about arrow keys in vi over ssh with high latency.

https://stackoverflow.com/questions/13021196/how-do-i-get-vim-to-recognize-esc-key-faster

ralphbsz · May 23, 2025

Been there, done that, long long time ago.

Let us begin by assuming that all you have is a hardware-agnostic command line interface with a serial console, or something (like the VGA + keyboard virtual consoles or an xterm) that emulates that serial console. Or an ssh (or telnet or rlogin) session. If you actually had access to the keyboard interface itself, you could easily distinguish which key has been pressed by looking at scan codes, but in general, you don't have that.

As several others said: The key to differentiating the escape key from other keys that emit sequences beginning with escape is the timing. These days, the connection between the keyboard and the program receiving keystrokes is typically VERY VERY fast, with effective rates of hundreds of thousands of bits per second or more, which means tens of thousands of bytes (characters) per second, or far less than a ms per character. That means that the escape sequence that a typical key emits (like "esc [ 19 ~" for F8 or "esc [ C" for cursor right) shows up with really tiny gaps of less than a ms between characters. On the other hand, if a human presses the escape key and then "[ C" manually, the gap between these key strokes will be dozens or hundreds of ms.

This makes it very easy: Any time you receive a character, put it into a buffer, together with the exact time it was received. Calling gettimeofday() in C is extremely easy and extremely fast (it typically takes less than a microsecond), so this does not cause any noticeable overhead compared to waiting for characters for milliseconds or seconds.

Then you look at the character: if it is a character that cannot possibly begin a sequence (like the letter "A"), you remove it from the buffer and process it. If it is a character that might begin a sequence (like esc), you wait a little bit, for example about 1ms. If more characters are received, you assemble the sequence in the buffer and eventually process a function key. If not, then you process the lonely esc key about 1ms later.
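
The decision described above (process at once, keep waiting, or give up on the sequence) can be sketched as a lookup against the key table. A minimal sketch, assuming a tiny hard-coded table instead of one built from terminfo; the names key_table, key_match and classify_keys are hypothetical, and the three sequences are the ones quoted in this thread (F1, F8, cursor right).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum key_match { KEY_NONE, KEY_PREFIX, KEY_EXACT };

/* Illustrative table only; a real one would be filled from terminfo. */
static const char *const key_table[] = {
    "\033OP",    /* F1 */
    "\033[19~",  /* F8 */
    "\033[C",    /* cursor right */
};

/* Compare the bytes collected so far against every known sequence.
 * KEY_EXACT:  the buffer is a complete key sequence.
 * KEY_PREFIX: the buffer could still grow into one, so wait a little.
 * KEY_NONE:   no known sequence starts this way. */
enum key_match classify_keys(const char *buf, size_t len)
{
    enum key_match result = KEY_NONE;

    assert(buf != NULL && len > 0);
    for (size_t i = 0; i < sizeof key_table / sizeof key_table[0]; i++) {
        size_t klen = strlen(key_table[i]);
        if (klen < len || memcmp(key_table[i], buf, len) != 0)
            continue;               /* too short, or bytes differ */
        if (klen == len)
            return KEY_EXACT;
        result = KEY_PREFIX;        /* matches so far, more bytes expected */
    }
    return result;
}
```

A caller would handle KEY_NONE and KEY_EXACT immediately and start the short wait only on KEY_PREFIX; if the wait expires with just 0x1b in the buffer, that is the lonely esc key. With a sorted table the linear scan could be replaced by a binary search.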

Obviously, your code will have to be asynchronous (or multi-threaded) to do this. That is a common programming technique today.

Where it gets difficult is edge cases. For example, you receive esc, and immediately afterwards other characters, but they don't form a valid escape sequence for a known function key. What now? Good question, and it depends on your requirements. Or you receive esc, but due to a network slowdown, all following characters are delayed by a second: you will wrongly treat the escape as a single key stroke, and then a second later you might input the characters "[ 19 ~" into your input stream. That's life. But in the real world, these edge cases are remarkably rare.

In the old days, when terminals and keyboards were connected via slow serial lines, this was considerably harder. For example, on a 1200 baud modem (which is what I used to connect to our VAX in the late 1980s), it takes about 8ms to transmit one character. Now the distinction between a single escape character followed by human typed letters (about 100ms) and the normal character delay (8ms) gets to be hairy, in particular if your computer is also used by lots of other people, overloaded, and slow. For this reason, good operating systems (for example VMS) never used the escape key at all, and the OS pretty much enforced that one would only use function keys.
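
The 8ms figure follows from the line speed. A small sketch of the arithmetic, assuming 10 bits on the wire per character (one start bit, eight data bits, one stop bit), which is an assumption about the framing rather than something stated above; char_time_ms is a hypothetical name.

```c
#include <assert.h>

/* Milliseconds needed to transmit one character on a serial line.
 * bits_per_char counts start, data, and stop bits (10 under the
 * 8N1 framing assumed here). */
double char_time_ms(unsigned bits_per_sec, unsigned bits_per_char)
{
    assert(bits_per_sec > 0);
    return 1000.0 * bits_per_char / bits_per_sec;
}
```

Under that assumption, char_time_ms(1200, 10) comes out at roughly 8.3, while a 115200 bit/s line gives roughly 0.087, which is the "far less than a ms per character" case.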

Coding this up in C if one has a table of all the escape sequences for keys (from terminfo) would probably take several days of work, less if the program is already event-driven or multi-threaded, and less if one has access to good libraries (such as thread-safe queues and buffers, and locking).

Maturin · May 23, 2025

ralphbsz said:
As several others said:

"several" is good. It was only Andriy and me.

Thank you guys a lot for your input here - I always learn a lot just by reading these forums.

Thanks!