This highly technical blog post explains the design rationale for the Keyman Engine keystroke input pipeline on Windows. The intended audience is software developers working on Keyman Engine, and other interested onlookers.
In the beginning, the user-mode Windows keyboard input pipeline looked something like this:
Now this diagram leaves out a lot of detail, in particular, the Windows kernel interactions and device drivers, because they are not important for this discussion. The core user-mode participants are shown in terms of their isolated threads of execution, rather than as components of the system. As such, the keyboard mapping DLLs, such as kbdus.dll, are omitted from the diagram; their responsibility is in mapping from scan codes to virtual key (VK) codes, sent with WM_KEYDOWN and related messages, and from virtual key codes to character codes, sent with WM_CHAR.
I am using the terms “down” and “up” to describe when keys are pressed and released, rather than the more esoteric “make” and “break” keyboard driver equivalents. If you are familiar with how Windows deals with keyboard input, you’ll spot a number of other simplifications which I hope are not relevant to this particular discussion.
The diagram below shows components rather than threads of execution.
For the rest of this blog post, I will be focusing on the sequence of operations, or event scenarios, using sequence diagrams. I will build up from a simplistic model of character injection, and at each step explain where things come unstuck.
A traditional Windows keyboard inserts 1 or more characters (except in the case of deadkeys, where there may be no visible output) into the text store. However, Keyman differs from this model because each keystroke event can initiate a transform in the target application’s text store, potentially deleting a number of characters before the insertion point, and then inserting a new sequence of characters.
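As a rough illustration of what such a transform means for the text store, here is a minimal Python sketch. The `Transform` type and `apply_transform` function are hypothetical names chosen for this example and are not part of Keyman; the text store is modelled as a plain string with a caret index.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transform:
    """Hypothetical description of one keystroke's effect on the text store."""
    delete_left: int  # characters to remove before the insertion point
    insert: str       # replacement text to insert at the insertion point


def apply_transform(text: str, caret: int, t: Transform) -> Tuple[str, int]:
    """Return the new text and the new caret position."""
    start = max(0, caret - t.delete_left)  # never delete past the start of the text
    new_text = text[:start] + t.insert + text[caret:]
    return new_text, start + len(t.insert)


# "ho" with the caret at the end: delete the "o" and insert "ò" in its place.
result = apply_transform("ho", 2, Transform(delete_left=1, insert="\u00f2"))
```

A traditional keyboard would correspond to the special case where `delete_left` is always zero.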
When an application includes full support for the Text Services Framework, Keyman utilises this to transform the text store in a robust manner. Unfortunately, only a handful of applications on Windows support Text Services Framework, and so for all other applications, Keyman instead applies the transform by first posting a sequence of VK_BACK key events to delete the appropriate number of characters from the left of the insertion point, and then posting a sequence of VK_PACKET key events to insert the replacement string, using the SendInput API function.
A simple example
For example, say we want to apply the following Keyman rule:
U+006F + [K_BKQUOTE] > U+00F2 c o` --> ò
The text store contains:
h | o |
U+0068 | U+006F |
When the user presses the Back Quote (Grave Accent) key, Keyman can emit the following key events using the SendInput API, to accomplish the transform:
- VK_BACK down
- VK_BACK up
- VK_PACKET (U+00F2) down
- VK_PACKET (U+00F2) up
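The list above can be generated mechanically from a delete count and a replacement string. The following Python sketch only builds the event list as data and makes no Windows calls. `KeyEvent` and `transform_to_events` are hypothetical names, and the sketch assumes one VK_PACKET down/up pair per UTF-16 code unit, so a character outside the Basic Multilingual Plane becomes two pairs.

```python
from typing import List, NamedTuple

VK_BACK = 0x08    # Windows virtual key code for Backspace
VK_PACKET = 0xE7  # Windows virtual key code used to carry a Unicode character


class KeyEvent(NamedTuple):
    vk: int
    unit: int   # UTF-16 code unit carried by VK_PACKET, 0 otherwise
    down: bool  # True for key down, False for key up


def transform_to_events(delete_left: int, insert: str) -> List[KeyEvent]:
    """Build the key events for 'delete N characters, then insert a string'."""
    events: List[KeyEvent] = []
    for _ in range(delete_left):
        events += [KeyEvent(VK_BACK, 0, True), KeyEvent(VK_BACK, 0, False)]
    data = insert.encode("utf-16-le")
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "little")
        events += [KeyEvent(VK_PACKET, unit, True), KeyEvent(VK_PACKET, unit, False)]
    return events


# The o + ` rule: one backspace, then U+00F2.
events = transform_to_events(1, "\u00f2")
```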
So here’s how that looks in a sequence diagram:
Here, we are representing the Keyman interactions with the title TIP, which stands for Text Input Processor. The Text Input Processor is part of the Text Services Framework which is enabled for every application.
Note how Keyman generates SendInput events which go back through Keyman before reaching the target application. I have fudged this slightly because technically, the TIP is part of the Application’s message loop, through the equivalent of a GetMessage hook. However, for the sake of clarity in diagramming, it makes little difference.
How does Keyman distinguish between key events it generates versus those generated by the user (or the system, think remote desktop applications)? It sets a flag in the message extended info, which is not used by the Windows system keyboard events.
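The flag check can be pictured with a small Python sketch. The marker and mask values below are invented for illustration, since the actual value Keyman uses is not given here, and `InputEvent` and `is_keyman_generated` are hypothetical names. The only assumption is that user and system events leave the extra info field at a value that does not match the marker.

```python
from dataclasses import dataclass

# Hypothetical marker and mask, chosen only for this example.
KEYMAN_MARK = 0x4B4D0000
KEYMAN_MASK = 0xFFFF0000


@dataclass(frozen=True)
class InputEvent:
    vk: int
    down: bool
    extra_info: int = 0  # assumed to be 0 for user and system events


def is_keyman_generated(ev: InputEvent) -> bool:
    """True when the high bits of the extra info carry the marker."""
    return (ev.extra_info & KEYMAN_MASK) == KEYMAN_MARK


# 0xC0 is VK_OEM_3, the Back Quote key on a US layout.
user_event = InputEvent(vk=0xC0, down=True)
# An injected Backspace; the low bits are left free for other bookkeeping.
injected = InputEvent(vk=0x08, down=True, extra_info=KEYMAN_MARK | 0x0001)
```

Using a mask rather than an exact comparison leaves room to carry extra state in the low bits, which is a design choice of this sketch rather than something stated in the post.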
So this basic model works quite well, with two caveats.
- TIPs cannot trap all key events — depending on the language under which the TIP is registered, the set of key events available can vary — and the exact availability is undocumented. This is of course a problem for a generalized keyboarding solution.
- We haven’t attempted to deal with modifier keys such as Ctrl, Shift and Alt (or the chiral versions of those, chirality here referring to left vs right modifier keys).
We have found reasonable solutions for the first issue, and we won’t delve into those here. However, I think you are going to find that the modifier key rabbit hole is deep indeed!
The problem with modifiers
I have shown a simple example that doesn’t involve any modifier keys. Let’s add a Right Alt modifier key into our example, so that we can continue to use the Back Quote or Grave Accent key normally. The Right Alt key is also known as AltGr on European layouts, and is used in many keyboard layouts to access a third set of characters (unshifted and shifted keys being the first and second sets, of course!)
Here’s the updated Keyman rule:
U+006F + [RALT K_BKQUOTE] > U+00F2 c o AltGr+` --> ò
What happens now? Well, initially perhaps we could just let Keyman do its thing, recording the AltGr state but letting it go through to the application. But there is a problem. Can you see what it is?
I guess that diagram lets the cat out of the bag. As you can see, the application believes it received an Alt+Backspace event, which is interpreted as an Undo command by many applications.
I’ve omitted the release events for the keys because they don’t matter, yet. And yes, technically VK_RMENU is received as a VK_MENU with an extended bit set. Again, it doesn’t matter for this discussion.
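To see why the application misreads the output, here is a toy Python model of an application that tracks the Alt state from the events it receives. `app_interpret` is a hypothetical function for this example, and real applications are of course more involved.

```python
from typing import List, Tuple

VK_BACK = 0x08
VK_RMENU = 0xA5   # Right Alt
VK_PACKET = 0xE7


def app_interpret(events: List[Tuple[int, bool]]) -> List[str]:
    """Describe what a simple application would make of (vk, is_down) events."""
    alt_down = False
    seen: List[str] = []
    for vk, down in events:
        if vk == VK_RMENU:
            alt_down = down
        elif down and vk == VK_BACK:
            seen.append("Alt+Backspace (Undo)" if alt_down else "Backspace")
        elif down and vk == VK_PACKET:
            seen.append("character")
    return seen


# The user is still holding Right Alt when Keyman's output arrives.
with_alt = app_interpret([
    (VK_RMENU, True),
    (VK_BACK, True), (VK_BACK, False),
    (VK_PACKET, True), (VK_PACKET, False),
])
```

In this model the Backspace that was meant to delete the "o" is read as an Undo shortcut, which is the problem described above.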
Why do we need to pass the modifier key event through to the application? Two good reasons:
- Windows applications commonly use the Alt key to activate menus and shortcuts. When you hold the Alt key down, the Ribbon interface in Windows and Office applications also displays hints to let you know which key to press.
- Many applications, particularly drawing applications, use modifiers to change the behaviour of the mouse, and will signal the new behaviour by changing the mouse cursor when a modifier key is held down.
If we blocked the modifier keys from reaching the application (or only allowed them to arrive together with the next keystroke), we would block these useful behaviours.
Note: We could require new Keyman keyboards use only Shift and AltGr, and never pass the AltGr modifier key on to target applications. In the vast majority of situations, Shift+Backspace has the same action as Backspace, so we probably don’t have to worry about that. This would mean we could always block the target application from receiving any AltGr events when a keyboard has any AltGr rules.
Except when we have to map Ctrl+Alt to AltGr because of hardware keyboards that don’t have a Right Alt key, such as is common on very small laptops. In the world of keyboarding, there’s always an exception! And it doesn’t solve the problem for the hundreds of existing keyboard layouts that do use other modifiers, either. For a variety of other reasons, we do recommend that Keyman keyboards avoid use of modifiers other than Shift and AltGr, but it isn’t a 100% solution.
So what can we do to solve this problem? What if we get Keyman to simulate releasing the Right Alt key before sending the output?
Oh dear! It turns out that pressing and releasing Alt, without any other intervening key event, opens the main menu in Windows applications.
Calling on the Main Menu
But we can fix this too! We can inject a dummy keystroke, which I will call VK_ZAP here, to prevent the VK_MENU default action from taking effect. Because VK_ZAP is not recognised by any Windows applications, they just ignore it. (Internally we use code 0x07 for VK_ZAP.)
We mustn’t forget to push the Alt key back down again at the end of our sequence, and if you think it through, you’ll see we also need to send another zapper as well, in case the user releases the Alt key immediately afterward — otherwise we’ll end up with the menu opening mysteriously! (If we didn’t push Alt down again, then typing a sequence of keystrokes with Alt down would require releasing and pressing Alt for each keystroke).
So how does that look?
Success! We get the keystroke and at the end of the sequence, we have a consistent keyboard state.
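The zapper logic can be checked with a small Python model. It assumes the menu opens exactly when an Alt up event directly follows the matching Alt down, and that the wrapped sequence is ordered zap, Alt up, output, Alt down, zap. Both are simplifications made for this sketch, and `wrap_with_alt_release` and `menu_opens` are hypothetical names.

```python
ALT, ZAP, BACK, PACKET = "VK_RMENU", "VK_ZAP", "VK_BACK", "VK_PACKET"


def wrap_with_alt_release(output):
    """Release Alt around the output, zapping before each possible Alt up."""
    return ([(ZAP, True), (ZAP, False), (ALT, False)]
            + output
            + [(ALT, True), (ZAP, True), (ZAP, False)])


def menu_opens(events) -> bool:
    """Model: the menu opens if Alt up directly follows Alt down."""
    previous = None
    for ev in events:
        if ev == (ALT, False) and previous == (ALT, True):
            return True
        previous = ev
    return False


output = [(BACK, True), (BACK, False), (PACKET, True), (PACKET, False)]
user_down, user_up = [(ALT, True)], [(ALT, False)]

naive = user_down + [(ALT, False)] + output               # simulated release only
zapped = user_down + wrap_with_alt_release(output) + user_up
no_trailing_zap = (user_down + [(ZAP, True), (ZAP, False), (ALT, False)]
                   + output + [(ALT, True)] + user_up)
```

In this model `naive` and `no_trailing_zap` both open the menu, while `zapped` does not, which matches the reasoning for needing a zapper at both ends.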
Getting Stuck
But of course, nothing is ever that easy! This was essentially the model that Keyman used for a long time. And we found that occasionally, the modifier keys would get ‘stuck’ — the user would release the Alt key but Keyman (and Windows) would think it was still down. How could this happen?
Let’s see if we can model it. In this diagram I have collapsed the down and up pairs of key events where they are not significant.
So what happens here? When the user releases the key (highlighted in blue) before Keyman finishes processing and emitting the batch of output events, the application receives two Alt key up events before the character output occurs, finishing with the Alt key down event (highlighted in red).
End result? From the user’s perspective, the Alt key seems to be stuck.
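The race can be reproduced in a few lines of Python by comparing two arrival orders at the application. The batch layout below (zap, Alt up, output, Alt down, zap) is an assumption carried over from the previous section, and `final_alt_state` is a hypothetical helper.

```python
ALT = "VK_RMENU"

# Keyman's output batch for one keystroke, as assumed in this sketch.
BATCH = [("VK_ZAP", True), ("VK_ZAP", False), (ALT, False),
         ("VK_BACK", True), ("VK_BACK", False),
         ("VK_PACKET", True), ("VK_PACKET", False),
         (ALT, True), ("VK_ZAP", True), ("VK_ZAP", False)]


def final_alt_state(events) -> bool:
    """Return True if the application believes Alt is down after all events."""
    alt = False
    for key, down in events:
        if key == ALT:
            alt = down
    return alt


# Expected order: the user's release arrives after the whole batch.
in_order = [(ALT, True)] + BATCH + [(ALT, False)]
# Race: the user's release reaches the application before the batch does.
raced = [(ALT, True), (ALT, False)] + BATCH
```

In the raced order the application's last Alt event is the key down from the batch, so it believes Alt is held even though the physical key is up.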
Serializing the input
The biggest difficulty here is that we cannot control when events enter the System Input Queue. And once we send something to the queue, it’s really too late to test. The only way we can fix this is if we control the queue ourselves. And so that’s what we’ve done. For each keystroke event we receive from the user, we duplicate it, sending it back to the System Input Queue with a Keyman flag on it, effectively creating our own subqueue within the queue for which we can control the order of events.
In this diagram, I have represented the now-familiar Keyman output as a single ‘transform’ event, and I actually swapped the order in which the Right Alt key and the Grave Accent key are released — this happens in real life and it doesn’t change the outcome, but it does make the diagram easier to understand. The Keyman-flagged activations of the System Input Queue are coloured in light blue.
You should be able to see how even though the Right Alt release event is received by the System Input Queue before it reaches the TIP, we can guarantee it doesn’t reach the application until after the transform is complete.
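A highly simplified Python model of the subqueue idea follows. It assumes that every user event is held as a flagged copy and handled strictly in arrival order, and that the whole output batch for a key is emitted before the next held event is looked at. The real mechanism goes through the System Input Queue and is more involved; `deliver` is a hypothetical function for this sketch.

```python
from collections import deque

ALT, KEY = "VK_RMENU", "K_BKQUOTE"

# Output batch for the AltGr + ` rule, as assumed in this sketch.
BATCH = [("VK_ZAP", True), ("VK_ZAP", False), (ALT, False),
         ("VK_BACK", True), ("VK_BACK", False),
         ("VK_PACKET", True), ("VK_PACKET", False),
         (ALT, True), ("VK_ZAP", True), ("VK_ZAP", False)]


def deliver(user_events):
    """Return the events the application sees under strict serialization."""
    subqueue = deque(user_events)  # flagged copies, in the order the user typed
    seen = []
    while subqueue:
        key, down = subqueue.popleft()
        if key == KEY:
            if down:
                seen.extend(BATCH)  # whole transform goes out before the next event
            # the rule key itself never reaches the application
        else:
            seen.append((key, down))
    return seen


# The user releases Right Alt before releasing the Grave Accent key.
seen = deliver([(ALT, True), (KEY, True), (ALT, False), (KEY, False)])
```

Because the user's Alt release waits its turn behind the transform, the last Alt event the application receives is the release, and the modifier state stays consistent.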
Metrofication
I’m going to throw one final wrinkle into the mix. When the Keyman TIP is activated in a Universal Windows Platform application (also known as a Metro app or a Modern UI app, among other names), it turns out that the TIP does not have sufficient permission to use the SendInput API. This means we have to move the work of processing keystrokes out of the TIP and into a separate low level keyboard hook that runs in the context of the Keyman process.
This adds further complexity with the inter-process communication to ensure that key input events can be serialized, and the cleanest way of resolving this is to instantiate yet another thread that takes care of managing modifier state and serializing it.
The final model looks something like the following diagram, as usual somewhat simplified. I have included only a single key event here, because the pattern is the same for multiple events as the serializer guarantees the order events are received by the TIP and the Application.
Notice how the TIP no longer sees any events except for those generated by the serializer and posted to the queue. There are some internal tweaks to how the VK_ZAP and modifier reset events are generated, but these are handled entirely within the serialization thread.
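The general shape of a dedicated serialization thread can be sketched in Python with a thread-safe queue. This is only an analogy for the pattern of a single thread owning the modifier state and the ordering; it does not reflect Keyman's actual inter-process communication, and `run_serializer` is a hypothetical name.

```python
import queue
import threading


def run_serializer(inbox: queue.Queue, outbox: list) -> None:
    """Consume events in order; this thread alone tracks the Alt state."""
    alt_down = False
    while True:
        item = inbox.get()
        if item is None:  # sentinel: shut the serializer down
            break
        key, down = item
        if key == "VK_RMENU":
            alt_down = down
        # Forward the event together with the modifier state at that moment.
        outbox.append((key, down, alt_down))


inbox: queue.Queue = queue.Queue()
delivered: list = []
worker = threading.Thread(target=run_serializer, args=(inbox, delivered))
worker.start()

# The puts below stand in for the low level keyboard hook handing events over.
for ev in [("VK_RMENU", True), ("K_BKQUOTE", True),
           ("K_BKQUOTE", False), ("VK_RMENU", False)]:
    inbox.put(ev)
inbox.put(None)
worker.join()
```

Since only one thread reads the queue, events leave in the order they arrived and no other component needs to keep its own copy of the modifier state.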
Future design possibilities
This architecture opens up the possibility of moving the keystroke processing entirely out of the application process and into the Keyman process in a future version of Keyman. This would improve the performance of Keyman and reduce its memory footprint, as keyboard layouts would not need to be loaded in each process.
References
http://www.philipstorr.id.au/pcbook/book3/scancode.htm (scan codes)
https://en.wikipedia.org/wiki/ISO/IEC_9995#ISO/IEC_9995-2 (key names)
https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=keybrddesign (components diagram)
https://sequencediagram.org/ (used to create the diagrams)
6 thoughts on “The Keyman keyboard input pipeline”
Jonny · November 23, 2018 at 7:28 pm
This is a great article; you simplify a fair amount of information to just the right degree. This is very interesting and informative!
I use Mac the majority of the time, and in many ways I prefer the basic functioning and possibilities for customization on that platform. But my only portable computer is a Windows machine so I’ve been on a quest to equalize the keyboards and minimize the differences in their behavior.
But I have exceptional wishes and expectations for my keyboards & keyboard layouts, and I have customized my layout, pushing the specification beyond its intended use.
I want to be able to enter text in several languages/scripts as well as a number of other special characters & symbols. It all needs to be very logical so that I can remember how to produce all the rarely used symbols when I need them.
A couple years ago I discovered KbdEdit and it has really helped me to bridge the gap. I actually use it in tandem with several Windows registry scan code changes, as well as some AutoHotKey functions. There is a significant problem, however, which is actually a limitation with how Windows processes dead keys.
I especially appreciate KbdEdit’s ability to utilize and customize the use and behavior of the additional modifier keys found on Japanese keyboards in particular. These extra modifiers seem as though they would avoid the problems you discuss above. (Because the Japanese modifiers do not need to double-function as command keys…)
One of my requirements is to type Ethiopic text. This is where Windows doesn’t meet my expectations. The most used Ethiopic syllables are the 6th order characters (representing a short, often epenthetic vowel, and the consonant alone). Dead keys are the only native way to enter Ethiopic on Mac, so I use the 6th order form as the default output. This means if I don’t need a different vowel it’s just one keystroke. In Windows, the output is produced as expected, but the next dead key state is skipped as well, producing the default 6th order form negating the saved keystroke.
I’m considering the option of having Keyman process just the Ethiopic text since it doesn’t seem to rely on Windows dead keys. Plus it is also able to provide visual feedback throughout the process.
I’m just wondering how Keyman would react. You indicated that other modifiers aren’t recognized, but really I would just be using the text transformation features.
I hesitate to add another component to my already complex set-up, but the way Ethiopic is being handled is really a pain and it’s the last remaining significant difference between platforms.
Marc Durdin · November 24, 2018 at 7:35 am
I think it could be difficult to integrate Keyman with KbdEdit or AutoHotKey. You are welcome to try: Keyman is free, so there’s no real loss if it doesn’t work out for you.
Jasper · January 1, 2019 at 5:00 am
Marc:
This is a very well written and fascinating article explaining all the complexities that must be dealt with behind the scenes to make Keyman work!
In my limited experience KbdEdit really pushes the limits of the input/keyboarding model for Windows by offering up to 15 possible modifier states and the use of up to 3 additional modifier keys, one “locking”! It delves deeply into barely documented territory with consistent and reliable results thanks to the fact that all the support is built-in to Windows and doesn’t require any additional software.
Have you ever considered the possibility of avoiding some of the complications you’ve outlined here involving modifier keys by somehow using the Asian modifier states in a way that is essentially invisible to the user? I haven’t thought this through in any way, but my initial thought is that something along those lines could potentially offer an alternative approach that may have some inherent benefits or simplicity.
Concrete examples may help illustrate what I’m imagining to any potential readers who are unfamiliar with the topic.
Caps Lock: Some clever keyboards I’ve seen use the Caps Lock state to enable dual-script keyboards where the Caps Lock actually switches to an alternate writing system.
This “hack” can be legitimately criticized in various ways, nonetheless it offers some unique features which seem to be very useful & convenient to the user base.
But Windows has a number of limitations/restrictions associated with the Caps Lock state (dead keys, multiple character output, etc.) which do not exist in other operating systems.
By substituting the Kana Lock state “under the hood” it seems like these limitations could be circumvented. It may even be possible to retain the actual Caps Lock state as well activated with Shift+Caps Lock.
AltGr: This key has all sorts of complications as you outlined. It seems like it could be substituted with the Right Oyayubi key and avoid all of the complications described above.
I imagine that the switch could be implemented on various levels which would probably each have some differences & advantages or complications. And again, this could be done in a way that was invisible to the end-user and thus avoid unnecessary confusion.
Although such an approach could also introduce/involve complications I’m not even aware of, or an excessive amount of changes to the code.
Jonny: Good luck in your endeavors. Your configuration sounds complex; I can’t help but wonder how the interaction of KbdEdit & Keyman would play out.
Marc Durdin · January 1, 2019 at 7:17 pm
Thanks for the comments Jasper 🙂
I don’t think it would be possible to use the Asian modifier states in the way you describe; we don’t want to change the normal operation of modifier keys except for those combinations which are used with a specific keyboard layout. If we didn’t have any legacy keyboard layouts to support, we could simplify things by restricting how modifier keys are used — but because we do, we have had to cross this bridge and therefore the hard work is done!
Jasper · January 31, 2019 at 3:22 am
That’s too bad. It seems like these modifiers could potentially open up some interesting features and simplifications (not so much in existing keyboards but for new ones, without affecting other keyboards).
But my knowledge & understanding in this field is limited and mainly based on my own experiences.
I know that with KbdEdit I can use and assign unique sets of modifier keys for different keyboards and switch between them on the same hardware keyboard without any adverse effect on each other. And I have encountered very few apps that do not respond as expected. Everything seems to happen quietly under the hood. I am able to type the characters I want to with the same key-presses as if I was using a keyboard with AltGr, but they no longer interfere with any application keyboard shortcuts (since the special characters actually use the Asian modifiers instead of AltGr). I just have to use Left Alt for any Alt, Shift-Alt or Ctrl-Alt combinations.
Apps do not recognize, or generally even seem to detect the Oyayubi key. They just receive the characters from the keyboard. It’s also possible to have additional modifier states for Shift and Ctrl with Oyayubi for even more characters.
And the KanaLock key essentially behaves like a hotkey to switch keyboards but without actually having to load another keyboard into memory.
It’s unfortunate that none of this can be utilized in Keyman.
Keyman 11.0 beta – Keyman Blog · January 7, 2019 at 6:58 pm
[…] In Keyman Desktop, we fixed a whole swathe of issues that should make using Keyman an even better experience. In particular, we added support for Metro-style and Windows Store applications such as Skype or Windows Search, and improved the robustness of input with a serialized input queue. […]