This highly technical blog post explains the design rationale for the Keyman Engine keystroke input pipeline on Windows. The intended audience is software developers working on Keyman Engine, and other interested onlookers.
In the beginning, the user-mode Windows keyboard input pipeline looked something like this:
Now, this diagram leaves out a lot of detail, in particular the Windows kernel interactions and device drivers, because they are not important for this discussion. The core user-mode participants are shown in terms of their isolated threads of execution, rather than as components of the system. As such, the keyboard mapping DLLs, such as kbdus.dll, are omitted from the diagram. Their responsibility is mapping scan codes to virtual key (VK) codes, which are sent with WM_KEYDOWN and related messages, and mapping virtual key codes to character codes, which are sent with WM_CHAR messages.
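The following toy Python model makes the two mappings concrete. The scan codes and VK values are the standard US ones, but the tables and the translate function are hypothetical simplifications. In Windows, the character step actually happens when TranslateMessage produces WM_CHAR from WM_KEYDOWN using the active layout.

```python
from typing import Optional, Tuple

# Hypothetical, heavily reduced US English tables. A real layout DLL such as
# kbdus.dll covers every key, every shift level and any dead keys.
SCAN_TO_VK = {
    0x18: 0x4F,  # scan code of the O key -> virtual key 'O'
    0x29: 0xC0,  # scan code of the Back Quote key -> VK_OEM_3
}

# (virtual key, Shift held) -> character
VK_TO_CHAR = {
    (0x4F, False): "o",
    (0x4F, True): "O",
    (0xC0, False): "`",
    (0xC0, True): "~",
}

def translate(scan_code: int, shift: bool = False) -> Tuple[int, Optional[str]]:
    """Return the VK code (reported with WM_KEYDOWN) and the character (reported with WM_CHAR)."""
    vk = SCAN_TO_VK[scan_code]
    return vk, VK_TO_CHAR.get((vk, shift))

print(translate(0x18))              # (79, 'o')
print(translate(0x29, shift=True))  # (192, '~')
```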
I am using the terms “down” and “up” to describe when keys are pressed and released, rather than the more esoteric “make” and “break” keyboard driver equivalents. If you are familiar with how Windows deals with keyboard input, you’ll spot a number of other simplifications which I hope are not relevant to this particular discussion.
The diagram below shows components rather than threads of execution.
For the rest of this blog post, I will be focusing on the sequence of operations, or event scenarios, using sequence diagrams. I will build up from a simplistic model of character injection, and at each step explain where things come unstuck.
A traditional Windows keyboard inserts one or more characters into the text store (except in the case of dead keys, where there may be no visible output). However, Keyman differs from this model because each keystroke event can initiate a transform in the target application’s text store, potentially deleting a number of characters before the insertion point and then inserting a new sequence of characters.
When an application includes full support for the Text Services Framework, Keyman utilises this to transform the text store in a robust manner. Unfortunately, only a handful of applications on Windows support the Text Services Framework, and so for all other applications, Keyman instead applies the transform by first posting a sequence of VK_BACK key events to delete the appropriate number of characters to the left of the insertion point, and then posting a sequence of VK_PACKET key events to insert the replacement string, using the SendInput API function.
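The following small Python model is a rough illustration of the event list such a transform produces. KeyEvent, utf16_units and transform_events are hypothetical names used only in this sketch. In a real SendInput call, each entry would be an INPUT structure of type INPUT_KEYBOARD:

- Backspace uses wVk = VK_BACK.
- Each replacement character is sent with KEYEVENTF_UNICODE and its UTF-16 code unit in wScan, and the application receives it as VK_PACKET.
- Release events add KEYEVENTF_KEYUP.

Because wScan holds a single UTF-16 code unit, characters above U+FFFF become surrogate pairs.

```python
from dataclasses import dataclass
from typing import List

VK_BACK = 0x08
VK_PACKET = 0xE7

@dataclass(frozen=True)
class KeyEvent:
    vk: int
    down: bool
    unit: int = 0  # UTF-16 code unit carried by a VK_PACKET event, otherwise 0

def utf16_units(text: str) -> List[int]:
    """Split text into UTF-16 code units; characters above U+FFFF become surrogate pairs."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def transform_events(delete_count: int, output: str) -> List[KeyEvent]:
    """Events that delete `delete_count` characters left of the caret, then insert `output`."""
    events: List[KeyEvent] = []
    for _ in range(delete_count):
        events += [KeyEvent(VK_BACK, True), KeyEvent(VK_BACK, False)]
    for unit in utf16_units(output):
        events += [KeyEvent(VK_PACKET, True, unit), KeyEvent(VK_PACKET, False, unit)]
    return events

# The o -> ò transform: one Backspace, then U+00F2.
events = transform_events(1, "\u00f2")
print([(hex(e.vk), e.down, hex(e.unit)) for e in events])
```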
A simple example
For example, say we want to apply the following Keyman rule:
U+006F + [K_BKQUOTE] > U+00F2 c o` --> ò
Suppose the text store contains the letter o immediately before the insertion point.
When the user presses the Back Quote (Grave Accent) key, Keyman can use the SendInput API to accomplish the transform. It emits a VK_BACK key down and up to delete the o, followed by a VK_PACKET key down and up carrying U+00F2 (ò).
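The following Python sketch applies those four events to a simulated text store and shows that the result is ò. It assumes three things:

- The TIP has swallowed the original Back Quote key event, so the application never sees it.
- The application treats VK_BACK as "delete one character".
- The application treats VK_PACKET as "insert its character".

The apply function is a hypothetical model, not Keyman code.

```python
from typing import List, Tuple

VK_BACK = 0x08
VK_PACKET = 0xE7

# Each event is (virtual key, is_key_down, character); character is "" except for VK_PACKET.
Event = Tuple[int, bool, str]

def apply(text: str, events: List[Event]) -> str:
    """Apply key events to the text left of the insertion point, as a simple editor would."""
    for vk, down, ch in events:
        if not down:
            continue  # in this model only key-down events edit the text
        if vk == VK_BACK:
            text = text[:-1]
        elif vk == VK_PACKET:
            text += ch
    return text

events: List[Event] = [
    (VK_BACK, True, ""), (VK_BACK, False, ""),
    (VK_PACKET, True, "\u00f2"), (VK_PACKET, False, "\u00f2"),
]
print(apply("o", events))  # ò
```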
So here’s how that looks in a sequence diagram:
Here, we are representing the Keyman interactions with the title TIP, which stands for Text Input Processor. A Text Input Processor is a component of the Text Services Framework, which is enabled for every application.
Note how Keyman generates SendInput events which go back through Keyman before reaching the target application. I have fudged this slightly because technically, the TIP is part of the application’s message loop, through the equivalent of a GetMessage hook. However, for the sake of clarity in diagramming, it makes little difference.
How does Keyman distinguish between key events it generates and those generated by the user (or by the system; think of remote desktop applications)? It sets a flag in the message extra info (the dwExtraInfo value attached to each injected input event), which the Windows system keyboard events do not use.
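The following is a minimal sketch of the tagging idea. It assumes the flag is a single marker value stored in the extra info field. KEYMAN_EXTRA_INFO, its value, and the helper names are hypothetical. On Windows, the receiving side can read the value with GetMessageExtraInfo, or from the dwExtraInfo member of KBDLLHOOKSTRUCT in a low-level keyboard hook.

```python
from dataclasses import dataclass
from typing import List, Tuple

VK_BACK, VK_OEM_3 = 0x08, 0xC0

# Hypothetical marker value; the post does not say what Keyman actually stores.
KEYMAN_EXTRA_INFO = 0x4B4D414E

@dataclass(frozen=True)
class InputEvent:
    vk: int
    down: bool
    extra_info: int = 0  # models dwExtraInfo; ordinary keyboard input typically leaves it 0

def inject(vk: int, down: bool) -> InputEvent:
    """An event generated by Keyman, tagged so Keyman can recognise it when it comes back."""
    return InputEvent(vk, down, KEYMAN_EXTRA_INFO)

def route(events: List[InputEvent]) -> Tuple[List[InputEvent], List[InputEvent]]:
    """Split events into user/system input for the TIP to process and Keyman's own output."""
    to_process = [e for e in events if e.extra_info != KEYMAN_EXTRA_INFO]
    pass_through = [e for e in events if e.extra_info == KEYMAN_EXTRA_INFO]
    return to_process, pass_through

events = [InputEvent(VK_OEM_3, True), inject(VK_BACK, True), inject(VK_BACK, False)]
to_process, pass_through = route(events)
print(len(to_process), len(pass_through))  # 1 2
```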
So this basic model works quite well, with two caveats.
- TIPs cannot trap all key events — depending on the language under which the TIP is registered, the set of key events available can vary — and the exact availability is undocumented. This is of course a problem for a generalized keyboarding solution.
- We haven’t attempted to deal with modifier keys such as Ctrl, Shift and Alt (or the chiral versions of those, chirality here referring to left vs right modifier keys).
We have found reasonable solutions for the first issue, and we won’t delve into those here. However, I think you are going to find that the modifier key rabbit hole is deep indeed!
The problem with modifiers
I have shown a simple example that doesn’t involve any modifier keys. Let’s add a Right Alt modifier key into our example, so that we can continue to use the Back Quote or Grave Accent key normally. The Right Alt key is also known as AltGr on European layouts, and is used in many keyboard layouts to access a third set of characters (unshifted and shifted keys being the first and second sets, of course!)
Here’s the updated Keyman rule:
U+006F + [RALT K_BKQUOTE] > U+00F2 c o AltGr+` --> ò
What happens now? Well, initially perhaps we could just let Keyman do its thing, recording the AltGr state but letting it go through to the application. But there is a problem. Can you see what it is?
I guess that diagram lets the cat out of the bag. As you can see, the application believes it received an Alt+Backspace event, which is interpreted as an Undo command by many applications.
I’ve omitted the release events for the keys because they don’t matter, yet. And yes, technically VK_RMENU is received as a VK_MENU with an extended bit set. Again, it doesn’t matter for this discussion.
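A few lines of Python that track the Alt state seen by the application are enough to reproduce the problem. This is a simplified model, not Keyman code:

- Right Alt is represented as VK_MENU, because that is how the application receives it.
- Release events for the user's keys are left out, as in the diagram.
- "Packet" simply labels the VK_PACKET character event.

```python
from typing import List, Tuple

VK_BACK, VK_MENU, VK_PACKET = 0x08, 0x12, 0xE7
NAMES = {VK_BACK: "Backspace", VK_PACKET: "Packet"}

def shortcuts_seen(events: List[Tuple[int, bool]]) -> List[str]:
    """Describe each non-modifier key press as the application would interpret it."""
    alt_down = False
    seen = []
    for vk, down in events:
        if vk == VK_MENU:
            alt_down = down
        elif down:
            seen.append(("Alt+" if alt_down else "") + NAMES[vk])
    return seen

events = [
    (VK_MENU, True),                        # the user's AltGr, passed through to the app
    (VK_BACK, True), (VK_BACK, False),      # Keyman deletes the o ...
    (VK_PACKET, True), (VK_PACKET, False),  # ... and inserts ò
]
print(shortcuts_seen(events))  # ['Alt+Backspace', 'Alt+Packet']
```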
Why do we need to pass the modifier key event through to the application? Two good reasons:
- Windows applications commonly use the Alt key to activate menus and shortcuts. When you hold the Alt key down, the Ribbon interface in Windows and Office applications also displays hints to let you know which key to press.
- Many applications, particularly drawing applications, use modifiers to change the behaviour of the mouse, and will signal the new behaviour by changing the mouse cursor when a modifier key is held down.
If we blocked the modifier keys from reaching the application (or only allowed them to arrive together with the next keystroke), we would block these useful behaviours.
Note: We could require that new Keyman keyboards use only Shift and AltGr, and never pass the AltGr modifier key on to target applications. In the vast majority of situations, Shift+Backspace has the same action as Backspace, so we probably don’t have to worry about that. This would mean we could always block the target application from receiving any AltGr events when a keyboard has any AltGr rules.
Except when we have to map Ctrl+Alt to AltGr for hardware keyboards that don’t have a Right Alt key, as is common on very small laptops. In the world of keyboarding, there’s always an exception!
And it doesn’t solve the problem for the hundreds of existing keyboard layouts that do use other modifiers, either.
For a variety of other reasons, we do recommend that Keyman keyboards avoid use of modifiers other than Shift and AltGr, but it isn’t a 100% solution.
So what can we do to solve this problem? What if we get Keyman to simulate releasing the Right Alt key before sending the output?
Oh dear! It turns out that pressing and releasing Alt, without any other intervening key event, opens the main menu in Windows applications.
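The following is a simplified model of the rule just described: an Alt release counts as a menu activation only if no other key event has arrived since Alt went down. The real logic in Windows has more to it. menu_activations is a hypothetical helper for this sketch.

```python
from typing import List, Tuple

VK_BACK, VK_MENU, VK_PACKET = 0x08, 0x12, 0xE7

def menu_activations(events: List[Tuple[int, bool]]) -> int:
    """Count Alt 'taps': an Alt release with no other key event since Alt went down."""
    count = 0
    tap_pending = False
    for vk, down in events:
        if vk == VK_MENU:
            if down:
                tap_pending = True      # auto-repeated Alt presses keep the tap alive
            elif tap_pending:
                count += 1
                tap_pending = False
        else:
            tap_pending = False         # any other key event cancels the tap
    return count

events = [
    (VK_MENU, True),                    # user holds AltGr
    (VK_MENU, False),                   # Keyman releases Alt before its output
    (VK_BACK, True), (VK_BACK, False),
    (VK_PACKET, True), (VK_PACKET, False),
]
print(menu_activations(events))  # 1
```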
Calling on the Main Menu
But we can fix this too! We can inject a dummy keystroke, which I will call VK_ZAP here, to prevent the VK_MENU default action from taking effect. Because VK_ZAP is not recognised by any Windows applications, they just ignore it. (Internally we use code …)
We mustn’t forget to push the Alt key back down again at the end of our sequence. If you think it through, you’ll see that we also need to send a second zapper after that, in case the user releases the Alt key immediately afterward; otherwise we’ll end up with the menu opening mysteriously! (If we didn’t push Alt down again, then typing a sequence of keystrokes with Alt held down would require releasing and pressing Alt for each keystroke.)
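The following sketch puts the pieces together. It wraps the transform with zapper and Alt reset events in the order just described, then checks two things: the resulting stream never produces a bare Alt tap, and Alt ends up released once the user lets go. VK_ZAP = 0xFF is a placeholder chosen for this example only, and the helper functions are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, bool]
VK_BACK, VK_MENU, VK_PACKET = 0x08, 0x12, 0xE7
VK_ZAP = 0xFF  # placeholder for this sketch; the post does not give Keyman's value

def zap() -> List[Event]:
    return [(VK_ZAP, True), (VK_ZAP, False)]

def wrap_with_alt_reset(output: List[Event]) -> List[Event]:
    """Release Alt around the output and restore it, without ever leaving a bare Alt tap."""
    return zap() + [(VK_MENU, False)] + output + [(VK_MENU, True)] + zap()

def menu_activations(events: List[Event]) -> int:
    """Count Alt releases with no other key event since the Alt press."""
    count, tap_pending = 0, False
    for vk, down in events:
        if vk == VK_MENU:
            if down:
                tap_pending = True
            elif tap_pending:
                count, tap_pending = count + 1, False
        else:
            tap_pending = False
    return count

def alt_is_down(events: List[Event]) -> bool:
    """Alt state after the stream: the direction of the last Alt event."""
    state = False
    for vk, down in events:
        if vk == VK_MENU:
            state = down
    return state

output = [(VK_BACK, True), (VK_BACK, False), (VK_PACKET, True), (VK_PACKET, False)]
# The user holds AltGr, Keyman emits its wrapped output, then the user lets go of AltGr.
stream = [(VK_MENU, True)] + wrap_with_alt_reset(output) + [(VK_MENU, False)]
print(menu_activations(stream), alt_is_down(stream))  # 0 False
```

Without the two zappers, the same stream would contain two bare Alt taps: one when Keyman releases Alt, and one when the user does.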
So how does that look?
Success! We get the keystroke and at the end of the sequence, we have a consistent keyboard state.
But of course, nothing is ever that easy! This was essentially the model that Keyman used for a long time. And we found that occasionally, the modifier keys would get ‘stuck’ — the user would release the Alt key but Keyman (and Windows) would think it was still down. How could this happen?
Let’s see if we can model it. In this diagram I have collapsed the down and up pairs of key events where they are not significant.
So what happens here? The user releases the Alt key (highlighted in blue) before Keyman has finished processing and emitting its batch of output events. As a result, the application receives two Alt key up events before the character output, and the sequence finishes with an Alt key down event (highlighted in red).
End result? From the user’s perspective, the Alt key seems to be stuck.
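The same kind of model reproduces the stuck key. It assumes the batch is sent with a single SendInput call. Microsoft documents that such a call inserts its events serially, without interleaving other keyboard or mouse input, so the user's release can only land before or after the batch. If the release is queued early, while the TIP is still processing the keystroke, the application sees two Alt releases before the output, and the stream ends with Alt down. The names and the VK_ZAP value are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[int, bool]
VK_BACK, VK_MENU, VK_PACKET = 0x08, 0x12, 0xE7
VK_ZAP = 0xFF  # placeholder for this sketch; the post does not give Keyman's value

# Keyman's output batch for AltGr+`: zap, release Alt, transform, press Alt, zap.
BATCH: List[Event] = [
    (VK_ZAP, True), (VK_ZAP, False), (VK_MENU, False),
    (VK_BACK, True), (VK_BACK, False), (VK_PACKET, True), (VK_PACKET, False),
    (VK_MENU, True), (VK_ZAP, True), (VK_ZAP, False),
]

def app_stream(released_early: bool) -> List[Event]:
    """What the application receives, depending on when the user's Alt release is queued."""
    press, release = [(VK_MENU, True)], [(VK_MENU, False)]
    if released_early:
        # Alt released while the TIP was still working: it is queued ahead of the batch.
        return press + release + BATCH
    return press + BATCH + release

def alt_is_down(events: List[Event]) -> bool:
    """Alt state after the stream: the direction of the last Alt event."""
    state = False
    for vk, down in events:
        if vk == VK_MENU:
            state = down
    return state

for early in (False, True):
    label = "early release" if early else "late release"
    print(label, "-> Alt stuck:", alt_is_down(app_stream(early)))
```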
Serializing the input
The biggest difficulty here is that we cannot control when events enter the System Input Queue. Once we have sent something to the queue, it is really too late to test for, or correct, a conflicting user event. The only way we can fix this is to control the queue ourselves, and so that’s what we’ve done. For each keystroke event we receive from the user, we duplicate it and send it back to the System Input Queue with a Keyman flag on it. This effectively creates our own subqueue within the queue, in which we control the order of events.
In this diagram, I have represented the now-familiar Keyman output as a single ‘transform’ event, and I actually swapped the order in which the Right Alt key and the Grave Accent key are released — this happens in real life and it doesn’t change the outcome, but it does make the diagram easier to understand. The Keyman-flagged activations of the System Input Queue are coloured in light blue.
You should be able to see how even though the Right Alt release event is received by the System Input Queue before it reaches the TIP, we can guarantee it doesn’t reach the application until after the transform is complete.
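The following sketch models the serialization idea, with the System Input Queue as a deque drained by the TIP. Every unflagged event from the user is swallowed and re-queued as a Keyman-flagged duplicate, so it lands behind any output already queued. The single hard-coded rule, the placeholder VK_ZAP value and the function names are assumptions for illustration. The input is a worst case in which the user has pressed and released both keys before the TIP handles any of them.

```python
from collections import deque
from typing import Deque, List, Tuple

VK_BACK, VK_MENU, VK_PACKET, VK_OEM_3 = 0x08, 0x12, 0xE7, 0xC0
VK_ZAP = 0xFF  # placeholder for this sketch; the post does not give Keyman's value

Event = Tuple[int, bool]
Queued = Tuple[int, bool, bool]  # (virtual key, is_key_down, carries_keyman_flag)

def transform_batch() -> List[Queued]:
    """Keyman-flagged output for AltGr+` while Alt is held: release Alt, emit ò, restore Alt."""
    out = [(VK_ZAP, True), (VK_ZAP, False), (VK_MENU, False),
           (VK_BACK, True), (VK_BACK, False), (VK_PACKET, True), (VK_PACKET, False),
           (VK_MENU, True), (VK_ZAP, True), (VK_ZAP, False)]
    return [(vk, down, True) for vk, down in out]

def run(system_queue: Deque[Queued]) -> List[Event]:
    """Drain the queue through the TIP and return what the application receives, in order."""
    app: List[Event] = []
    while system_queue:
        vk, down, flagged = system_queue.popleft()
        if flagged:
            app.append((vk, down))                  # Keyman's own events go straight through
        elif vk == VK_OEM_3 and down:
            system_queue.extend(transform_batch())  # the rule fires: queue the transform
        elif vk == VK_OEM_3:
            pass                                    # swallow the release of the rule's key
        else:
            system_queue.append((vk, down, True))   # re-queue a flagged duplicate at the back
    return app

# Worst case: the user has pressed and released everything before the TIP runs at all.
user_input: Deque[Queued] = deque([
    (VK_MENU, True, False), (VK_OEM_3, True, False),
    (VK_MENU, False, False), (VK_OEM_3, False, False),
])
received = run(user_input)
print(received[-1])  # (18, False): the user's Alt release arrives last
```

A user event can only ever be re-queued behind Keyman's pending output. As a result, the Alt release cannot overtake the transform, even though it entered the queue first.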
I’m going to throw one final wrinkle into the mix. When the Keyman TIP is activated in a Universal Windows Platform application (also known as a Metro app or a Modern UI app, among other names), it turns out that the TIP does not have sufficient permission to use the SendInput API. This means we have to move the work of processing keystrokes out of the TIP and into a separate low level keyboard hook that runs in the context of the Keyman process.
This adds further complexity to the inter-process communication needed to ensure that key input events can be serialized. The cleanest way of resolving this is to instantiate yet another thread that takes care of managing modifier state and serializing the input events.
The final model looks something like the following diagram, as usual somewhat simplified. I have included only a single key event here. The pattern is the same for multiple events, because the serializer guarantees the order in which the TIP and the application receive events.
Notice how the TIP no longer sees any events except for those generated by the serializer and posted to the queue. There are some internal tweaks to how the VK_ZAP and modifier reset events are generated, but these are handled entirely within the serialization thread.
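The serialization thread can be sketched with Python's threading and queue modules. Producers on any thread submit batches, and a single consumer thread owns the modifier state and is the only place events are emitted from. This is an in-process stand-in for the real design, in which the TIP and the Keyman process communicate via IPC. The Serializer class and its methods are hypothetical, and the emitted list stands in for injecting events into the System Input Queue.

```python
import queue
import threading
from typing import List, Optional, Tuple

Event = Tuple[int, bool]
VK_MENU = 0x12

class Serializer:
    """One thread owns the modifier state and is the only place events are emitted from."""

    def __init__(self) -> None:
        self.requests: "queue.Queue[Optional[List[Event]]]" = queue.Queue()
        self.emitted: List[Event] = []  # stands in for injecting events into the system queue
        self.alt_down = False           # modifier state, touched only by the serializer thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, events: List[Event]) -> None:
        """Safe to call from any thread, e.g. a keyboard hook or an IPC handler."""
        self.requests.put(events)

    def close(self) -> None:
        self.requests.put(None)  # sentinel: stop after draining earlier requests
        self._thread.join()

    def _run(self) -> None:
        while True:
            batch = self.requests.get()
            if batch is None:
                return
            for vk, down in batch:  # each batch is emitted whole, never interleaved
                if vk == VK_MENU:
                    self.alt_down = down
                self.emitted.append((vk, down))

# Several producer threads submit two-event batches at the same time.
s = Serializer()
batches = [[(0x41 + i, True), (0x41 + i, False)] for i in range(4)]
producers = [threading.Thread(target=s.submit, args=(b,)) for b in batches]
for t in producers:
    t.start()
for t in producers:
    t.join()
s.close()
print(s.emitted)
```

Whatever order the producer threads run in, each two-event batch appears contiguously in s.emitted, because only the serializer thread appends to it.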
Future design possibilities
This architecture opens up the possibility of moving the keystroke processing entirely out of the application process and into the Keyman process in a future version of Keyman. This would improve the performance of Keyman and reduce its memory footprint, as keyboard layouts would not need to be loaded in each process.
https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=keybrddesign (components diagram)
https://sequencediagram.org/ (used to create the diagrams)