Techniques for Complex Script Keyboards – Visual Input Order

A few days ago I was assisting a Tamil customer with a Unicode keyboard they had designed which used visual input order. Visual input order means that vowels such as TAMIL VOWEL SIGN E, U+0BC6 ெ are typed before the consonant with which they combine, even though they are stored after the consonant as per the Unicode standard. Our customer was running into a problem where U+0BC6 ெ was combining with the wrong consonant in a run. In this blog I’ll discuss some of the possible solutions and potential issues with those solutions and finish with the solution that we proposed and that the customer chose to use. These solutions apply to any Indic script but we will use Tamil as an example.

For this discussion, I’ve only populated three keys on the keyboard: E, A and K. So you won’t be seeing real words here; the examples have been chosen as the simplest way to illustrate rendering complexities, not as valid Tamil text.

The Keyman source file looks like this:

store(&VERSION) '8.0' 
store(&NAME) 'My First Tamil Keyboard' 
store(&MESSAGE) 'Demonstrating Visual Input Order' 

begin Unicode > use(main) 

group(main) using keys 

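c 'a' = TAMIL VOWEL SIGN AA, 'e' = TAMIL VOWEL SIGN E, 'k' = TAMIL LETTER KA
c Each key outputs its character directly, so text is stored in typing order
+ 'a' > U+0BBE
+ 'e' > U+0BC6
+ 'k' > U+0B95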

With a visual input order keyboard, the Tamil vowel U+0BC6 ெ is typed before the consonant but must be stored after it in the document. The keyboard as it stands won’t do that, because it stores each character in the order it is typed:

Typed	Expected Display	Actual Display	Text Stored
e	ெ	ெ	U+0BC6
ek	கெ	ெக	U+0BC6 U+0B95

The initial solution was to add a rule to reorder these:

U+0BC6 + 'k' > U+0B95 U+0BC6

This fixes the initial issue but introduces the Travelling Vowel Problem – a vowel that just won’t stay where it is put:

Typed	Expected Display	Actual Display	Text Stored
e	ெ	ெ	U+0BC6
ek	கெ	கெ	U+0B95 U+0BC6
ekk	கெக	ககெ	U+0B95 U+0B95 U+0BC6
ekkk	கெகக	கககெ	U+0B95 U+0B95 U+0B95 U+0BC6

The problem here is how to tell that the U+0BC6 ெ has already been combined with a consonant to prevent it moving further down the text store. The solution initially chosen by our customer involved using U+200C ZWNJ to stop the vowel U+0BC6 ெ from moving along to the next consonant:

U+0BC6 + 'k' > U+0B95 U+0BC6 U+200C

This simple change stops the rule from matching repeatedly, because U+0BC6 ெ is no longer at the end of the context. But does that solve the problem completely?

Typed	Expected Display	Actual Display	Text Stored
e	ெ	ெ	U+0BC6
ek	கெ	கெ	U+0B95 U+0BC6 U+200C
ekk	கெக	கெக	U+0B95 U+0BC6 U+200C U+0B95

Okay, so this seemed to display just fine but behind the scenes we now had an extra U+200C ZWNJ in the text store which is certainly not ideal. Our customer noticed this when one application rendered U+200C ZWNJ as a space rather than zero width.

So what if we used Keyman’s deadkey functionality to not actually store a character in the text, but still flag that the vowel has been combined?

U+0BC6 + 'k' > U+0B95 U+0BC6 deadkey(combined)

Typed	Expected Display	Actual Display	Text Stored
e	ெ	ெ	U+0BC6
ek	கெ	கெ	U+0B95 U+0BC6 (dk)
ekk	கெக	கெக	U+0B95 U+0BC6 (dk) U+0B95

Success! Or is it? What happens when we type the following?

Typed	Expected Display	Actual Display	Text Stored
k	க	க	U+0B95
ke	கெ	கெ	U+0B95 U+0BC6
kek	ககெ	ககெ	U+0B95 U+0B95 U+0BC6 (dk)
kekk	ககெக	ககெக	U+0B95 U+0B95 U+0BC6 (dk) U+0B95

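Hey! We don’t want to combine with that consonant: this is a visual order keyboard! After typing ke, the ெ attaches on screen to the க before it, when it should be waiting for the consonant that is typed next. This could be called the Overenthusiastic Vowel Combining Problem. However, the text is stored correctly.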

So that’s a rendering issue again, and we can’t solve it with a deadkey statement, because a deadkey stores no character in the text and so cannot change how the text is rendered. It looks like our customer was on the right track after all. The key to solving this is to remember that the uncombined vowel is an intermediate state. We can temporarily add a U+200B ZWSP before the vowel to stop it combining with the preceding consonant, knowing that we can delete the U+200B ZWSP as soon as the consonant it belongs to is typed. This means changing two rules in the keyboard:

+ 'e' > U+200B U+0BC6
U+200B U+0BC6 + 'k' > U+0B95 U+0BC6

I chose U+200B ZWSP because it does not have any other shaping behaviour. Now when we type our test sequences, we get the following:

Typed	Expected Display	Actual Display	Text Stored
k	க	க	U+0B95
ke	கெ	கெ	U+0B95 U+200B U+0BC6
kek	ககெ	ககெ	U+0B95 U+0B95 U+0BC6
kekk	ககெக	ககெக	U+0B95 U+0B95 U+0BC6 U+0B95

In the ke row of that table, note the U+200B ZWSP character that is stored temporarily to prevent the U+0BC6 ெ from combining with the previous character. The U+200B ZWSP gets deleted in the next step, when the consonant is typed. Once it is gone the reordering rule can no longer match, so the vowel stays with its consonant. This simple pattern solves both the Overenthusiastic Vowel Combining Problem and the Travelling Vowel Problem.

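Just for fun, I’ll add one final rule to handle the TAMIL VOWEL SIGN O, U+0BCA ொ. This is a combination of the U+0BC6 ெ and U+0BBE ா vowels, and is rendered on both sides of the consonant it attaches to. Because the rule’s context includes the consonant, it only matches a ெ that has already been combined; an uncombined ெ is preceded by U+200B ZWSP instead. This ends up being a single, simple rule: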

U+0B95 U+0BC6 + 'a' > U+0B95 U+0BCA

Typed	Expected Display	Actual Display	Text Stored
k	க	க	U+0B95
ke	கெ	கெ	U+0B95 U+200B U+0BC6
kek	ககெ	ககெ	U+0B95 U+0B95 U+0BC6
keka	ககொ	ககொ	U+0B95 U+0B95 U+0BCA

The final keyboard is then:

store(&VERSION) '8.0' 
store(&NAME) 'My First Tamil Keyboard' 
store(&MESSAGE) 'Demonstrating Visual Input Order' 

begin Unicode > use(main) 

group(main) using keys 

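c Plain output for the AA vowel sign
+ 'a' > U+0BBE
c E is first stored uncombined, shielded from the previous consonant by U+200B ZWSP
+ 'e' > U+200B U+0BC6
+ 'k' > U+0B95
c The consonant typed after E replaces the ZWSP and is stored before the vowel
U+200B U+0BC6 + 'k' > U+0B95 U+0BC6
c AA typed after a combined E turns it into the two-part vowel O
U+0B95 U+0BC6 + 'a' > U+0B95 U+0BCA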

Using these design patterns, you can create visual input order keyboards for any of the Indic scripts, and the same principles transfer to phonetic input methods. Judicious use of the any, index and store statements will also make light work of handling all the possible combinations, since a single rule can then cover every consonant and vowel listed in a store. Other considerations that I have not covered here include visual order backspacing and prevention of illegal combinations such as U+0B95 U+0BBE U+0BBE காா.
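
As a minimal sketch of how the any, index and store statements might generalise the rules above, the keyboard below covers a handful of consonants and two left-side vowels. The store names, the extra key assignments (c, t, p, m and Shift+E) and the keyboard name are hypothetical, chosen only for illustration; they are not taken from the customer's keyboard. Each key store lines up position by position with its output store, and the number passed to index is the position of the matching any statement, counting the context first and then the key.

```keyman
store(&VERSION) '8.0'
store(&NAME) 'Tamil Visual Order Sketch'

c Hypothetical key assignments, for illustration only.
c Consonants: KA CA TA PA MA
store(consKey)  'kctpm'
store(cons)     U+0B95 U+0B9A U+0BA4 U+0BAA U+0BAE
c Left-side vowel signs: E, EE
store(preKey)   'eE'
store(preVowel) U+0BC6 U+0BC7
c Two-part vowel signs: O (E + AA), OO (EE + AA)
store(twoPart)  U+0BCA U+0BCB

begin Unicode > use(main)

group(main) using keys

+ 'a' > U+0BBE
+ any(consKey) > index(cons, 1)

c A left-side vowel starts out uncombined, shielded by U+200B ZWSP.
+ any(preKey) > U+200B index(preVowel, 1)

c The next consonant replaces the ZWSP and is stored before the vowel.
U+200B any(preVowel) + any(consKey) > index(cons, 3) index(preVowel, 2)

c AA after a combined E or EE becomes the two-part vowel O or OO.
any(cons) any(preVowel) + 'a' > index(cons, 1) index(twoPart, 2)
```

Under these assumed assignments, typing e then t should store U+0BA4 U+0BC6, and a following a should turn that into U+0BA4 U+0BCA, in the same way as the single-consonant rules shown earlier.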

1 thought on “Techniques for Complex Script Keyboards – Visual Input Order”

Marc Durdin · February 11, 2011 at 9:13 am

As noted by one reader, in the final 3 examples, the first Tamil Letter K U+0B95 depicted is shown as a consonant from a previous syllable. In practice this issue may rarely arise but for the prevention of confusion by the unexpected reordering of vowels, it is still useful to do this.

Published by Marc Durdin on February 8, 2011