This blog post is a refresh of some documentation that has been buried for years on the Tavultesoft website. It is perhaps less important now than it was 5 years ago, but it is still helpful for those poor souls among us who are tasked with looking after existing legacy applications!

A bit of background

Windows has two classes of functions: the "A" (ANSI) and the "W" (Wide, or Unicode) functions. For example, GetWindowText maps to either GetWindowTextA or GetWindowTextW, depending on the flags with which your application is compiled. The ANSI functions work in 8-bit land and support only code page-based input.

Many applications have been created using ANSI functions, because the Unicode functions were not supported (apart from a few special exceptions) under Windows 95, 98 and Me. Many application development frameworks also did not originally support the Unicode functions. This means that if you didn’t create your application in the last 2 or 3 years, it is quite likely that it does not support Unicode input.

In particular, a window can be created as either an ANSI window or a Unicode window in Windows NT4, 2000, XP, Vista or 7. This affects how WM_CHAR messages are received by the window. If the window is created as an ANSI window, its window procedure will always receive WM_CHAR with code page characters. If the window is created as a Unicode window, all WM_CHAR messages will contain Unicode (UTF-16) characters. The IsWindowUnicode(HWND) function will tell you whether the window supports Unicode input.

So how does MSLU fit in?

MSLU – Microsoft Layer for Unicode – provides missing
Unicode functionality to Windows 9x through a library. Unfortunately, the one
thing it cannot do is allow windows to be created as Unicode – so direct
Unicode input is not possible.

Unicode input in Windows NT4, 2000, XP, Vista and 7

You can create a window as Unicode (even in a generally ANSI
application) by ensuring that you use the following functions instead of the
‘A’ equivalents:

  • RegisterClassW
  • CreateWindowW (or CreateWindowExW)
  • GetMessageW
  • PeekMessageW
  • DispatchMessageW

The following example of a message loop (modified slightly from Borland Delphi’s forms.pas) may be useful in helping you to support having specific windows as Unicode. In this example, we had to modify the class library in a minimal fashion in order to properly support Unicode windows. The important changes are the initial PeekMessage call with PM_NOREMOVE, the IsWindowUnicode check, and the choice between the ANSI and ‘W’ variants of PeekMessage and DispatchMessage:

function TApplication.ProcessMessage(var Msg: TMsg): Boolean;
var
  FIsUnicode, Handled: Boolean;
begin
  Result := False;
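  // Peek without removing first, so that we can find out which kind of
  // window the message is for (this was just PM_REMOVE previously).
  if PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE) then
  begin
    FIsUnicode := IsWindowUnicode(Msg.HWnd);

    // Now remove the message with the matching ANSI or Unicode function.
    if FIsUnicode
      then PeekMessageW(Msg, 0, 0, 0, PM_REMOVE)
      else PeekMessage(Msg, 0, 0, 0, PM_REMOVE);
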
    Result := True;
    if Msg.Message <> WM_QUIT then
    begin
      Handled := False;
      if Assigned(FOnMessage) then FOnMessage(Msg, Handled);
      if not IsHintMsg(Msg) and not Handled and not IsMDIMsg(Msg) and
        not IsKeyMsg(Msg) and not IsDlgMsg(Msg) then
      begin
        TranslateMessage(Msg);
        if FIsUnicode
          then DispatchMessageW(Msg)
          else DispatchMessage(Msg);
      end;
    end
    else
      FTerminate := True;
  end;
end;

Another useful trick is to force a window to Unicode by
resetting its window proc after the window has been created:

if FAllowUnicodeInput then
  SetWindowLongW(Handle, GWL_WNDPROC, GetWindowLong(Handle, GWL_WNDPROC))
else
  SetWindowLongA(Handle, GWL_WNDPROC, GetWindowLong(Handle, GWL_WNDPROC));

Unicode input using WM_UNICHAR

If you cannot modify your framework, then you may not be able to create your windows as Unicode windows, or receive Unicode WM_CHAR messages.

Fortunately, there is an alternative: WM_UNICHAR (0x0109).
Microsoft created the WM_UNICHAR message as a standard alternative for Unicode
character input, especially for Keyman. The WM_UNICHAR message is easy to
handle (again, from Delphi):

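procedure TSomeWindow.WMUniChar(var Message: TMessage);
begin
  // TMessage is used rather than TWMChar because TWMChar.CharCode is a
  // 16-bit Word, which would truncate UTF-32 values above $FFFF.
  if FAllowUnicodeInput then
  begin
    if Message.WParam = UNICODE_NOCHAR then
      Message.Result := 1    // probe: report that WM_UNICHAR is supported
    else
      InsertUTF32Character(Message.WParam);
  end
  else
    Message.Result := 0;
end;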

The WM_UNICHAR message is first sent to a window with UNICODE_NOCHAR (0xFFFF) in order to determine whether the window supports it. If the message is not handled, DefWindowProc will return 0 (on all versions of Windows), indicating that WM_UNICHAR is not supported. You must return 1 from your window procedure for this probe in order to receive input via WM_UNICHAR. After you return 1, Keyman will post subsequent characters via WM_UNICHAR instead of WM_CHAR. Note that WM_UNICHAR carries UTF-32 code points, whereas most Windows applications use UTF-16.
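
For C developers, the same logic can be sketched without any framework. The following is a minimal model of the WM_UNICHAR branch of a window procedure, written as portable C so that it compiles without windows.h. The TextBuffer type and the append_utf32 and handle_wm_unichar functions are hypothetical names made up for this illustration; in a real application the body of handle_wm_unichar would sit inside your window procedure's switch statement, and the character would go into whatever text storage your control uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the Windows SDK headers, repeated here so that the sketch
   compiles without <windows.h>. */
#ifndef WM_UNICHAR
#define WM_UNICHAR     0x0109
#endif
#ifndef UNICODE_NOCHAR
#define UNICODE_NOCHAR 0xFFFF
#endif

/* Hypothetical text storage holding UTF-16 code units. */
typedef struct {
    uint16_t units[64];
    size_t   length;
} TextBuffer;

/* Appends one UTF-32 code point as UTF-16. Returns the number of code
   units written, or 0 if the value is not a Unicode scalar value or the
   buffer is full. */
static size_t append_utf32(TextBuffer *buf, uint32_t cp)
{
    assert(buf != NULL);
    size_t cap = sizeof buf->units / sizeof buf->units[0];

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        if (buf->length + 1 > cap) return 0;
        buf->units[buf->length++] = (uint16_t)cp;
        return 1;
    }
    if (buf->length + 2 > cap) return 0;
    cp -= 0x10000;
    buf->units[buf->length++] = (uint16_t)(0xD800 + (cp >> 10));
    buf->units[buf->length++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}

/* Models the WM_UNICHAR case of a window procedure. wparam is the
   message's wParam; the return value is what the window procedure
   would return for the message. */
static long handle_wm_unichar(TextBuffer *buf, uint32_t wparam, int allow_unicode)
{
    if (!allow_unicode)
        return 0;              /* behave as if WM_UNICHAR is unsupported */
    if (wparam == UNICODE_NOCHAR)
        return 1;              /* probe: yes, this window accepts WM_UNICHAR */
    append_utf32(buf, wparam); /* real character: store it as UTF-16 */
    return 0;
}
```

Storing the text as UTF-16 is only one possible design choice here; it was picked because most Windows text APIs expect UTF-16, so the conversion happens once at the point of input.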

The following C macros can be helpful in converting between
UTF-16 surrogate pairs and UTF-32 (see www.unicode.org
for more detail on UTF-16 surrogate pairs, and the supplementary planes):

#define Uni_IsSurrogate1(ch) ((ch) >= 0xD800 && (ch) <= 0xDBFF)
#define Uni_IsSurrogate2(ch) ((ch) >= 0xDC00 && (ch) <= 0xDFFF)
 
#define Uni_SurrogateToUTF32(ch, cl) (((ch) - 0xD800) * 0x400 + ((cl) - 0xDC00) + 0x10000)
 
#define Uni_UTF32ToSurrogate1(ch) (((ch) - 0x10000) / 0x400 + 0xD800)
#define Uni_UTF32ToSurrogate2(ch) (((ch) - 0x10000) % 0x400 + 0xDC00)
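
As a rough illustration of how these macros might be used, the sketch below decodes a run of UTF-16 code units (such as those a Unicode window receives in successive WM_CHAR messages) into UTF-32 code points. The macros are repeated so that the example stands alone. The utf16_to_utf32 function and the REPLACEMENT_CHAR constant are names invented for this example, and substituting U+FFFD for an unpaired surrogate is an assumption about how you want to treat malformed input, not something Windows or Keyman requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The macros from the article, repeated so this example stands alone. */
#define Uni_IsSurrogate1(ch) ((ch) >= 0xD800 && (ch) <= 0xDBFF)
#define Uni_IsSurrogate2(ch) ((ch) >= 0xDC00 && (ch) <= 0xDFFF)
#define Uni_SurrogateToUTF32(ch, cl) (((ch) - 0xD800) * 0x400 + ((cl) - 0xDC00) + 0x10000)
#define Uni_UTF32ToSurrogate1(ch) (((ch) - 0x10000) / 0x400 + 0xD800)
#define Uni_UTF32ToSurrogate2(ch) (((ch) - 0x10000) % 0x400 + 0xDC00)

/* Assumption: unpaired surrogates are replaced with U+FFFD. */
#define REPLACEMENT_CHAR 0xFFFD

/* Decodes `count` UTF-16 code units from `in` into UTF-32 code points.
   `out` must have room for `count` entries. Returns the number of code
   points written. */
static size_t utf16_to_utf32(const uint16_t *in, size_t count, uint32_t *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        uint32_t ch = in[i];

        if (Uni_IsSurrogate1(ch) && i + 1 < count && Uni_IsSurrogate2(in[i + 1])) {
            /* A valid high/low pair: combine and skip the low half. */
            out[n++] = Uni_SurrogateToUTF32(ch, (uint32_t)in[i + 1]);
            i++;
        } else if (Uni_IsSurrogate1(ch) || Uni_IsSurrogate2(ch)) {
            /* A surrogate with no partner. */
            out[n++] = REPLACEMENT_CHAR;
        } else {
            /* A Basic Multilingual Plane character: same value in UTF-32. */
            out[n++] = ch;
        }
    }
    return n;
}
```

Going the other way, Uni_UTF32ToSurrogate1 and Uni_UTF32ToSurrogate2 should only be applied to values of 0x10000 and above; anything smaller is already a single UTF-16 code unit.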

Which approach should I use?

The WM_UNICHAR message is supported by Keyman Engine on all Windows platforms, so as a simple solution you could support only that. However, that would mean that newer Windows XP, Vista and Windows 7 keyboards (e.g. some Asian language keyboards) would not work in your application. The decision really depends on your user base and the level of complexity involved in a complete solution.
