This blog post is a refresh of some documentation that has been buried for years on the Tavultesoft website. It is perhaps less important now than it was 5 years ago, but it is still helpful for those poor souls among us who are tasked with looking after existing legacy applications!
A bit of background
Windows has two classes of functions: the "A" (ANSI) and the "W" (Wide or Unicode) functions. For example, GetWindowText maps to either GetWindowTextA or GetWindowTextW, depending on the flags with which your application is compiled. The ANSI functions work in 8-bit land, supporting only codepage-based input.
Many applications have been created using ANSI
functions as the Unicode functions were not supported (apart from a few special
exceptions) under Windows 95, 98 and Me. Many application development frameworks originally did not support the Unicode functions. This means that if you didn’t create your application in the last 2 or 3 years, it is quite likely that it does not support Unicode input.
In particular, windows can be created as either ANSI or Unicode
windows in Windows NT4, 2000, XP, Vista or 7. This affects how WM_CHAR
messages are received by the window. If the window is created as an ANSI
window, WM_CHAR will always be received by the window class as codepage
characters. If the window is created as a Unicode window, all WM_CHAR messages
will contain Unicode (UTF-16) characters. The IsWindowUnicode(HWND) function
will tell you whether the window supports Unicode input.
So how does MSLU fit in?
MSLU – Microsoft Layer for Unicode – provides missing
Unicode functionality to Windows 9x through a library. Unfortunately, the one
thing it cannot do is allow windows to be created as Unicode – so direct
Unicode input is not possible.
Unicode input in Windows NT4, 2000, XP, Vista and 7
You can create a window as Unicode (even in a generally ANSI
application) by ensuring that you use the following functions instead of the
‘A’ equivalents:
- RegisterClassW
- CreateWindowW (or CreateWindowExW)
- GetMessageW
- PeekMessageW
- DispatchMessageW
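As a rough illustration of the list above, here is a minimal C sketch of a window that is Unicode from registration through to the message loop. The names DemoWndProc, RunUnicodeWindow and the class name are hypothetical and not from any framework. The whole block is wrapped in an _WIN32 guard, so it only does anything when built for Windows.

```c
#ifdef _WIN32   /* Win32-only sketch; compiles to nothing on other platforms */
#include <windows.h>

/* Hypothetical window procedure. Because the class is registered with
   RegisterClassW, WM_CHAR arrives here carrying UTF-16 code units. */
static LRESULT CALLBACK DemoWndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    switch (msg) {
    case WM_CHAR: {
        WCHAR unit = (WCHAR)wp;  /* one UTF-16 code unit */
        (void)unit;              /* an application would insert it into its text here */
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcW(hwnd, msg, wp, lp);
}

/* Registers a Unicode window class, creates one window and pumps messages
   with the W functions. Returns the WM_QUIT exit code, or -1 on failure. */
int RunUnicodeWindow(HINSTANCE inst)
{
    WNDCLASSW wc = {0};
    MSG msg;
    BOOL got;
    HWND hwnd;

    wc.lpfnWndProc   = DemoWndProc;
    wc.hInstance     = inst;
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
    wc.lpszClassName = L"DemoUnicodeWindow";
    if (!RegisterClassW(&wc))
        return -1;

    hwnd = CreateWindowExW(0, wc.lpszClassName, L"Unicode window",
                           WS_OVERLAPPEDWINDOW | WS_VISIBLE,
                           CW_USEDEFAULT, CW_USEDEFAULT, 400, 300,
                           NULL, NULL, inst, NULL);
    if (!hwnd)
        return -1;
    /* At this point IsWindowUnicode(hwnd) is expected to report TRUE. */

    while ((got = GetMessageW(&msg, NULL, 0, 0)) != 0) {
        if (got == -1)
            return -1;           /* GetMessageW reports errors as -1 */
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    return (int)msg.wParam;
}
#endif
```

In a mostly ANSI application the other windows keep their A functions; only the window that needs Unicode input has to go through this path.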
The following example of a message loop (modified slightly
from Borland Delphi’s forms.pas) may be useful in helping you to support having
specific windows as Unicode. In this example, we had to modify the class
library in a minimal fashion in order to properly support Unicode windows. The
important changes are marked with comments:

  function TApplication.ProcessMessage(var Msg: TMsg): Boolean;
  var
    FIsUnicode, Handled: Boolean;                  // FIsUnicode added
  begin
    Result := False;
    if PeekMessage(Msg, 0, 0, 0, PM_NOREMOVE) then // this was just PM_REMOVE prev.
    begin
      FIsUnicode := IsWindowUnicode(Msg.HWnd);     // added
      if FIsUnicode                                // added
        then PeekMessageW(Msg, 0, 0, 0, PM_REMOVE)
        else PeekMessage(Msg, 0, 0, 0, PM_REMOVE);
      Result := True;
      if Msg.Message <> WM_QUIT then
      begin
        Handled := False;
        if Assigned(FOnMessage) then FOnMessage(Msg, Handled);
        if not IsHintMsg(Msg) and not Handled and not IsMDIMsg(Msg) and
          not IsKeyMsg(Msg) and not IsDlgMsg(Msg) then
        begin
          TranslateMessage(Msg);
          if FIsUnicode                            // added
            then DispatchMessageW(Msg)
            else DispatchMessage(Msg);
        end;
      end
      else
        FTerminate := True;
    end;
  end;
Another useful trick is to force a window to Unicode by
resetting its window proc after the window has been created:
  if FAllowUnicodeInput then
    SetWindowLongW(Handle, GWL_WNDPROC, GetWindowLong(Handle, GWL_WNDPROC))
  else
    SetWindowLongA(Handle, GWL_WNDPROC, GetWindowLong(Handle, GWL_WNDPROC));
Unicode input using WM_UNICHAR
If you cannot modify your framework, then you may not be able to create your windows as Unicode windows, or receive Unicode WM_CHAR messages.
Fortunately, there is an alternative: WM_UNICHAR (0x0109).
Microsoft created the WM_UNICHAR message as a standard alternative for Unicode
character input, especially for Keyman. The WM_UNICHAR message is easy to
handle (again, from Delphi):
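  // TMessage is used rather than TWMChar: TWMChar.CharCode is a 16-bit Word
  // and would truncate UTF-32 values above $FFFF.
  procedure TSomeWindow.WMUniChar(var Message: TMessage);
  begin
    if FAllowUnicodeInput then
    begin
      if Message.WParam = UNICODE_NOCHAR then
        Message.Result := 1
      else
        InsertUTF32Character(Message.WParam);
    end
    else
      Message.Result := 0;
  end;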
The WM_UNICHAR message is first sent to a window with UNICODE_NOCHAR (0xFFFF)
as its character code, in order to determine whether the window supports it.
If the message is not handled, DefWindowProc will return 0 (on all versions of
Windows), indicating that WM_UNICHAR is not supported. You must return 1 from
your window procedure for this probe in order to receive input via WM_UNICHAR.
After you return 1, Keyman will post subsequent characters via WM_UNICHAR
instead of WM_CHAR. Note that WM_UNICHAR carries UTF-32 code points, not the
UTF-16 code units that most Windows applications use.
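The probe-then-character sequence described above can be sketched in plain C, independent of any framework. This is only an illustration: DemoBuffer and demo_on_unichar are hypothetical names, and the two constants are repeated here just so the sketch compiles without windows.h. In a real window procedure the return value of demo_on_unichar would be returned for WM_UNICHAR.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as given in this post; defined here only so the sketch
   compiles without <windows.h>. */
#ifndef WM_UNICHAR
#define WM_UNICHAR     0x0109
#endif
#ifndef UNICODE_NOCHAR
#define UNICODE_NOCHAR 0xFFFF
#endif

/* Hypothetical text buffer that stores UTF-32 code points. */
typedef struct {
    uint32_t text[64];
    size_t   length;
} DemoBuffer;

/* Returns the value a window procedure should return for WM_UNICHAR.
   wparam is either the UNICODE_NOCHAR probe or a UTF-32 code point. */
long demo_on_unichar(DemoBuffer *buf, int allow_unicode, uint32_t wparam)
{
    if (!allow_unicode)
        return 0;                /* same answer DefWindowProc gives: not supported */
    if (wparam == UNICODE_NOCHAR)
        return 1;                /* probe: announce WM_UNICHAR support */
    if (buf->length < sizeof buf->text / sizeof buf->text[0])
        buf->text[buf->length++] = wparam;  /* store the code point */
    return 0;                    /* character message processed */
}
```

Note that the probe itself never adds a character; only the messages that follow it carry text.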
The following C macros can be helpful in converting between
UTF-16 surrogate pairs and UTF-32 (see www.unicode.org
for more detail on UTF-16 surrogate pairs, and the supplementary planes):
  #define Uni_IsSurrogate1(ch) ((ch) >= 0xD800 && (ch) <= 0xDBFF)
  #define Uni_IsSurrogate2(ch) ((ch) >= 0xDC00 && (ch) <= 0xDFFF)
  #define Uni_SurrogateToUTF32(ch, cl) (((ch) - 0xD800) * 0x400 + ((cl) - 0xDC00) + 0x10000)
  #define Uni_UTF32ToSurrogate1(ch) (((ch) - 0x10000) / 0x400 + 0xD800)
  #define Uni_UTF32ToSurrogate2(ch) (((ch) - 0x10000) % 0x400 + 0xDC00)
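One way these macros might be used is sketched below. The macros are copied from this post so that the block is self-contained; the two wrapper functions, demo_utf32_to_utf16 and demo_utf16_to_utf32, are hypothetical helpers written for this illustration. They assume the input is a valid code point and do no further validation.

```c
#include <stdint.h>

/* Macros copied from the post so the sketch stands alone. */
#define Uni_IsSurrogate1(ch) ((ch) >= 0xD800 && (ch) <= 0xDBFF)
#define Uni_IsSurrogate2(ch) ((ch) >= 0xDC00 && (ch) <= 0xDFFF)
#define Uni_SurrogateToUTF32(ch, cl) (((ch) - 0xD800) * 0x400 + ((cl) - 0xDC00) + 0x10000)
#define Uni_UTF32ToSurrogate1(ch) (((ch) - 0x10000) / 0x400 + 0xD800)
#define Uni_UTF32ToSurrogate2(ch) (((ch) - 0x10000) % 0x400 + 0xDC00)

/* Encodes one UTF-32 code point as UTF-16.
   Returns the number of code units written to out (1 or 2). */
int demo_utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;   /* BMP character: one unit, no surrogates */
        return 1;
    }
    out[0] = (uint16_t)Uni_UTF32ToSurrogate1(cp);  /* high surrogate */
    out[1] = (uint16_t)Uni_UTF32ToSurrogate2(cp);  /* low surrogate */
    return 2;
}

/* Decodes one code point from a UTF-16 sequence of count units.
   *used receives the number of units consumed (1 or 2).
   An unpaired surrogate is passed through unchanged. */
uint32_t demo_utf16_to_utf32(const uint16_t *in, int count, int *used)
{
    if (count >= 2 && Uni_IsSurrogate1(in[0]) && Uni_IsSurrogate2(in[1])) {
        *used = 2;
        return (uint32_t)Uni_SurrogateToUTF32((uint32_t)in[0], (uint32_t)in[1]);
    }
    *used = 1;
    return in[0];
}
```

For example, U+1F600 encodes to the pair 0xD83D 0xDE00, and decoding that pair gives back 0x1F600. A WM_UNICHAR handler in a UTF-16 application could use the first function before inserting text, while code that forwards UTF-16 input as UTF-32 would use the second.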
Which approach should I use?
The WM_UNICHAR message is supported by Keyman Engine on all Windows
platforms, so as a simple solution you could support only that. However, that
would mean that the newer keyboards included with Windows XP, Vista and
Windows 7 (e.g. some Asian language keyboards) would not work in your
application, because Windows delivers their input through WM_CHAR rather than
WM_UNICHAR. This decision really depends on your user base and the level of
complexity involved in a complete solution.