Is there any technical documentation on the internal logic of the MS IME for Japanese?

MSDN_userSince1994_newaccount2023
2025-11-06T12:01:18.1633333+00:00

As I am still struggling to improve application compatibility for Japanese keyboard users, I find that the widely posted end-user guides on "how to type Japanese" are not precise enough to let me programmatically track the state of the Japanese IME (similar to tracking the three keyboard LEDs on Western keyboard layouts). What I really need to understand is the logic model behind the IME's keyboard interpretation (which is not the same as the unexplained code in the 25-year-old keyboard layout sample).

What logic states can the IME be in, and how many are there? I realize that there is Romaji (full-width and half-width variants), Katakana (full-width and half-width) and Hiragana (full-width). There is also a traditional Caps Lock state bit for Romaji. On top of these, there seems to be some bit telling the IME whether entered characters are subject to further conversion via the conversion keys. But are there additional states? Is there a half-width Hiragana state? Are there Caps Lock states for Katakana or Hiragana? Do the Romaji modes remember the latest non-Romaji mode? Do the non-Romaji modes remember the Romaji Caps Lock bit?

The behavior also seems to differ between program types: classic Notepad and the Visual Studio text editor make the IME respond differently to mode-switching keys in unexplained ways, and other program families may differ even further without obviously special-casing Japanese.

Clarification: I am looking for an abstract model against which APIs (including keyboard APIs like SendInput and WM_KEYDOWN) can be reasoned about logically, not a disclosure of actual Microsoft internal code. Thus it doesn't matter whether a logical state bit is stored in the "MS IME" TSF provider, the IME API compatibility layer, the "MS IME" paired keyboard DLL, or any part of the general keyboard code in USER32.DLL/win32k.sys, etc.; what matters is only that the overall resulting behavior can be described "as if" there is a bit or boolean "somewhere unspecified" that does X and responds to keypress Y in some particular way. This is the same level of abstraction at which the US PC keyboard was documented in the 1981 IBM PC manuals, as updated by the 1986 IBM PC AT manuals and further by the 1995 Microsoft specification for the three Windows keys: an abstract model that hasn't changed much since then, despite massive changes in the Windows code architecture implementing it.

  Tom Tran (WICLOUD CORPORATION), Microsoft External Staff Moderator
    2025-11-07T09:44:15.5733333+00:00

    Hi @MSDN_userSince1994_newaccount2023 ,

    Thanks for your detailed questions!

    After digging around, I agree with you that most "how to type Japanese" guides don't expose enough detail. Here's what I managed to find regarding your questions:


    Is there official documentation for the IME’s internal logic?

    As far as I can tell, no. Microsoft does not publish the internal IME logic model; the official documentation focuses on the APIs. This is probably by design: IME state is exposed through documented flags and compartments rather than as a public state machine.

    There is a related discussion, last updated last year, that you may want to check if it relates to your current approach:

    Disclaimer: This link is not official Microsoft documentation, but it is a trusted source of information. Do not click or download any suspicious links you might encounter.


    What and how many logic states can the IME be in? Are there additional states?

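    According to IME Conversion Mode Values, the conversion-mode flags cover the following (the character modes are combinations of IME_CMODE_NATIVE, IME_CMODE_KATAKANA and IME_CMODE_FULLSHAPE rather than one flag each):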

    • Hiragana (full-width)
    • Katakana (full-width and half-width)
    • Alphanumeric (full-width and half-width)
    • Romaji input (IME_CMODE_ROMAN)
    • Full-shape vs half-shape (IME_CMODE_FULLSHAPE)
    • Conversion on/off (IME_CMODE_NOCONVERSION)
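    To reason about the documented modes, it can help to decode the conversion word directly. Below is a minimal C sketch, assuming the bit values from imm.h (copied here so it compiles without the Windows SDK; the TSF TF_CONVERSIONMODE_* flags use the same values). The enum and function names are hypothetical, and the mapping covers only the five documented character modes:

```c
#include <stdint.h>

/* Conversion-mode bits with the values defined in imm.h.
   Defined here only if <imm.h> has not already been included. */
#ifndef IME_CMODE_NATIVE
#define IME_CMODE_ALPHANUMERIC 0x0000
#define IME_CMODE_NATIVE       0x0001
#define IME_CMODE_KATAKANA     0x0002
#define IME_CMODE_FULLSHAPE    0x0008
#define IME_CMODE_ROMAN        0x0010
#define IME_CMODE_NOCONVERSION 0x0100
#endif

/* Hypothetical names for the character modes of a Japanese IME. */
typedef enum {
    JP_HALF_ALPHANUMERIC,
    JP_FULL_ALPHANUMERIC,
    JP_HIRAGANA,
    JP_FULL_KATAKANA,
    JP_HALF_KATAKANA,
    JP_UNLISTED /* NATIVE without KATAKANA or FULLSHAPE: no documented mode */
} jp_char_mode;

/* Map a conversion-mode word to one of the documented character modes. */
jp_char_mode jp_mode_from_conversion(uint32_t conv)
{
    int full = (conv & IME_CMODE_FULLSHAPE) != 0;

    if (!(conv & IME_CMODE_NATIVE))      /* alphanumeric modes */
        return full ? JP_FULL_ALPHANUMERIC : JP_HALF_ALPHANUMERIC;
    if (conv & IME_CMODE_KATAKANA)       /* KATAKANA only matters with NATIVE */
        return full ? JP_FULL_KATAKANA : JP_HALF_KATAKANA;
    return full ? JP_HIRAGANA : JP_UNLISTED;
}

/* ROMAN is orthogonal to the character mode: it selects romaji input. */
int jp_romaji_input(uint32_t conv)
{
    return (conv & IME_CMODE_ROMAN) != 0;
}
```

    On Windows, the input value would come from ImmGetConversionStatus(himc, &conversion, &sentence) on the HIMC returned by ImmGetContext(hwnd), released afterwards with ImmReleaseContext. A word with IME_CMODE_NATIVE set but neither IME_CMODE_KATAKANA nor IME_CMODE_FULLSHAPE is reported as JP_UNLISTED, since no half-width Hiragana mode is documented.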

    Is there half-width Hiragana?

    No. The official IME modes list does not include half-width Hiragana. Supported modes are Hiragana (full-width), Katakana (full/half-width), and Alphanumeric (full/half-width). This matches Unicode, which defines half-width forms for Katakana but not for Hiragana.

    Reference: Japanese IME Guide.
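    If you verify modes empirically (for example by sending keys with SendInput and inspecting the committed text), classifying the resulting code points also makes the missing half-width Hiragana visible. A small sketch using the Unicode block ranges; the helper names are hypothetical:

```c
#include <stdint.h>

/* Hiragana block: U+3040..U+309F. */
int is_hiragana(uint32_t cp)
{
    return cp >= 0x3040 && cp <= 0x309F;
}

/* Katakana block (full-width forms): U+30A0..U+30FF. */
int is_fullwidth_katakana(uint32_t cp)
{
    return cp >= 0x30A0 && cp <= 0x30FF;
}

/* Half-width Katakana (letters, middle dot and sound marks) in the
   Halfwidth and Fullwidth Forms block: U+FF65..U+FF9F.
   Unicode has no corresponding half-width Hiragana range. */
int is_halfwidth_katakana(uint32_t cp)
{
    return cp >= 0xFF65 && cp <= 0xFF9F;
}
```

    For example, U+3042 (あ) is Hiragana, U+30A2 (ア) is full-width Katakana, and U+FF71 (ｱ) is its half-width counterpart.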


    Are there caps lock states for Katakana or Hiragana?

    No. Caps Lock is a keyboard state, not an IME state. The IME does not have a separate “Caps Lock” mode for Katakana or Hiragana.

    The only related IME flag is IME_CMODE_ROMAN (or TF_CONVERSIONMODE_ROMAN in TSF), which selects romaji input (kana typed by their Latin spelling) instead of direct kana-key input. This is independent of the physical Caps Lock key.

    IME Conversion Mode Values lists all conversion-mode flags. There is no Caps Lock flag, which indicates that Caps Lock is not part of the IME state.

    Flags for Conversion Mode (TSF) mirrors the IMM32 flags and also lacks any Caps Lock state.
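    Because Caps Lock is not part of the conversion mode, a tracker that wants the full picture has to sample it separately and compare observations. A minimal sketch of that idea follows; the struct, enum and function names are hypothetical, and the comments name the Win32 calls (GetKeyState, ImmGetOpenStatus, ImmGetConversionStatus) that would fill each field on Windows, which the sketch itself does not call:

```c
#include <stdint.h>

/* Hypothetical snapshot of the state sampled after each keystroke.
   Caps Lock is kept apart from the conversion word because it is a
   keyboard toggle, not an IME flag. */
typedef struct {
    int      ime_open;   /* ImmGetOpenStatus(himc) */
    uint32_t conversion; /* ImmGetConversionStatus(himc, &conversion, &sentence) */
    uint32_t sentence;   /* sentence mode from the same call */
    int      caps_lock;  /* low-order bit of GetKeyState(VK_CAPITAL) */
} ime_snapshot;

/* Bits reporting which parts of the snapshot changed. */
enum {
    CHANGED_OPEN       = 1 << 0,
    CHANGED_CONVERSION = 1 << 1,
    CHANGED_SENTENCE   = 1 << 2,
    CHANGED_CAPS       = 1 << 3
};

/* Compare two snapshots; boolean fields are normalized before comparing. */
unsigned ime_snapshot_diff(const ime_snapshot *before, const ime_snapshot *after)
{
    unsigned changed = 0;

    if (!before->ime_open != !after->ime_open)   changed |= CHANGED_OPEN;
    if (before->conversion != after->conversion) changed |= CHANGED_CONVERSION;
    if (before->sentence != after->sentence)     changed |= CHANGED_SENTENCE;
    if (!before->caps_lock != !after->caps_lock) changed |= CHANGED_CAPS;
    return changed;
}
```

    Recording such a diff after each injected key is one way to build the "as if" model empirically instead of relying on undocumented internals.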


    Do the Romaji modes remember the latest non-Romaji mode? Do the non-Romaji modes remember the Romaji caps lock bit?

    From my understanding, there is no global memory for these modes. IME state is stored per input context:

    • In IMM32, the state is tied to the HIMC (input context) for the window.
    • In TSF, the state is stored in compartments like GUID_COMPARTMENT_KEYBOARD_INPUTMODE_CONVERSION, which belong to the thread manager or document.

    TSF Compartments explains that IME state is stored in compartments tied to a thread or document, not globally.

    ImmGetConversionStatus shows that IMM32 retrieves the conversion state for a specific input context, which indicates that the state is context-based.
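    Since the state is per input context, a tracker should probably cache observations per context rather than in one global variable. The following is a hypothetical sketch; the opaque key stands in for an HIMC (IMM32) or a TSF context, and the fixed-size table is a simplification:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CONTEXTS 16

/* One cached observation; key 0 marks an empty slot. */
typedef struct {
    uintptr_t key;        /* stands in for an HIMC or TSF context */
    uint32_t  conversion; /* last observed conversion mode */
} ctx_entry;

typedef struct {
    ctx_entry slots[MAX_CONTEXTS];
} ctx_cache;

/* Store the last conversion mode seen for a context.
   Returns 0 if the key is invalid or the table is full. */
int ctx_cache_put(ctx_cache *c, uintptr_t key, uint32_t conversion)
{
    ctx_entry *free_slot = NULL;

    if (key == 0)
        return 0;
    for (size_t i = 0; i < MAX_CONTEXTS; i++) {
        if (c->slots[i].key == key) {
            c->slots[i].conversion = conversion;
            return 1;
        }
        if (c->slots[i].key == 0 && free_slot == NULL)
            free_slot = &c->slots[i];
    }
    if (free_slot == NULL)
        return 0;
    free_slot->key = key;
    free_slot->conversion = conversion;
    return 1;
}

/* Look up a context; returns 0 if it was never observed. */
int ctx_cache_get(const ctx_cache *c, uintptr_t key, uint32_t *conversion)
{
    if (key == 0)
        return 0;
    for (size_t i = 0; i < MAX_CONTEXTS; i++) {
        if (c->slots[i].key == key) {
            *conversion = c->slots[i].conversion;
            return 1;
        }
    }
    return 0;
}
```

    An unknown context returns 0 rather than a default mode, so the tracker has to query the IME for it instead of assuming a value carried over from another window.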

    This means:

    • If you switch between Romaji and Kana in the same text context, the IME may restore the previous mode, but this is not guaranteed across apps.
    • The Romaji flag and Caps Lock are unrelated. The IME does not store a “Caps Lock bit” for Romaji mode.

    Is there a flag for “subject to conversion”?

    Yes. IME_CMODE_NOCONVERSION (IMM32) and TF_CONVERSIONMODE_NOCONVERSION (TSF) indicate direct input without conversion. Combine this with the IME open/close state (ImmGetOpenStatus in IMM32) to determine whether typed characters will be converted.
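    The rule above can be written as a small predicate. This is a sketch of that rule only, assuming `ime_open` comes from ImmGetOpenStatus (or the TSF GUID_COMPARTMENT_KEYBOARD_OPENCLOSE compartment); the function name is hypothetical, and the sketch does not model any further conditions the IME itself might apply:

```c
#include <stdint.h>

/* Value as defined in imm.h; defined here only if <imm.h> is absent. */
#ifndef IME_CMODE_NOCONVERSION
#define IME_CMODE_NOCONVERSION 0x0100
#endif

/* Keystrokes are treated as candidates for conversion only when the
   IME is open and the NOCONVERSION bit is clear. */
int ime_will_convert(int ime_open, uint32_t conversion)
{
    return ime_open && !(conversion & IME_CMODE_NOCONVERSION);
}
```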


    Why does behavior differ between apps like Notepad and Visual Studio?

    I believe this is because some apps use IMM32 (legacy) while others use TSF (modern). TSF tracks state per context and allows custom composition UIs, so mode-switching keys can behave differently, and the difference you see may well be expected.

    You can read more about it here: Text Services Framework Overview.


    I hope this clarifies your questions! If you still have any questions, please feel free to leave a comment. I'll be happy to help!

