UTF-8 conversion issues on legacy Windows

Short post: On Windows, UTF-16 was the dominant locale, and UTF-8 was something only to convert to and from. (Microsoft jumped the gun before Unicode expanded the address space.) While it got better (Windows 10 can use UTF-8 as an MBCS locale with ANSI APIs), it was historically a lot worse.

For converting, you’d use the MultiByteToWideChar and its opposite WideCharToMultiByte. On legacy Windows, they have slightly confusing semantics. Specifically, with flags. While Vista on introduced many flags that can be used with the UTF-8 codepage (to deal with the quirks of conversion, like invalid characters), previously only MB_ERR_INVALID_CHARS was allowed, and only if you were running XP or 2000 SP4. Before that, you can’t have any flags if you’re converting to or from UTF-16 and UTF-8. It’s unfortunately a little dangerous, but that’s the rub.

Leave a Reply

Your email address will not be published. Required fields are marked *