[Matroska-devel] mmg.exe 2 Bugs Related to Charset

Moritz Bunkus moritz at bunkus.org
Wed Dec 29 11:29:57 CET 2004


I'm not talking about _job_ files. For those I simply use wxWidgets' own
functions. This will stay as it is as long as I use wxWidgets without
Unicode support.

What I am talking about is the files created directly for the
communication with mkvmerge itself when it is started for muxing. These
files are in fact in UTF-8. You just never see them because they're
deleted right after muxing ;)

All this i18n stuff is giving me several headaches. I don't think I will
get it right in 1.2.0. But after that I think I will make a drastic
change and convert mmg (and mkvmerge too, I fear) to use Unicode
internally. This will make Windows 9x users unhappy I guess, but there's
always Unicows, isn't there?

On Linux this move will also have some drawbacks: gcc 2.95's C++
implementation has a serious bug in the wstring class implementation
which makes it impossible to use. Well, gcc 2.95 is old, but I know a
lot of people who still use it.

> Actually, there is a psychological reason too:
> A string in a W. European language is roughly readable even if
> it is in UTF-8 but is interpreted as WinLatin;
> while a Japanese-language string is completely foobared if it is 
> in UTF-8 but is interpreted as WinCP (SHIFT_JIS).
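
The effect described in the quote can be reproduced with a small sketch.
It is only an illustration, not anything mmg or mkvmerge does: the helper
name misread() and the sample strings are made up for this example, and
Python's "cp1252" and "shift_jis" codecs stand in for WinLatin and the
Japanese Windows code page.

```python
def misread(text: str, wrong_codec: str) -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    # errors="replace" substitutes U+FFFD for byte sequences that are
    # invalid in the wrong codec instead of raising an exception.
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


# Hypothetical sample strings, chosen only for illustration.
western = "Übergröße"
japanese = "日本語"

# ASCII letters are single bytes in UTF-8 and map to themselves in
# cp1252, so only the accented letters turn into two-character garbage.
print(misread(western, "cp1252"))

# Every character here is three bytes in UTF-8; Shift_JIS pairs those
# bytes up differently, so none of the original characters survive.
print(misread(japanese, "shift_jis"))
```

In the Western European case the unaccented letters come through
unchanged, which is why the text stays guessable. In the Japanese case
there is no ASCII skeleton left to read.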

I could write those files in UTF-16, of course, but again that would
require some drastic changes. None of my routines are able to handle
wide char strings -- they all assume that a string ends once a 0 byte is
encountered.
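
To illustrate the 0 byte problem, here is a minimal sketch. It does not
use any mkvmerge code; c_string_view() is a hypothetical helper that
relies on ctypes to show what a routine expecting a NUL-terminated
string would see of a given buffer.

```python
import ctypes


def c_string_view(raw: bytes) -> bytes:
    """Return the part of raw that a NUL-terminated C routine would see."""
    # create_string_buffer copies the bytes into a char array; .value
    # reads that array only up to the first 0 byte, like strlen() would.
    return ctypes.create_string_buffer(raw).value


text = "mkv"
utf8 = text.encode("utf-8")        # no 0 bytes for this text
utf16 = text.encode("utf-16-le")   # every ASCII character gets a 0 high byte

print(c_string_view(utf8))
print(c_string_view(utf16))
```

With UTF-8 the whole string is visible to such a routine, while with
UTF-16 it stops after the first character, because the second byte of
that character is already 0.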


If Darl McBride was in charge, he'd probably make marriage
unconstitutional too, since clearly it de-emphasizes the commercial
nature of normal human interaction, and probably is a major impediment
to the commercial growth of prostitution. - Linus Torvalds
