[Matroska-devel] EBML data type constraints

wm4 nfxjfg at googlemail.com
Thu Jul 2 16:21:11 CEST 2015

On Thu, 02 Jul 2015 15:48:48 +0200
Jerome Martinez <jerome at mediaarea.net> wrote:

> On 02/07/2015 15:21, wm4 wrote:
> > I wasn't talking about length 0 strings, but strings with '\0' bytes in
> > them. (Another case we've talked about.)
> http://matroska.org/technical/specs/index.html
> "String - Printable ASCII (0x20 to 0x7E), zero-padded when needed"
> Maybe I missed a conversation, but isn't '\0' already forbidden inside 
> the string?
> Do you mean trailing '\0'? Trailing '\0' may be useful when we edit 
> a file without having to rewrite the whole file (e.g. removal of 1 
> character, where we don't have enough room for a padding element).

You can handle this with the same code path that needs to exist if the
user wants to add a character to a string.

This is really a non-issue. If you edit a file, you will always have to
deal with the case that the whole file has to be rewritten. And most
muxers already write a padding Void element before the start of the
actual media data (clusters) to make editing easier.
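
As a rough sketch of what filling a gap with a Void element involves
(my own illustration, not code from any muxer): the helper name
make_void is made up, and I am assuming the Void element ID 0xEC and
the usual EBML variable-length size coding.

```python
VOID_ID = 0xEC  # EBML Void element ID (assumed from the Matroska spec)


def make_void(total_size: int) -> bytes:
    """Return a Void element occupying exactly total_size bytes (hypothetical helper)."""
    if total_size < 2:
        # One ID byte plus at least one size byte: a 1-byte gap cannot be filled.
        raise ValueError("a Void element needs at least 2 bytes")
    payload = total_size - 2
    if payload <= 126:
        # 1-byte size field: marker bit 0x80 plus a 7-bit length.
        # The all-ones value 0x7F is reserved for "unknown size".
        size_field = bytes([0x80 | payload])
    else:
        # 8-byte size field: marker byte 0x01 followed by a 56-bit length.
        payload = total_size - 9
        size_field = ((0x01 << 56) | payload).to_bytes(8, "big")
    return bytes([VOID_ID]) + size_field + bytes(payload)
```

A gap of exactly 1 byte is the one case this cannot cover, which is the
situation described in the quoted text; there the element has to be
rewritten or moved, exactly as when a character is added.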

On the other hand, allowing zero-padded strings causes weirdness in
languages that use length-counted byte strings, where a zero byte does
not automatically terminate a string. IMHO this stupid detail should
just be disallowed (or deprecated).
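
To illustrate with Python, where bytes objects carry their own length:
the sketch below is only one possible reading of the String type
(strip trailing padding, reject anything outside 0x20 to 0x7E), and
decode_ebml_string is a made-up name, not an existing API.

```python
def decode_ebml_string(raw: bytes) -> str:
    """Decode an EBML String payload under the assumptions stated above."""
    # Only trailing zero bytes are treated as padding.
    payload = raw.rstrip(b"\x00")
    for byte in payload:
        # An embedded '\0' (or any other control byte) is rejected here.
        if not 0x20 <= byte <= 0x7E:
            raise ValueError(f"non-printable byte 0x{byte:02X} in string")
    return payload.decode("ascii")


# Without the explicit strip, the padding stays part of the value:
assert len(b"und\x00\x00") == 5
assert b"und\x00\x00" != b"und"
```

Every reader in such a language needs this extra step, or comparisons
against the unpadded value silently fail.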

> I would wonder more about UTF-8 type:
> "UTF-8 - Unicode string, zero padded when needed (RFC 2279)"
> it does not have the >=0x20 limitation. Maybe we should add it.
> "UTF-8 - Printable Unicode string (Unicode character value >=0x20) , 
> zero padded when needed (RFC 2279)"
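
For reference, the proposed >=0x20 rule for the UTF-8 type could be
checked roughly like this. This is a sketch of the proposal as quoted,
not of anything in the current spec, and decode_ebml_utf8 is a made-up
name.

```python
def decode_ebml_utf8(raw: bytes) -> str:
    """Decode an EBML UTF-8 payload, applying the proposed >=0x20 limit."""
    # In UTF-8 a zero byte can only encode U+0000, so stripping trailing
    # zero bytes cannot cut a multi-byte sequence in half.
    text = raw.rstrip(b"\x00").decode("utf-8")  # raises on invalid UTF-8
    for ch in text:
        if ord(ch) < 0x20:
            raise ValueError(f"control character U+{ord(ch):04X} in string")
    return text
```

Note that this rule alone would still let through DEL (0x7F) and the
C1 control range, so "printable" would need a tighter definition.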
