[Matroska-users] semantics specify encoded values or unencoded values?

Matthew Heaney matthewjheaney at hotmail.com
Fri Jan 22 04:55:11 CET 2010

The specification here describes the byte stream of the EBML header and its
children (as well as other elements):


This describes the raw bytes in the stream, in their encoded form (in which
the leading bits of the first byte seen in the stream indicate how many bytes
the integer ID value occupies).
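
To make the encoded/unencoded distinction concrete, here is a minimal Python
sketch of how I understand the length marker to work: the number of leading
zero bits in the first byte, plus one, gives the total length in bytes, and
clearing the marker bit leaves the unencoded value.  The names vint_length
and decode_vint are my own, not taken from the specification or any library.

```python
# Hypothetical helpers; the names are illustrative, not from the spec.

def vint_length(first_byte: int) -> int:
    """Total encoded length in bytes (1-8), read from the first byte.

    The marker is the highest set bit: 1xxxxxxx -> 1 byte,
    01xxxxxx -> 2 bytes, 001xxxxx -> 3 bytes, and so on.
    """
    if not 0 < first_byte <= 0xFF:
        raise ValueError("first byte must be 0x01..0xFF (0x00 carries no marker)")
    return 9 - first_byte.bit_length()


def decode_vint(data: bytes) -> tuple:
    """Return (length, unencoded value) for the integer at the start of data."""
    length = vint_length(data[0])
    if len(data) < length:
        raise ValueError("truncated input")
    raw = int.from_bytes(data[:length], "big")
    # The marker sits 7 * length bits above the least significant bit.
    value = raw & ((1 << (7 * length)) - 1)
    return length, value


length, value = decode_vint(bytes([0x42, 0x86]))
print(length, hex(value))  # prints "2 0x286"
```

This only decodes; it does not check whether the encoding it was given is
the shortest one possible.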

Does the Matroska specification require that the byte stream exactly match
what is listed?  Or does the specification allow that EBML tags might be
encoded using more than the minimum number of bytes?

For example, the EBML Version tag is described as [42][86].  This is the
minimal encoding of the unencoded value 0x0286 (the encoded value 0x4286 with
its length-marker bit removed).  However, the same unencoded value can have
other encodings, for example 0x200286 (three bytes) or 0x10000286 (four
bytes).  These aren't minimal, of course, but what does the standard require?
Are predefined tags required to have the minimum-length encoding (as is
listed on the web page cited above) in the byte stream?

To use a concrete example, are the byte stream sequences [20][02][86] and
[10][00][02][86] valid tags for the EBML Version?
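
For reference, this is a small Python sketch of how wider encodings of the
same unencoded value can be produced, assuming the usual scheme in which the
marker bit moves one position to the right in the first byte for each extra
byte (0x40 for two bytes, 0x20 for three, 0x10 for four).  The function name
encode_vint is my own.  The sketch only shows that such byte sequences are
well-formed variable-length integers; whether the specification accepts them
as the EBML Version tag is exactly what I am asking.

```python
# Hypothetical helper; the name is illustrative, not from the spec.

def encode_vint(value: int, length: int) -> bytes:
    """Encode value in exactly `length` bytes (1-8), marker bit included."""
    if not 1 <= length <= 8:
        raise ValueError("length must be between 1 and 8")
    if not 0 <= value < (1 << (7 * length)):
        raise ValueError("value does not fit in the requested length")
    # Set the marker bit, which sits 7 * length bits above the LSB.
    return ((1 << (7 * length)) | value).to_bytes(length, "big")


for width in (2, 3, 4):
    print(width, encode_vint(0x0286, width).hex())
# prints:
# 2 4286
# 3 200286
# 4 10000286
```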

