[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Steve Lhomme slhomme at matroska.org
Wed Oct 14 20:09:12 CEST 2015

2015-10-12 19:10 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
> On Sun, Oct 11, 2015 at 12:43 AM, Steve Lhomme <slhomme at matroska.org> wrote:
>> 2015-10-05 18:47 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>> > How should an EBMLMaxSizeLength > 8 be handled if it occurs after the
>> > element that needs it (specific edge case: DocType has a size length of
>> > 9, but DocType occurs before EBMLMaxSizeLength in the header; how should
>> > that be handled?) (alternate edge case: a Void element occurring in (or
>> > before) an EBML element, with a size length > 8, and occurring before
>> > EBMLMaxSizeLength). Should the spec explicitly require parsers to parse
>> > as if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
>> > Do the limitations of EBMLMaxSizeLength apply to the document
>> > immediately?
>> The values in the EBML Header describe what the EBML parser will need
>> to parse the EBML Stream. On the other hand it should always be safe
>> to read the EBML Header even if your parser cannot handle the Stream
>> due to internal limitations. So we may define in the EBML specs that,
>> for the EBML Header, the ID Length must not be longer than 4 and the
>> Size Length may never be more than 8, maybe even 4 (I'd favor 4).
> That would be great if that could be mentioned in the EBML spec (and I'd
> favor 4 as well).

I updated the (Matroska) specs to define this.

>> > Shouldn't EBMLMaxIDLength have a range of > 3 (given that the EBML
>> > element has an ID length of 4)?
>> Not necessarily, as small EBML Doctypes may not need that much and
>> favor saving container space. As said above, the values in the EBML
>> Header describe the Doctype, the EBML Stream. Not the EBML Header
>> itself. We should definitely clarify that in the specs.
> This too would be great to have in the specs.

I split the table of the EBML header and Matroska doctype definition
in the Matroska specs, explaining that the EBML ID and Size in the
header should each be at most 4 octets long.
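
As an illustration of the header rule above, here is a minimal sketch of
a size decoder that refuses anything longer than a given number of
octets. The names read_vint_size and max_len are hypothetical (they are
not taken from libebml or from the spec), and the default of 4 simply
mirrors the limit described for the EBML Header.

```python
def read_vint_size(buf, pos=0, max_len=4):
    """Decode an EBML size VINT starting at buf[pos].

    Returns (value, length_in_octets). max_len is the largest length the
    caller accepts; 4 is the assumed limit for the EBML Header.
    """
    first = buf[pos]
    if first == 0:
        # No marker bit in the first octet: the length would be 9 or more.
        raise ValueError("size length exceeds 8 octets")
    # Length = number of leading zero bits in the first octet, plus one.
    length = 9 - first.bit_length()
    if length > max_len:
        raise ValueError("size length %d exceeds limit %d" % (length, max_len))
    if pos + length > len(buf):
        raise ValueError("truncated size")
    # Drop the marker bit, then append the remaining octets big-endian.
    value = first & (0xFF >> length)
    for octet in buf[pos + 1:pos + length]:
        value = (value << 8) | octet
    return value, length
```

A parser could call this with max_len=4 while reading the EBML Header
and switch to the EBMLMaxSizeLength value it found there once it starts
on the rest of the stream.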

>> > Typo in the EBML spec in the Length definition for the Binary data type:
>> > “A Master-element” should be “A Byte Element”
>> Which document? I could not find this.
> The spec at:
> https://github.com/Matroska-Org/ebml-specification/blob/master/specification.markdown#ebml-element-types
> It's in the portion of the EBML spec that defines the Length for the
> "Element Data Type: Binary".
>> > The EBML spec says that the Reserved ID (all bits set to 1) is the only
>> > ID that may change the Length Descriptor (the count of leading zeroes +
>> > 1). What exactly does it mean to "change the Length Descriptor?" Does
>> > this mean a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4)
>> > if the ID is the Reserved ID?
>> I think it doesn't make sense as it is. I think it refers to the fact
>> that IDs should always be coded in their lowest form. But when all set
>> to 1 or 0, there's no default size.
> I personally don't think it makes sense to consider IDs with all VINT_DATA
> bits set to 1 as being the same ID. I consider them all as distinct IDs
> (which makes sense if you think of reading them like an unsigned variable
> sized integer; they're IDs that represent the integer values 0, 127, 16383,
> 2097151, and 268435455). I read the EBML spec reserving these IDs as a means
> of forward compatibility, reserving 5 IDs* for future revisions to the spec.
> Should future revisions to EBML require new elements, it can safely draw
> from these 5 IDs.
> If that's not the purpose of these reserved IDs, then I don't think it makes
> sense to reserve them in the first place (and documents should be free to
> use them, especially the very valuable Class A elements).
> *The IDs (including the VINT_WIDTH and VINT_MARKER bits):
> 0x80
> 0xff
> 0x7fff
> 0x3fffff
> 0x1fffffff
> (etc. for longer IDs)

Originally the idea was that the code to parse the length and the ID
should be the same, to reduce code (& complexity). But in the end even
libebml just compares IDs in their lowest form, and doing otherwise
doesn't make much sense format-wise.
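
To make the reading of the reserved IDs quoted above concrete, here is a
small sketch that splits an encoded ID (marker bit included) into its
length and its VINT_DATA value. The function name split_id is
hypothetical, and the sketch only reproduces the arithmetic from the
quoted list; it does not reflect how libebml handles IDs.

```python
def split_id(encoded):
    """Split an encoded EBML ID, given as an int, into (length, data).

    For an ID of `length` octets the marker bit sits at bit 7 * length,
    so the position of the highest set bit gives the length directly.
    """
    marker_bit = encoded.bit_length() - 1
    if marker_bit <= 0 or marker_bit % 7 != 0:
        raise ValueError("not a well-formed encoded ID: %#x" % encoded)
    length = marker_bit // 7
    data = encoded ^ (1 << marker_bit)  # clear the marker bit
    return length, data


# The five IDs from the quoted list, as (length, data) pairs.
quoted_ids = [0x80, 0xFF, 0x7FFF, 0x3FFFFF, 0x1FFFFFFF]
decoded = [split_id(i) for i in quoted_ids]
```

With these inputs, decoded holds the lengths 1, 1, 2, 3, 4 and the data
values 0, 127, 16383, 2097151 and 268435455, matching the integers
listed in the quoted message.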

Steve Lhomme
Matroska association Chairman
