[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Steve Lhomme slhomme at matroska.org
Sun Oct 11 09:43:32 CEST 2015

2015-10-05 18:47 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
> On Mon, Oct 5, 2015 at 8:03 AM, Dave Rice <dave at dericed.com> wrote:
>> I'm working on the EBML specification (the one being drafted on GitHub)
>> quite a bit. What are the questions to EBML?
> Preface: some of these are weird corner cases that are extremely unlikely to
> occur for anyone doing anything sane. That said, I think parsers should
> consistently (or even gracefully) handle the insane, and in order to do that
> I think these corner cases should be clarified in the spec.
> Can a global element (e.g. Void, CRC-32) occur before an EBML element? If
> so, are they considered part of the document? (As is, it seems like an EBML
> document is implicitly defined as everything between an EBML header and the
> next EBML header (or EOF), in which case they are not considered part of the
> EBML document.)

The CRC-32 element cannot, because it has to be inside a Master Element to make sense.

The Void element could be placed between the "EBML Header" and the
"EBML Stream", for example to reserve space for later editing. It may
belong to the EBML Header if it fits inside the Header's size, or it may
sit in between the Header and the Stream. The Stream actually starts with
a "level 0" element of the Stream described by the DocType. Anything
before that which is not in the EBML Header can be discarded.
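
A minimal Python sketch of that rule, assuming the caller already knows where the EBML Header element ends (`header_end`) and which IDs are level-0 elements for the DocType (`level0_ids`, e.g. `{0x18538067}` for the Matroska Segment). `find_stream_start` is a hypothetical helper: it skips Void elements by their size and discards any other byte until a level-0 ID shows up. That byte-by-byte resync is only an illustration, not something the specs define.

```python
from typing import Optional, Set

VOID_ID = 0xEC  # Void element ID


def vint_width(first_byte: int) -> int:
    """Bytes in a VINT, from the position of the first set bit; 0 if none."""
    for width in range(1, 9):
        if first_byte & (0x80 >> (width - 1)):
            return width
    return 0  # 0x00 cannot start a VINT


def find_stream_start(buf: bytes, header_end: int, level0_ids: Set[int]) -> Optional[int]:
    """Return the offset of the first level-0 element after the Header, or None."""
    pos = header_end
    while pos < len(buf):
        width = vint_width(buf[pos])
        if width and pos + width <= len(buf):
            elem_id = int.from_bytes(buf[pos:pos + width], "big")
            if elem_id in level0_ids:
                return pos  # the EBML Stream starts here
            if elem_id == VOID_ID and pos + 1 < len(buf):
                size_width = vint_width(buf[pos + 1])
                if size_width:
                    # Strip the length marker to get the Void's data size
                    size = buf[pos + 1] & (0xFF >> size_width)
                    for b in buf[pos + 2:pos + 1 + size_width]:
                        size = (size << 8) | b
                    pos += 1 + size_width + size  # skip the whole Void
                    continue
        pos += 1  # neither a level-0 element nor a Void: discard this byte
    return None
```

Skipping the Void by its size (rather than scanning through it) matters because its payload may happen to contain bytes that look like a level-0 ID.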

> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the element
> that needs it? (Specific edge case: DocType has a size length of 9, but
> DocType occurs before EBMLMaxSizeLength in the header; how should that be
> handled?) (Alternate edge case: a Void element occurring in (or before) an
> EBML element, with a size length > 8 and occurring before
> EBMLMaxSizeLength.) Should the spec explicitly require parsers to parse as
> if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
> Do the limitations of EBMLMaxSizeLength apply to the document immediately?

The values in the EBML Header describe what the EBML parser will need
to parse the EBML Stream. On the other hand, it should always be safe
to read the EBML Header even if your parser cannot handle the Stream
due to internal limitations. So we may define in the EBML specs that,
for the EBML Header, the ID Length must not be longer than 4 and the
Size Length must never be more than 8, maybe even 4 (I'd favor 4).
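
A minimal sketch of such a Header reader, assuming the fixed limits proposed above (4 for IDs, 8 for sizes). `read_vint`, `read_header_element_head` and the two constants are hypothetical names; the point is that the limits come from constants, never from the EBMLMaxIDLength/EBMLMaxSizeLength values being read.

```python
# Hypothetical fixed limits for elements inside the EBML Header itself,
# following the proposal above; they are NOT taken from the header's values.
HEADER_MAX_ID_LENGTH = 4
HEADER_MAX_SIZE_LENGTH = 8  # Steve would favor 4 here


def read_vint(buf: bytes, pos: int, max_len: int, keep_marker: bool):
    """Decode the VINT at buf[pos] and return (value, next_pos).

    keep_marker=True keeps the length marker (Element IDs are usually
    compared that way); False strips it (Element Data Sizes).
    """
    first = buf[pos]
    width = next((w for w in range(1, 9) if first & (0x80 >> (w - 1))), 0)
    if width == 0 or width > max_len:
        raise ValueError(f"VINT longer than the allowed {max_len} bytes")
    if pos + width > len(buf):
        raise ValueError("truncated VINT")
    raw = int.from_bytes(buf[pos:pos + width], "big")
    if keep_marker:
        return raw, pos + width
    # A VINT of `width` bytes carries 7 * width value bits below the marker
    return raw & ((1 << (7 * width)) - 1), pos + width


def read_header_element_head(buf: bytes, pos: int):
    """Read the ID and data size of one element inside the EBML Header."""
    elem_id, pos = read_vint(buf, pos, HEADER_MAX_ID_LENGTH, keep_marker=True)
    size, pos = read_vint(buf, pos, HEADER_MAX_SIZE_LENGTH, keep_marker=False)
    return elem_id, size, pos
```

For example, `read_header_element_head(bytes([0x42, 0x86, 0x81, 0x01]), 0)` reads the EBMLVersion ID and its size and returns `(0x4286, 1, 3)`.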

> Shouldn't EBMLMaxIDLength have a range of > 3 (given that the EBML element
> has an ID length of 4)?

Not necessarily, as small EBML DocTypes may not need IDs that long and
may favor saving container space. As said above, the values in the EBML
Header describe the DocType, i.e. the EBML Stream, not the EBML Header
itself. We should definitely clarify that in the specs.
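
To illustrate the distinction, a hedged sketch where EBMLMaxIDLength only constrains IDs found in the EBML Stream, while the Header keeps its own fixed limit (4, as proposed earlier; an assumption, not current spec text). `id_allowed` and the 2-byte stream ID `0x4321` are hypothetical.

```python
EBML_HEADER_ID = 0x1A45DFA3  # the EBML Header's own ID is 4 bytes long
HEADER_MAX_ID_LENGTH = 4     # hypothetical fixed limit for the Header


def id_length(elem_id: int) -> int:
    """Bytes in an Element ID written with its marker bits (0x4286 -> 2)."""
    return (elem_id.bit_length() + 7) // 8


def id_allowed(elem_id: int, in_header: bool, max_id_length: int) -> bool:
    """EBMLMaxIDLength describes the EBML Stream, not the EBML Header."""
    if in_header:
        return id_length(elem_id) <= HEADER_MAX_ID_LENGTH
    return id_length(elem_id) <= max_id_length
```

With `max_id_length = 2`, the Header's own 4-byte ID is still accepted, a 2-byte Stream ID such as the hypothetical `0x4321` is accepted, and a 4-byte ID in the Stream is rejected.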

> Shouldn't EBMLMaxSizeLength have a range of > 0?

Correct, it cannot be 0, just like EBMLMaxIDLength. I edited the
Matroska specs accordingly.

> That is, if EBMLMaxSizeLength is 1, does that apply to elements in the EBML
> header immediately after it is encountered, meaning that if DocType followed
> it, it must have a length < 127?

No, the "parsing context" defined by the EBML Header cannot be used
while it's being created.
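
One way to enforce this, sketched under the assumption that the Header's children have already been read with the Header's own fixed limits: collect all values first, then build the context for the Stream. `ParsingContext` and `context_from_header` are hypothetical names; the IDs (0x4282 DocType, 0x42F2 EBMLMaxIDLength, 0x42F3 EBMLMaxSizeLength) and the defaults (4 and 8) are those of the EBML Header. It also rejects 0 for either limit, as discussed above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ParsingContext:
    """Limits that apply to the EBML Stream once the Header is fully read."""
    doc_type: str
    max_id_length: int = 4    # EBMLMaxIDLength default
    max_size_length: int = 8  # EBMLMaxSizeLength default


EBML_MAX_ID_LENGTH = 0x42F2
EBML_MAX_SIZE_LENGTH = 0x42F3
DOC_TYPE = 0x4282


def context_from_header(children: Iterable[Tuple[int, bytes]]) -> ParsingContext:
    """Build the context only after every Header child has been read.

    `children` holds (element_id, payload) pairs in file order, already read
    with the Header's fixed limits, so a DocType that appears after
    EBMLMaxSizeLength is never checked against it.
    """
    values = {}
    for elem_id, payload in children:
        if elem_id == DOC_TYPE:
            values["doc_type"] = payload.rstrip(b"\x00").decode("ascii")
        elif elem_id == EBML_MAX_ID_LENGTH:
            values["max_id_length"] = int.from_bytes(payload, "big")
        elif elem_id == EBML_MAX_SIZE_LENGTH:
            values["max_size_length"] = int.from_bytes(payload, "big")
    if "doc_type" not in values:
        raise ValueError("EBML Header has no DocType")
    ctx = ParsingContext(**values)
    if ctx.max_id_length < 1 or ctx.max_size_length < 1:
        raise ValueError("EBMLMaxIDLength and EBMLMaxSizeLength must be > 0")
    return ctx
```

With `[(0x42F3, b"\x01"), (0x4282, b"x" * 200)]`, the 200-byte DocType is accepted even though it follows an EBMLMaxSizeLength of 1; that limit of 1 only applies to the Stream.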

> Typo in the EBML spec in the Length definition for the Binary data type: “A
> Master-element” should be “A Byte Element”

Which document? I could not find this.

> The EBML spec says that the Reserved ID (all bits set to 1) is the only ID
> that may change the Length Descriptor (the count of leading zeroes + 1).
> What exactly does it mean to "change the Length Descriptor?" Does this mean
> a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4) if the ID is
> the Reserved ID?

I don't think it makes sense as written. I think it refers to the fact
that IDs should always be coded in their lowest (shortest) form. But when
all the value bits are set to 1 or 0, there's no default size.
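
A sketch of what "lowest form" could mean for an ID, assuming the length marker is kept as part of the ID bytes: the ID is in its lowest form unless its value would also fit in one byte less. The all-ones pattern at the shorter length is excluded because it is reserved, which is why 0x407F is still accepted while 0x4001 (which could be written as 0x81) is not. `is_lowest_form_id` is a hypothetical helper; reserved IDs themselves are not rejected here.

```python
def vint_width(first_byte: int) -> int:
    """Bytes in a VINT, from the position of the first set bit."""
    for width in range(1, 9):
        if first_byte & (0x80 >> (width - 1)):
            return width
    raise ValueError("0x00 cannot start an Element ID")


def is_lowest_form_id(id_bytes: bytes) -> bool:
    width = vint_width(id_bytes[0])
    if width != len(id_bytes):
        raise ValueError("length marker does not match the number of bytes")
    value = int.from_bytes(id_bytes, "big") & ((1 << (7 * width)) - 1)
    if width == 1:
        return True
    # The value fits in width-1 bytes when it is at most 2**(7*(width-1)) - 2;
    # 2**(7*(width-1)) - 1 would be all ones at that width, which is reserved.
    return value >= (1 << (7 * (width - 1))) - 1
```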

Dave: I'm not sure we have defined that in the EBML specs yet, but I think
we should (in a clearer form). I'm also not sure we defined what a
parser should do when encountering a reserved element (all ID value bits
set to 1 or 0). IMO it should just skip the element rather than
consider the stream invalid/broken. That may be an easy way to
remove/clear some elements in the stream rather than writing a Void
element over them.

We could use only one of these two reserved values and call it the
"Clear" element (probably the all-0 one).
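
A sketch of the proposed behaviour (a proposal in this thread, not current spec text): elements whose ID value bits are all 0 or all 1 are skipped instead of failing the parse. `iter_elements` and `is_reserved_id` are hypothetical names, and unknown-size elements are not handled, for brevity.

```python
from typing import Iterator, Tuple


def vint_width(first_byte: int) -> int:
    """Bytes in a VINT, from the position of the first set bit."""
    for width in range(1, 9):
        if first_byte & (0x80 >> (width - 1)):
            return width
    raise ValueError("0x00 cannot start a VINT")


def vint_value(raw: bytes) -> int:
    """Value bits of a VINT, with the length marker stripped."""
    return int.from_bytes(raw, "big") & ((1 << (7 * len(raw))) - 1)


def is_reserved_id(id_bytes: bytes) -> bool:
    """True when the ID's value bits are all 0 or all 1."""
    value = vint_value(id_bytes)
    return value == 0 or value == (1 << (7 * len(id_bytes))) - 1


def iter_elements(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (id, payload) for sibling elements, skipping reserved IDs."""
    pos = 0
    while pos < len(buf):
        id_width = vint_width(buf[pos])
        id_bytes = buf[pos:pos + id_width]
        pos += id_width
        size_width = vint_width(buf[pos])
        size = vint_value(buf[pos:pos + size_width])
        pos += size_width
        payload = buf[pos:pos + size]
        pos += size
        if is_reserved_id(id_bytes):
            continue  # proposed behaviour: skip it instead of failing
        yield int.from_bytes(id_bytes, "big"), payload
```

Clearing an element in place would then mean overwriting its ID bytes with the all-0 pattern of the same length (e.g. 0x10 0x00 0x00 0x00 for a 4-byte ID), leaving its size and data untouched.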

> Thanks,
> --Michael

Steve Lhomme
Matroska association Chairman
