[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Dave Rice dave at dericed.com
Fri Oct 16 08:51:02 CEST 2015


> On Oct 11, 2015, at 3:43 AM, Steve Lhomme <slhomme at matroska.org> wrote:
> 
> 2015-10-05 18:47 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>> On Mon, Oct 5, 2015 at 8:03 AM, Dave Rice <dave at dericed.com> wrote:
>>> 
>>> I'm working on the EBML specification (the one being drafted on GitHub)
>>> quite a bit. What are your questions about EBML?
>> 
>> 
>> Preface: some of these are weird corner cases that are extremely unlikely to
>> occur for anyone doing anything sane. That said, I think parsers should
>> consistently (or even gracefully) handle the insane, and in order to do that
>> I think these corner cases should be clarified in the spec.
>> 
>> Can a global element (i.e. Void, CRC-32) occur before an EBML element? If
>> so, are they considered part of the document? As is, it seems like an EBML
>> document is implicitly defined as everything between an EBML header and the
>> next EBML header (or EOF), in which case they are not considered part of the
>> EBML document.
> 
> The CRC-32 cannot because it has to be in a Master-Element to make sense.
> 
> The Void element could be placed between the "EBML Header" and the
> "EBML Stream", to reserve space for later editing, for example. It may
> belong to the EBML Header if its size fits the inside. Or it may be
> in-between the Header and the Stream. The Stream actually starts with
> a "level 0" element of the Stream described in the Doctype. What's
> before and is not in the EBML Header can be discarded.

I used the term "EBML Stream" differently in https://github.com/Matroska-Org/ebml-specification/pull/28. There I used EBML Stream to mean a sequence of EBML Documents within a file or data stream, whereas here you use it to mean the non-EBML-Header part of an EBML Document. Do you have a preference as to which meaning of EBML Stream is correct? If we use the meaning from my PR, then we need another term for the non-header part of an EBML Document.

>> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the element
>> that needs it (specific edge case: DocType has a size length of 9, but
>> DocType occurs before EBMLMaxSizeLength in the header; how should that be
>> handled?) (alternate edge case: a Void element occurring in (or before) an
>> EBML element with a size length > 8 and occurring before
>> EBMLMaxSizeLength). Should the spec explicitly require parsers to parse as
>> if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
>> Do the limitations of EBMLMaxSizeLength apply to the document immediately?
> 
> The values in the EBML Header describe what the EBML parser will need
> to parse the EBML Stream. On the other hand it should always be safe
> to read the EBML Header even if your parser cannot handle the Stream
> due to internal limitations. So we may define in the EBML specs that,
> for the EBML Header, the ID Length must not be longer than 4 and the
> Size Length may never be more than 8, maybe even 4 (I'd favor 4).

I added this to https://github.com/Matroska-Org/ebml-specification/pull/28.
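
As a rough illustration of the limits discussed above (not code from the spec or the PR), a parser could check the length of each ID and size field inside the EBML Header against fixed limits, independent of EBMLMaxIDLength and EBMLMaxSizeLength. The names `HEADER_MAX_ID_LENGTH`, `HEADER_MAX_SIZE_LENGTH`, `vint_length` and `check_header_element` are hypothetical, and the values 4 and 8 are only the ones proposed in this thread (4 was also suggested for the size length).

```python
# Hypothetical fixed limits for elements inside the EBML Header.
# These values are the ones proposed in this thread, not settled spec text.
HEADER_MAX_ID_LENGTH = 4
HEADER_MAX_SIZE_LENGTH = 8


def vint_length(first_byte: int) -> int:
    """Return the vint length implied by its first octet.

    The length is the number of leading zero bits plus one. A first
    octet of 0x00 yields 9, meaning the marker bit is not in the first
    octet and the field is longer than 8 octets.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must be a single octet")
    return 9 - first_byte.bit_length()


def check_header_element(id_first_byte: int, size_first_byte: int) -> tuple:
    """Check one EBML Header element against the fixed header limits.

    Returns (id_length, size_length) or raises ValueError.
    """
    id_len = vint_length(id_first_byte)
    size_len = vint_length(size_first_byte)
    if id_len > HEADER_MAX_ID_LENGTH:
        raise ValueError("ID length %d exceeds header limit" % id_len)
    if size_len > HEADER_MAX_SIZE_LENGTH:
        raise ValueError("size length %d exceeds header limit" % size_len)
    return id_len, size_len
```

With `0x1A` as the first octet of the EBML ID and a one-octet size such as `0x9F`, the check passes. A DocType whose size field starts with `0x00` (the size-length-9 edge case raised above) would be rejected no matter where EBMLMaxSizeLength appears in the header.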

>> Shouldn't EBMLMaxIDLength have a range of > 3 (given that the EBML element
>> has an ID length of 4)?
> 
> Not necessarily, as small EBML Doctypes may not need that much and
> favor saving container space. As said above, the values in the EBML
> Header describe the Doctype, the EBML Stream. Not the EBML Header
> itself. We should definitely clarify that in the specs.

Ah, this wasn't clear to me from any of the documents before. I can add it to the PR, but I will do that another day.

>> Shouldn't EBMLMaxSizeLength have a range of > 0?
> 
> Correct, it cannot be 0, just like EBMLMaxIDLength. I edited the
> Matroska specs accordingly.

I edited the EBML spec via the PR here: https://github.com/Matroska-Org/ebml-specification/pull/25.

>> That is, if EBMLMaxSizeLength is 1, does that apply to elements in the EBML
>> header immediately after it is encountered, meaning that if DocType followed
>> it, it must have a length < 127?
> 
> No, the "parsing context" defined by the EBML Header cannot be used
> while it's being created.
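
To make the arithmetic in the question concrete: a size field of n octets carries 7n data bits, and the all-ones pattern is reserved for unknown size, so a 1-octet size can express at most 126 (hence "< 127"). The sketch below is only an illustration. `max_data_size` and `effective_max_size_length` are hypothetical names, and the header limit of 8 is the value proposed earlier in this thread, not settled spec text.

```python
def max_data_size(size_length: int) -> int:
    """Largest known size expressible in a size field of this many octets.

    Each octet contributes 7 data bits; the all-ones data pattern is
    reserved for "unknown size", so it is excluded.
    """
    if size_length < 1:
        raise ValueError("size_length must be at least 1")
    return (1 << (7 * size_length)) - 2


def effective_max_size_length(in_header: bool, declared_max: int,
                              header_max: int = 8) -> int:
    """Pick the size-length limit that applies to the current element.

    Assumption: while the EBML Header is being read, a fixed limit
    (header_max) applies; the declared EBMLMaxSizeLength only takes
    effect for the elements that follow the header.
    """
    return header_max if in_header else declared_max
```

So with EBMLMaxSizeLength set to 1, a DocType that follows it in the header is still bounded by the header limit, while elements after the header are limited to `max_data_size(1)`, i.e. 126 octets of data.
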
> 
>> Typo in the EBML spec in the Length definition for the Binary data type: “A
>> Master-element” should be “A Byte Element”
> 
> Which document? I could not find this.

My mistake, PR here: https://github.com/Matroska-Org/ebml-specification/pull/29

>> The EBML spec says that the Reserved ID (all bits set to 1) is the only ID
>> that may change the Length Descriptor (the count of leading zeroes + 1).
>> What exactly does it mean to "change the Length Descriptor?" Does this mean
>> a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4) if the ID is
>> the Reserved ID?
> 
> I think it doesn't make sense as it is. I think it refers to the fact
> that IDs should always be coded in their lowest form. But when all set
> to 1 or 0, there's no default size.
> 
> Dave: I'm not sure we defined that in the EBML specs yet but I think
> we should (in a clearer form). I'm also not sure we defined what a
> parser should do when encountering a reserved element (all ID data
> set to 1 or 0). IMO it should just skip the element, rather than
> consider the stream invalid/broken. That may be an easy way to
> remove/clear some elements in the stream rather than rewriting a Void
> element on top.

I think it's fairly clear; see https://github.com/Matroska-Org/ebml-specification/blob/master/specification.markdown#element-id. It says that Element IDs must not have a vint_data of all 0 or all 1 and must be in the shortest possible expression. It doesn't give instructions as to what a parser should do if those rules aren't followed. In a few other places we say that if THIS then the ELEMENT is INVALID.
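
For illustration, the rules in that section could be checked roughly as follows. This is a sketch, not spec text: `is_valid_element_id` is a hypothetical helper, it does not enforce EBMLMaxIDLength, and it treats a value that would be all ones at the shorter length as legitimately needing the longer form, which is an interpretation of "shortest possible".

```python
def is_valid_element_id(raw: bytes) -> bool:
    """Check an encoded Element ID (marker bit included).

    Rules applied: vint_data is neither all zeros nor all ones, and
    the ID uses the shortest length that can hold its vint_data.
    """
    if not raw or raw[0] == 0:
        return False  # empty, or marker bit not in the first octet
    length = 9 - raw[0].bit_length()  # leading zero bits + 1
    if len(raw) != length:
        return False  # octet count disagrees with the length marker
    data_bits = 7 * length
    all_ones = (1 << data_bits) - 1
    # Mask off the marker bit, keeping only vint_data.
    value = int.from_bytes(raw, "big") & all_ones
    if value == 0 or value == all_ones:
        return False  # reserved patterns
    if length > 1:
        # Values below the shorter length's all-ones pattern fit in
        # fewer octets (all-zeros was already rejected above).
        shorter_all_ones = (1 << (7 * (length - 1))) - 1
        if value < shorter_all_ones:
            return False
    return True
```

For example, `bytes.fromhex("1A45DFA3")` (the EBML element) and `b"\xEC"` (Void) pass, while `b"\x80"`, `b"\xFF"` and `b"\x40\x3F"` do not. Whether a parser should then skip such an element or treat the stream as invalid is exactly the open question here.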

> We may use only one of these 2 reserved values and call it "Clear"
> element (so probably all 0).

Not sure I understand. Are you talking about overwriting invalid elements with Void or 'Clear'?

[...]

Dave Rice