[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Dave Rice dave at dericed.com
Wed Oct 14 20:31:01 CEST 2015


> On Oct 14, 2015, at 2:09 PM, Steve Lhomme <slhomme at matroska.org> wrote:
> 
> 2015-10-12 19:10 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>> On Sun, Oct 11, 2015 at 12:43 AM, Steve Lhomme <slhomme at matroska.org> wrote:
>>> 
>>> 2015-10-05 18:47 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>>>> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the
>>>> element that needs it (specific edge case: DocType has a size length of
>>>> 9, but DocType occurs before EBMLMaxSizeLength in the header; how should
>>>> that be handled?) (alternate edge case: a Void element occurring in (or
>>>> before) an EBML element, with a size length > 8 and occurring before
>>>> EBMLMaxSizeLength). Should the spec explicitly require parsers to parse
>>>> as if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
>>>> Do the limitations of EBMLMaxSizeLength apply to the document
>>>> immediately?
>>> 
>>> The values in the EBML Header describe what the EBML parser will need
>>> to parse the EBML Stream. On the other hand it should always be safe
>>> to read the EBML Header even if your parser cannot handle the Stream
>>> due to internal limitations. So we may define in the EBML specs that,
>>> for the EBML Header, the ID Length must not be longer than 4 and the
>>> Size Length may never be more than 8, maybe even 4 (I'd favor 4).
>> 
>> 
>> That would be great if that could be mentioned in the EBML spec (and I'd
>> favor 4 as well).
> 
> I updated the (Matroska) specs to define this.
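
For illustration only, the header limit discussed above could be enforced in a parser roughly as follows. This is a minimal Python sketch, not code from libebml or from the spec; the names vint_length and read_size and the max_size_length parameter are hypothetical, and whether the header cap should be 4 or 8 octets is exactly what the thread is debating.

```python
from typing import Tuple


def vint_length(first_byte: int) -> int:
    """Length of a VINT in octets: count of leading zero bits + 1."""
    if first_byte == 0:
        # No marker bit in the first octet, so the VINT is longer than 8 octets.
        return 9
    return 9 - first_byte.bit_length()


def read_size(data: bytes, offset: int = 0, max_size_length: int = 8) -> Tuple[int, int]:
    """Decode an element data size at `offset`; return (value, length in octets).

    `max_size_length` is the cap the caller wants to enforce, e.g. 4 or 8
    while reading the EBML Header (an assumption based on the discussion).
    """
    if not 1 <= max_size_length <= 8:
        raise ValueError("this sketch only supports caps of 1 to 8 octets")
    if offset >= len(data):
        raise ValueError("truncated input")
    first = data[offset]
    length = vint_length(first)
    if length > max_size_length:
        raise ValueError("size length exceeds the cap of %d octets" % max_size_length)
    if offset + length > len(data):
        raise ValueError("truncated input")
    # Strip the VINT_WIDTH and VINT_MARKER bits from the first octet.
    value = first & (0xFF >> length)
    for byte in data[offset + 1 : offset + length]:
        value = (value << 8) | byte
    return value, length
```

With a cap of 4, read_size(b"\x10\x00\x00\x02", max_size_length=4) decodes a 4-octet size, while a 5-octet size such as b"\x08\x00\x00\x00\x02" is rejected before any of the element is read.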

Is there a way to have version control on the Matroska specs? I'm not sure how to study the change referenced.

>>>> Shouldn't EBMLMaxIDLength have a range of > 3 (given that the EBML
>>>> element has an ID length of 4)?
>>> 
>>> Not necessarily, as small EBML Doctypes may not need that much and
>>> favor saving container space. As said above, the values in the EBML
>>> Header describe the Doctype, the EBML Stream. Not the EBML Header
>>> itself. We should definitely clarify that in the specs.
>> 
>> 
>> This too would be great to have in the specs.
> 
> I split the table of the EBML header and Matroska doctype definition
> in the Matroska specs, explaining that the EBML ID and Size should be a
> maximum of 4 octets long.
> 
>>>> Typo in the EBML spec in the Length definition for the Binary data type:
>>>> “A Master-element” should be “A Byte Element”
>>> 
>>> Which document? I could not find this.
>> 
>> 
>> The spec at:
>> https://github.com/Matroska-Org/ebml-specification/blob/master/specification.markdown#ebml-element-types
>> 
>> It's in the portion of the EBML spec that defines the Length for the
>> "Element Data Type: Binary".
>> 
>>> 
>>> 
>>>> The EBML spec says that the Reserved ID (all bits set to 1) is the only
>>>> ID that may change the Length Descriptor (the count of leading zeroes +
>>>> 1). What exactly does it mean to "change the Length Descriptor"? Does
>>>> this mean a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4)
>>>> if the ID is the Reserved ID?
>>> 
>>> I think it doesn't make sense as it is. I think it refers to the fact
>>> that IDs should always be coded in their lowest form. But when all set
>>> to 1 or 0, there's no default size.
>> 
>> 
>> I personally don't think it makes sense to consider IDs with all VINT_DATA
>> bits set to 1 as being the same ID. I consider them all as distinct IDs
>> (which makes sense if you think of reading them like an unsigned variable
>> sized integer; they're IDs that represent the integer values 0, 127, 16383,
>> 2097151, and 268435455). I read the EBML spec reserving these IDs as a means
>> of forward compatibility, reserving 5 IDs* for future revisions to the spec.
>> Should future revisions to EBML require new elements, it can safely draw
>> from these 5 IDs.
>> 
>> If that's not the purpose of these reserved IDs, then I don't think it makes
>> sense to reserve them in the first place (and documents should be free to
>> use them, especially the very valuable Class A elements).
>> 
>> *The IDs (including the VINT_WIDTH and VINT_MARKER bits):
>> 0x80
>> 0xff
>> 0x7fff
>> 0x3fffff
>> 0x1fffffff
>> (etc. for longer IDs)
> 
> Originally the idea was that the code to parse the length and the ID
> should be the same to reduce code (& complexity). But in the end even
> libebml just compares IDs in their lowest form, and doing otherwise
> doesn't make much sense format-wise.
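
To make the all-ones IDs listed above concrete, here is a small Python sketch that builds the encoded form (including the VINT_WIDTH and VINT_MARKER bits) and the VINT_DATA integer value for each length. The function name all_ones_vint is hypothetical and this is only an illustration of the arithmetic, not how any existing parser treats these IDs.

```python
from typing import Tuple


def all_ones_vint(length: int) -> Tuple[int, int]:
    """Return (encoded, value) for a VINT of `length` octets with all
    VINT_DATA bits set to 1."""
    if not 1 <= length <= 8:
        raise ValueError("length must be between 1 and 8 octets")
    data_bits = 7 * length            # each octet contributes 7 data bits
    value = (1 << data_bits) - 1      # all VINT_DATA bits set
    encoded = (1 << data_bits) | value  # add the VINT_MARKER bit above the data
    return encoded, value


for n in range(1, 5):
    encoded, value = all_ones_vint(n)
    print("%d octet(s): 0x%0*x -> %d" % (n, 2 * n, encoded, value))
```

For lengths 1 to 4 this gives 0xff, 0x7fff, 0x3fffff and 0x1fffffff with values 127, 16383, 2097151 and 268435455, matching the list quoted above. Compared as encoded integers these are four different numbers, which is the sense in which they can be read as distinct IDs.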


[...]

Dave

