[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Steve Lhomme slhomme at matroska.org
Thu Oct 15 07:19:49 CEST 2015


2015-10-14 20:31 GMT+02:00 Dave Rice <dave at dericed.com>:
>
>> On Oct 14, 2015, at 2:09 PM, Steve Lhomme <slhomme at matroska.org> wrote:
>>
>> 2015-10-12 19:10 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>>> On Sun, Oct 11, 2015 at 12:43 AM, Steve Lhomme <slhomme at matroska.org> wrote:
>>>>
>>>> 2015-10-05 18:47 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
>>>>> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the
>>>>> element that needs it? (Specific edge case: DocType has a size length
>>>>> of 9, but DocType occurs before EBMLMaxSizeLength in the header; how
>>>>> should that be handled? Alternate edge case: a Void element occurring
>>>>> in (or before) an EBML element, with a size length > 8 and occurring
>>>>> before EBMLMaxSizeLength.) Should the spec explicitly require parsers
>>>>> to parse as if EBMLMaxSizeLength is 8 unless and until explicitly told
>>>>> otherwise? Do the limitations of EBMLMaxSizeLength apply to the
>>>>> document immediately?
>>>>
>>>> The values in the EBML Header describe what the EBML parser will need
>>>> to parse the EBML Stream. On the other hand it should always be safe
>>>> to read the EBML Header even if your parser cannot handle the Stream
>>>> due to internal limitations. So we may define in the EBML specs that,
>>>> for the EBML Header, the ID Length must not be longer than 4 and the
>>>> Size Length may never be more than 8, maybe even 4 (I'd favor 4).
>>>
>>>
>>> That would be great if that could be mentioned in the EBML spec (and I'd
>>> favor 4 as well).
>>
>> I updated the (Matroska) specs to define this.
>
> Is there a way to have version control on the Matroska specs? I'm not sure how to study the change referenced.

There is a revision system on Drupal: http://matroska.org/node/1/revisions
But it doesn't seem to show the actual diff between revisions. I'm not
sure whether it's possible to have Drupal grab files from git, or to
have git push files to Drupal.
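
To make the header limit discussed above concrete, here is a minimal
Python sketch of a reader that decodes one EBML variable-size integer
and rejects anything longer than a caller-supplied cap. The name
read_vint and the max_len parameter are hypothetical, not taken from
libebml or the specs, and the default of 4 is only the value favored in
this thread. With a cap of 8 or less, a size length of 9 (first octet
0x00) is rejected before EBMLMaxSizeLength is ever read.

```python
def read_vint(buf, pos=0, max_len=4):
    """Decode one variable-size integer starting at buf[pos].

    Returns (value, length_in_octets). max_len is an assumed cap for
    the EBML Header (4 here; 8 was the other option discussed).
    """
    if pos >= len(buf):
        raise ValueError("no data to read")
    first = buf[pos]
    if first == 0:
        # No marker bit in the first octet: the length would be 9 or more.
        raise ValueError("length exceeds 8 octets")
    # Number of leading zero bits + 1 gives the encoded length.
    length = 8 - first.bit_length() + 1
    if length > max_len:
        raise ValueError(f"length {length} exceeds cap of {max_len}")
    if pos + length > len(buf):
        raise ValueError("truncated integer")
    # Drop the marker bit, then append the remaining octets.
    value = first & ((1 << (8 - length)) - 1)
    for octet in buf[pos + 1:pos + length]:
        value = (value << 8) | octet
    return value, length
```

A parser could call this with max_len=4 while inside the EBML Header and
switch to the value of EBMLMaxSizeLength only once the header has been
fully read.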

>>>>> Shouldn't EBMLMaxIDLength have a range of > 3 (given that the EBML
>>>>> element has an ID length of 4)?
>>>>
>>>> Not necessarily, as small EBML Doctypes may not need that much and
>>>> may favor saving container space. As said above, the values in the
>>>> EBML Header describe the Doctype, i.e. the EBML Stream, not the EBML
>>>> Header itself. We should definitely clarify that in the specs.
>>>
>>>
>>> This too would be great to have in the specs.
>>
>> I split the table of the EBML header and Matroska doctype definition
>> in the Matroska specs, explaining that the EBML ID and Size should each
>> be a maximum of 4 octets long.
>>
>>>>> Typo in the EBML spec in the Length definition for the Binary data
>>>>> type: “A Master-element” should be “A Byte Element”
>>>>
>>>> Which document? I could not find this.
>>>
>>>
>>> The spec at:
>>> https://github.com/Matroska-Org/ebml-specification/blob/master/specification.markdown#ebml-element-types
>>>
>>> It's in the portion of the EBML spec that defines the Length for the
>>> "Element Data Type: Binary".
>>>
>>>>
>>>>
>>>>> The EBML spec says that the Reserved ID (all bits set to 1) is the
>>>>> only ID that may change the Length Descriptor (the count of leading
>>>>> zeroes + 1). What exactly does it mean to "change the Length
>>>>> Descriptor?" Does this mean a Length Descriptor can be > 4 (even if
>>>>> EBMLMaxIDLength = 4) if the ID is the Reserved ID?
>>>>
>>>> I think it doesn't make sense as it is. I think it refers to the fact
>>>> that IDs should always be coded in their lowest form. But when all
>>>> bits are set to 1 or 0, there's no default size.
>>>
>>>
>>> I personally don't think it makes sense to consider IDs with all VINT_DATA
>>> bits set to 1 as being the same ID. I consider them all as distinct IDs
>>> (which makes sense if you think of reading them like an unsigned variable
>>> sized integer; they're IDs that represent the integer values 0, 127, 16383,
>>> 2097151, and 268435455). I read the EBML spec reserving these IDs as a means
>>> of forward compatibility, reserving 5 IDs* for future revisions to the spec.
>>> Should future revisions to EBML require new elements, it can safely draw
>>> from these 5 IDs.
>>>
>>> If that's not the purpose of these reserved IDs, then I don't think it makes
>>> sense to reserve them in the first place (and documents should be free to
>>> use them, especially the very valuable Class A elements).
>>>
>>> *The IDs (including the VINT_WIDTH and VINT_MARKER bits):
>>> 0x80
>>> 0xff
>>> 0x7fff
>>> 0x3fffff
>>> 0x1fffffff
>>> (etc. for longer IDs)
>>
>> Originally the idea was that the code to parse the length and the ID
>> should be the same to reduce code (& complexity). But in the end even
>> libebml just compares IDs in their lowest form, and doing otherwise
>> doesn't make much sense format-wise.
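
For reference, the all-ones IDs listed above and the integer values they
carry can be reproduced with a few lines of Python. This is only an
illustrative sketch; the helper name all_ones_id is hypothetical. The
remaining ID in the list, 0x80, is the one-octet ID whose data bits are
all zero (value 0), so it is not produced by this helper.

```python
def all_ones_id(length):
    """Return (encoded_id, data_value) for the ID of `length` octets
    whose data bits are all set to 1."""
    data_bits = 7 * length           # each octet contributes 7 data bits
    value = (1 << data_bits) - 1     # all data bits set
    marker = 1 << data_bits          # marker bit sits just above the data
    return marker | value, value


for n in range(1, 5):
    encoded, value = all_ones_id(n)
    print(f"{n} octet(s): 0x{encoded:x} carries {value}")
```

Each length yields a different integer value, which is the basis for
reading these as distinct IDs rather than one ID in several widths.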
>
>
> [...]
>
> Dave



-- 
Steve Lhomme
Matroska association Chairman