[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Dave Rice dave at dericed.com
Tue Oct 6 16:29:43 CEST 2015


Hi,

> On Oct 6, 2015, at 9:49 AM, Michael Bradshaw <mjbshaw at google.com> wrote:
> 
> On Mon, Oct 5, 2015 at 10:15 AM, Dave Rice <dave at dericed.com> wrote:
>> On Oct 5, 2015, at 12:47 PM, Michael Bradshaw <mjbshaw at google.com> wrote:
>> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the element that needs it? (Specific edge case: DocType has a size length of 9, but DocType occurs before EBMLMaxSizeLength in the header; how should that be handled? Alternate edge case: a Void element with a size length > 8 occurs in (or before) an EBML element and before EBMLMaxSizeLength.) Should the spec explicitly require parsers to parse as if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
> 
> Maybe the documentation for EBMLMaxSizeLength should be clarified: EBMLMaxSizeLength=8 does not mean that the payload of the EBML elements is limited to 8 bytes; it means that the size value of the EBML Element itself is restricted to 8 bytes. I believe that an 8-byte size value can express sizes up to something like 72 petabytes. I hope there are no docTypes greater than 72 petabytes in length ;).
>  
> Yeah, I know EBMLMaxSizeLength refers to the length (in bytes) of the size value, and this is where some of that "extremely unlikely to happen but still in the realm of possible" applies :). That said, since the size isn't required to be trimmed of unnecessary leading bytes (i.e. "5 can be coded 0x000000000005 or 0x0005 or 0x05"), it's totally permissible for the encoder to set EBMLMaxSizeLength=10 and have some sizes that use all 10 bytes, even if the values they store could easily fit in fewer than 8 bytes. For files like these, I think it's worth clarifying this part of the spec.
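
For reference, the arithmetic behind the 72 petabyte figure can be sketched as below. This is only an illustration, not spec text: max_size_value is a hypothetical helper name, and the sketch assumes that each octet of an Element Data Size contributes 7 bits of VINT_DATA and that the all-ones pattern is reserved (the EBML spec uses it for unknown sizes).

```python
def max_size_value(size_length: int) -> int:
    """Largest size expressible in an Element Data Size of `size_length` octets.

    Assumption: a VINT of n octets carries 7*n data bits, and the all-ones
    data pattern is reserved, so it is excluded from the usable range.
    """
    if size_length < 1:
        raise ValueError("size length must be at least 1 octet")
    data_bits = 7 * size_length
    return (1 << data_bits) - 2


if __name__ == "__main__":
    for n in (1, 2, 8, 10):
        print(n, max_size_value(n))
```

With a size length of 8 this gives 2^56 - 2 bytes, which is where "about 72 petabytes" comes from. A size length of 10 only raises the ceiling; it says nothing about how large the stored values actually are.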

From the spec (in development on github): "Unlike the VINT_DATA of the Element ID, the VINT_DATA component of the Element Data Size is NOT REQUIRED to be encoded at the shortest valid length. For example, an Element Data Size with binary encoding of 1011 1111 or a binary encoding of 0100 0000 0011 1111 are both valid Element Data Sizes and both store a semantically equal value."
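
To make the quoted example concrete, here is a minimal sketch of a VINT decoder showing that both encodings carry the same value. The function name decode_vint is my own (hypothetical), and the sketch deliberately handles only widths of 1 to 8 octets.

```python
from typing import Tuple


def decode_vint(data: bytes) -> Tuple[int, int]:
    """Return (width_in_octets, value) for the VINT at the start of `data`.

    Sketch only: widths above 8 octets (first octet 0x00) are rejected.
    """
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first == 0:
        raise ValueError("VINT wider than 8 octets is not handled here")
    # VINT_WIDTH: number of leading zero bits in the first octet, plus one.
    width = 8 - first.bit_length() + 1
    if len(data) < width:
        raise ValueError("truncated VINT")
    # Mask off the leading zeros and the VINT_MARKER bit.
    value = first & ((1 << (8 - width)) - 1)
    for octet in data[1:width]:
        value = (value << 8) | octet
    return width, value


if __name__ == "__main__":
    print(decode_vint(bytes([0b10111111])))              # one-octet form
    print(decode_vint(bytes([0b01000000, 0b00111111])))  # two-octet form
```

Both calls decode to the value 63; only the reported width differs (1 versus 2 octets).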

This allows more flexibility in editing EBML Documents without having to rewrite too many bytes. For instance, if you shorten a metadata tag value, you could resave the element by padding its Element Data Size to a longer but equivalent encoding, making up for the space freed by the shorter value. I suppose you could put a Void element in the space of the removed data as well, but adjusting the Element Data Size makes it possible to shorten the value by one byte while rewriting only the Element Data Size (using an encoding one byte longer) and the Element Value (now one byte shorter).
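
A rough sketch of that padding trick, assuming a size field of 1 to 8 octets; encode_size is a hypothetical helper, not something from the spec or from libebml:

```python
def encode_size(value: int, width: int) -> bytes:
    """Encode `value` as an Element Data Size using exactly `width` octets.

    Sketch only: supports widths 1..8 and refuses the all-ones pattern,
    which is assumed to be reserved.
    """
    if not 1 <= width <= 8:
        raise ValueError("this sketch supports widths of 1 to 8 octets")
    data_bits = 7 * width
    if not 0 <= value < (1 << data_bits) - 1:
        raise ValueError("value does not fit in the requested width")
    marker = 1 << data_bits  # VINT_MARKER sits just above the data bits
    return (marker | value).to_bytes(width, "big")


if __name__ == "__main__":
    old = encode_size(10, 1) + b"A" * 10  # 1-octet size + 10-byte value
    new = encode_size(9, 2) + b"A" * 9    # 2-octet size + 9-byte value
    # Same total length, so nothing after the element has to move.
    print(len(old), len(new))
    print(encode_size(63, 1).hex(), encode_size(63, 2).hex())
```

The element keeps its overall length (11 bytes in both cases here), so the edit touches only the size field and the value.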

>> The EBML spec says that the Reserved ID (all bits set to 1) is the only ID that may change the Length Descriptor (the count of leading zeroes + 1). What exactly does it mean to "change the Length Descriptor?" Does this mean a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4) iff the ID is the Reserved ID?
> 
> Good question, though I'm not sure of the answer; this is an older part of the EBML spec that predates my work on it. Some related discussions on this are here: https://github.com/Matroska-Org/ebml-specification/pull/15
> 
> Who would be good to ask for clarification? If we can't figure out exactly what it means, would it make more sense to just remove it from the spec?

Steve or Moritz?

But actually I think this should be rewritten. The same concept is referred to both as the VINT_WIDTH and the Length Descriptor.

I propose to remove this line:
"The leading bits of the Class IDs are used to identify the length of the ID. The number of leading 0's + 1 is the length of the ID in octets. We will refer to the leading bits as the Length Descriptor."
as it is redundant with the more descriptive VINT_WIDTH definition.

And maybe this:
"The Reserved IDs (all x set to 1) are the only IDs that may change the Length Descriptor."
although I'm not exactly sure what the 'change' means.

IIRC a Reserved ID means that all the bits of the VINT_DATA are set to 1 (not all bits of the whole VINT), and thus 0b11111111 (0xFF), 0b0111111111111111 (0x7FFF), and 0b001111111111111111111111 (0x3FFFFF) are all valid Reserved IDs. So the Length Descriptor changes with the encoded length in the same way it does for non-Reserved IDs.
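
If that reading is right, a check for a Reserved ID could look roughly like the sketch below. It reflects my recollection above rather than confirmed spec wording, and is_reserved_id is a hypothetical name.

```python
def is_reserved_id(raw: bytes) -> bool:
    """True if every VINT_DATA bit of the encoded Element ID is set to 1.

    Assumption: `raw` holds exactly one complete VINT of 1 to 8 octets.
    """
    if not raw or raw[0] == 0:
        raise ValueError("expected a VINT of 1 to 8 octets")
    # VINT_WIDTH (the "Length Descriptor"): leading zero bits + 1.
    width = 8 - raw[0].bit_length() + 1
    if len(raw) != width:
        raise ValueError("length does not match the VINT_WIDTH")
    data_bits = 7 * width
    all_ones = (1 << data_bits) - 1
    # Keep only VINT_DATA, dropping the leading zeros and the marker bit.
    vint_data = int.from_bytes(raw, "big") & all_ones
    return vint_data == all_ones


if __name__ == "__main__":
    for candidate in (b"\xff", b"\x7f\xff", b"\x3f\xff\xff", b"\x1a\x45\xdf\xa3"):
        print(candidate.hex(), is_reserved_id(candidate))
```

The first three inputs are the 1-, 2-, and 3-octet all-ones patterns; the last is the EBML header ID, which is an ordinary (non-Reserved) ID.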

[...]

Dave Rice
