[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Dave Rice dave at dericed.com
Fri Oct 16 08:32:50 CEST 2015


> On Oct 8, 2015, at 7:22 PM, Michael Bradshaw <mjbshaw at google.com> wrote:
> Hi,
> One extra issue with the EBML spec: the Element Data Size section says "the Signed Integer, Unsigned Integer, Float, and Date EBML Element Data Types have definitions which require a length of at least one octet and thus in these cases an Element Data Size with all VINT_DATA bits set to zero is invalid." But the "EBML Element Types" section explicitly states Signed Integer, Unsigned Integer, Float, and Date elements may all have a zero-octet size.

Oops, there were some long discussions about resolving specification conflicts over Empty Elements. Eventually zero-length elements were allowed, but I forgot to update this line. Fixed here: https://github.com/Matroska-Org/ebml-specification/pull/30.

> On Tue, Oct 6, 2015 at 7:29 AM, Dave Rice <dave at dericed.com> wrote:
>> On Oct 6, 2015, at 9:49 AM, Michael Bradshaw <mjbshaw at google.com> wrote:
>> On Mon, Oct 5, 2015 at 10:15 AM, Dave Rice <dave at dericed.com> wrote:
>>> On Oct 5, 2015, at 12:47 PM, Michael Bradshaw <mjbshaw at google.com> wrote:
>>> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the element that needs it (specific edge case: DocType has a size length of 9, but DocType occurs before EBMLMaxSizeLength in the header; how should that be handled?) (alternate edge case: a Void element occurring in (or before) an EBML element with a size length > 8 and occurring before EBMLMaxSizeLength). Should the spec explicitly require parsers to parse as if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
>> Maybe the documentation for EBMLMaxSizeLength should be clarified: EBMLMaxSizeLength=8 does not mean that the payload of the EBML elements is limited to 8 bytes; it means that the size value of the EBML Element itself is restricted to 8 bytes. I believe that an 8-byte size statement allows for something like 72 petabytes. I hope there are no DocTypes greater than 72 petabytes in length ;).
>> Yeah, I know EBMLMaxSizeLength refers to the length (in bytes) of the size value, and this is where some of that "extremely unlikely to happen but still in the realm of possible" applies :). That said, since the size isn't required to be trimmed of unnecessary leading bytes (i.e. "5 can be coded 0x000000000005 or 0x0005 or 0x05"), it's totally permissible for the encoder to set EBMLMaxSizeLength=10 and have some sizes that use all 10 bytes, even if the values they store could easily fit in fewer than 8 bytes. For files like these, I think it's worth clarifying this part of the spec.
> From the spec (in development on github): "Unlike the VINT_DATA of the Element ID, the VINT_DATA component of the Element Data Size is NOT REQUIRED to be encoded at the shortest valid length. For example, an Element Data Size with binary encoding of 1011 1111 or a binary encoding of 0100 0000 0011 1111 are both valid Element Data Sizes and both store a semantically equal value."
> This allows more flexibility in editing EBML Documents without having to rewrite too many bytes. For instance, if you change a metadata tag value to shorten it, you could resave by padding the Element Data Size to a longer but equivalent value to make up the missing space from the shortening of the value. I suppose you could use a Void Element in the space of the removed data as well, but adjusting the Element Data Size makes it possible to shorten the value by one byte by rewriting only the Element Data Size (to use a one-byte-longer version) and the Element Value, which would then be one byte shorter.
> Yes, and I think that's sensible. But my original question remains: should the spec require EBMLMaxSizeLength be set *before* any element occurs with a size VINT_WIDTH > 7 (and require parsers to parse as if EBMLMaxSizeLength=8 until explicitly told otherwise)?

Perhaps it's more sane to define the EBML Header so that Element Data Size lengths within it are limited to 8 octets. Then EBMLMaxSizeLength would really only apply to the non-EBML Header parts of the EBML Document?
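
To make the two points above concrete (padded-but-equivalent Element Data Sizes, and a parser that assumes a limit of 8 octets until told otherwise), here is a minimal sketch in Python. The names `encode_size`, `decode_size`, and `max_size_length` are hypothetical and not from the spec; the default of 8 only illustrates the behavior proposed in this thread, not a settled rule.

```python
from typing import Tuple


def encode_size(value: int, width: int) -> bytes:
    """Encode `value` as an Element Data Size occupying exactly `width` octets."""
    if width < 1:
        raise ValueError("width must be at least 1 octet")
    # Each octet contributes 8 bits; `width` bits are spent on VINT_WIDTH + VINT_MARKER.
    data_bits = 7 * width
    # All-ones VINT_DATA is reserved by the spec (unknown size), so exclude it here.
    if not 0 <= value < (1 << data_bits) - 1:
        raise ValueError("value does not fit in the requested width")
    marker = 1 << data_bits  # the VINT_MARKER bit, preceded by width - 1 zero bits
    return (marker | value).to_bytes(width, "big")


def decode_size(buf: bytes, max_size_length: int = 8) -> Tuple[int, int]:
    """Return (value, octets consumed); reject widths above `max_size_length`."""
    width = 1
    for octet in buf:
        if octet:
            width += 8 - octet.bit_length()  # leading zero bits in this octet
            break
        width += 8  # a whole octet of leading zero bits
    else:
        raise ValueError("no VINT_MARKER found")
    if width > max_size_length:
        raise ValueError("size length %d exceeds limit %d" % (width, max_size_length))
    if len(buf) < width:
        raise ValueError("truncated Element Data Size")
    raw = int.from_bytes(buf[:width], "big")
    return raw & ((1 << (7 * width)) - 1), width


# The two encodings quoted from the spec store the same value.
print(decode_size(bytes([0b10111111])))
print(decode_size(bytes([0b01000000, 0b00111111])))
```

A 10-octet size such as `encode_size(5, 10)` is rejected by `decode_size` unless the caller passes `max_size_length=10`, which is the ordering problem raised in the question: the parser has to know the limit before it meets the long size.
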
>>> The EBML spec says that the Reserved ID (all bits set to 1) is the only ID that may change the Length Descriptor (the count of leading zeroes + 1). What exactly does it mean to "change the Length Descriptor?" Does this mean a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4) iff the ID is the Reserved ID?
>> Good question, though I'm not sure of the answer; this is an older part of the EBML spec that pre-dates my work on it. Some related discussions on this are here: https://github.com/Matroska-Org/ebml-specification/pull/15
>> Who would be good to ask for clarification? If we can't figure out exactly what it means, would it make more sense to just remove it from the spec?
> Steve or Moritz?
> But actually I think this should be rewritten. The same concept is referred to both as the VINT_WIDTH and the Length Descriptor.
> I propose to remove this line:
> "The leading bits of the Class IDs are used to identify the length of the ID. The number of leading 0's + 1 is the length of the ID in octets. We will refer to the leading bits as the Length Descriptor."
> as it is redundant to the more descriptive VINT_WIDTH definition.
> And maybe this:
> "The Reserved IDs (all x set to 1) are the only IDs that may change the Length Descriptor."
> although I'm not exactly sure what the 'change' means.
> IIRC a Reserved ID means that all the bits of the VINT_DATA are set to 1 (not all bits of the whole VINT), and thus 0b11111111 and 0b0111111111111111 and 0b001111111111111111111111 are all valid Reserved IDs, so the changes in the Length Descriptor seem consistent with non-Reserved IDs as well.
> I think "The Reserved IDs (all bits of VINT_DATA set to 1) are the only IDs that may change the VINT_WIDTH." should be removed altogether, because the spec requires an "Element ID MUST NOT" be a Reserved ID. It's weird to include further definitions for Reserved IDs when a document would be considered malformed if it had any. Changing VINT_WIDTH sounds nonsensical when an EBML document can't contain a Reserved ID in the first place.

Done: https://github.com/Matroska-Org/ebml-specification/pull/31.
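
For reference, the reading of "Reserved ID" used in the quoted discussion (every VINT_DATA bit set to 1, at any VINT_WIDTH) can be checked mechanically. This is only a sketch under that reading; the helper name `is_reserved_id` is hypothetical, and the input is assumed to be one complete VINT.

```python
def is_reserved_id(vint: bytes) -> bool:
    """True when every VINT_DATA bit of one complete VINT is set to 1."""
    width = len(vint)
    if width == 0:
        raise ValueError("empty VINT")
    raw = int.from_bytes(vint, "big")
    data_bits = 7 * width  # total bits minus the VINT_WIDTH and VINT_MARKER bits
    # The bits above VINT_DATA must be (width - 1) zeros followed by the marker.
    if raw >> data_bits != 1:
        raise ValueError("VINT_WIDTH does not match the number of octets")
    mask = (1 << data_bits) - 1
    return raw & mask == mask


# One-, two-, and three-octet patterns with all VINT_DATA bits set.
for candidate in (b"\xff", b"\x7f\xff", b"\x3f\xff\xff"):
    assert is_reserved_id(candidate)

# The EBML Header ID is an ordinary four-octet Element ID.
assert not is_reserved_id(bytes.fromhex("1a45dfa3"))
```

Under this reading the width varies with the number of leading zero bits exactly as it does for ordinary IDs, which is why the "may change the Length Descriptor" sentence added nothing.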

> (I'd send a PR for these issues but I haven't gotten around to clearing the patches with my manager, and then with the legal department, etc... I can try to make one but it's going to take some time)

No problem, it may be a simpler endeavor for me to write the PR. The review is appreciated.
Dave Rice
