[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Michael Bradshaw mjbshaw at google.com
Fri Oct 9 01:22:53 CEST 2015


Hi,

One extra issue with the EBML spec: the Element Data Size section says "the
Signed Integer, Unsigned Integer, Float, and Date EBML Element Data Types
have definitions which require a length of at least one octet and thus in
these cases an Element Data Size with all VINT_DATA bits set to zero is
invalid." But the "EBML Element Types" section explicitly states Signed
Integer, Unsigned Integer, Float, and Date elements may all have a
zero-octet size.

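To make the contradiction concrete, here is a minimal Python sketch. The
helper names (size_data_is_zero, read_uint_payload) are made up for
illustration, and reading a zero-octet Unsigned Integer as 0 is an assumption
about how a lenient reader might behave, not something either section of the
spec is being quoted as saying.

```python
def size_data_is_zero(size_field: bytes) -> bool:
    """Return True if every VINT_DATA bit of an Element Data Size is zero.

    Assumes size_field holds exactly one complete VINT, so its data part
    is the low 7 * len(size_field) bits.
    """
    raw = int.from_bytes(size_field, "big")
    data_mask = (1 << (7 * len(size_field))) - 1
    return (raw & data_mask) == 0


def read_uint_payload(payload: bytes) -> int:
    """Read an Unsigned Integer payload of zero to eight octets.

    Assumption (hypothetical reader behavior): a zero-octet payload reads as 0.
    """
    if len(payload) > 8:
        raise ValueError("Unsigned Integer payload longer than 8 octets")
    return int.from_bytes(payload, "big")


# 0x80 is a one-octet Element Data Size whose VINT_DATA bits are all zero,
# i.e. it declares a zero-octet payload.
assert size_data_is_zero(b"\x80")
assert not size_data_is_zero(b"\x81")
assert read_uint_payload(b"") == 0
```

Under the Element Data Size wording, the 0x80 size would be invalid for these
types; under the EBML Element Types wording, it declares a legal empty element.
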
On Tue, Oct 6, 2015 at 7:29 AM, Dave Rice <dave at dericed.com> wrote:
>
> On Oct 6, 2015, at 9:49 AM, Michael Bradshaw <mjbshaw at google.com> wrote:
> On Mon, Oct 5, 2015 at 10:15 AM, Dave Rice <dave at dericed.com> wrote:
>>
>> On Oct 5, 2015, at 12:47 PM, Michael Bradshaw <mjbshaw at google.com> wrote:
>>
>>
>>    - How should an EBMLMaxSizeLength > 8 be handled if it occurs after
>>    the element that needs it? Specific edge case: DocType has a size length
>>    of 9, but DocType occurs before EBMLMaxSizeLength in the header.
>>    Alternate edge case: a Void element occurring in (or before) an EBML
>>    element, with a size length > 8 and occurring before EBMLMaxSizeLength.
>>    Should the spec explicitly require parsers to parse as if
>>    EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
>>
>> Maybe the documentation for EBMLMaxSizeLength should be clarified:
>> EBMLMaxSizeLength=8 does not mean that the payload of the EBML elements
>> is limited to 8 bytes; it means that the size value of the EBML Element
>> itself is restricted to 8 bytes. I believe that an 8 byte size statement
>> provides something like 72 petabytes. I hope there are no docTypes greater
>> than 72 petabytes in length ;).
>>
>
> Yeah, I know EBMLMaxSizeLength refers to the length (in bytes) of the size
> value, and this is where some of that "extremely unlikely to happen but
> still in the realm of possible" applies :). That said, since the size isn't
> required to be trimmed of unnecessary leading bytes (i.e. "5 can be coded
> 0x000000000005 or 0x0005 or 0x05"), it's totally permissible for the
> encoder to set EBMLMaxSizeLength=10 and have some sizes that use all 10
> bytes, even if the values they store could easily fit in fewer than 8
> bytes. For files like these, I think it's worth clarifying this part of the
> spec.
>
>
> From the spec (in development on github): "Unlike the VINT_DATA of the
> Element ID, the VINT_DATA component of the Element Data Size is NOT
> REQUIRED to be encoded at the shortest valid length. For example, an
> Element Data Size with binary encoding of 1011 1111 or a binary encoding of
> 0100 0000 0011 1111 are both valid Element Data Sizes and both store
> a semantically equal value."
>
> This allows more flexibility in editing EBML Documents without having to
> rewrite too many bytes. For instance if you change a metadata tag value to
> shorten it, you could resave by padding the Element Data Size to a longer
> but equivalent value to make up the missing space from the shortening of
> the value. I suppose you could use a Void element in the space of the
> removed data as well, but adjusting the Element Data Size makes it possible
> to shorten the value by one byte by rewriting only the Element Data Size
> (to use a one-byte-longer encoding) and the Element Value, which would then
> be one byte shorter.
>

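The padding trick described in the quoted text can be sketched in a few lines
of Python. This is only an illustration: encode_size is a hypothetical helper,
not part of any existing library, and it is limited to lengths of 1 to 8
octets.

```python
def encode_size(value: int, length: int) -> bytes:
    """Encode value as an Element Data Size occupying exactly `length` octets."""
    if not 1 <= length <= 8:
        raise ValueError("length must be 1..8 in this sketch")
    data_bits = 7 * length
    # The all-ones VINT_DATA pattern is excluded: EBML reserves it to mean
    # "unknown size", so it cannot carry an ordinary value.
    if not 0 <= value < (1 << data_bits) - 1:
        raise ValueError("value does not fit in the requested length")
    marker = 1 << data_bits  # VINT_MARKER sits just above VINT_DATA
    return (marker | value).to_bytes(length, "big")


# The two encodings quoted from the spec store the same value, 63.
assert encode_size(63, 1) == bytes([0b10111111])
assert encode_size(63, 2) == bytes([0b01000000, 0b00111111])

# Shrinking a 10-octet value to 9 octets while growing the size field from
# one octet to two keeps the combined length unchanged.
before = encode_size(10, 1) + b"A" * 10
after = encode_size(9, 2) + b"B" * 9
assert len(before) == len(after) == 11
```
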
Yes, and I think that's sensible. But my original question remains: should
the spec require EBMLMaxSizeLength be set *before* any element occurs with
a size VINT_WIDTH > 7 (and require parsers to parse as
if EBMLMaxSizeLength=8 until explicitly told otherwise)?

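As a rough sketch of what the proposed rule would mean for a parser: the
Python below assumes a limit of 8 until the caller passes a larger one (for
example after it has read EBMLMaxSizeLength from the header). The names
read_size and DEFAULT_MAX_SIZE_LENGTH are hypothetical, and the behavior is
the proposal under discussion, not something the spec currently mandates.

```python
DEFAULT_MAX_SIZE_LENGTH = 8  # assumed fallback until the header says otherwise


def read_size(buf: bytes, pos: int = 0,
              max_size_length: int = DEFAULT_MAX_SIZE_LENGTH):
    """Decode one Element Data Size; return (value, length_in_octets)."""
    width = 0  # leading zero bits before VINT_MARKER (VINT_WIDTH)
    i = pos
    while True:
        if i >= len(buf):
            raise ValueError("truncated size field")
        octet = buf[i]
        if octet == 0:
            # A whole octet of zeros: the marker is in a later octet.
            width += 8
            i += 1
            if width >= max_size_length:
                raise ValueError("size length exceeds the current limit")
            continue
        width += 8 - octet.bit_length()
        break
    length = width + 1
    if length > max_size_length:
        raise ValueError("size length exceeds the current limit")
    if pos + length > len(buf):
        raise ValueError("truncated size field")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (7 * length)) - 1), length


# A 9-octet encoding of the value 5 (VINT_WIDTH = 8).
nine_octet = bytes([0x00, 0x80, 0, 0, 0, 0, 0, 0, 0x05])

try:
    read_size(nine_octet)            # limit is still the assumed default of 8
    rejected = False
except ValueError:
    rejected = True
assert rejected
assert read_size(nine_octet, max_size_length=9) == (5, 9)
```

With this behavior, a DocType whose size field is 9 octets long and which
appears before EBMLMaxSizeLength would be rejected, which is why the ordering
question matters.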

>>    - The EBML spec says that the Reserved ID (all bits set to 1) is the
>>    only ID that may change the Length Descriptor (the count of leading zeroes
>>    + 1). What exactly does it mean to "change the Length Descriptor?" Does
>>    this mean a Length Descriptor can be > 4 (even if EBMLMaxIDLength = 4) iff
>>    the ID is the Reserved ID?
>>
>> Good question, though I'm not sure of the answer; this is an older part of
>> the EBML spec that pre-dates my work on it. Some related discussions on
>> this are here: https://github.com/Matroska-Org/ebml-specification/pull/15
>>
>
> Who would be good to ask for clarification? If we can't figure out exactly
> what it means, would it make more sense to just remove it from the spec?
>
>
> Steve or Moritz?
>
> But actually I think this should be rewritten. The same concept is
> referred to both as the VINT_WIDTH and the Length Descriptor.
>
> I propose to remove this line:
> "The leading bits of the Class IDs are used to identify the length of the
> ID. The number of leading 0's + 1 is the length of the ID in octets. We
> will refer to the leading bits as the Length Descriptor."
> as it is redundant with the more descriptive VINT_WIDTH definition.
>
> And maybe this:
> "The Reserved IDs (all x set to 1) are the only IDs that may change the
> Length Descriptor."
> although I'm not exactly sure what the 'change' means.
>
> IIRC a Reserved ID means that all the bits of the VINT_DATA are set to 1
> (not all bits of the whole VINT), and thus 1111 1111, 0111 1111 1111 1111,
> and 0011 1111 1111 1111 1111 1111 are all valid Reserved IDs, so the changes
> in the Length Descriptor seem consistent with non-Reserved IDs as well.
>

I think "The Reserved IDs (all bits of VINT_DATA set to 1) are the only IDs
that may change the VINT_WIDTH." should be removed altogether, because the
spec requires an "Element ID MUST NOT" be a Reserved ID. It's weird to
include further definitions for Reserved IDs when a document would be
considered malformed if it had any. Changing VINT_WIDTH sounds nonsensical
when an EBML document can't contain a Reserved ID in the first place.

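For reference, the "all bits of VINT_DATA set to 1" reading of a Reserved ID
can be checked mechanically. The Python below is a sketch under that reading
only; is_reserved_id is a hypothetical helper and it handles IDs of 1 to 8
octets.

```python
def is_reserved_id(element_id: bytes) -> bool:
    """Return True if all VINT_DATA bits of a complete Element ID are 1."""
    if not element_id or element_id[0] == 0:
        raise ValueError("sketch handles IDs of 1..8 octets only")
    # VINT_WIDTH is the number of leading zero bits; the ID is one octet longer.
    length = (8 - element_id[0].bit_length()) + 1
    if length != len(element_id):
        raise ValueError("ID length does not match its VINT_WIDTH")
    data_mask = (1 << (7 * length)) - 1
    return (int.from_bytes(element_id, "big") & data_mask) == data_mask


# One-, two-, and three-octet IDs whose VINT_DATA bits are all 1.
assert is_reserved_id(b"\xff")
assert is_reserved_id(b"\x7f\xff")
assert is_reserved_id(b"\x3f\xff\xff")

# The EBML header ID 0x1A45DFA3 is an ordinary four-octet ID.
assert not is_reserved_id(bytes.fromhex("1a45dfa3"))
```

A validator that rejects every ID for which this returns True already covers
the "Element ID MUST NOT" rule, without any notion of changing the VINT_WIDTH.
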
(I'd send a PR for these issues but I haven't gotten around to clearing the
patches with my manager, and then with the legal department, etc... I can
try to make one but it's going to take some time)