[Matroska-devel] Several (minor) issues or underspecified areas in the MKV spec

Michael Bradshaw mjbshaw at google.com
Mon Oct 12 19:10:11 CEST 2015

On Sun, Oct 11, 2015 at 12:54 AM, Steve Lhomme <slhomme at matroska.org> wrote:

> 2015-10-06 15:49 GMT+02:00 Michael Bradshaw <mjbshaw at google.com>:
> > On Mon, Oct 5, 2015 at 10:15 AM, Dave Rice <dave at dericed.com> wrote:
> >>
> >> On Oct 5, 2015, at 12:47 PM, Michael Bradshaw <mjbshaw at google.com> wrote:
> >>
> >> How should an EBMLMaxSizeLength > 8 be handled if it occurs after the
> >> element that needs it (specific edge case: DocType has a size length of 9,
> >> but DocType occurs before EBMLMaxSizeLength in the header; how should that
> >> be handled?) (alternate edge case: a Void element occurring in (or before)
> >> an EBML element, with a size length > 8 and occurring before
> >> EBMLMaxSizeLength). Should the spec explicitly require parsers to parse as
> >> if EBMLMaxSizeLength is 8 unless and until explicitly told otherwise?
> >>
> >> Maybe the documentation for EBMLMaxSizeLength should be clarified as
> >> EBMLMaxSizeLength=8 does not mean that the payload of the EBML elements is
> >> limited to 8 bytes, it means that the size value of the EBML Element itself
> >> is restricted to 8 bytes. I believe that an 8 byte size statement provides
> >> something like 72 petabytes. I hope there are no docTypes greater than 72
> >> petabytes in length ;).
> >
> >
> > Yeah, I know EBMLMaxSizeLength refers to the length (in bytes) of the size
> > value, and this is where some of that "extremely unlikely to happen but
> > still in the realm of possible" applies :). That said, since the size isn't
> > required to be trimmed of unnecessary leading bytes (i.e. "5 can be coded
> > 0x000000000005 or 0x0005 or 0x05"), it's totally permissible for the encoder
> > to set EBMLMaxSizeLength=10 and have some sizes that use all 10 bytes, even
> > if the values they store could easily fit in fewer than 8 bytes. For files
> > like these, I think it's worth clarifying this part of the spec.
> I don't see how that case is undefined. If the EBML Stream (as opposed
> to the header) can be 10 bytes and your parser can handle it (i.e. it
> read the EBML header, read that size and didn't leave with "error:
> unsupported EBML format"), then if it finds a 10-octet size value, it
> can read it. Even if the value is 5 in the end.
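
To make the case above concrete, here is a rough Python sketch of a size
reader that takes the maximum size length as a parameter. It assumes the
usual EBML variable-length coding (the count of leading zero bits, plus one,
gives the length in octets, and the marker bit is stripped from the value).
The names read_size, max_size_value and max_size_length are made up for this
illustration, and the reserved all-ones "unknown size" value is not treated
specially.

```python
def read_size(buf, pos=0, max_size_length=8):
    """Decode an EBML size at buf[pos]; return (value, length_in_octets).

    max_size_length defaults to 8, i.e. the parser assumes 8 until it has
    been told otherwise (hypothetical policy for this sketch).
    """
    length = 1
    i = pos
    while True:
        if i >= len(buf):
            raise ValueError("truncated size")
        byte = buf[i]
        if byte != 0:
            # Leading zero bits in this octet extend the length.
            length += 8 - byte.bit_length()
            break
        # A whole zero octet: eight more leading zero bits.
        length += 8
        i += 1
        if length > max_size_length:
            raise ValueError("size length exceeds max_size_length")
    if length > max_size_length:
        raise ValueError("size length exceeds max_size_length")
    if pos + length > len(buf):
        raise ValueError("truncated size")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    # Keep the 7 * length data bits, dropping the marker bit.
    value = raw & ((1 << (7 * length)) - 1)
    return value, length


def max_size_value(length):
    """Largest explicit size for a given size length (all ones is reserved)."""
    return (1 << (7 * length)) - 2


# A 10-octet size whose value is only 5: nine zero bits, the marker, then data.
ten_octets = bytes([0x00, 0x40]) + bytes(7) + bytes([0x05])
print(read_size(bytes([0x85])))                    # 1-octet coding of 5
print(read_size(ten_octets, max_size_length=10))   # 10-octet coding of 5
print(max_size_value(8))                           # roughly 72 petabytes
```

With the default of 8, read_size(ten_octets) raises ValueError, which is one
possible reading of "parse as if EBMLMaxSizeLength is 8 unless and until
explicitly told otherwise".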

Given your previous email stating that parsing information (max ID/size
length) in the EBML Header only applies to the EBML Stream, then yes, I
agree it's not undefined. This question was about parsing things within the
EBML Header, and I wasn't sure if this parsing information in the header
could be used within the header itself (i.e. a VOID element with a size
length of 10 octets (even though the VOID element data is only 5 octets)
occurring in the EBML Header before the EBMLMaxSizeLength element; given
your other emails, it seems this type of situation would be considered