[Matroska-devel] EBML specification component for review - Element Data Size

Steve Lhomme slhomme at matroska.org
Sat May 2 18:18:39 CEST 2015

On Fri, May 1, 2015 at 5:12 PM, Dave Rice <dave at dericed.com> wrote:

> Yes, it would be possible to parse such an EBML document, as long as
> the parser knows about all elements, and maintains a knowledge about
> their relative ordering. But as I've said, this makes the parser a bit
> more complicated than a simple one, and breaks future extensibility.
> To be precise, for each element, the parser needs to know the set of
> allowed sub-elements. The set of sub-elements must not overlap with the
> set of sub-elements allowed in the grandparent element (which affects
> in particular VOID and CRC elements).
> Right. So perhaps we need specific constraints on the use of VOID and
> CRC within an element of unknown size. It's hard to imagine the use case of
> a CRC sub-element within an element of unknown size, so perhaps a CRC
> within an unknown-length element can simply be forbidden.

Correct. Again, one rule that's pretty obvious but that could be written
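
To make the constraint discussed above concrete, here is a minimal Python sketch of ending an unknown-size element by looking at the next element ID. The SCHEMA table and both function names are hypothetical and heavily simplified. They are not taken from libebml or from the spec text, and only a handful of Matroska IDs are listed.

```python
# Hypothetical, heavily simplified schema: parent ID -> IDs allowed directly
# inside it. A real parser would derive this from the full format definition.
VOID, CRC32 = 0xEC, 0xBF
SEGMENT, CLUSTER = 0x18538067, 0x1F43B675
TIMESTAMP, SIMPLE_BLOCK = 0xE7, 0xA3

SCHEMA = {
    SEGMENT: {CLUSTER, VOID, CRC32},
    CLUSTER: {TIMESTAMP, SIMPLE_BLOCK, VOID, CRC32},
}


def ends_unknown_size(parent_id, next_id, schema=SCHEMA):
    """Return True if next_id cannot be a child of parent_id.

    In that case the element list of the unknown-size parent is over.
    """
    return next_id not in schema[parent_id]


def ambiguous_ids(parent_id, grandparent_id, schema=SCHEMA):
    """Return the IDs allowed at both levels.

    When one of these follows the children of an unknown-size parent, the ID
    alone does not tell the parser which level the element belongs to.
    """
    return schema[parent_id] & schema[grandparent_id]
```

With this toy table, a new Cluster ID ends an unknown-size Cluster, while VOID and CRC are reported as ambiguous between Cluster and Segment, which is the overlap the quoted paragraph warns about.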

> Would there be an issue with SimpleTag in the case of an unknown size
> parent? SimpleTag can appear at several different levels.

I never thought about that. The current trend is to make servers dumb and
clients smart. In this case that means that, to generate tags on the fly, the
server should not have to hold all the elements in memory before being able
to send them. So I would favor allowing it. Meaning a nested element cannot
end an infinite-size element of the same type. I think that would break libebml

> Also the rules we're discussing presume that an EBML-based format makes
> all sub-elements known. Should we add a statement to the EBML spec that
> defines this to keep the unknown-size parsing possible?

Yes, see my other email about the rules that apply on infinite/unknown

> "An EBML element data size with all data bits set to 1 indicates that the
> data size is unknown. This allows for dynamically generated EBML streams
> where the final size isn't known beforehand. The element with unknown size
> MUST be an element with an element list as data payload. The end of the
> element list is determined by the ID of the element. When an element that
> isn't a sub-element of the element with unknown size arrives, the element
> list is ended."
> The RFC Draft also has this line just after that in the Data Size section:
> Since the highest value is used for unknown size the effective maximum
> data size is 2^56-2, using variable size integer width 8.
> Whereas the EBML spec doesn't give an 8 bit limit, this line seems to
> imply it. There is a problem with the all-ones unknown length if
> EBMLMaxSizeWidth is greater than 8. Is the unknown length value supposed to
> match the length of the EBMLMaxSizeWidth or be fixed at 8?
> Bytes, not bits.
We prefer octets, which are guaranteed to be 8 bits.
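
As an illustration of the quoted Data Size text, here is a minimal Python sketch that decodes a data size and reports the all-ones case as unknown. The function name read_data_size is hypothetical. The sketch assumes widths of 1 to 8 octets only and treats all data bits set to 1 at any of those widths as "unknown". That is one possible reading of the text, and it does not settle the EBMLMaxSizeWidth question raised above.

```python
def read_data_size(buf):
    """Decode an EBML data size from the start of buf (bytes).

    Returns (size, width): width is the number of octets used, and size is
    None when all data bits are 1, i.e. the size is unknown.
    Assumption: only widths of 1 to 8 octets are handled.
    """
    first = buf[0]
    if first == 0:
        raise ValueError("widths above 8 octets are not handled in this sketch")
    # The position of the first 1 bit gives the width: 0x80 -> 1, 0x01 -> 8.
    width = 9 - first.bit_length()
    if len(buf) < width:
        raise ValueError("truncated data size")
    data_bits = 7 * width  # each octet contributes 7 data bits overall
    all_ones = (1 << data_bits) - 1
    value = int.from_bytes(buf[:width], "big") & all_ones
    if value == all_ones:
        return None, width
    return value, width
```

Under these assumptions the largest known size at width 8 is the all-ones value minus one, which matches the 2^56-2 figure in the quoted draft.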

Steve Lhomme
Matroska association Chairman