[Matroska-devel] EBML specification component for review - Element Data Size

Dave Rice dave at dericed.com
Fri May 1 16:15:17 CEST 2015


Hi,

> On May 1, 2015, at 9:18 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
> 
> Hey,
> 
>> I know, EBML was supposed to be generic and flexible, but I would say
>> experience with Matroska taught us that this specific aspect is not
>> really needed.
> 
> I know of at least four (private) projects that use EBML as a basis for
> various in-house solutions. Sure, it hasn't reached any kind of global
> use, but Matroska is certainly not the only one.
> 
> Anyway, I don't care that strongly about the point, so I won't argue it
> further. Limiting the maximum ID length to 4 is fine with me.

I'm going to try to summarize this thread. Apologies for a long post.

Regarding EBMLMaxSizeWidth, the EBML spec is flexible about what range is allowed, though Matroska constrains it. We aren't currently working on a new version of EBML but trying to clarify the existing one, so I propose this stay as is.

Unknown data sizes are technically allowed, but the rules about their use potentially make the document nearly unparseable, as wm4 pointed out (especially if the parser doesn't know all possible Element IDs). Is it possible to parse an EBML document where all elements without sub-elements have unknown values? For me, "When an element that isn't a sub-element of the element with unknown size arrives, the element list is ended." isn't clear enough. For instance, an unknown-size element may contain various known sub-elements and then a VOID element, but the VOID element could be a child of the unknown-size element or of the grandparent element.
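
To make the ambiguity concrete, here is a minimal sketch of the "ended by the ID of the next element" rule. The schema table, the element names in it, and the function classify_next are hypothetical and chosen only for illustration; they are not taken from the EBML or Matroska specs.

```python
# Hypothetical schema: element name -> set of parents it may appear under.
# "Void" is listed under two levels to mimic an element valid at any level.
SCHEMA = {
    "Cluster": {"Segment"},
    "Timecode": {"Cluster"},
    "SimpleBlock": {"Cluster"},
    "Tags": {"Segment"},
    "Void": {"Segment", "Cluster"},
}


def classify_next(element, open_parent, ancestors):
    """Classify the element that follows inside an unknown-size `open_parent`.

    Returns one of:
      "child"      - can only belong to open_parent, the list continues
      "ends-list"  - cannot belong to open_parent, the list is ended
      "ambiguous"  - could belong to open_parent or to one of its ancestors
      "unknown-id" - the parser has no schema entry, so it cannot decide
    """
    parents = SCHEMA.get(element)
    if parents is None:
        return "unknown-id"
    can_be_child = open_parent in parents
    can_be_outside = any(a in parents for a in ancestors)
    if can_be_child and can_be_outside:
        return "ambiguous"
    return "child" if can_be_child else "ends-list"


for name in ("Timecode", "Tags", "Void", "SomethingNew"):
    print(name, "->", classify_next(name, "Cluster", ["Segment"]))
```

With this toy schema, only the "child" and "ends-list" outcomes are covered by the quoted rule; the other two are the cases in question.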

Question from our discussion:
- Are EBMLMaxSizeWidth and EBMLMaxSizeLength synonymous? The RFC draft uses 'Width' whereas the spec uses 'Length'. I propose we prefer Length.

Back to the origins of this thread...

For reference, the RFC draft states:

> The EBML element data size is encoded as a variable size integer with, by default, widths up to 8. Another maximum width value can be set by setting another value to EBMLMaxSizeWidth in the EBML header. See section 5.1. There is a range overlap between all different widths, so that 1 encoded with width 1 is semantically equal to 1 encoded with width 8. This allows for the element data to shrink without having to shrink the width of the size descriptor.
> 
> Values with all data bits set to 1 means size unknown, which allows for dynamically generated EBML streams where the final size isn't known beforehand. The element with unknown size MUST be an element with an element list as data payload. The end of the element list is determined by the ID of the element. When an element that isn't a sub-element of the element with unknown size arrives, the element list is ended.
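
As an illustration of the draft text above, a minimal sketch of decoding an element data size follows. The function name decode_data_size is my own, and the sketch assumes the width marker sits in the first octet, so it only handles widths 1 through 8.

```python
def decode_data_size(buf: bytes, max_width: int = 8):
    """Decode an EBML element data size from the start of `buf`.

    Returns (width, value). `value` is None when all data bits are 1,
    i.e. the size is unknown. Assumes widths 1..8 only.
    """
    if not buf or buf[0] == 0:
        raise ValueError("no width marker in the first octet")
    # Leading zero bits before the marker bit give the width:
    # 0x80 -> width 1, 0x40 -> width 2, ..., 0x01 -> width 8.
    width = 9 - buf[0].bit_length()
    if width > max_width:
        raise ValueError("width exceeds EBMLMaxSizeWidth")
    if len(buf) < width:
        raise ValueError("truncated data size")
    data_bits = 7 * width
    all_ones = (1 << data_bits) - 1
    # Mask off the marker bit, keeping only the data bits.
    value = int.from_bytes(buf[:width], "big") & all_ones
    return width, (None if value == all_ones else value)


# 1 encoded with width 1 and with width 8 carries the same value.
print(decode_data_size(bytes([0x81])))
print(decode_data_size(bytes([0x01, 0, 0, 0, 0, 0, 0, 0x01])))
# All data bits set to 1 means unknown size.
print(decode_data_size(bytes([0xFF])))
```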


For cleanup, I propose (knowing that the unknown size issue is not yet resolved):

"The EBML element data size is encoded as a variable size integer. Another maximum width value can be set by setting another value to EBMLMaxSizeWidth in the EBML header. There is a range overlap between all different widths, so that 1 encoded with width 1 is semantically equal to 1 encoded with width 8. This allows for the element data to shrink without having to shrink the width of the size descriptor.

An EBML element data size with all data bits set to 1 indicates that the data size is unknown. This allows for dynamically generated EBML streams where the final size isn't known beforehand. The element with unknown size MUST be an element with an element list as data payload. The end of the element list is determined by the ID of the element. When an element that isn't a sub-element of the element with unknown size arrives, the element list is ended."

The RFC Draft also has this line just after that in the Data Size section:

> Since the highest value is used for unknown size the effective maximum data size is 2^56-2, using variable size integer width 8.

Whereas the EBML spec doesn't give a width limit of 8, this line seems to imply one. There is a problem with the all-ones unknown length if EBMLMaxSizeWidth is greater than 8. Is the unknown length value supposed to match the width set by EBMLMaxSizeWidth or be fixed at 8?
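
For reference, a small sketch of what the all-ones value and the resulting maximum look like at each width from 1 to 8. The helper names unknown_size_encoding and max_known_size are hypothetical. Widths above 8 are deliberately left out, since how they should behave is the open question here.

```python
def unknown_size_encoding(width: int) -> bytes:
    """Return the octets of a data size whose data bits are all 1 (widths 1..8)."""
    if not 1 <= width <= 8:
        raise ValueError("this sketch only covers widths 1..8")
    marker = 1 << (7 * width)           # the single width-marker bit
    return (marker | (marker - 1)).to_bytes(width, "big")


def max_known_size(width: int) -> int:
    """Largest size expressible at `width` once all-ones is reserved for unknown."""
    if not 1 <= width <= 8:
        raise ValueError("this sketch only covers widths 1..8")
    return (1 << (7 * width)) - 2


for w in range(1, 9):
    print(w, unknown_size_encoding(w).hex(), max_known_size(w))
```

At width 8 this gives 2^56-2, matching the line quoted from the RFC draft. Note that every width has its own all-ones pattern, so a reader has to treat each of them as "unknown" unless the spec fixes one.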

Dave Rice