[Matroska-devel] clarifications on the use of Unknown Sized Elements

Dave Rice dave at dericed.com
Mon Jun 8 22:49:29 CEST 2015

Hi all,
I wanted to get some more discussion to finish documentation on the Element Data Size. Some discussion is at:

# The pull request for Data Size documentation

# Prior listserv thread concerning the Data Size section of Martin Nilsson's RFC Draft

# Earlier pull request (closed without commit because it was too messy and incorporated copyrighted text)

To clarify (my) presumptions:
- Only Elements that are Master-elements may use an unknown size.
- I'm saying that "An Element Data Size with all VINT_DATA bits set to one is reserved as an indicator that the size of the Element is unknown." Thus a VINT_DATA of 0b1111111 is unknown, as is 0b11111111111111, as is any one-filled VINT_DATA for any length supported by the Element.
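
To make the second presumption concrete, here is a minimal Python sketch of how a reader might detect the reserved value. The function name read_data_size and its return convention (None for unknown size) are my own assumptions for illustration and are not taken from any existing parser. It also assumes lengths of 1 to 8 octets only.

```python
def read_data_size(buf: bytes, pos: int = 0):
    """Decode an Element Data Size VINT starting at buf[pos].

    Returns (size, length): size is None when every VINT_DATA bit is one
    (the reserved "unknown size" value), length is the octet count.
    Hypothetical helper, limited to VINTs of 1 to 8 octets.
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("VINT longer than 8 octets is not handled in this sketch")
    # The number of leading zero bits plus one gives the VINT length.
    length = 9 - first.bit_length()
    if pos + length > len(buf):
        raise ValueError("truncated VINT")
    # Each octet contributes 7 VINT_DATA bits once the marker bit is removed.
    data_bits = 7 * length
    all_ones = (1 << data_bits) - 1
    value = int.from_bytes(buf[pos:pos + length], "big") & all_ones
    return (None if value == all_ones else value), length


print(read_data_size(b"\xff"))      # 1 octet, VINT_DATA 0b1111111
print(read_data_size(b"\x7f\xff"))  # 2 octets, VINT_DATA all ones
print(read_data_size(b"\x40\x7f"))  # 2 octets, a known size of 127
```

Note that 127 has to be written in two octets (0x40 0x7F) under this reading, since the one-octet form 0xFF is taken by the unknown-size indicator.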

For clarification:

- When is the Element of Unknown Size ended? In discussions (on GitHub, the mailing list, and IRC) there have been two answers to this:
	1. Upon the first occurrence of an Element that is not a valid sub-element of the unknown-sized element.
	2. Upon the first occurrence of an Element that is at the same or a higher level than the one with an infinite size.

Under option 1 an occurrence of SimpleTag within another SimpleTag of unknown size DOES NOT indicate the end of the parent SimpleTag element.
Under option 2 an occurrence of SimpleTag within another SimpleTag of unknown size DOES indicate the end of the parent SimpleTag element.
Which option is the correct intention of EBML?
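
The difference between the two options can be sketched in Python as below. The SCHEMA table is a heavily simplified, hypothetical excerpt of the Matroska tagging elements (the level numbers and child lists are assumptions for illustration only), and the two function names are mine.

```python
# Hypothetical, simplified schema excerpt: element name -> level and
# the set of elements allowed as direct children.
SCHEMA = {
    "Tags":      {"level": 1, "children": {"Tag"}},
    "Tag":       {"level": 2, "children": {"Targets", "SimpleTag"}},
    "Targets":   {"level": 3, "children": set()},
    "SimpleTag": {"level": 3, "children": {"TagName", "TagString", "SimpleTag"}},
    "TagName":   {"level": 4, "children": set()},
    "TagString": {"level": 4, "children": set()},
}


def ends_parent_option1(parent: str, next_elem: str) -> bool:
    """Option 1: the parent ends at the first Element that is not a
    valid sub-element of it."""
    return next_elem not in SCHEMA[parent]["children"]


def ends_parent_option2(parent: str, next_elem: str) -> bool:
    """Option 2: the parent ends at the first Element at the same or a
    higher level (a smaller or equal level number)."""
    return SCHEMA[next_elem]["level"] <= SCHEMA[parent]["level"]


for nxt in ("TagName", "SimpleTag", "Tag"):
    print(nxt,
          ends_parent_option1("SimpleTag", nxt),
          ends_parent_option2("SimpleTag", nxt))
```

With this toy table the two functions agree for TagName and Tag and disagree only for a SimpleTag following an unknown-sized SimpleTag, which is exactly the case in question. Both functions also need the schema table to answer at all.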

- Dependency on Schema
For the SimpleTag example above, the interpretation of where the element ends presumes that the parser has knowledge of the EBML Schema/Profile of Matroska in any version 1-4. I'd like to include some statement to say that the use of an element of unknown size requires the use of the corresponding schema. For instance, a pure EBML parser (without knowledge of the Matroska schema) would not be able to parse a Matroska document with an unknown size because it wouldn't be able to 'know' what elements are valid endings to the unknown-sized element.

In the latest PR, https://github.com/Matroska-Org/ebml-specification/pull/15, I put this phrase:

> The use of Elements of unknown size is dependent on the    
> definition of the EBML Schema declared in DocType, because an Element of    
> unknown size can not be parsed without a complete list of all possible    
> sub-elements.

In conversation on the PR, Steve said this statement is incorrect: https://github.com/Matroska-Org/ebml-specification/pull/15#discussion_r31696905, but I am not certain how best to resolve it. Suggestions? Should I make reference to the need to have the associated schema to be able to parse unknown-sized elements of EBML documents?
