[Matroska-devel] EBML Schema

wm4 nfxjfg at googlemail.com
Thu Oct 1 11:50:45 CEST 2015

On Wed, 30 Sep 2015 22:46:09 -0400
Dave Rice <dave at dericed.com> wrote:

> Hi,
> > On Aug 28, 2015, at 10:52 AM, Dave Rice <dave at dericed.com> wrote:
> >> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
> >> 
> >> Hey,
> >> 
> >> I have no objections, however I don't know a lot about XML schemas in
> >> the first place (neither about DTDs, to be honest).
> > 
> > Honestly, I know a lot more about XML Schemas than I do about DTDs. As Wikipedia mentions at https://en.wikipedia.org/wiki/Document_type_definition, DTDs have largely been superseded by XML Schemas. At this point I think XML Schemas may be the more familiar analogy to use.
> > 
> > I think XML Schemas also have more in common with specdata.xml than DTDs do. Schemas use the <element> node and have maxOccurs and minOccurs attributes (specdata.xml expresses the same semantics with mandatory and multiple), and both have a similar declaration of element type, element name and element description. In fact, I think a semantically equivalent version of specdata.xml could be written as an XML Schema.
> > 
> > XML Schemas also offer a few advantages for machine-readable expressions; for instance, an XML Schema can mandate a particular pattern or regex for a value.
> > 
> >>> I propose the specdata.xml file here
> >>> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> >>> is a good basis for the consideration of an EBML Schema. From what I
> >>> can see, specdata.xml is an expression of the EBML + Matroska
> >>> specifications to support automated creation of documentation, but the
> >>> structure of this already shares a lot of similarity to XML Schemas.
> >> 
> >> For both documentation (e.g. the table on the matroska.org specs page is
> >> generated from this file) and code (libMatroska's class hierarchy is
> >> generated automatically from this file) actually.
> > 
> > Does specdata.xml play a role in mkvalidate? I'm thinking of the potential to have an ebmlvalidator to which you can provide an EBML Schema in order to validate a particular EBML docType.
> > 
> >>> Is there a preference in handling the standardization of Matroska:
> >>> documenting it in a similar fashion to our work in the EBML spec or to
> >>> define what an EBML Schema is and consider matroska an expression of
> >>> it?
> >> 
> >> I'm not sure whether or not I understand the implications. But my gut
> >> feeling is that having a definition for an EBML Schema would benefit
> >> other formats than Matroska, too, therefore the latter seems the way to
> >> go.
> > 
> > I have the same feeling:
> > - document EBML as a specification that includes rules for defining a docType in the form of an EBML Schema
> > - write an EBML Schema (updated specdata.xml) for Matroska and maybe WebM
> > 
> >>> Are some changes to specdata.xml acceptable? Such as a filename change
> >>> or changing the name of the <table> element or of some attributes?
> >> 
> >> Well, like I said above the specdata.xml is used for generating both
> >> documentation and code. Both should stay viable. If changes to it are
> >> made then the accompanying tools must be updated as well.
> >> 
> >>> Neither the current EBML specs nor the specdata.xml specifically refer
> >>> to the hierarchical arrangement of the elements, but this could be
> >>> presumed by their ordering. For instance, could any level 3 element be
> >>> a child of any level 2 Master-element? I presume not, but I don't
> >>> think it's clear anywhere what parent-child relationships are
> >>> feasible. Possibly specdata.xml and/or the EBML Schema Definition
> >>> could define the relationship between levels of related elements
> >>> similar to how an XML Schema (XSD) does.
> >> 
> >> So far it is understood that an element not marked as a global element
> >> must only occur as a child of its parent. Its parent is the last element
> >> located before the child element in the specdata file with a lower level
> >> than the child element. Or something like that.
> > 
> > This will need some documentation. That's how I've understood the mkv spec as well, but the definition of how an EBML Schema works should be explicit about this.
> I created a first draft of Matroska's specdata.xml with nested elements here: https://gist.github.com/dericed/f0a4bb0e7dc635ed1347. The content of the XML is the same, but the definition is moved from element to element/documentation, and elements are nested within elements according to their level and allowed location. I think a nested structure in an EBML Schema would make the location clearer than the current rule, which is that an element is a child of the nearest preceding element with a lower level value. With nesting, the element is simply a child of its parent element, and the level attribute becomes redundant.
> Another advantage of this structure is that it allows the EBML Schema to be better adapted to foreign-language descriptions. Just as in XML Schema, one could have multiple <documentation> nodes per <element> with different language attributes.
> I'd also like to propose changing the EBML Schema attributes mandatory and multiple to their familiar XML Schema counterparts: minOccurs and maxOccurs. Here all mandatory="1" would become minOccurs="1" and multiple="1" would become maxOccurs="unbounded".
> Another idea is that the next version of EBML could add an element for schemaLocation which would be a URL to the EBML Schema. A Matroska file could then have an EBML header schemaLocation of https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml so that validators could pull the appropriate schema for validation.
> Comments?
> Dave Rice
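
For reference, the level rule and the attribute renaming discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample input is made up in the style of specdata.xml, the EBMLSchema root name and the lang attribute are hypothetical, and the defaults for elements lacking mandatory/multiple (minOccurs="0", maxOccurs="1") are my reading of the proposal, not something the thread specifies. Global elements are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input in the style of specdata.xml; the values are
# illustrative and not copied from the real file.
FLAT = """
<table>
  <element name="Segment" level="0" type="master" mandatory="1">Root element.</element>
  <element name="Info" level="1" type="master" mandatory="1">General information.</element>
  <element name="Title" level="2" type="utf-8">General name of the segment.</element>
  <element name="Tracks" level="1" type="master" multiple="1">Track descriptions.</element>
  <element name="TrackEntry" level="2" type="master" mandatory="1" multiple="1">One track.</element>
</table>
"""


def nest(flat_root):
    """Rebuild the hierarchy implied by the level attribute.

    The parent of an element is the nearest preceding element with a
    lower level, which is what the stack below keeps track of.
    """
    schema = ET.Element("EBMLSchema")  # hypothetical root name
    stack = [(float("-inf"), schema)]  # (level, node) of open ancestors
    for src in flat_root.iter("element"):
        level = int(src.get("level"))
        # Close every open element that is not shallower than this one.
        while stack[-1][0] >= level:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "element", {
            "name": src.get("name"),
            "type": src.get("type"),
            # Assumed defaults: optional and single unless flagged.
            "minOccurs": "1" if src.get("mandatory") == "1" else "0",
            "maxOccurs": "unbounded" if src.get("multiple") == "1" else "1",
        })
        # The description moves from element to element/documentation.
        doc = ET.SubElement(node, "documentation", {"lang": "en"})
        doc.text = (src.text or "").strip()
        stack.append((level, node))
    return schema


def outline(node, depth=0):
    """Print the nested element names with their occurrence bounds."""
    for child in node.findall("element"):
        print("  " * depth + child.get("name"),
              "minOccurs=" + child.get("minOccurs"),
              "maxOccurs=" + child.get("maxOccurs"))
        outline(child, depth + 1)


schema = nest(ET.fromstring(FLAT))
outline(schema)
```

Running it prints Segment at the top, with Info and Tracks nested under it and Title and TrackEntry one level further down, so the level attribute is no longer needed once the nesting is in place.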

Maybe I'm way too late for this, but: does it really have to be XML?
It's neither readable, nor inviting to add lots of details to the
documentation elements.
