[Matroska-devel] EBML Schema

Dave Rice dave at dericed.com
Mon Nov 9 19:19:07 CET 2015

Hi all,

> On Oct 3, 2015, at 9:46 AM, Steve Lhomme <slhomme at matroska.org> wrote:
> On Aug 28, 2015 17:00, "Dave Rice" <dave at dericed.com> wrote:
> >
> > Hi,
> >
> >> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
> >>
> >> Hey,
> >>
> >> I have no objections, however I don't know a lot about XML schemas in
> >> the first place (neither about DTDs, to be honest).
> >
> >
> > Honestly, I know a lot more about XML Schemas than I do about DTDs. As Wikipedia mentions at https://en.wikipedia.org/wiki/Document_type_definition, DTDs have largely been superseded by XML Schemas. And at this point I think that XML Schemas may be a more familiar analogy to use.
> >
> > I think XML Schemas also share more in common with specdata.xml than DTDs do. Schemas use the <element> node and have maxOccurs and minOccurs attributes (specdata has semantically the same thing with mandatory and multiple), they both have a similar declaration of element type, element name and element description. Actually I think a semantically equivalent version of specdata.xml could be written as an XML Schema.
> >
> > XML Schemas also offer a few advantages for machine readable expressions; for instance XML Schemas can mandate a particular pattern or regex for a value. 
> >
> >>> I propose the specdata.xml file here
> >>> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> >>> is a good basis for the consideration of an EBML Schema. From what I
> >>> can see, specdata.xml is an expression of the EBML + Matroska
> >>> specifications to support automated creation of documentation, but the
> >>> structure of this already shares a lot of similarity to XML Schemas.
> >>
> >>
> >> For both documentation (e.g. the table on the matroska.org specs page is
> >> generated from this file) and code (libMatroska's class hierarchy is
> >> generated automatically from this file) actually.
> >
> >
> > Does specdata.xml play a role in mkvalidate? I'm thinking of the potential to have an ebmlvalidator where you can provide the EBML Schema to validate particular EBML docType.
>
> Well, the parsing code is generated from the XML file, so in a way, yes. But it's not parsed "live".
>
> >>> Is there a preference in handling the standardization of Matroska:
> >>> documenting it in a similar fashion to our work in the EBML spec or to
> >>> define what an EBML Schema is and consider matroska an expression of
> >>> it?
> >>
> >>
> >> I'm not sure whether or not I understand the implications. But my gut
> >> feeling is that having a definition for an EBML Schema would benefit
> >> other formats than Matroska, too, therefore the latter seems the way to
> >> go.
> >
> >
> > I have the same feeling:
> > - document EBML as a specification that includes rules for defining a docType in the form of an EBML Schema
> > - write an EBML Schema (updated specdata.xml) for Matroska and maybe webM
> >
> >>> Are some changes to specdata.xml acceptable? Such as a filename change
> >>> or changing the name of the <table> element of some attributes?
> >>
> >>
> >> Well, like I said above the specdata.xml is used for generating both
> >> documentation and code. Both should stay viable. If changes to it are
> >> made then the accompanying tools must be updated as well.
> >>
> >>> Neither the current EBML specs nor the specdata.xml specifically refer
> >>> to the hierarchical arrangement of the elements, but this could be
> >>> presumed by their ordering. For instance, could any level 3 element be
> >>> a child of any level 2 Master-element? I presume not, but I don't
> >>> think it's clear anywhere what parent-child relationships are
> >>> feasible. Possibly specdata.xml and/or the EBML Schema Definition
> >>> could define the relationship between levels of related elements
> >>> similar to how an XML Schema (XSD) does.
> >>
> >>
> >> So far it is understood that an element not marked as a global element
> >> must only occur as a child of its parent. Its parent is the last element
> >> located before the child element in the specdata file with a lower level
> >> than the child element. Or something like that.
> >
> >
> > This will need some documentation. That's how I've understood the mkv spec as well but the definition for how an EBML Schema works should be explicit about this.
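
To make that rule concrete, here is a minimal sketch of how a parent could be derived from the flat listing: the parent is the last preceding element with a lower level. The function name and the sample entries are assumptions for illustration only, not taken from specdata.xml or any existing tool, and global elements are left out.

```python
def find_parents(elements):
    """Map each element name to its parent's name (None for a root element).

    `elements` is a list of (name, level) pairs in file order.
    """
    parents = {}
    stack = []  # (name, level) of the currently open ancestors, outermost first
    for name, level in elements:
        # Drop every open element that is not at a strictly lower level.
        while stack and stack[-1][1] >= level:
            stack.pop()
        parents[name] = stack[-1][0] if stack else None
        stack.append((name, level))
    return parents


# Illustrative flat listing (a small, assumed subset of elements and levels).
flat = [
    ("Segment", 0),
    ("Info", 1),
    ("TimecodeScale", 2),
    ("Tracks", 1),
    ("TrackEntry", 2),
    ("TrackNumber", 3),
    ("Video", 3),
    ("PixelWidth", 4),
    ("Name", 3),
]
parents = find_parents(flat)
```

With this listing, `parents["Name"]` resolves to `TrackEntry` rather than `Video`, which is exactly the kind of implicit relationship that an explicit hierarchy would spell out.
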

Are there any more opinions on how (or whether) to go about modifying specdata.xml so that it becomes an expression of a to-be-defined EBML Schema for Matroska and WebM? As a summary, the proposed changes to specdata.xml are:

- change to XML Schema conventions where relevant:
		- use maxOccurs attribute instead of the current Multiple attribute.
		- use minOccurs attribute instead of the current Mandatory attribute.
		- move documentation of elements to a sub-element (allows for possible internationalization in the schema and better semantics)
- arrange elements in hierarchical form to indicate parent-child relationships (rather than the current practice, where all elements are defined at the same level and you have to scan back through the preceding elements to the nearest one with a lower-numbered level attribute to find the parent)
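
As a rough sketch of the attribute changes in the first bullet, something like the following could translate the current style into XML Schema conventions. The sample fragment, its attribute values, the `<documentation>` sub-element name, and the function name are all assumptions for illustration; the nesting from the second bullet is not attempted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the current flat style; values are illustrative only.
FLAT = """<table>
  <element name="Segment" level="0" mandatory="1" multiple="1">Root element.</element>
  <element name="Info" level="1" mandatory="1" multiple="1">General information.</element>
  <element name="Title" level="2">Segment title.</element>
</table>"""


def to_schema_style(flat_xml):
    """Return a copy that uses minOccurs/maxOccurs and a documentation child."""
    out = ET.Element("table")
    for old in ET.fromstring(flat_xml).iter("element"):
        new = ET.SubElement(out, "element",
                            name=old.get("name"), level=old.get("level"))
        # mandatory="1" becomes minOccurs="1"; anything else is optional.
        new.set("minOccurs", "1" if old.get("mandatory") == "1" else "0")
        # multiple="1" becomes maxOccurs="unbounded"; anything else is at most one.
        new.set("maxOccurs", "unbounded" if old.get("multiple") == "1" else "1")
        # Move the description text into its own sub-element (assumed name).
        if old.text and old.text.strip():
            ET.SubElement(new, "documentation").text = old.text.strip()
    return out


converted = to_schema_style(FLAT)
print(ET.tostring(converted, encoding="unicode"))
```

Keeping the description in a sub-element would leave room for several translated descriptions per element later on.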

A draft of specdata.xml with these changes is at https://gist.github.com/dericed/f0a4bb0e7dc635ed1347. I can continue to work on this and send back changes for advice/approval, but if I do so, is there someone who could later update the tools that use specdata.xml so that newly defined EBML Schemas can be put to use?

btw I'm cc'ing the newly-established CELLAR listserv, which focuses on work on the EBML and Matroska specification. If you are interested in these topics, please consider subscribing at https://www.ietf.org/mailman/listinfo/cellar.

Best Regards,
Dave Rice
