[Matroska-devel] EBML Schema

Dave Rice dave at dericed.com
Mon Nov 9 19:19:07 CET 2015


Hi all,

> On Oct 3, 2015, at 9:46 AM, Steve Lhomme <slhomme at matroska.org> wrote:
> On Aug 28, 2015 17:00, "Dave Rice" <dave at dericed.com> wrote:
> >
> > Hi,
> >
> >> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
> >>
> >> Hey,
> >>
> >> I have no objections, however I don't know a lot about XML schemas in
> >> the first place (neither about DTDs, to be honest).
> >
> >
> > Honestly, I know a lot more about XML Schemas than I do about DTDs. As Wikipedia mentions at https://en.wikipedia.org/wiki/Document_type_definition, DTDs have largely been superseded by XML Schemas. And at this point I think that XML Schemas may be a more familiar analogy to use.
> >
> > I think XML Schemas also share more in common with specdata.xml than DTDs do. Schemas use the <element> node and have maxOccurs and minOccurs attributes (specdata has semantically the same thing with mandatory and multiple), they both have a similar declaration of element type, element name and element description. Actually I think a semantically equivalent version of specdata.xml could be written as an XML Schema.
> >
> > XML Schemas also offer a few advantages for machine readable expressions; for instance XML Schemas can mandate a particular pattern or regex for a value. 
> >
> >>> I propose the specdata.xml file here
> >>> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> >>> is a good basis for the consideration of an EBML Schema. From what I
> >>> can see, specdata.xml is an expression of the EBML + Matroska
> >>> specifications to support automated creation of documentation, but the
> >>> structure of this already shares a lot of similarity to XML Schemas.
> >>
> >>
> >> For both documentation (e.g. the table on the matroska.org specs page is
> >> generated from this file) and code (libMatroska's class hierarchy is
> >> generated automatically from this file) actually.
> >
> >
> > Does specdata.xml play a role in mkvalidate? I'm thinking of the potential to have an ebmlvalidator where you can provide the EBML Schema to validate particular EBML docType.
> 
> Well the parsing code is generated from the XML file, so in a way, yes. But it's not parsed "live".
> 
> >>> Is there a preference in handling the standardization of Matroska:
> >>> documenting it in a similar fashion to our work in the EBML spec or to
> >>> define what an EBML Schema is and consider matroska an expression of
> >>> it?
> >>
> >>
> >> I'm not sure whether or not I understand the implications. But my gut
> >> feeling is that having a definition for an EBML Schema would benefit
> >> other formats than Matroska, too, therefore the latter seems the way to
> >> go.
> >
> >
> > I have the same feeling:
> > - document EBML as a specification that includes rules for defining a docType in the form of an EBML Schema
> > - write an EBML Schema (updated specdata.xml) for Matroska and maybe WebM
> >
> >>> Are some changes to specdata.xml acceptable? Such as a filename change
> >>> or changing the name of the <table> element of some attributes?
> >>
> >>
> >> Well, like I said above the specdata.xml is used for generating both
> >> documentation and code. Both should stay viable. If changes to it are
> >> made then the accompanying tools must be updated as well.
> >>
> >>> Neither the current EBML specs nor the specdata.xml specifically refer
> >>> to the hierarchical arrangement of the elements, but this could be
> >>> presumed by their ordering. For instance, could any level 3 element be
> >>> a child of any level 2 Master-element? I presume not, but I don't
> >>> think it's clear anywhere what parent-child relationships are
> >>> feasible. Possibly specdata.xml and/or the EBML Schema Definition
> >>> could define the relationship between levels of related elements
> >>> similar to how an XML Schema (XSD) does.
> >>
> >>
> >> So far it is understood that an element not marked as a global element
> >> must only occur as a child of its parent. Its parent is the last element
> >> located before the child element in the specdata file with a lower level
> >> than the child element. Or something like that.
> >
> >
> > This will need some documentation. That's how I've understood the mkv spec as well but the definition for how an EBML Schema works should be explicit about this.
> 
Are there any more opinions on how (or whether) to go about modifying specdata.xml towards becoming an expression of a to-be-defined EBML Schema for Matroska and WebM? As a summary, the proposed changes to specdata.xml are:

- change to XML Schema conventions where relevant:
    - use a maxOccurs attribute instead of the current Multiple attribute.
    - use a minOccurs attribute instead of the current Mandatory attribute.
    - move documentation of elements to a sub-element (allows for possible internationalization in the schema and better semantics)
- arrange elements in hierarchical form to indicate parent-child relationships (rather than the current practice, where all elements are defined at the same level and you have to scan back through the preceding elements to the nearest one with a lower level attribute to find the parent)
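
To make the proposal concrete, here is a rough Python sketch of the conversion described above. It is only an illustration: the sample input is hypothetical (not copied from specdata.xml), the lowercase attribute names "level", "mandatory" and "multiple", the "EBMLSchema" root name and the use of "unbounded" for maxOccurs (borrowed from XSD) are all assumptions, and the draft linked below may differ. The parent of each element is taken to be the nearest preceding element with a lower level.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input in the style of the current specdata.xml.
FLAT = """
<table>
  <element name="Segment" level="0" mandatory="1" multiple="1"/>
  <element name="Info" level="1" mandatory="1" multiple="1"/>
  <element name="TimecodeScale" level="2" mandatory="1"/>
  <element name="Tracks" level="1" multiple="1"/>
  <element name="TrackEntry" level="2" mandatory="1" multiple="1"/>
</table>
"""


def nest(flat_root):
    """Build a nested tree from a flat, level-attributed element list."""
    schema = ET.Element("EBMLSchema")  # assumed root name
    # Stack of (level, node): the path from the root to the last element seen.
    stack = [(-1, schema)]
    for el in flat_root.iter("element"):
        level = int(el.get("level"))
        # Drop everything at the same or a deeper level; what remains on top
        # is the nearest preceding element with a lower level, i.e. the parent.
        while len(stack) > 1 and stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1][1]
        node = ET.SubElement(parent, "element", {
            "name": el.get("name"),
            # mandatory -> minOccurs, multiple -> maxOccurs
            "minOccurs": "1" if el.get("mandatory") == "1" else "0",
            "maxOccurs": "unbounded" if el.get("multiple") == "1" else "1",
        })
        stack.append((level, node))
    return schema


if __name__ == "__main__":
    nested = nest(ET.fromstring(FLAT))
    print(ET.tostring(nested, encoding="unicode"))
```

Global elements, which have no single parent, are not treated specially in this sketch; they would need their own rule in the schema definition.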

A draft of specdata.xml with these changes is at https://gist.github.com/dericed/f0a4bb0e7dc635ed1347. I can continue to work on this and send back changes for advice/approval, but if I do so, is there someone who could later update the tools that use specdata.xml so that newly-defined EBML Schemas can be put into use?

btw I'm cc'ing the newly-established CELLAR listserv, which focuses on work on the EBML and Matroska specifications. If you are interested in these topics, please consider subscribing at https://www.ietf.org/mailman/listinfo/cellar.

Best Regards,
Dave Rice
