[Matroska-devel] EBML Schema

Dave Rice dave at dericed.com
Thu Oct 1 13:10:04 CEST 2015



> On Oct 1, 2015, at 5:50 AM, wm4 <nfxjfg at googlemail.com> wrote:
> 
> On Wed, 30 Sep 2015 22:46:09 -0400
> Dave Rice <dave at dericed.com> wrote:
> 
>> Hi,
>>>> On Aug 28, 2015, at 10:52 AM, Dave Rice <dave at dericed.com> wrote:
>>>> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
>>>> 
>>>> Hey,
>>>> 
>>>> I have no objections; however, I don't know a lot about XML schemas in
>>>> the first place (nor about DTDs, to be honest).
>>> 
>>> Honestly, I know a lot more about XML Schemas than I do about DTDs. As Wikipedia notes at https://en.wikipedia.org/wiki/Document_type_definition, DTDs have largely been superseded by XML Schemas, so at this point I think XML Schemas are the more familiar analogy to use.
>>> 
>>> I think XML Schemas also have more in common with specdata.xml than DTDs do. Schemas use the <element> node with minOccurs and maxOccurs attributes (specdata expresses the same semantics with mandatory and multiple), and both declare an element's type, name and description in a similar way. In fact, I think a semantically equivalent version of specdata.xml could be written as an XML Schema.
>>> 
>>> XML Schemas also offer a few advantages for machine-readable expressions; for instance, an XML Schema can require a value to match a particular pattern (a regular expression).
>>> 
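To make the pattern point concrete, here is a minimal Python sketch of what a pattern check in a validator might look like. The three-letter restriction is an assumed example (modelled on ISO 639-2 codes such as "eng"), not something taken from specdata.xml, and `matches_pattern` is a hypothetical helper name.

```python
import re

# Assumed example restriction, not taken from specdata.xml: a value must be
# exactly three lowercase ASCII letters, like an ISO 639-2 code ("eng").
LANGUAGE_PATTERN = r"[a-z]{3}"


def matches_pattern(value: str, pattern: str) -> bool:
    """Roughly mimic an XML Schema <xs:pattern> facet.

    XSD patterns are implicitly anchored at both ends, so the whole value
    must match; re.fullmatch is the closer analogue than re.search.
    """
    return re.fullmatch(pattern, value) is not None


print(matches_pattern("eng", LANGUAGE_PATTERN))      # True
print(matches_pattern("english", LANGUAGE_PATTERN))  # False
```

The XSD regular-expression dialect is not identical to Python's, so real schema patterns would need translating; for a simple character class like this one the two agree.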
>>>>> I propose the specdata.xml file here
>>>>> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
>>>>> is a good basis for the consideration of an EBML Schema. From what I
>>>>> can see, specdata.xml is an expression of the EBML + Matroska
>>>>> specifications to support automated creation of documentation, but the
>>>>> structure of this already shares a lot of similarity to XML Schemas.
>>>> 
>>>> For both documentation (e.g. the table on the matroska.org specs page is
>>>> generated from this file) and code (libMatroska's class hierarchy is
>>>> generated automatically from this file) actually.
>>> 
>>> Does specdata.xml play a role in mkvalidator? I'm thinking of the potential for an ebmlvalidator, where you could provide an EBML Schema to validate documents of a particular EBML docType.
>>> 
>>>>> Is there a preference in handling the standardization of Matroska:
>>>>> documenting it in a similar fashion to our work on the EBML spec, or
>>>>> defining what an EBML Schema is and treating Matroska as an expression
>>>>> of it?
>>>> 
>>>> I'm not sure whether or not I understand the implications. But my gut
>>>> feeling is that having a definition for an EBML Schema would benefit
>>>> formats other than Matroska, too, so the latter seems the way to
>>>> go.
>>> 
>>> I have the same feeling:
>>> - document EBML as a specification that includes rules for defining a docType in the form of an EBML Schema
>>> - write an EBML Schema (updated specdata.xml) for Matroska and maybe WebM
>>> 
>>>>> Are some changes to specdata.xml acceptable, such as a filename change
>>>>> or renaming the <table> element or some attributes?
>>>> 
>>>> Well, as I said above, specdata.xml is used for generating both
>>>> documentation and code. Both should stay viable. If changes are
>>>> made to it, then the accompanying tools must be updated as well.
>>>> 
>>>>> Neither the current EBML specs nor specdata.xml explicitly describe
>>>>> the hierarchical arrangement of the elements, but it could be
>>>>> inferred from their ordering. For instance, could any level 3 element be
>>>>> a child of any level 2 Master-element? I presume not, but I don't
>>>>> think it's clear anywhere which parent-child relationships are
>>>>> permitted. Possibly specdata.xml and/or the EBML Schema Definition
>>>>> could define the relationship between levels of related elements
>>>>> similar to how an XML Schema (XSD) does.
>>>> 
>>>> So far it is understood that an element not marked as a global element
>>>> must only occur as a child of its parent. Its parent is the last element
>>>> located before the child element in the specdata file with a lower level
>>>> than the child element. Or something like that.
>>> 
>>> This will need some documentation. That's how I've understood the mkv spec as well, but the definition of how an EBML Schema works should be explicit about this.
>> 
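The implicit parent rule described above ("the last element before the child with a lower level") can be sketched in a few lines of Python, which also shows how much a validator has to infer when the hierarchy is not explicit. The element list is a hypothetical excerpt in specdata.xml order, and `infer_parents` is an illustrative name; global elements are deliberately left out because the rule does not apply to them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpt in specdata.xml order: (element name, level).
# Global elements (such as Void or CRC-32) are left out on purpose,
# because the "previous lower level" rule does not apply to them.
ELEMENTS: List[Tuple[str, int]] = [
    ("EBML", 0),
    ("EBMLVersion", 1),
    ("DocType", 1),
    ("Segment", 0),
    ("Info", 1),
    ("TimecodeScale", 2),
    ("Tracks", 1),
    ("TrackEntry", 2),
    ("TrackNumber", 3),
]


def infer_parents(elements: List[Tuple[str, int]]) -> Dict[str, Optional[str]]:
    """Map each element to the closest preceding element with a lower level.

    Elements with no such predecessor (the top level) map to None.
    """
    parents: Dict[str, Optional[str]] = {}
    chain: List[Tuple[str, int]] = []  # current ancestor chain, shallowest first
    for name, level in elements:
        # Anything at the same or a deeper level cannot be this element's parent.
        while chain and chain[-1][1] >= level:
            chain.pop()
        parents[name] = chain[-1][0] if chain else None
        chain.append((name, level))
    return parents


for child, parent in infer_parents(ELEMENTS).items():
    print(f"{child} -> {parent}")
```

With a nested schema this inference step would disappear: each element's parent would simply be the node that contains it.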
>> I created a first draft of Matroska's specdata.xml with nested elements here: https://gist.github.com/dericed/f0a4bb0e7dc635ed1347. The content of the XML is the same, but each definition is moved from element to element/documentation, and elements are nested within elements according to their level and allowed location. I think a nested structure in an EBML Schema would make an element's location clearer than the current rule, which is that an element is a child of the closest preceding element with a lower level value. With nesting, an element is simply a child of its parent element, and the level attribute becomes redundant because the structure already encodes it.
>> 
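A rough Python sketch of the flat-to-nested conversion, using only the standard library's `xml.etree.ElementTree`. The input tuples, the `EBMLSchema` root name and the plain `lang` attribute are assumptions for illustration and may not match the layout of the draft gist; the definitions are placeholders rather than real specdata.xml text.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical flat input loosely modelled on specdata.xml:
# (name, level, definition). Definitions are placeholders, not the real text.
# Global elements are omitted; they would not fit this level-based nesting.
FLAT: List[Tuple[str, int, str]] = [
    ("Segment", 0, "Placeholder definition for Segment."),
    ("Info", 1, "Placeholder definition for Info."),
    ("TimecodeScale", 2, "Placeholder definition for TimecodeScale."),
    ("Tracks", 1, "Placeholder definition for Tracks."),
]


def nest(flat: List[Tuple[str, int, str]], lang: str = "en") -> ET.Element:
    """Turn a level-annotated flat list into nested <element> nodes.

    Each definition moves into an <element>/<documentation> child, and no
    level attribute is written because the nesting now carries that information.
    """
    root = ET.Element("EBMLSchema")  # hypothetical root element name
    stack = [(root, -1)]  # (XML node, level); the root sits above level 0
    for name, level, definition in flat:
        # Close any open elements at the same or a deeper level.
        while stack[-1][1] >= level:
            stack.pop()
        node = ET.SubElement(stack[-1][0], "element", name=name)
        doc = ET.SubElement(node, "documentation", lang=lang)
        doc.text = definition
        stack.append((node, level))
    return root


print(ET.tostring(nest(FLAT), encoding="unicode"))
```

Adding a second `<documentation>` child with another language value would cover translated descriptions. XML Schema's own `xs:documentation` uses `xml:lang` for this; a plain `lang` attribute is used here only to keep the sketch short.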
>> Another advantage of this structure is that it allows the EBML Schema to be better adapted to foreign-language descriptions. As in XML Schema, one could have multiple <documentation> nodes per <element> with different language attributes.
>> 
>> I'd also like to propose changing the EBML Schema attributes mandatory and multiple to their familiar XML Schema counterparts, minOccurs and maxOccurs. Every mandatory="1" would become minOccurs="1", and every multiple="1" would become maxOccurs="unbounded".
>> 
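The attribute mapping can be sketched in Python as follows, together with the occurrence check a validator would then apply. The treatment of absent flags (minOccurs="0", maxOccurs="1") is an assumption of this sketch, and `to_occurs` and `count_allowed` are hypothetical helper names.

```python
from typing import Dict


def to_occurs(attrs: Dict[str, str]) -> Dict[str, str]:
    """Replace specdata-style mandatory/multiple flags with minOccurs/maxOccurs.

    mandatory="1" -> minOccurs="1" and multiple="1" -> maxOccurs="unbounded",
    as proposed. Absent or "0" flags become minOccurs="0" / maxOccurs="1";
    those defaults are an assumption of this sketch.
    """
    out = {k: v for k, v in attrs.items() if k not in ("mandatory", "multiple")}
    out["minOccurs"] = "1" if attrs.get("mandatory") == "1" else "0"
    out["maxOccurs"] = "unbounded" if attrs.get("multiple") == "1" else "1"
    return out


def count_allowed(count: int, occurs: Dict[str, str]) -> bool:
    """Check how many times a child appeared against its occurrence bounds."""
    low = int(occurs["minOccurs"])
    high = occurs["maxOccurs"]
    return count >= low and (high == "unbounded" or count <= int(high))


# Illustrative attribute values, not copied from specdata.xml.
entry = to_occurs({"name": "TrackEntry", "mandatory": "1", "multiple": "1"})
print(entry)  # {'name': 'TrackEntry', 'minOccurs': '1', 'maxOccurs': 'unbounded'}
print(count_allowed(0, entry), count_allowed(3, entry))  # False True
```

Writing minOccurs="0" explicitly for optional elements matters if the schema borrows XML Schema's semantics, because there an omitted minOccurs defaults to 1.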
>> Another idea is that the next version of EBML could add an element for schemaLocation, which would be a URL to the EBML Schema. A Matroska file could then have an EBML header schemaLocation of https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml so that validators could fetch the appropriate schema for validation.
>> 
>> Comments?
>> Dave Rice
> 
> Maybe I'm way too late for this, but: does it really have to be XML?
> It's neither readable, nor inviting to add lots of details to the
> documentation elements.

The basis of the Matroska spec is currently expressed in XML; see https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml. That XML is then used to create the human-readable documentation, and I think it is also used programmatically (in mkvalidator, I believe). So an XML document that defines an EBML document is not a new idea, but I would like to standardize how an EBML Schema should be expressed. I think following the analogy of XML Schema makes sense.
Dave Rice

> _______________________________________________
> Matroska-devel mailing list
> Matroska-devel at lists.matroska.org
> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
> Read Matroska-Devel on GMane: http://dir.gmane.org/gmane.comp.multimedia.matroska.devel

