[Matroska-devel] EBML Schema

Steve Lhomme slhomme at matroska.org
Sat Oct 3 15:30:38 CEST 2015


On Oct 1, 2015 04:47, "Dave Rice" <dave at dericed.com> wrote:
>
> Hi,
>
>> On Aug 28, 2015, at 10:52 AM, Dave Rice <dave at dericed.com> wrote:
>>>
>>> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
>>>
>>> Hey,
>>>
>>> I have no objections, however I don't know a lot about XML schemas in
>>> the first place (neither about DTDs, to be honest).
>>
>>
>> Honestly, I know a lot more about XML Schemas than I do about DTDs. As
>> Wikipedia mentions at https://en.wikipedia.org/wiki/Document_type_definition,
>> DTDs have largely been superseded by XML Schemas. And at this point I think
>> that XML Schemas may be a more familiar analogy to use.
>>
>> I think XML Schemas also share more in common with specdata.xml than
>> DTDs do. Schemas use the <element> node and have maxOccurs and minOccurs
>> attributes (specdata has semantically the same thing with mandatory and
>> multiple), and both have a similar declaration of element type, element
>> name and element description. Actually I think a semantically equivalent
>> version of specdata.xml could be written as an XML Schema.
>>
>> XML Schemas also offer a few advantages for machine-readable
>> expressions; for instance, XML Schemas can mandate a particular pattern or
>> regex for a value.
>>
>>>> I propose the specdata.xml file here
>>>>
https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
>>>> is a good basis for the consideration of an EBML Schema. From what I
>>>> can see, specdata.xml is an expression of the EBML + Matroska
>>>> specifications to support automated creation of documentation, but the
>>>> structure of this already shares a lot of similarity to XML Schemas.
>>>
>>>
>>> For both documentation (e.g. the table on the matroska.org specs page is
>>> generated from this file) and code (libMatroska's class hierarchy is
>>> generated automatically from this file) actually.
>>
>>
>> Does specdata.xml play a role in mkvalidate? I'm thinking of the
>> potential to have an ebmlvalidator where you can provide the EBML Schema to
>> validate a particular EBML docType.
>>
>>>> Is there a preference in handling the standardization of Matroska:
>>>> documenting it in a similar fashion to our work in the EBML spec or to
>>>> define what an EBML Schema is and consider matroska an expression of
>>>> it?
>>>
>>>
>>> I'm not sure whether or not I understand the implications. But my gut
>>> feeling is that having a definition for an EBML Schema would benefit
>>> other formats than Matroska, too, therefore the latter seems the way to
>>> go.
>>
>>
>> I have the same feeling:
>> - document EBML as a specification that includes rules for defining a
>>   docType in the form of an EBML Schema
>> - write an EBML Schema (updated specdata.xml) for Matroska and maybe WebM
>>
>>>> Are some changes to specdata.xml acceptable? Such as a filename change
>>>> or changing the name of the <table> element or of some attributes?
>>>
>>>
>>> Well, like I said above the specdata.xml is used for generating both
>>> documentation and code. Both should stay viable. If changes to it are
>>> made then the accompanying tools must be updated as well.
>>>
>>>> Neither the current EBML specs nor the specdata.xml specifically refer
>>>> to the hierarchical arrangement of the elements, but this could be
>>>> presumed by their ordering. For instance, could any level 3 element be
>>>> a child of any level 2 Master-element? I presume not, but I don't
>>>> think it's clear anywhere what parent-child relationships are
>>>> feasible. Possibly specdata.xml and/or the EBML Schema Definition
>>>> could define the relationship between levels of related elements
>>>> similar to how an XML Schema (XSD) does.
>>>
>>>
>>> So far it is understood that an element not marked as a global element
>>> must only occur as a child of its parent. Its parent is the last element
>>> located before the child element in the specdata file with a lower level
>>> than the child element. Or something like that.
>>
>>
>> This will need some documentation. That's how I've understood the mkv
>> spec as well, but the definition for how an EBML Schema works should be
>> explicit about this.
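
As a rough illustration of the implicit rule described above, here is a minimal Python sketch. The element list is a small hand-picked excerpt in specdata order, and `infer_parents` is a hypothetical helper, not part of any existing tool. It assumes global elements carry a negative level and simply skips them.

```python
def infer_parents(elements):
    """Map each element name to its parent's name (None at the top level).

    `elements` is a list of (name, level) pairs in file order. The parent of
    an element is the closest preceding element with a lower level.
    """
    parents = {}
    stack = []  # (name, level) of the currently "open" ancestors
    for name, level in elements:
        if level < 0:
            # Assumption: a negative level marks a global element,
            # which has no fixed parent.
            continue
        # Drop siblings and deeper elements; what remains on top is the parent.
        while stack and stack[-1][1] >= level:
            stack.pop()
        parents[name] = stack[-1][0] if stack else None
        stack.append((name, level))
    return parents


# Hand-picked excerpt for illustration only.
FLAT = [
    ("EBML", 0), ("EBMLVersion", 1), ("DocType", 1),
    ("Segment", 0), ("SeekHead", 1), ("Seek", 2), ("SeekID", 3),
    ("Info", 1), ("Void", -1),
]
```

Calling `infer_parents(FLAT)` makes the fragility visible: the result depends entirely on the order of the entries, so moving one line in the file silently changes an element's parent.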
>
>
> I created a first draft of Matroska's specdata.xml with nested elements
> here: https://gist.github.com/dericed/f0a4bb0e7dc635ed1347. The content of
> the XML is the same, but the definition is moved from element to
> element/documentation, and elements are nested within elements
> according to their level and allowed location. I think a nested structure
> in an EBML Schema would make the location clearer than the current rule,
> which is that an element is a child of the closest previous element with a
> lower level value. Now the element is simply a child of the parent
> element. With a structure like this the level attribute would be redundant
> with the element structure.

Nested elements are surely better than relying on the value of a previously
parsed element.
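
To make the difference concrete, a flat, level-based list can be turned into a nested tree once, after which the parent is explicit in the structure. This is only a sketch: the `EBMLSchema` wrapper name and the `nest` helper are hypothetical, the sample list is a hand-picked excerpt, and it assumes only non-global elements (level 0 or higher) are passed in.

```python
import xml.etree.ElementTree as ET


def nest(flat):
    """Build a nested <element> tree from (name, level) pairs in file order."""
    root = ET.Element("EBMLSchema")  # hypothetical wrapper element
    stack = [(-1, root)]  # (level, node); the wrapper sits below level 0
    for name, level in flat:
        # Close every open element that is not an ancestor of this one.
        while len(stack) > 1 and stack[-1][0] >= level:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "element", name=name)
        stack.append((level, node))
    return root


# Hand-picked excerpt for illustration only.
SAMPLE = [("Segment", 0), ("SeekHead", 1), ("Seek", 2), ("SeekID", 3), ("Info", 1)]
tree = nest(SAMPLE)
```

With the nested form, a lookup such as `tree.find("./element[@name='Segment']/element[@name='SeekHead']")` states the allowed location directly, with no dependency on what was parsed before.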

The nested elements having a recursive flag still seems a little hackish,
as does the global elements having a -1 value. For example, there could
be a global EBML element for all children, but only starting at level 4.

I suppose there are such elements in HTML. How are they described in XML
Schema?

> Another advantage of this structure is that it allows the EBML Schema to
> be better adapted to foreign-language descriptions. Just as in XML Schema,
> one could have multiple <documentation> nodes per <element> with different
> language attributes.
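
A small sketch of how a tool might pick a description by language under such a layout. The `lang` attribute name, the `describe` helper, and both sample descriptions are invented for illustration; none of them is taken from the actual draft.

```python
import xml.etree.ElementTree as ET

# Invented sample: attribute name and texts are assumptions.
SAMPLE = """<element name="Title">
  <documentation lang="en">General name of the segment.</documentation>
  <documentation lang="fr">Nom général du segment.</documentation>
</element>"""


def describe(element, lang, fallback="en"):
    """Return the documentation text for `lang`, else the fallback language."""
    docs = {
        doc.get("lang"): (doc.text or "").strip()
        for doc in element.findall("documentation")
    }
    return docs.get(lang, docs.get(fallback))


title = ET.fromstring(SAMPLE)
```

Here `describe(title, "fr")` selects the French node, while a language with no node falls back to the English text.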
>
> I'd also like to propose changing the EBML Schema attributes mandatory and
> multiple to their familiar XML Schema counterparts: minOccurs and
> maxOccurs. Here all mandatory="1" would become minOccurs="1" and
> multiple="1" would be maxOccurs="unbounded".
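
The proposed mapping could be sketched as below. `to_occurs` is a hypothetical helper. Only the two "1" cases come from the proposal; the values used when a flag is absent (minOccurs="0", maxOccurs="1") are an assumption. Note that XML Schema itself defaults minOccurs to 1, so an optional element would need minOccurs="0" written out.

```python
def to_occurs(attrs):
    """Translate mandatory/multiple flags into minOccurs/maxOccurs values."""
    # From the proposal: mandatory="1" -> minOccurs="1".
    # Assumption: anything else means the element is optional.
    min_occurs = "1" if attrs.get("mandatory") == "1" else "0"
    # From the proposal: multiple="1" -> maxOccurs="unbounded".
    # Assumption: anything else means at most one occurrence.
    max_occurs = "unbounded" if attrs.get("multiple") == "1" else "1"
    return {"minOccurs": min_occurs, "maxOccurs": max_occurs}
```

For example, `to_occurs({"mandatory": "1"})` yields a required, non-repeating element, and `to_occurs({"multiple": "1"})` an optional, repeatable one.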
>
> Another idea is that the next version of EBML could add an element for
> schemaLocation, which would be a URL to the EBML Schema. A Matroska
> file could then have an EBML header schemaLocation of
> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> so that validators could pull the appropriate schema for validation.
>
> Comments?
> Dave Rice
>
>
> _______________________________________________
> Matroska-devel mailing list
> Matroska-devel at lists.matroska.org
> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
> Read Matroska-Devel on GMane:
> http://dir.gmane.org/gmane.comp.multimedia.matroska.devel