[Matroska-devel] EBML Schema

Steve Lhomme slhomme at matroska.org
Sat Oct 3 15:22:26 CEST 2015


The reason I created the file in the first place was to make sure the code
matches the specs as they are both generated from the same source.

That kinda makes it the spec, albeit not a very readable one.
On Oct 2, 2015 12:30, "wm4" <nfxjfg at googlemail.com> wrote:

> On Thu, 1 Oct 2015 07:10:04 -0400
> Dave Rice <dave at dericed.com> wrote:
>
> >
> >
> > > On Oct 1, 2015, at 5:50 AM, wm4 <nfxjfg at googlemail.com> wrote:
> > >
> > > On Wed, 30 Sep 2015 22:46:09 -0400
> > > Dave Rice <dave at dericed.com> wrote:
> > >
> > >> Hi,
> > >>>> On Aug 28, 2015, at 10:52 AM, Dave Rice <dave at dericed.com> wrote:
> > >>>> On Aug 28, 2015, at 2:50 AM, Moritz Bunkus <moritz at bunkus.org> wrote:
> > >>>>
> > >>>> Hey,
> > >>>>
> > >>>> I have no objections, however I don't know a lot about XML schemas
> > >>>> in the first place (neither about DTDs, to be honest).
> > >>>
> > >>> Honestly, I know a lot more about XML Schemas than I do about DTDs.
> > >>> As Wikipedia mentions at
> > >>> https://en.wikipedia.org/wiki/Document_type_definition
> > >>> DTDs have largely been superseded by XML Schemas. And at this point
> > >>> I think that XML Schemas may be a more familiar analogy to use.
> > >>>
> > >>> I think XML Schemas also have more in common with specdata.xml than
> > >>> DTDs do. Schemas use the <element> node and have maxOccurs and
> > >>> minOccurs attributes (specdata has semantically the same thing with
> > >>> mandatory and multiple), and both have a similar declaration of
> > >>> element type, element name and element description. Actually I think
> > >>> a semantically equivalent version of specdata.xml could be written
> > >>> as an XML Schema.
> > >>>
> > >>> XML Schemas also offer a few advantages for machine-readable
> > >>> expressions; for instance, XML Schemas can mandate a particular
> > >>> pattern or regex for a value.
> > >>>
> > >>>>> I propose the specdata.xml file here
> > >>>>> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> > >>>>> is a good basis for the consideration of an EBML Schema. From what I
> > >>>>> can see, specdata.xml is an expression of the EBML + Matroska
> > >>>>> specifications to support automated creation of documentation, but the
> > >>>>> structure of this already shares a lot of similarity to XML Schemas.
> > >>>>
> > >>>> For both documentation (e.g. the table on the matroska.org specs page is
> > >>>> generated from this file) and code (libMatroska's class hierarchy is
> > >>>> generated automatically from this file) actually.
> > >>>
> > >>> Does specdata.xml play a role in mkvalidator? I'm thinking of the
> > >>> potential to have an ebmlvalidator where you can provide the EBML
> > >>> Schema to validate a particular EBML docType.
> > >>>
> > >>>>> Is there a preference in handling the standardization of Matroska:
> > >>>>> documenting it in a similar fashion to our work in the EBML spec, or
> > >>>>> defining what an EBML Schema is and considering Matroska an
> > >>>>> expression of it?
> > >>>>
> > >>>> I'm not sure whether or not I understand the implications. But my gut
> > >>>> feeling is that having a definition for an EBML Schema would benefit
> > >>>> other formats than Matroska, too, therefore the latter seems the way
> > >>>> to go.
> > >>>
> > >>> I have the same feeling:
> > >>> - document EBML as a specification that includes rules for defining
> > >>>   a docType in the form of an EBML Schema
> > >>> - write an EBML Schema (updated specdata.xml) for Matroska and maybe
> > >>>   WebM
> > >>>
> > >>>>> Are some changes to specdata.xml acceptable? Such as a filename change
> > >>>>> or changing the name of the <table> element or of some attributes?
> > >>>>
> > >>>> Well, like I said above the specdata.xml is used for generating both
> > >>>> documentation and code. Both should stay viable. If changes to it are
> > >>>> made then the accompanying tools must be updated as well.
> > >>>>
> > >>>>> Neither the current EBML specs nor the specdata.xml specifically refer
> > >>>>> to the hierarchical arrangement of the elements, but this could be
> > >>>>> presumed by their ordering. For instance, could any level 3 element be
> > >>>>> a child of any level 2 Master-element? I presume not, but I don't
> > >>>>> think it's clear anywhere what parent-child relationships are
> > >>>>> feasible. Possibly specdata.xml and/or the EBML Schema Definition
> > >>>>> could define the relationship between levels of related elements
> > >>>>> similar to how an XML Schema (XSD) does.
> > >>>>
> > >>>> So far it is understood that an element not marked as a global element
> > >>>> must only occur as a child of its parent. Its parent is the last element
> > >>>> located before the child element in the specdata file with a lower level
> > >>>> than the child element. Or something like that.
> > >>>
> > >>> This will need some documentation. That's how I've understood the
> > >>> mkv spec as well, but the definition for how an EBML Schema works
> > >>> should be explicit about this.
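
The parent rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the thread, not code from spectool or libMatroska: the function name `infer_parents` and the `(name, level)` tuples are made up for the example, and global elements are ignored entirely.

```python
def infer_parents(elements):
    """Map each element name to its parent name (or None) for a flat,
    document-ordered list of (name, level) tuples."""
    parents = {}
    stack = []  # (name, level) of the currently open ancestors
    for name, level in elements:
        # Drop siblings and deeper elements; whatever is left on top is the
        # last preceding element with a lower level than this one.
        while stack and stack[-1][1] >= level:
            stack.pop()
        parents[name] = stack[-1][0] if stack else None
        stack.append((name, level))
    return parents


# A few Matroska element names with the levels they are usually listed at.
flat = [
    ("Segment", 0),
    ("Info", 1),
    ("TimecodeScale", 2),
    ("Tracks", 1),
    ("TrackEntry", 2),
    ("TrackNumber", 3),
]
parents = infer_parents(flat)
for name, parent in parents.items():
    print(f"{name} <- {parent}")
```

Note that the rule as stated only asks for "a lower level", so a level 3 element listed directly after a level 1 element would be accepted as its child. An explicit schema definition would probably want to say whether that is allowed.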
> > >>
> > >> I created a first draft of Matroska's specdata.xml with nested
> > >> elements here:
> > >> https://gist.github.com/dericed/f0a4bb0e7dc635ed1347
> > >> The content of the XML is the same but the definition is moved from
> > >> element to element/documentation. And then elements are nested within
> > >> elements according to their level and allowed location. I think a
> > >> nested structure in an EBML Schema would make the location clearer
> > >> than the current rule, which is that the element is a child of the
> > >> closest previous element with a lower element level value. Now the
> > >> element is simply a child of the parent element. With a structure like
> > >> this the level attribute would be redundant to the element structure.
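
To illustrate why the level attribute becomes redundant with nesting, here is a small Python sketch that recovers the level from nesting depth alone. The XML snippet is a hypothetical, heavily trimmed stand-in and is not copied from the gist; only the `<element>` and `<documentation>` node names are taken from the description above, and the `<table>` root is assumed from the current specdata.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested schema fragment, trimmed to a handful of elements.
NESTED = """<table>
  <element name="Segment">
    <documentation>Root element of the file.</documentation>
    <element name="Info">
      <documentation>General information about the segment.</documentation>
      <element name="Title">
        <documentation>General name of the segment.</documentation>
      </element>
    </element>
    <element name="Tracks">
      <documentation>Describes the tracks.</documentation>
    </element>
  </element>
</table>"""


def walk(node, level=0):
    """Yield (name, level) for every <element>, deriving level from depth."""
    for child in node.findall("element"):
        yield child.get("name"), level
        yield from walk(child, level + 1)


levels = list(walk(ET.fromstring(NESTED)))
for name, level in levels:
    print("  " * level + f"{name} (level {level})")
```

A tool that still needs the numeric level (for generated tables, say) could compute it this way instead of reading it from an attribute.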
> > >>
> > >> Another advantage of this structure is that it allows the EBML
> > >> Schema to be better adapted to foreign language descriptions. Just as
> > >> in XML Schema, one could have multiple <documentation> nodes per
> > >> <element> with different language attributes.
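
A minimal sketch of how a documentation generator might pick a description by language, assuming each `<documentation>` node carries a plain `lang` attribute. That attribute name, the sample texts, and the `describe` helper are all assumptions for the example (XML Schema itself uses `xml:lang`), not something the draft defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one description per language.
ELEMENT = """<element name="Title">
  <documentation lang="en">General name of the segment.</documentation>
  <documentation lang="fr">Nom du segment.</documentation>
</element>"""


def describe(element, lang, fallback="en"):
    """Return the description in `lang`, else in `fallback`, else None."""
    docs = {
        doc.get("lang"): (doc.text or "").strip()
        for doc in element.findall("documentation")
    }
    return docs.get(lang, docs.get(fallback))


title = ET.fromstring(ELEMENT)
print(describe(title, "fr"))
print(describe(title, "de"))  # no German text, so this falls back to English
```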
> > >>
> > >> I'd also like to propose changing the EBML Schema attributes mandatory
> > >> and multiple to their familiar XML Schema counterparts: minOccurs and
> > >> maxOccurs. Here all mandatory="1" would become minOccurs="1" and
> > >> multiple="1" would become maxOccurs="unbounded".
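
The proposed attribute rename could be applied mechanically, roughly as in the sketch below. The two mappings named in the proposal are taken as given; what happens when mandatory or multiple is absent or not "1" is not spelled out in the thread, so emitting minOccurs="0" and maxOccurs="1" in those cases is an assumption, as is the helper name `to_occurs`.

```python
def to_occurs(attrs):
    """Rewrite specdata-style attributes to minOccurs/maxOccurs.

    mandatory="1" -> minOccurs="1" and multiple="1" -> maxOccurs="unbounded"
    follow the proposal; the "0" and "1" values for the other cases are
    assumptions made for this example.
    """
    out = {k: v for k, v in attrs.items() if k not in ("mandatory", "multiple")}
    out["minOccurs"] = "1" if attrs.get("mandatory") == "1" else "0"
    out["maxOccurs"] = "unbounded" if attrs.get("multiple") == "1" else "1"
    return out


segment = to_occurs({"name": "Segment", "mandatory": "1", "multiple": "1"})
title = to_occurs({"name": "Title"})
print(segment)
print(title)
```

Writing the value out explicitly in every case avoids relying on defaults: in XML Schema both minOccurs and maxOccurs default to 1, which is not the same as specdata's unset mandatory.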
> > >>
> > >> Another idea is that the next version of EBML could add an element
> > >> for schemaLocation which would be a URL to the EBML Schema, thus a
> > >> Matroska file could have an EBML header schemaLocation of
> > >> https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> > >> so that validators could pull the appropriate schema for validation.
> > >>
> > >> Comments?
> > >> Dave Rice
> > >
> > > Maybe I'm way too late for this, but: does it really have to be XML?
> > > It's neither readable, nor inviting to add lots of details to the
> > > documentation elements.
> >
> > The basis of the Matroska spec is currently in XML, see
> > https://github.com/Matroska-Org/foundation-source/blob/master/spectool/specdata.xml
> > The XML is then used to create the human-readable documentation. I think
> > the XML is also used programmatically (I think in mkvalidator). So an XML
> > document that defines an EBML document is not a new idea, but I would like
> > to standardize how an EBML Schema should be expressed. I think that
> > following the analogy of XML Schema makes sense.
>
> In my opinion, the existing spec is so vague exactly because nobody
> knew how to edit the spec, or because the spec was hard to edit, or
> because this XML file simply doesn't look very inviting to edit. It
> feels a bit crammed in there, and it's hard to do good text
> formatting. Most real specs are not edited in ad-hoc XML formats.
>
> Having an XML file defining Matroska elements might be useful, but I
> don't understand why it should be the definitive document.