[Matroska-devel] [RFC] EBML Schema

Steve Lhomme steve.lhomme at free.fr
Wed Dec 8 17:12:43 CET 2004


Hi,

Dean Scarff wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I've put together a schema (tentatively termed "EBML Schema") for a
> schema for describing EBML[1] documents (whew!).  A quick search of
> matroska.devel suggested this was under consideration but hasn't been
> discussed since May.
> 
> Please bear with me, the rest of this message may be as confusing as
> the opening sentence.  The EBML Schema schema is written in RELAX
> NG[2].  It can be easily converted into an XML Schema[3].

I have heard of both, but RELAX NG is still unfamiliar to me. As I 
understand it, a Schema is like a DTD written in XML, which gives more 
ways to describe an XML format (although XML itself is described with a 
DTD...).

> EBML Schema instances can be made valid RELAX NG instances by using a
> namespace prefix for EBML specific elements.  This allows the
> *structure* of EBML files to be represented in an XML format, and
> therefore existing tools for validating XML against a schema (eg
> libxml2) can be leveraged.  It's also nice to provide a human-readable
> alternative representation that stays reasonably true to the EBML

That could be good. So far we have managed to extract information about 
EBML/Matroska files with basic tools that output text files. But 
outputting XML that actually refers to a schema would be nice.

> format.  The biggest problem that I can see is EBML's support of
> multiple root elements (segments) which requires an extra root element
> to wrap them all up.  I kludged the issue by using the ebml signature
> (0x1a45dfa3) as a root element and nesting all segments under that.
> This should be fair, since a new signature really indicates a new
> document.

Well, the only format using EBML right now is Matroska (AFAIK). And the 
Segment is at the same level as the EBML header. You can have multiple 
Segments in the same file and even multiple EBML headers (but in that 
case how to handle the subsequent headers is not yet defined). I hope 
that fits your specs.
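
To illustrate the layout described above, here is a minimal Python 
sketch, assuming that a new document starts wherever the EBML header ID 
(0x1a45dfa3) appears. The name split_on_ebml_header is made up for this 
example. A raw byte scan can give false positives if the same four bytes 
occur inside element data, so a real parser would walk element sizes 
instead.

```python
EBML_HEADER_ID = bytes.fromhex("1a45dfa3")  # ID of the EBML header element


def split_on_ebml_header(data: bytes) -> list[bytes]:
    """Split a byte stream into chunks, each starting at an EBML header ID.

    Hypothetical helper: bytes before the first header are dropped.
    """
    offsets = []
    pos = data.find(EBML_HEADER_ID)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(EBML_HEADER_ID, pos + len(EBML_HEADER_ID))
    # Each chunk runs from one header ID up to the next one (or to the end).
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]
```

Each returned chunk would then hold one EBML header followed by whatever 
Segments come after it, which is roughly the "one signature, one 
document" view.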

> The EBML Schema schema is available at:
> <http://scarff.id.au/ns/EBMLschema/0.2RFC/ebml-schema.rng>
> or in XML Schema:
> <http://scarff.id.au/ns/EBMLschema/0.2RFC/ebml-schema.xsd>

I had a look at both and didn't see any mention of the EBML IDs, nor of 
the names we have given to EBML elements. Is that expected?

> An EBML Schema instance for Matroska is available at:
> <http://scarff.id.au/ns/EBML/Matroska%200.2RFC/matroska.xml>

That one contains the IDs and names :)

So does this mean you have an EBML Schema and create an instance based 
on this Schema? And then you can have an XML file based on that instance?

> Just in case it's still not clear: I've discussed 3 levels of XML
> here.  The EBML Schema schema; instances of EBML Schema; and an XML
> presentation-layer for EBML, which can be validated against instances
> of EBML Schema.

I hope that means the same thing I said above ;)
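
As a rough way to picture the last two levels, here is a small Python 
sketch. It is not the structure of Dean's schema: the ELEMENT_NAMES 
table only stands in for the IDs and names a schema instance would 
carry, and the wrapper name "ebml-file", the "id" attribute and the 
function to_xml are assumptions made for this example. Only the two IDs 
(EBML header and Segment) are real.

```python
import xml.etree.ElementTree as ET

# Stand-in for a schema instance: it supplies the IDs and names.
# Only two well-known IDs are listed; the dict layout is hypothetical.
ELEMENT_NAMES = {
    0x1A45DFA3: "EBML",
    0x18538067: "Segment",
}


def to_xml(element_ids: list[int]) -> str:
    """Stand-in for the XML presentation of a file's top-level elements."""
    # XML allows only one root, so the top-level elements need a wrapper.
    root = ET.Element("ebml-file")  # hypothetical wrapper name
    for element_id in element_ids:
        name = ELEMENT_NAMES.get(element_id, "Unknown")
        child = ET.SubElement(root, name)
        child.set("id", f"0x{element_id:08X}")
    return ET.tostring(root, encoding="unicode")
```

Calling to_xml([0x1A45DFA3, 0x18538067, 0x18538067]) gives one wrapper 
element holding an EBML child and two Segment children, which is the 
kind of file that could then be checked against a schema instance.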

> As yet, I have not written a specification for EBML Schema; but the
> semantics of elements are inherited from [1] and [2], with behaviour
> as one expects.  However, if I receive positive feedback, I can
> certainly knock up a DocBook spec.

That's good work.

I can also ask a friend of mine (Fabio Arciniegas) for some help; he 
wrote some books on XML and (IIRC) worked on RELAX NG.

cya


