[matroska-devel] Re: Matroska/EBML specs ??

Steve Lhomme steve.lhomme at free.fr
Thu Feb 20 15:24:13 CET 2003

> Hi,

> Can anyone point to a reference where I can find the "exact" specs of
> ebml..

For EBML :
Which is similar to what is in the matroska specs :
(read directly from the CVS).

> I have been through the ones given on websites but I still find that
> someway they don't specify the BNF-type format.. one which would specify the

What do you call the BNF format ?

> grammar.. I think that the matroska specs are still being worked on and
> maybe EBML too is not in its final state (if I am wrong please correct me).. But

The matroska specs are modified when problems are found or additions are needed
(the latest change was about cue handling). I would call them stable, since the
recent changes have only been minor (only things not already coded).
The EBML part has not changed for a long time (considering the age of matroska),
and there are no plans to change it, apart from additions to the EBML header
(adding something like a DTD to describe the file that follows).

> by exact, I mean something as clear and precise as the WBXML on
> http://www.w3.org/TR/wbxml/ or better still the Binary XML Content Format
> Specification (WAP-192-WBXML-20010725-a) at
> http://www.wapforum.org/what/technical.htm ?

Mmm, I see a bit what you call BNF. But I'm not sure this notation would be much
better. EBML is a very basic/minimal format. I think the EBML specs page covers
all the technical aspects of the format. But if something is missing (any
questions?) I'll gladly add it to the specs.

> The reason why I am asking this is basically that the clearer the exact
> specs of EBML are to us, the faster we can get the parser in place..
> Okay, as a first cut, the steps I have in mind are..
> 1. Successful build of the existing code. - Done
> 2. Reading the headers of a dummy matroska file using the existing
> test sources. - Done
> 3. Understand EBML in theory and hand-code a small XML to EBML. -
> Almost done (that's why this mail)

Do you plan to make an EBML<->XML converter? That could be interesting. The
problem is that XML is not suited for binary data, so "binary" EBML elements
will have to be converted to Base64 (MIME) or be put in an external file.
Anyway, that's a very good project we've been thinking about already. It will
open a lot of doors for EBML processing/generating. Maybe some XSLT code would
be the preferred form, as it's a standard related to XML.
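
To make the binary-element point concrete, here is a minimal sketch of the
Base64 step such a converter would need before putting a payload into XML
text. The function name base64_encode is only illustrative, and nothing here
comes from existing matroska code.

```c
#include <assert.h>
#include <stddef.h>

/* Encode len bytes from src as Base64 into dst and NUL-terminate it.
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the NUL.
 * (Hypothetical helper for illustration only.) */
static size_t base64_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    /* Full 3-byte groups become 4 output characters. */
    while (i + 2 < len) {
        unsigned long v = ((unsigned long)src[i] << 16)
                        | ((unsigned long)src[i + 1] << 8)
                        | (unsigned long)src[i + 2];
        dst[o++] = tbl[(v >> 18) & 63];
        dst[o++] = tbl[(v >> 12) & 63];
        dst[o++] = tbl[(v >> 6) & 63];
        dst[o++] = tbl[v & 63];
        i += 3;
    }

    /* One or two leftover bytes are padded with '='. */
    if (i < len) {
        unsigned long v = (unsigned long)src[i] << 16;
        int two = (i + 1 < len);
        if (two)
            v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = tbl[(v >> 18) & 63];
        dst[o++] = tbl[(v >> 12) & 63];
        dst[o++] = two ? tbl[(v >> 6) & 63] : '=';
        dst[o++] = '=';
    }

    dst[o] = '\0';
    return o;
}
```

The output is about a third larger than the input, which is one reason the
external-file option may be the better choice for large payloads.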

> 4. Write code to parse this hand-coded EBML back to XML. - This code is
> important and will go a long way in helping us write the EBML parser;
> basically I would suspect that our further versions will stem from this
> basic parser code.
> 5. While writing the above parser code, we will invariably get a feel for
> how to go about the bigger project.. that is, in terms of code structure,
> APIs and function calls.
> Moritz had specifically wanted that the parser code should NOT do any file
> I/O.. but for the time being we may be using some file I/O to read some test
> EBML files and see if the parser works correctly.. later we can always knock
> off the I/O layer..

Well, what Moritz meant is that the I/O handling should be "pluggable",
so that platforms/architectures that already handle I/O internally can
plug theirs into your code. In C, a structure of function pointers for things
like open, read, write and close should be a good starting point.
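
As a rough sketch of that idea, the structure could look something like the
following. All names here (ebml_io, mem_ctx, mem_io_open, io_read_exact) are
made up for illustration and are not taken from any existing library. The
"open" step is played by the function that fills in the structure, and the
sample backend reads from a memory buffer, so no file I/O is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pluggable I/O interface: the parser only calls these
 * pointers and never touches files directly. */
typedef struct ebml_io {
    void  *ctx;                                               /* backend state */
    size_t (*read)(void *ctx, void *buf, size_t len);         /* bytes read */
    size_t (*write)(void *ctx, const void *buf, size_t len);  /* bytes written */
    int    (*close)(void *ctx);                               /* 0 on success */
} ebml_io;

/* One possible backend: a fixed-size memory buffer. */
typedef struct mem_ctx {
    unsigned char *data;
    size_t size;  /* capacity of data */
    size_t pos;   /* current read/write offset */
} mem_ctx;

static size_t mem_read(void *ctx, void *buf, size_t len)
{
    mem_ctx *m = ctx;
    size_t left = m->size - m->pos;
    if (len > left)
        len = left;               /* short read at end of buffer */
    memcpy(buf, m->data + m->pos, len);
    m->pos += len;
    return len;
}

static size_t mem_write(void *ctx, const void *buf, size_t len)
{
    mem_ctx *m = ctx;
    size_t left = m->size - m->pos;
    if (len > left)
        len = left;               /* the buffer cannot grow */
    memcpy(m->data + m->pos, buf, len);
    m->pos += len;
    return len;
}

static int mem_close(void *ctx)
{
    (void)ctx;                    /* nothing to release */
    return 0;
}

/* "Open": bind a memory buffer to the generic interface. */
static ebml_io mem_io_open(mem_ctx *m, unsigned char *data, size_t size)
{
    ebml_io io;
    assert(m != NULL && data != NULL);
    m->data = data;
    m->size = size;
    m->pos = 0;
    io.ctx = m;
    io.read = mem_read;
    io.write = mem_write;
    io.close = mem_close;
    return io;
}

/* Parser-side helper: returns 0 if exactly len bytes were read, else -1. */
static int io_read_exact(ebml_io *io, void *buf, size_t len)
{
    return io->read(io->ctx, buf, len) == len ? 0 : -1;
}
```

A stdio backend would provide the same four entry points around fread,
fwrite and fclose, and the parser code would not need to change.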

> As for the parser.. as spyder we are also weighing the difference between
> SAX and tree.. the tree would have worse performance than SAX for big
> EBML docs.. say something around 50 MB sized files.. but some things can
> be better achieved with a SAX parser, using callbacks etc... anyone
> with any ideas.. ??

I'm no SAX expert. I just coded a parser that seemed logical to me. It uses
callbacks, so I assume it's more SAX-like.
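
For what it's worth, a callback-style (SAX-like) walk over EBML elements can
be sketched as below. This is not the existing parser, just an illustration:
the names read_vint, ebml_walk, walk_stats and collect_stats are invented. It
assumes IDs of at most 4 bytes and sizes of at most 8 bytes, works on a
memory buffer, stays at one nesting level, and does not handle
"unknown size" elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per element found; no tree is built (SAX style). */
typedef void (*ebml_element_cb)(uint32_t id, uint64_t size,
                                const unsigned char *payload, void *user);

/* Read one EBML variable-length integer. The number of leading zero bits
 * in the first byte gives the number of extra bytes. IDs keep the length
 * marker bit (keep_marker = 1), sizes drop it (keep_marker = 0).
 * Returns the number of bytes consumed, or 0 on error. */
static size_t read_vint(const unsigned char *p, size_t avail,
                        int keep_marker, uint64_t *out)
{
    size_t len = 1, i;
    unsigned char mask = 0x80;
    uint64_t v;

    if (avail == 0 || p[0] == 0)
        return 0;                 /* empty input, or length above 8 bytes */
    while (!(p[0] & mask)) {
        mask >>= 1;
        len++;
    }
    if (len > avail)
        return 0;                 /* truncated integer */
    v = keep_marker ? (uint64_t)p[0] : (uint64_t)(p[0] & (mask - 1));
    for (i = 1; i < len; i++)
        v = (v << 8) | p[i];
    *out = v;
    return len;
}

/* Walk the elements stored back to back in buf and report each one.
 * Returns the number of elements reported, or -1 on malformed input. */
static int ebml_walk(const unsigned char *buf, size_t len,
                     ebml_element_cb cb, void *user)
{
    size_t pos = 0;
    int count = 0;

    assert(buf != NULL && cb != NULL);
    while (pos < len) {
        uint64_t id, size;
        size_t n = read_vint(buf + pos, len - pos, 1, &id);
        if (n == 0 || n > 4)
            return -1;            /* assumed 4-byte limit on IDs */
        pos += n;
        n = read_vint(buf + pos, len - pos, 0, &size);
        if (n == 0)
            return -1;
        pos += n;
        if (size > len - pos)
            return -1;            /* payload runs past the buffer */
        cb((uint32_t)id, size, buf + pos, user);
        pos += (size_t)size;
        count++;
    }
    return count;
}

/* Example handler: sums payload sizes and remembers the last ID seen. */
typedef struct walk_stats {
    uint64_t total;
    uint32_t last_id;
} walk_stats;

static void collect_stats(uint32_t id, uint64_t size,
                          const unsigned char *payload, void *user)
{
    walk_stats *s = user;
    (void)payload;
    s->total += size;
    s->last_id = id;
}
```

A tree parser could be layered on top of this by having the callback build
nodes, while a 50 MB file can be streamed through the callbacks without ever
being held as a tree.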

> Please take time to point us to the most recent specs of EBML and of
> Matroska that the developer group currently refers to.

See above.
