[Matroska-general] EBML questions

Steve Lhomme steve.lhomme at free.fr
Sat May 12 21:03:01 CEST 2007

Josh Green wrote:
> Hello, I'm a developer of a format called CRAM which is used for
> compressing files containing audio and binary (such as MIDI audio
> instruments, etc).  This format isn't really in wide use yet, but we're
> currently polishing things up to advertise it more as an alternative to
> some other proprietary means of instrument compression.

Hi, sounds like a good idea :)

> We chose EBML as the basis for this format and have a couple questions
> concerning some parts of the spec which we're not following.

Well, the rest of the email doesn't have questions, but I'll answer them 
anyway ;)

> The CRAM format is not using CRC32 currently for level 1 chunks.  While
> we are considering using this for the EBML chunk itself, we don't think
> it makes sense to use it for the rest of the compressed data, since MD5
> is used for the compressed audio and binary data (2 signatures for
> uncompressed audio and binary).

CRC32 can be used at all levels and it's never mandatory, so it's fine 
not to use it.
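
If you do add it to the EBML chunk later, here is a minimal Python sketch 
of how it could look. It assumes the usual layout: a CRC-32 element with 
ID 0xBF placed first in its parent, holding the 4-byte checksum in 
little-endian order and covering the rest of the parent's data. The 
function names are made up for illustration.

```python
import struct
import zlib

CRC32_ID = 0xBF      # EBML CRC-32 element ID (1-byte ID)
CRC32_SIZE = 0x84    # coded size: a 1-byte vint meaning "4 bytes follow"


def crc32_element(payload):
    """Build the CRC-32 element that would precede `payload` in its parent."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    # Assumption: the checksum is stored little-endian.
    return bytes([CRC32_ID, CRC32_SIZE]) + struct.pack("<I", crc)


def check_crc32(master_body):
    """Check the body of a master element.

    Returns None if the body does not start with a CRC-32 element
    (it is optional), otherwise True or False.
    """
    if len(master_body) < 6:
        return None
    if master_body[0] != CRC32_ID or master_body[1] != CRC32_SIZE:
        return None
    (stored,) = struct.unpack("<I", master_body[2:6])
    return stored == (zlib.crc32(master_body[6:]) & 0xFFFFFFFF)
```

A reader that finds no CRC-32 element simply skips the check, which is 
why leaving it out does not break anything.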

> We aren't currently using some of the other chunks marked as Mandatory
> in the spec.  For example EBMLMaxIDLength and EBMLMaxSizeLength.

As long as the IDs are no longer than 4 bytes, existing parsers should 
have no problem. Likewise, the EBML coded size should not exceed 8 bytes.
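
To make those two limits concrete, here is a rough Python sketch of how a 
parser might read an element header. It is not taken from libebml; the 
function names are hypothetical, and it does not treat the reserved 
"unknown size" value (all value bits set) specially.

```python
def read_vint(data, pos, max_len, keep_marker):
    """Read one EBML variable-length integer starting at data[pos].

    The number of leading zero bits in the first byte, plus one, gives
    the total length in bytes. Returns (value, length).
    """
    first = data[pos]
    if first == 0:
        raise ValueError("length marker not in first byte (more than 8 bytes)")
    length = 8 - first.bit_length() + 1
    if length > max_len:
        raise ValueError("coded on %d bytes, limit is %d" % (length, max_len))
    if pos + length > len(data):
        raise ValueError("truncated data")
    # IDs keep the length marker bit; sizes have it masked off.
    value = first if keep_marker else first & (0xFF >> length)
    for byte in data[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length


def read_element_header(data, pos=0, max_id_len=4, max_size_len=8):
    """Return (element_id, payload_size, payload_start)."""
    element_id, id_len = read_vint(data, pos, max_id_len, keep_marker=True)
    size, size_len = read_vint(data, pos + id_len, max_size_len, keep_marker=False)
    return element_id, size, pos + id_len + size_len
```

For example, the bytes 1A 45 DF A3 84 give the ID 0x1A45DFA3 (the EBML 
header) with a payload of 4 bytes starting at offset 5.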

In general the EBML specs (like here 
http://www.matroska.org/technical/specs/) define some default values. 
So if you don't set a value in the file (and the value is mandatory), 
a parser can assume the default value instead. The goal is to avoid 
writing obvious values and so save space. So in your case it's perfectly 
fine to rely on the default values.

> This format is targeted more at compression than streaming, although the
> sample data itself may be streamed (WavPack or FLAC compressed data),
> this likely won't be across EBML chunk boundaries.  My questions are
> mainly in regards to developers possibly using other EBML parsing
> implementations for CRAM.  Would not following the above components of
> the EBML spec, lead to existing parsers failing to parse CRAM?  Perhaps
> this is a non-issue anyways, since not knowing the document type of an
> EBML file isn't very useful (the contents of the chunks are unknown).

You mean you don't set a DocType? I think you should do that, because 
the default value is "matroska" and thus your files will be treated like 
Matroska in many apps.
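
The default-value rule, and why a missing DocType matters, can be 
sketched like this in Python. The dict holds only the defaults mentioned 
in this mail; the helper name and the "cram" DocType string are 
hypothetical.

```python
# Defaults a parser would assume for EBML header elements that are
# absent from the file (only the ones discussed in this thread).
EBML_HEADER_DEFAULTS = {
    "EBMLMaxIDLength": 4,
    "EBMLMaxSizeLength": 8,
    "DocType": "matroska",
}


def effective_header(parsed):
    """Merge the elements actually found in the file over the defaults."""
    header = dict(EBML_HEADER_DEFAULTS)
    header.update(parsed)
    return header


# A file that writes nothing looks like Matroska to a generic parser.
print(effective_header({})["DocType"])

# Writing only DocType is enough to identify the format; the two length
# limits still fall back to their defaults.
print(effective_header({"DocType": "cram"}))
```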

> We are currently breaking backwards compatibility with the CRAM spec due
> to some other reasons, so I'd like to get things right this time :)


BTW, what library do you use to write your files? libebml? Something 
you made yourself?

