[Matroska-general] EBML questions

Josh Green josh at resonance.org
Sun May 13 15:48:19 CEST 2007

On Sat, 2007-05-12 at 21:03 +0200, Steve Lhomme wrote:
> Josh Green wrote:
> > Hello, I'm a developer of a format called CRAM which is used for
> > compressing files containing audio and binary (such as MIDI audio
> > instruments, etc).  This format isn't really in wide use yet, but we're
> > currently polishing things up to advertise it more as an alternative to
> > some other proprietary means of instrument compression.
> Hi, sounds like a good idea :)

I think so too.  It can be a real bummer when a bunch of free instrument
files are left compressed with some proprietary format that is less
accessible on other operating systems, or whose vendor goes out of
business (sfpack, for example).

> Well, the rest of the email doesn't have questions, but I'll answer them 
> anyway ;)

Yes, I realized they weren't really questions; I mostly wanted to get
clarification that I'm not doing something particularly wrong with
the EBML format ;)

> > This format is targeted more at compression than streaming, although the
> > sample data itself may be streamed (WavPack or FLAC compressed data),
> > this likely won't be across EBML chunk boundaries.  My questions are
> > mainly in regards to developers possibly using other EBML parsing
> > implementations for CRAM.  Would not following the above components of
> > the EBML spec, lead to existing parsers failing to parse CRAM?  Perhaps
> > this is a non-issue anyways, since not knowing the document type of an
> > EBML file isn't very useful (the contents of the chunks are unknown).
> You mean you don't set a DocType ? I think you should do that, because 
> the default value is "matroska" and thus your files will be treated like 
> matroska in many apps.

No, we do set our own DocType, DocTypeVersion and DocTypeReadVersion
fields (what I had meant was that if software isn't familiar with the
DocType, there isn't a way to interpret the contents of custom EBML
chunks).  There will be 3 types: "CRAM" for a lossless compressed file,
"CRAML" for a lossy compressed file and "CRAMC" for the hybrid
correction file (which, when combined with the matching CRAML file,
reproduces the original lossless file).  All that hybrid stuff is thanks
to WavPack :)
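As a rough illustration of the header fields mentioned above, here is a minimal Python sketch that writes an EBML header carrying a CRAM DocType and reads the DocType back. It is not taken from libInstPatch or libebml; the function names, the version value 4 used for DocTypeVersion/DocTypeReadVersion, and the field order are assumptions made for the example. The element IDs are the standard EBML header IDs, and the "matroska" fallback is the default Steve describes.

```python
# Standard EBML header element IDs (length marker bits included).
EBML_ID = 0x1A45DFA3
EBML_VERSION_ID = 0x4286
EBML_READ_VERSION_ID = 0x42F7
DOCTYPE_ID = 0x4282
DOCTYPE_VERSION_ID = 0x4287
DOCTYPE_READ_VERSION_ID = 0x4285

# The three DocTypes named in the message.
CRAM_DOCTYPES = {"CRAM": "lossless", "CRAML": "lossy", "CRAMC": "hybrid correction"}

def vint(n):
    """Encode an element size as an EBML variable-length integer."""
    for width in range(1, 9):
        # The all-ones payload is reserved for "unknown size".
        if n < (1 << (7 * width)) - 1:
            return ((1 << (7 * width)) | n).to_bytes(width, "big")
    raise ValueError("size too large")

def uint(value):
    """Encode an unsigned integer payload as plain big-endian bytes."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")

def element(elem_id, payload):
    """Serialize one element: ID, size, payload."""
    id_bytes = elem_id.to_bytes((elem_id.bit_length() + 7) // 8, "big")
    return id_bytes + vint(len(payload)) + payload

def build_header(doctype, version=4, read_version=4):
    """Build an EBML header; version numbers here are illustrative assumptions."""
    body = (element(EBML_VERSION_ID, uint(1))
            + element(EBML_READ_VERSION_ID, uint(1))
            + element(DOCTYPE_ID, doctype.encode("ascii"))
            + element(DOCTYPE_VERSION_ID, uint(version))
            + element(DOCTYPE_READ_VERSION_ID, uint(read_version)))
    return element(EBML_ID, body)

def read_vint(buf, pos, keep_marker):
    """Read an ID (marker kept) or a size (marker stripped) starting at pos."""
    first = buf[pos]
    if first == 0:
        raise ValueError("fields wider than 8 bytes are not supported")
    width = 9 - first.bit_length()
    value = first if keep_marker else first & ((1 << (8 - width)) - 1)
    for b in buf[pos + 1:pos + width]:
        value = (value << 8) | b
    return value, pos + width

def read_doctype(data):
    """Walk the header children and return the DocType string."""
    elem_id, pos = read_vint(data, 0, True)
    if elem_id != EBML_ID:
        raise ValueError("not an EBML file")
    size, pos = read_vint(data, pos, False)
    end = pos + size
    while pos < end:
        child_id, pos = read_vint(data, pos, True)
        child_size, pos = read_vint(data, pos, False)
        if child_id == DOCTYPE_ID:
            return data[pos:pos + child_size].decode("ascii")
        pos += child_size  # unknown or uninteresting child: skip by size
    return "matroska"  # default when no DocType element is present

header = build_header("CRAML")
print(read_doctype(header), "->", CRAM_DOCTYPES[read_doctype(header)])
```

Note that the reader skips children it does not care about purely by their size, which is why a generic parser can walk a CRAM file even though it cannot interpret the CRAM-specific chunks.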

I noticed that Matroska files have the DocType as the first field in the
EBML chunk.  I think we should also have our DocType there (currently
it's after the EBMLVersion and EBMLReadVersion), to ease file

> > We are currently breaking backwards compatibility with the CRAM spec due
> > to some other reasons, so I'd like to get things right this time :)
> OK.
> BTW what library do you use to write your files ? libebml ? something 
> you made yourself ?

The current CRAM implementation is part of libInstPatch
(http://libinstpatch.resonance.org).  We have our own routines for
handling the EBML, although it's really not too complicated to begin
with.  It would be good to verify that CRAM is parsable by other
implementations such as libebml.  One of the reasons we are breaking
backwards compatibility is that I had wrongly assumed that all integers
(even values in fields) were stored variable length encoded.  After
looking over a Matroska file with a hex editor, I realized that this
isn't even necessary, since the size can already be inferred from the
EBML chunk.  *So embarrassed*  That of course wouldn't have happened if
I had been using something like libebml.  Fortunately CRAM really isn't
widely (if at all?) used yet.
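The distinction between the two integer encodings can be sketched in a few lines of Python. This is only an illustration assuming the usual EBML element layout (ID, then variable-length size, then payload); the function names are made up for the example and do not come from libInstPatch or libebml. 0x4287 is the standard EBML ID for DocTypeVersion.

```python
def encode_id(elem_id):
    """Element IDs are written as-is; the length marker is part of the ID."""
    return elem_id.to_bytes((elem_id.bit_length() + 7) // 8, "big")

def encode_size(n):
    """Element sizes ARE variable-length encoded: a leading 1 bit marks the width."""
    for width in range(1, 9):
        # The all-ones payload is reserved for "unknown size", hence the "- 1".
        if n < (1 << (7 * width)) - 1:
            return ((1 << (7 * width)) | n).to_bytes(width, "big")
    raise ValueError("size does not fit in an 8-byte EBML size field")

def encode_uint(value):
    """Integer payloads are plain big-endian bytes with no length marker.

    The element's size field already says how many bytes follow.
    This sketch always writes at least one byte.
    """
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")

def decode_uint(payload):
    """Decode an integer payload whose length was given by the size field."""
    return int.from_bytes(payload, "big")

def encode_uint_element(elem_id, value):
    """Serialize a complete unsigned-integer element."""
    payload = encode_uint(value)
    return encode_id(elem_id) + encode_size(len(payload)) + payload

DOCTYPE_VERSION_ID = 0x4287
print(encode_uint_element(DOCTYPE_VERSION_ID, 4).hex())    # 42878104
print(encode_uint_element(DOCTYPE_VERSION_ID, 300).hex())  # 428782012c
```

In the first output, `81` is the variable-length size (one byte of data) and `04` is the value itself, stored without any marker bit.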

Anyone who is interested in instrument files such as SoundFont
and GigaSampler might want to check out one of our side projects:

It's a staging area for CRAM right now, so it's not quite ready for
use yet, but it will be after the pending release of CRAM format version 4.

> Steve

Thank you very much for the responses; they give me some confidence in
the current specification.  Best regards!
