[Matroska-devel] Re: EBML

Martin Nilsson matroska at mani.user.lysator.liu.se
Tue Feb 17 04:28:56 CET 2004


Paul Bryson wrote:
> I have spent many hours poring over the ID3v2.4.0 docs.  My greatest regret is
> that they didn't have an HTML version that I could just link directly to tag
> descriptions.

I did an HTML version of the revision before 2.4.0, but it was so 
incredibly time consuming that I really didn't want to go through it 
again. I did however look into building an XML format for RFC-like 
texts, envisioning one XSLT to make the text version and one to make the 
HTML version. As with many projects I didn't get very far, mostly 
because no one else was interested.

> Sometimes it was difficult to understand exactly what a given
> field was for, and in working with the Matroska Tags, I would simply link to
> those specs hoping that others would understand the purpose.

Understandable, but in my opinion wrong. Just include what you really 
have a need for. The way specifications and libraries tend to grow is 
probably one of the more difficult problems to overcome/avoid. You need 
to have an objective look at your specification and question what is 
good and what shouldn't be there, and that is _really_ hard. WRT tagging 
I would suggest that you drop support for AENC, since no one uses it, and 
EQU2, since it is underspecified. Instead there should have been a frame 
with spectral average which could be used as input to a function that 
generated equalization curves. Generally speaking, though, it would all 
be a waste of time, since people really only want volume equalization 
between files (we did some feature research together with mp3.com a 
while back).

Now back to the EBML spec. I've had another go at it and am now probably 
in the last 10% (where 90% of the time is spent...). I've posted the 
latest version at http://www.lysator.liu.se/~mani/ebml/ together with a 
copy of matroska.edtd and an EBML DTD validator. Unfortunately it does 
find some problems with Matroska.

[nilsson at mahoro ebml]$ ./vdtd matroska.edtd
Pass 1. (parser)
matroska.edtd:176: ID "80" is reserved.
Pass 2. (analysis)
matroska.edtd:169: Element name "chapteruid" already used at line 191.
matroska.edtd:68: Element name "trackuid" already used at line 190.
matroska.edtd:26: Element name "duration" already used at line 58.
matroska.edtd:58: Default reference "TrackDuration" does not exist.
matroska.edtd:47: Default reference "TrackDuration" does not exist.

ID 80 (ChapterDisplay) is reserved (all "x" are 0). This is the serious 
one, since fixing it would mean changing the binary format, unless the 
reserved rule is dropped.
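
As a rough illustration of the reserved rule, here is a minimal Python 
sketch that checks whether all "x" bits of an ID are zero. It assumes 
the usual EBML ID layout, in which the position of the first 1 bit in 
the first byte gives the ID length and the remaining bits are the "x" 
bits. The function names are made up for this example and are not taken 
from vdtd.

```python
def id_x_bits(element_id: bytes) -> int:
    """Return the value of the "x" bits of an EBML-style element ID."""
    if not element_id or element_id[0] == 0:
        raise ValueError("missing length marker in first byte")
    # Assumed layout: the position of the first 1 bit gives the ID length
    # in bytes (1xxx xxxx = 1 byte, 01xx xxxx = 2 bytes, and so on).
    length = 9 - element_id[0].bit_length()
    if len(element_id) != length:
        raise ValueError("ID length does not match its length marker")
    # An ID of n bytes carries 7 * n "x" bits; mask away the marker bits.
    mask = (1 << (7 * length)) - 1
    return int.from_bytes(element_id, "big") & mask


def is_reserved_all_zero(element_id: bytes) -> bool:
    """True when every "x" bit is 0, the case reported for ID 80."""
    return id_x_bits(element_id) == 0


if __name__ == "__main__":
    print(is_reserved_all_zero(bytes([0x80])))
    print(is_reserved_all_zero(bytes([0x81])))
```

Under these assumptions the one-byte ID 80 has no "x" bit set and is 
flagged, while 81 is not.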

ChapterUID exists both in Chapters and Tagging. TrackUID exists both in 
Track and Tagging. Duration exists both in Info and Cluster. These three 
are just naming problems, since I decided that element names should be 
unique in a DTD.
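
The uniqueness rule can be sketched as follows. This is only an 
illustration, not the vdtd implementation: it assumes the parser has 
already produced (line, name) pairs and that names are compared 
case-insensitively, as the lowercase names in the output above suggest. 
It reports a clash at the later line, which may differ from the order 
vdtd uses.

```python
def find_duplicate_names(elements):
    """Return (line, name, first_line) for every repeated element name.

    `elements` is a hypothetical list of (line_number, element_name)
    pairs in file order.
    """
    first_seen = {}
    duplicates = []
    for line, name in elements:
        key = name.lower()  # assumption: names are case-insensitive
        if key in first_seen:
            duplicates.append((line, name, first_seen[key]))
        else:
            first_seen[key] = line
    return duplicates


if __name__ == "__main__":
    # Line numbers borrowed from the validator output, for illustration.
    sample = [(26, "Duration"), (58, "Duration"), (68, "TrackUID")]
    for line, name, first in find_duplicate_names(sample):
        print(f'{line}: Element name "{name}" already used at line {first}.')
```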

Finally, the element referenced as the default value for BlockDuration 
and Duration (in TimeSlice), TrackDuration, doesn't exist. Did you 
rename it?


About the document:

I've changed the ID semantics as Steve suggested and adjusted the total 
number of IDs per class (hopefully to correct values). I'm almost done 
with the DTD section and am about to begin the standard elements 
section. Some questions here.

Should SignatureSlot really be part of the EBML core standard?

Isn't CRC-32 better implemented as a container element? Then the decoder 
will know when it must calculate the CRC-32, and the start and end of 
the checked data are obvious. I also think the requirement for a CRC-32 
element in every level 1 element should be dropped. It isn't even met in 
all the files I've found to experiment with.
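
To make the container idea concrete, here is a small Python sketch. The 
layout is purely hypothetical (a 4-byte little-endian checksum followed 
by the protected bytes), and it uses the common CRC-32 from zlib as a 
stand-in. Nothing here is taken from the EBML draft itself.

```python
import zlib


def wrap_with_crc32(payload: bytes) -> bytes:
    """Build the body of a hypothetical CRC-32 container element."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    # Assumed layout: checksum first, then exactly the bytes it covers.
    return checksum.to_bytes(4, "little") + payload


def unwrap_and_verify(body: bytes) -> bytes:
    """Return the protected payload, or raise if the checksum is wrong."""
    if len(body) < 4:
        raise ValueError("container body too short for a CRC-32")
    stored = int.from_bytes(body[:4], "little")
    payload = body[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("CRC-32 mismatch")
    return payload


if __name__ == "__main__":
    body = wrap_with_crc32(b"child elements go here")
    print(unwrap_and_verify(body))
```

Because the checksum covers the whole container body, the decoder needs 
no extra rule to find where the checked region starts and ends.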

Should we drop the string type? It is a complete subset of the utf8 type 
and since I've defined ranges on utf8 strings to limit the range of 
every byte, you can still define an element to have the same practical 
meaning.
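
A minimal sketch of the idea in Python: a utf8 element whose bytes are 
all limited to a range behaves like the old string type. The bounds 
0x20..0x7E (printable ASCII) are only an example picked for this 
illustration, and the function name is hypothetical.

```python
def within_byte_range(text: str, low: int = 0x20, high: int = 0x7E) -> bool:
    """True if every byte of the UTF-8 encoding lies in [low, high]."""
    return all(low <= byte <= high for byte in text.encode("utf-8"))


if __name__ == "__main__":
    print(within_byte_range("Matroska"))
    print(within_byte_range("Blå"))
```

Any character outside ASCII encodes to bytes of 0x80 or above, so it is 
rejected by this range.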

/Martin Nilsson




