[Matroska-devel] Re: EBML
Martin Nilsson
matroska at mani.user.lysator.liu.se
Tue Feb 17 04:28:56 CET 2004
Paul Bryson wrote:
> I have spent many hours poring over the ID3v2.4.0 docs. My greatest regret is
> that they didn't have an HTML version that I could just link directly to tag
> descriptions.
I did an HTML version of the revision before 2.4.0, but it was so
incredibly time-consuming that I really didn't want to go through it
again. I did however look into building an XML format for RFC-like
texts, envisioning one XSLT to make the text version and one to make the
HTML version. As with many projects I didn't come too far, mostly
because no one else was interested.
> Sometimes it was difficult to understand exactly what a given
> field was for, and in working with the Matroska Tags, I would simply link to
> those specs hoping that others would understand the purpose.
Understandable, but in my opinion wrong. Just include what you really
have a need for. The way specifications and libraries tend to grow is
probably one of the more difficult problems to overcome/avoid. You need
to have an objective look at your specification and question what is
good and what shouldn't be there, and that is _really_ hard. WRT tagging
I would suggest that you drop support for AENC, since no one uses it, and
EQU2, since it is underspecified. Instead there should have been a frame
with spectral average which could be used as input to a function that
generated equalization curves. Generally speaking though it would all be
a waste of time since people really only want to have volume
equalization between files (We did some feature research together with
mp3.com a while back).
Now back to the EBML spec. I've had another go at it and am now probably
in the last 10% (where 90% of the time is spent...). I've posted the
latest version at http://www.lysator.liu.se/~mani/ebml/ together with a
copy of matroska.edtd and an EBML DTD validator. Unfortunately it does
find some problems with Matroska.
[nilsson at mahoro ebml]$ ./vdtd matroska.edtd
Pass 1. (parser)
matroska.edtd:176: ID "80" is reserved.
Pass 2. (analysis)
matroska.edtd:169: Element name "chapteruid" already used at line 191.
matroska.edtd:68: Element name "trackuid" already used at line 190.
matroska.edtd:26: Element name "duration" already used at line 58.
matroska.edtd:58: Default reference "TrackDuration" does not exist.
matroska.edtd:47: Default reference "TrackDuration" does not exist.
ID 80 (ChapterDisplay) is reserved (all "x" are 0). This is the serious
one, since it would mean changing the binary format, unless the
reserved rule is dropped.
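To make the reserved rule concrete, here is a minimal Python sketch. It is not taken from vdtd. It assumes the usual EBML ID layout, where the position of the first set bit in the first byte gives the ID length and all remaining bits are the "x" bits; the function name is_reserved_id is hypothetical.

```python
def is_reserved_id(raw: bytes) -> bool:
    """Return True if every "x" bit of an EBML ID is 0.

    Assumption: IDs are 1 to 4 bytes long and the first set bit of the
    first byte marks the length (1xxx xxxx, 01xx xxxx xxxx xxxx, ...).
    """
    if not raw or raw[0] == 0:
        raise ValueError("not a valid EBML ID")
    # 0x80 -> 1 byte, 0x40 -> 2 bytes, 0x20 -> 3 bytes, 0x10 -> 4 bytes
    length = 9 - raw[0].bit_length()
    if length > 4 or length != len(raw):
        raise ValueError("ID length does not match its length marker")
    value = int.from_bytes(raw, "big")
    marker = 1 << (7 * length)  # the length marker bit on its own
    # If only the marker bit is set, all "x" bits are 0.
    return value == marker


print(is_reserved_id(bytes.fromhex("80")))        # the ID flagged at line 176
print(is_reserved_id(bytes.fromhex("1a45dfa3")))  # the EBML header ID
```

Under these assumptions the first call reports the ID as reserved and the second does not.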
ChapterUID exists both in Chapters and Tagging. TrackUID exists both in
Track and Tagging. Duration exists both in Info and Cluster. These three
are just naming problems, since I decided that element names should be
unique in a DTD.
Finally the element referenced as default value to BlockDuration and
Duration (in TimeSlice) doesn't exist. Did you rename it?
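The two pass 2 checks can be sketched roughly as follows. This is a minimal Python illustration, not the vdtd implementation: the input is a hypothetical list of (line, name, default reference) tuples rather than parsed EDTD, names are assumed to be compared case-insensitively (the output above prints them in lower case), and the order in which lines are reported may differ from vdtd.

```python
def analyse(elements):
    """Check name uniqueness and default references.

    elements: list of (line, name, default_ref_or_None) tuples.
    This tuple layout is an assumption made for the sketch.
    """
    problems = []
    first_seen = {}  # lower-cased element name -> line of first definition
    for line, name, _ in elements:
        key = name.lower()
        if key in first_seen:
            problems.append(
                f'{line}: Element name "{key}" already used at line {first_seen[key]}.'
            )
        else:
            first_seen[key] = line
    # Second loop: every default reference must name a defined element.
    for line, _, ref in elements:
        if ref is not None and ref.lower() not in first_seen:
            problems.append(f'{line}: Default reference "{ref}" does not exist.')
    return problems


# Hypothetical input loosely modelled on the messages above.
sample = [
    (26, "Duration", None),
    (58, "Duration", "TrackDuration"),
    (68, "TrackUID", None),
]
for problem in analyse(sample):
    print(problem)
```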
About the document:
I've changed the ID semantics as Steve suggested and adjusted the total
number of IDs per class (hopefully to correct values). I'm almost done
with the DTD section and am about to begin the standard elements
section. Some questions here.
Should SignatureSlot really be part of the EBML core standard?
Isn't CRC-32 better implemented as a container element? Then the decoder
will know when it must calculate CRC-32 and the start and end is very
obvious. I also think the requirement for a CRC-32 element in every
level 1 element should be dropped. It isn't even met in all files I've
found to experiment with.
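A rough Python sketch of the container idea, to show why the start and end become obvious to the decoder. The layout used here (a 4-byte little-endian CRC-32 followed by the protected payload) and the function names are assumptions made for illustration only, not something the draft specifies.

```python
import zlib


def wrap_crc32(payload: bytes) -> bytes:
    """Build the body of a hypothetical CRC-32 container element."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return crc.to_bytes(4, "little") + payload


def unwrap_crc32(body: bytes) -> bytes:
    """Verify and return the payload of a hypothetical CRC-32 container.

    The checksum covers exactly the rest of the container body, so the
    decoder needs no extra rule for where the protected range ends.
    """
    if len(body) < 4:
        raise ValueError("container too short for a CRC-32")
    stored = int.from_bytes(body[:4], "little")
    payload = body[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("CRC-32 mismatch")
    return payload


body = wrap_crc32(b"child elements go here")
print(unwrap_crc32(body) == b"child elements go here")
```

A decoder that meets such a container knows it must compute the checksum, and a file without one simply carries no checksum for that part.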
Should we drop the string type? It is a complete subset of the utf8 type
and since I've defined ranges on utf8 strings to limit the range of
every byte, you can still define an element to have the same practical
meaning.
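As an illustration, a byte range on a utf8 element could be checked like this. The sketch is in Python and assumes that the string type means printable ASCII (bytes 0x20 to 0x7E); the function name and the default bounds are hypothetical.

```python
def utf8_in_range(data: bytes, lo: int = 0x20, hi: int = 0x7E) -> bool:
    """True if data is valid UTF-8 and every byte lies in [lo, hi]."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(lo <= b <= hi for b in data)


print(utf8_in_range(b"matroska"))                    # plain ASCII
print(utf8_in_range("\u00e9".encode("utf-8")))       # multi-byte character
```

With the default bounds, any value that passes is also a valid value of the old string type under the stated assumption, which is the sense in which the string type becomes redundant.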
/Martin Nilsson