[Matroska-devel] Element order in EBML/libebml

Moritz Bunkus moritz at bunkus.org
Tue Oct 21 17:07:01 CEST 2003


Here's my take on the 'does order matter?' issue. It's rather long, but
please read it fully, because I think that this whole business is
basically a non-issue.

1) What do we want?

We want to be able to play a movie (= a Matroska file; the same goes for
audio-only files, I'm just simplifying) with as little effort as
possible. We want to have files that are as small as possible while
retaining as much information about the contents as possible. We want to
have a flexible format.

2) How can that be achieved?

Little effort: This means (among other things) that you should be able
to play the file from front to back without having to seek backwards at
all. This implies having the track information before the first cluster,
and it implies a certain order of data in the file. This also means that
we do NOT want to buffer half of the file just because audio and video
are not interleaved properly.
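
To make the interleaving point concrete, here is a minimal Python sketch
(not taken from any Matroska tool). It assumes a hypothetical list of
(track, timecode) pairs in file order and reports how far apart the tracks
drift while the file is read front to back. The name max_track_skew and the
millisecond values are made up for illustration.

```python
def max_track_skew(blocks):
    """Return the largest timecode gap between tracks while reading front to back.

    `blocks` is a hypothetical list of (track, timecode_ms) pairs in file order.
    A large result means a player must buffer a lot of one track while waiting
    for the other tracks to catch up.
    """
    if not blocks:
        return 0
    start = min(timecode for _, timecode in blocks)
    # Every track sits at the earliest timecode until its first block is read.
    latest = {track: start for track, _ in blocks}
    worst = 0
    for track, timecode in blocks:
        latest[track] = max(latest[track], timecode)
        worst = max(worst, max(latest.values()) - min(latest.values()))
    return worst


interleaved = [("video", 0), ("audio", 0), ("video", 40), ("audio", 40),
               ("video", 80), ("audio", 80)]
video_first = [("video", 0), ("video", 40), ("video", 80),
               ("audio", 0), ("audio", 40), ("audio", 80)]

print(max_track_skew(interleaved))  # 40
print(max_track_skew(video_first))  # 80
```

In the second layout the skew grows with the length of the file, which is
exactly the "buffer half of the file" situation we want to avoid.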

Small files: We don't want to store useless stuff like an 'element order
number' for elements that occur very often.

Flexible format: We based EBML on XML, and Matroska is our DTD (a bit
simplified, but the comparison works in this case). But EBML is neither
equal to XML, nor does it follow the same principles in every way.

3) How do others achieve this?

XML is very flexible. It lets you create your own elements. If you have
a DTD then you have a certain set of elements that can be children of
other elements, may occur multiple times etc. Sounds like Matroska,
doesn't it? Yes, but one important difference is the topic of 'ordering
elements'. For an XML document with a DTD you have a fixed order of
different elements, meaning that
  <title>Stupid White Men</title>
  <author>Michael Moore</author>
and
  <author>Michael Moore</author>
  <title>Stupid White Men</title>
are not the same document! One will comply with the DTD while the other
does not. In Matroska this is DIFFERENT - here the chapters may come
after the clusters or before them.
In XML the order of multiple entities of the same element does NOT
matter. These two documents are the same:
At the moment we have the same behaviour with Matroska.

Other multimedia file formats, however, chose to have an order: the order
in which you put the data into the container does matter and should (or
must?) be kept intact for the demuxer/decoder to handle the data
properly. Fully indexed formats usually don't need this (!) because they
have the index which will tell the block order. Matroska is not a fully
indexed format. However in many cases we already have an implicit order:
Clusters have a cluster timecode, and we agree that blocks should be
stored in coding order (which is basically the timecode if no B frames
are present).
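
A small Python sketch of the 'coding order vs. timecode' remark. The frame
sequences and timecodes are invented for illustration; the point is only
that timecodes rise monotonically in coding order as long as there are no
B frames, and stop doing so once a B frame is stored after the later frame
it references.

```python
def is_monotonic(timecodes):
    """True if the timecodes never decrease from one entry to the next."""
    return all(a <= b for a, b in zip(timecodes, timecodes[1:]))


# (frame type, presentation timecode in ms), listed in coding order.
# Both sequences are hypothetical examples.
without_b = [("I", 0), ("P", 40), ("P", 80), ("P", 120)]
with_b = [("I", 0), ("P", 120), ("B", 40), ("B", 80)]

for name, frames in (("without B frames", without_b), ("with B frames", with_b)):
    timecodes = [timecode for _, timecode in frames]
    print(name, timecodes, is_monotonic(timecodes))

# Sorting by timecode recovers the presentation order from the coding order.
presentation_order = sorted(with_b, key=lambda frame: frame[1])
print(presentation_order)
```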

4) How should we do it?

Just to clarify some words (I copy the definitions from RFC 2119):

1. MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
   definition is an absolute requirement of the specification.
3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

I will use these two words in the following requirements. My goal is to
choose the PRACTICAL solution, not necessarily the MOST ELEGANT or

1. Order of Matroska elements does generally NOT matter. Exceptions:
2. The order of Clusters and BlockGroups DOES matter. Details:
  1. Clusters MUST be ordered by the smallest global timecode of all
     block groups present in it. In most cases this is the cluster
     timecode, but it does not have to be.
  2. The block groups for each track MUST be ordered by their coding
     order.
3. Block groups SHOULD be interleaved sanely. No limitations will be
   given for this, but basically all the information required to play
   back all tracks at timecode X should be in close proximity to each
   other in the file.
4. All level 1 elements SHOULD be easy to find by the demuxer. This
   means that they either are positioned before the first cluster or
   that they can be found by parsing and following meta seek elements
   (which again SHOULD be easy to find).
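
Requirement 2.1 above can be sketched as a simple check. This is a rough
Python illustration, not libebml code: the Cluster and BlockGroup classes
and the function clusters_in_order are hypothetical, and the sketch assumes
that each block stores its timecode relative to the cluster timecode, so
the global timecode is the sum of the two.

```python
from dataclasses import dataclass, field


@dataclass
class BlockGroup:
    track: int
    relative_timecode: int  # offset from the cluster timecode, may be negative


@dataclass
class Cluster:
    timecode: int
    block_groups: list = field(default_factory=list)

    def smallest_global_timecode(self):
        # Usually equal to the cluster timecode, but it does not have to be.
        return min(self.timecode + bg.relative_timecode
                   for bg in self.block_groups)


def clusters_in_order(clusters):
    """Check that clusters are sorted by their smallest global timecode.

    Empty clusters are skipped because they have no block groups to compare.
    """
    keys = [c.smallest_global_timecode() for c in clusters if c.block_groups]
    return all(a <= b for a, b in zip(keys, keys[1:]))


clusters = [
    Cluster(0, [BlockGroup(1, 0), BlockGroup(2, 10)]),
    # Smallest global timecode here is 980, not the cluster timecode 1000.
    Cluster(1000, [BlockGroup(1, -20), BlockGroup(2, 0)]),
    Cluster(2000, [BlockGroup(1, 0)]),
]

print(clusters_in_order(clusters))                  # True
print(clusters_in_order(list(reversed(clusters))))  # False
```

Note that nothing is stored per block group to make this check possible;
the order is derived from timecodes that are in the file anyway.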

What this means is that what we already do is the Right Way to do
things. We should just formalize it. We should not have to add an
element to each block group describing its position in the Great Scheme
of Things. It would create a HUGE overhead, and it would make us the
laughing stock of the whole multimedia industry.

 ==> Ciao, Mosu (Moritz Bunkus)
