[Matroska-devel] Matroska Roundup

Steve Lhomme steve.lhomme at free.fr
Sat Sep 3 15:44:43 CEST 2005

As I have to write a summary of what I "invented" prior to joining DivX, 
I thought it would be nice to share it as a short, technical introduction 
to what Matroska is.

1) EBML : an XML inspired binary format

* As in XML, there are elements that contain other elements. Each 
element has an ID, a length and a content.
* The content can be either a defined type (signed/unsigned integer, 
float (32/64 bits), UTF-8 string, Date, Binary) or a "master" element 
that contains other elements.
* Dates are in nanoseconds relative to the beginning of the millennium at
* The ID and length are coded using a VLC-like coding (similar to UTF-8 
with bits reversed). The length can be -1 to define an unknown/infinite 
* Each nesting level can contain elements specific to that level, or 
elements that can be used at several levels (global elements)
* Every EBML based format needs a semantic document to be able to 
interpret it (similar to a DTD or schema for XML).
* Every EBML based format starts with an EBML header describing what the 
file/stream contains.
* There are a few predefined elements common to all EBML formats like 
the Void element or the CRC-32 element.
* In the semantic description, each element can have a default value 
that doesn't need to be in the output stream, as this value can be 
deduced by the receiver
* In the semantic description, each element can be marked as mandatory 
and therefore should be present in the output stream, unless it has a 
default value
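
The ID/length coding described above can be sketched in Python. This is a minimal sketch, not a reference implementation: the function names are made up for illustration, and it assumes that the number of leading zero bits in the first byte gives the total width, and that a length whose data bits are all ones is the "-1" (unknown) value.

```python
def read_vint(data, pos=0, keep_marker=False):
    """Decode one variable-length integer starting at data[pos].

    Returns (value, width_in_bytes). With keep_marker=True the width
    marker bit stays in the value, which is how IDs are usually quoted.
    """
    first = data[pos]
    if first == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    width = 1
    mask = 0x80
    while not first & mask:          # count leading zero bits
        width += 1
        mask >>= 1
    if pos + width > len(data):
        raise ValueError("truncated integer")
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1:pos + width]:
        value = (value << 8) | byte
    return value, width


def read_size(data, pos=0):
    """Decode a length; None stands for the unknown/infinite length."""
    value, width = read_vint(data, pos)
    if value == (1 << (7 * width)) - 1:   # all data bits set, i.e. "-1"
        return None, width
    return value, width


def read_element_header(data, pos=0):
    """Return (element_id, content_length, header_width)."""
    element_id, id_width = read_vint(data, pos, keep_marker=True)
    size, size_width = read_size(data, pos + id_width)
    return element_id, size, id_width + size_width
```

For example, the bytes 1A 45 DF A3 84 decode to the ID 0x1A45DFA3 followed by a content length of 4.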

more details on: http://ebml.sourceforge.net/specs/
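
The default-value and mandatory rules from the list above could be applied by a reader roughly as follows. This is only a sketch: the SCHEMA table and its entries are hypothetical and are not copied from the EBML specification.

```python
# Hypothetical semantic description: name -> (mandatory, default value).
SCHEMA = {
    "DocType": (True, None),        # mandatory, no default: must be stored
    "MaxIDLength": (True, 4),       # mandatory, but may be omitted (default)
    "MaxSizeLength": (True, 8),
    "Comment": (False, None),       # optional, no default
}


def apply_semantics(parsed, schema=SCHEMA):
    """Fill in defaults for absent elements and check mandatory ones.

    `parsed` maps element names to the values actually found in the stream.
    """
    result = dict(parsed)
    for name, (mandatory, default) in schema.items():
        if name in result:
            continue
        if default is not None:
            result[name] = default   # deduced by the receiver
        elif mandatory:
            raise ValueError(f"mandatory element {name} is missing")
    return result
```

A stream that only stores DocType is therefore still complete from the reader's point of view, while a stream without DocType is rejected.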

2) Matroska : an EBML based multimedia container

* error-resilience using long EBML IDs for level 0 and level 1 elements, 
Clusters (the Level 1 element containing the data from the codecs), 
timecodes and CRC at any level (usually level 1)
* each "movie" is described in the scope of a Segment (level 0 element)
* Each Segment has a unique ID that should make it unique in the world
* Segments can be "virtually" linked together using hard/medium/soft 
linking:
** hard-linking consists of specifying a next or previous segment ID
** medium-linking uses chapters to specify that a Segment has to be 
played in place of the chapter
** soft-linking uses chapter commands to jump to a location in another 
Segment
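
As an illustration of hard-linking, a player could chain Segments by following the next/previous IDs. The sketch below assumes each Segment is represented as a dict with the hypothetical keys 'uid', 'prev' and 'next'; these are not the names of the actual elements.

```python
def playback_order(segments):
    """Return the Segment IDs in playback order by following 'next' links.

    Playback starts at the first Segment whose 'prev' is absent or refers
    to a Segment that is not available.
    """
    by_uid = {seg["uid"]: seg for seg in segments}
    current = next(
        (seg for seg in segments if seg.get("prev") not in by_uid), None
    )
    order, seen = [], set()
    while current is not None and current["uid"] not in seen:
        order.append(current["uid"])
        seen.add(current["uid"])            # guards against link cycles
        current = by_uid.get(current.get("next"))
    return order
```

The Segments can be given in any order (for instance, as found in a directory); a missing "next" Segment simply ends the chain.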

* Chapters can be nested to create multiple levels and a hierarchy 
between parts of a Segment
* A chapter can have a name in various languages using nested elements
* Tags can "target" either a chapter, a track or a segment directly
* Chapters are part of an edition
* A chapter edition can be marked as ordered, meaning the content is 
played in non-sequential order (following the timecodes set in the chapters)
* Chapters can contain various codec commands. These commands are 
interpreted by a chapter codec during playback (only used for ordered 
* There is a set of defined Tag names/types but any other tag can be used
* Tag values can be in different languages using nested elements
* Video frames that need other frames to be decoded embed the timecodes 
of those references at the container level (this allows more than 2 
references and out-of-order references)
* All codec data are stored in coding order; the timecodes correspond to 
the display order
* Timecodes can be "scaled" on a per-Segment basis when nanosecond 
precision is not required, which saves space at other levels.
* New type of frame referencing (more complete than the usual IPB system)
* Each audio, video or subtitle frame is contained in a Block that is in 
a Cluster. The Cluster has a global timecode and the Blocks have a 
timecode value relative to the cluster.
* Frames in a block can be coupled together using lacing. Lacing consists 
of a header describing the position of each element of the lace and then 
the elements. Lacing types include fixed-size, Xiph-like (similar to ogg) and 
EBML-like (the difference between the previous frame is coded in an 
* A Matroska file can contain an unlimited number of attached files 
described by a name and a MIME type
* Each audio, video, subtitle track can be encrypted separately, either 
the codec init data or the whole track data. Various encryption methods 
are available
* A Segment has to specify when tracks are "silent" during the time 
scope of the Segment
* Video and subtitle tracks can specify the pixel and display 
dimensions, including cropping values
* Audio tracks support SBR-like audio formats (sampling frequency different 
from the one actually specified in the file)
* Seeking through Clusters can be accelerated by the use of Cue points
* Seeking through a whole Segment can be accelerated by the use of the 
Meta Seek that gives the position of some elements in the stream
* Segments can be concatenated into a single file and played 
sequentially and seen as different entities/files by the player
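
The timecode points above (scaled timecodes, Cluster timecode plus relative Block timecode, coding order versus display order) can be put together in a small sketch. The Block type and function names are hypothetical, and the scale of 1,000,000 ns used in the example is just an example value.

```python
from typing import NamedTuple


class Block(NamedTuple):
    relative_timecode: int   # relative to the enclosing Cluster, in scaled units
    payload: bytes


def block_time_ns(cluster_timecode, block, scale_ns):
    """Absolute time of a Block in nanoseconds."""
    return (cluster_timecode + block.relative_timecode) * scale_ns


def display_order(cluster_timecode, blocks, scale_ns):
    """Blocks arrive in coding order; sort by timecode for display order."""
    timed = [
        (block_time_ns(cluster_timecode, b, scale_ns), b.payload)
        for b in blocks
    ]
    return sorted(timed, key=lambda item: item[0])
```

With a scale of 1,000,000 ns, a Cluster timecode of 5000 and a relative Block timecode of 40 give an absolute time of 5.04 seconds.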

more details on: http://www.matroska.org/technical/specs/
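
As an illustration of the Xiph-like lacing mentioned in the list above, here is a minimal sketch of splitting a lace into frames. It assumes the Ogg convention (a size is the sum of a run of bytes, where 255 means "continue"), that the number of frames is already known, and that the size of the last frame is not stored but taken from the remaining data. The function names are made up for illustration.

```python
def read_xiph_size(data, pos):
    """Read one Xiph-style size; returns (size, next_position)."""
    size = 0
    while True:
        byte = data[pos]
        pos += 1
        size += byte
        if byte != 255:          # a byte below 255 ends the size
            return size, pos


def split_xiph_lace(data, frame_count):
    """Split `data` (size header followed by frames) into frames."""
    pos = 0
    sizes = []
    for _ in range(frame_count - 1):
        size, pos = read_xiph_size(data, pos)
        sizes.append(size)
    last = len(data) - pos - sum(sizes)   # last frame takes what is left
    if last < 0:
        raise ValueError("lace sizes exceed the available data")
    sizes.append(last)
    frames = []
    for size in sizes:
        frames.append(data[pos:pos + size])
        pos += size
    return frames
```

A frame of 300 bytes is announced by the two bytes 255 and 45, while a frame of exactly 255 bytes needs 255 followed by 0.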

3) DvdMenuXtractor : DVD extraction tool

DvdMenuXtractor is a graphical tool to extract all the useful data from a 
DVD: video, audio, subs, menu, chapters.
It includes an IFO reader using libdvdread, that generates XML for 
chapters and segments usable by MKVToolnix and AVI-Mux GUI.
It also extracts timecode files for each extracted stream, including the 
gaps found in the VOB streams.
MPEG2 streams are extracted into .m2v files (MPEG ES)
Audio streams are extracted to .mp2, .wav, .ac3, .dts depending on the 
format in the VOB file
Subtitle streams are extracted as .sub files and the corresponding .idx file
Upon extraction a batch file is created for each possible output segment 
(one segment per DVD logical entity) to mux the files into Matroska.
Each extracted stream can be reencoded separately; the timecode and XML 
files remain the same.
DvdMenuXtractor uses wxWidgets.
