[Matroska-devel] Re: EBML Namespaces

HAESSIG Jean-Christophe haessije at eps.e-i.com
Fri Apr 28 14:30:31 CEST 2006

> I'm not sure if I understand this comment.  There are global 
> EBML IDs defined that would be valid for any EBML format.  
> There are 8 for the EBML header.  The CRC-32 and VOID 
> elements are another two which I /think/ are valid for all 
> EBML files.  That is 10 altogether.  That shouldn't be too 

Well, I thought adding namespace support would be a good time to
clean this up. I think EBML will evolve and new IDs will be used
for new features.

> many to keep track of.  I am suggesting one more, and with a 

The need to keep track of things is always a hassle, especially
if another solution is available. Namespaces provide isolation
between different vocabularies and EBML is no exception. Within
its own private namespace, EBML could evolve without worrying
about which IDs the other formats have taken.

> 4 byte ID to boot.  Between [80] and [1F][FF][FF][FF] there 
> are more than 500 million different IDs, so it makes a random 
> collision pretty unlikely, even if someone weren't aware of 
> the pre-existing IDs. 

Yes, it's unlikely, until one runs into a collision. Not long ago,
it also seemed nearly impossible that the 4GB memory limit on
32-bit processors would ever be reached. Moreover, I do not think
format writers choose their IDs truly at random; they're humans
after all... If there is a way to avoid such problems completely,
I would rather take it.
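
To put rough numbers on the "random collision" argument, here is a
minimal sketch. It assumes an EBML ID of n bytes (n from 1 to 4)
starts with an n-bit length marker, leaving 7*n value bits, and it
does not exclude any reserved bit patterns. Under that counting the
total comes to roughly 270 million rather than 500 million, since
not every value below [1F][FF][FF][FF] carries a valid length
marker. The names `ids_per_length`, `TOTAL_IDS` and
`collision_probability` are made up for this illustration, and the
estimate assumes uniformly random choices, which is exactly the
assumption questioned above.

```python
import math


def ids_per_length(length):
    """Number of bit patterns for an ID of `length` bytes.

    Assumption: the first `length` bits are the length marker,
    so 7 * length bits remain for the value. Reserved patterns
    are not excluded here.
    """
    return 2 ** (7 * length)


# IDs of 1 to 4 bytes, i.e. [80] up to [1F][FF][FF][FF].
TOTAL_IDS = sum(ids_per_length(n) for n in range(1, 5))


def collision_probability(num_ids, space=TOTAL_IDS):
    """Birthday-style estimate of at least one clash.

    Uses the approximation 1 - exp(-k(k-1) / (2N)) for k IDs
    drawn uniformly at random from a space of N values.
    """
    exponent = -num_ids * (num_ids - 1) / (2.0 * space)
    return 1.0 - math.exp(exponent)


if __name__ == "__main__":
    print("total IDs:", TOTAL_IDS)
    for k in (10, 1000, 20000):
        print(k, "IDs ->", collision_probability(k))
```

Even with uniformly random picks, the estimate climbs quickly once
thousands of IDs are in circulation across formats, passing one half
somewhere near 20,000. Human-chosen IDs cluster, so the real risk
would be higher than this model suggests.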

> The ID's I suggested make at least 16 bytes overhead for 
> embedding another format (1 byte DocType, though in this case 
> "matroska" is 8 bytes).  You could certainly make two of 
> those single byte IDs to drop it to 10 byte minimum.  I would 
> leave the parent as a long ID to avoid collision.

The use of namespaces goes way beyond embedding other formats.
For that sole purpose, your proposal is valid, but it doesn't
really suit other uses such as annotation.

> I would think that if a format is low-bitrate enough for it 
> to matter and wants to have that many instances of another 
> format, it should just include all of the sub-format's 
> elements into its own specifications.

That's not the right way to do it IMO. If you have a very good
multimedia container ;) and you want to add some features to it
(e.g. for a video editing program, adding video data to play
the stream backwards efficiently), you don't want to rewrite it
from scratch. You just have to add new elements in your own
namespace. The whole point is that all the video players for
that format will still be able to play the file and will just
ignore the extra elements from the namespaces unknown to them.
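
The "ignore what you don't know" behaviour can be sketched as
follows. How namespaces would actually be encoded is not settled in
this thread, so the sketch only uses plain EBML element framing (an
ID, a size, then the payload) and treats every ID outside a known
set as foreign. It is a simplified model: it walks a flat sequence
of elements, does not descend into master elements, and does not
handle unknown-size elements. The function names and the element
IDs in the usage note are hypothetical.

```python
def read_vint(buf, pos, keep_marker):
    """Read one variable-length integer starting at buf[pos].

    The number of leading zero bits in the first byte, plus one,
    gives the total length in bytes. IDs keep the marker bit,
    sizes drop it. Returns (value, next_position).
    """
    first = buf[pos]
    length = 1
    mask = 0x80
    while length <= 8 and not (first & mask):
        length += 1
        mask >>= 1
    if length > 8:
        raise ValueError("invalid length marker")
    value = first if keep_marker else first & (mask - 1)
    for byte in buf[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, pos + length


def read_known_elements(buf, known_ids):
    """Collect payloads of known elements, skipping all others."""
    pos = 0
    handled = []
    while pos < len(buf):
        elem_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        if elem_id in known_ids:
            handled.append((elem_id, bytes(buf[pos:pos + size])))
        # Unknown elements are skipped using their declared size.
        pos += size
    return handled
```

For example, given a stream holding a known element [A3], then an
extension element [7F][F0] added by an editor, then another [A3],
a player that only knows [A3] gets both of its elements back and
never looks inside the extension:

```python
stream = bytes([0xA3, 0x82, 0x01, 0x02,
                0x7F, 0xF0, 0x83, 0xAA, 0xBB, 0xCC,
                0xA3, 0x81, 0x09])
elements = read_known_elements(stream, known_ids={0xA3})
```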

> For instance, if I wanted to include some Matroska Tags, 
> excluding the overhead for a name space breakout, you will 
> need at least 25 bytes just to do a single Tag.  And that 
> ignores the oddity of such a task.  It would be more 
> efficient to just have a part of the specs for your format 
> say, "Matroska element 'Tags' and all of its children are 
> valid at this point."

This is correct if your format is designed to embed the
features from the beginning. In that case you don't even need
to use Matroska's IDs. However, namespaces become useful for
extensions that the original design didn't include or that
are not relevant to it. For example, a hypothetical document
format could be defined with EBML. Editors may want to include
revision data, so they can attach to the different elements
some info about who wrote them, when they were written and,
if applicable, how they were altered. This information is
irrelevant to basic document readers and should not be
included in the original format.

This opens up the possibility of writing formats that are not
usable by themselves, but are designed to be embedded as
extensions in other formats.

