[Matroska-devel] Re: EBML Namespaces

Paul Bryson paul at msn.com
Thu Apr 27 19:24:11 CEST 2006


HAESSIG Jean-Christophe wrote:
>> Something  without much chance of random collision like:
>> [12][34][56][78]
>>
>> If you want to be able to indicate *which* name space you are 
>> changing to, then include a child element that includes the 
>> format's DocType:
>> [13][87][65][43]
> 
> As you describe it, it seems that the EBML generic elements
> should be valid in all namespaces. I thought it would be
> much cleaner for EBML to have its own namespace, which would
> improve the extensibility of EBML. Otherwise further additions
> of new class-ids used by EBML may collide with older formats.

I'm not sure I understand this comment.  There are global EBML IDs 
defined that would be valid for any EBML format.  There are 8 for the 
EBML header.  The CRC-32 and VOID elements are another two which I 
/think/ are valid for all EBML files.  That is 10 altogether.  That 
shouldn't be too many to keep track of.  I am suggesting one more, and 
with a 4 byte ID to boot.  Between [80] and [1F][FF][FF][FF] there are 
roughly 270 million valid IDs, so it makes a random collision 
pretty unlikely, even if someone weren't aware of the pre-existing IDs. 
(Although I don't know how you could be unaware of them, as you need 
them to make the header.)
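
As a rough check on the size of the ID space, here is a minimal Python 
sketch that counts only well-formed IDs.  It assumes the usual EBML 
layout, where an N byte ID has N-1 leading zero bits, one marker bit and 
7*N free bits, and it ignores the reserved all-zero and all-one 
patterns.  The function name is just for illustration.

```python
# Count well-formed EBML IDs per length class (1 to 4 bytes).
# Assumption: an N-byte ID has N-1 leading zero bits, one marker bit,
# and 7*N free bits; reserved all-zero/all-one patterns are ignored.

def ids_of_length(n_bytes: int) -> int:
    """Number of distinct bit patterns available to an n_bytes-long ID."""
    return 2 ** (7 * n_bytes)

counts = {n: ids_of_length(n) for n in range(1, 5)}
total = sum(counts.values())

for n, c in counts.items():
    print(f"{n}-byte IDs: {c:,}")
print(f"total: {total:,}")
```

Almost all of that space sits in the 4 byte class, so a randomly chosen 
4 byte ID is very unlikely to hit one of the handful of global IDs.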

>> So, format XYZ that wants to contain a Matroska file would 
>> contain this:
>> [12][34][56][78] (size)
>> 	[13][87][65][43] (size) {matroska}
>> 	[14][87][12][87] (size)
>> 		[18][53][80][67] (size)

> I think it would lead to too much overhead. For one namespace
> switch you would have at least 10 to 16, if not more extra bytes.
> Formats intensively mixing 3 or 4 vocabularies would yield
> huge files...

The IDs I suggested make at least 16 bytes of overhead for embedding 
another format (with a 1 byte DocType, though in this case "matroska" 
is 8 bytes).  You could certainly make two of those single byte IDs to 
drop it to a 10 byte minimum.  I would leave the parent as a long ID to 
avoid collision.
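
The byte counts above can be reproduced with a small Python sketch.  It 
assumes every element costs its ID length plus a 1 byte size field, and 
that only the DocType child carries a payload.  The helper name 
switch_overhead is made up for this example.

```python
# Overhead of the proposed namespace switch, in bytes.
# Assumptions: each element costs len(ID) + size_len bytes of header,
# and only the DocType child carries a payload (the DocType string).

def switch_overhead(id_lengths, doctype: str, size_len: int = 1) -> int:
    headers = sum(id_len + size_len for id_len in id_lengths)
    return headers + len(doctype.encode("ascii"))

# Three 4-byte IDs as in the example: parent, DocType child, content child.
print(switch_overhead([4, 4, 4], "x"))         # 16
print(switch_overhead([4, 4, 4], "matroska"))  # 23
# Parent kept long, the two children shortened to 1-byte IDs.
print(switch_overhead([4, 1, 1], "x"))         # 10
```

Larger embedded payloads would need longer size fields, so these are 
lower bounds rather than exact figures.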

I would think that if a format is low-bitrate enough for it to matter 
and wants to have that many instances of another format, it should just 
include all of the sub-format's elements in its own specification.

For instance, if I wanted to include some Matroska Tags, then even 
excluding the overhead for a name space breakout, I would need at least 
25 bytes just to do a single Tag.  And that ignores the oddity of such a 
task.  It would be more efficient to just have a part of the specs for 
your format say, "Matroska element 'Tags' and all of its children are 
valid at this point."

On the other hand, if you were going to include 20 Matroska Tags, and 
had them all grouped together in a single name space switch, you would 
be storing maybe 1000 bytes of data, of which the 23 bytes to change the 
name space would be pretty insignificant.
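
To put the two cases side by side, here is a short Python sketch using 
the figures from this message (a 23 byte switch, roughly 1000 bytes for 
20 Tags).  The 1000 byte payload is only an estimate, and the 
one-switch-per-Tag case is a hypothetical worst case for comparison.

```python
# Share of the file spent on namespace switches, using this post's figures.
SWITCH_BYTES = 23      # one switch with the "matroska" DocType
PAYLOAD_BYTES = 1000   # rough estimate for 20 Matroska Tags
TAG_COUNT = 20

def overhead_fraction(switches: int) -> float:
    """Fraction of total bytes that is switch overhead."""
    overhead = switches * SWITCH_BYTES
    return overhead / (overhead + PAYLOAD_BYTES)

grouped = overhead_fraction(1)          # all Tags under a single switch
per_tag = overhead_fraction(TAG_COUNT)  # hypothetical: one switch per Tag

print(f"one switch for all tags: {grouped:.1%}")
print(f"one switch per tag:      {per_tag:.1%}")
```

Grouping keeps the switch cost to a couple of percent, while switching 
for every Tag would spend close to a third of the bytes on overhead.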


Atamido



