[Matroska-devel] Re: EBML Namespaces
Paul Bryson
paul at msn.com
Thu Apr 27 19:24:11 CEST 2006
HAESSIG Jean-Christophe wrote:
>> Something without much chance of random collision like:
>> [12][34][56][78]
>>
>> If you want to be able to indicate *which* name space you are
>> changing to, then include a child element that includes the
>> format's DocType:
>> [13][87][65][43]
>
> As you describe it, it seems that the EBML generic elements
> should be valid in all namespaces. I thought it would be
> much cleaner for EBML to have its own namespace, which would
> improve the extensibility of EBML. Otherwise further additions
> of new class-ids used by EBML may collide with older formats.
I'm not sure if I understand this comment. There are global EBML IDs
defined that would be valid for any EBML format. There are 8 for the
EBML header. The CRC-32 and Void elements are another two which I
/think/ are valid for all EBML files. That is 10 altogether. That
shouldn't be too many to keep track of. I am suggesting one more, and
with a 4 byte ID to boot. Between [80] and [1F][FF][FF][FF] there are
more than 500 million different IDs, so it makes a random collision
pretty unlikely, even if someone weren't aware of the pre-existing IDs.
(Although I don't know how you could not be, as you need them to make
the header.)
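
As a rough check on the size of the ID space, here is a small Python
sketch. It assumes the usual EBML rule that an n-byte ID spends n bits
on its length marker, leaving 7*n bits of value, and it ignores the
reserved all-zero and all-one values. The function and variable names
are made up for illustration. The plain numeric span from [80] to
[1F][FF][FF][FF] is about 537 million; counting only byte strings that
are well formed under the marker rule gives about 270 million. Either
way a random collision is unlikely.

```python
# Count EBML Class-IDs of each encoded length (1 to 4 bytes).
# Assumption: an n-byte ID uses n bits for the length marker,
# so 7*n bits remain for the value. Reserved values are ignored.

def ids_of_length(n: int) -> int:
    """Number of distinct n-byte IDs under the marker rule."""
    return 2 ** (7 * n)

per_length = {n: ids_of_length(n) for n in range(1, 5)}
total_well_formed = sum(per_length.values())

# Plain numeric span between [80] and [1F][FF][FF][FF], inclusive.
numeric_span = 0x1FFFFFFF - 0x80 + 1

for n, count in per_length.items():
    print(f"{n}-byte IDs: {count}")
print("well-formed total:", total_well_formed)
print("numeric span:     ", numeric_span)
```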
>> So, format XYZ that wants to contain a Matroska file would
>> contain this:
>> [12][34][56][78] (size)
>> [13][87][65][43] (size) {matroska}
>> [14][87][12][87] (size)
>> [18][53][80][67] (size)
> I think it would lead to too much overhead. For one namespace
> switch you would have at least 10 to 16, if not more extra bytes.
> Formats intensively mixing 3 or 4 vocabularies would yield
> huge files...
The IDs I suggested make at least 16 bytes overhead for embedding
another format (1 byte DocType, though in this case "matroska" is 8
bytes). You could certainly make two of those single byte IDs to drop
it to a 10 byte minimum. I would leave the parent as a long ID to
avoid collision.
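
To make the layout concrete, here is a minimal Python sketch that
builds such a name space switch and measures its overhead. The two IDs
are only the placeholders from this thread, not registered values, and
the helper names (encode_size, element, namespace_switch, overhead) are
invented for the example. It always picks the shortest size encoding,
so the byte counts it prints need not match the figures quoted above,
which may assume wider size fields.

```python
# Placeholder IDs taken from the thread; neither is a registered EBML ID.
NAMESPACE_ID = bytes([0x12, 0x34, 0x56, 0x78])
DOCTYPE_ID = bytes([0x13, 0x87, 0x65, 0x43])


def encode_size(value: int) -> bytes:
    """Encode an element data size as the shortest EBML variable-length integer."""
    for length in range(1, 9):
        # The all-ones value of each length is reserved for "unknown size".
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)
            return (marker | value).to_bytes(length, "big")
    raise ValueError("size too large for an 8-byte EBML size field")


def element(elem_id: bytes, payload: bytes) -> bytes:
    """Serialize one element: ID, data size, then payload."""
    return elem_id + encode_size(len(payload)) + payload


def namespace_switch(doctype: str, body: bytes) -> bytes:
    """Wrap body in the proposed parent element, preceded by its DocType child."""
    child = element(DOCTYPE_ID, doctype.encode("ascii"))
    return element(NAMESPACE_ID, child + body)


def overhead(doctype: str, body_len: int) -> int:
    """Bytes added around a body of body_len bytes by the switch."""
    return len(namespace_switch(doctype, bytes(body_len))) - body_len


print("1-byte DocType, empty body:", overhead("m", 0))
print("'matroska', empty body:    ", overhead("matroska", 0))
print("'matroska', 1000-byte body:", overhead("matroska", 1000))
```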
I would think that if a format is low-bitrate enough for it to matter
and wants to have that many instances of another format, it should just
include all of the sub-format's elements in its own specifications.
For instance, if you wanted to include some Matroska Tags, then even
excluding the overhead for a name space breakout, you would need at
least 25 bytes just to do a single Tag. And that ignores the oddity of
such a task. It would be more efficient to just have a part of the
specs for your format say, "Matroska element 'Tags' and all of its
children are valid at this point."
On the other hand, if you were going to include 20 Matroska Tags, and
had them all grouped together in a single name space switch, you would
be storing maybe 1000 bytes of data, of which the 23 bytes to change
the name space would be pretty insignificant.
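
The proportions can be worked out with a few lines of Python. This
sketch just reuses the rough figures from this message (23 bytes for
the switch, 25 bytes for a lone Tag, about 1000 bytes for 20 Tags);
the function name is made up for the example.

```python
SWITCH_BYTES = 23  # name space switch cost quoted in this message


def overhead_fraction(payload_bytes: int, switch_bytes: int = SWITCH_BYTES) -> float:
    """Share of the total bytes that is spent on the name space switch."""
    return switch_bytes / (payload_bytes + switch_bytes)


single_tag = overhead_fraction(25)      # one minimal Tag on its own
grouped_tags = overhead_fraction(1000)  # about 20 Tags behind one switch

print(f"single Tag:   {single_tag:.1%} of the bytes are switch overhead")
print(f"grouped Tags: {grouped_tags:.1%} of the bytes are switch overhead")
```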
Atamido