[Matroska-devel] EBML Namespaces

Steve Lhomme steve.lhomme at free.fr
Mon Apr 10 12:26:16 CEST 2006


HAESSIG Jean-Christophe wrote:
>> Now the idea of a namespace would mean that the same ID would be
>> used by 2 formats but with a different meaning. But given you set
>> a different namespace in each ID, de facto they have a different
>> ID. So I don't really see how it solves the problem of collision.
> 
> Empirically, yes. But the namespace ID should not be seen as
> part of the Class-ID, since the used namespace ID can virtually

Yes, if it works this way, it's much better, as otherwise it would break 
compatibility with existing files (Matroska at least).

> hold any value, and will probably be different for the same
> vocabulary used in two distinct files. If you had 2 files, each
> one using two namespaces: File A [0 (EBML); 1 (Private NS A)],
> and File B [0 (EBML); 1 (Private NS B)], a suitable program could
> merge them into file C [0 (EBML); 1 (Private NS A); 2 (Private NS B)].
> 
> I think a little confusion has been introduced in Class-ID naming
> because their representation in the specs is the full byte dump,
> so ID [A1] is represented as [A1], while its actual VINT value
> is 21 (hex).

Yes, but it's easier for coders who are looking for a value. Maybe we 
should add the simplified value too.
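A minimal Python sketch of the two representations discussed above, assuming only the standard EBML VINT rule (the position of the first set bit gives the length). The helper names `id_length` and `id_value` are made up for this illustration:

```python
def id_length(first_byte: int) -> int:
    """Length in bytes of an EBML ID, read from its length-marker bit."""
    if not 0 < first_byte <= 0xFF:
        raise ValueError("invalid first byte for an EBML ID")
    # 0x80..0xFF -> 1 byte, 0x40..0x7F -> 2 bytes, ..., 0x01 -> 8 bytes
    return 9 - first_byte.bit_length()


def id_value(raw: bytes) -> int:
    """Return the 'simplified' VINT value of a raw ID, i.e. without the marker."""
    if not raw:
        raise ValueError("empty ID")
    length = id_length(raw[0])
    if len(raw) != length:
        raise ValueError(f"expected {length} byte(s), got {len(raw)}")
    # The length-marker bit sits just above the 7*length value bits.
    marker = 1 << (7 * length)
    return int.from_bytes(raw, "big") & (marker - 1)


if __name__ == "__main__":
    for spec_form in ("A1", "1A45DFA3"):
        raw = bytes.fromhex(spec_form)
        print(f"[{spec_form}] -> value 0x{id_value(raw):X}")
```

[A1] gives 0x21, as in the quoted example, and the EBML header ID [1A 45 DF A3] gives 0xA45DFA3.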

> After realizing that the size descriptor is not part of the class
> ID value, we can introduce another object that will not be counted
> as part of the class ID: the namespace ID.

Something like EbmlString, EbmlUInt, EbmlMaster, etc.? That's an option. 
The problem I see is how to mix elements of different namespaces.

Or it could just work like the C++ directive:
using namespace xyz;

so that all IDs from xyz are recognised. That implies that no two 
namespaces used in a file may have overlapping IDs (otherwise the IDs 
would collide).
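A sketch of what a `using namespace`-style rule implies for a reader, assuming each namespace is just a table from ID value to element name (a hypothetical representation, not something EBML defines). The reader merges the tables and must reject any ID claimed by two namespaces:

```python
from typing import Dict, Tuple

# Hypothetical: a namespace is a mapping from ID value to element name.
Namespace = Dict[int, str]


def use_namespaces(*spaces: Tuple[str, Namespace]) -> Dict[int, Tuple[str, str]]:
    """Merge several namespaces into one lookup table, refusing overlaps."""
    table: Dict[int, Tuple[str, str]] = {}
    for ns_name, elements in spaces:
        for element_id, element_name in elements.items():
            if element_id in table:
                other_ns, other_name = table[element_id]
                raise ValueError(
                    f"ID 0x{element_id:X} collides: {other_ns}.{other_name} "
                    f"vs {ns_name}.{element_name}"
                )
            table[element_id] = (ns_name, element_name)
    return table


EBML: Namespace = {0x0A45DFA3: "EBML", 0x0286: "EBMLVersion"}
XYZ: Namespace = {0x21: "Foo"}  # hypothetical private namespace
ABC: Namespace = {0x21: "Bar"}  # another one, reusing the same value

if __name__ == "__main__":
    print(use_namespaces(("ebml", EBML), ("xyz", XYZ)))
    try:
        use_namespaces(("ebml", EBML), ("xyz", XYZ), ("abc", ABC))
    except ValueError as err:
        print("rejected:", err)
```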

> For example, if the namespace ID width is 3, the ID represented
> as [81] would have VINT value 1, namespace 0. The same ID in
> namespace 1 would read [91] and [F1] in namespace 7. Notice that
> only the representation in the byte stream changes, not the real
> value of the ID.

This is a good solution. But it probably wouldn't work with Matroska 
(the only known format to use EBML so far), because the bit(s) used to 
mark the namespace are probably already used by some IDs. Also, the 
limit of 3 (or 2 or 5) bits is arbitrary and doesn't meet EBML's goal of 
being a format with no limits. (The only limits we have are Matroska 
legacy, but we could evolve EBML independently of Matroska too.)
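The bit layout from the quoted proposal can be sketched in Python as below. It assumes, as in the example, that the namespace occupies the top `ns_width` bits right after the length marker; `split_id` and `join_id` are illustrative names, and the sketch ignores EBML's reserved ID patterns:

```python
from typing import Tuple


def _id_length(first_byte: int) -> int:
    if not 0 < first_byte <= 0xFF:
        raise ValueError("invalid first byte for an EBML ID")
    return 9 - first_byte.bit_length()


def split_id(raw: bytes, ns_width: int) -> Tuple[int, int]:
    """Return (namespace, value) for a raw ID under the proposed layout."""
    length = _id_length(raw[0])
    if len(raw) != length:
        raise ValueError(f"expected {length} byte(s), got {len(raw)}")
    payload_bits = 7 * length
    value_bits = payload_bits - ns_width
    if value_bits < 0:
        raise ValueError("namespace wider than the ID payload")
    payload = int.from_bytes(raw, "big") & ((1 << payload_bits) - 1)
    return payload >> value_bits, payload & ((1 << value_bits) - 1)


def join_id(namespace: int, value: int, ns_width: int, length: int) -> bytes:
    """Build the raw bytes of an ID from a namespace and a value."""
    payload_bits = 7 * length
    value_bits = payload_bits - ns_width
    if value_bits < 0 or not 0 <= namespace < (1 << ns_width):
        raise ValueError("namespace does not fit in ns_width bits")
    if not 0 <= value < (1 << value_bits):
        raise ValueError("value does not fit in the remaining bits")
    # Length marker, then namespace bits, then value bits.
    word = (1 << payload_bits) | (namespace << value_bits) | value
    return word.to_bytes(length, "big")


if __name__ == "__main__":
    for ns in (0, 1, 7):
        print(ns, join_id(ns, 1, ns_width=3, length=1).hex().upper())
    # A legacy Matroska ID read with a 3-bit namespace width:
    print(split_id(bytes.fromhex("A3"), ns_width=3))
```

With `ns_width=3` this reproduces [81], [91] and [F1] from the quote. The last line illustrates the objection: Matroska's SimpleBlock ID [A3] would be read as namespace 2, value 3, whereas `ns_width=0` (a possible default for legacy files) keeps every existing ID in namespace 0.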

>> Yes, but you still need to map, at the lowest level, the 
>> namespaces for each upper level reader.
> 
> Of course. This job will be done by new elements in the EBML
> namespace (hereafter NSDE -- namespace declaration elements).
> I proposed value 0 for the namespace of EBML elements, mainly for
> convenience.
> We need one element to set the namespace width and one container

Setting the namespace width might be a good idea, because older formats 
(like Matroska) could set it to 0 (the default value of the new element 
in the EBML header), while new files could use more bits of the IDs for 
the namespace.

In that case each namespace could use a custom (file-specific) ID and be 
defined by a URL (string ID) or an EBML DTD.

> element to declare a namespace: this element must contain a
> sub-element to set the namespace value, and a sub-element to
> associate a namespace key with it (the only thing that needs to
> be globally unique across formats).
> 
> The trickiest problem I can see is deciding in what scope a namespace
> is active. The cleanest rule would be (just as in XML): an NSDE
> controls the namespace of its parent and its parent's children (the
> NSDE itself included), but this would be harder to implement
> because it requires forward-checking to decide which namespace
> the current element belongs to. Happily, we are allowed to restrict
> where NSDEs can be used in elements, if need be.

Given that we extend the ID size (to include the namespace of each ID), 
I don't see a problem here. The scope applies to the ID itself.
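Because namespace numbers are file-local and only the declared key (for example a URL) is global, merging files means renumbering. A minimal Python sketch of the File A + File B -> File C example from earlier in the thread, assuming a namespace table is a plain dict from local number to key; the URLs and the `merge_tables` name are made up:

```python
from typing import Dict, Tuple

# Hypothetical model of the declarations found in one file:
# file-local namespace number -> globally unique key (e.g. a URL).
NamespaceTable = Dict[int, str]

EBML_KEY = "urn:example:ebml"  # placeholder key for namespace 0


def merge_tables(a: NamespaceTable, b: NamespaceTable) -> Tuple[NamespaceTable, Dict[int, int]]:
    """Merge b's declarations into a's.

    Returns the merged table and a map from b's old numbers to new ones,
    which a writer would use to rewrite the namespace bits of b's IDs."""
    merged = dict(a)
    number_of = {key: num for num, key in a.items()}
    remap_b: Dict[int, int] = {}
    next_free = max(merged, default=-1) + 1
    for num, key in sorted(b.items()):
        if key not in number_of:  # new namespace: give it the next free number
            number_of[key] = next_free
            merged[next_free] = key
            next_free += 1
        remap_b[num] = number_of[key]
    return merged, remap_b


FILE_A = {0: EBML_KEY, 1: "http://example.com/ns-a"}
FILE_B = {0: EBML_KEY, 1: "http://example.org/ns-b"}

if __name__ == "__main__":
    file_c, remap = merge_tables(FILE_A, FILE_B)
    print(file_c)
    print(remap)
```

The sketch does not check that the new numbers still fit in the file's namespace width; a real writer would have to widen it or refuse the merge.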

> Another scoping rule can be "following-siblings", where an NSDE changes
> the NS rules for the next elements and their children. It is technically
> correct and easy to implement, but for the moment I dislike it, though I
> can't tell why...
> 
> A third option is to allow NSDEs only near the beginning
> of the file and make the rules global to the whole EBML file, but this
> is rather gory.
> 
>> Sure. Maybe I didn't get your solution right. But I'm glad 
>> someone is trying to extend EBML. The main missing feature 
>> for the moment is the inability at the lower level to know whether 
>> an element is an EBMLMaster or not. 
>> So it's impossible to display a map of an EBML document 
>> without knowing the semantics.
> 
> I've had some success with that, but only partial, which means my
> solution cannot be used in real programs (see the attached Python
> script -- GTK2 libs needed). The idea is to always parse the content
> of elements, be they master or not. The data in the element is
> searched for sub-elements. If the length of the found sub-elements
> overflows the parent, then parsing is cancelled and the data returns
> to raw status.

Yeah, there are too many possible false alarms. That's not a reliable 
solution.
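For reference, the heuristic from the quoted message (try to parse every payload as children, fall back to raw data if a child overflows) fits in a few lines of Python. This is an illustrative sketch, not the attached script; the last example shows the kind of false alarm meant here, where an arbitrary 32-bit value happens to parse as one child element:

```python
from typing import List, Optional, Tuple


def read_vint(data: bytes, pos: int, keep_marker: bool) -> Optional[Tuple[int, int]]:
    """Read a VINT at pos; return (value, next_pos), or None if malformed/truncated."""
    if pos >= len(data) or data[pos] == 0:
        return None
    length = 9 - data[pos].bit_length()
    if pos + length > len(data):
        return None
    value = int.from_bytes(data[pos:pos + length], "big")
    if not keep_marker:  # sizes are read without the marker, IDs keep it
        value &= (1 << (7 * length)) - 1
    return value, pos + length


def try_parse_children(payload: bytes) -> Optional[List[Tuple[int, bytes]]]:
    """Guess whether payload is a sequence of EBML elements.

    Returns [(raw_id, child_payload), ...] if it splits exactly into
    elements, or None (treat as raw data) if anything is malformed or a
    child overflows the parent. Unknown sizes are not special-cased."""
    children = []
    pos = 0
    while pos < len(payload):
        id_part = read_vint(payload, pos, keep_marker=True)
        if id_part is None:
            return None
        element_id, pos = id_part
        size_part = read_vint(payload, pos, keep_marker=False)
        if size_part is None:
            return None
        size, pos = size_part
        if pos + size > len(payload):
            return None  # child overflows the parent: give up
        children.append((element_id, payload[pos:pos + size]))
        pos += size
    return children


if __name__ == "__main__":
    print(try_parse_children(bytes.fromhex("42868101")))  # EBMLVersion = 1
    print(try_parse_children(bytes.fromhex("818502")))    # size 5 overflows
    print(try_parse_children(bytes.fromhex("8182ABCD")))  # raw integer, yet "parses"
```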

> Of course, this doesn't work if the data looks like legitimate EBML
> but in fact isn't. There I can see only one solution: escape it.

No, the data in an EBML element should not be modified because of the 
ID of the element it's in. That would make parsers way too complex. There 
could instead be a rule that all EBML Master IDs have a certain bit set, 
and the others don't. That would mean that one of the ID bits is reserved 
to mark EBML Master elements.

That would break Matroska compatibility, but it could be added in an EBML 
version 2 (and a Matroska v3 bitstream).
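A sketch of the "master bit" rule in Python. Which bit would be used is not specified; here we assume, purely for illustration, that it is the first bit after the length marker. Checking a few existing IDs against that assumed rule shows why it could only come with a new EBML/Matroska version:

```python
def has_master_flag(raw: bytes) -> bool:
    """Hypothetical rule: an ID denotes a master element if the first bit
    after its length marker is set. This is NOT part of EBML as it exists."""
    if not raw or raw[0] == 0:
        raise ValueError("invalid EBML ID")
    length = 9 - raw[0].bit_length()
    if len(raw) != length:
        raise ValueError(f"expected {length} byte(s), got {len(raw)}")
    flag_bit = 1 << (7 * length - 1)  # just below the length-marker bit
    return bool(int.from_bytes(raw, "big") & flag_bit)


if __name__ == "__main__":
    # raw ID -> (name, is it actually a master element in current EBML/Matroska?)
    samples = {
        "1A45DFA3": ("EBML header", True),
        "4286": ("EBMLVersion", False),
        "A0": ("BlockGroup", True),
    }
    for hex_id, (name, is_master) in samples.items():
        flagged = has_master_flag(bytes.fromhex(hex_id))
        print(f"[{hex_id}] {name}: flag={flagged}, actually master={is_master}")
```

BlockGroup [A0] comes out unflagged even though it is a master element, so existing Matroska files would be misread under such a rule.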

> A code that says 'EBML stops here' should be inserted just before
> the raw data that needs it. This job can be done by a normal EBML
> element (with size 0), which is at least 2 bytes long. In my tests
> I didn't encounter many cases where bogus EBML was interpreted, so
> it wouldn't be a problem for terseness. As an added benefit, this
> code could be used as a marker for the end of unbounded (unknown
> size) container elements and would totally relieve the class ID
> from providing hints about the level of that element (which is
> currently the case).
> 
> And last, but not least, to provide real compositing and annotation
> with namespaces, all elements should be allowed to contain
> sub-elements (except those of unbounded size).
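The proposed 'EBML stops here' element could terminate an unknown-size container as sketched below in Python. The marker ID [E5] is an arbitrary placeholder (it may well clash with an existing ID), and the sketch assumes all children inside the container have known sizes:

```python
from typing import List, Tuple

END_MARKER_ID = bytes.fromhex("E5")  # hypothetical placeholder ID; its size must be 0


def _vint_length(first_byte: int) -> int:
    if first_byte == 0:
        raise ValueError("invalid VINT")
    return 9 - first_byte.bit_length()


def read_until_end_marker(data: bytes, pos: int = 0,
                          end_id: bytes = END_MARKER_ID) -> Tuple[List[Tuple[bytes, bytes]], int]:
    """Collect (raw_id, payload) children until the end marker.

    Returns the children and the position just after the marker, where the
    raw (non-EBML) data would start."""
    children: List[Tuple[bytes, bytes]] = []
    while pos < len(data):
        id_len = _vint_length(data[pos])
        raw_id = data[pos:pos + id_len]
        pos += id_len
        if pos >= len(data):
            break  # truncated: no size field
        size_len = _vint_length(data[pos])
        if pos + size_len > len(data):
            break  # truncated size field
        size = int.from_bytes(data[pos:pos + size_len], "big") & ((1 << (7 * size_len)) - 1)
        pos += size_len
        if raw_id == end_id:
            if size != 0:
                raise ValueError("end marker must have size 0")
            return children, pos
        if pos + size > len(data):
            raise ValueError("child overflows the available data")
        children.append((raw_id, data[pos:pos + size]))
        pos += size
    raise ValueError("end marker not found")


if __name__ == "__main__":
    # EBMLVersion=1, EBMLReadVersion=1, end marker, then two bytes of raw data
    data = bytes.fromhex("42 86 81 01 42 F7 81 01 E5 80 00 00")
    children, end = read_until_end_marker(data)
    print(children, data[end:])
```

Nested unknown-size containers would need a recursive version of the same loop.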

-- 
robUx4 on blog <http://robux4.blogspot.com/>


