[Matroska-devel] EBML Namespaces

HAESSIG Jean-Christophe haessije at eps.e-i.com
Thu Apr 20 17:52:14 CEST 2006


 
Sorry for the long delay; I couldn't find time to answer
properly, since the subject isn't trivial.

> > would then be encoded in the byte stream as [080A45DFA3], which makes
> > room for 7 bits.
> 
> Sliding by 8 bits to the right should make room for 8 bits.

Except that one of those 8 bits is eaten by the length marker,
because the ID becomes one byte longer.
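
The 7-bit figure can be checked with a short Python sketch. It assumes the usual EBML ID rule, where the first set bit of the first byte is the length marker, and re-encodes the EBML header ID 1A45DFA3 one byte longer: the new byte adds 8 bits, one of which becomes the new marker, leaving 7 free payload bits. The helper names are hypothetical, chosen only for this illustration.

```python
def id_length(first_byte: int) -> int:
    """Length in bytes of an EBML ID, read from its first byte.

    The first set bit is the length marker: 1xxxxxxx is 1 byte,
    01xxxxxx is 2 bytes, ..., 00001xxx is 5 bytes.
    """
    for length in range(1, 9):
        if first_byte & (0x80 >> (length - 1)):
            return length
    raise ValueError("first byte is zero: no length marker")


def id_payload(id_bytes: bytes) -> int:
    """Return the 7 * length bits that follow the length marker."""
    length = id_length(id_bytes[0])
    value = int.from_bytes(id_bytes[:length], "big")
    return value & ((1 << (7 * length)) - 1)


def widen_id(id_bytes: bytes, extra: int = 1) -> bytes:
    """Re-encode an ID `extra` bytes longer, keeping the same payload value."""
    length = id_length(id_bytes[0]) + extra
    marker = 1 << (7 * length)  # marker bit of a `length`-byte ID
    return (marker | id_payload(id_bytes)).to_bytes(length, "big")


ebml_id = bytes.fromhex("1A45DFA3")  # ID of the EBML header element
wide = widen_id(ebml_id)
print(wide.hex().upper())                # 080A45DFA3
print(7 * len(wide) - 7 * len(ebml_id))  # 7 spare payload bits
```

The widened payload is 35 bits wide while the original needed only 28, which is where the 7 bits come from.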

> Depending on the EBML header we could know whether IDs are
> supposed to have a namespace or not. But I may have another
> option: why not put the bits
> *after* the current bits used for the ID? All the ID

Putting the namespace value before or after the Class-ID would
basically have the same effect, except that ID values are more
likely to differ in their low-order bits, so it's harder to find
unused space there.

> processing of IDs would remain unchanged. And we would only 

I'm not quite sure how you see this, but as far as I can imagine,
one would have to locate the namespace part of the ID, remove it,
and then resume normal ID interpretation.
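
A minimal sketch of that locate-remove-resume step, assuming a hypothetical layout where a 4-byte ID is widened to 5 bytes and the namespace index sits in the 7 high-order spare bits, just below the new length marker (the high end, since low-order bits are the ones more likely to be in use). Nothing here is an agreed format; the function names are illustrative.

```python
# Hypothetical layout: a 4-byte ID widened to 5 bytes, with the namespace
# stored in the 7 high-order spare bits just below the new length marker.
BASE_BITS = 28               # payload bits of a 4-byte ID (7 * 4)
WIDE_BITS = 35               # payload bits of a 5-byte ID (7 * 5)
NS_BITS = WIDE_BITS - BASE_BITS


def pack(base_id: int, ns: int) -> int:
    """Combine a plain 4-byte ID and a namespace index into a 5-byte ID."""
    if not 0 <= ns < (1 << NS_BITS):
        raise ValueError("namespace index does not fit in the spare bits")
    payload = base_id & ((1 << BASE_BITS) - 1)  # drop the 4-byte marker
    return (1 << WIDE_BITS) | (ns << BASE_BITS) | payload


def unpack(wide_id: int) -> tuple[int, int]:
    """Locate and remove the namespace, then rebuild the plain 4-byte ID."""
    payload = wide_id & ((1 << WIDE_BITS) - 1)  # drop the 5-byte marker
    ns = payload >> BASE_BITS
    base_id = (1 << BASE_BITS) | (payload & ((1 << BASE_BITS) - 1))
    return ns, base_id


wide = pack(0x1A45DFA3, 3)
print(hex(wide))          # 0x83a45dfa3
ns, base = unpack(wide)
print(ns, hex(base))      # 3 0x1a45dfa3
```

After unpack, the rest of the parser sees the ordinary 4-byte ID, so ID handling past this point can stay unchanged.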

> need code to handle the namespace, the same way we have the 
> length. So parsing would be split like this:
> 
> [ID][namespace][size][data]
> it could also be
> [ID][size][namespace][data]

I sense that you want to encode the namespace value as a
totally separate field, with the same status as Class-ID,
Size, and Data. However, there is a slight problem with this:
EBML is supposed to be a byte-aligned format, so it would
require at least 1 extra byte for each element. This is not bad
in itself, but it would waste a lot of bits, since I do
not expect files with more than 5 mixed namespaces to be
common. I therefore expect the namespace value to take up to
3 bits in most cases, which is why I am trying to pack it into an
existing field.
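
The byte-alignment argument is simple arithmetic. This sketch assumes a fixed-width namespace index and compares it with a separate field that must occupy whole bytes. With 5 namespaces, 3 bits would do, while the separate field spends 8 bits on every element.

```python
def ns_bits(namespaces: int) -> int:
    """Bits for a fixed-width namespace index (0 when there is only one)."""
    return (namespaces - 1).bit_length()


def separate_field_bits(namespaces: int) -> int:
    """A separate byte-aligned field costs whole bytes, and at least one."""
    return 8 * max(1, (ns_bits(namespaces) + 7) // 8)


for n in (1, 2, 5, 8):
    packed, separate = ns_bits(n), separate_field_bits(n)
    print(f"{n} namespaces: {packed} bits packed, {separate} bits as a field")
```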

You seem prepared to make big changes to the format, but
I don't know to what extent we should break compatibility...

> What we need is to make one of the namespace in the document 
> be set as "default", ie not marked. The same way we don't 
> have to write mandatory elements that have the default value. 
> This way Matroska can keep its low overhead and be extended 
> by new namespaces.

This could be the best solution, if we can find a way to
express the namespace descriptors in a space-efficient manner
*and* without making it a pain in the a** for random-seeking
applications to recover the namespace state. If that can't be
done, however, I would rather have the namespace expressed for
each element in files that use namespaces, and have files with no
namespaces at all (ns descriptor length = 0) like plain Matroska.
With proper prefix coding of the ns descriptor, one could use
only one or two *bits* per element.
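
One possible prefix code is truncated unary, sketched here under the assumption that the default namespace is numbered 0 and the others are ranked by how often they occur. This is just an example of prefix coding, not a scheme settled in this thread; a Huffman code built from actual element counts would be the general version. With three namespaces every element costs one or two bits, and a file with a single namespace costs none.

```python
def encode_ns(index: int, count: int) -> str:
    """Truncated unary code for a namespace index among `count` namespaces.

    Index i is written as i ones followed by a zero, except the last index,
    which needs no terminating zero. With count == 1 the code is empty,
    so a file without extra namespaces spends no bits at all.
    """
    if not 0 <= index < count:
        raise ValueError("namespace index out of range")
    return "1" * index + ("" if index == count - 1 else "0")


def decode_ns(bits: str, count: int) -> tuple[int, str]:
    """Read one code from the front of `bits`; return (index, remaining bits)."""
    index = 0
    while index < count - 1:
        if index >= len(bits):
            raise ValueError("truncated namespace code")
        if bits[index] != "1":
            return index, bits[index + 1:]  # skip the terminating zero
        index += 1
    return index, bits[index:]              # last index: no terminator


# Three namespaces; index 0 is the default, so it gets the 1-bit code.
stream = "".join(encode_ns(i, 3) for i in (0, 0, 1, 2, 0))
print(stream)   # 0010110

decoded, rest = [], stream
while rest:
    index, rest = decode_ns(rest, 3)
    decoded.append(index)
print(decoded)  # [0, 0, 1, 2, 0]
```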

> Also, if we use an EBML element to say: all lower elements 
> use namespace XYZ it could replace the default value. 
> Namespace switching would only occur in very localized 
> places. That's the difference between having the "using 
> namespace XYZ" approach and the "XYZ::element" one. We might 
> use both (as in C++).

Using such a following-sibling approach would hurt seeking as
it is currently done. Of course we could add specific rules,
like: an element containing namespace switches MUST NOT have
its sub-elements indexed by seek heads, unless these seek
heads point the parser to all relevant ns switches. This
raises an important issue about the actual structure of the
libraries dedicated to parsing (people who implement the whole
parser for their own application will, of course, have fewer
problems here). I believe that namespace processing really
should be hidden from the specific applications.
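
The seeking problem with following-sibling switches can be shown in a few lines. This sketch assumes a hypothetical "using namespace" element whose setting applies to the siblings after it; the namespace names and element names are made up. Parsing from the start tracks the state for free, but a parser that jumps straight to an element has to look back through siblings it never read.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_NS = "matroska"  # hypothetical name for the default namespace


@dataclass
class Element:
    name: str
    ns_switch: Optional[str] = None  # set on a hypothetical "using namespace" element


def resolve_sequential(siblings: List[Element]) -> List[str]:
    """Namespace in effect for each sibling when parsing from the start."""
    current, result = DEFAULT_NS, []
    for el in siblings:
        result.append(current)
        if el.ns_switch is not None:
            current = el.ns_switch  # applies to the following siblings only
    return result


def resolve_after_seek(siblings: List[Element], index: int) -> str:
    """What a parser that jumped to `index` must do: look back for a switch.

    It needs the earlier siblings, which a seek head lets it skip; that is
    why seek heads would also have to point at every relevant switch.
    """
    for el in reversed(siblings[:index]):
        if el.ns_switch is not None:
            return el.ns_switch
    return DEFAULT_NS


siblings = [Element("Info"), Element("UseNs", "comments"),
            Element("Note"), Element("Author")]
print(resolve_sequential(siblings))     # ['matroska', 'matroska', 'comments', 'comments']
print(resolve_after_seek(siblings, 3))  # comments
```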

> Yes, I was thinking about that too. That's why I prefer to 
> keep the IDs intact and the format proposed above is good. 
> Seeking (at least in matroska) can remain unchanged. For other formats we would

Since, for lack of namespaces, foreign formats cannot be mixed
in today, there is indeed no problem.

> need to take the namespace into account to make sure the
> element is in the namespace we're looking for.

I was thinking a little more about seeking and I came to the
conclusion that seeking (indexing and pointing to some part
of the file, and the like) should go into EBML (or some seeking
NS), and not into each specific application. Why? Imagine you
have some program to add comments to EBML files. You could
take any element in the file and add a string comment. The app
has its private elements and would use a separate namespace,
so it wouldn't interfere with the existing data. The file would
still be readable by the original program, the natural rule
being to simply ignore unknown elements. However, adding
elements changes the size of the file, and therefore the
positions to which the seek heads point. Moving seeking into
EBML would enable automatic relocation of the seek heads.
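
If the EBML layer owned the seek heads, relocation after an insertion would be mechanical. A minimal sketch, assuming a seek head reduced to a map from target name to byte offset, all offsets measured from one fixed origin before the insertion point (names and offsets are made up):

```python
from typing import Dict


def relocate(seek_positions: Dict[str, int], insert_at: int,
             inserted: int) -> Dict[str, int]:
    """Shift every seek target at or after the insertion point by `inserted` bytes.

    Positions are assumed to be byte offsets from one fixed origin that lies
    before the insertion point.
    """
    return {name: pos + inserted if pos >= insert_at else pos
            for name, pos in seek_positions.items()}


seek_head = {"Info": 60, "Tracks": 180, "Cues": 90_000}  # made-up offsets
print(relocate(seek_head, insert_at=200, inserted=24))
# {'Info': 60, 'Tracks': 180, 'Cues': 90024}
```

A real implementation would also have to rewrite the size fields of every ancestor of the grown element, and check whether the seek head itself moved; this sketch ignores both.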

Even more interesting, the local namespace state could then be
recovered while seeking, since it would be the job of the EBML
library to build the seek heads, and it could include all the
necessary information in them.
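
One way the library could do this is to record the namespace state next to each seek target while writing. A sketch, assuming the same kind of hypothetical "using namespace" switch that applies to following siblings; element names, offsets, and the entry layout are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

DEFAULT_NS = "matroska"  # hypothetical default namespace name


@dataclass(frozen=True)
class SeekEntry:
    target: str     # element the entry points to
    position: int   # byte offset of that element
    namespace: str  # namespace state in effect at that element


def build_seek_head(stream: Iterable[Tuple[str, int, Optional[str]]],
                    indexed: Set[str]) -> List[SeekEntry]:
    """Single sequential pass, as the EBML library writing the file could do.

    Each stream item is (name, position, ns_switch); ns_switch is the
    namespace a switch element selects for its following siblings, or None.
    """
    current, entries = DEFAULT_NS, []
    for name, position, ns_switch in stream:
        if name in indexed:
            entries.append(SeekEntry(name, position, current))
        if ns_switch is not None:
            current = ns_switch
    return entries


stream = [("Info", 0, None), ("UseNs", 40, "comments"),
          ("Note", 60, None), ("Cues", 90, None)]
for entry in build_seek_head(stream, {"Info", "Note"}):
    print(entry.target, entry.position, entry.namespace)
# Info 0 matroska
# Note 60 comments
```

A parser that jumps to Note through this seek head learns its namespace from the entry itself and never has to scan back for the switch.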

JC


