[Matroska-devel] EBML
Martin Nilsson
matroska at mani.user.lysator.liu.se
Fri Feb 13 02:01:48 CET 2004
Steve Lhomme wrote:
>
> "As an example ID 1 from class A, encoded as 0x81, and ID 1 from class
> B, encoded as 0x4001, are considered different IDs."
>
> Interesting point that was never raised before. IMO it should be
> avoided to use this case. And that's what we did with Matroska. It could
> impose some constraints on the parser behaviour which is IMO not
> necessary. I think there is enough IDs to avoid this (and BTW that means
> the number of IDs for class B, C and D are wrong because they should not
> contain elements of the other class).
I disagree. Even if you represented the ID in its parsed numerical state
you would still have to save the size as well to prevent 0x4001 from
being compressed into 0x81. So disallowing two IDs from having the same
decoded value only imposes extra constraints on ID generation, unless
one uses the decoded ID as the actual ID (i.e. having 0x81 and 0x4001
meaning the same element).
And imposing extra ID generation constraints is very bad since
(hopefully) more people will be writing EBML DTDs than EBML parsers.
Making correct IDs is already the most complicated step in formulating
an EBML language.
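
To make the point about 0x81 versus 0x4001 concrete, here is a minimal
sketch in Python. The function names read_id and encode_id are my own
for illustration and are not taken from any existing EBML library; the
sketch assumes IDs of class A to D (1 to 4 bytes) and ignores reserved
ID values. It shows that both IDs decode to the value 1, so a parser
has to keep the length next to the value to tell them apart.

```python
def read_id(data, pos=0):
    """Decode the EBML ID at data[pos]; return (length, value).

    The number of leading zero bits in the first byte, plus one, gives
    the length in bytes. Only class A-D IDs (1-4 bytes) are handled.
    """
    first = data[pos]
    length = 9 - first.bit_length()      # leading zero bits + 1
    if length > 4:
        raise ValueError("not a class A-D ID")
    raw = data[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated ID")
    # Strip the length marker bit, keeping the 7 * length value bits.
    value = int.from_bytes(raw, "big") & ((1 << (7 * length)) - 1)
    return length, value


def encode_id(length, value):
    """Encode value as an ID of exactly `length` bytes (hypothetical helper)."""
    if not 1 <= length <= 4 or value >= 1 << (7 * length):
        raise ValueError("value does not fit in an ID of that class")
    marker = 1 << (7 * length)           # 0x80, 0x4000, 0x200000, ...
    return (marker | value).to_bytes(length, "big")


class_a = read_id(b"\x81")               # (1, 1)
class_b = read_id(b"\x40\x01")           # (2, 1)

# Same decoded value, but different IDs once the length is kept.
print(class_a[1] == class_b[1], class_a == class_b)

# Without the stored length, re-encoding value 1 in the shortest form
# turns the class B ID into the class A one.
print(encode_id(1, class_b[1]).hex(), encode_id(*class_b).hex())
```

If only the decoded value were stored, the last line would have no way
to produce 0x4001 again, which is the "compression" I refer to above.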
> FYI, one of the EBML enhancement that is planned is to allow some kind
> of embedded DTD inside the EBML header, that would describe the
> hierarchy of the elements, their ID and their type (all in EBML format
> of course). This way it would be easy to spot known/unknown elements and
> interpret the value of some elements (maybe even with a human friendly
> name) even if you don't know the actual meaning.
Good idea. There should probably be some sort of ID range reserved for
EBML elements. Actually, possible ID clashes between different DTDs
when they are composed are probably the biggest weakness of EBML right
now, as far as I can see. And it is a difficult problem to solve. The
W3C never actually solved the problem for XML. They sidestepped the
issue with namespaces as a "solution".
Moving on to reply 2:
[default by symbol value]
> This has not been firmly defined yet. Even though in Matroska we
> already use this.
Well, I used the Matroska definition as test case for the expressiveness
of my DTD format.
> The problem is in what scope the 'other' element should be.
My first thought was to only refer back to parents, but in practice you
would probably more often want to refer to siblings. The real issue here
of course is to cap the memory consumption for the "semantic EBML
parser". Do we need to save the latest value of every parsed element?
No, fortunately not. When we read the EBML DTD we compile a list of all
referred elements and only save the latest value of each of those in a
history.
Further we could state that once we leave the top level element the
history must be cleared. That in itself is a good idea, but the fewer
rules the better. Even further we could state that once we leave the
current parent element, the history for that level should be cleared.
That is probably a bad idea. Lastly we could introduce another DTD
property specifying that an element has a private scope that should be
cleared when the element is exited. That is a flexible solution that
allows the DTD author to control how values are stored. I think it adds
unnecessary complexity for too little gain.
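
The bounded history described above can be sketched as follows, in
Python. Everything here is an assumption made for illustration: the DTD
is modelled as a plain dict, a default that refers to another element
is written as ("ref", name), and the names collect_referred and
ReferenceHistory are hypothetical. The DisplayWidth/PixelWidth pair is
only meant as an example of a default taken from another element.

```python
def collect_referred(dtd):
    """Return the names of all elements that some default refers to."""
    referred = set()
    for spec in dtd.values():
        default = spec.get("default")
        if isinstance(default, tuple) and default[0] == "ref":
            referred.add(default[1])
    return referred


class ReferenceHistory:
    """Keep only the latest value of each referred element."""

    def __init__(self, referred):
        self.referred = frozenset(referred)
        self.latest = {}

    def on_element(self, name, value):
        # Elements nobody refers to are never stored, so memory use is
        # bounded by the number of referred elements in the DTD.
        if name in self.referred:
            self.latest[name] = value

    def lookup(self, name):
        return self.latest.get(name)

    def on_leave_top_level(self):
        # The optional rule: forget everything between top level elements.
        self.latest.clear()


# Toy DTD, not the real Matroska definition.
dtd = {
    "PixelWidth": {"type": "uint"},
    "DisplayWidth": {"type": "uint", "default": ("ref", "PixelWidth")},
    "FlagInterlaced": {"type": "uint", "default": 0},
}

history = ReferenceHistory(collect_referred(dtd))
history.on_element("PixelWidth", 640)
history.on_element("FlagInterlaced", 1)   # not referred, so not stored
print(history.lookup("PixelWidth"), sorted(history.latest))
history.on_leave_top_level()
print(history.lookup("PixelWidth"))
```

A per-level or per-element private scope would need a stack of such
histories instead of a single dict, which is the extra complexity I
would rather avoid.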
> IMPRESSIVE WORK !!!
Thank you.
> Do you plan to submit this to the IETF ? That would be great to have
> EBML become a wide standard. Because I'm sure it can be used in many
> cases : small XML that can store binary data and use default values.
I'm a bit reluctant to take things to the IETF after my experience with
getting RFC 3003 through. I guess the only good way to get things done
is to be there in person, and that would require either someone to
financially back the standard or someone already involved in the IETF
to sponsor it.
Right now my focus is on going through all the layers of Matroska and at
least try to change everything that I don't like... I hope you guys are
open for debate when I reach the difficult parts.
/Martin Nilsson