[Matroska-devel] EBML specification component for review - Element Data Size

wm4 nfxjfg at googlemail.com
Fri May 1 14:46:38 CEST 2015


On Fri, 1 May 2015 13:48:45 +0200
Moritz Bunkus <moritz at bunkus.org> wrote:

> Hey,
> 
> > There's EBMLMaxIDLength, which gives the length of IDs. I see
> > absolutely no point in making this value different from 4.
> 
> Gah I was confusing the two. Sorry about that.
> 
> Anyway, this is the EBML spec we're talking about, not the Matroska
> specs. EBML being generic I'd rather not enforce limits here. We can
> easily enforce limits on the Matroska specs level, though; 4 for
> Matroska's element IDs and 8 bytes for Matroska's size field lengths
> should be enough.
> 
> But what's the point in limiting the basic EBML spec with such arbitrary
> values? Just because the current users don't need more? Such an argument
> doesn't convince me. Why not leave the field open for someone to invent
> a DocType that uses more than those ~268 million IDs? No, I currently
> cannot think of such a use either, but I definitely wouldn't want to
> forbid it.

Can you imagine a format with 268 million IDs? How many pages would the
spec of this format have? 4 bytes (or maybe 8) for element IDs is not
too low and not too high, so I see no reason to add additional
complexity.
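
For reference, the "268 million" figure can be reproduced with a bit of
arithmetic. This is only a sketch: it assumes each ID byte contributes 7
usable bits (the rest being the length marker), and it ignores any
patterns the spec reserves, so the real usable count is slightly lower.
The function name is made up for illustration.

```python
def raw_id_space(length_in_bytes: int) -> int:
    """Upper bound on distinct EBML IDs of exactly this encoded length.

    Assumption: 7 usable bits per byte; reserved patterns are not
    subtracted, so this slightly overcounts.
    """
    return 2 ** (7 * length_in_bytes)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(f"{n}-byte IDs: {raw_id_space(n):,}")
```

A 4-byte ID gives 2^28 raw patterns, which is where the roughly 268
million comes from.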

Keep in mind that the element IDs form a _private_ name space per
DocType. I would probably agree that larger IDs are needed if the
namespace were the same for all formats.

This holds in particular because the only widely known and defined EBML
DocType is Matroska, which essentially fixes the EBML ID length at 4
bytes (as Dave Rice pointed out) and the maximum size length at 8.

I know, EBML was supposed to be generic and flexible, but I would say
experience with Matroska taught us that this specific aspect is not
really needed.

> > The previous specification was not very precise in a lot of things. But
> > this doesn't mean a new specification has to be the same. It's
> > desirable to make the specification as minimal as possible (without
> > losing precision). If there are some valid Matroska files which would
> > become invalid with the new spec, and if not many of such Matroska
> > files were around, I see no problem in this.
> 
> I can somewhat agree with this. But again, I don't think that the basic
> EBML structure is the place to forbid such things. Also handling an
> unknown size should be rather easy:
> 
> 1. If it's at a top-level element (Matroska segment) then the end is the
>    end of the file.
> 
> 2. If it's at a lower-level element then its end is the end of the
>    parent element.

This doesn't seem to be correct. If you skip all data until the end of
the parent element, you might miss other, known elements. And what if
the length of the parent element is unknown?
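
To make the "unknown length" case concrete, here is a minimal sketch of
decoding a size field and spotting the unknown marker. It assumes the
usual EBML convention that a size field whose value bits are all 1 means
"unknown", and it only handles fields of up to 8 bytes. The function
name and return shape are my own, not taken from any library.

```python
from typing import Optional, Tuple


def read_size(buf: bytes, pos: int = 0) -> Tuple[Optional[int], int]:
    """Decode an EBML size field at buf[pos].

    Returns (size, field_length). size is None when the field is the
    "unknown size" marker (all value bits set).
    """
    first = buf[pos]
    if first == 0:
        raise ValueError("size fields longer than 8 bytes are not handled")
    # The number of leading zero bits in the first byte, plus one,
    # gives the total length of the field in bytes.
    length = 8 - first.bit_length() + 1
    if pos + length > len(buf):
        raise ValueError("truncated size field")
    # Drop the marker bit from the first byte, then append the rest.
    value = first & ((1 << (8 - length)) - 1)
    for b in buf[pos + 1:pos + length]:
        value = (value << 8) | b
    all_ones = (1 << (7 * length)) - 1
    return (None if value == all_ones else value), length
```

With this, `read_size(b"\xff")` reports an unknown size in a 1-byte
field, and the parser then has no byte count telling it where the
element ends.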

A conforming muxer could write an unknown length for _all_ elements with
sub-elements.

(Well, except for what you've written below.)

> Simply passing down the bounds of the current element when parsing
> sub-elements shouldn't be that hard at all. As a matter of fact it
> should be done even if we completely disallowed unknown lengths in order
> to detect broken sub-elements (that allegedly have an end behind their
> parent's end).
> 
> > In this case, I think unknown lengths should be disallowed in most
> > contexts
> 
> What contexts (in the EBML sense!) are you thinking about allowing them
> in?

In the Matroska sense, the "Segment" element, and maybe "Cluster". This
would still allow any form of streaming.

My point is that EBML shouldn't just allow this in general. Instead, it
should make clear that unknown lengths are an exception which the
specification for a specific DocType can explicitly grant for a select
set of elements.
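
As a rough sketch of what I mean, a parser could carry a per-DocType
allow-list and reject unknown sizes everywhere else. The table and
function names here are hypothetical; the two IDs are the Matroska
Segment and Cluster IDs, and `None` stands for an unknown size.

```python
from typing import Optional

SEGMENT_ID = 0x18538067  # Matroska Segment
CLUSTER_ID = 0x1F43B675  # Matroska Cluster

# Hypothetical table: which elements a DocType lets have an unknown size.
UNKNOWN_SIZE_ALLOWED = {
    "matroska": {SEGMENT_ID, CLUSTER_ID},
}


def check_size(doctype: str, element_id: int, size: Optional[int]) -> None:
    """Raise ValueError if an unknown size is used where it is not granted."""
    if size is not None:
        return  # known sizes are always fine
    allowed = UNKNOWN_SIZE_ALLOWED.get(doctype, set())
    if element_id not in allowed:
        raise ValueError(
            f"element 0x{element_id:X} must not have an unknown size "
            f"in DocType {doctype!r}"
        )
```

A DocType with no entry in the table gets no exceptions at all, which
is the default I would want from EBML itself.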

> > It also makes extending the format harder: in my understanding, if a
> > sub-element has an unknown element ID, the parser can't continue.
> 
> No, if a sub-element has an unknown element ID then the parser should
> skip the element and ignore it.
> 
> If new elements are introduced into a DocType format like Matroska that
> a Matroska parser must parse in order to play the file then the
> EBMLReadVersion must be increased accordingly.

Well, that leaves a very tricky implicit definition of when writing an
unknown-length element is allowed. The Matroska spec on the website also
doesn't mention anything about this (just that you should "avoid"
writing unknown sizes), so I wonder what that says about existing
practice.

In this case, restricting the use of unknown lengths to explicitly
known good/useful cases would in fact prevent muxer bugs.
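
Here is a small sketch of the ambiguity I'm worried about. It assumes a
parser inside an unknown-length Cluster that decides where the Cluster
ends purely by looking at the next element ID. The sets are deliberately
incomplete and the function name is made up; the IDs are the Matroska
ones for Timestamp, SimpleBlock, BlockGroup, Cluster and Cues.

```python
CLUSTER_ID = 0x1F43B675
CUES_ID = 0x1C53BB6B

# Children this (hypothetical) parser knows can appear inside a Cluster.
KNOWN_CLUSTER_CHILDREN = {0xE7, 0xA3, 0xA0}  # Timestamp, SimpleBlock, BlockGroup
# Elements this parser knows can only appear outside a Cluster.
KNOWN_TERMINATORS = {CLUSTER_ID, CUES_ID}


def classify_next_id(next_id: int) -> str:
    """Decide what the next ID means inside an unknown-length Cluster."""
    if next_id in KNOWN_CLUSTER_CHILDREN:
        return "child"
    if next_id in KNOWN_TERMINATORS:
        return "end"
    # An ID the parser has never heard of: it could be a new child to
    # skip, or a new element that ends the Cluster. The ID alone does
    # not say which.
    return "ambiguous"
```

With a known length, the third case would simply be skipped. With an
unknown length, the parser has to guess.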

> For example. The CueRelativePosition element was introduced in Matroska
> v4. mkvmerge uses it, but it only sets EBMLReadVersion to 2 because
> interpretation of that element is not required for playback, not even
> for seeking. It only makes seeking faster for players that understand
> that element.
> 
> Kind regards,
> mosu


