[Matroska-devel] EBML specification component for review - Element Data Size

Moritz Bunkus moritz at bunkus.org
Fri May 1 13:48:45 CEST 2015


Hey,

> There's EBMLMaxIDLength, which gives the length of IDs. I see
> absolutely no point in making this value different from 4.

Gah, I was confusing the two. Sorry about that.

Anyway, this is the EBML spec we're talking about, not the Matroska
specs. EBML being generic I'd rather not enforce limits here. We can
easily enforce limits on the Matroska specs level, though; 4 bytes for
Matroska's element IDs and 8 bytes for Matroska's size fields should be
enough.
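
To put numbers on those two limits, here is a rough sketch of how much room
a field of a given byte length offers. It assumes the usual EBML
variable-length layout, where each byte contributes 7 payload bits, and it
assumes that an all-ones payload in a size field is reserved for "unknown
size". The function names are made up for illustration, and other reserved
bit patterns are ignored.

```python
def payload_bits(length):
    """Payload bits in a variable-length field of `length` bytes (assumed 7 per byte)."""
    return 7 * length


def pattern_count(length):
    """Raw number of distinct bit patterns at exactly this length."""
    return 1 << payload_bits(length)


def max_known_size(length):
    """Largest size a size field of this length can state explicitly.

    The all-ones payload is assumed to mean "unknown size", so the
    largest usable value is one below it.
    """
    return (1 << payload_bits(length)) - 2
```

With these assumptions, `pattern_count(4)` is 2**28, which is where the
figure of roughly 268 million IDs comes from, and `max_known_size(8)` is
2**56 - 2 bytes.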

But what's the point in limiting the basic EBML spec with such arbitrary
values? Just because the current users don't need more? Such an argument
doesn't convince me. Why not leave the field open for someone to invent
a DocType that uses more than those ~268 million IDs? No, I currently
cannot think of such a use either, but I definitely wouldn't want to
forbid it.

> The previous specification was not very precise in a lot of things. But
> this doesn't mean a new specification has to be the same. It's
> desirable to make the specification as minimal as possible (without
> losing precision). If there are some valid Matroska files which would
> become invalid with the new spec, and if not many of such Matroska
> files were around, I see no problem in this.

I can somewhat agree with this. But again, I don't think that the basic
EBML structure is the place to forbid such things. Also handling an
unknown size should be rather easy:

1. If it's at a top-level element (Matroska segment) then the end is the
   end of the file.

2. If it's at a lower-level element then its end is the end of the
   parent element.

Simply passing down the bounds of the current element when parsing
sub-elements shouldn't be that hard at all. As a matter of fact, it
should be done even if we completely disallowed unknown lengths, in
order to detect broken sub-elements (ones that claim to end beyond
their parent's end).
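
A minimal sketch of that idea, not taken from any existing parser: every
call receives the end offset of the enclosing element, an unknown size
resolves to that offset, and a child that claims to run past it is
rejected. The names `read_vint`, `parse_children` and `master_ids` are
hypothetical, and fields longer than 8 bytes are simply refused.

```python
def read_vint(buf, pos, limit, keep_marker):
    """Decode one variable-length integer at buf[pos], never reading past limit.

    Returns (value, byte_length). With keep_marker=False (size fields) the
    length marker bit is stripped and an all-ones payload yields None,
    which this sketch uses to mean "unknown size".
    """
    if pos >= limit:
        raise ValueError("element header starts beyond the enclosing bound")
    first = buf[pos]
    if first == 0:
        raise ValueError("fields longer than 8 bytes are not handled here")
    length = 9 - first.bit_length()  # leading zero bits + 1
    if pos + length > limit:
        raise ValueError("element header crosses the enclosing bound")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    if keep_marker:
        return raw, length
    payload_mask = (1 << (7 * length)) - 1
    value = raw & payload_mask
    return (None if value == payload_mask else value), length


def parse_children(buf, start, end, master_ids, depth=0, out=None):
    """Walk the elements in buf[start:end]; `end` is the parent's bound.

    `master_ids` is the set of IDs whose payload should be parsed as
    sub-elements. Returns a list of (depth, id, data_start, data_end).
    """
    out = [] if out is None else out
    pos = start
    while pos < end:
        elem_id, id_len = read_vint(buf, pos, end, keep_marker=True)
        size, size_len = read_vint(buf, pos + id_len, end, keep_marker=False)
        data_start = pos + id_len + size_len
        if size is None:
            data_end = end  # unknown size: the element ends where its parent ends
        else:
            data_end = data_start + size
            if data_end > end:
                raise ValueError("sub-element claims to end beyond its parent")
        out.append((depth, elem_id, data_start, data_end))
        if elem_id in master_ids:
            parse_children(buf, data_start, data_end, master_ids, depth + 1, out)
        pos = data_end
    return out
```

For the top level one would call `parse_children(buf, 0, len(buf), ...)`,
so that "end of the file" is just the outermost bound. The same check that
resolves unknown sizes also catches children that overrun their parent.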

> In this case, I think unknown lengths should be disallowed in most
> contexts

What contexts (in the EBML sense!) are you thinking about allowing them
in?

> It also makes extending the format harder: in my understanding, if a
> sub-element has an unknown element ID, the parser can't continue.

No, if a sub-element has an unknown element ID then the parser should
skip the element and ignore it.
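
As a rough illustration of skipping, assuming every element in the buffer
carries a known size: the reader looks the ID up in a table of handlers
and, if there is none, jumps over the payload using the size field. The
IDs and the `handlers` table below are invented for the example and do
not belong to any real DocType.

```python
def read_vint(buf, pos, keep_marker):
    """Decode one variable-length integer; returns (value, byte_length)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("fields longer than 8 bytes are not handled here")
    length = 9 - first.bit_length()  # leading zero bits + 1
    raw = int.from_bytes(buf[pos:pos + length], "big")
    if keep_marker:
        return raw, length
    return raw & ((1 << (7 * length)) - 1), length


def read_known(buf, handlers):
    """Parse one level of elements, ignoring IDs without a handler.

    Returns (results, skipped): decoded values keyed by ID, and the list
    of IDs that were stepped over.
    """
    results, skipped = {}, []
    pos = 0
    while pos < len(buf):
        elem_id, id_len = read_vint(buf, pos, keep_marker=True)
        size, size_len = read_vint(buf, pos + id_len, keep_marker=False)
        data_start = pos + id_len + size_len
        data_end = data_start + size
        handler = handlers.get(elem_id)
        if handler is None:
            skipped.append(elem_id)  # unknown ID: ignore the payload
        else:
            results[elem_id] = handler(buf[data_start:data_end])
        pos = data_end  # either way, continue after the element
    return results, skipped
```

The point is that the size field alone is enough to move on; the reader
never needs to understand the payload of an element it does not know.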

If new elements are introduced into a DocType format like Matroska that
a Matroska parser must parse in order to play the file, then the
DocTypeReadVersion must be increased accordingly.

For example, the CueRelativePosition element was introduced in Matroska
v4. mkvmerge uses it, but it only sets DocTypeReadVersion to 2 because
interpretation of that element is not required for playback, not even
for seeking. It only makes seeking faster for players that understand
that element.
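
One way to picture the rule, as a sketch only: a writer picks the read
version from the elements a player must understand, not from everything
it writes. The table below, including which elements count as required,
is an assumption made for illustration and is not taken from mkvmerge or
from the Matroska specs.

```python
# Hypothetical table: (element name, format version that introduced it,
# whether a player must understand it to play the file).
ELEMENTS_USED = [
    ("SimpleBlock", 2, True),
    ("CueRelativePosition", 4, False),
]


def versions_to_write(elements):
    """Return (format_version, read_version) for a file using `elements`.

    format_version covers everything written; read_version only covers
    the elements marked as required, and falls back to 1 if none are.
    """
    format_version = max(version for _, version, _ in elements)
    read_version = max(
        (version for _, version, required in elements if required),
        default=1,
    )
    return format_version, read_version


def reader_can_play(reader_supports, read_version):
    """A reader may play the file if it supports at least the read version."""
    return read_version <= reader_supports
```

With this table `versions_to_write(ELEMENTS_USED)` gives a format version
of 4 and a read version of 2, so a reader that only knows version 2 would
still accept the file and just ignore the optional element.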

Kind regards,
mosu