[Matroska-devel] EBML specification component for review - Element Data Size

Steve Lhomme slhomme at matroska.org
Sat May 2 17:59:22 CEST 2015

On Fri, May 1, 2015 at 12:48 PM, wm4 <nfxjfg at googlemail.com> wrote:

> On Fri, 1 May 2015 12:15:04 +0200
> Moritz Bunkus <moritz at bunkus.org> wrote:
> There's EBMLMaxIDLength, which gives the length of IDs. I see
> absolutely no point in making this value different from 4.
You never know. Maybe someone will want to put a SHA1 someday.

> Then there's EBMLMaxSizeLength. With 8 you're apparently limited to
> 2^56-2. This will probably be high enough forever. Maybe you could argue
> that larger sizes should be possible; then I would suggest that the
> limit should be fixed to 2^64-1, which would require a maximum length
> of 10 bytes, with some representable values out of spec. Or maybe limit
> it to 9 bytes, which would make the maximum 2^63-2 or so.
> But going higher than 2^64 ever doesn't seem useful. Why make it
> possible? Handling up to 2^64 on the other hand is easy.

Hopefully someday the whole universe can be contained in one EBML stream.
We don't need to put limits where we don't need to.
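
For reference, the figures in the quoted paragraph can be checked with a few
lines of Python. This is only a sketch. It assumes the usual EBML vint layout
(each octet of length costs one marker bit, leaving 7 data bits per octet) and
that the all-ones value is reserved for "unknown size". The helper name
max_known_size is made up for this example.

```python
def max_known_size(length):
    """Largest size representable in a size vint of `length` octets.

    Assumption: 7 usable data bits per octet, and the all-ones
    pattern is reserved for "unknown size", hence the "- 2".
    """
    if length < 1:
        raise ValueError("length must be at least 1 octet")
    return (1 << (7 * length)) - 2


# Print the limits for the lengths discussed in this thread.
for n in (8, 9, 10):
    print(n, max_known_size(n))
```

With 8 octets this gives 2^56-2, with 9 octets 2^63-2, and 10 octets is the
first length that covers 2^64-1.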

Now that we're talking about it, in the Matroska specs we should specify
that the vint size cannot exceed 8 octets and the id size cannot exceed 4.

> Existing demuxers will probably handle them anyway. There are a lot of
> buggy Matroska files around, and demuxers potentially contain hacks and
> concessions for such broken files. But this doesn't mean a tool that
> generates valid files according to the new spec should be allowed to
> create such "problematic" files.

If there are hacks in Matroska parsers due to bad muxing, they are never at
the EBML level. If the EBML is broken, it's OK not to care.

> So some cleanup is necessary, unless you fully want to concentrate
> on 1) - and even then you don't want to formalize every single broken
> file you can find.
> In this case, I think unknown lengths should be disallowed in most
> contexts because they make parsers more complicated for a feature that
> is almost never used. It also makes extending the format harder: in my
> understanding, if a sub-element has an unknown element ID, the parser
> can't continue.

Even if it doesn't make sense in Matroska (it may be deprecated but it's
been used widely in GStreamer), it should not go away from EBML. Low latency
transmission of data is a very nice feature and an advantage over a lot of
other binary formats. XML or JSON can be streamed because they have end
markers ('>' or '}') but that's usually not the case for binary. So we
should keep this feature.
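
To make the parser-side cost concrete, here is a rough sketch of reading an
element data size and detecting the unknown-size case. It assumes the common
vint layout (the length is given by the position of the first set bit in the
first octet) and treats all data bits set to 1 as "unknown". The names
read_size and max_size_length are invented for this example and are not taken
from any existing library.

```python
def read_size(buf, max_size_length=8):
    """Decode an element data size from the start of `buf`.

    Returns (value, octets_used). `value` is None when the reserved
    all-ones "unknown size" encoding is found.
    Sketch only: sizes whose length marker is not in the first octet
    (more than 8 octets) are rejected.
    """
    if not buf:
        raise ValueError("empty buffer")
    first = buf[0]
    if first == 0:
        raise ValueError("sizes longer than 8 octets are not handled here")
    # Number of leading zero bits in the first octet, plus one.
    length = 9 - first.bit_length()
    if length > max_size_length:
        raise ValueError("size vint longer than max_size_length")
    if len(buf) < length:
        raise ValueError("truncated size vint")
    # Strip the length marker bit, then append the remaining octets.
    value = first & (0xFF >> length)
    for octet in buf[1:length]:
        value = (value << 8) | octet
    # All data bits set to 1 means the size is unknown.
    if value == (1 << (7 * length)) - 1:
        return None, length
    return value, length
```

A value of None tells the caller that the element has no declared end, so as
far as I understand it the parser has to keep reading children until it meets
an ID that cannot belong to that element.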

Steve Lhomme
Matroska association Chairman