[Matroska-devel] EBML specification component for review - Element Data Size

Steve Lhomme slhomme at matroska.org
Sat May 2 18:30:58 CEST 2015

On Sat, May 2, 2015 at 6:08 PM, wm4 <nfxjfg at googlemail.com> wrote:

> On Sat, 2 May 2015 17:59:22 +0200
> Steve Lhomme <slhomme at matroska.org> wrote:
> > On Fri, May 1, 2015 at 12:48 PM, wm4 <nfxjfg at googlemail.com> wrote:
> >
> > > On Fri, 1 May 2015 12:15:04 +0200
> > > Moritz Bunkus <moritz at bunkus.org> wrote:
> > >
> > > There's EBMLMaxIDLength, which gives the length of IDs. I see
> > > absolutely no point in making this value different from 4.
> > >
> > >
> > You never know. Maybe someone will want to put a SHA1 someday.
> >
> Doesn't make sense. The ID has to follow a certain formatting (the
> variable length encoding), so you can't have arbitrary data as ID. Only
> a subset of SHA1 hashes are valid EBML IDs.

I could imagine an XML > EBML > XML converter that uses the tag name as
the ID, prefixed with a value that makes it suitable for vint
interpretation. That would mean a lot of 1s to code a 32-char string,
though, and would be terribly inefficient.

The SHA1 of git commits (with the same ID prefix trick) could be useful if
you want to tag commits, attach data, metadata or whatever you could think
of.
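
A minimal sketch of the ID prefix trick, assuming the usual EBML vint
layout (leading zero bits, then a single 1 marker bit, then 7 value bits
per octet of width). The helper names vint_width and wrap_as_id are
hypothetical and not part of libebml. The sketch ignores other ID rules
such as reserved values and shortest-form encoding.

```python
import hashlib


def vint_width(data: bytes) -> int:
    """Width in octets implied by a vint: number of leading zero bits + 1."""
    zeros = 0
    for octet in data:
        if octet == 0:
            zeros += 8
            continue
        zeros += 8 - octet.bit_length()
        return zeros + 1
    raise ValueError("no marker bit found")


def wrap_as_id(payload: bytes) -> bytes:
    """Hypothetical helper: prefix arbitrary bytes with a vint length marker."""
    payload_bits = len(payload) * 8
    width = 1
    # A vint of `width` octets carries 7 * width value bits.
    while 7 * width < payload_bits:
        width += 1
    marker = 1 << (7 * width)  # the single 1 bit that follows the zeros
    value = marker | int.from_bytes(payload, "big")
    return value.to_bytes(width, "big")


# Illustrative input only, not a real commit.
sha1 = hashlib.sha1(b"example commit").digest()  # 20 octets
wrapped = wrap_as_id(sha1)
print(len(wrapped), vint_width(wrapped))
```

Under these assumptions a 20-octet SHA1 needs a 23-octet ID, so a DocType
using it would have to declare an EBMLMaxIDLength of at least 23.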

More generally, a semantic with IDs that come not from a small predefined
set but from a huge space of possible values could make sense. This is not
how we use it, and libebml couldn't deal with that. But EBML would allow
it with no effort.

That leads to an old debate we had: should EBML be parseable without
knowing the semantics, or should the parser have to know them? Once we
decided to allow infinite/unknown sizes, it was clear the semantic was
needed. So something about "semantic" should be said in the official
specs. After all, an EBML file means nothing without the semantic implied
by the DocType.
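
A minimal sketch of why unknown sizes force the parser to know the
semantic, assuming the usual convention that a data size whose value bits
are all 1 means "unknown". The function name read_data_size is
hypothetical, and sizes wider than 8 octets are not handled.

```python
def read_data_size(data: bytes):
    """Return (width, size); size is None when the size is unknown."""
    first = data[0]
    if first == 0:
        raise ValueError("sizes wider than 8 octets are not handled here")
    # Leading zero bits of the first octet + 1 give the width in octets.
    width = 8 - first.bit_length() + 1
    if len(data) < width:
        raise ValueError("truncated data size")
    raw = int.from_bytes(data[:width], "big")
    mask = (1 << (7 * width)) - 1  # the 7 * width value bits
    value = raw & mask
    if value == mask:
        # All value bits set: the element's end is not recorded, so only
        # the DocType semantic can tell the parser where it stops.
        return width, None
    return width, value


print(read_data_size(b"\x82"))
print(read_data_size(b"\xff"))
```

When the second element of the result is None, a generic parser cannot
skip the element by length; it has to recognise, from the semantic, the
first following ID that cannot be a child of the open element.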

> That's why I'm saying EBML should explicitly require formats
> to specify where exactly unknown lengths can happen, and the Matroska
> format specification should restrict them to cases useful for
> streaming, such as the Segment and Cluster elements.

Currently that's level 0 and level 1 elements in the semantic. I'm not
sure it's written down anywhere, though.

Steve Lhomme
Matroska association Chairman