[Matroska-devel] EBML specification component for review - Element Data Size

Moritz Bunkus moritz at bunkus.org
Fri May 1 17:50:07 CEST 2015


I'm fine with using »length« instead of »width«. We should use it not
only in the element names, though, but also in the explanations (the
text you, Dave, have proposed still contains »widths«).

> > Right. So perhaps we need specific constraints on the use of VOID
> > and CRC within an element of unknown size. It's hard to imagine the
> > use case of a CRC sub-element within an element of unknown size, so
> > perhaps CRC within an unknown length element can simply be
> > forbidden.
> >
> > Would there be an issue with SimpleTag in the case of an unknown
> > size parent? SimpleTag can appear at several different levels.

Basically any multi-level EBML or DocType-specific element (EbmlVoid,
EbmlCrc, Matroska SimpleTag…) cannot be handled properly if its parent's
size is unknown as a parser cannot unambiguously determine its place in
the hierarchy. Therefore forbidding them sounds good to me.

By now I'm also in favor of restricting »unknown size« to a few cases at
most. The problem with detecting the next element unambiguously is one
of escaping. For example, if a cluster has an unknown length then a
parser would assume that any level 1 element (another cluster, tracks,
chapters, attachments, cues…) signals the cluster's end. However, the
bytes of such an ID may very well occur in e.g. the content of a
KaxSimpleBlock.
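
To illustrate the escaping problem, here is a minimal Python sketch. It
assumes a hypothetical parser that searches the byte stream for the next
level 1 ID in order to find the end of an unknown-size cluster; the
function name and the hand-built sample data are made up for this
example, and the ID table is deliberately incomplete.

```python
# A few Matroska level 1 element IDs (illustrative subset, not a full table).
CLUSTER_ID = bytes.fromhex("1F43B675")
LEVEL1_IDS = {
    "Cluster": CLUSTER_ID,
    "Tracks": bytes.fromhex("1654AE6B"),
    "Cues": bytes.fromhex("1C53BB6B"),
}


def naive_cluster_end(data: bytes, start: int) -> int:
    """Hypothetical helper: offset of the first byte run that looks like a
    level 1 ID at or after `start`, or len(data) if there is none."""
    hits = [data.find(element_id, start) for element_id in LEVEL1_IDS.values()]
    hits = [hit for hit in hits if hit != -1]
    return min(hits) if hits else len(data)


# SimpleBlock payload: track number, timestamp, flags, then frame data that
# happens to contain the four bytes of the Cues ID.
payload = b"\x81\x00\x00\x80" + b"\xDE\xAD" + LEVEL1_IDS["Cues"] + b"\xBE\xEF"
simple_block = b"\xA3" + bytes([0x80 | len(payload)]) + payload

# Cluster with unknown size (0xFF), one SimpleBlock, then the real next cluster.
data = CLUSTER_ID + b"\xFF" + simple_block + CLUSTER_ID + b"\x80"

children_start = len(CLUSTER_ID) + 1
true_end = children_start + len(simple_block)
guessed_end = naive_cluster_end(data, children_start)
print(f"guessed end: {guessed_end}, true end: {true_end}")
```

The guessed end falls inside the SimpleBlock's frame data, before the
real next cluster, because nothing in the payload is escaped.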

If I remember correctly then libEBML contains some code for dealing with
this heuristically by reading the supposed next element's size field as
well and determining whether or not that seems plausible. I may be
wrong, though.

Anyway, as wm4 said, this is very difficult at best for a parser.
Segments with an unknown size should be fine, but even clusters
don't strike me as a good candidate. Sure, a parser could improve that
heuristic further by not only reading the supposed element's ID and size
but also the next element's ID or something like that…
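
A rough Python sketch of such a plausibility check follows. This is not
libEBML's actual code; read_size() and looks_like_element() are
hypothetical helpers, and "plausible" here only means that the size field
decodes and that the claimed size fits into the data that is available.

```python
def read_size(data: bytes, pos: int):
    """Decode the size field at `pos`.

    Returns (value, field_length); value is None for an unknown size.
    """
    first = data[pos]
    if first == 0:
        raise ValueError("size field longer than 8 bytes")
    # Leading zero bits in the first byte, plus one, give the field length.
    length = 9 - first.bit_length()
    field = data[pos:pos + length]
    if len(field) < length:
        raise ValueError("truncated size field")
    mask = (1 << (7 * length)) - 1  # everything except the length marker bit
    value = int.from_bytes(field, "big") & mask
    return (None if value == mask else value), length


def looks_like_element(data: bytes, pos: int, element_id: bytes) -> bool:
    """Heuristic: do the bytes at `pos` form a believable element header?"""
    if data[pos:pos + len(element_id)] != element_id:
        return False
    try:
        size, length = read_size(data, pos + len(element_id))
    except (ValueError, IndexError):
        return False
    if size is None:
        return True  # unknown size cannot be checked against the data length
    return pos + len(element_id) + length + size <= len(data)


CLUSTER_ID = bytes.fromhex("1F43B675")
real = CLUSTER_ID + b"\x82\xAA\xBB"            # size 2, two bytes follow
fake = b"\x00" + CLUSTER_ID + b"\x4F\xFF\x00"  # claims 4095 bytes, has 1
print(looks_like_element(real, 0, CLUSTER_ID))
print(looks_like_element(fake, 1, CLUSTER_ID))
```

A check like this only lowers the chance of a false match. Payload bytes
can still happen to carry a size that fits, which is why also looking at
the following element's ID would be the next refinement.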

> It gets messy fast, which is why I think the use of elements with
> unknown length should be severely restricted.

Yep. Going out for a bike ride helps clear things up sometimes ;)

> > Doesn't it? If EBMLMaxSizeWidth=12 or EBMLMaxSizeWidth=8 then the
> > length of the 1-filled unknown size value changes accordingly,
> > right?

This is wrong. wm4 is right:

> In my understanding, there are multiple ways to encode the unknown
> size, and 0xFF is the shortest one. It's independent of
> EBMLMaxSizeWidth (it just restricts the maximum byte size you can use
> to encode the unknown size).

A size field (no matter how many bytes it's encoded with) whose bits are
all set to 1 signals an unknown size. 0xff does, as does 0x7f ff, 0x3f
ff ff etc.
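
As a small Python sketch of that rule (is_unknown_size() is a made-up
name, and the function expects exactly the bytes of one size field):

```python
def is_unknown_size(field: bytes) -> bool:
    """Return True if `field` is a size field whose value bits are all 1."""
    first = field[0]
    if first == 0:
        raise ValueError("size field longer than 8 bytes")
    # Leading zero bits in the first byte, plus one, give the field length.
    length = 9 - first.bit_length()
    if len(field) != length:
        raise ValueError("expected exactly one complete size field")
    mask = (1 << (7 * length)) - 1  # everything except the length marker bit
    return int.from_bytes(field, "big") & mask == mask


for sample in ("ff", "7fff", "3fffff", "407f", "fe"):
    print(sample, is_unknown_size(bytes.fromhex(sample)))
```

0x40 7f is included as a counterexample: it is a two byte field with the
value 127, not an unknown size.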

Kind regards,