[matroska-devel] Re: How UCI and Matroska could interact

Steve Lhomme steve.lhomme at free.fr
Thu Jan 23 18:56:54 CET 2003

In reply to Pamel <paul at msn.com>:

> "Steve Lhomme" <steve.lhomme at free.fr> wrote
> > A P frame has a backward reference. But to decode the 31st frame you only
> > need the data of that frame and the reference key frame. If GLDM's codec
> > really needs all the frames in between, then we're not talking about a P
> > frame but something that doesn't exist yet. And so it should be created
> > and supported in matroska (and UCI).
> This could be done using P frames as it would request frames back until it
> got to the keyframe.  I just think that it's silly to use this 'workaround'
> when matroska could be easily extended to accommodate this new structure.
> The P frames idea is basically tricking the interface to work with the
> codec, but extra work is being done, along with extra transfers of
> information.  Why not just define a way to do it that is what the codec
> needs?

See below (of the previous email).
> > One solution I can see is that the Block element and BlockAdditional are
> > grouped into a master element BlockGroup that would contain both. And you
> > would have to read both to be able to render the Block... This was already
> > the idea but they were not grouped together (bad EBML design anyway). In
> > this case, we can now have as many backward and forward references in the
> > BlockAdditional as needed.
> How about one of these?  Let the BlockAdditional be another element on the
> same level as Block.  But BlockAdditional just contains sub-elements such as
> timecode(s), track#, and where the block with the information is located.
> This would mean there is 0 overhead for cases where these are not used.  Or,
> could you just make BlockAdditional a sub-element of the existing Block
> design?  So, again, it would not be taking space in cases where it is not
> used.  This would have the added benefit of all relevant timecodes being
> under the keyframe.  So, if a seek was done in between two timecodes, it
> would go to the first one, see that there were sub-timecodes, and use the
> most accurate one.

Well, the BlockAdditional is already placed after the Block. But that's not
syntactically or semantically nice, because it depends on the order of elements.
For example, reorganising Blocks between tracks and timecodes to make reading
faster would be harder because of this. If everything is contained in a
bigger element, that's safe.

Also, the idea of the BlockAdditional is to put all optional elements there and
keep the minimum in the Block. The problem here is that the size is critical,
because of the consequences on the global overhead. That's why the track number
and the timecode should remain in the Block (but not the reference timecodes
anymore).
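
To make the proposed layout concrete, here is a minimal sketch in Python. It is
only an illustration of the grouping idea, not actual matroska or libmatroska
code: the class and field names are hypothetical, and storing each reference as
a signed offset relative to the Block's own timecode is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    # The minimum needed for every frame; kept small because size is critical.
    track_number: int
    timecode: int
    payload: bytes


@dataclass
class BlockAdditional:
    # Optional data. Assumption: each reference is a signed offset from the
    # Block's timecode (negative = backward, positive = forward).
    reference_timecodes: List[int] = field(default_factory=list)


@dataclass
class BlockGroup:
    # Master element: the Block and its optional additions travel together,
    # so a reader never depends on the order of sibling elements.
    block: Block
    additional: Optional[BlockAdditional] = None

    def referenced_timecodes(self) -> List[int]:
        """Absolute timecodes of the frames needed to render this Block."""
        if self.additional is None:
            return []
        return [self.block.timecode + offset
                for offset in self.additional.reference_timecodes]
```

A frame without references carries no BlockAdditional at all, and any number of
backward and forward references fits in the same list.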

Also note that references are a convenient way to save bandwidth by reducing the
amount of information after encoding. So using this technique should not produce
too much overhead, otherwise we lose all the benefit... That means 3 or 4 octets
per reference (imagine 31 references) is pretty large (just to end up with 0
actual data ;).

BTW, I was a bit wrong in my range: the current minimum (with references out of
the Block) is 6 and the max is 13 (lacing excluded). With 3 to 10 octets added
for the BlockGroup (when references are involved), 2 to 9 octets for the
BlockAdditional and 3 to 4 octets for each reference... That makes:
- no reference : 9 to 23 octets
- P frame : 14 to 38 octets
- B frame : 17 to 42 octets
(3 to 4 octets added for each reference, the differentiation of forward/backward
is not used anymore, since we deal with signed timecodes)

This is up from the current 6-21 (21 is for the largest possible B frame).

If we don't use a BlockGroup element, we have :
- no reference : 6 to 13 octets
- P frame : 11 to 28 octets
- B frame : 14 to 32 octets
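
The ranges above can be rebuilt from the component sizes quoted in this message.
The sketch below is only a back-of-the-envelope helper (the function and
constant names are made up for illustration). Summing the quoted components
reproduces every minimum in both lists, but gives maxima for P and B frames that
are 2 octets lower than the figures listed (36 and 40 with a BlockGroup, 26 and
30 without), so the listed maxima presumably include something not itemized
here.

```python
# Component sizes in octets as (min, max), taken from the message.
CURRENT_BLOCK = (6, 13)     # Block with references moved out, lacing excluded
BLOCK_GROUP = (3, 10)       # wrapper added by a BlockGroup master element
BLOCK_ADDITIONAL = (2, 9)   # only present when there is at least one reference
REFERENCE = (3, 4)          # cost of each reference


def overhead(n_refs, use_block_group):
    """Return (min, max) overhead in octets for a frame with n_refs references."""
    lo, hi = CURRENT_BLOCK
    if use_block_group:
        lo += BLOCK_GROUP[0]
        hi += BLOCK_GROUP[1]
    if n_refs > 0:
        lo += BLOCK_ADDITIONAL[0] + n_refs * REFERENCE[0]
        hi += BLOCK_ADDITIONAL[1] + n_refs * REFERENCE[1]
    return lo, hi


if __name__ == "__main__":
    for label, n in (("no reference", 0), ("P frame", 1), ("B frame", 2)):
        print(label, "with BlockGroup:", overhead(n, True),
              "without:", overhead(n, False))
```

Calling `overhead(31, False)` shows how quickly the per-reference cost adds up
for a frame that depends on 31 others.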

I need to find out what the average granularity (size of packets) is for speex
and vorbis at the moment, and for a low-bitrate video codec. That would help
compute the overhead involved in each case. An overhead of 1% or 2% for the
worst-case examples should be OK. That would help to decide between the clean vs smaller
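
Once packet sizes are known, the percentage check is simple arithmetic. A rough
sketch, assuming the overhead is measured relative to the payload size; the
function names are hypothetical and the packet sizes in the loop are
placeholders, not measured speex or vorbis values:

```python
import math


def overhead_percent(overhead_octets, packet_octets):
    """Container overhead as a percentage of the payload it wraps."""
    return 100.0 * overhead_octets / packet_octets


def min_packet_for(overhead_octets, target_percent):
    """Smallest payload (in octets) that keeps the overhead at or under the target."""
    return math.ceil(overhead_octets * 100 / target_percent)


if __name__ == "__main__":
    # Placeholder packet sizes; replace them with real codec measurements.
    for packet in (50, 200, 1000):
        print(packet, "octets:", round(overhead_percent(23, packet), 1), "%")
```

For instance, `min_packet_for(23, 2)` gives the payload size at which a
23-octet worst case stays within a 2% budget.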
