[Matroska-devel] Opus in Matroska Cont.

Hendrik Leppkes h.leppkes at gmail.com
Sat Jun 15 12:44:34 CEST 2013


On Sat, Jun 15, 2013 at 12:17 PM, Moritz Bunkus <moritz at bunkus.org> wrote:
> Hey,
>
> do it.
>
> Now the question how to timestamp & mux it properly.
>
> Let's assume a fictional OggOpus stream containing exactly five
> packets and a pre-skip of 40ms. Let's further assume the Opus packets
> are all 20ms long (meaning each covers 960 granulepos values in Ogg,
> at Opus' fixed 48kHz granule rate) except for the last one, which is
> only 15ms long (720 granulepos). So the five Ogg packets would have
> the following granulepos values:
>
> A0 @ Ogg(960)
> A1 @ Ogg(1920)
> A2 @ Ogg(2880)
> A3 @ Ogg(3840)
> A4 @ Ogg(4560)
>
> The first decoded samples we would actually hear would be the ones
> from A2 (the 1920 granulepos values in A0+A1 = 40ms of decoded data
> which a player must discard according to the pre-skip).
>
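A minimal Python sketch of the granulepos arithmetic, assuming only
Opus' fixed 48kHz granule rate and the fictional packet durations from
this example:

    # Opus granule positions in Ogg always count 48 kHz samples.
    SAMPLE_RATE = 48000

    pre_skip_ms = 40
    packet_ms = [20, 20, 20, 20, 15]            # A0..A4, last one shorter

    pre_skip_granules = pre_skip_ms * SAMPLE_RATE // 1000   # 1920

    # granulepos of a packet = total samples decoded up to and including it
    total = 0
    for i, dur in enumerate(packet_ms):
        total += dur * SAMPLE_RATE // 1000
        print(f"A{i} @ Ogg({total})")           # 960, 1920, 2880, 3840, 4560

    # The first pre_skip_granules samples (all of A0+A1 = 40ms) are
    # decoded but discarded, so the first audible samples come from A2.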
> This will translate into Matroska the following way if I'm not mistaken:
>
> - We set CodecDelay to 40_000_000 = 40ms
> - The Matroska blocks will look like this at TimecodeScale 1_000_000
> (standard ms resolution):
> A0 @ Matroska(0)
> A1 @ Matroska(20)
> A2 @ Matroska(40)
> A3 @ Matroska(60)
> A4 @ Matroska(80) with SilentPadding set to 5_000_000 (= 5ms of
> silence at the end)
>
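A small sketch of the muxer-side values, using the element names and
numbers proposed in this mail (CodecDelay and the padding expressed in
nanoseconds, block timestamps in TimecodeScale ticks):

    NS_PER_MS = 1_000_000
    TIMECODE_SCALE = 1_000_000            # 1 tick = 1ms

    codec_delay_ns = 40 * NS_PER_MS       # = the 40ms pre-skip

    # Each 20ms packet starts where the previous one ended, so the block
    # timestamps are simply 0, 20, 40, 60, 80; they are NOT shifted by
    # CodecDelay.
    for i in range(5):
        print(f"A{i} @ Matroska({i * 20})")

    # The last block additionally signals 5ms at its end, as proposed
    # for A4 above:
    silent_padding_ns = 5 * NS_PER_MS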
> So the "logical" timestamps after decoding would be:
>
> A0 @ decoded(-40ms)
> A1 @ decoded(-20ms)
> A2 @ decoded(0ms)
> A3 @ decoded(20ms)
> A4 @ decoded(40ms)
> and playback would stop after 55ms have been output. Right?
>
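The "logical" timestamps and the 55ms figure fall out of a simple
subtraction; a quick sketch under the same assumptions:

    codec_delay_ms = 40
    block_times_ms = [0, 20, 40, 60, 80]        # A0..A4 as muxed
    durations_ms   = [20, 20, 20, 20, 15]       # only 15ms of audio in A4

    # presentation ("logical") time = block timestamp - CodecDelay
    for i, t in enumerate(block_times_ms):
        print(f"A{i} @ decoded({t - codec_delay_ms}ms)")   # -40 .. 40

    # total audible output = all decoded audio minus the pre-skip
    print(sum(durations_ms) - codec_delay_ms)   # 95 - 40 = 55ms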
> Now. How do we interleave these with other packets? My example assumes
> a progressive video track @ 25fps = 40ms per frame/Matroska block. We
> have two options: interleave according to their blocks' timestamp
> values or interleave according to their "logical"/decoded timestamp
> values.
>
> 1. Interleaving according to the blocks' timestamps without
> considering CodecDelay:
>
> V0 @ Matroska(0)
> A0 @ Matroska(0)
> A1 @ Matroska(20)
> V1 @ Matroska(40)
> A2 @ Matroska(40)
> A3 @ Matroska(60)
> V2 @ Matroska(80)
> A4 @ Matroska(80)...
>
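Interleaving by raw block timestamp is then just a stable merge of the
two streams; a minimal sketch reproducing the listing above (video wins
ties, so each frame precedes the audio block sharing its timestamp):

    import heapq

    video_blocks = [("V", i, i * 40) for i in range(3)]    # 25fps -> 40ms
    audio_blocks = [("A", i, i * 20) for i in range(5)]    # 20ms Opus packets

    # merge on the unshifted block timestamps (approach 1)
    for kind, idx, ts in heapq.merge(video_blocks, audio_blocks,
                                     key=lambda b: b[2]):
        print(f"{kind}{idx} @ Matroska({ts})")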

I prefer this approach, because it'll also make seeking easier.
If I want to start decoding at, say, 40, I also need A2 for the proper
pre-skip/codec delay data, so even if those 40ms are not output, I need
them for decoding, and it therefore makes sense to put A2 near V1.

The second approach would require seeking explicitly to A2, which
comes before V1, and may cause existing seeking implementations that
prefer syncing on video tracks to introduce an extra delay until audio
output starts.
The first approach is simpler on muxers and demuxers (seeking), without
any real-world drawbacks, so I'm in favor of it.
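A rough sketch of the seeking difference, using only the example's
numbers: with the first approach the audio block a demuxer needs after
syncing on V1 shares V1's block timestamp, while with the second
approach the same packet would be stored one codec delay earlier:

    codec_delay_ms = 40
    v1_ts = 40                       # seek target: V1 @ Matroska(40)

    # Approach 1: A2 keeps its raw timestamp and sits right next to V1.
    a2_ts_approach1 = v1_ts                      # 40

    # Approach 2: A2 is stored at its decoded timestamp, i.e. before V1,
    # so a demuxer syncing on the video track has to step back to find it.
    a2_ts_approach2 = v1_ts - codec_delay_ms     # 0

    print(a2_ts_approach1, a2_ts_approach2)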

