[Matroska-devel] Opus in Matroska

Ralph Giles giles at thaumas.net
Fri Sep 14 21:59:53 CEST 2012

On 12-09-14 10:24 AM, Moritz Bunkus wrote:

>                                                           What if
> pre-roll "starts" with a non-key frame packet? In such cases no cue
> point must be created for that pre-roll packet. Therefore we might
> need some kind of whole new structure inside a cue point.

You're right. As written this just isn't permitted by the cue rules. As
far as I can tell, preroll requires that everyone rewrite their seek
code. :(

I was thinking of just changing the rules for when the PreRoll element
is present, but having a new Cue structure would make that more obvious.

> 1. is not as easy as it sounds either. What would that track header
> element's unit be? Number of samples?

I think PreRoll can be a time offset, i.e. an unsigned int in
TimecodeScale units. Unlike pre-skip, it doesn't need to be
sample-accurate, just "big enough" to allow convergence. The seek
algorithm can pick whatever's convenient and at least that far before
the target time and start decoding there.
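
As a rough sketch of that seek rule: given a sorted list of candidate
start times and a PreRoll value, all in TimecodeScale units, pick the
latest candidate that is at least PreRoll before the target. The
function name and the plain list of cue times are made up for
illustration and assume at least one cue exists; nothing here is an
existing Matroska API.

```python
from bisect import bisect_right


def pick_seek_point(cue_times, target, preroll):
    """Return the latest cue time that is at least `preroll` before `target`.

    All values are in TimecodeScale units and `cue_times` must be sorted
    and non-empty. If the target is too close to the start, fall back to
    the first cue.
    """
    wanted = target - preroll
    # Index of the first cue strictly after `wanted`.
    i = bisect_right(cue_times, wanted)
    return cue_times[i - 1] if i > 0 else cue_times[0]


# Hypothetical cues every 1000 ticks, 80 ticks of pre-roll.
cues = [0, 1000, 2000, 3000]
start = pick_seek_point(cues, target=2050, preroll=80)  # 1000, not 2000
```

The cue at 2000 is before the target but leaves only 50 ticks for
convergence, so the sketch steps back one more cue.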

It's sufficient for Opus to assume it's constant for the whole Track (or
all Opus tracks, for that matter). If you want to support it varying,
maybe also having a PreRoll element (or field) in the new Cue structure
would work?

>> The next issue is the 'preskip' field from the CodecPrivate field.
> I'm not sure I actually understand how this would work, especially
> from two perspectives: calculating timecodes when storing that stuff
> in the container and how it interacts with seeking.

It's confusing. I try to keep 'pre-roll' and 'pre-skip' separate in my
mind; they work differently.

> Now for a simple tool like mkvmerge it should be easy enough: take the
> timecodes provided by the Ogg encapsulation and copy them over to
> Matroska, same for the pre-skip header information. But how do
> encoders calculate both the timecode for packet X and the pre-skip
> value?

The timecode in the file takes pre-skip into account. The encoder adds
up the encoding delays for the encoder (the reference implementation has
OPUS_GET_LOOKAHEAD for this), the resampler, etc. It puts this value in
the header, and then subtracts it from the number of samples it has
actually fed to the codec when generating timestamps. It also has to
append the same number of samples of extra data to the end of the input
audio to flush all the valid data through the encoding pipeline.

The decoder discards that same number of samples from the output it gets
from the codec at the start of the stream, but because the encoder has
already subtracted it from the timestamps it wrote, the decoder finds
them to be correct for what it's actually outputting.
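
A toy model of that bookkeeping, in case it helps. The codec is
replaced by a pure delay line, and the PRE_SKIP and FRAME values are
only example numbers; a real encoder would add up its actual delays
rather than hard-code one. All names are hypothetical.

```python
PRE_SKIP = 312   # example total delay in samples (assumed value)
FRAME = 960      # example frame size: 20 ms at 48 kHz


def packet_timestamps(num_packets, frame=FRAME, pre_skip=PRE_SKIP):
    """Start timestamp, in samples, the encoder writes for each packet."""
    # Samples fed to the codec so far, minus the delay stored in the header.
    return [n * frame - pre_skip for n in range(num_packets)]


def codec_round_trip(pcm, pre_skip=PRE_SKIP):
    """Stand-in for encode + decode: delays the signal by pre_skip samples."""
    padded = pcm + [0] * pre_skip       # encoder appends samples to flush
    delayed = [0] * pre_skip + padded   # the pipeline delays everything
    return delayed[:len(padded)]        # as many samples out as went in


def decoder_trim(decoded, pre_skip=PRE_SKIP):
    """Discard the first pre_skip samples of output at the stream start."""
    return decoded[pre_skip:]


pcm = list(range(1, 2001))
recovered = decoder_trim(codec_round_trip(pcm))   # equals pcm again
stamps = packet_timestamps(3)                     # [-312, 648, 1608]
```

Note the first packet starts at -312: the discarded output sits before
time zero, so everything the decoder keeps lines up with the written
timestamps.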

> Now for seeking. If the user requests to seek to timecode Y what
> timecode must the splitter look for in the file? Y as well? Y +
> pre-skip? Do those samples always have to be dropped (after each
> seek)? Only if seeking to the start of the file?

Likewise, the timestamps are already corrected, so the splitter just
looks for Y (or Y-preroll, really) to start playback at Y. It doesn't
have to care about the pre-skip at all, except to the extent that it
might create blocks with negative timestamps. The decoder also has to
know where it's starting: from the beginning, where it should discard
pre-skip samples of output; near the beginning, where it should discard
fewer than pre-skip samples; or in the middle, where it should discard none.
Negative timestamps are one way to pass this information, but of course
it could be signalled by the splitter in other ways.
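
If negative timestamps were the signal, the decoder's side could be as
small as the sketch below. It assumes timestamps are counted in samples
and already have pre-skip subtracted, and it ignores the separate
question of dropping pre-roll output before the seek target. The
function name is made up.

```python
def samples_to_discard(first_block_ts):
    """Decoded samples to drop when decoding starts at `first_block_ts`.

    Only output that falls before time zero is discarded, so a start in
    the middle of the stream discards nothing.
    """
    return max(0, -first_block_ts)


# Example: pre-skip of 312 samples with 120-sample (2.5 ms) frames.
starts = [-312, -192, -72, 48]
drops = [samples_to_discard(ts) for ts in starts]   # [312, 192, 72, 0]
```

Starting at the first packet drops the full pre-skip, starting a couple
of packets in drops less, and anything at or after zero drops none.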

> While negative timestamps are possible in Matroska they're very, very
> limited in scope. Negative timestamps can occur because a Block's and
> SimpleBlock's "relative timestamp" field is a signed 16bit integer
> that is added to the current cluster's timestamp. However, that would
> leave ~ 32s of negative timecodes with the default timecode scale
> factor of 1000000 (meaning timestamps have a 1ms precision) and down
> to ~ 32us with 1ns precision.
>                               As it is up to the muxer (in mkvmerge's
> case: even up to the user) which timecode scale is applied we cannot
> rely on negative timestamps being available _just with existing
> fields_ for Opus. So yes, we do need a new element for that as well.

Ok, thanks for clarifying that.
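
For reference, the reach follows directly from the field width: a
signed 16-bit offset bottoms out at -32768 ticks, and a tick lasts
TimecodeScale nanoseconds. A quick sketch (the helper name is made up):

```python
INT16_MIN = -(2 ** 15)   # most negative relative timestamp in a block


def negative_reach_ns(timecode_scale_ns):
    """How far before its cluster timestamp a block can sit, in ns."""
    return -INT16_MIN * timecode_scale_ns


reach_default = negative_reach_ns(1_000_000)   # 32_768_000_000 ns, ~32.8 s
reach_1ns = negative_reach_ns(1)               # 32_768 ns, ~32.8 us
```

So whether a given pre-skip fits depends entirely on the scale the
muxer picked.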

Having some kind of pre-trim is useful in general, btw, for clip
trimming when editing lossy-compressed files. The interaction with
timestamps is messy there too, though.

> I'd suggest we don't add another track header element for this but a
> new child element beneath the BlockGroup that signals the number of
> samples to trim from the last block inside that BlockGroup.

That sounds good (except for not being able to use SimpleBlock). Call
it BlockGroup::EndTrim?

