[Matroska-devel] Opus in Matroska

Moritz Bunkus moritz at bunkus.org
Fri Sep 14 19:24:52 CEST 2012


Hey,

On Fri, Sep 14, 2012 at 6:57 PM, Ralph Giles <giles at thaumas.net> wrote:

> The biggest one is pre-roll after seek.

The problems with this are two-fold: 1. signalling how much pre-roll
there needs to be after seeking and 2. providing the information where
such blocks exist (I'm talking about the cues here).

2. can be addressed partially by my proposed new element
CueRelativePosition (see
http://lists.matroska.org/pipermail/matroska-devel/2012-September/004250.html).
However, this is probably not enough for generic support. What if
pre-roll "starts" with a non-key frame packet? In such cases no cue
point can be created for that pre-roll packet, as cue points only
refer to key frames. Therefore we might need some kind of whole new
structure inside a cue point.

1. is not as easy as it sounds either. What would that track header
element's unit be? Number of samples? If so, how do you define a
"sample", especially for video: is it a whole frame? An interlaced
picture? It would be just as difficult for any other codec with a
varying number of samples per block, like Vorbis.

Or is the "unit" the number of Matroska blocks? If so, what if that
pre-roll is not constant?

Thoughts?

> The next issue is the 'preskip' field from the CodecPrivate field. This
> is different from the pre-roll skip after seek. It's a count of samples
> to discard from the start of the stream to correct for algorithmic delay
> in the encoder so there's no phase shift between input and output, and
> it should be applied *before* calculating timestamps.

I'm not sure I actually understand how this would work, especially
from two perspectives: calculating timecodes when storing that stuff
in the container and how it interacts with seeking.

Now for a simple tool like mkvmerge it should be easy enough: take the
timecodes provided by the Ogg encapsulation and copy them over to
Matroska, same for the pre-skip header information. But how do
encoders calculate both the timecode for packet X and the pre-skip
value?
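
For illustration, here is a minimal sketch of one possible answer. It
assumes the Ogg Opus convention that the playback position of a sample
is its decoded sample count minus pre-skip, at a fixed 48 kHz clock.
The function name and the pre-skip value of 312 are only examples, not
anything defined by Matroska.

```python
OPUS_RATE_HZ = 48_000  # Opus timestamps are counted at 48 kHz


def packet_start_ns(samples_before_packet: int, pre_skip: int) -> int:
    """Presentation time (ns) of a packet's first decoded sample.

    Assumption: position = decoded sample count - pre-skip, as in the
    Ogg Opus mapping. The result is negative for initial packets whose
    samples fall entirely or partly inside the pre-skip region.
    """
    pcm_position = samples_before_packet - pre_skip
    return pcm_position * 1_000_000_000 // OPUS_RATE_HZ


# First packet of a stream with an example pre-skip of 312 samples:
print(packet_start_ns(0, 312))    # -6500000 (i.e. -6.5 ms)
# Second packet, after one 20 ms (960 sample) packet:
print(packet_start_ns(960, 312))  # 13500000 (i.e. 13.5 ms)
```

Under that assumption the very first packet starts before zero, which
is where the negative timestamps discussed below come from.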

Now for seeking. If the user requests to seek to timecode Y what
timecode must the splitter look for in the file? Y as well? Y +
pre-skip? Do those samples always have to be dropped (after each
seek)? Only if seeking to the start of the file?

> The trimming can be handled by the decoder wrapper, but the container
> needs to report the timestamps correctly, which means there needs to be
> a way to deal with initial blocks with apparently negative timestamps.

While negative timestamps are possible in Matroska they're very, very
limited in scope. Negative timestamps can occur because a Block's and
SimpleBlock's "relative timestamp" field is a signed 16-bit integer
that is added to the current cluster's timestamp. However, that would
leave ~ 32.7s of negative timecodes with the default timecode scale
factor of 1000000 (meaning timestamps have a 1ms precision) and down
to ~ 32.7us with 1ns precision. As it is up to the muxer (in mkvmerge's
case: even up to the user) which timecode scale is applied we cannot
rely on negative timestamps being available _just with existing
fields_ for Opus. So yes, we do need a new element for that as well.
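
A minimal sketch of that arithmetic, assuming a cluster timestamp of 0
and the most negative value a signed 16-bit field can hold. The
function name is made up for this example.

```python
INT16_MIN = -32768  # most negative value of a signed 16-bit integer


def most_negative_timestamp_ns(timecode_scale_ns: int,
                               cluster_timestamp: int = 0) -> int:
    """Lowest absolute block timestamp (ns) reachable in one cluster.

    The block's relative timestamp is added to the cluster timestamp,
    and the sum is multiplied by the timecode scale (ns per tick).
    """
    return (cluster_timestamp + INT16_MIN) * timecode_scale_ns


print(most_negative_timestamp_ns(1_000_000))  # -32768000000 ns (1ms ticks)
print(most_negative_timestamp_ns(1))          # -32768 ns (1ns ticks)
```

With 1ns ticks the reachable range is far shorter than a typical
pre-skip of a few milliseconds, so the existing fields alone cannot be
relied upon.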

> Otherwise, maybe a TrackEntry::TimestampOffset makes sense?

Well. I'd call it something different, I guess. Rationale is that I
would make that element's unit be "number of samples" and not any
time-based unit, and "TimestampOffset" always implies... well... a
time-based unit :) If we chose a time-based unit then it must be
1ns-based and not based on the segment's timecode scale argument as
that can easily be too imprecise (1ms at e.g. 48000 Hz sampling
frequency would already be 48 samples).
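
The precision argument can be checked with a few lines. This is only
a sketch; the helper name is invented for the example.

```python
from fractions import Fraction


def samples_per_tick(sample_rate_hz: int, tick_ns: int) -> Fraction:
    """How many audio samples fit into one timestamp tick of tick_ns."""
    return Fraction(sample_rate_hz * tick_ns, 1_000_000_000)


print(samples_per_tick(48_000, 1_000_000))  # 48 samples per 1ms tick
print(samples_per_tick(48_000, 1))          # 6/125000 of a sample per 1ns tick
```

Anything coarser than one sample per tick cannot express an offset
with sample precision, which is why a time-based unit would have to be
1ns-based.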

> Those are the two issues I was thinking of. Trimming the end of the
> stream is also a problem.

I'd suggest we don't add another track header element for this but a
new child element beneath the BlockGroup that signals the number of
samples to trim from the last block inside that BlockGroup. While
there is already the BlockDuration element that could also be used, it
is also based on the segment's timecode scale factor, and as I said
above that is too imprecise for sample-precision in 99% of all cases
(with the default value 1000000 for timecode scale).
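
To illustrate the loss, here is a small sketch that stores a trimmed
block length as a duration in ticks and converts it back to samples.
It assumes round-to-nearest when writing the duration; the function
name and the numbers are examples only.

```python
NS_PER_SECOND = 1_000_000_000


def duration_roundtrip(samples: int, rate_hz: int, tick_ns: int):
    """Store a sample count as duration ticks, then recover samples.

    Returns (ticks, recovered_samples). Uses integer arithmetic with
    round-to-nearest for the stored tick count.
    """
    ns_times_rate = samples * NS_PER_SECOND
    per_tick = rate_hz * tick_ns
    ticks = (ns_times_rate + per_tick // 2) // per_tick
    recovered = ticks * tick_ns * rate_hz // NS_PER_SECOND
    return ticks, recovered


# A 960 sample block with 100 samples trimmed leaves 860 samples.
print(duration_roundtrip(860, 48_000, 1_000_000))  # (18, 864)
print(duration_roundtrip(860, 48_000, 1))          # (17916667, 860)
```

With 1ms ticks the decoder would get back 864 samples instead of 860;
a sample count stored directly does not have this problem.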

Kind regards,
mosu

