[Matroska-devel] Opus in Matroska
slhomme at matroska.org
Sat Sep 15 12:51:07 CEST 2012
On Fri, Sep 14, 2012 at 9:59 PM, Ralph Giles <giles at thaumas.net> wrote:
> On 12-09-14 10:24 AM, Moritz Bunkus wrote:
>> What if
>> pre-roll "starts" with a non-key frame packet? In such cases no cue
>> point must be created for that pre-roll packet. Therefore we might
>> need some kind of whole new structure inside a cue point.
> You're right. As written this just isn't permitted by the cue rules. As
> far as I can tell, preroll requires that everyone rewrite their seek
> code. :(
This is *definitely* something we don't want. The Cue/index in a file
should tell where to jump in a file to play correctly at a particular
time. And the timecode of whichever Block found in the file should be
the actual timecode of the frame. I think this should be the basis of
the discussion because that's how it has always been done everywhere,
and this is the trickiest part of an audio/video player, so it's the
last thing you want to add special cases to.
That being said, I think we can find ways to accommodate things to
work well with Opus which we want to support ASAP.
First I'd like to take a step back. We need to add things to Matroska,
and it may become Matroska v4 if we need to introduce incompatible
changes. It could also be a WebM v2 once Opus is added, which I am sure
it will be. I think it's important to keep that in mind as a possibility.
The only thing we cannot do is remove fields necessary today or make
new mandatory elements with default values incompatible with the
previous meaning. After that we are pretty much free.
One of the criticisms I often had about Matroska is that the timecodes
are based on a division of a second and not a fraction as is often done,
meaning timecodes for 1001/30000 are not totally precise. For audio
files they may not be sample accurate for the same reason, which makes
gapless playback inaccurate as well. If we go the v4 route we
could fix that by adding a TimecodeScaleNumerator, combined with
TimecodeScale (the denominator) we would have our fraction, and with a
default value of 1 we keep backward compatibility. Players that add
Opus support in Matroska/WebM will be required to support this new
element to get proper timecodes.
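To illustrate the precision problem described above, here is a minimal sketch comparing millisecond ticks with a fractional tick for 1001/30000 s frames. It does not model the proposed TimecodeScaleNumerator element itself, whose exact semantics are not settled here; `to_ticks` and `rounding_error` are hypothetical helper names, and using a tick equal to the frame duration is only an assumption about what a fractional scale would make possible.

```python
from fractions import Fraction

FRAME_DURATION = Fraction(1001, 30000)  # seconds per frame (29.97 fps video)
MS_TICK = Fraction(1, 1000)             # default TimecodeScale: 1 ms per tick


def to_ticks(time_s, tick):
    """Round an exact time in seconds to the nearest whole number of ticks."""
    return round(time_s / tick)


def rounding_error(frame_index, tick):
    """Difference between the stored timestamp and the exact frame time."""
    exact = frame_index * FRAME_DURATION
    stored = to_ticks(exact, tick) * tick
    return stored - exact


# Millisecond ticks: frame 1 is stored as 33 ms instead of 33.3666... ms.
print(rounding_error(1, MS_TICK))         # -11/30000
# A tick equal to the frame duration stores every frame exactly.
print(rounding_error(1, FRAME_DURATION))  # 0
```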
With that solution the BlockDuration could be used for the cut-off
time inside a frame. It may not even require any code change in any
player (I know DirectShow should be fine with that, any renderer that
can handle gapless should know about such cut-off durations as well).
Now back to seeking. I think the timecode shift should be handled by
the codec internally. So the number of samples to shift would be put
in the CodecPrivate as a constant. The decoder can handle it on its
own. But that means the timecode written in Matroska may need to take
it into account. Let's imagine, theoretically, an Opus frame of 1s that
needs a decoder pre-roll of 100ms of "junk?" data. The frame would
actually be 1.1s but from Matroska's point of view it should be 1s and
start at timecode 0. Did I understand the problem correctly?
When seeking to any position, that pre-roll sample count is known to
the codec, which should use it to adjust its decoder and "external"
timecodes. From the outside it would all appear to have just needed
data, even though internally it's shifted. Since it's transparent,
no player needs to be changed. And that leaves the codec the
flexibility to adjust these values internally. Would that be a working
> I was thinking of just changing the rules, for when the PreRoll element
> is present, but having a new Cue structure would make that more obvious.
>> 1. is not as easy as it sounds either. What would that track header
>> element's unit be? Number of samples?
> I think PreRoll can be a time offset, i.e. an unsigned int in
> TimecodeScale units. Unlike pre-skip, it doesn't need to be
> sample-accurate, just "big enough" to allow convergence. The seek
> algorithm can pick whatever's convenient and at least that far before
> the target time and start decoding there.
> It's sufficient for Opus to assume it's constant for the whole Track (or
> all Opus tracks, for that matter). If you want to support it varying,
> maybe also having a PreRoll element (or field) in the new Cue structure
> would work?
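The seek rule described in the quoted text (start decoding at least PreRoll before the target) could be sketched roughly as follows. This is only an illustration under stated assumptions: `choose_seek_point` is a hypothetical helper, cue times are assumed to be a non-empty sorted list in TimecodeScale units, and the 80-tick pre-roll in the usage line is just an example value.

```python
from bisect import bisect_right


def choose_seek_point(cue_times, target, preroll):
    """Return the latest cue time that lies at least `preroll` before `target`.

    All values are in TimecodeScale units. `cue_times` must be sorted in
    ascending order and non-empty. If the target is too close to the start
    of the file, the first cue is returned.
    """
    wanted = max(target - preroll, 0)
    index = bisect_right(cue_times, wanted) - 1
    return cue_times[max(index, 0)]


# Cues every 1000 ticks; seeking to 2050 with a pre-roll of 80 has to start
# from the cue at 1000, because the cue at 2000 is only 50 ticks early.
print(choose_seek_point([0, 1000, 2000, 3000], 2050, 80))  # 1000
```

The decoder would then decode from the returned position and only start presenting output at the target time.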
>>> The next issue is the 'preskip' field from the CodecPrivate field.
>> I'm not sure I actually understand how this would work, especially
>> from two perspectives: calculating timecodes when storing that stuff
>> in the container and how it interacts with seeking.
> It's confusing. I try to keep 'pre-roll' and 'pre-skip' separate in my
> mind; they work differently.
>> Now for a simple tool like mkvmerge it should be easy enough: take the
>> timecodes provided by the Ogg encapsulation and copy them over to
>> Matroska, same for the pre-skip header information. But how do
>> encoders calculate both the timecode for packet X and the pre-skip
> The timecode in the file takes pre-skip into account. The encoder adds
> up the encoding delays for the encoder (the reference implementation has
> OPUS_GET_LOOKAHEAD for this), the resampler, etc. It puts this value in
> the header, and then subtracts it from the number of samples it has
> actually fed to the codec when generating timestamps. It also has to
> append the same number of samples of extra data to the end of the input
> audio to flush all the valid data through the encoding pipeline.
> The decoder discards that same number of samples from the output it gets
> from the codec at the start of the stream, but because the encoder has
> already subtracted it from the timestamps it wrote, the decoder finds
> them to be correct for what it's actually outputting.
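The pre-skip bookkeeping described in the quoted text can be sketched as below. This is a toy model, not the Opus API: `delayed_codec` is a hypothetical stand-in for an encode/decode round trip that simply delays its input, and timestamps are counted in samples at the start of each packet, which is an assumption made for the illustration.

```python
def delayed_codec(samples, delay):
    """Stand-in for encode + decode: the output is the input delayed by `delay`."""
    return [0] * delay + samples[:len(samples) - delay]


def encode_side(audio, pre_skip, frame_size):
    """Pad the input to flush the pipeline and compute per-packet timestamps."""
    fed = audio + [0] * pre_skip  # extra samples appended to flush the delay
    # Timestamp = samples fed before this packet, minus pre-skip.
    timestamps = [start - pre_skip for start in range(0, len(fed), frame_size)]
    return fed, timestamps


def decode_side(decoded, pre_skip):
    """Discard the first `pre_skip` output samples at the start of the stream."""
    return decoded[pre_skip:]


audio = list(range(1, 11))  # ten "real" samples
fed, timestamps = encode_side(audio, pre_skip=3, frame_size=4)
output = decode_side(delayed_codec(fed, 3), pre_skip=3)
print(timestamps)       # [-3, 1, 5, 9]
print(output == audio)  # True
```

The first packet carries a timestamp of -3, so after the decoder drops three samples the first real sample lands at time 0.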
>> Now for seeking. If the user requests to seek to timecode Y what
>> timecode must the splitter look for in the file? Y as well? Y +
>> pre-skip? Do those samples always have to be dropped (after each
>> seek)? Only if seeking to the start of the file?
> Likewise, the timestamps are already corrected, so the splitter just
> looks for Y (or Y-preroll, really) to start playback at Y. It doesn't
> have to care about the pre-skip at all, except to the extent that it
> might create blocks with negative timestamps. The decoder also has to
> know whether it's starting from the beginning, in which case it should
> discard pre-skip samples of output; near the beginning, in which case
> it should discard fewer than pre-skip samples; or in the middle, in
> which case it should discard none.
> Negative timestamps are one way to pass this information, but of course
> it could be signalled by the splitter in other ways.
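If negative timestamps are used as the signal, the three cases above collapse into one rule. A minimal sketch, assuming timestamps are expressed in samples and ignoring any pre-roll decoding; `samples_to_discard` is a hypothetical helper and 312 is only an example pre-skip value.

```python
def samples_to_discard(first_block_timestamp):
    """Number of decoded samples to drop when decoding starts at this block.

    A negative timestamp means the block begins before time 0, so that many
    samples are dropped. At or after time 0 nothing is dropped.
    """
    return max(0, -first_block_timestamp)


print(samples_to_discard(-312))   # 312: start of stream, full pre-skip
print(samples_to_discard(-100))   # 100: near the beginning, less than pre-skip
print(samples_to_discard(48000))  # 0: somewhere in the middle
```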
>> While negative timestamps are possible in Matroska they're very, very
>> limited in scope. Negative timestamps can occur because a Block's and
>> SimpleBlock's "relative timestamp" field is a signed 16bit integer
>> that is added to the current cluster's timestamp. However, that would
>> leave ~ 32s of negative timecodes with the default timecode scale
>> factor of 1000000 (meaning timestamps have a 1ms precision) and down
>> to ~ 32us with 1ns precision.
>> As it is up to the muxer (in mkvmerge's
>> case: even up to the user) which timecode scale is applied we cannot
>> rely on negative timestamps being available _just with existing
>> fields_ for Opus. So yes, we do need a new element for that as well.
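The reach of the signed 16-bit relative timestamp can be worked out directly. A minimal sketch, assuming the first cluster's timestamp is 0 so that the block offset alone determines how negative a timestamp can get; `most_negative_offset_ns` is a hypothetical helper.

```python
INT16_MIN = -(2 ** 15)  # most negative value of a signed 16-bit integer


def most_negative_offset_ns(timecode_scale_ns):
    """Most negative block offset in nanoseconds for a given timecode scale."""
    return INT16_MIN * timecode_scale_ns


# Default scale of 1000000 ns per tick (1 ms precision).
print(most_negative_offset_ns(1_000_000) / 1e9)  # -32.768 (seconds)
# Scale of 1 ns per tick.
print(most_negative_offset_ns(1) / 1e3)          # -32.768 (microseconds)
```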
> Ok, thanks for clarifying that.
> Having some kind of pre-trim is useful in general, btw, for clip
> trimming when editing lossy-compressed files. The interaction with
> timestamps is messy there too, though.
>> I'd suggest we don't add another track header element for this but a
>> new child element beneath the BlockGroup that signals the number of
>> samples to trim from the last block inside that BlockGroup.
> That sounds good (except for not being able to use SimpleBlock). Call
> it BlockGroup::EndTrim?
Matroska association Chairman