[Matroska-devel] Opus in Matroska Cont.

Frank Galligan frankgalligan at gmail.com
Thu May 23 00:15:16 CEST 2013


I just realized that I sent a reply only to Ralph on 4/12. I'm copying the
reply below, but I have since changed my position. I will follow up in
another email.

I updated the wiki (https://wiki.xiph.org/MatroskaOpus) with options that I
have seen for handling pre-skip.


On Fri, Apr 12, 2013 at 12:15 PM, Ralph Giles <giles at thaumas.net> wrote:

> On 13-04-12 10:35 AM, Frank Galligan wrote:
>
> >     First, the number of samples to be skipped is not an integer
> >     multiple of compressed packets, so this isn't actually possible
> >     without clipping valid audio from the start of the stream.
> >
> > Ahh, this was one of my earlier questions.
> >
> > [...]
> > I don't think we should worry about a decoder that ignores CodecPrivate.
> > Current decoders must handle CodecPrivate, so I think we can treat
> > decoders that ignore CodecPrivate as broken.
>
> Well sure, but they're less broken with Opus than with e.g. Vorbis,
> which can't work at all. For mono and stereo Opus files, the only thing
> that needs to be signaled outside the data stream is exactly the preskip
> value. So ignoring CodecPrivate for Opus is no worse than not
> implementing any hypothetical preskip element.
>
The output gain too.


>
> >     Having to feed and then discard output from special data in
> >     CodecPrivate is moving away from a general container-level
> >     solution to this requirement,
> >
> > I agree.
> >
> >     which is generally useful for other codecs as well, to
> >     implement trimming.
> >
> > Yes, I was leaning to include the data in CodecPrivate so we didn't need
> > to change all players to handle this feature, as we could potentially
> > hide it in the specific decoder.
>
> Where is the specific decoder going to live though? Are you planning to
> distribute a wrapper which accepts the CodecPrivate data? Do you
> think we should add that to the libopus API?
>
Adding it to the libopus API would be the easiest. Otherwise it would
definitely have to be some type of wrapper on top of libopus. We had to do
something like this for Vorbis.


>
> > If we truly think this will be useful
> > to other codecs (currently or in the future) then we can try and
> > generalize this feature.
>
> Maybe it's helpful to think of video here. If I press 'record' in the
> middle of a WebRTC session, how do we represent the start point which
> won't generally fall on a keyframe?
>
This has been handled for years already. Those frames are not marked as
keyframes, so players have thrown them out until the first keyframe (or
rendered garbage).

I wouldn't recommend changing that to something that adds a pre-skip value
to the Track header, because the recorder would then have to delay writing
any data until it sees the first keyframe. If we instead decided on setting
the invisible flag or TimeToDiscard, the recorder could start writing to
disk right away.

>
> > Maybe we can add an element to the Block element, TimeToDiscard in
> > nanoseconds. A value of -1 would not render the whole Block, which would
> > have the same effect as setting the invisible bit. Otherwise the
> > player would need to discard TimeToDiscard time. This should satisfy
> > "preskip data does not have to be an integer multiple of compressed
> > packets", while also ensuring that the timestamp of the Block matches
> > the timestamp of the playback position.
>
> Or under the TrackEntry element? Since it only happens at the start of
> the track in the use cases I can think of.
>
I was trying to generalize it further so future codecs could take advantage
of "decoded data, that may have a duration attached to it, but should not
be rendered", within any part of the stream. If we did this we could change
how we handle VP8 altref frames (though I highly doubt most players would).
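
To make the TimeToDiscard idea concrete, here is a minimal sketch in Python
of how a player might apply it to one decoded Block. The function name, the
choice to drop samples from the front of the Block, the rounding, and the
48 kHz default are all assumptions for illustration; TimeToDiscard is only
a proposal in this thread, not an existing Matroska element.

```python
def apply_time_to_discard(samples, time_to_discard_ns, sample_rate=48000):
    """Return the decoded samples of one Block that should be rendered.

    samples            -- decoded PCM frames of the Block (hypothetical layout)
    time_to_discard_ns -- proposed TimeToDiscard value in nanoseconds;
                          -1 means "do not render this Block at all"
    """
    if time_to_discard_ns == -1:
        # Same effect as the invisible bit: decode, but render nothing.
        return []
    # Convert nanoseconds to a whole number of samples, rounding to nearest.
    n_discard = (time_to_discard_ns * sample_rate + 500_000_000) // 1_000_000_000
    # Assumption: the discarded time is taken from the start of the Block.
    return samples[n_discard:]
```

For example, a 20 ms Block (960 samples at 48 kHz) with a TimeToDiscard of
6.5 ms would render its last 648 samples, so the discard does not have to
fall on a packet boundary.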

>
> I think you're still missing part of why the Ogg mapping shifts the
> timestamp though. Part of what pre-skip is for is to account for
> algorithmic delay. The encoder has some. If the original input isn't 48
> kHz, then it went through a resampler, which can also have some. So
> shifting the timecode is _necessary_ for sync. Without it, a peak in the
> output won't align with a peak in the input.
>
I understand it, but I don't think the time shift needs to be muxed into
Matroska files (actually, I think doing so will have major consequences
later, as we would be fundamentally changing how time is handled within
Matroska). Adding duration to the pre-skip data was a design choice.
The algorithmic delay (and any other data) could easily have been handled
within the codec if the bitstream had been defined differently.

I'm not advocating that players/decoders skip decoding the pre-skip data. I
understand that the output may not align with the input if the decoder is
not primed with the pre-skip data, although it will align once SeekPreRoll
time has passed. I'm just trying to come up with a solution that does not
offset all timestamps within the file, as no other codec (that I know of)
has done this. At a minimum, it will force all muxers/demuxers to handle
their timing differently, and I think it will also cause problems later
that we are not currently thinking of.

I think we can generalize the pre-skip data by adding TimeToDiscard (or
SamplesToDiscard, DataToDiscard) to the Block or to the TrackEntry (though
I think it will be cleaner if it is added to the Block) and still keep the
timestamps == playback position. Am I mistaken?
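
As a rough illustration of "timestamps == playback position", the sketch
below assigns Block timestamps when the leading samples are discarded at
the container level rather than by shifting every timestamp. The function
name and the idea of passing a per-track sample count are assumptions made
for this example only; they are not part of any agreed design.

```python
def block_timestamps_ns(packet_samples, discard_samples, rate=48000):
    """Timestamp each Block with the playback position of its first
    rendered sample, given how many leading samples are discarded.

    packet_samples  -- decoded sample count of each packet, in order
    discard_samples -- total samples to drop from the start of the track
    """
    timestamps = []
    played = 0                   # samples rendered so far
    remaining = discard_samples  # samples still to be dropped
    for n in packet_samples:
        timestamps.append(played * 1_000_000_000 // rate)
        dropped = min(n, remaining)
        remaining -= dropped
        played += n - dropped    # only rendered samples advance time
    return timestamps
```

With three 960-sample packets and 312 samples to discard, this gives
timestamps of 0, 13.5 ms, and 33.5 ms: the first Block still starts at
zero, and no timestamp in the file is offset by the pre-skip.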

