[Matroska-devel] Opus in Matroska Cont.

Frank Galligan frankgalligan at gmail.com
Thu May 23 19:19:29 CEST 2013


I was hoping to fix the codec delay for Opus, as well as AAC and Vorbis (and
any other codecs). I was hoping that older demuxers would just skip over
the new PreSkip element. Then the older demuxers can behave as they
currently do with the old files as well as the new files with the PreSkip
element.

I actually ran into a high priority issue we had to address while I was
getting ready for IO. Around December of last year FFmpeg changed how they
handle codec delay in Matroska. Before December FFmpeg would prepend the
codec delay to the audio stream and shift all of the encoded audio by the
codec delay. For Vorbis this is 128 samples and for AAC 1024 samples.
The video stream would be left alone. After December FFmpeg would prepend
the codec delay to the audio stream and shift all of the encoded audio by
the codec delay, as it did before. But FFmpeg would then also shift all of
the video by the codec delay, I'm guessing in hopes of keeping better AV
sync. Both workflows are still wrong, as both require players to implicitly
know that Codec A has a delay of N. Also, switching from one way to the
other I think made it worse, as now players have to try and guess if the
video really starts at N or 0. For editors, this is even worse if they
re-compress a few times.
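
As a rough sketch of the two behaviours described above (the function and
parameter names are hypothetical, the 48 kHz sample rate is an assumption,
and this is not FFmpeg's actual code), the start times could be modelled as:

```python
from fractions import Fraction

def start_times(codec_delay_samples, sample_rate, shift_video):
    """Return (audio_start, video_start) in seconds as exact fractions.

    audio_start is where the first real (non-delay) audio sample lands once
    the muxer has prepended the codec delay; video_start is where the first
    video frame lands.
    """
    delay = Fraction(codec_delay_samples, sample_rate)
    audio_start = delay  # audio is shifted in both behaviours
    video_start = delay if shift_video else Fraction(0)
    return audio_start, video_start

# AAC delay of 1024 samples (from the text) at an assumed 48 kHz sample rate.
old_audio, old_video = start_times(1024, 48000, shift_video=False)
new_audio, new_video = start_times(1024, 48000, shift_video=True)
print(float(old_audio - old_video))  # audio/video offset, older behaviour
print(float(new_audio - new_video))  # audio/video offset, newer behaviour
```

In both cases nothing in the file records the value of N, so a player
cannot tell which of the two behaviours produced the file it is reading.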

This is why I want to explicitly represent in the file what the muxers are
currently doing with codec delay, for all audio codecs. Older
demuxers shouldn't have a problem with the new element, unless they error
on unknown elements, but wouldn't that make them non-compliant?

As for the unit of PreSkip, this is a value that translates to the exact
number of samples that have been generated and prepended by the muxer that
created the file. We are not specifying which component of the player needs
to handle the codec delay. We are just modeling what the encoder/muxer did.
I think in most cases this will be handled outside of the codec. I think it
is better to have sample accuracy now rather than later. We
already have issues with editors today. Also I can see a sample case that
probably wants exact sample duration: a radio station playing files back to
back. As for the unit itself I would prefer samples, as then we won't have
any conversion issues. But I would settle for nanoseconds as I know nothing
is expressed in samples today; worst case we should only be off by one
sample. I would not want timecodescale as the resolution is usually too
coarse.
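
To make the unit trade-off concrete, here is a minimal sketch of the
conversions involved. The helper names and the 48 kHz sample rate are
assumptions for illustration; the 1,000,000 ns tick is the default
TimecodeScale (1 ms).

```python
NS_PER_SECOND = 1_000_000_000
DEFAULT_TIMECODE_SCALE_NS = 1_000_000  # default Matroska TimecodeScale (1 ms)

def samples_to_ns(samples, sample_rate, truncate=False):
    """Convert a sample count to nanoseconds (round to nearest, or truncate)."""
    num = samples * NS_PER_SECOND
    if truncate:
        return num // sample_rate
    return (num + sample_rate // 2) // sample_rate

def ns_to_samples(ns, sample_rate, truncate=False):
    """Convert nanoseconds back to a sample count."""
    num = ns * sample_rate
    if truncate:
        return num // NS_PER_SECOND
    return (num + NS_PER_SECOND // 2) // NS_PER_SECOND

def quantize_ns(ns, tick_ns=DEFAULT_TIMECODE_SCALE_NS):
    """Round a nanosecond value to the nearest multiple of tick_ns."""
    return ((ns + tick_ns // 2) // tick_ns) * tick_ns

rate = 48000
delay = 1024  # AAC codec delay in samples, as quoted earlier

ns = samples_to_ns(delay, rate)
print(ns_to_samples(ns, rate))               # nanoseconds, rounding both ways
print(ns_to_samples(quantize_ns(ns), rate))  # after rounding to 1 ms ticks
print(ns_to_samples(samples_to_ns(1, rate, truncate=True), rate, truncate=True))
```

With rounding in both directions the nanosecond round trip recovers the
sample count at these rates; with truncation it can lose one sample, and
with 1 ms ticks the error grows to many samples.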

As for PostPadding, I agree that adding it to the BlockGroup should be
better. Live streaming could then use PostPadding to have a sample accurate
duration.
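
A minimal sketch of how a player or editor might combine the two proposed
values to recover the source duration, assuming both are expressed in
samples (the function name is hypothetical and the numbers are made up for
illustration):

```python
def source_sample_count(decoded_samples, pre_skip=0, post_padding=0):
    """Samples of original source audio left after trimming both ends.

    pre_skip and post_padding default to 0, matching the proposed element
    defaults, so files without the elements are left untrimmed.
    """
    trimmed = decoded_samples - pre_skip - post_padding
    if trimmed < 0:
        raise ValueError("pre_skip + post_padding exceeds decoded length")
    return trimmed

# Illustrative numbers only: one second of 48 kHz source audio, padded out
# by the encoder with 1024 samples of delay and 512 samples of end padding.
decoded = 48000 + 1024 + 512
print(source_sample_count(decoded, pre_skip=1024, post_padding=512))
print(source_sample_count(decoded))  # a file with neither element set
```

For the back-to-back playback case, trimming each file this way is what
would let consecutive files join without extra samples at the seams.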

Frank




On Thu, May 23, 2013 at 12:35 AM, Steve Lhomme <slhomme at matroska.org> wrote:

> Hi guys,
>
> Glad we're back at this. I saw all the I/O talks on WebM/VP9 and had the
> feeling this Opus decision was slowing things down. So we should try to
> finalize a solution soon.
>
> I'm not too keen on forcing all demuxers to have to handle a new element.
> But since it's only for Opus, if players work on adding support for Opus,
> they might as well add support for this element too. Plus it's not too much
> work to add a shift in the pipeline (at least the frameworks I know). It
> will just be a bit more work than just dropping the codec library in there
> and plugging it in the framework. But it seems to be the only way to make
> it work properly for all use cases.
>
> About the unit, there is currently nothing in Matroska that is accurate to
> a sample. On the other hand any other value (average time units) would not
> make sense for this. If the value is passed to the codec, then it's codec
> specific. If the value is just used by the playback framework then sample
> accuracy may not be needed, we don't have it for audio sync anyway (unless
> timecodescale values are carefully picked) and a value in timecodescale
> would be enough. In the future if we change the timecodescale for more
> accuracy, this value will benefit from it too.
>
> About PostPadding, since it's only for the last Block, why not just add it
> in the BlockGroup of that last Block? That information is useless everywhere
> else.
>
>
> On Thu, May 23, 2013 at 12:23 AM, Frank Galligan <frankgalligan at gmail.com> wrote:
>
>> Hello all,
>>
>> I have changed my position and I'm in favor of 2.1 from the wiki [1],
>> which I think is in line with what Ralph and Mosu were advocating. One of
>> the biggest issues I had with 2.1, was that I was worried about the unknown
>> ramifications of timeshifting all the samples. Well as it turns out,
>> I didn't really need to worry as they are already timeshifted. Vorbis is
>> shifted 128 samples and AAC is shifted by 1024 (with FFmpeg at least). So
>> encoders/muxers are already doing this currently, but not explicitly
>> representing that in the Matroska file. I think Ralph mentioned that
>> earlier.
>>
>> So I'm advocating 2.1, i.e. add a PreSkip element to the TrackEntry
>> element. PreSkip would be a non-mandatory unsigned integer with a default
>> value of 0. I agree with Mosu that PreSkip units should be samples wrt
>> audio. If we choose another resolution, I just want to make sure we can
>> convert exactly to samples.
>>
>> I would also like to propose adding a new element, PostPadding to the
>> TrackEntry element. PostPadding is the number of samples that are added by
>> the encoder to the end of the stream. PostPadding would be a non-mandatory
>> unsigned integer with a default value of 0. PostPadding units would match
>> PreSkip units.
>>
>> With these 2 new elements, encoded Matroska files should be able
>> to accurately represent the duration of the source samples.
>>
>> Frank
>>
>> [1] https://wiki.xiph.org/MatroskaOpus
>>
>>
>> On Wed, May 22, 2013 at 3:15 PM, Frank Galligan <frankgalligan at gmail.com> wrote:
>>
>>> I just realized I sent a reply only to Ralph on 4/12. I'm copying the
>>> reply below, but I have since changed my position. I will follow up in
>>> another email.
>>>
>>> I updated the wiki (https://wiki.xiph.org/MatroskaOpus) with options
>>> that I have seen for handling pre-skip.
>>>
>>>
>>> On Fri, Apr 12, 2013 at 12:15 PM, Ralph Giles <giles at thaumas.net> wrote:
>>>
>>>> On 13-04-12 10:35 AM, Frank Galligan wrote:
>>>>
>>>> >     First, the number of samples to be skipped is not an integer
>>>> multiple of
>>>> >     compressed packets, so this isn't actually possible without
>>>> clipping
>>>> >     valid audio from the start of the stream.
>>>> >
>>>> > Ahh, this was one of my earlier questions.
>>>> >
>>>> > [...]
>>>> > I don't think we should worry about a decoder that ignores
>>>> CodecPrivate.
>>>> > Current decoders must handle CodecPrivate, so I think we can treat
>>>> > decoders that ignore CodecPrivate as broken.
>>>>
>>>> Well sure, but they're less broken with Opus than with e.g. Vorbis,
>>>> which can't work at all. For mono and stereo Opus files, the only thing
>>>> that needs to be signaled outside the data stream is exactly the preskip
>>>> value. So ignoring CodecPrivate for Opus is no worse than not
>>>> implementing any hypothetical preskip element.
>>>>
>>> The output gain too.
>>>
>>>
>>>>
>>>> >     Having to feed and then discard output from special data in
>>>> CodecPrivate
>>>> >     is moving away from a general container-level solution to this
>>>> >     requirement,
>>>> >
>>>> > I agree.
>>>> >
>>>> >     which is generally useful for other codecs as well, to
>>>> >     implement trimming.
>>>> >
>>>> > Yes, I was leaning to include the data in CodecPrivate so we didn't
>>>> need
>>>> > to change all players to handle this feature, as we could potentially
>>>> > hide it in the specific decoder.
>>>>
>>>> Where is the specific decoder going to live though? Are you planning to
>>>> distribute a wrapper which accepts the CodecPrivate data? Do you
>>>> think we should add that to the libopus API?
>>>>
>>> If we added it to the libopus API that would be the easiest. Otherwise
>>> it would definitely have to be some type of wrapper on top of libopus. We
>>> had to do something like this for Vorbis.
>>>
>>>
>>>>
>>>> > If we truly think this will be useful
>>>> > to other codecs (currently or in the future) then we can try and
>>>> > generalize this feature.
>>>>
>>>> Maybe it's helpful to think of video here. If I press 'record' in the
>>>> middle of a WebRTC session, how do we represent the start point which
>>>> won't generally fall on a keyframe?
>>>>
>>> This has been handled for years already. Those frames are not marked
>>> as keyframes, so players have thrown out those frames until the first
>>> keyframe (or rendered garbage).
>>>
>>> I wouldn't recommend changing that to something that adds a pre-skip
>>> value to the Track header, because the recorder would have to delay
>>> writing the data until it sees the first keyframe. If instead we
>>> decided on setting the invisible flag or TimeToDiscard, then
>>> the recorder could start writing to disk right away.
>>>
>>>>
>>>> > Maybe we can add an element to the Block element, TimeToDiscard in
>>>> > nanoseconds. A value of -1 would not render the whole Block, which
>>>> would
>>>> > have the same effect as setting the invisible bit. Otherwise the
>>>> > player would need to discard TimeToDiscard time. This should satisfy
>>>> > "preskip data does not have to be an integer multiple of compressed
>>>> > packets", while also preserving the timestamp of the Block matches the
>>>> > timestamp of the playback position.
>>>>
>>>> Or under the TrackEntry element? Since it only happens at the start of
>>>> the track in the use cases I can think of.
>>>>
>>> I was trying to generalize it further so future codecs could take
>>> advantage of "decoded data, that may have a duration attached to it,
>>> but should not be rendered", within any part of the stream. If we did this
>>> we could change how we handle VP8 altref frames (highly doubt most players
>>> would though).
>>>
>>>>
>>>> I think you're still missing part of why the Ogg mapping shifts the
>>>> timestamp though. Part of what pre-skip is for is to account for
>>>> algorithmic delay. The encoder has some. If the original input isn't 48
>>>> kHz, then it went through a resampler, which can also have some. So
>>>> shifting the timecode is _necessary_ for sync. Without it, a peak in the
>>>> output won't align with a peak in the input.
>>>>
>>> I understand it, but I don't think the timeshift is necessary to be
>>> muxed into Matroska files (actually I think this will have major
>>> consequences later as we are fundamentally changing how time is handled
>>> within Matroska.)  Adding duration to the pre-skip data was a design
>>> choice. The algorithmic delay (and any other data) could have been easily
>>> handled within the codec if the bitstream was defined differently.
>>>
>>> I'm not advocating players/decoders do not decode the pre-skip data. I
>>> understand that the output may not align with the input if the decoder is
>>> not primed with pre-skip data, well it will align after SeekPreRoll time
>>> has passed. I'm just trying to come up with a solution that does not offset
>>> all timestamps within the file, as no other codec (that I know of) has done
>>> this. And this at a minimum will force all muxer/demuxers to
>>> handle their timing differently. But I think this will actually cause
>>> problems later that we are not currently thinking of.
>>>
>>> I think we can generalize the pre-skip data by adding the TimeToDiscard (or
>>> SamplesToDiscard, DataToDiscard ) to the Block or to the TrackEntry
>>> (but I think it will be cleaner if it is added to the Block) and still keep
>>> the timestamps == playback position. Am I mistaken?
>>>
>>>
>>>
>>
>> _______________________________________________
>> Matroska-devel mailing list
>> Matroska-devel at lists.matroska.org
>> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
>> Read Matroska-Devel on GMane:
>> http://dir.gmane.org/gmane.comp.multimedia.matroska.devel
>>
>
>
>
> --
> Steve Lhomme
> Matroska association Chairman
>