[Matroska-devel] Opus in Matroska Cont.

Steve Lhomme slhomme at matroska.org
Thu May 23 19:39:50 CEST 2013


For the sample accuracy we could introduce a fractional timecodescale in
parallel with the existing one. For now only PreSkip would use it. The
drawback would be that a bad remuxing may lose that information (but
PreSkip would be lost too).

I have to check the specs; there might be a similar thing already. If
not, it could use the new timecodescale too.
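
To put numbers on the unit question, here is a minimal sketch (not part of
any Matroska library) that rounds a sample count to whole ticks and back.
The function names are hypothetical, the 312-sample pre-skip at 48 kHz is
only an illustrative value, and the 1 ms tick assumes a timecodescale of
1,000,000 ns.

```python
NS_PER_SECOND = 1_000_000_000


def samples_to_ticks(samples: int, sample_rate: int, tick_ns: int) -> int:
    """Round a sample count to the nearest whole tick of tick_ns nanoseconds."""
    # Integer round-half-up of samples * 1e9 / (sample_rate * tick_ns).
    num = 2 * samples * NS_PER_SECOND + sample_rate * tick_ns
    return num // (2 * sample_rate * tick_ns)


def ticks_to_samples(ticks: int, sample_rate: int, tick_ns: int) -> int:
    """Round a tick count back to the nearest whole sample."""
    # Integer round-half-up of ticks * tick_ns * sample_rate / 1e9.
    num = 2 * ticks * tick_ns * sample_rate + NS_PER_SECOND
    return num // (2 * NS_PER_SECOND)


if __name__ == "__main__":
    pre_skip, rate = 312, 48_000  # illustrative values only

    for label, tick_ns in (("1 ns ticks", 1), ("1 ms ticks", 1_000_000)):
        ticks = samples_to_ticks(pre_skip, rate, tick_ns)
        back = ticks_to_samples(ticks, rate, tick_ns)
        print(f"{label}: {pre_skip} samples -> {ticks} ticks -> {back} samples")
```

With these assumptions the nanosecond round trip returns the original sample
count, while the 1 ms round trip does not, which is the trade-off being
discussed in the quoted message below.
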
On May 23, 2013 7:19 PM, "Frank Galligan" <frankgalligan at gmail.com> wrote:

> I was hoping to fix the codec delay for Opus, as well as AAC and Vorbis (and
> any other codecs). I was hoping that older demuxers would just skip over
> the new PreSkip. Then the older demuxers can behave like they currently do
> with the old files as well as the new files with the PreSkip element.
>
> I actually ran into a high priority issue we had to address while I was
> getting ready for IO. Around December of last year FFmpeg changed how they
> handle codec delay in Matroska. Before December FFmpeg would prepend the
> codec delay to the audio stream and shift all of the encoded audio by codec
> delay. For Vorbis this is 128 samples and AAC 1024 samples.
> The video stream would be left alone. After December FFmpeg would prepend
> the codec delay to the audio stream and shift all of the encoded audio by
> codec delay, as it did before. But FFmpeg would then shift all of the video
> by codec delay, I'm guessing in hopes of keeping better AV sync. Both
> workflows are still wrong as both require players to implicitly know that
> Codec A has a delay of N. Also, switching from one way to the other, I think,
> made it worse, as now players have to try and guess if the video really
> starts at N or 0. For editors, this is even worse if they re-compress a few
> times.
>
> This is why I want to explicitly represent what the muxers are doing with
> codec delay currently in the file, for all audio codecs. Older
> demuxers shouldn't have a problem with the new element, unless they error
> on unknown elements, but wouldn't that make them non-compliant?
>
> As for the unit of PreSkip, this is a value that translates to the exact
> number of samples that have been generated and prepended by the muxer that
> created the file. We are not specifying which component of the player needs
> to handle the codec delay. We are just modeling what the encoder/muxer did.
> I think in most cases this will be handled outside of the codec. I think it
> is better to have sample accuracy now rather than later. We
> already have issues with editors today. Also I can see a use case that
> probably wants exact sample duration: a radio station playing files back to
> back. As for the unit itself I would prefer samples, then we won't have any
> conversion issues. But I would settle for nanoseconds as I know nothing is
> expressed in samples today; worst case we should only be off by one sample.
> I would not want timecodescale as the resolution is usually too coarse.
>
> For PostPadding, I agree that adding it to the BlockGroup should be better.
> Live streaming could then use PostPadding to have a sample accurate duration.
>
> Frank
>
>
>
>
> On Thu, May 23, 2013 at 12:35 AM, Steve Lhomme <slhomme at matroska.org> wrote:
>
>> Hi guys,
>>
>> Glad we're back at this. I saw all the I/O talks on WebM/VP9 and had the
>> feeling this Opus decision was slowing things down. So we should try to
>> finalize a solution soon.
>>
>> I'm not too keen on forcing all demuxers to have to handle a new element.
>> But since it's only for Opus, if players work on adding support for Opus,
>> they might as well add support for this element too. Plus it's not too much
>> work to add a shift in the pipeline (at least in the frameworks I know). It
>> will just be a bit more work than just dropping the codec library in there
>> and plugging it in the framework. But it seems to be the only way to make
>> it work properly for all use cases.
>>
>> About the unit, there is currently nothing in Matroska that is accurate
>> to a sample. On the other hand any other value (average time units) would
>> not make sense for this. If the value is passed to the codec, then it's
>> codec specific. If the value is just used by the playback framework then
>> sample accuracy may not be needed, we don't have it for audio sync anyway
>> (unless timecodescale values are carefully picked) and a value in
>> timecodescale would be enough. In the future if we change the timecodescale
>> for more accuracy, this value will benefit from it too.
>>
>> About PostPadding, since it's only for the last Block, why not just add it
>> in the BlockGroup of that last Block? That information is useless everywhere
>> else.
>>
>>
>> On Thu, May 23, 2013 at 12:23 AM, Frank Galligan <frankgalligan at gmail.com
>> > wrote:
>>
>>> Hello all,
>>>
>>> I have changed my position and I'm in favor of 2.1 from the wiki [1],
>>> which I think is in line with what Ralph and Mosu were advocating. One of
>>> the biggest issues I had with 2.1 was that I was worried about the unknown
>>> ramifications of timeshifting all the samples. Well, as it turns out,
>>> I didn't really need to worry as they are already timeshifted. Vorbis is
>>> shifted by 128 samples and AAC is shifted by 1024 (with FFmpeg at least). So
>>> encoders/muxers are already doing this currently, but not explicitly
>>> representing that in the Matroska file. I think Ralph mentioned that
>>> earlier.
>>>
>>> So I'm advocating 2.1, i.e. add a PreSkip element to the TrackEntry
>>> element. PreSkip would be a non-mandatory unsigned integer with a default
>>> value of 0. I agree with Mosu that PreSkip units should be samples wrt
>>> audio. If we choose another resolution, I just want to make sure we can
>>> convert exactly to samples.
>>>
>>> I would also like to propose adding a new element, PostPadding to the
>>> TrackEntry element. PostPadding is the number of samples that are added by
>>> the encoder to the end of the stream. PostPadding would be a non-mandatory
>>> unsigned integer with a default value of 0. PostPadding units would match
>>> PreSkip units.
>>>
>>> With these 2 new elements, encoded Matroska files should be able
>>> to accurately represent the duration of the source samples.
>>>
>>> Frank
>>>
>>> [1] https://wiki.xiph.org/MatroskaOpus
>>>
>>>
>>> On Wed, May 22, 2013 at 3:15 PM, Frank Galligan <frankgalligan at gmail.com
>>> > wrote:
>>>
>>>> I just realized I sent a reply only to Ralph on 4/12. I'm copying the
>>>> reply below, but I have since changed my position. I will follow up in
>>>> another email.
>>>>
>>>> I updated the wiki (https://wiki.xiph.org/MatroskaOpus) with options
>>>> that I have seen for handling pre-skip.
>>>>
>>>>
>>>> On Fri, Apr 12, 2013 at 12:15 PM, Ralph Giles <giles at thaumas.net>
>>>>  wrote:
>>>>
>>>> On 13-04-12 10:35 AM, Frank Galligan wrote:
>>>>>
>>>>> >     First, the number of samples to be skipped is not an integer
>>>>> multiple of
>>>>> >     compressed packets, so this isn't actually possible without
>>>>> clipping
>>>>> >     valid audio from the start of the stream.
>>>>> >
>>>>> > Ahh, this was one of my earlier questions.
>>>>> >
>>>>> > [...]
>>>>> > I don't think we should worry about a decoder that ignores
>>>>> CodecPrivate.
>>>>> > Current decoders must handle CodecPrivate, so I think we can treat
>>>>> > decoders that ignore CodecPrivate as broken.
>>>>>
>>>>> Well sure, but they're less broken with Opus than with e.g. Vorbis,
>>>>> which can't work at all. For mono and stereo Opus files, the only thing
>>>>> that needs to be signaled outside the data stream is exactly the
>>>>> preskip
>>>>> value. So ignoring CodecPrivate for Opus is no worse than not
>>>>> implementing any hypothetical preskip element.
>>>>>
>>>> The output gain too.
>>>>
>>>>
>>>>>
>>>>> >     Having to feed and then discard output from special data in
>>>>> CodecPrivate
>>>>> >     is moving away from a general container-level solution to this
>>>>> >     requirement,
>>>>> >
>>>>> > I agree.
>>>>> >
>>>>> >     which is generally useful for other codecs as well, to
>>>>> >     implement trimming.
>>>>> >
>>>>> > Yes, I was leaning to include the data in CodecPrivate so we didn't
>>>>> need
>>>>> > to change all players to handle this feature, as we could potentially
>>>>> > hide it in the specific decoder.
>>>>>
>>>>> Where is the specific decoder going to live though? Are you planning to
>>>>> distribute a wrapper which accepts the CodecPrivate data? Do you
>>>>> think we should add that to the libopus API?
>>>>>
>>>> If we added it to the libopus API that would be the easiest. Otherwise
>>>> it would definitely have to be some type of wrapper on top of libopus. We
>>>> had to do something like this for Vorbis.
>>>>
>>>>
>>>>>
>>>>> > If we truly think this will be useful
>>>>> > to other codecs (currently or in the future) then we can try and
>>>>> > generalize this feature.
>>>>>
>>>>> Maybe it's helpful to think of video here. If I press 'record' in the
>>>>> middle of a WebRTC session, how do we represent the start point which
>>>>> won't generally fall on a keyframe?
>>>>>
>>>> This has been handled for years already. Those frames are not marked
>>>> with a key frame. So players have thrown out those frames until the first
>>>> keyframe (or rendered garbage).
>>>>
>>>> I wouldn't recommend changing that to something that adds a pre-skip
>>>> value to the Track header, because you will have to add latency of starting
>>>> to write the data until the recorder sees the first keyframe. Or if we
>>>> decided on setting invisible flag or TimeToDiscard then
>>>> the recorder could start writing to disk right away.
>>>>
>>>>>
>>>>> > Maybe we can add an element to the Block element, TimeToDiscard in
>>>>> > nanoseconds. A value of -1 would not render the whole Block, which
>>>>> would
>>>>> > have the same effect as setting the invisible bit. Otherwise the
>>>>> > player would need to discard TimeToDiscard time. This should satisfy
>>>>> > "preskip data does not have to be an integer multiple of compressed
>>>>> > packets", while also preserving the timestamp of the Block matches
>>>>> the
>>>>> > timestamp of the playback position.
>>>>>
>>>>> Or under the TrackEntry element? Since it only happens at the start of
>>>>> the track in the use cases I can think of.
>>>>>
>>>> I was trying to generalize it further so future codecs could take
>>>> advantage of "decoded data, that may have a duration attached to it,
>>>> but should not be rendered", within any part of the stream. If we did this
>>>> we could change how we handle VP8 altref frames (highly doubt most players
>>>> would though).
>>>>
>>>>>
>>>>> I think you're still missing part of why the Ogg mapping shifts the
>>>>> timestamp though. Part of what pre-skip is for is to account for
>>>>> algorithmic delay. The encoder has some. If the original input isn't 48
>>>>> kHz, then it went through a resampler, which can also have some. So
>>>>> shifting the timecode is _necessary_ for sync. Without it, a peak in
>>>>> the
>>>>> output won't align with a peak in the input.
>>>>>
>>>> I understand it, but I don't think the timeshift is necessary to be
>>>> muxed into Matroska files (actually I think this will have major
>>>> consequences later as we are fundamentally changing how time is handled
>>>> within Matroska.)  Adding duration to the pre-skip data was a design
>>>> choice. The algorithmic delay (and any other data) could have been easily
>>>> handled within the codec if the bitstream was defined differently.
>>>>
>>>> I'm not advocating players/decoders do not decode the pre-skip data. I
>>>> understand that the output may not align with the input if the decoder is
>>>> not primed with pre-skip data, well it will align after SeekPreRoll time
>>>> has passed. I'm just trying to come up with a solution that does not offset
>>>> all timestamps within the file, as no other codec (that I know of) has done
>>>> this. And this at a minimum will force all muxers/demuxers to
>>>> handle their timing differently. But I think this will actually cause
>>>> problems later that we are not currently thinking of.
>>>>
>>>> I think we can generalize the pre-skip data by adding the TimeToDiscard (or
>>>> SamplesToDiscard, DataToDiscard) to the Block or to the TrackEntry
>>>> (but I think it will be cleaner if it is added to the Block) and still keep
>>>> the timestamps == playback position. Am I mistaken?
>>>>
>>>>
>>> _______________________________________________
>>> Matroska-devel mailing list
>>> Matroska-devel at lists.matroska.org
>>> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
>>> Read Matroska-Devel on GMane:
>>> http://dir.gmane.org/gmane.comp.multimedia.matroska.devel
>>>
>>
>>
>>
>> --
>> Steve Lhomme
>> Matroska association Chairman