[Matroska-devel] Opus in Matroska Cont.

Steve Lhomme slhomme at matroska.org
Fri May 24 08:44:51 CEST 2013


In fact I'm thinking of adding 2 new elements to express the timecode scale
as a fraction. The current value would remain an approximate clock in
nanoseconds (the fallback), while the new elements would be the "perfect"
clock for the stream. The track timecodescale would follow the same rule. So
the main one could be 1/1, and each track could be 1/44100 for audio and
1001/30,000 for video. Older players would use the current system, while
players aware of the new fields would use those. That means they will also
have to adjust every value that is in "scaled" units accordingly. It's both
forward and backward compatible.
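
A minimal sketch of why a fractional clock helps, assuming each fraction is
read as seconds per tick (so 44.1 kHz audio is 1/44100 and NTSC-rate video is
1001/30000). The names ticks_to_seconds and legacy_ticks are hypothetical and
are not part of any Matroska library; the legacy path assumes the default
TimecodeScale of 1,000,000 ns.

```python
from fractions import Fraction

NANOSECOND = Fraction(1, 1_000_000_000)

# Hypothetical per-track rational time bases, in seconds per tick.
AUDIO_SCALE = Fraction(1, 44100)     # one tick per audio sample
VIDEO_SCALE = Fraction(1001, 30000)  # one tick per video frame


def ticks_to_seconds(ticks, scale):
    """Convert an integer tick count to an exact time in seconds."""
    return ticks * scale


def legacy_ticks(seconds, timecode_scale_ns=1_000_000):
    """Round an exact time to the nearest tick of the nanosecond-based scale."""
    return round(seconds / (timecode_scale_ns * NANOSECOND))


# A 1024-sample delay is represented exactly on the fractional clock.
pre_skip = ticks_to_seconds(1024, AUDIO_SCALE)
print(pre_skip / AUDIO_SCALE)  # recovers exactly 1024 samples

# On the legacy 1 ms clock the same delay is rounded to 23 ticks,
# which no longer maps back to a whole number of samples.
coarse = legacy_ticks(pre_skip)
print(coarse, float(coarse * Fraction(1, 1000) / AUDIO_SCALE))
```

An older player would only ever see the rounded value, while a player aware
of the new fields could work in exact sample or frame units.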

On Fri, May 24, 2013 at 8:01 AM, Frank Galligan <frankgalligan at gmail.com> wrote:

> Are you talking about the TimecodeScaleDenominator? So to get
> sample-accurate timing with PreSkip at 44100 Hz, we would set
> TimecodeScaleDenominator to 44,100,000,000, assuming TimecodeScale is set to
> 1,000,000. E.g. if we wanted to skip 1024 samples, we would set PreSkip to
> 1,024,000,000?
>
> I'm fine with it.
>
> As for a bad remuxing, we are no worse than what current muxers do today.
>
>
> On Thu, May 23, 2013 at 1:39 PM, Steve Lhomme <slhomme at matroska.org> wrote:
>
>> For the sample accuracy we could introduce the fractional timecodescale in
>> parallel with the existing one. For now only PreSkip would use it. The
>> drawback would be that a bad remuxing may lose that information (but
>> PreSkip would be lost too).
>>
>> I have to check the specs; there might already be something similar. If
>> not, it could use the new timecodescale too.
>> On May 23, 2013 7:19 PM, "Frank Galligan" <frankgalligan at gmail.com>
>> wrote:
>>
>>> I was hoping to fix the codec delay for Opus, as well as AAC and Vorbis
>>> (and any other codecs). I was hoping that older demuxers would just skip
>>> over the new PreSkip. Then the older demuxers can behave like they
>>> currently do with the old files as well as the new files with the PreSkip
>>> element.
>>>
>>> I actually ran into a high-priority issue we had to address while I was
>>> getting ready for IO. Around December of last year FFmpeg changed how it
>>> handles codec delay in Matroska. Before December, FFmpeg would prepend the
>>> codec delay to the audio stream and shift all of the encoded audio by the
>>> codec delay. For Vorbis this is 128 samples and for AAC 1024 samples.
>>> The video stream would be left alone. After December, FFmpeg would prepend
>>> the codec delay to the audio stream and shift all of the encoded audio by
>>> the codec delay, as it did before. But FFmpeg would then also shift all of
>>> the video by the codec delay, I'm guessing in hopes of keeping better AV
>>> sync. Both workflows are still wrong, as both require players to implicitly
>>> know that Codec A has a delay of N. Also, switching from one way to the
>>> other made it worse, I think, as now players have to try to guess if the
>>> video really starts at N or 0. For editors, this is even worse if they
>>> re-compress a few times.
>>>
>>> This is why I want to explicitly represent what the muxers are currently
>>> doing with codec delay in the file, for all audio codecs. Older
>>> demuxers shouldn't have a problem with the new element, unless they error
>>> on unknown elements, but wouldn't that make them non-compliant?
>>>
>>> As for the unit of PreSkip, this is a value that translates to the exact
>>> number of samples that have been generated and prepended by the muxer that
>>> created the file. We are not specifying which component of the player needs
>>> to handle the codec delay. We are just modeling what the encoder/muxer did.
>>> I think in most cases this will be handled outside of the codec. I think it
>>> is better to have sample accuracy now rather than later. We
>>> already have issues with editors today. Also, I can see a sample case that
>>> probably wants exact sample duration: a radio station playing files back to
>>> back. As for the unit itself, I would prefer samples, as then we won't have
>>> any conversion issues. But I would settle for nanoseconds, as I know nothing
>>> is expressed in samples today; worst case we should only be off by one
>>> sample. I would not want timecodescale, as the resolution is usually too
>>> coarse.
>>>
>>> For PostPadding, I agree that adding it to the BlockGroup would be better.
>>> Live streaming could then use PostPadding to have a sample-accurate duration.
>>>
>>> Frank
>>>
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 12:35 AM, Steve Lhomme <slhomme at matroska.org> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Glad we're back at this. I saw all the I/O talks on WebM/VP9 and had
>>>> the feeling this Opus decision was slowing things down. So we should try to
>>>> finalize a solution soon.
>>>>
>>>> I'm not too keen on forcing all demuxers to handle a new
>>>> element. But since it's only for Opus, if players work on adding support
>>>> for Opus, they might as well add support for this element too. Plus it's
>>>> not too much work to add a shift in the pipeline (at least in the
>>>> frameworks I know). It will just be a bit more work than just dropping the
>>>> codec library in there and plugging it into the framework. But it seems to
>>>> be the only way to make it work properly for all use cases.
>>>>
>>>> About the unit, there is currently nothing in Matroska that is accurate
>>>> to a sample. On the other hand any other value (average time units) would
>>>> not make sense for this. If the value is passed to the codec, then it's
>>>> codec specific. If the value is just used by the playback framework then
>>>> sample accuracy may not be needed, we don't have it for audio sync anyway
>>>> (unless timecodescale values are carefully picked) and a value in
>>>> timecodescale would be enough. In the future if we change the timecodescale
>>>> for more accuracy, this value will benefit from it too.
>>>>
>>>> About PostPadding, since it's only for the last Block, why not just add
>>>> it in the BlockGroup of that last Block? That information is useless
>>>> everywhere else.
>>>>
>>>>
>>>> On Thu, May 23, 2013 at 12:23 AM, Frank Galligan <
>>>> frankgalligan at gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I have changed my position and I'm in favor of 2.1 from the wiki [1],
>>>>> which I think is in line with what Ralph and Mosu were advocating. One of
>>>>> the biggest issues I had with 2.1 was that I was worried about the unknown
>>>>> ramifications of timeshifting all the samples. Well, as it turns out,
>>>>> I didn't really need to worry, as they are already timeshifted. Vorbis is
>>>>> shifted by 128 samples and AAC is shifted by 1024 (with FFmpeg at least).
>>>>> So encoders/muxers are already doing this, but not explicitly
>>>>> representing that in the Matroska file. I think Ralph mentioned that
>>>>> earlier.
>>>>>
>>>>> So I'm advocating 2.1, i.e. add a PreSkip element to the TrackEntry
>>>>> element. PreSkip would be a non-mandatory unsigned integer with a default
>>>>> value of 0. I agree with Mosu that PreSkip units should be samples wrt
>>>>> audio. If we choose another resolution, I just want to make sure we can
>>>>> convert exactly to samples.
>>>>>
>>>>> I would also like to propose adding a new element, PostPadding to the
>>>>> TrackEntry element. PostPadding is the number of samples that are added by
>>>>> the encoder to the end of the stream. PostPadding would be a non-mandatory
>>>>> unsigned integer with a default value of 0. PostPadding units would match
>>>>> PreSkip units.
>>>>>
>>>>> With these 2 new elements, encoded Matroska files should be able
>>>>> to accurately represent the duration of the source samples.
>>>>>
>>>>> Frank
>>>>>
>>>>> [1] https://wiki.xiph.org/MatroskaOpus
>>>>>
>>>>>
>>>>> On Wed, May 22, 2013 at 3:15 PM, Frank Galligan <
>>>>> frankgalligan at gmail.com> wrote:
>>>>>
>>>>>> I just realized I sent a reply only to Ralph on 4/12. I'm copying the
>>>>>> reply below, but I have since changed my position. I will follow up in
>>>>>> another email.
>>>>>>
>>>>>> I updated the wiki (https://wiki.xiph.org/MatroskaOpus) with options
>>>>>> that I have seen for handling pre-skip.
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 12, 2013 at 12:15 PM, Ralph Giles <giles at thaumas.net>
>>>>>>  wrote:
>>>>>>
>>>>>> On 13-04-12 10:35 AM, Frank Galligan wrote:
>>>>>>>
>>>>>>> >     First, the number of samples to be skipped is not an integer
>>>>>>> >     multiple of compressed packets, so this isn't actually possible
>>>>>>> >     without clipping valid audio from the start of the stream.
>>>>>>> >
>>>>>>> > Ahh, this was one of my earlier questions.
>>>>>>> >
>>>>>>> > [...]
>>>>>>> > I don't think we should worry about a decoder that ignores
>>>>>>> > CodecPrivate. Current decoders must handle CodecPrivate, so I think
>>>>>>> > we can treat decoders that ignore CodecPrivate as broken.
>>>>>>>
>>>>>>> Well sure, but they're less broken with Opus than with e.g. Vorbis,
>>>>>>> which can't work at all. For mono and stereo Opus files, the only thing
>>>>>>> that needs to be signaled outside the data stream is exactly the preskip
>>>>>>> value. So ignoring CodecPrivate for Opus is no worse than not
>>>>>>> implementing any hypothetical preskip element.
>>>>>>>
>>>>>> The output gain too.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> >     Having to feed and then discard output from special data in
>>>>>>> >     CodecPrivate is moving away from a general container-level
>>>>>>> >     solution to this requirement,
>>>>>>> >
>>>>>>> > I agree.
>>>>>>> >
>>>>>>> >     which is generally useful for other codecs as well, to
>>>>>>> >     implement trimming.
>>>>>>> >
>>>>>>> > Yes, I was leaning to include the data in CodecPrivate so we didn't
>>>>>>> > need to change all players to handle this feature, as we could
>>>>>>> > potentially hide it in the specific decoder.
>>>>>>>
>>>>>>> Where is the specific decoder going to live though? Are you planning to
>>>>>>> distribute a wrapper which accepts the CodecPrivate data? Do you
>>>>>>> think we should add that to the libopus API?
>>>>>>>
>>>>>> If we added it to the libopus API, that would be the easiest.
>>>>>> Otherwise it would definitely have to be some type of wrapper on top of
>>>>>> libopus. We had to do something like this for Vorbis.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> > If we truly think this will be useful
>>>>>>> > to other codecs (currently or in the future) then we can try and
>>>>>>> > generalize this feature.
>>>>>>>
>>>>>>> Maybe it's helpful to think of video here. If I press 'record' in the
>>>>>>> middle of a WebRTC session, how do we represent the start point, which
>>>>>>> won't generally fall on a keyframe?
>>>>>>>
>>>>>> This has been handled for years already. Those frames are not marked
>>>>>> as keyframes, so players have thrown out those frames until the first
>>>>>> keyframe (or rendered garbage).
>>>>>>
>>>>>> I wouldn't recommend changing that to something that adds a pre-skip
>>>>>> value to the Track header, because the recorder would have to delay
>>>>>> writing the data until it sees the first keyframe. If we instead
>>>>>> decided on setting the invisible flag or TimeToDiscard, then
>>>>>> the recorder could start writing to disk right away.
>>>>>>
>>>>>>>
>>>>>>> > Maybe we can add an element to the Block element, TimeToDiscard in
>>>>>>> > nanoseconds. A value of -1 would not render the whole Block, which
>>>>>>> > would have the same effect as setting the invisible bit. Otherwise
>>>>>>> > the player would need to discard TimeToDiscard time. This should
>>>>>>> > satisfy "preskip data does not have to be an integer multiple of
>>>>>>> > compressed packets", while also ensuring that the timestamp of the
>>>>>>> > Block matches the timestamp of the playback position.
>>>>>>>
>>>>>>> Or under the TrackEntry element? Since it only happens at the start of
>>>>>>> the track in the use cases I can think of.
>>>>>>>
>>>>>> I was trying to generalize it further so future codecs could take
>>>>>> advantage of "decoded data, that may have a duration attached to it,
>>>>>> but should not be rendered", within any part of the stream. If we did this
>>>>>> we could change how we handle VP8 altref frames (highly doubt most players
>>>>>> would though).
>>>>>>
>>>>>>>
>>>>>>> I think you're still missing part of why the Ogg mapping shifts the
>>>>>>> timestamp though. Part of what pre-skip is for is to account for
>>>>>>> algorithmic delay. The encoder has some. If the original input isn't
>>>>>>> 48 kHz, then it went through a resampler, which can also have some. So
>>>>>>> shifting the timecode is _necessary_ for sync. Without it, a peak in
>>>>>>> the output won't align with a peak in the input.
>>>>>>>
>>>>>> I understand it, but I don't think the timeshift needs to be
>>>>>> muxed into Matroska files (actually I think this will have major
>>>>>> consequences later, as we are fundamentally changing how time is handled
>>>>>> within Matroska). Adding duration to the pre-skip data was a design
>>>>>> choice. The algorithmic delay (and any other data) could have been easily
>>>>>> handled within the codec if the bitstream was defined differently.
>>>>>>
>>>>>> I'm not advocating that players/decoders skip decoding the pre-skip data.
>>>>>> I understand that the output may not align with the input if the decoder is
>>>>>> not primed with pre-skip data, although it will align after SeekPreRoll time
>>>>>> has passed. I'm just trying to come up with a solution that does not offset
>>>>>> all timestamps within the file, as no other codec (that I know of) has done
>>>>>> this. At a minimum, this will force all muxers/demuxers to
>>>>>> handle their timing differently. But I think this will actually cause
>>>>>> problems later that we are not currently thinking of.
>>>>>>
>>>>>> I think we can generalize the pre-skip data by adding the
>>>>>> TimeToDiscard (or SamplesToDiscard, DataToDiscard) to the Block or
>>>>>> to the TrackEntry (but I think it will be cleaner if it is added to the
>>>>>> Block) and still keep the timestamps == playback position. Am I mistaken?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Matroska-devel mailing list
>>>>> Matroska-devel at lists.matroska.org
>>>>> http://lists.matroska.org/cgi-bin/mailman/listinfo/matroska-devel
>>>>> Read Matroska-Devel on GMane:
>>>>> http://dir.gmane.org/gmane.comp.multimedia.matroska.devel
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Steve Lhomme
>>>> Matroska association Chairman
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>



-- 
Steve Lhomme
Matroska association Chairman