[Matroska-devel] Opus in Matroska

Joseph Ashwood ashwood at msn.com
Fri Sep 21 01:21:09 CEST 2012


[removed all the content I'm replying to; there was simply too much, and I 
address things very much out of order]

I'm going to begin with a viewpoint and then go from there. In the longer 
term it is important to have gapless playback, but in the near term I don't 
think it's necessary. I'm also going to just call it all preprocessing, 
because it is all about the delays. Let's look at some of the main use 
cases:

Video Playback:
Genuine video almost always has a few throwaway frames at the beginning 
anyway. An audio codec that produces no sound for 3 frames (~90 ms) won't 
pose a viewing problem.

Music Playback:
Almost all songs, mixes, etc. have a built-in cue delay. This goes back to 
the tape recording days but has been carried over. It is almost always 
> 100 ms.

Video seek:
During a video seek a black screen should be displayed anyway to signal the 
cut (although computer decoders very often show a frozen frame instead), so 
showing this black screen for 3 frames (~90 ms) is still reasonable. An 
audio delay of 100 ms is perceptible but ignored by the human mind; it is 
the same delay we all experience daily when talking across about 110 feet 
(33 meters), and we never notice it.
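The distance figure can be checked with a little arithmetic. This sketch 
assumes a speed of sound of 343 m/s (dry air at roughly 20 °C); that 
constant and the helper name `acoustic_distance` are illustrative 
assumptions, not part of the discussion.

```python
# Assumed speed of sound in dry air at about 20 °C, in metres per second.
SPEED_OF_SOUND_M_PER_S = 343.0
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot


def acoustic_distance(delay_s: float) -> tuple[float, float]:
    """Return (metres, feet) that sound travels during `delay_s` seconds."""
    meters = SPEED_OF_SOUND_M_PER_S * delay_s
    return meters, meters / METERS_PER_FOOT


meters, feet = acoustic_distance(0.100)  # the 100 ms delay discussed above
print(f"{meters:.1f} m, {feet:.0f} ft")
```

This comes out at roughly 34 m (about 113 ft), in line with the rough 
figure above; the exact value shifts a little with air temperature.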

Music seek:
In a music seek a delay is needed anyway to avoid popping the speaker. 
While this can be as short as 1 sample (1/44100 second), typical lengths 
are much longer for human comfort, so making them 80 ms is not a real 
problem. A volume roll-up should be applied anyway for human comfort, and 
the roll-up gives the codec plenty of time to preprocess.
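To put those durations in samples, here is a minimal sketch. It assumes the 
codec's preprocessing starts at the same moment as the roll-up and that 
output runs at 48 kHz unless stated otherwise; `ms_to_samples` and 
`rollup_covers_preroll` are hypothetical helpers.

```python
def ms_to_samples(ms: float, rate_hz: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate_hz / 1000)


def rollup_covers_preroll(rollup_ms: float, preroll_ms: float,
                          rate_hz: int = 48_000) -> bool:
    """True if a roll-up lasting `rollup_ms` is at least as long as the
    codec's preprocessing, so the silent warm-up is hidden inside it.
    Assumes (hypothetically) that both begin at the seek point."""
    return ms_to_samples(rollup_ms, rate_hz) >= ms_to_samples(preroll_ms, rate_hz)


print(ms_to_samples(80, 48_000))       # 3840 samples of preprocessing
print(ms_to_samples(80, 44_100))       # 3528 samples at CD rate
print(rollup_covers_preroll(100, 80))  # True: a 100 ms roll-up hides 80 ms
```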

So I contend that, from an actual usage standpoint, the added 80 ms during 
which the codec seems to be just sitting around is not a real problem.

While this is an issue that should be addressed in the next significant 
update (v4), it can be handled along with the other minor issues that have 
been found. This also gives time to consider whether this will be an 
anomaly for Opus alone, or whether other codecs will be developed using the 
same techniques.

So to be specific:
Opus stores sound to be played at time T as being at time T; there is no 
variation.
When playback begins: the Opus codec requests audio sample T+0. It 
processes it but provides no audio, only a series of samples that are all 0.
After preprocessing: the Opus codec requests sample T+n for playback at 
time T+n.
After a seek: Opus returns to the playback-begin state with the new T.
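The behaviour above can be modelled as a small state machine. This is a 
hypothetical sketch, not Opus or Matroska code: `PrerollPlayer`, `samples` 
and `preroll` are illustrative names, the stored audio is a plain list 
indexed by timestamp, and the warm-up length n is a fixed parameter.

```python
from dataclasses import dataclass


@dataclass
class PrerollPlayer:
    """Hypothetical model: audio is stored at its true timestamps, and
    after playback begins or a seek the first `preroll` samples are
    consumed by the decoder's warm-up and played as silence."""
    samples: list[float]  # sound for time T is stored at index T
    preroll: int          # n: samples spent preprocessing
    position: int = 0     # timestamp of the next sample to read
    warmup_left: int = 0  # warm-up samples still to be consumed

    def start(self, t: int = 0) -> None:
        # Playback begin and seek share one state: jump to T and re-prime.
        self.position = t
        self.warmup_left = self.preroll

    seek = start  # "After seek: returns to playback begin state with new T"

    def next_sample(self) -> float:
        sample = self.samples[self.position]  # always the sample stored at T
        self.position += 1
        if self.warmup_left > 0:
            self.warmup_left -= 1
            return 0.0  # still preprocessing: output is all zeros
        return sample   # sample T+n is played at time T+n


player = PrerollPlayer([float(i) for i in range(10)], preroll=2)
player.seek(5)
print([player.next_sample() for _ in range(4)])  # [0.0, 0.0, 7.0, 8.0]
```

Note that the timeline is never shifted: the warm-up only silences the first 
n samples after the start point, so everything after it plays at its stored 
time.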

The encoder (the person, not necessarily the program) needs to be aware of 
the anomaly to correct for it at the beginning of the video, probably 
through a sound roll-up (no sound for the first 100 ms, then bringing the 
sound from 0 to full volume over the next 100 ms). The viewer does not even 
need to be aware of it.
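A minimal sketch of that roll-up envelope, assuming a linear fade (the 
curve is not specified here) and mono samples stored as floats; 
`rollup_gain` and `apply_rollup` are hypothetical helpers.

```python
def rollup_gain(t_ms: float, silence_ms: float = 100.0,
                ramp_ms: float = 100.0) -> float:
    """Gain at time t_ms: 0 during the silent lead-in, then an (assumed)
    linear ramp from 0 to 1 over ramp_ms, then full volume."""
    if t_ms < silence_ms:
        return 0.0
    if t_ms >= silence_ms + ramp_ms:
        return 1.0
    return (t_ms - silence_ms) / ramp_ms


def apply_rollup(samples: list[float], rate_hz: int) -> list[float]:
    """Scale each sample by the roll-up gain at its timestamp."""
    return [s * rollup_gain(1000.0 * i / rate_hz) for i, s in enumerate(samples)]


faded = apply_rollup([1.0] * 48_000, 48_000)  # one second of full-scale audio
print(faded[0], faded[7_200], faded[9_600])   # 0.0 0.5 1.0 (0, 150, 200 ms)
```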

After that, it makes sense to fix in Matroska v4 whatever problems become 
known through usage.
                Joe


