[Matroska-devel] Opus in Matroska Cont.

Frank Galligan frankgalligan at gmail.com
Fri Mar 22 22:24:49 CET 2013



Hello all,

I want to continue the discussion on adding Opus to Matroska. The current
draft is at [1].

Open issues from [1] I want to address:

1. Seeking in Opus streams requires a pre-roll.

2. How does the OpusHead pre-skip field interact with the timestamps?


1. Pre-roll (and muxed files). I think we should add a new element in the
TrackHeader, SeekPreRoll, which uses the same units as the Cluster
timecode. In general I would like to have the container model exactly what
is happening, and let the player decide how it wants to handle playback and
seeking.

For example, file A is a muxed MKV with the following characteristics:

- 5 second interval between video keyframes

- Each video keyframe begins a new Cluster

- Cues only contain video keyframes

- Audio and video are interleaved in monotonically increasing order

For this section on pre-roll, we will assume pre-skip doesn't exist :).

For file A I’m going to describe two types of players.

The first player, the lazy player, will seek to Cluster N, which contains
timestamp T, and start video playback at T and audio at T + SeekPreRoll.

The second player, the strict player, wants to start playback of audio and
video at T. The strict player must seek to T - SeekPreRoll. If T -
SeekPreRoll and T are in the same Cluster N then the strict player will
seek to Cluster N and start decoding the audio stream from T - SeekPreRoll
and the video stream from T. If T - SeekPreRoll is in Cluster N-1 and T is
in Cluster N then the strict player will seek to Cluster N-1, parse the
Cluster N-1 data (not decode) until the player reaches T - SeekPreRoll.
Then start decoding the audio stream from T - SeekPreRoll and later, the
video stream from T.
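
As a rough sketch of the two behaviours above (not taken from any player; the
function names and the list of Cluster start times are made up for
illustration, and T is assumed to be a video keyframe time):

```python
from bisect import bisect_right


def cluster_index_for(cluster_starts_ms, t_ms):
    """Index of the Cluster containing t_ms (starts sorted ascending)."""
    return max(bisect_right(cluster_starts_ms, t_ms) - 1, 0)


def plan_seek(cluster_starts_ms, t_ms, seek_pre_roll_ms, strict):
    """Return where to seek and when each stream starts, for target time T."""
    if strict:
        # Strict player: decode audio from T - SeekPreRoll so it is ready at T.
        audio_decode_from = max(t_ms - seek_pre_roll_ms, 0)
        audio_play_from = t_ms
    else:
        # Lazy player: decode audio from T, audio starts at T + SeekPreRoll.
        audio_decode_from = t_ms
        audio_play_from = t_ms + seek_pre_roll_ms
    return {
        "seek_cluster": cluster_index_for(cluster_starts_ms, audio_decode_from),
        "audio_decode_from": audio_decode_from,
        "audio_play_from": audio_play_from,
        "video_play_from": t_ms,
    }
```

With Clusters starting at 0, 5000 and 10000 ms and a SeekPreRoll of 80 ms, a
strict seek to T = 5000 lands in the previous Cluster (N-1), while a lazy seek
stays in Cluster N.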

In order to prevent extraneous parsing for the strict player, Mosu proposed
using the CueRelativePosition on the audio stream [2].  The strict player
can seek to Cluster N-1 and parse the Cluster timecode of N-1. Then use the
CueRelativePosition to seek to audio T - SeekPreRoll to start decoding the
audio. This should work well for local content, but for HTTP playback the
two seeks will most likely cost more in latency than having the strict
player perform a single seek without CueRelativePosition.

I think a better solution for muxed content and strict players is to create
another Cluster within N-1 at T - SeekPreRoll, where T is the start time of
Cluster N. Then add CuePoints for all the new T - SeekPreRoll Clusters with
a CueTrack of the audio stream. The CuePoints for the video stream will not
change.

For example, file B is a muxed MKV with the following characteristics:

- 5 second interval between video keyframes

- Each video keyframe begins a new Cluster

- Cues will contain video keyframe CuePoints

- For each video keyframe at time T there will be new Cluster at T -
SeekPreRoll

- Cues will contain audio CuePoints for T - SeekPreRoll Clusters

- Audio and video are interleaved in monotonically increasing order

So in file A, the first Cluster starts at 0 milliseconds with a video
keyframe Block and has a duration of 5000 milliseconds. The second Cluster
starts at 5000 milliseconds with a video keyframe Block and has a duration
of 5000 milliseconds.

In file B, assume SeekPreRoll is 80 milliseconds, the first Cluster starts
at 0 milliseconds with a video keyframe Block and has a duration of 4920
milliseconds. The second Cluster starts at 4920 milliseconds with an audio
Block and has a duration of 80 milliseconds. Just to be clear, the second
Cluster can contain Blocks from all streams. The third Cluster starts at
5000 milliseconds with a video keyframe Block and has a duration of 4920
milliseconds. The fourth Cluster starts at 9920 milliseconds with an audio
Block and has a duration of 80 milliseconds.
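
The file B layout can be derived mechanically from the keyframe times. A
minimal sketch, assuming a hypothetical helper that is given the video
keyframe times, the end time of the file and SeekPreRoll, all in
milliseconds:

```python
def file_b_clusters(keyframes_ms, end_ms, seek_pre_roll_ms):
    """Return (start, duration) pairs for a file B style Cluster layout."""
    starts = set(keyframes_ms)
    for t in keyframes_ms:
        pre = t - seek_pre_roll_ms
        # No extra Cluster before the start of the file.
        if pre > 0:
            starts.add(pre)
    bounds = sorted(starts) + [end_ms]
    # Each Cluster lasts until the next Cluster (or the end of the file).
    return [(s, e - s) for s, e in zip(bounds, bounds[1:])]


layout = file_b_clusters([0, 5000, 10000], end_ms=15000, seek_pre_roll_ms=80)
print(layout[:4])
```

The first four entries correspond to the four Clusters described above:
(0, 4920), (4920, 80), (5000, 4920) and (9920, 80).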


With this proposal strict players that want audio and video to start
playback at time T can seek to Cluster T - SeekPreRoll and start decoding
the audio stream. This will work the same for both local and HTTP playback.
Unfortunately, this will increase the complexity of muxers compared to [2].

Also, to be clear, both [2] and the Cluster T - SeekPreRoll proposal should
probably not be made mandatory for muxers. Both proposals are meant to spare
strict players extra parsing and, in the case of HTTP, downloading
extraneous data.


2. Pre-skip. I see 4 possibilities for handling pre-skip.

2.1 The Opus audio stream pre-skip data starts from time 0 and adds the
pre-skip time to the normal audio time, like how Opus files are muxed into
ogg files. We would add a new element to the TrackHeader,  PreSkip, and the
decoder would adjust the timestamps of the decoded samples by subtracting
PreSkip. It would be up to the player on how to handle negative audio
timestamps.

Pros:

- Might work for future formats that want to add a PreSkip.

Cons:

- The Block timestamp does not match the decoded timestamp for all time T.
Greater chance of subtle issues popping up?

- There also could be an issue of when the real audio data starts if the
Block timecode scale is less than the sample rate. E.g., a decoded block has
a timestamp of -10ms and a duration of 40ms.

- Added complexity outside of the decoder.

Questions:

- Can all players/frameworks handle negative timestamps on decoded audio?
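
To make the 2.1 timestamp adjustment concrete, here is a small sketch. It
assumes pre-skip is counted in 48 kHz samples, as in the OpusHead field, and
that a player chooses to trim anything before time 0; both function names are
hypothetical and trimming is only one possible policy.

```python
NS_PER_SECOND = 1_000_000_000


def decoded_timestamp_ns(block_ts_ns, pre_skip_samples, sample_rate=48000):
    """Shift a Block timestamp back by PreSkip, as option 2.1 describes."""
    pre_skip_ns = pre_skip_samples * NS_PER_SECOND // sample_rate
    return block_ts_ns - pre_skip_ns


def trim_negative(ts_ns, duration_ns):
    """One possible player policy: drop the part of a block before time 0."""
    if ts_ns >= 0:
        return ts_ns, duration_ns
    return 0, max(ts_ns + duration_ns, 0)


# 480 samples at 48 kHz is 10 ms of pre-skip.
ts = decoded_timestamp_ns(0, pre_skip_samples=480)
print(ts, trim_negative(ts, 40_000_000))
```

This reproduces the case from the Cons list: a block decoded at -10 ms with a
duration of 40 ms leaves 30 ms of real audio starting at 0.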


2.2 The pre-skip data must be contained in the first audio Block (or maybe
in the CodecPrivate) with non-pre-skip encoded data.

Pros:

- No demuxer changes to handle Opus audio. Only the decoder will need to
know about the pre-skip data.

- Block timestamps will match decoded timestamps for all time T.

- No PreSkip element in the TrackHeader. The decoder will still have
pre-skip from the CodecPrivate data.

- No negative timestamps.

Cons:

- Data may need to be added to all Opus Blocks to delimit the pre-skip
data.

- Does not help future formats that need a PreSkip.

- Muxers may have to special case the pre-skip data.

Questions:

- Can the decoder assume that a packet of timestamp 0 will have to decode and
throw out pre-skip samples? Then we wouldn’t need to delimit all of the
Opus Blocks.
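
If the answer to that question is yes, the decoder-side handling could look
roughly like the sketch below. It assumes decoding starts at the packet with
timestamp 0 and that each decoded packet is a list of samples; the generator
name is made up.

```python
def drop_pre_skip(decoded_packets, pre_skip_samples):
    """Yield decoded audio with the first pre_skip_samples samples removed."""
    remaining = pre_skip_samples
    for samples in decoded_packets:
        if remaining >= len(samples):
            # The whole packet is pre-skip data: decode it, output nothing.
            remaining -= len(samples)
            continue
        # Partially (or not at all) pre-skip: keep only the tail.
        yield samples[remaining:]
        remaining = 0
```

Only the start of the stream is affected; once the pre-skip count reaches zero
every later packet passes through untouched.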

2.3 Place the pre-skip frames in blocks that have the invisible flag set (or
we could signal with a Block with 0 duration, much like we do with VP8
altref frames). These blocks would have the same timecode as the first
non-pre-skip frame. Whenever a frame with the invisible flag is encountered
it signals that PreSkip samples must be dropped from the decoder output.

Pros:

- Block timestamps will match decoded timestamps for all time T.

- No PreSkip element in the TrackHeader. The decoder will still have
pre-skip from the CodecPrivate data.

- No negative timestamps.

- Should work for future formats that want to add a PreSkip.

Cons:

- Muxers and demuxers will have to start handling the invisible flag on
Blocks.

- Muxers will have to special case Opus streams to handle the pre-skip data.

- If the pre-skip Blocks have a valid duration then muxers/demuxers will
have to handle Blocks that have a duration but don’t increase the playtime.


2.4 Add the pre-skip data with a negative Block timecode. The problem
with 2.4 is that TimecodeScale may be set so that the pre-skip data
timecode cannot be represented in the first Cluster.
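
A quick way to see the 2.4 problem, under my reading of it: the Block timecode
is a signed 16-bit value relative to the Cluster timecode, counted in
TimecodeScale units (nanoseconds per tick). The check below is only a sketch;
the function name is hypothetical and pre-skip is assumed to be in 48 kHz
samples.

```python
from fractions import Fraction

INT16_MIN = -32768


def negative_block_timecode(pre_skip_samples, timecode_scale_ns,
                            sample_rate=48000):
    """Return (ticks, exact, in_range) for a Block placed at -pre-skip."""
    pre_skip_ns = Fraction(pre_skip_samples * 1_000_000_000, sample_rate)
    ticks = -pre_skip_ns / timecode_scale_ns
    exact = ticks.denominator == 1   # falls on a whole number of ticks
    in_range = ticks >= INT16_MIN    # fits the signed 16-bit Block field
    return ticks, exact, in_range


# 312 samples is 6.5 ms of pre-skip.
print(negative_block_timecode(312, 1_000_000))  # 1 ms ticks: not exact
print(negative_block_timecode(312, 100))        # 100 ns ticks: out of range
```

So with a coarse TimecodeScale the pre-skip start cannot be expressed exactly,
and with a very fine one it does not fit in the Block timecode at all.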

At this point I'm leaning towards 2.2, as this seems the least disruptive
overall to current players. Are there any other solutions for handling
pre-skip?


Thanks,

Frank

[1] http://wiki.xiph.org/MatroskaOpus

[2]
http://lists.matroska.org/pipermail/matroska-devel/2012-September/004250.html
