[Matroska-devel] Storage of WebVTT subtitles in Matroska

Moritz Bunkus via Matroska-devel matroska-devel at lists.matroska.org
Sun Mar 20 19:17:42 CET 2016


thanks for your input, Denis.

I've looked over the document from the WebM project[1] thinking about
WebVTT inclusion. Unfortunately it's rather old (from 2012), and its
text leads me to believe that WebVTT has changed considerably since
then. For one, the WebM proposal states that no CodecPrivate content will
be used; however, several global blocks in WebVTT are not covered at all.

Additionally the WebM project proposes splitting the WebVTT content into
two tracks, one for the styling, one for the content. I don't like this
approach at all as we currently don't have any method of linking two
tracks in a way that clearly indicates that those tracks must be kept
and used together.

I therefore don't think we can use the WebM proposal.

That being said, the WebM proposal contains similar thoughts to mine
about keeping the entry tags and the timecode line.

Additionally using the BlockAdditions is a good idea; thanks, Denis. I'm
incorporating it.

Denis had an objection:

> That's basically useless... I'd remove it too, the less useless stuff
> the decoder/muxer has to handle, the better.

My goal is still to keep as much information from the WebVTT file as
possible without compromising Matroska's general ideas (and hopefully
without jumping through hoops). I aspire to the same for other source
container formats, not just for WebVTT. The reason is that Matroska is
not solely used for storing content for later playback but also as an
intermediate format. And keeping as much information as possible makes
it a lot easier to edit the content later.

Another objection/question:

> >6. cue timestamps in entries

> Wouldn't it be easier to just create another block starting with the
> new timestamp and ending with containing block end?

Maybe. My problem is that I don't fully understand where they're trying
to go with allowing cue timestamps within the content in the first
place. Would something like this be valid?

00:02:00.000 --> 00:02:10.000
<v Professor Farnsworth>No fair!<00:02:01.500>You changed the outcome by
measuring it!

I guess yes, and splitting this up into two entries would require
duplicating at least the <v…> tag. Additionally the muxer would have
to calculate where exactly the text after the embedded cue timestamp
would have to appear.

It quickly gets pretty complicated for a muxer. Therefore (and for
keeping the original structure as intact as possible) I'm against
splitting entries on embedded cue timestamps.

Here's my updated storage format proposal:


(B) CodecPrivate: This element contains all global blocks before the
    first subtitle entry. The »WEBVTT« file identification marker is NOT
    part of CodecPrivate.

(C) Non-global blocks (e.g. »NOTE«) before an entry are stored in
    Matroska's BlockAddition element together with the entry they precede.

(D) Each entry consists of three or more lines:

1. The first line contains the entry's cue identifier if present in the
   source file followed by a WebVTT line terminator. If no cue
   identifier was used then only the WebVTT line terminator is used.

2. The second line contains the entry's timestamp line with the actual
   timestamps removed followed by cue settings if present followed by a
   WebVTT line terminator. The start timestamp is used as the block's
   start timestamp, and the difference between the entry's end and start
   timestamps is used as the block's duration.

3. All following lines are the content lines from the entry. If the
   content lines contain cue timestamps then those timestamps MUST be
   adjusted to be relative to the entry's start timestamp (and a demuxer
   has to reverse this process again).
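
To make rules 1 to 3 concrete, here's a rough Python sketch of what a
muxer could do with a single entry. Nothing in it is an existing API: the
names cue_to_block, Block, parse_ts and format_ts are made up for
illustration. It assumes LF as the line terminator, millisecond
precision, and an entry that has already been split into lines.

```python
import re
from dataclasses import dataclass

# WebVTT timestamp; the hours part is optional (e.g. 03:10.000).
TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING_LINE = re.compile(rf"^({TS})[ \t]+-->[ \t]+({TS})(.*)$")
INLINE_TS = re.compile(rf"<({TS})>")


def parse_ts(text: str) -> int:
    """Convert a WebVTT timestamp to milliseconds."""
    parts = text.split(":")
    seconds, millis = parts[-1].split(".")
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)


def format_ts(ms: int) -> str:
    """Convert milliseconds to a WebVTT timestamp, always with hours."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


@dataclass
class Block:  # hypothetical stand-in for a Matroska block
    start_ms: int
    duration_ms: int
    payload: str


def cue_to_block(cue_lines: list[str]) -> Block:
    # A cue identifier must not contain "-->", so the first line holding
    # it is the timestamp line.
    if "-->" in cue_lines[0]:
        identifier, timing, content = "", cue_lines[0], cue_lines[1:]
    else:
        identifier, timing, content = cue_lines[0], cue_lines[1], cue_lines[2:]

    match = TIMING_LINE.match(timing)
    if match is None:
        raise ValueError(f"not a timestamp line: {timing!r}")
    start, end = parse_ts(match.group(1)), parse_ts(match.group(2))

    # Rule 2: drop the timestamps, keep "-->" and any cue settings.
    settings = match.group(3).strip()
    second_line = "-->" + (f" {settings}" if settings else "")

    # Rule 3: make embedded cue timestamps relative to the entry's start.
    def rebase(m: re.Match) -> str:
        return f"<{format_ts(parse_ts(m.group(1)) - start)}>"

    content = [INLINE_TS.sub(rebase, line) for line in content]
    payload = "\n".join([identifier, second_line, *content])
    return Block(start, end - start, payload)
```

For an entry without a cue identifier the payload simply starts with an
empty line, which is what rule 1 asks for.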

Here's an example of how an entry could be converted. If a WebVTT file
looks like this:


example entry 1
00:03:10.000 --> 00:03:20.000 region:bill align:right
Entries can even include timestamps. For example:
<00:03:15.000>This becomes visible five seconds after the first part.

NOTE This is a comment block.

00:05:22.000 --> 00:05:28.700
Another entry, this one with neither a cue identifier nor cue settings.

then the converted first entry would look like this:

example entry 1
--> region:bill align:right
Entries can even include timestamps. For example:
<00:00:05.000>This becomes visible five seconds after the first part.

The second Matroska block will contain a BlockAddition element:

NOTE This is a comment block.

and the actual block:


-->
Another entry, this one with neither a cue identifier nor cue settings.
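
For the opposite direction, here's an equally rough Python sketch of how
a demuxer might rebuild the WebVTT entry from a block, assuming the
payload layout from (D). The function name block_to_cue and the helpers
are hypothetical, LF is assumed as the line terminator, and times are
taken to be in milliseconds.

```python
import re

# WebVTT timestamp; the hours part is optional.
TS = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
INLINE_TS = re.compile(rf"<({TS})>")


def parse_ts(text: str) -> int:
    """Convert a WebVTT timestamp to milliseconds."""
    parts = text.split(":")
    seconds, millis = parts[-1].split(".")
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)


def format_ts(ms: int) -> str:
    """Convert milliseconds to a WebVTT timestamp, always with hours."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def block_to_cue(start_ms: int, duration_ms: int, payload: str) -> str:
    # First line: cue identifier (may be empty); second line: "-->" plus
    # optional cue settings; everything after that is content.
    identifier, settings_line, *content = payload.split("\n")
    if not settings_line.startswith("-->"):
        raise ValueError(f"unexpected second line: {settings_line!r}")

    # Put the block's start and end back in front of and behind "-->".
    timing = f"{format_ts(start_ms)} --> {format_ts(start_ms + duration_ms)}"
    settings = settings_line[3:].strip()
    if settings:
        timing += f" {settings}"

    # Turn relative cue timestamps back into absolute ones.
    def absolute(m: re.Match) -> str:
        return f"<{format_ts(parse_ts(m.group(1)) + start_ms)}>"

    lines = [identifier] if identifier else []
    lines.append(timing)
    lines.extend(INLINE_TS.sub(absolute, line) for line in content)
    return "\n".join(lines)
```

One caveat of this sketch: timestamps always come back with an hours
part, so a source file that omitted the hours would not be reproduced
byte for byte.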

Kind regards,

[1]  http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm