[Matroska-devel] Storage of WebVTT subtitles in Matroska

Denis Charmet via Matroska-devel matroska-devel at lists.matroska.org
Mon Mar 21 11:31:01 CET 2016


On 2016-03-20 19:17, Moritz Bunkus via Matroska-devel wrote:
> I've looked over the document from the WebM project[1] thinking about
> WebVTT inclusion. Unfortunately it's rather old (from 2012), and its
> text leads me to believe that WebVTT has changed considerably since
> then. For one, the WebM proposal states that no CodecPrivate content
> will be used; however, several global blocks in WebVTT are not covered
> at all.
> Additionally, the WebM project proposes splitting the WebVTT content
> into two tracks, one for the styling and one for the content. I don't
> like this approach at all, as we currently don't have any method of
> linking two tracks in a way that clearly indicates that those tracks
> must be kept and used together.
> I therefore don't think we can use the WebM proposal.

Amen :)

> My goal is still to keep as much information from the WebVTT file as
> possible without compromising Matroska's general ideas (and hopefully
> without jumping through hoops). I aspire to the same for other source
> container formats, not just for WebVTT. The reason is that Matroska is
> not solely used for storing content for later playback but also as an
> intermediate format. And keeping as much information as possible makes
> it a lot easier to edit the content later.

Fair enough, it's a laudable intention.

> Maybe. My problem is that I don't fully understand where they're trying
> to go with allowing cue timestamps within the content in the first
> place. Would something like this be valid?

From what I understand, they are trying to get around the gray area of
overlapping subtitles, which SRT doesn't really specify.

If you have:

00:00:00.000 --> 00:00:10.000
[alarm sounding]

00:00:01.000 --> 00:00:04.000
Alert

a basic player might display "[alarm sounding]" for one second and then
replace it with "Alert" instead of showing both cues at the same time.
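To make the difference concrete, here is a minimal Python sketch of the two behaviours. The `Cue` type and both functions are hypothetical; `naive_latest_cue` only models the "basic player" described above, which keeps just the most recently started cue, while `active_cues` keeps every cue whose interval contains the current time.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds; the cue is shown while start <= t < end
    text: str


def active_cues(cues, t):
    """Every cue visible at time t, in file order (overlaps are kept)."""
    return [c.text for c in cues if c.start <= t < c.end]


def naive_latest_cue(cues, t):
    """Hypothetical 'basic player': shows only the most recently started cue."""
    visible = [c for c in cues if c.start <= t < c.end]
    return max(visible, key=lambda c: c.start).text if visible else None


cues = [Cue(0.0, 10.0, "[alarm sounding]"), Cue(1.0, 4.0, "Alert")]
print(active_cues(cues, 2.0))       # ['[alarm sounding]', 'Alert']
print(naive_latest_cue(cues, 2.0))  # Alert
```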

> ------------------------------------------------------------
> 00:02:00.000 --> 00:02:10.000
> <v Professor Farnsworth>No fair!<00:02:01.500>You changed the outcome by measuring it!
> ------------------------------------------------------------
> I guess yes, and splitting this up into two entries would require
> duplicating at least the <v…> tag. Additionally, the muxer would have
> to calculate where exactly the text after the embedded cue timestamp
> would have to appear.
> It quickly gets pretty complicated for a muxer. Therefore (and for
> keeping the original structure as intact as possible) I'm against
> splitting entries on embedded cue timestamps.

Point taken. In any case, I suppose the decoder will be able to split it
itself.
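If the decoder does the splitting, it only has to find the timestamp tags inside the payload. A minimal Python sketch, assuming the WebVTT timestamp syntax (optional hours of two or more digits, then `mm:ss.ttt`); `split_on_timestamps` and `ts_ms` are hypothetical names. In WebVTT these tags mark where the following text switches from `:future` to `:past` styling, so whether a player reveals the segments progressively or merely restyles them is left to the player.

```python
import re

# Timestamp tag inside a cue payload, e.g. <00:02:01.500>; hours are optional.
TS_TAG = re.compile(r"<(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})>")


def ts_ms(h, m, s, frac):
    """Convert captured timestamp fields to milliseconds."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(frac)


def split_on_timestamps(payload, cue_start_ms):
    """Split a cue payload at embedded timestamp tags.

    Returns (start_ms, text) pairs: the first segment starts with the cue,
    each later one at its tag's time. Other tags such as <v ...> are left
    in place, so a <v> span opened in one segment may apply to later ones.
    """
    segments, pos, t = [], 0, cue_start_ms
    for m in TS_TAG.finditer(payload):
        segments.append((t, payload[pos:m.start()]))
        t = ts_ms(*m.groups())
        pos = m.end()
    segments.append((t, payload[pos:]))
    return segments


payload = ("<v Professor Farnsworth>No fair!<00:02:01.500>"
           "You changed the outcome by measuring it!")
for start, text in split_on_timestamps(payload, 120_000):
    print(start, text)
```

The second segment loses the `<v Professor Farnsworth>` context, which is exactly the duplication problem a muxer would face if it split entries itself.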

> Here's my updated storage format proposal:
> ------------------------------------------------------------
> (A) CodecID: S_WEBVTT
> (B) CodecPrivate: This element contains all global blocks before the
>     first subtitle entry. The »WEBVTT« file identification marker is 
>     part of CodecPrivate.
> (C) Non-global blocks (e.g. »NOTE«) before an entry are stored in
>     Matroska's BlockAddition element together with the entry they
>     precede.
> (D) Each entry consists of three or more lines:
> 1. The first line contains the entry's cue identifier if present in the
>    source file, followed by a WebVTT line terminator. If no cue
>    identifier was used, then only the WebVTT line terminator is used.
> 2. The second line contains the entry's timestamp line with the actual
>    timestamps removed, followed by cue settings if present, followed by
>    a WebVTT line terminator. The start timestamp is used as the block's
>    start timestamp, and the difference between the entry's end and start
>    timestamps is used as the block's duration.
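Read literally, (B) through (D) could be implemented roughly as follows. This is a minimal Python sketch, not an actual muxer: `mux_webvtt` and `Block` are hypothetical names, line terminators are normalized to LF, and it assumes the `-->` arrow is dropped from the second line so that only the cue settings remain (the proposal doesn't say this explicitly). NOTE blocks after the last cue have no entry to attach to and are simply discarded here.

```python
import re
from dataclasses import dataclass, field

# WebVTT timestamp: optional hours (2+ digits), minutes, seconds, 3-digit fraction.
TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING = re.compile(r"^" + TS + r"[ \t]+-->[ \t]+" + TS + r"(.*)$")


def ts_to_ms(h, m, s, frac):
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(frac)


@dataclass
class Block:
    timestamp_ms: int   # block start = cue start time
    duration_ms: int    # cue end minus cue start
    data: str           # identifier line, settings line, then the payload (D)
    additions: list = field(default_factory=list)  # preceding NOTE blocks (C)


def mux_webvtt(text):
    """Split a WebVTT document into CodecPrivate text and per-cue blocks."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    chunks = [c for c in re.split(r"\n{2,}", text.strip("\n")) if c.strip()]
    codec_private, blocks, pending_notes = [], [], []
    for chunk in chunks:
        lines = chunk.split("\n")
        # A cue's timing line is its first line, or its second after an identifier.
        timing_idx = next((i for i in (0, 1)
                           if i < len(lines) and "-->" in lines[i]), None)
        if timing_idx is None:
            # Header/STYLE/REGION/NOTE: CodecPrivate before the first cue (B),
            # otherwise queued as BlockAddition for the next cue (C).
            (pending_notes if blocks else codec_private).append(chunk)
            continue
        m = TIMING.match(lines[timing_idx])
        if m is None:
            raise ValueError("malformed timing line: " + lines[timing_idx])
        start = ts_to_ms(*m.group(1, 2, 3, 4))
        end = ts_to_ms(*m.group(5, 6, 7, 8))
        ident = lines[0] if timing_idx == 1 else ""
        settings = m.group(9).strip()
        payload = "\n".join(lines[timing_idx + 1:])
        data = ident + "\n" + settings + "\n" + payload
        blocks.append(Block(start, end - start, data, pending_notes))
        pending_notes = []
    return "\n\n".join(codec_private) + "\n", blocks
```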

Why not put the identifier and the style inside the BlockAddition as
well, keeping the same format?
The addition block would be:
Line 1: identifier or just \n
Line 2: style (without the arrow) or just \n
Line 3 if needed: NOTE

Or, even better (IMO), swap lines 1 and 2, since what interests the
player is the style, not the ID and the notes.
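The swapped layout could be packed and unpacked along these lines. A minimal Python sketch; `build_addition` and `parse_addition` are hypothetical helpers, and an absent field is written as an empty line, i.e. just the terminator. Everything after the second line is treated as NOTE text.

```python
def build_addition(settings="", identifier="", notes=None):
    """Pack cue settings, identifier and optional NOTE text, one field per line.

    Uses the swapped order: settings first, then identifier, then notes.
    """
    data = settings + "\n" + identifier + "\n"
    if notes:
        data += notes + "\n"
    return data


def parse_addition(data):
    """Inverse of build_addition; a missing addition yields empty fields."""
    settings, identifier, rest = (data.split("\n", 2) + ["", ""])[:3]
    return settings, identifier, rest.rstrip("\n") or None


addition = build_addition("align:start", "1")
print(repr(addition))            # 'align:start\n1\n'
print(parse_addition(addition))  # ('align:start', '1', None)
```

Since the block itself would then carry only the cue text, an SRT-style decoder can display it without ever reading the addition.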

This would really allow a player to interpret the whole WebVTT stuff as
SRT without too much effort and then add support for styling later,
because, let's be honest, people won't reinvent the wheel. They will
most likely patch their SRT decoder to support the timestamps inside the
data.

Denis Charmet - TypX
Irreverence is an art of living
