[Matroska-devel] Subtitles in MKV? Any docs?

Moritz Bunkus moritz at bunkus.org
Thu Jun 17 10:11:17 CEST 2004


Hi,

> > well.. our point is to make support for matroska subtitles. I've written a
> > dshow filter for handling subtitles embedded in containers.

Out of curiosity - why aren't you using Gabest's VSFilter?

> > I've dumped several MKV subtitle streams and the data looks like the
> > following :
> >
> > 0,0,Default,,0,0,0,,Here we go.
> > 1,0,Default,,0,0,0,,Hiya fellas.

There are basically three subtitle formats in Matroska at the
moment. They're identified by their CodecID element. Two of them are
text formats (S_TEXT/UTF8 and S_TEXT/SSA / S_TEXT/ASS); the third one
is the same format used on DVDs (S_VOBSUB), which is a graphical format.
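
As a rough illustration, a filter could branch on the CodecID like this.
This is only a sketch in Python; the table and the function name
subtitle_kind are made up for the example and are not part of any
Matroska library.

```python
# Sketch only: the format families named above, keyed by CodecID.
SUBTITLE_KINDS = {
    "S_TEXT/UTF8": "plain text",
    "S_TEXT/SSA": "SSA/ASS text",
    "S_TEXT/ASS": "SSA/ASS text",
    "S_VOBSUB": "graphical (VobSub)",
}


def subtitle_kind(codec_id):
    """Return the format family for a CodecID, or None if it is not listed above."""
    return SUBTITLE_KINDS.get(codec_id)
```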

The simpler of the two text formats is S_TEXT/UTF8. It just
contains the text to display in each packet. The timecode and the
duration are set by the container (BlockTimecode and BlockDuration
elements). There's no special format inside. It's just the simple SRT
file format converted to Matroska, so there may be HTML-like tags to
alter the appearance.
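
To make that concrete, here is a small Python sketch that turns one
S_TEXT/UTF8 packet back into an SRT entry. It assumes the demuxer has
already converted the block's timecode and duration to milliseconds;
the function names srt_time and to_srt_entry are hypothetical.

```python
def srt_time(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt_entry(index, start_ms, duration_ms, text):
    """Build one SRT entry from a packet's text plus the container's timing.

    start_ms and duration_ms are assumed to be in milliseconds.
    """
    end_ms = start_ms + duration_ms
    return f"{index}\n{srt_time(start_ms)} --> {srt_time(end_ms)}\n{text}\n"
```

The packet text is passed through untouched, so any HTML-like tags in
it survive the round trip.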

S_TEXT/SSA and S_TEXT/ASS are imported SubStation Alpha / Advanced SSA
subtitles. You can find the general description of this format somewhere
on the net. As for the additions in ASS you can have a look at
http://sourceforge.net/docman/display_doc.php?docid=19407&group_id=82303

Now those lines are not stored 1:1 in Matroska but transformed a
bit. Here's a quick overview:

1. The global settings in a SSA file (the [Script Info] and [V4 Styles]
   sections) are stored in the track's CodecPrivate element in pure text
   form.

2. Each line in the original file looks something like this:
   Dialogue: Marked=0,0:00:01.00,0:00:10.00,*Default,,0000,0000,0000,,{\fs50\fnCHRISTINA\fe2}test
   Such a line is transformed. First, 'Dialogue: Marked=' is removed. So
   are the start and end times, which are again stored in Matroska's
   BlockTimecode and BlockDuration elements. Last, the line number in
   the original file is prepended. So if this was the second
   'Dialogue: ...' entry in the original SSA file we'd end up with a
   line in Matroska like this:
   2,0,*Default,,0000,0000,0000,,{\fs50\fnCHRISTINA\fe2}test
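
The two steps above could be sketched in Python roughly as follows.
This is not how any particular muxer does it. It assumes the
'Dialogue: ...' lines sit in an [Events] section that follows the
global settings, that the start and end fields are dropped entirely,
and that times are handed to the container in milliseconds. The names
parse_ssa_time and mux_ssa are made up, and the numbering base
(first_number=1, to match the 'second entry becomes 2' description) is
an assumption too, since the dumps quoted earlier start at 0.

```python
def parse_ssa_time(stamp):
    """Convert an SSA timestamp such as '0:00:01.00' to milliseconds."""
    hours, minutes, seconds = stamp.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def mux_ssa(ssa_text, first_number=1):
    """Return (codec_private, blocks) for the text of an SSA file.

    blocks is a list of (start_ms, duration_ms, payload) tuples.
    """
    header_lines = []
    blocks = []
    in_events = False  # assumption: dialogue lines follow an [Events] header
    number = first_number
    for line in ssa_text.splitlines():
        if line.strip() == "[Events]":
            in_events = True
            continue
        if not in_events:
            # Step 1: global settings are kept as plain text for CodecPrivate.
            header_lines.append(line)
            continue
        if not line.startswith("Dialogue: "):
            continue  # skip anything else in the events section
        # Step 2: split off Marked, Start and End; the remainder is kept as is
        # (the text comes last and may itself contain commas).
        marked, start, end, rest = line[len("Dialogue: "):].split(",", 3)
        start_ms = parse_ssa_time(start)
        duration_ms = parse_ssa_time(end) - start_ms
        marked = marked.split("=")[-1]  # 'Marked=0' becomes '0'
        blocks.append((start_ms, duration_ms, f"{number},{marked},{rest}"))
        number += 1
    return "\n".join(header_lines).strip(), blocks
```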

> > the timestamped data looks a bit like SSA, but I want to be sure I'm doing
> > everything right....

It is SSA or ASS with the transformations mentioned above, yes.
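
Going the other way in a filter, one packet can be turned back into a
'Dialogue: ...' line along these lines. Again only a Python sketch: it
assumes the payload layout seen in your dump (number, marked value,
then the remaining fields), timing already converted to milliseconds,
and SSA-style 'Marked=' output. The names format_ssa_time and
to_dialogue_line are hypothetical.

```python
def format_ssa_time(ms):
    """Format milliseconds as an SSA timestamp, H:MM:SS.cc (centiseconds)."""
    total_cs = ms // 10
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    seconds, centis = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{centis:02d}"


def to_dialogue_line(payload, start_ms, duration_ms):
    """Rebuild (number, 'Dialogue: ...' line) from one packet.

    payload is the packet text; start_ms and duration_ms are assumed to
    come from the block's timecode and duration, in milliseconds.
    """
    number, marked, rest = payload.split(",", 2)
    start = format_ssa_time(start_ms)
    end = format_ssa_time(start_ms + duration_ms)
    return int(number), f"Dialogue: Marked={marked},{start},{end},{rest}"
```

The returned number can be used to put the lines back in their original
file order before handing them to an SSA renderer together with the
CodecPrivate text.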

Mosu

-- 
If Darl McBride was in charge, he'd probably make marriage
unconstitutional too, since clearly it de-emphasizes the commercial
nature of normal human interaction, and probably is a major impediment
to the commercial growth of prostitution. - Linus Torvalds


