[Matroska-devel] S_DVBSUB

Steve Lhomme slhomme at matroska.org
Fri Feb 18 14:08:52 CET 2011

On Thu, Feb 17, 2011 at 10:26 PM, Dan Haddix <dan6992 at hotmail.com> wrote:
>> On Wed, Feb 16, 2011 at 10:57 PM, Dan Haddix <dan6992 at hotmail.com> wrote:
> The PID in TS is basically a track ID. Each TS packet contains a PID as part
> of its header so that the demuxer knows which stream the packet belongs to.
> The Program Map Table, aka PMT, is a special packet inserted several times
> per second which contains a map of these PIDs and information about the
> streams they point to. Just like MKV, a TS file can hold pretty much any
> audio/video format, even if it's unknown to the demuxing application. It
> does this by using a unique descriptor tag in the PMT and then putting any
> additional information needed to properly parse the format into the
> descriptor buffer, which is basically akin to the CodecPrivate buffer in
> MKV. If the demuxer understands the descriptor tag then it parses the
> associated packets; if not, it ignores them. DVB subtitles are stored in
> a sort of unique way. If there are multiple subtitle tracks which share data
> then they are stored in the same PID, but each demuxed data payload contains
> another header, specific to the DVB subtitle format, which contains a page
> ID value. This page ID value tells the decoder which sub stream the packet
> belongs to and the decoder decides whether or not to display it based on the
> user selection. If the subtitle streams do not share data then they are
> simply stored as separate PIDs. In the vast majority of situations each
> subtitle stream is stored in a unique PID and the data stored in the
> descriptor buffer is mostly unnecessary. However on some occasions there
> will be two tracks attached to a single PID. This typically happens when
> there is a normal subtitle track, which only displays spoken words, and
> another for the hearing impaired which also describes ambient noises like
> phones ringing, dogs barking, etc... They do this to save space, since the
> majority of the data is shared (i.e. the spoken words) and only a small
> subset of the data is specific to the hearing impaired track. The decoder
> knows which track is selected and displays all packets designated as
> specific to that track or shared amongst all tracks; the rest are simply
> ignored.
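
The page ID filtering described above can be sketched roughly as follows.
This is only an illustration, not code from VideoReDo or VLC. It assumes
the segment layout from the DVB subtitling spec as I understand it (sync
byte 0x0F, one byte of segment type, a 16-bit page ID, a 16-bit length),
and it assumes the stream-specific data sits on a "composition" page while
the shared data sits on an "ancillary" page. The function names are made
up for the example.

```python
import struct

SYNC_BYTE = 0x0F  # assumed marker at the start of every subtitling segment


def iter_segments(pes_payload: bytes):
    """Yield (page_id, segment_type, raw_segment) for each segment found."""
    pos = 2  # skip the two leading bytes (data_identifier, subtitle_stream_id)
    while pos + 6 <= len(pes_payload) and pes_payload[pos] == SYNC_BYTE:
        # Header after the sync byte: type (1 byte), page ID (2), length (2).
        segment_type, page_id, length = struct.unpack_from(">BHH", pes_payload, pos + 1)
        end = pos + 6 + length
        if end > len(pes_payload):
            break  # truncated segment: stop rather than guess
        yield page_id, segment_type, pes_payload[pos:end]
        pos = end


def select_segments(pes_payload: bytes, composition_page_id: int, ancillary_page_id: int):
    """Keep the segments for the selected sub-stream plus the shared ones."""
    wanted = {composition_page_id, ancillary_page_id}
    return [seg for page_id, _type, seg in iter_segments(pes_payload) if page_id in wanted]
```

Segments carrying any other page ID are dropped, which matches the "the
rest are simply ignored" behaviour in the description.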

Thanks for the detailed description. At least what I read before about
the format and my interpretation wasn't wrong :)

> Now just to be clear I'm not proposing we store any part of the TS packet,
> or the PES packet, in the MKV container. I'm only suggesting that we store
> the data payload of the PES packets, which is the DVB subtitle data, in the
> MKV chunk. I then suggest that we store the data from the TS PMT descriptor
> buffer, which describes any sub-streams, in the CodecPrivate portion of the
> track. This way the playing app can simply pass the CodecPrivate portion of
> the track along to the decoder so that it knows if this particular track
> contains multiple sub-streams or not. The demuxer itself does not need to
> know about, or care about, what is stored in the CodecPrivate portion of the
> track. It is up to the decoder or playing app to parse that and display
> the track selection to the user as necessary. And if it's missing or the
> playing app fails to parse it then the decoder will still default to
> displaying all common packets and simply ignore those intended for a
> specific sub-stream.
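
To make the CodecPrivate idea concrete, here is a rough sketch of how a
playing app might list the sub-streams. It assumes (this is not stated in
the proposal) that CodecPrivate holds only the body of the PMT subtitling
descriptor, i.e. a run of 8-byte entries: a 3-byte language code, a 1-byte
subtitling type, a 16-bit composition page ID and a 16-bit ancillary page
ID. If the descriptor tag and length bytes were stored as well they would
have to be skipped first. SubStream and parse_codec_private are
hypothetical names.

```python
import struct
from typing import List, NamedTuple


class SubStream(NamedTuple):
    language: str
    subtitling_type: int
    composition_page_id: int
    ancillary_page_id: int


ENTRY_SIZE = 8  # assumed size of one descriptor entry, in bytes


def parse_codec_private(codec_private: bytes) -> List[SubStream]:
    """Return one SubStream per complete 8-byte entry; ignore trailing bytes."""
    entries = []
    usable = len(codec_private) - len(codec_private) % ENTRY_SIZE
    for offset in range(0, usable, ENTRY_SIZE):
        lang, sub_type, comp, anc = struct.unpack_from(">3sBHH", codec_private, offset)
        entries.append(SubStream(lang.decode("ascii", "replace"), sub_type, comp, anc))
    return entries
```

A result with more than one entry would mean the track carries several
sub-streams; an empty or missing CodecPrivate gives an empty list, which
corresponds to the fallback of showing only the common packets.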

But you are missing the part when the user selects which stream to
play. Maybe in the TS world it's less of an issue as the streams
constantly change. But this is not how Matroska and most other
formats work. You read a header, one time, and you know all the streams
you have and so you can tell the user which are possible.

While supporting your proposal is fine for single stream PIDs, it is a
lot more complicated when streams are combined. The startup alone would
change a lot: you would not be able to tell how many tracks there
are until you have parsed the CodecPrivate. This is not the case
today. Then you'd need special access to the codec to tell
it to use one stream or the other, bypassing the usual track
selection. It may work in some players, but I expect it to be close to
impossible in DirectShow, especially since filters tend to have
proprietary APIs for the various novelties they add, usually not
made public in order to favor their own player. I don't think there is
currently any standard way to select only a part of a stream directly
in a codec (it's usually the job of the parser to do the data

> Here's the deal... I work as a developer for VideoReDo, a relatively popular
> video editing application which is specifically designed to edit TV
> programs. I have already added support for DVB subtitles in MKV using the
> method I've outlined and it works really well. I've also looked at the
> source code for VLC and I believe it will be relatively simple to add
> support for this method there as well. (I actually already wrote the code,
> but I can't get VLC to build so I haven't been able to test it yet) So
> basically if you go with my suggestion then there will be at least one app
> to create these files and one to play them back immediately. Using the
> TrackOperation method would require changes to the MKV muxers/demuxer in
> both applications, which I'm not capable of making, and would also require
> special processing of the DVB subtitle packets themselves during
> reading/writing which would require additional work to handle in VideoReDo
> and I'm not sure when/if I'd be able to add support.

Yes, but having one reader and one writer doesn't prove it's possible
to handle everywhere. How do you handle the stream selection in VLC
when 2 sub-streams exist? I'd be curious to know if what you propose
would work in GStreamer or Perian, and I have serious doubts about
DirectShow. Of course a proprietary hack is always possible.

> I know that the whole sub-stream portion of DVB subs is not congruent with
> how things are normally done in MKV, but this is a unique case where you'll
> be supporting an established format in a way that applications designed to
> handle it will already be setup to understand. In fact I'd argue that using
> the TrackOperation method is actually worse since it will require the
> demuxer to recognize the format and recombine the sub-streams back into the
> established DVB subtitle data format. Whereas doing it my way would put
> that burden on the decoder, which is most likely already designed to handle
> it.

But it doesn't break any design of how players usually handle stream
selection. It is easy on the demuxer side to recombine data to make a
virtual track. In fact the outside world of the container doesn't even
need to know the track is a virtual one. It's purely an internal detail.
And of course that solution is not tied to a single codec (DVBSUB). So
once you support it, it works for everything. If another combined
codec comes along, you don't have to support yet another codec oddity.
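
As a rough illustration of recombining data into a virtual track on the
demuxer side: the sketch below merges the blocks of several physical
tracks by timestamp and exposes the result as a single stream. It is not
taken from the Matroska spec or from any existing demuxer; the Block
tuple and the virtual_track name are invented for the example, and each
input track is assumed to be already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Block = Tuple[int, bytes]  # (timestamp, payload), a hypothetical block shape


def virtual_track(*physical_tracks: Iterable[Block]) -> Iterator[Block]:
    """Merge timestamp-sorted tracks into one timestamp-sorted stream."""
    # On equal timestamps, blocks from earlier arguments come out first.
    return heapq.merge(*physical_tracks, key=lambda block: block[0])
```

For example, merging a track of shared blocks with a track of
hearing-impaired-only blocks gives the player one ordinary-looking track,
and nothing in the merge is specific to DVBSUB.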

As for reusing existing DVBSUB support, adding a fake payload at the
front with a Page ID is trivial too. We already use something similar
for header stripping (again, transparent outside of the container),
except that in this case it doesn't need to be put inside the file.
It's only specific to how the decoder works.

Steve Lhomme
Matroska association Chairman
