[matroska-devel] Re: Common Opensource codec API

Ronald Bultje rbultje at ronald.bitfreak.net
Fri Jun 27 15:13:14 CEST 2003


Howdy,

On Fri, 2003-06-27 at 01:47, Guenter Bartsch wrote:
> so, basically i think it would be interesting to see if it is possible
> to agree on a common standard for free multimedia plugins, especially
> for
> 
> - a common input abstraction layer
> - demuxers
> - codecs
> 
> and maybe also
> 
> - audio/video output
> 
> among some media player projects.

I'd be in favour of some of these. There are some buts here, though...

* if we take ffmpeg's demuxer (application) interface as an example, we
see that it is severely limited. I don't see any way to specify subtitle
streams, nor can I get private streams. I'm limited to video/audio,
which isn't a good thing. Of course this is fixable, but in the best
case, it'd require some sort of objectification, and as far as I
understand, you guys aren't really in favour of this. There are three
ways to do this:
  * c++
  * g_* stuffies
  * c struct with casts
Basically, (2) is (3) with some niceties of separation of classes and
instances around it. Anyway, most people will dislike both (1) and (2)
simply because they carry dependencies (c++/glib). (3) looks evil and is
quite some work to implement, but would be worth a try. What you
probably want is - taking the demuxer as an example - to have a
bytestream object which exports codecstreams (parent object) which can
be a video/audio/subtitle/teletext/private/any stream (a child of the
codecstream parent).
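To make the "c struct with casts" option concrete, here's a minimal
sketch of what such a parent/child codecstream could look like. All
names (codecstream_t, stream_width() etc.) are made up for
illustration; the trick is only that the parent struct is the first
member of each child, which makes the casts well-defined in C:

```c
#include <assert.h>

/* Generic parent object: every stream, whatever its type, starts
 * with this. A pointer to a child struct can be cast to a pointer
 * to its first member (the parent), and back after checking 'type'. */
typedef enum {
  STREAM_VIDEO, STREAM_AUDIO, STREAM_SUBTITLE,
  STREAM_TELETEXT, STREAM_PRIVATE
} stream_type_t;

typedef struct {
  stream_type_t type;   /* which child this really is */
  int id;               /* stream number within the bytestream */
} codecstream_t;

typedef struct {
  codecstream_t parent; /* must be the first member for the cast */
  int width, height;
} video_stream_t;

typedef struct {
  codecstream_t parent; /* must be the first member for the cast */
  int rate, channels;
} audio_stream_t;

/* The application only handles codecstream_t pointers and casts
 * down after checking the type field. */
static int stream_width(const codecstream_t *s)
{
  if (s->type != STREAM_VIDEO)
    return -1;
  return ((const video_stream_t *)s)->width;
}
```

The bytestream object would then simply export a list of
codecstream_t pointers, and the application inspects 'type' to decide
what child it is dealing with.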

Fortunately, for codecs, the idea is much simpler, but the same problem
applies to properties: each codec must be able to generate *any*
property with *any* value type. This is my main problem with ffmpeg
currently (I do want to mention that ffmpeg totally rocks, but just like
anything, it isn't perfect ;) ): everything is just one static struct.
That's nice'n'simple, but also severely limiting. This is why ffmpeg's
current way of identifying codecs/streams/etc. wouldn't be a good basis
for such a generic codec API, imo.
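What I mean by "any property with any value type" is something like
the following sketch (a hypothetical name/value list, not any existing
API): instead of a fixed struct whose fields you can outgrow, each
codec or stream carries a list of typed name/value pairs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name/value property system: any property name can be
 * paired with any value type, and new properties need no struct or
 * core changes, only documentation of what the name means. */
typedef enum { PROP_INT, PROP_FLOAT, PROP_STRING } prop_type_t;

typedef struct property {
  const char *name;
  prop_type_t type;
  union { int i; double f; const char *s; } value;
  struct property *next;  /* linked list: no fixed layout to outgrow */
} property_t;

/* Prepend an integer-valued property to the list. */
static property_t *prop_set_int(property_t *head, const char *name, int v)
{
  property_t *p = malloc(sizeof(*p));
  p->name = name;
  p->type = PROP_INT;
  p->value.i = v;
  p->next = head;
  return p;
}

/* Look a property up by name; NULL if the stream doesn't have it. */
static const property_t *prop_find(const property_t *head, const char *name)
{
  for (; head != NULL; head = head->next)
    if (strcmp(head->name, name) == 0)
      return head;
  return NULL;
}
```

An audio stream would then carry e.g. "rate" and "channels" entries,
a video stream "width" and "height", and a codec can invent whatever
extra properties it needs without touching the common core.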

GStreamer isn't perfect either. It depends on glib - I can understand
that people won't like that. Same goes for other interfaces. It's
basically fairly complex for a thing like a codec interface. It's fairly
generalized towards basically anything, which makes it less suited as
example for specific purposes. Basically, if I want a codec API or a
demuxer API, I probably want these to be specifically suited for that
purpose. For GStreamer, both APIs would be the same. ;). What I'm saying
is that if we want to make a good codec API, some ideas of GStreamer
(extensibility etc.) might be worth considering, but in general,
GStreamer's API as such might be a bit too general. I'm not sure what
others think of this. The good thing is that you're not constrained by a
limited set of properties, streams, types of streams or anything;
everything is extensible without having to modify any of the GStreamer
core.

As for Xine/Mplayer, I'm sorry to say that I don't have enough
experience with their code to give a strong opinion on it. I did look at
the code a few times, but don't know the codebase well enough. I'll try
to make some general comments, though, regarding the codec API of both.

I've had a quick look at the mplayer demuxer code (simply because I took
that as an example for the ffmpeg one too), and noticed that it has
entries for audio, video and subtitles in the demuxer_t struct. That
leads to the same complaint as for ffmpeg - it's pretty good, but it is
somewhat limiting. Another problem (well, take these as comments, not
complaints) is that the actual filestream data (which should be private
in the demuxer, not adaptable by the application) is integrated in the
demuxer_t struct, too. That works, but it's not a good idea to make
this part of the codec/demuxer API - the API should only contain things
that the application actually cares about. The rest should be kept
private.

As for Xine, I read through it quickly but apparently, I don't get it.
;). xine-lib/src/demuxers/demux.h doesn't mention anything apart from
video. I guess I'm missing something. ;).

> is there some documentation available on gstreamer's plugin api? sounds
> very interesting to me, but what i found on their website was pretty
> incomplete and hat lots of broken links in it. gstreamer is definitely
> worth looking into, though.

Well, there used to be some, but I can't find it anywhere, which is
pretty bad. In our CVS, there's a gst-template module. In the
directory gst-plugin/src/*, you'll find an example plugin. That is
probably a good start. Direct WWW link:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gstreamer/gst-template/gst-plugin/src/

At the bottom, plugin_desc is the structure that loads the plugin. The
_class_init(), _init() and _get_type() functions are all gobject core
stuff. _get_property() and _set_property() (for properties) are all
glib/gobject property functions. _chain() is the actual datastream
callback for I/O. At the top, the pad templates define I/O points for
this plugin, and a caps (none given here) defines the type of the
stream. A GstCaps is basically what we use as way of identifying data
types. Some more info on this is in CVS, module gstreamer:
docs/random/mimetypes. www link:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gstreamer/gstreamer/docs/random/mimetypes
We're currently basically done documenting all this, but it's not all
implemented perfectly yet.

> let's see, what he says :) one problem with gstreamer imho is their many
> g'isms, not sure what the current state there is. i think a common api
> should not be dependant on stuff like glib or gobject (though i love
> glib i don't think it's a good idea to force everyone into using it).

As said above, I tend to agree that it's not a good dependency for a
core thing such as a codec API.

What I'd love to see is the codec API not being an actual lib, but just
a protocol, like X. Each application can then write its own
implementation of this protocol, and the codec API can, if desired,
provide a header file describing all structs (if any) etc. used in the
protocol.
Tying ourselves to one and the same lib likely won't work, even now
we're already reproducing so much of the same code... Do people agree
with this?

Ok, now back to the actual question: what would it all look like? A
protocol describes structs or data fields, documents what properties
mean what,
etc. If we want a common demuxer API, I'd very much prefer if people
would use the parent-child object stuffing I described in the beginning
of this email. Each bytestream can export codec streams, which can be
anything, including but not limited to audio, video, subtitle, teletext,
private data or whatever. Types define the actual type of the codec
stream, and by means of a child class, all specifics are filled in for
the particular stream.
For audio, this is rate, channels, maybe (in case of raw audio) the
bitsize (8, 16) or container type (uint8_t, int16_t, maybe expansions of
these for future reference), this can also be float data, btw! We just
need to document what property name means what, and then simply make a
name/value pair system that describes all these.
For video, this would be width, height, ... For subtitles, this would be
nothing at first, I guess.
The parent object describes things like timestamps, duration, the actual
data+size.
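In struct form, the parent/child split above could look something like
this. It's only a sketch; every field name here is my own invention,
and the real protocol would of course have to specify these precisely:

```c
#include <stdint.h>
#include <stddef.h>

/* Parent object for actual data passing: what every stream type
 * shares - timestamp, duration and the data+size. */
typedef struct {
  int stream_id;      /* which codec stream this buffer belongs to */
  int64_t timestamp;  /* presentation time */
  int64_t duration;
  uint8_t *data;
  size_t size;
} stream_buffer_t;

/* Generic part of a stream description; the specifics live in a
 * child struct (parent as first member, same cast trick as before). */
typedef struct {
  const char *media_type;  /* "audio", "video", "subtitle", ... */
} stream_desc_t;

/* Audio child: rate, channels, sample format. A video child would
 * carry width/height instead; a subtitle child maybe nothing yet. */
typedef struct {
  stream_desc_t parent;    /* first member: the generic part */
  int rate;                /* e.g. 44100 */
  int channels;            /* e.g. 2 */
  int bitsize;             /* 8, 16, ... (could be float instead) */
} audio_desc_t;
```

The name/value property approach could replace the fixed child fields
here, too; the point is just that timestamps/duration/data stay in the
parent, and per-type specifics stay in the child.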

Codecs: much easier. One input, one output. Properties are the same as
the properties for the codec stream in the demuxer in case of the
encoded data. For the decoded data, the same applies, but we'll probably
want to provide some more properties for 'raw' data. Anyway, this just
needs specifying. The mimetypes document above is what we currently use,
other comments are welcome, of course.
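A "one input, one output" codec interface could then be as small as
this sketch (again, all names are hypothetical; the dummy passthrough
codec is only there to show the calling convention):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec description: one input type, one output type,
 * and a single decode entry point. Encoded-side properties would be
 * the same ones the demuxer exported for the stream; the decoded
 * side would add the extra 'raw' properties. */
typedef struct codec {
  const char *name;
  const char *input_type;   /* e.g. "audio/mpeg" */
  const char *output_type;  /* e.g. "audio/raw"  */
  int (*decode)(struct codec *c,
                const uint8_t *in, size_t in_size,
                uint8_t *out, size_t *out_size);
} codec_t;

/* Dummy passthrough "codec": copies input to output unchanged. */
static int copy_decode(codec_t *c, const uint8_t *in, size_t in_size,
                       uint8_t *out, size_t *out_size)
{
  (void)c;
  memcpy(out, in, in_size);
  *out_size = in_size;
  return 0;
}

static codec_t copy_codec = {
  "copy", "audio/raw", "audio/raw", copy_decode
};
```

An application would pick a codec by matching input_type against the
stream's type and then feed it buffers one by one; real codecs would
naturally need more (flushing, errors, buffer ownership), which is
exactly the part that needs specifying.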

Concerns from my side, as far as my experience with codecs in
GStreamer goes: do we want to separate codec and bytestream in cases
where they are (almost) the same, such as ogg/vorbis, mp3, etc.? If so,
what are the exact tasks of the parser (e.g. defining properties
required by the codec, maybe metadata) and the codec (decoding), and
what do we do if these interfere (e.g. I'm being told that not ogg, but
vorbis, contains the metadata of a stream!)?

And just to state clearly: our final goal is to propose a standardized
API or interface of how codecs, muxer/demuxer libraries etc. should look
to be usable by our applications. It is not to define how a bytestream
should look. ;). Just so I (and you) know what we're actually talking
about.

Enough typing for now, I'd better get back to actual work here. :-o.
Comments are very much appreciated. ;).

Thanks for reading,

Ronald

-- 
Ronald Bultje <rbultje at ronald.bitfreak.net>

http://www.matroska.org


