[Matroska-devel] Re: [Media-api] Design ideas for the editing system.

Steve Lhomme steve.lhomme at free.fr
Wed Jan 28 11:35:32 CET 2004

Toby Hudon wrote:

> Ok, now that we're supposedly working on the file format, I've been thinking about what we really need to put in there.
> Basically my main idea is that we have something that looks like this:
> 1 or more major media types (audio/video/text), specifics like codec or colorspace are unimportant.

I agree, provided the architecture can be extended later. Any hacker 
will then be able to add their own format, as long as we can handle 
the trickiest cases.

One feature we need is easy transcoding of AVI files into MKV/MKA 
files, with an option to either keep VfW/DShow/ACM compatibility or 
convert to the native Matroska format.

> For each media type, there exists ONE control stream, and one or more source streams. All have timecodes.
> The control stream basically is the master timecode system, and the timecodes in each control stream (not source streams) should be linked so they always sync. This way we can make sure our video and audio line up.

Why do you need a third stream to keep two streams in sync?

> The source streams each have their own (relative) timecode sequences that we can access.

Roger that.

> So, to edit a file, for simple things like cuts, speed up, slow down, and reverse, all we have to do is indicate in the master control stream which timecode of which source track we want at a given "master clock" in the control stream, and the direction and rate of play. We just indicate which tracks are active then and what relative timecodes from those tracks we want to use. Since we can associate with any timecode from the source tracks, looping or time scaling is not a problem, and just changing which track is active or inactive or which timecode we wish to display in a track lets us perform cuts. This would basically be able to perform most of what VirtualDub and such do without much effort; all it takes is editing which track and position you want at each point in time.
> Now, as for more complex effects, you'd have to have an external system to render them, so it would be a bit more like function calls or ML tags. I.e., for a fade effect, we might indicate two active tracks in the control stream, and their start timecode, rate, and direction of play, then have another variable which tells the editor we wish to render an effect or apply some filter here. The parameters passed to that effect or filter would probably vary depending on which one it is, so it probably needs to be a bit more thought out than what I've done so far. So basically, to fade we say something in the control stream like:
> 50: <track ID="1" timeindex="13953" rate="24" direction="forward"><track ID="3" timeindex="2234" rate="30" direction="reverse"><fade input1="1" input2="3" duration="300">

I suggest we have a look at the possibilities of SMIL instead of 
reinventing the wheel.
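
To make the quoted control-stream idea concrete, here is a minimal 
sketch in Python of how a master clock could be mapped onto source 
track timecodes for cuts, speed changes and reverse play. The names 
Segment and resolve, the field layout and the use of a speed 
multiplier (instead of the rate attribute in the example above) are 
all assumptions made for illustration. They are not part of any 
agreed format, and the time unit is left abstract.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One hypothetical control-stream entry (illustrative only)."""
    master_start: int      # master-clock time where this entry begins
    duration: int          # length of the entry on the master clock
    track_id: int          # source track that is active
    source_start: int      # source timecode shown at master_start
    speed: float = 1.0     # 2.0 = double speed, 0.5 = half speed
    reverse: bool = False  # play the source backwards


def resolve(segments, master_time):
    """Return (track_id, source_timecode) for master_time, or None."""
    for seg in segments:
        offset = master_time - seg.master_start
        if 0 <= offset < seg.duration:
            delta = offset * seg.speed
            if seg.reverse:
                return seg.track_id, seg.source_start - delta
            return seg.track_id, seg.source_start + delta
    return None  # no track is active at this point


control_stream = [
    Segment(0, 100, track_id=1, source_start=500),             # plain cut
    Segment(100, 50, track_id=1, source_start=200, speed=2.0), # speed up
    Segment(150, 100, track_id=3, source_start=1000, reverse=True),
]
```

With this layout a cut is just a new entry, and looping is the same 
entry repeated at a later master_start. The source tracks themselves 
are never touched.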

> By putting everything in the control stream, this means we can take source tracks from existing files (direct stream copy) without modification other than just parsing them into the container, regardless of codec or attributes. This means adding a new video source track to an existing edit project should be as simple as selecting a new file to read source from, and doing a data copy into a new track on the project file. Granted there's probably going to be some sort of fun involving interleaving the new data etc, but that should be transparent to this process and handled by the container for the editing file format.

Yes, the Direct Stream Copy is something really important.

> Rendering a project to a final output file is as simple as reading the control stream in order, displaying the relative parts of the source streams as needed, and calling various rendering plugins to handle transitions and effects at specified times. All we need to do is take the desired output samplerate/framerate of the video we're creating, and generate the result at each point in time we need a frame. This may require some interpolation, but those are things that should probably be handled "transparently" by the editor for ease of use. I realize this will be much harder than it sounds but at least it should only need to be written once if done right.

IMO the pull model might not be a good idea, but why not try it. We 
need to make sure it also works with VFR (variable frame rate) content.
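
As a rough illustration of what pulling frames from VFR content could 
look like, the Python sketch below picks, for each output frame at a 
constant rate, the last source frame whose timecode is not later than 
the output time. The function names, the millisecond unit and the 
"hold the previous frame" policy are assumptions for the example. No 
interpolation is attempted.

```python
from bisect import bisect_right


def frame_at(timecodes, t):
    """Index of the last source frame with timecode <= t, or None."""
    i = bisect_right(timecodes, t) - 1
    return i if i >= 0 else None


def render_indices(timecodes, end_ms, fps):
    """Pull one source frame index per output frame at a constant fps."""
    picked = []
    n = 0
    while True:
        t = n * 1000 / fps  # output time of frame n, in milliseconds
        if t >= end_ms:
            break
        picked.append(frame_at(timecodes, t))
        n += 1
    return picked


# Sorted VFR timecodes in ms: two frames close together, then a long hold.
vfr_timecodes = [0, 40, 200, 240]
indices = render_indices(vfr_timecodes, end_ms=280, fps=25)
```

Here the second source frame is repeated for every output frame until 
the next timecode is reached, so the long gap in the source does not 
desynchronise the output.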
