[Matroska-devel] Re: USF : Universal Subtitles Standard - Improved hardware support with EBML muxing ?

Seiler Fabien SEILF at hta-bi.bfh.ch
Tue Oct 28 09:08:09 CET 2003

I was "absent" for a month, because I first had pre-diploma exams, then I moved, and at my new address I still don't have Internet (they have to install new cables in the house before I can use my cable modem :((( this could be done in the next two weeks, I hope).

So I'm completely uninformed... but I still continued to code on u96; I just haven't uploaded recent builds.

As for which form to use for muxing, I don't have a preference. For PC-only use I see no problem with the XML approach, and I guess the size would not be considerably bigger than in EBML format. In my experience, USF files are 20-50% bigger than the same content in "full"* SSA. I guess EBML USF would be about 50% the size of XML USF, but since we are talking about maybe 50-100 KB vs. 25-50 KB, this is not really important IMHO.
On the other hand, I understand very well that it is not good if a hardware device needs an XML parser just for USF subtitles...

(*SSA can omit "attributes", decreasing its size by about 50%, but this feature is not supported by any of Gabest's parsers (VobSub, VSFilter), and U96 might be the only app besides Sub Station Alpha itself that supports it.)
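To illustrate where the XML/EBML size difference comes from, here is a minimal Python sketch of EBML's element framing (ID bytes, a variable-length size field, then the payload) next to the equivalent bare XML tag. The element ID 0x5A01 and the tag name `text` are hypothetical placeholders, not values from any USF or Matroska spec, and real files add attributes, nesting and escaping on both sides, so this only shows per-element overhead, not whole-file ratios.

```python
def vint_length(value: int) -> int:
    """Bytes needed to store `value` as an EBML size field (VINT)."""
    n = 1
    # n bytes carry 7*n value bits; the all-ones pattern is reserved
    # for "unknown size", so the largest usable value is 2**(7*n) - 2.
    while value > (1 << (7 * n)) - 2:
        n += 1
    return n


def encode_vint(value: int) -> bytes:
    """Encode `value` with EBML's leading length-marker bit."""
    n = vint_length(value)
    return ((1 << (7 * n)) | value).to_bytes(n, "big")


def encode_element(element_id: int, payload: bytes) -> bytes:
    """EBML element = ID bytes + VINT payload size + payload."""
    id_bytes = element_id.to_bytes((element_id.bit_length() + 7) // 8, "big")
    return id_bytes + encode_vint(len(payload)) + payload


def xml_element_size(tag: str, payload: bytes) -> int:
    """Size of <tag>payload</tag> with no attributes or whitespace."""
    return len(f"<{tag}>".encode()) + len(payload) + len(f"</{tag}>".encode())


line = "Hello, world!".encode("utf-8")
ebml = encode_element(0x5A01, line)  # 0x5A01: hypothetical 2-byte ID
print(len(ebml), xml_element_size("text", line))  # 16 26
```

For short subtitle lines the fixed tag overhead dominates, which is why the relative saving shrinks as the text per element grows.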

If the decision goes to EBML USF, I guess I would have to provide an export function writing a *.mks file with pre-EBMLed USF, but I have not looked into the EBML/Matroska structure at all, and I would most probably need extensive help from the experienced Matroska coders, even if it's only to use libebml and libmatroska properly ;). Well, I am not opposed to the EBML variant, I just fear the work it would imply for me :P

I could also imagine allowing both formats, at least at first (as a test phase). Once a PC-based parser knows both formats, it can obviously map the tags XML<->EBML. This would later let us restrict the specs to the EBML format only, and this parser could be used during a transition period to convert files muxed with XML to EBML. IF both formats are allowed for some time AND a later restriction is planned, THEN the test phase allowing both should be short, to avoid too many users creating files they have to convert later (or we could release the muxer only over IRC, or hide one format, similar to how mkvmerge offers some features only through the CLI and not in MMG).

This approach is not very straightforward, but if no one can give really convincing arguments for or against one of the formats, it could be worthwhile to evaluate both methods to learn what advantages and disadvantages each format has.
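The XML<->EBML tag mapping that a dual-format parser would need could look roughly like the minimal Python sketch below: one table from XML tag names to EBML IDs, inverted automatically for the other direction, plus a recursive converter each way. The tag names, the IDs (0x5A01 to 0x5A03) and the fixed 2-byte ID width are hypothetical; attributes are ignored, and a real converter would use libebml/libmatroska and IDs from an agreed spec.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical tag table: placeholder 2-byte IDs, not assigned USF/Matroska IDs.
XML_TO_EBML = {"subtitles": 0x5A01, "subtitle": 0x5A02, "text": 0x5A03}
EBML_TO_XML = {v: k for k, v in XML_TO_EBML.items()}  # reverse direction
MASTER_TAGS = {"subtitles", "subtitle"}  # elements that contain child elements


def encode_vint(value: int) -> bytes:
    n = 1
    while value > (1 << (7 * n)) - 2:  # all-ones value is reserved
        n += 1
    return ((1 << (7 * n)) | value).to_bytes(n, "big")


def decode_vint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position after the VINT)."""
    first = data[pos]
    if first == 0:
        raise ValueError("invalid VINT (longer than 8 bytes)")
    n, mask = 1, 0x80
    while not first & mask:  # each leading zero bit adds one length byte
        n += 1
        mask >>= 1
    value = first & (mask - 1)  # strip the length-marker bit
    for byte in data[pos + 1 : pos + n]:
        value = (value << 8) | byte
    return value, pos + n


def xml_to_ebml(elem: ET.Element) -> bytes:
    if elem.tag in MASTER_TAGS:
        payload = b"".join(xml_to_ebml(child) for child in elem)
    else:
        payload = (elem.text or "").encode("utf-8")
    return XML_TO_EBML[elem.tag].to_bytes(2, "big") + encode_vint(len(payload)) + payload


def ebml_to_xml(data: bytes, pos: int = 0) -> tuple[ET.Element, int]:
    element_id = int.from_bytes(data[pos : pos + 2], "big")
    size, pos = decode_vint(data, pos + 2)
    end = pos + size
    elem = ET.Element(EBML_TO_XML[element_id])
    if elem.tag in MASTER_TAGS:
        while pos < end:  # children are packed back to back
            child, pos = ebml_to_xml(data, pos)
            elem.append(child)
    else:
        elem.text = data[pos:end].decode("utf-8")
    return elem, end


source = ("<subtitles><subtitle><text>Hello</text></subtitle>"
          "<subtitle><text>World</text></subtitle></subtitles>")
blob = xml_to_ebml(ET.fromstring(source))
restored, _ = ebml_to_xml(blob)
print(len(source), len(blob))  # 101 25
print(ET.tostring(restored, encoding="unicode") == source)  # True
```

Because both directions are driven by the same table, dropping the XML side later would only mean removing `ebml_to_xml`'s output path, not redefining the tags.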


One thing I am waiting for is embedded files (base64-encoded). I already support SSA file embedding, and base64 support is already coded, but without a spec I don't know which tags to write the encoded file into :) My suggestion would be:

- a new root-level node <embedded> (same level as <styles>, <metadata>, <subtitles>, but preferably the last node due to the potentially large size of the embedded data)

- inside <embedded>, any number of <file> nodes. <file> should have a mandatory attribute "filename" or "name" and optional attributes "size" and/or "rawsize" (encoded and decoded size in bytes). The content of the <file> node is the base64-encoded file data.
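A minimal Python sketch of how an editor could write the proposed node with the standard library's ElementTree and base64 modules. It follows the suggestion above (mandatory `filename`, optional `size` as the encoded length and `rawsize` as the decoded length), which is still only a proposal; the sample file name and bytes are dummies.

```python
from __future__ import annotations

import base64
import xml.etree.ElementTree as ET


def make_embedded(files: dict[str, bytes]) -> ET.Element:
    """Build an <embedded> node with one base64-encoded <file> per entry."""
    embedded = ET.Element("embedded")
    for name, raw in files.items():
        encoded = base64.b64encode(raw).decode("ascii")
        node = ET.SubElement(embedded, "file", {
            "filename": name,           # proposed mandatory attribute
            "size": str(len(encoded)),  # optional: encoded size in bytes
            "rawsize": str(len(raw)),   # optional: decoded size in bytes
        })
        node.text = encoded
    return embedded


root = ET.Element("USFSubtitles")
# <styles>, <metadata> and <subtitles> would be appended first, so that
# the potentially large <embedded> node ends up last.
root.append(make_embedded({"font.ttf": b"\x00\x01\x02\x03"}))  # dummy data
print(ET.tostring(root, encoding="unicode"))
```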

DTD proposal (I don't know if the syntax is correct; I don't have XML references at hand here at school):

<!ELEMENT USFSubtitles (..., embedded?, ...)>

<!ELEMENT embedded (file*)>

<!ELEMENT file (#PCDATA)> <!-- the character data is the base64 data -->
<!-- filename is mandatory; size (encoded) and rawsize (decoded), in bytes, are optional -->
<!ATTLIST file
  filename CDATA #REQUIRED
  size     CDATA #IMPLIED
  rawsize  CDATA #IMPLIED>
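On the reading side, a parser could enforce what the DTD sketch above expresses (optional `<embedded>`, any number of `<file>` children, mandatory `filename`) and check `rawsize` after decoding. This is a minimal Python sketch assuming the element and attribute names proposed here; raising `ValueError` on bad input is an illustrative choice, not part of any spec.

```python
from __future__ import annotations

import base64
import binascii
import xml.etree.ElementTree as ET


def read_embedded(root: ET.Element) -> dict[str, bytes]:
    """Decode every <embedded>/<file> child of a USF root into {filename: bytes}."""
    files: dict[str, bytes] = {}
    embedded = root.find("embedded")
    if embedded is None:  # the node is optional (embedded?)
        return files
    for node in embedded.findall("file"):
        name = node.get("filename")
        if not name:
            raise ValueError("<file> is missing the mandatory filename attribute")
        # Strip whitespace so line-wrapped base64 is accepted.
        encoded = "".join((node.text or "").split())
        try:
            raw = base64.b64decode(encoded, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"{name}: invalid base64 data") from exc
        # Only rawsize is checked: whether "size" counts line breaks
        # is not settled by the proposal.
        rawsize = node.get("rawsize")
        if rawsize is not None and int(rawsize) != len(raw):
            raise ValueError(f"{name}: rawsize does not match the decoded data")
        files[name] = raw
    return files


doc = ET.fromstring(
    '<USFSubtitles><embedded>'
    '<file filename="font.ttf" rawsize="4">AAEC\n  Aw==</file>'
    '</embedded></USFSubtitles>'
)
print(read_embedded(doc))  # {'font.ttf': b'\x00\x01\x02\x03'}
```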

Furthermore, I would like to see effects and shapes defined at some point, and a decision on whether or not to merge karaoke and text nodes. If no one is working or going to work on these, I might work out a real (valid) "DTD proposal" for the next specifications draft once I have completed and uploaded the new u96 version (this week or next?).


>unmei, is the version here the most recent:
>http://www.hta-bi.bfh.ch/~seilf/ ?? Do you have the email address of
>Kovac Endre?

Sorry, I don't have Endre's mail address, and I haven't been in contact with him for a long time. (I didn't even know about his SSA support, which, by the way, I can provide as well. ASS not yet, but basic ASS could be implemented in a short time once I start on it.)

This is by far NOT the most recent version (heh, you might have guessed from the date!): it's from July or August, right before I moved the project to CoreCodec. The most recent version should now always be on


Still, that version is a month old at the moment; I will upload a new build soon, through my school if I have to wait for Internet much longer.

The next release already has the following new features over the current release (September, build 477):

-karaoke editor
-improved selection handling (set/copy/erase/move/attribute change/from OGM (text) and Matroska (XML) chapters)
-image sequence generator
-undo (restore points)
-all the other things i forgot

-maybe more, if I can't make up my mind to release and keep adding new stuff :p

unmei, u96 USF editor programmer
