[matroska-devel] File ID / Track ID

ChristianHJW christian.hj.wiesner at web.de
Mon Jan 20 11:45:09 CET 2003

Hi all,

I was talking to Steve about this already and thought I should drop an email to the list as well.

If any of you have done p2p networking (I sometimes do :D), you will certainly agree there is nothing more annoying than those morons spreading fakes around. They probably think they are very cool or simply love to fool others, I don't know. In any case it's a major waste of bandwidth, and the matroska file ID (a 128-bit MD5 of the block headers) could be used to fight this, once p2p app developers start to support matroska by reading the file headers and validating the info given in them.

Now, as we all know, it happens pretty often that more than one version of a file is floating around on the internet: people mux different subtitles with it, remove subtitles, or will add covers and lyrics once matroska makes that possible. In any case, those mods would result in a new file ID (correct?), while the basic content (= the main video stream) would be exactly the same.

A matroska ID would be most useful if there were a central, independent database server hosting the IDs of all VALID (= no fakes) files, so that a user can compare the ID indicated by the p2p app with the one given in the database for the specific movie he wants. If the 2 IDs match he can download the file with (almost) no risk, unless the person faking the file did more than just rename it and also edited the ID using spyder's or Pamel's XML/EBML tools. While this is of course entirely possible, the check will certainly help to reduce the number of fakes floating around, as today the people faking files simply have to rename them (every fool can do that).
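
The lookup described above could be sketched roughly as follows. This is only an illustration: the in-memory dictionary stands in for the central database, and the names VALID_IDS and is_listed as well as the example title and ID are made up for this sketch.

```python
from typing import Dict, Set

# Hypothetical stand-in for the central database:
# movie title -> set of file IDs known to belong to valid copies.
VALID_IDS: Dict[str, Set[str]] = {
    "Example Movie": {"0123456789abcdef0123456789abcdef"},
}


def is_listed(title: str, advertised_id: str,
              db: Dict[str, Set[str]] = VALID_IDS) -> bool:
    """Return True if the ID shown by the p2p app is listed for this title."""
    # Normalise the hex string so that case and stray whitespace do not matter.
    return advertised_id.strip().lower() in db.get(title, set())
```

A match only tells the user that the advertised ID is listed for that movie. As noted above, somebody who edits the ID inside the file can still get past this check.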

Now, the big problem I see with that is that pretty quickly there will be a huge number of valid IDs for a certain movie, given the number of mods that are done based on a certain video stream (different languages, subtitles, etc.). This brought me to the idea of introducing another ID, calculated specifically for the first video stream in each file, thus allowing us to identify the video stream itself AND NOT the file.

This ID should be calculated on the frames themselves, and not on the block headers, but of course only for a certain number of frames (10, 20, whatever). To make sure the trailer at the beginning of the file (the same one may be used for many movies) is not used for this ID, the MD5 of the 10 COMPLETE blocks following block number 200 (i.e. blocks 201, 202, 203, ... 210) should be calculated and stored in the track header as a unique identifier for this particular video stream. If the file is shorter than 200 blocks (= frames) then simply the last 10 blocks can be used.
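
As a rough sketch of the calculation described above (not an agreed format), assume the blocks of the first video track are already available as a list of byte strings. The function name track_id and the constants are invented for this example. The handling of streams with between 200 and 210 blocks is not defined above, so the sketch assumes they also fall back to the last 10 blocks.

```python
import hashlib
from typing import Sequence

SKIP_BLOCKS = 200    # blocks skipped so a shared leading trailer is ignored
SAMPLE_BLOCKS = 10   # number of blocks that go into the MD5


def track_id(blocks: Sequence[bytes],
             skip: int = SKIP_BLOCKS,
             count: int = SAMPLE_BLOCKS) -> str:
    """Return the hex MD5 over blocks skip+1 .. skip+count (1-based)."""
    if len(blocks) >= skip + count:
        # Blocks 201..210 in 1-based counting are indices 200..209.
        sample = blocks[skip:skip + count]
    else:
        # Assumption: any stream too short for the window uses its last blocks.
        sample = blocks[-count:]
    md5 = hashlib.md5()
    for payload in sample:
        md5.update(payload)
    return md5.hexdigest()
```

With 300 dummy blocks, track_id(blocks) equals the MD5 of blocks[200:210] joined together, and it stays the same when subtitles or other tracks are added, since only the video blocks are hashed.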

Of course, it is absolutely clear that with this scheme a VALID video stream (no fake) will get a new ID when a user decides to cut away the first 100 frames or so. This doesn't hurt much; the only problem would be that a new ID is created for a certain movie, so the central database would have to be updated accordingly. If this is not done, for whatever reason, the user who decided to cut the first 100 frames off could find himself in a situation where nobody wants to download his movie, because the IDs don't match the central database. That's all, and it will hopefully also lead to fewer variations of the same movie floating around.

On a side note:

While it would be impossible for any p2p application (even if the devs love matroska) to verify whether a matroska file ID was faked by a user with an EBML editing tool (you can only generate/verify this ID if you have access to all the block headers in the file, which means you need the whole file), this would be very well possible for the track ID I am suggesting above!! The p2p app could scan the file when hashing it, and if this hash is new (don't ask me what exactly they are hashing, I guess only filename, size, etc.) it could decide to read the track ID, load the 10 blocks after block #200 and verify the ID ;-) !!
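
The check a p2p app might run could look something like this sketch. It assumes the app has already read the stored track ID from the track header and has the video blocks as byte strings. The names sample_md5 and track_id_matches are invented here, and the fallback for short streams is the same assumption as in the proposal (last 10 blocks).

```python
import hashlib
from typing import Sequence


def sample_md5(blocks: Sequence[bytes], skip: int = 200, count: int = 10) -> str:
    """Hex MD5 over the `count` blocks after block number `skip`."""
    if len(blocks) >= skip + count:
        sample = blocks[skip:skip + count]
    else:
        # Assumption: streams too short for the window use their last blocks.
        sample = blocks[-count:]
    return hashlib.md5(b"".join(sample)).hexdigest()


def track_id_matches(stored_id: str, blocks: Sequence[bytes]) -> bool:
    """Recompute the ID from the blocks and compare it with the stored one."""
    return stored_id.strip().lower() == sample_md5(blocks)
```

A real client would only need to fetch those 10 blocks instead of holding the whole stream in memory; the full list is used here just to keep the example short. Note that the check only covers the sampled blocks, so changes elsewhere in the stream go unnoticed.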

Any comments on the track ID idea are much appreciated.


