[Matroska-devel] Re: Dirac Video Codec

Steve Lhomme steve.lhomme at free.fr
Tue Oct 5 13:14:31 CEST 2004

Arioch /BDV/ wrote:

>  PB> Matroska.  Besides localizing errors, Matroska also has the possibility
>  PB> of adding CRC32 data at different levels. 
> Alas this is almost undocumented in EBML docs :-((((

EBML docs ? Where ? ;)
Actually the CRC-32 is a feature of EBML, not just Matroska.


It may not be crystal clear, but it's there. To understand EBML you also 
need to understand what global elements are, like Void and CRC-32.
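
To make the CRC-32 global element a bit more concrete, here is a minimal Python sketch. It assumes, as I read the EBML spec, that the CRC-32 element has ID 0xBF, carries 4 bytes stored little-endian, sits first in its parent, and covers the rest of the parent's data. The helper names `add_crc32` and `check_crc32` are made up for this example and are not libebml functions.

```python
import struct
import zlib

CRC32_ID = 0xBF  # EBML CRC-32 element ID (assumption taken from the EBML spec)

def add_crc32(payload: bytes) -> bytes:
    """Prefix a master element's payload with a CRC-32 child element."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    # ID byte, then size byte 0x84 (a 1-byte EBML size meaning 4),
    # then the CRC value stored little-endian.
    return bytes([CRC32_ID, 0x84]) + struct.pack("<I", crc) + payload

def check_crc32(data: bytes) -> bool:
    """Return True if data starts with a CRC-32 element matching the rest."""
    if len(data) < 6 or data[0] != CRC32_ID or data[1] != 0x84:
        return False
    (stored,) = struct.unpack("<I", data[2:6])
    return stored == (zlib.crc32(data[6:]) & 0xFFFFFFFF)
```

A reader that finds such an element at the start of a Cluster (or any other master element) can tell the content was damaged even when every child still parses cleanly.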

>  PB>  I have only seen it done at the Cluster level, but this lets you know 
>  PB> if a Cluster has been damaged, even if all of the data appears to be 
>  PB> valid. 
> Just a crazy idea:
> I'm afraid that calculating MD5 or CRC (or maybe some stronger signature, like SHA) would take a lot of CPU.
> I wonder whether there could be some kind of spot check: say, what gets signed would be a small `sub-stream` made of every 16th byte of the Cluster.

To "sign" only some parts of the element ? Well that could be. Even 
though right now there wouldn't be much use for that. Especially because 
it can be done by a global element.
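
The "every 16th byte" idea can be sketched in a few lines of Python. This is only an illustration of the proposal, nothing that exists in EBML or Matroska; the function name `sampled_crc32` and the default stride are assumptions for the example.

```python
import zlib

def sampled_crc32(data: bytes, stride: int = 16) -> int:
    """CRC-32 over every `stride`-th byte of data (a cheap spot check)."""
    # data[::stride] picks offsets 0, stride, 2*stride, ...
    return zlib.crc32(data[::stride]) & 0xFFFFFFFF
```

The trade-off is easy to see: the checksum touches about 1/16 of the bytes, but damage confined to the bytes that are not sampled goes unnoticed.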

Parts of a Matroska file you'd like to protect are probably the Segment 
Info and Track Entry. And you would probably add error recovery for the 
whole thing, not just a part of it.

>  PB>  And while it has not been implemented yet, there is also the 
>  PB> possibility of adding error recovery data in the same way to be able 
>  PB> to actually recover damaged data.
> I think such 'brackets' should be more standardised in EBML.

That's an interesting idea.

That could be a special EBML element like Void and CRC-32 that could be 
found anywhere in an EBML stream (global element). And that would also 
have a special type that would be almost like "sub-elements" but it 
would not change the level of the lower element. There would be a real 
level and a virtual level (the actual level in the file). This way you 
can put it anywhere and any EBML parser would give you the correct level 
for your data, even though physically they are one level below.
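
A rough Python sketch of the two-level idea, to show what a parser would report. Everything here is hypothetical: the `Element` class, the `transparent` flag and the "Protected" wrapper name are invented for the example, and nothing like this is in libebml today.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)
    transparent: bool = False  # hypothetical "bracket" wrapper flag

def walk(elem: Element, physical: int = 0,
         logical: int = 0) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, physical level, logical level) for each element."""
    yield elem.name, physical, logical
    # A transparent wrapper adds a physical level but not a logical one,
    # so its children are reported as if they sat directly in its parent.
    step = 0 if elem.transparent else 1
    for child in elem.children:
        yield from walk(child, physical + 1, logical + step)
```

With a Segment holding a transparent "Protected" wrapper around Info, Info sits at physical level 2 in the file but is reported at logical level 1, the same as a Cluster placed directly in the Segment.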

That could be added to libebml without too many problems.

> For example, here is one more possible application:
> Let's imagine EBML-based archiver format, instead of current 7z, zip, rar.

Actually I have thought about an EBML system that would be similar to 
.tar but called .kar. It would simply be the Matroska attachment system, 
maybe with some permissions added (could be backported to Matroska). But 
that format doesn't include compression.

> Then we can imagine passworded encryption of some of the data.
> And it will give flexibility (maybe too much, indeed), for example:
>   ....[CRC slot  (file header - name, date, size etc) [password encryption (file data) ] ]...
> or ...[CRC slot [password encryption slot  (header)(data) ] ]....
> or even ...[password1 encryption (header)(data).. [password2 (header)(data)] (h)(d) ]....
> The 1st and 2nd lines give you the 2 modes of password-protected archives I have met in different archivers.
> Some programs can show file info but will not decompress until the password is given.
> Others do not even allow you to list the password-protected files.

That's up to the application, not the format.

> The 3rd line is a crazy method that might be used by some installers, but would be too complex for the usual user.
> It is more theory than a real-life practical example.

Theory is usually the starting point of new things :)

> I just thought that a good EBML library would have some callbacks or gates to the master application, so the application would provide a decoder (here, a password-using decrypter) and thus generate a derived EBML sub-stream.
> Of course, nothing stops me from doing RAM streams by means of the application itself, and then giving this sub-stream to another instance of the EBML parser, but that is not so cute :-)
> And that will not use any context gathered while parsing the main EBML stream. For example, if some tag (class ID in EBML terms?) has a different meaning when put inside different outer container classes.

Yes, we actually need something like that in Matroska too. I would 
really like to have a good DRM solution in Matroska. And that implies 
encrypting some parts of the stream. And since we don't want to define 
and cover every encryption system, it would be up to 3rd-party code to 
decrypt the content, with the information given in the file...

That's more or less what is done in the (as yet unused) Signature system 
you can see in the EBML specs.
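
The callback approach could look roughly like the Python sketch below. It uses a toy format (1-byte ID, 1-byte length, then payload), not real EBML, and the names `parse`, `Decoder` and `xor_decoder` are invented for the example. The XOR "cipher" is only a stand-in for whatever decryption the 3rd-party code would supply.

```python
from typing import Callable, Dict, Iterator, Tuple

Decoder = Callable[[bytes], bytes]
Path = Tuple[int, ...]

def parse(stream: bytes, decoders: Dict[int, Decoder],
          path: Path = ()) -> Iterator[Tuple[Path, bytes]]:
    """Yield (path of IDs, payload) for each leaf in a toy TLV stream.

    IDs listed in `decoders` are containers: their payload is handed to
    the application's callback and the result is parsed as a sub-stream.
    """
    pos = 0
    while pos + 2 <= len(stream):
        elem_id, size = stream[pos], stream[pos + 1]
        payload = stream[pos + 2 : pos + 2 + size]
        pos += 2 + size
        if elem_id in decoders:
            # The application decodes; the same parser keeps its context.
            inner = decoders[elem_id](payload)
            yield from parse(inner, decoders, path + (elem_id,))
        else:
            yield path + (elem_id,), payload

def xor_decoder(key: int) -> Decoder:
    """Placeholder 'decrypter': XOR every byte with key (not real crypto)."""
    return lambda data: bytes(b ^ key for b in data)
```

Because the nested elements are yielded with the full path of outer IDs, the same ID can be given a different meaning depending on which container it was found in, which covers the context concern raised above.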

> Sorry for my poor English, these are just some thoughts into the air, so I do not know if they are worth reading, such a misty attempt to explain :-)

No worries, it's interesting and understandable.
