[Matroska-devel] Re: Encryption

Joseph Ashwood ashwood at msn.com
Tue Dec 20 01:55:56 CET 2005

I should point out first that I am dropping the nym ivanburnin. At this
point the design will be easily identifiable by anyone who has seen my
previous work, and all the questionable stuff is gone from the plan.

"Steve Lhomme" <steve.lhomme at free.fr> wrote in message 
news:43A5A3B7.8090803 at free.fr...
> ivanburnin at hushmail.com wrote:
>> I've got some spare cycles now so I'm looking into the encryption.
>> I've got a few questions though. A link was posted before without
>> much information, is there a document with deeper information? Has
>> anyone ever implemented encryption for Matroska files? (if so
>> everything should be compatible) And lastly, would anyone mind if I
>> changed things radically?
> Jory has a DirectShow encryption filter, I'm not sure if it uses the
> built-in Matroska one.

I'll take a look at it, because honestly I couldn't find enough 
specification to actually make a final product, so I was asking for any 
complete implementations to see if anyone had actually done it yet.

>> At this point I'm considering something along the lines of an
>> encryption header that contains
>> {integer identifier, UTF-8 String name, blob of bits to pass to the
>> initializer} this would occur once per encrypter per file and only
>> for the encryptor(s) that are used in that file, the name would be
>> the name of the method (used for universal identification),
>> identifier would be an in file identification number to allow for
>> multiple encryptors per file.
> I don't see a radical change with what we have now.

There shouldn't be any radical change there, it was just to establish a 
dynamic naming system so that encryption/DRM methods don't have to be 
registered, and the core specification can safely ignore them.

>> Then each encrypted segment (for some paring specific definition of
>> segment that does not necessarily have anything to do with segment
>> in the video sense) is composed of
>> {integer identifier, blob of bits to pass to the decryptor}
> Where would you store this ? With each frame/block ? Or in the
> encryption private data of the track ?

I was thinking that this should actually be a wrapper around a block. Since
it will only add a few bytes the overhead is minimal, and it will prevent a
lot of mistakes that are very easy to make and very subtle. In fact I had
hoped to make this a generic wrapper, so that each level and each track can
be encrypted separately if desired. The reason for this is that in some
instances the various taggings/tracks/etc. will leak information, and the
encryption should be able to prevent this.

>> I would also propose that 0x00000000 be defined as the null
>> encryptor, allowing it to be used for the sake of sanity when
>> writing certain processors. It is also important to note that the
>> identifier for a given encryptor/decryptor can change across files
>> and that the per file header is the only dependable source for this
>> information. A smaller sized integer would be acceptable, I simply
>> chose 32-bits because it is commonly and quickly available.
> That's definitely something to store in one of the binary fields of
> ContentEncryption.

The binary field identifier (I see no reason it needs to be an int, actually)
and the blob of bits are the only fields the encryption needs; dropping all
the others will save space in the worst case.

>> This moves an enormous quantity of the cryptography decision out of
>> the core Matroska document, creating small supplemental documents
>> for each type. I would of course be writing a number of these in
>> order to build a baseline that people could depend on.
> Yes, that's how codec works (track or chapters). The same can be done
> for encryption with many different ones just defined by their
> ContentEncAlgo.

I was hoping to avoid this. Forcing all encryptors to register will only
slow down encryption designers; this is exactly what the dynamic naming was
designed to get rid of. Instead of what amounts to an authorized list, the
list would self-create and self-manage, and the Matroska group could largely
ignore its presence.

>> This format allows me to work very quickly, for other competing
>> designers to work quickly, and security. There are a lot of
>> implementation details that I haven't covered, for example there
>> should be a version number embedded in there, and a definition for
>> the signature that is mentioned briefly in the current spec, I
>> haven't addressed either of these. Actually binary format is
>> relatively unimportant from a security standpoint, and can be
>> addressed by someone more familiar with the inner workings than I
>> am.
> We can add an encryption page the same way there is a codec page and a
> chapter codec page.

Sounds good. If the design seems reasonable, what I'd like to see is the
addition of an EncryptionInformation or EncryptionHeader (we could also call
it TransformInformation or TransformHeader to allow for compressors as well),
which would represent its own grouping section (looking at the diagram
page) alongside Meta Seek Information, Segment Information, Track, etc. This
would contain the dynamic naming that I began with. Then creating a
Transformed/Encrypted chunk tag would force processing of the encryption
before the contained information could be read.

So I guess the proposed format would be (following the format layout in the 
spec doc given for SimpleBlock):

0x00-0x1F    Identifier for Handler
0x20-0x3F    Contained Type Identifier
0x40-?       Blob of bits to be handled by Handler

Contained Type is a convenience field; I can see no information that it would
leak. Either the post-transform data will be parsable exactly as the original
data (e.g. bit-for-bit identical)*, or the Handler will return an error code,
never both. In the case of nested encryptions, Type Identifier contains the
Type Identifier of the encryption transform. It is the Handler's
responsibility to notify the user of the nature of the error if any
notification is necessary.
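
As a rough illustration of the wrapper layout above, here is a minimal
Python sketch. It assumes the offsets are bit positions (so both identifiers
are 32 bits wide, matching the 32-bit identifier suggested earlier) and that
the integers are unsigned big-endian; neither is settled by this proposal.
The names TransformedChunk, parse_transformed_chunk and
build_transformed_chunk are made up for the example.

```python
import struct
from typing import NamedTuple


class TransformedChunk(NamedTuple):
    handler_id: int      # bits 0x00-0x1F: in-file identifier of the Handler
    contained_type: int  # bits 0x20-0x3F: type of the element wrapped inside
    blob: bytes          # bits 0x40 onward: opaque data for the Handler


def build_transformed_chunk(handler_id: int, contained_type: int,
                            blob: bytes) -> bytes:
    # Assumption: both identifiers are unsigned 32-bit, big-endian.
    return struct.pack(">II", handler_id, contained_type) + blob


def parse_transformed_chunk(payload: bytes) -> TransformedChunk:
    if len(payload) < 8:
        raise ValueError("chunk too short for the two 32-bit identifiers")
    handler_id, contained_type = struct.unpack_from(">II", payload, 0)
    # Everything after the first 8 bytes belongs to the Handler untouched.
    return TransformedChunk(handler_id, contained_type, payload[8:])


# 0xA3 is only an arbitrary example value for the contained type.
chunk = build_transformed_chunk(1, 0xA3, b"ciphertext")
parsed = parse_transformed_chunk(chunk)
```

The parser never looks inside the blob, which is the point of the design:
only the Handler named by handler_id knows what the bits mean.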

For the Encryption Header:

0x00-0x1F    Identifier
0x20-?       Dynamically sized UTF-8 string Handler Identifier
?-??         Blob of bits passed to the Handler's Initializer

I'm open to changing the format to one using tags for each field, which
would probably be preferable, but as I've said before, the format doesn't
really matter that much to security here. Handler Identifier is case
sensitive, except for the Null Encryptor discussed later.
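
To make the header layout concrete, here is a small Python sketch. The
layout above does not say how the dynamically sized string is delimited, so
the sketch assumes a 16-bit big-endian byte-length prefix in front of the
UTF-8 name; that prefix, the big-endian 32-bit Identifier, and the function
names are all assumptions for illustration, not part of the proposal.

```python
import struct
from typing import NamedTuple


class EncryptionHeader(NamedTuple):
    identifier: int    # in-file number used by the wrapped chunks
    handler_name: str  # universal, case-sensitive name of the method
    init_blob: bytes   # opaque data handed to the Handler's Initializer


def build_encryption_header(identifier: int, handler_name: str,
                            init_blob: bytes) -> bytes:
    name = handler_name.encode("utf-8")
    # Assumption: a 16-bit length prefix delimits the UTF-8 name.
    return struct.pack(">IH", identifier, len(name)) + name + init_blob


def parse_encryption_header(payload: bytes) -> EncryptionHeader:
    if len(payload) < 6:
        raise ValueError("header too short")
    identifier, name_len = struct.unpack_from(">IH", payload, 0)
    name_end = 6 + name_len
    if name_end > len(payload):
        raise ValueError("handler name runs past the end of the header")
    name = payload[6:name_end].decode("utf-8")
    # The rest is the initializer blob; it is never interpreted here.
    return EncryptionHeader(identifier, name, payload[name_end:])


raw = build_encryption_header(1, "ExampleCipher", b"\x01\x02\x03")
header = parse_encryption_header(raw)
```

A tagged layout, one element per field, would remove the need for the
length prefix altogether.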

The Initializer only needs to return success or failure. It is the
Initializer's responsibility to notify the user of the nature of the error if
any notification is to take place.

Duplicate handler identifier strings MUST be allowed. Each track may have
its own identifier, for finer grained access control, even if they use the
same handler.

In the case of an Initializer error, playback should be attempted with any
available information; specifically, if a track fails to be decodable, other
tracks should be used.
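
The initialization rules above could be sketched in Python roughly as
follows. This is only an illustration: the initializers mapping (handler
name to a function returning success or failure), the track-to-identifier
mapping, and the choice to treat an unknown handler name as an Initializer
failure are all assumptions of the example.

```python
from typing import Callable, Dict, List, Tuple

Initializer = Callable[[bytes], bool]  # returns success or failure only
HeaderEntry = Tuple[int, str, bytes]   # (identifier, handler name, init blob)


def initialize_handlers(
    headers: List[HeaderEntry],
    initializers: Dict[str, Initializer],
) -> Tuple[Dict[int, str], List[int]]:
    ready: Dict[int, str] = {}
    failed: List[int] = []
    for identifier, name, init_blob in headers:
        init = initializers.get(name)  # lookup is case sensitive
        # Assumption: a handler we do not know counts as a failed Initializer.
        if init is not None and init(init_blob):
            ready[identifier] = name
        else:
            failed.append(identifier)
    return ready, failed


def playable_tracks(track_handlers: Dict[int, int],
                    ready: Dict[int, str]) -> List[int]:
    # Keep every track whose handler initialized; skip the others.
    return [track for track, ident in track_handlers.items() if ident in ready]


# Two identifiers share one handler name but carry different blobs.
headers = [(1, "ExampleCipher", b"key-a"),
           (2, "ExampleCipher", b"bad"),
           (3, "UnknownCipher", b"")]
initializers = {"ExampleCipher": lambda blob: blob != b"bad"}
ready, failed = initialize_handlers(headers, initializers)
tracks = playable_tracks({10: 1, 11: 2, 12: 3}, ready)
```

Here identifiers 1 and 2 name the same handler, which is what allows the
per-track access control: one track can fail while the other still plays.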

0x00000000 is defined as the null encryption transform, the specific 
transform from the blob of bits is:

output[k] = blob[k]

for all valid k, where blob is the blob of bits from the EncryptedData and
output is the output to be generated. Any attempt to use a duplicate
identifier (e.g. assigning 0x00000000 to anything other than "NullTransform"
or "NullEncryption", case insensitive) is a parsing error. Recovery is up to
the application. A suggested process is to attempt decryption of a portion
encrypted with the identifier using each candidate and check for errors; the
first one not to return an error is used. (Note: this will not work in all
cases. It is possible that a given transform will only decrypt something
once, so the test output should be kept decrypted in memory until it is
used.)
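
A minimal Python sketch of the null transform, the reserved-identifier
check, and the suggested recovery by trial decryption. The function names
are made up, and errors are modelled as Python exceptions rather than error
codes; both are assumptions of the example.

```python
from typing import Callable, List, Optional, Tuple

NULL_ID = 0x00000000
NULL_NAMES = {"nulltransform", "nullencryption"}  # compared case-insensitively


def null_transform(blob: bytes) -> bytes:
    # output[k] = blob[k] for all valid k
    return bytes(blob)


def check_identifier_binding(identifier: int, handler_name: str) -> None:
    # Identifier 0 is reserved: binding it to any other name is a parsing error.
    if identifier == NULL_ID and handler_name.lower() not in NULL_NAMES:
        raise ValueError("0x00000000 is reserved for the null transform")


def pick_handler_by_trial(
    candidates: List[Tuple[str, Callable[[bytes], bytes]]],
    sample_blob: bytes,
) -> Optional[Tuple[str, bytes]]:
    for name, decrypt in candidates:
        try:
            output = decrypt(sample_blob)
        except Exception:
            continue  # this candidate reported an error, try the next one
        # Keep the output: a transform may only decrypt something once.
        return name, output
    return None


def always_fails(blob: bytes) -> bytes:
    raise ValueError("cannot decrypt")


choice = pick_handler_by_trial(
    [("ExampleCipher", always_fails), ("NullTransform", null_transform)],
    b"abc",
)
```

Returning the decrypted sample together with the chosen name means the
caller never has to ask the handler to decrypt the same portion twice.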

I know a lot of this information goes beyond the formatting that is typical
for Matroska, and in fact much of it can be moved to separate locations, but
there are a huge number of considerations.

Any comments are welcome.

* A short note on why this is written so convolutedly. I was going to say 
"bit-for-bit" identical to the original, but then I realized that with 
encryption algorithms like Chameleon available, this will not necessarily be 
the case, and the only assurance is that it MUST be parsable exactly as if 
it was the data before encryption. 
