[Matroska-devel] Representation of payload for SeekHead entries

Steve Lhomme slhomme at matroska.org
Sat Nov 19 14:17:15 CET 2011


OK, here I am. I haven't read the whole thread but I think I
understand where the confusion is about the Seek ID.

The Seek ID is a binary element, so its content has nothing to do with
how you would interpret an integer, even though with some stretching
you can say it represents one.

But in Matroska, and in EBML in general, Element IDs are not exactly
integers. They are 1, 2, 3 or 4 bytes long. Unlike integers, you can't
use a 4-byte encoding to represent a 2-byte ID. I think most parsers
would not be able to cope with it.
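
To illustrate, here is a minimal Python sketch of reading an Element ID.
It is not taken from libebml or any other parser, and the name
read_element_id is made up for this example. The position of the first
set bit in the leading byte fixes the length, and that marker bit stays
part of the ID.

```python
def read_element_id(data: bytes, pos: int = 0, max_id_length: int = 4) -> bytes:
    """Return the raw bytes of the Element ID starting at data[pos].

    Hypothetical helper for illustration only. The length marker bit is
    kept, so the result is the ID exactly as it appears in the stream.
    """
    first = data[pos]
    length = 1
    mask = 0x80
    # Each leading zero bit in the first byte adds one byte to the length.
    while length <= max_id_length and not (first & mask):
        length += 1
        mask >>= 1
    if length > max_id_length:
        raise ValueError("Element ID longer than max_id_length")
    if pos + length > len(data):
        raise ValueError("truncated Element ID")
    return data[pos:pos + length]
```

With this, the single byte 0x81 comes back as b"\x81" and the pair
0x40 0x01 comes back as b"\x40\x01": two different byte strings, so a
reader that keeps the raw form sees two different IDs.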

So in the end the Seek ID is stored in the exact same binary form in
which the ID appears in the file/stream. That makes it a lot easier to
compare.
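
As a rough sketch of what "easier to compare" means in practice, the
SeekID payload can be matched against known IDs with a plain byte
comparison, with no integer decoding at all. The table and the function
name seek_target_name below are assumptions made for this example; the
four ID values are the ones listed in the Matroska specs.

```python
# Element IDs in the exact byte form they have in the stream.
KNOWN_IDS = {
    bytes.fromhex("1549A966"): "Info",
    bytes.fromhex("1654AE6B"): "Tracks",
    bytes.fromhex("1F43B675"): "Cluster",
    bytes.fromhex("1C53BB6B"): "Cues",
}


def seek_target_name(seek_id_payload: bytes) -> str:
    """Map a SeekID payload to an element name by raw byte comparison.

    Hypothetical helper for illustration only. A payload that pads the
    ID to a longer encoding does not match any entry.
    """
    return KNOWN_IDS.get(seek_id_payload, "unknown")
```

A payload of 1F 43 B6 75 maps to "Cluster", while the padded 8-byte form
01 00 00 00 0F 43 B6 75 falls through to "unknown" in this sketch.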

On Fri, Nov 18, 2011 at 9:58 PM, Matthew Heaney
<matthewjheaney at gmail.com> wrote:
> On Fri, Nov 18, 2011 at 3:39 PM, Moritz Bunkus <moritz at bunkus.org> wrote:
>>
>> The only thing that still comes to mind is that reading an element ID
>> and an element size is not the same, nor is the result the same from
>> the application's perspective (the "application" being the layer above
>> EBML, in this case Matroska). If an EBML parser reads a single byte
>> "0x81" as an element ID then it has to pass "0x81" to the layer above.
>> If it reads that same single byte "0x81" as the element's size then it
>> only passes "0x01" to the layer above.
>
> Well in my case, it passes 0x01 to the layer above.
>
>
>> a) The byte sequence "0x40 01" represents a different EBML ID than the
>> byte sequence "0x81" does or
>
>> b) An EBML parser has to normalize element IDs to their shortest
>> possible representation before passing them upstream, in which case
>> "0x40 01" and "0x81" would be the same ID.
>
> This (b) is my assumption.  (The WebM parser passes 0x01 upstream.)
>
>
>> If the WebM parser already normalizes upon reading then I'd say just
>> leave it like it is. Accept as many weird cases as possible but only
>> write the byte sequences explicitly listed in the specs.
>
> Agreed.
>
>
>>> Can the value for a Cluster ID be represented in the stream using
>>> more than 4 bytes?  Forget about what the Matroska spec says.  Is it
>>> valid, for example, for a Cluster ID to be represented as "0x01 00 00
>>> 00 0F 43 B6 75", if the EBML header says that element IDs are 8 bytes
>>> or less?
>>
>> Valid to what? Either I should forget about the specs in which case I
>> don't have any basis to decide whether or not something is valid or I
>> can say it is valid (or not) according to the specs ;) Just
>> nitpicking.
>
> There is no such thing as "according to the specs".  Specs don't exist
> in some Platonic realm: they are written and interpreted by humans,
> and so there can be ambiguity in their meaning and interpretation.
>
> My argument (perhaps incorrect) is that the values listed in the spec
> itself are non-normalized, and that in an actual file, an ID having
> any representation consistent with the max length value in the EBML
> header is valid.  IMHO it would be dangerous for a parser to make any
> other assumption, but that's just me.  8^)
>
> Thanks for the info.
>
> Regards,
> Matt
>
> <mailto:matthewjheaney at google.com>



-- 
Steve Lhomme
Matroska association Chairman


