[Matroska-devel] Representation of payload for SeekHead entries

Moritz Bunkus moritz at bunkus.org
Fri Nov 18 09:37:13 CET 2011


Hey,

On Thu, Nov 17, 2011 at 19:37, Matthew Heaney <matthewjheaney at gmail.com> wrote:

> I had a question about the representation of the payload for SeekHead
> entry items.
>
> The payload for a SeekHead entry has an ID and a position, each
> wrapped in their own little container.  It looks something like this:

Each seekhead is a container (an EBML master), but the SeekID and the
position are simple elements, not themselves containers/masters.
Though I guess you're using the term "container" in a different way
than I would in this context: for me a container in the Matroska/EBML
context is an element that contains more EBML/Matroska elements, i.e.
an EBML master.

Each EBML element consists of the holy trinity: ID, content size,
content. ID and content size can be read as EBML variable-length
unsigned integers, that is true. How the content is read and
interpreted depends on the ID. That much you probably know.
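
To make the ID / content size / content layout concrete, here is a
minimal Python sketch of reading one element. The function names
(read_vint, read_element) are made up for this example and are not
from libebml or any other library. It assumes lengths of at most 8
bytes and does not handle the "unknown size" marker. The ID keeps its
length marker bits, the size has them stripped.

```python
import io

def read_vint(stream, keep_marker):
    """Read one EBML variable-length integer; return (value, length in bytes)."""
    first = stream.read(1)
    if not first:
        raise EOFError("no more data")
    b0 = first[0]
    if b0 == 0:
        raise ValueError("lengths above 8 bytes are not handled in this sketch")
    # The number of leading zero bits in the first byte gives the total length.
    length = 1
    mask = 0x80
    while not (b0 & mask):
        mask >>= 1
        length += 1
    # IDs are kept with the marker bit, sizes have it masked out.
    value = b0 if keep_marker else (b0 & (mask - 1))
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated variable-length integer")
    for byte in rest:
        value = (value << 8) | byte
    return value, length

def read_element(stream):
    """Return (id, content size, raw content bytes) of the next element."""
    element_id, _ = read_vint(stream, keep_marker=True)
    size, _ = read_vint(stream, keep_marker=False)
    content = stream.read(size)
    return element_id, size, content

# A SeekPosition element (ID 0x53AC) with a 2-byte payload.
sample = io.BytesIO(bytes.fromhex("53AC821000"))
element_id, size, content = read_element(sample)
```

What to do with the content bytes afterwards depends on the element
type that belongs to the ID.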

For the SeekID element the type is a binary, hence you do a dumb
"read(buffer, element_size)" call on the file and slurp in a number of
bytes. If the buffer contains e.g. the bytes "0x1F 43 B6 75" then you
know that this seek head refers to a cluster element.
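
As a small sketch of that lookup in Python: the SeekID payload is
compared byte for byte against known element IDs. The table and the
function name describe_seek_id are only for illustration, and the
table lists just a few of the level 1 IDs from the specs.

```python
# A few level 1 element IDs, stored exactly as they appear in the file.
KNOWN_IDS = {
    bytes.fromhex("1F43B675"): "Cluster",
    bytes.fromhex("1549A966"): "Info",
    bytes.fromhex("1654AE6B"): "Tracks",
    bytes.fromhex("1C53BB6B"): "Cues",
}

def describe_seek_id(payload):
    """Map the raw SeekID payload to an element name, or "unknown"."""
    return KNOWN_IDS.get(bytes(payload), "unknown")

name = describe_seek_id(bytes([0x1F, 0x43, 0xB6, 0x75]))
```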

> The spec says that the SeekPos has type "unsigned int".  Does this
> also mean that this is a normal, 2's complement integer, in network
> byte order?

The specs say ( http://www.matroska.org/technical/specs/index.html ):

> Data

>> Integers are stored in their standard big-endian form (no UTF-like encoding), only the size may differ from their usual form (24 or 40 bits for example).

>> The Signed Integer is just the big-endian representation trimmed from some 0x00 and 0xFF where they are not meaningful (sign). For example -2 can be coded as 0xFFFFFFFFFFFFFE or 0xFFFE or 0xFE and 5 can be coded 0x000000000005 or 0x0005 or 0x05.

For an unsigned integer this translates into "in big endian notation,
omit the leading 0x00 bytes, and store the rest". When reading, treat
the bytes as a big endian integer, e.g.

val = 0
"size" times:
  val = (val << 8) | read_byte

Note that the value you get by reading this is relative to the start
of the current segment.
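
The same loop as runnable Python, plus the step from the relative
value to a file offset. The names are hypothetical, and
segment_data_start is assumed to be the file offset of the first byte
after the Segment element's ID and size fields, which is how I read
"relative to the start of the segment" in the specs.

```python
def read_uint(payload):
    """Interpret the element's content bytes as a big-endian unsigned integer."""
    val = 0
    for byte in payload:
        val = (val << 8) | byte
    return val

def absolute_position(segment_data_start, seek_position_payload):
    """Turn a SeekPosition payload into an absolute file offset."""
    return segment_data_start + read_uint(seek_position_payload)

# A 2-byte payload 0x10 0x00 is the value 4096.
relative = read_uint(bytes([0x10, 0x00]))
offset = absolute_position(48, bytes([0x10, 0x00]))
```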

> Does it have the same representation as the SeekID
> payload value, or some different representation?  How is "uint"
> different from "binary"?

The reason a binary is used is that the storage of an EBML ID is
different from the storage of an unsigned integer (UTF-8 like variable
length encoding vs. known length with leading 0s omitted). However, in
practice the storage of both looks the same.
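
A short Python illustration of "looks the same", using the cluster ID
as the example: reading the four SeekID bytes as a big endian integer
gives the ID as it is written in the specs, while the length of the ID
is still encoded in the leading bits of its first byte. The variable
names are only for this example.

```python
seek_id_payload = bytes([0x1F, 0x43, 0xB6, 0x75])

# Read like an unsigned integer: big endian, all bytes kept.
as_number = int.from_bytes(seek_id_payload, "big")

# Read like an EBML ID: the leading zero bits of the first byte give the length.
leading_zero_bits = 8 - seek_id_payload[0].bit_length()
id_length = leading_zero_bits + 1
```

Here as_number is 0x1F43B675 and id_length matches the four bytes of
the payload.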

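Also you cannot read a uint like "read(&value, size)": the stored size
can be smaller than the variable, and the bytes are in big endian
order regardless of the byte order of your machine.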
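
To show what goes wrong, here is a Python sketch that imitates
"read(&value, size)" into a zeroed 64 bit variable on a little endian
machine and compares it with the big endian reading. The 2-byte
payload is just an example value.

```python
import struct

payload = bytes([0x10, 0x00])  # a 2-byte SeekPosition payload

# read(&value, size) copies the bytes to the start of the variable;
# the remaining bytes of the zeroed 64 bit variable stay 0.
padded = payload + bytes(8 - len(payload))
naive = struct.unpack("<Q", padded)[0]      # little endian interpretation

correct = int.from_bytes(payload, "big")    # what the file actually means
```

The naive reading gives 16, the correct one 4096.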

Kind regards,
mosu


