[Matroska-devel] Mappings for HEVC/H.265 in Matroska

Jan Ekstrom jeebjp at gmail.com
Thu Sep 12 15:15:00 CEST 2013


It has been a while, but I think it is time to start wrapping up the
first version of HEVC-in-Matroska. I have gotten access to a version
of 14496-15, 3rd edition (Incheon output document W13478, last
version), which has been noted from multiple sources to be what is
going to end up at an FDIS ballot soon. Thus, unless something really
surprising happens, this should be how "HEVC-in-MP4" is going to be
implemented. And given that many people on this list wanted to have
conformity regarding the extradata between 14496-15 and Matroska, now
we should finally be able to gain it.

As a disclaimer, there of course is a possibility that the extradata
format might change during the time before the FDIS ballot, but the
version field in the extradata should generally help us with that
whatever the outcome finally is. Also there have been no major edits
of the related parts since early June, and no new output documents
were created at the Vienna meeting.

For reference, here is the related document:
http://fushizen.eu/random/W13478_1_pt15-3rd_FDIS_clean.pdf

The HEVC/H.265 specification can be found here:
http://www.itu.int/rec/T-REC-H.265-201304-I

Information on the current DivX mapping of HEVC can be found at their
mkvtoolnix github repository:
https://github.com/jaya-divx/mkvtoolnix/blob/master/src/common/hevc.cpp#L204

Comments are very much welcome! Also, one of the things I have not
touched upon is intra refresh, as I have no idea how that in general
would work in Matroska. In general this should be quite similar to how
DivX has done its mapping with the official release of DivX 10, but I
have not checked their output too hard, so I cannot say if the two
mappings match up 1:1. Do also excuse me for not formatting this text
by hand for the mailing list, so certain parts might not look as good
as they should.

Regards,
Jan Ekström


Draft Specification for HEVC/H.265 elementary streams in Matroska:

CodecID: V_MPEGH/ISO/HEVC

General Limitations Regarding SimpleBlock Usage:
(TODO: Similar area for Block usage?)

* Each NAL unit shall be written with the start code stripped, and
prefixed by an unsigned big endian integer that contains the length of
the given NAL unit in bytes. The size of this prefix in bytes shall be
set by the value of the lengthSizeMinusOne field plus 1 of the
currently active CodecPrivate structure (the available sizes are 1, 2
and 4 bytes).

* A SimpleBlock shall contain one or more VCL NAL units that make up a
single picture (be it a field or a frame), as well as possible other
NAL units depending on the limitations imposed upon the stream by the
CodecPrivate structure.

* The 'keyframe' (random access point) flag shall only be set when the
following conditions are satisfied:

- The contained VCL NAL unit(s) signal an IRAP picture as defined in
ISO/IEC 23008-2 and ITU-T Recommendation H.265.

- The contained VCL NAL unit(s) and all the following VCL NAL units in
coding order can be correctly decoded by feeding the decoder the
parameter sets included in the related CodecPrivate structure, and, in
case the CodecPrivate structure permits, all the possible parameter
sets included in the SimpleBlock as well as all the following
SimpleBlocks.

* The 'invisible' flag shall be set if the picture is a non-displayed
reference picture.
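
As an illustration of the SimpleBlock rules above, here is a minimal Python sketch of how a muxer might length-prefix NAL units and test for an IRAP picture. The function names are hypothetical and not part of this draft. The IRAP test relies on the nal_unit_type range 16..23 from H.265, and it only covers the first keyframe condition: whether the required parameter sets are actually available still has to be checked separately.

```python
# Hypothetical helper names; only the byte layout comes from the draft and H.265.

# H.265 reserves nal_unit_type 16..23 for IRAP pictures (BLA, IDR, CRA, reserved IRAP).
IRAP_NAL_TYPES = range(16, 24)


def nal_unit_type(nal: bytes) -> int:
    """Extract nal_unit_type from the first byte of the 2-byte HEVC NAL header."""
    return (nal[0] >> 1) & 0x3F


def is_irap(nals) -> bool:
    """True if there is at least one VCL NAL unit (type < 32) and all of them are IRAP."""
    vcl = [n for n in nals if nal_unit_type(n) < 32]
    return bool(vcl) and all(nal_unit_type(n) in IRAP_NAL_TYPES for n in vcl)


def pack_simpleblock_payload(nals, length_size_minus_one: int) -> bytes:
    """Prefix each start-code-free NAL unit with its big endian length."""
    size = length_size_minus_one + 1
    if size not in (1, 2, 4):
        raise ValueError("prefix size must be 1, 2 or 4 bytes")
    out = bytearray()
    for nal in nals:
        if len(nal) >= 1 << (8 * size):
            raise ValueError("NAL unit too large for the chosen prefix size")
        out += len(nal).to_bytes(size, "big") + nal
    return bytes(out)
```

A NAL unit that does not fit in the chosen prefix size is rejected rather than truncated, since a wrong length would desynchronize every following NAL unit in the block.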

Contents of CodecPrivate:

Generally follows the definition of HEVCDecoderConfigurationRecord in
ISO/IEC 14496-15, but until the 3rd edition of 14496-15 gets
officially finalized, the version entry (configurationVersion) shall
be written as 0 (zero), not 1 (one) as 14496-15 defines it.

After the official release of 14496-15, 3rd ed., the extradata format
can be checked and possibly updated, and the configurationVersion can
finally be bumped to match the 14496-15 specification.

A reader must not attempt to decode this record or the
stream(s) to which it applies if the version number is unrecognised.
Compatible extensions to this record will not change the configuration
version code, and readers should be prepared to ignore unrecognised
data beyond the definition of the data they understand.

As the 14496-15 specification defines the implicit default value for
array_completeness depending on the ID of the stream, this Matroska
mapping shall define the implicit default as 0.

Since this is a bit-based document, all values are left-most bit
first, and thus big endian if they take more than one byte of space.
All fields that contain straight-out copies of HEVC data shall of
course follow ISO/IEC 23008-2 and ITU-T Recommendation H.265.

Since the decoder configuration NAL units have a length variable
available for them in the CodecPrivate syntax, they shall be written
without the start code, and without the prefix used when muxing into
SimpleBlocks.

It is recommended that the NAL unit arrays in the CodecPrivate
structure be in the order VPS, SPS, PPS, SEI.
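
A small sketch of the recommended array ordering, assuming the arrays are held as (NAL_unit_type, list of NAL units) tuples; that representation and the function name are illustrative only. The numeric types are the H.265 values for VPS (32), SPS (33), PPS (34), prefix SEI (39) and suffix SEI (40).

```python
# Sort key for the recommended order VPS, SPS, PPS, SEI.
# Keys are H.265 nal_unit_type values; prefix SEI is placed before suffix SEI
# as an assumption, since the draft does not distinguish the two.
ARRAY_ORDER = {32: 0, 33: 1, 34: 2, 39: 3, 40: 4}


def order_arrays(arrays):
    """Return arrays sorted into the recommended order.

    arrays: list of (nal_unit_type, [nal_unit_bytes, ...]) tuples.
    Raises KeyError for a type that is not VPS, SPS, PPS or SEI.
    """
    return sorted(arrays, key=lambda entry: ARRAY_ORDER[entry[0]])
```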

CodecPrivate Syntax:

(as of document W13478)

// The CodecPrivate syntax shall follow the
// syntax of HEVCDecoderConfigurationRecord
// defined in ISO/IEC 14496-15.
//
// The number zero (0) shall be written to
// the configurationVersion variable until
// official finalization of 14496-15, 3rd ed.
//
// After its finalization, this field and the
// following CodecPrivate structure shall
// follow the definition of the
// HEVCDecoderConfigurationRecord in 14496-15.

unsigned int(8)  configurationVersion;
unsigned int(2)  general_profile_space;
unsigned int(1)  general_tier_flag;
unsigned int(5)  general_profile_idc;
unsigned int(32) general_profile_compatibility_flags;
unsigned int(48) general_constraint_indicator_flags;
unsigned int(8)  general_level_idc;
bit(4) reserved = ‘1111’b;
unsigned int(12) min_spatial_segmentation_idc;
bit(6) reserved = ‘111111’b;
unsigned int(2)  parallelismType;
bit(6) reserved = ‘111111’b;
unsigned int(2)  chromaFormat;
bit(5) reserved = ‘11111’b;
unsigned int(3)  bitDepthLumaMinus8;
bit(5) reserved = ‘11111’b;
unsigned int(3)  bitDepthChromaMinus8;
bit(16) avgFrameRate;
bit(2)  constantFrameRate;
bit(3)  numTemporalLayers;
bit(1)  temporalIdNested;
unsigned int(2) lengthSizeMinusOne;
unsigned int(8) numOfArrays;
for (j=0; j < numOfArrays; j++) {
  bit(1) array_completeness;
  unsigned int(1)  reserved = 0;
  unsigned int(6)  NAL_unit_type;
  unsigned int(16) numNalus;
  for (i=0; i < numNalus; i++) {
    unsigned int(16) nalUnitLength;
    bit(8*nalUnitLength) nalUnit;
  }
}
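
To make the bit layout concrete, the following is a rough Python sketch of a reader for the syntax above. The fixed part adds up to 23 bytes, followed by the NAL unit arrays. The function name and the returned dictionary are illustrative, and accepting both version 0 (this draft) and version 1 (14496-15) is an assumption about how a reader might behave after finalization.

```python
import struct


def parse_hevc_codec_private(data: bytes) -> dict:
    """Parse a CodecPrivate blob laid out as in the syntax above (sketch only)."""
    if len(data) < 23:
        raise ValueError("record shorter than the 23-byte fixed part")
    if data[0] not in (0, 1):  # assumption: 0 = this draft, 1 = final 14496-15
        raise ValueError("unrecognised configurationVersion")
    rec = {
        "configurationVersion": data[0],
        "general_profile_space": data[1] >> 6,
        "general_tier_flag": (data[1] >> 5) & 0x01,
        "general_profile_idc": data[1] & 0x1F,
        "general_profile_compatibility_flags": struct.unpack_from(">I", data, 2)[0],
        "general_constraint_indicator_flags": int.from_bytes(data[6:12], "big"),
        "general_level_idc": data[12],
        # The reserved '1' bits are masked off rather than validated.
        "min_spatial_segmentation_idc": struct.unpack_from(">H", data, 13)[0] & 0x0FFF,
        "parallelismType": data[15] & 0x03,
        "chromaFormat": data[16] & 0x03,
        "bitDepthLumaMinus8": data[17] & 0x07,
        "bitDepthChromaMinus8": data[18] & 0x07,
        "avgFrameRate": struct.unpack_from(">H", data, 19)[0],
        "constantFrameRate": data[21] >> 6,
        "numTemporalLayers": (data[21] >> 3) & 0x07,
        "temporalIdNested": (data[21] >> 2) & 0x01,
        "lengthSizeMinusOne": data[21] & 0x03,
        "arrays": [],
    }
    pos = 23
    for _ in range(data[22]):  # numOfArrays
        if pos + 3 > len(data):
            raise ValueError("truncated array header")
        head = data[pos]
        num_nalus = struct.unpack_from(">H", data, pos + 1)[0]
        pos += 3
        nalus = []
        for _ in range(num_nalus):
            if pos + 2 > len(data):
                raise ValueError("truncated nalUnitLength")
            length = struct.unpack_from(">H", data, pos)[0]
            pos += 2
            if pos + length > len(data):
                raise ValueError("truncated nalUnit")
            nalus.append(data[pos:pos + length])
            pos += length
        rec["arrays"].append({
            "array_completeness": head >> 7,
            "NAL_unit_type": head & 0x3F,
            "nalus": nalus,
        })
    return rec
```

Any bytes after the last array are ignored, in line with the rule that readers should be prepared to skip unrecognised trailing data.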

CodecPrivate Semantics:

The CodecPrivate semantics for HEVC shall generally follow the
semantics of the HEVCDecoderConfigurationRecord, and an implementor
should hold the current version of ISO/IEC 14496-15 as the up-to-date
definition for the semantics of the CodecPrivate structure.

The following are the semantics of the fields as of document W13478:

general_profile_space, general_tier_flag,
general_profile_idc, general_profile_compatibility_flags,
general_constraint_indicator_flags, general_level_idc,
and min_spatial_segmentation_idc:
  These fields contain the matching values for the fields
general_profile_space, general_tier_flag, general_profile_idc,
general_profile_compatibility_flag[ i ] for i from 0 to 31, inclusive,
the 6 bytes starting with the byte containing the
general_progressive_source_flag, general_level_idc, and
min_spatial_segmentation_idc as defined in ISO/IEC 23008-2 and ITU-T
Recommendation H.265, for the stream to which this CodecPrivate record
applies.

parallelismType:
  Indicates the type of parallelism that is used to meet the
restrictions imposed by min_spatial_segmentation_idc when the value of
min_spatial_segmentation_idc is greater than 0.

  Value 1 indicates that the stream to which this CodecPrivate record
applies supports slice based parallel decoding. Value 2 indicates that
the stream to which this CodecPrivate record applies supports tile
based parallel decoding. Value 3 indicates that the stream to which
this CodecPrivate record applies supports entropy coding
synchronization based parallel decoding. Value 0 indicates that the
stream supports mixed types of parallel decoding or that the
parallelism type is unknown.

chromaFormat:
  Contains the chroma_format indicator as defined by the
chroma_format_idc parameter in ISO/IEC 23008-2 and ITU-T
Recommendation H.265, for the stream to which this CodecPrivate record
applies.

bitDepthLumaMinus8:
  Contains the luma bit depth indicator as defined by the
bit_depth_luma_minus8 parameter in ISO/IEC 23008-2 and ITU-T
Recommendation H.265, for the stream to which this CodecPrivate record
applies.

bitDepthChromaMinus8:
  Contains the chroma bit depth indicator as defined by the
bit_depth_chroma_minus8 parameter in ISO/IEC 23008-2 and ITU-T
Recommendation H.265, for the stream to which this CodecPrivate record
applies.

avgFrameRate:
  Gives the average frame rate in units of frames/(256 seconds), for
the stream to which this CodecPrivate record applies. Value 0
indicates an unspecified average frame rate.
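
As a worked example of the frames/(256 seconds) unit, a stream at 24000/1001 fps would be written as round(24000/1001 * 256) = 6138, and 25 fps as 6400. A minimal sketch of the conversion follows; the function names are hypothetical, and rounding to the nearest integer is an assumption, as the text above does not prescribe a rounding rule.

```python
from fractions import Fraction


def avg_frame_rate_field(fps: Fraction) -> int:
    """Convert a frame rate in frames/second to the 16-bit avgFrameRate value."""
    value = round(fps * 256)  # rounding mode is an assumption
    if not 0 <= value <= 0xFFFF:
        raise ValueError("frame rate does not fit in 16 bits")
    return value


def avg_frame_rate_fps(field: int):
    """Convert avgFrameRate back to frames/second; 0 means unspecified."""
    return None if field == 0 else Fraction(field, 256)
```

Since the field is 16 bits wide, the largest representable average rate is 65535/256, just under 256 fps.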

constantFrameRate:
  Equal to 1 indicates that the stream to which this CodecPrivate
record applies is of constant frame rate. Value 2 indicates that the
representation of each temporal layer in the stream is of constant
frame rate. Value 0 indicates that the stream may or may not be of
constant frame rate.

numTemporalLayers:
  Greater than 1 indicates that the stream to which this CodecPrivate
record applies is temporally scalable and the contained number of
temporal layers (also referred to as temporal sub-layer or sub-layer
in ISO/IEC 23008-2 and ITU-T Recommendation H.265) is equal to
numTemporalLayers. Value 1 indicates that the stream is not temporally
scalable. Value 0 indicates that it is unknown whether the stream is
temporally scalable.

temporalIdNested:
  Equal to 1 indicates that all SPSs that are activated when the
stream to which this CodecPrivate record applies is decoded have
sps_temporal_id_nesting_flag as defined in ISO/IEC 23008-2 and ITU-T
Recommendation H.265 equal to 1 and temporal sub-layer up-switching to
any higher temporal layer can be performed at any sample. Value 0
indicates that at least one of the SPSs that are activated when the
stream to which this CodecPrivate record applies is decoded has
sps_temporal_id_nesting_flag as defined in ISO/IEC 23008-2 and ITU-T
Recommendation H.265 equal to 0.

lengthSizeMinusOne:
  The value of this field plus 1 indicates the length in bytes of the
NALUnitLength field in an HEVC video sample in the stream to which
this CodecPrivate record applies. For example, a size of one byte is
indicated with a value of 0. The value of this field shall be one of
0, 1, or 3 corresponding to a length encoded with 1, 2, or 4 bytes,
respectively.
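
On the reading side, a demuxer could split a block payload back into NAL units roughly as follows. This is only a sketch with a hypothetical function name; it assumes the payload consists solely of length-prefixed NAL units as described for SimpleBlocks.

```python
def split_nal_units(payload: bytes, length_size_minus_one: int):
    """Split a block payload into NAL units using the configured prefix size."""
    if length_size_minus_one not in (0, 1, 3):
        raise ValueError("lengthSizeMinusOne must be 0, 1 or 3")
    size = length_size_minus_one + 1
    nals = []
    pos = 0
    while pos < len(payload):
        if pos + size > len(payload):
            raise ValueError("truncated length prefix")
        # The prefix is an unsigned big endian integer.
        length = int.from_bytes(payload[pos:pos + size], "big")
        pos += size
        if pos + length > len(payload):
            raise ValueError("truncated NAL unit")
        nals.append(payload[pos:pos + length])
        pos += length
    return nals
```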

numOfArrays:
  Indicates the number of arrays of NAL units of the indicated type(s).

array_completeness:
  When equal to 1, indicates that all NAL units of the given type are
in the following array and none are in the stream. When equal to 0,
indicates that additional NAL units of the indicated type may be in
the stream. With Matroska the implicit default for this flag is 0.

NAL_unit_type:
  Indicates the type of the NAL units in the following array (which
must all be of that type). It takes a value as defined in ISO/IEC
23008-2 and ITU-T Recommendation H.265, and is restricted to one
of the values indicating a VPS, SPS, PPS, or SEI NAL unit.

numNalus:
  Indicates the number of NAL units of the indicated type included in
the CodecPrivate record for the stream to which this CodecPrivate
record applies. The SEI array must only contain SEI messages of a
‘declarative’ nature, that is, those that provide information about
the stream as a whole. An example of such an SEI could be a user-data
SEI.

nalUnitLength:
  Indicates the length in bytes of the NAL unit.

nalUnit:
  Contains a VPS, SPS, PPS or declarative SEI NAL unit, as specified
in ISO/IEC 23008-2 and ITU-T Recommendation H.265.

