(per RFC6190) Media Type name: video Media subtype name: H264-SVC Required parameters: none OPTIONAL parameters: In the following definitions of parameters, "the stream" or "the NAL unit stream" refers to all NAL units conveyed in the current RTP session in SST, and all NAL units conveyed in the current RTP session and all NAL units conveyed in other RTP sessions that the current RTP session depends on in MST. profile-level-id: A base16 [RFC4648] (hexadecimal) representation of the following three bytes in the sequence parameter set or subset sequence parameter set NAL unit specified in [H.264]: 1) profile_idc; 2) a byte herein referred to as profile-iop, composed of the values of constraint_set0_flag, constraint_set1_flag, constraint_set2_flag, constraint_set3_flag, constraint_set4_flag, constraint_set5_flag, and reserved_zero_2bits, in bit- significance order, starting from the most-significant bit, and 3) level_idc. Note that reserved_zero_2bits is required to be equal to 0 in [H.264], but other values for it may be specified in the future by ITU-T or ISO/IEC. The profile-level-id parameter indicates the default sub- profile, i.e., the subset of coding tools that may have been used to generate the stream or that the receiver supports, and the default level of the stream or the one that the receiver supports. The default sub-profile is indicated collectively by the profile_idc byte and some fields in the profile-iop byte. Depending on the values of the fields in the profile-iop byte, the default sub-profile may be the same set of coding tools supported by one profile, or a common subset of coding tools of multiple profiles, as specified in Subsection G.7.4.2.1.1 of [H.264]. The default level is indicated by the level_idc byte, and, when profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or Extended profile) and level_idc is equal to 11, additionally by bit 4 (constraint_set3_flag) of the profile-iop byte. When profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or Extended profile) and level_idc is equal to 11, and bit 4 (constraint_set3_flag) of the profile-iop byte is equal to 1, the default level is Level 1b. Table 13 lists all profiles defined in Annexes A and G of [H.264] and, for each of the profiles, the possible combinations of profile_idc and profile-iop that represent the same sub-profile. Table 13. Combinations of profile_idc and profile-iop representing the same sub-profile corresponding to the full set of coding tools supported by one profile. In the following, x may be either 0 or 1, while the profile names are indicated as follows. CB: Constrained Baseline profile, B: Baseline profile, M: Main profile, E: Extended profile, H: High profile, H10: High 10 profile, H42: High 4:2:2 profile, H44: High 4:4:4 Predictive profile, H10I: High 10 Intra profile, H42I: High 4:2:2 Intra profile, H44I: High 4:4:4 Intra profile, C44I: CAVLC 4:4:4 Intra profile, SB: Scalable Baseline profile, SH: Scalable High profile, and SHI: Scalable High Intra profile. Profile profile_idc profile-iop (hexadecimal) (binary) CB 42 (B) x1xx0000 same as: 4D (M) 1xxx0000 same as: 58 (E) 11xx0000 B 42 (B) x0xx0000 same as: 58 (E) 10xx0000 M 4D (M) 0x0x0000 E 58 00xx0000 H 64 00000000 H10 6E 00000000 H42 7A 00000000 H44 F4 00000000 H10I 6E 00010000 H42I 7A 00010000 H44I F4 00010000 C44I 2C 00010000 SB 53 x0000000 SH 56 0x000000 SHI 56 0x010000 For example, in the table above, profile_idc equal to 58 (Extended) with profile-iop equal to 11xx0000 indicates the same sub-profile corresponding to profile_idc equal to 42 (Baseline) with profile-iop equal to x1xx0000. Note that other combinations of profile_idc and profile-iop (not listed in Table 13) may represent a sub-profile equivalent to the common subset of coding tools for more than one profile. Note also that a decoder conforming to a certain profile may be able to decode bitstreams conforming to other profiles. If profile-level-id is used to indicate stream properties, it indicates that, to decode the stream, the minimum subset of coding tools a decoder has to support is the default sub- profile, and the lowest level the decoder has to support is the default level. If the profile-level-id parameter is used for capability exchange or session setup, it indicates the subset of coding tools, which is equal to the default sub-profile, that the codec supports for both receiving and sending. If max-recv- level is not present, the default level from profile-level-id indicates the highest level the codec wishes to support. If max-recv-level is present, it indicates the highest level the codec supports for receiving. For either receiving or sending, all levels that are lower than the highest level supported MUST also be supported. Informative note: Capability exchange and session setup procedures should provide means to list the capabilities for each supported sub-profile separately. For example, the one-of-N codec selection procedure of the SDP Offer/Answer model can be used (Section 10.2 of [RFC3264]). The one-of-N codec selection procedure may also be used to provide different combinations of profile_idc and profile-iop that represent the same sub-profile. When there are many different combinations of profile_idc and profile-iop that represent the same sub-profile, using the one-of-N codec selection procedure may result in a fairly large SDP message. Therefore, a receiver should understand the different equivalent combinations of profile_idc and profile-iop that represent the same sub-profile, and be ready to accept an offer using any of the equivalent combinations. If no profile-level-id is present, the Baseline Profile without additional constraints at Level 1 MUST be implied. max-recv-level: This parameter MAY be used to indicate the highest level a receiver supports when the highest level is higher than the default level (the level indicated by profile-level-id). The value of max-recv-level is a base16 (hexadecimal) representation of the two bytes after the syntax element profile_idc in the sequence parameter set NAL unit specified in [H.264]: profile-iop (as defined above) and level_idc. If (the level_idc byte of max-recv-level is equal to 11 and bit 4 of the profile-iop byte of max-recv-level is equal to 1) or (the level_idc byte of max-recv-level is equal to 9 and bit 4 of the profile-iop byte of max-recv-level is equal to 0), the highest level the receiver supports is Level 1b. Otherwise, the highest level the receiver supports is equal to the level_idc byte of max-recv-level divided by 10. max-recv-level MUST NOT be present if the highest level the receiver supports is not higher than the default level. max-recv-base-level: This parameter MAY be used to indicate the highest level a receiver supports for the base layer when negotiating an SVC stream. The value of max-recv-base-level is a base16 (hexadecimal) representation of the two bytes after the syntax element profile_idc in the sequence parameter set NAL unit specified in [H.264]: profile-iop (as defined above) and level_idc. If (the level_idc byte of max-recv-level is equal to 11 and bit 4 of the profile-iop byte of max-recv-level is equal to 1) or (the level_idc byte of max-recv-level is equal to 9 and bit 4 of the profile-iop byte of max-recv-level is equal to 0), the highest level the receiver supports for the base layer is Level 1b. Otherwise, the highest level the receiver supports for the base layer is equal to the level_idc byte of max-recv-level divided by 10. max-mbps, max-fs, max-cpb, max-dpb, and max-br: The common properties of these parameters are specified in [RFC6184]. max-mbps: This parameter is as specified in [RFC6184]. max-fs: This parameter is as specified in [RFC6184]. max-cpb: The value of max-cpb is an integer indicating the maximum coded picture buffer size in units of 1000 bits for the VCL HRD parameters and in units of 1200 bits for the NAL HRD parameters. Note that this parameter does not use units of cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]). The max-cpb parameter signals that the receiver has more memory than the minimum amount of coded picture buffer memory required by the signaled highest level conveyed in the value of the profile-level-id parameter or the max-recv-level parameter. When max-cpb is signaled, the receiver MUST be able to decode NAL unit streams that conform to the signaled highest level, with the exception that the MaxCPB value in Table A-1 of [H.264] for the signaled highest level is replaced with the value of max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed). The value of max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) MUST be greater than or equal to the value of MaxCPB given in Table A-1 of [H.264] for the highest level. Senders MAY use this knowledge to construct coded video streams with greater variation of bitrate than can be achieved with the MaxCPB value in Table A-1 of [H.264]. Informative note: The coded picture buffer is used in the Hypothetical Reference Decoder (HRD, Annex C) of [H.264]. The use of the HRD is recommended in SVC encoders to verify that the produced bitstream conforms to the standard and to control the output bitrate. Thus, the coded picture buffer is conceptually independent of any other potential buffers in the receiver, including de-interleaving, re-multiplexing, and de-jitter buffers. The coded picture buffer need not be implemented in decoders as specified in Annex C of [H.264]; standard-compliant decoders can have any buffering arrangements provided that they can decode standard- compliant bitstreams. Thus, in practice, the input buffer for video decoder can be integrated with the de- interleaving, re-multiplexing, and de-jitter buffers of the receiver. max-dpb: This parameter is as specified in [RFC6184]. max-br: The value of max-br is an integer indicating the maximum video bitrate in units of 1000 bits per second for the VCL HRD parameters and in units of 1200 bits per second for the NAL HRD parameters. Note that this parameter does not use units of cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [H.264]). The max-br parameter signals that the video decoder of the receiver is capable of decoding video at a higher bitrate than is required by the signaled highest level conveyed in the value of the profile-level-id parameter or the max-recv-level parameter. When max-br is signaled, the video codec of the receiver MUST be able to decode NAL unit streams that conform to the signaled highest level, with the following exceptions in the limits specified by the highest level: o The value of max-br (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) replaces the MaxBR value in Table A-1 of [H.264] for the highest level. o When the max-cpb parameter is not present, the result of the following formula replaces the value of MaxCPB in Table A-1 of [H.264]: (MaxCPB of the signaled level) * max-br / (MaxBR of the signaled highest level). For example, if a receiver signals capability for Main profile Level 1.2 with max-br equal to 1550, this indicates a maximum video bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum video bitrate of 1860 kbits/sec for NAL HRD parameters, and a CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000). The value of max-br (after taking cpbBrVclFactor and cpbBrNALFactor into consideration when needed) MUST be greater than or equal to the value MaxBR given in Table A-1 of [H.264] for the signaled highest level. Senders MAY use this knowledge to send higher-bitrate video as allowed in the level definition of SVC, to achieve improved video quality. Informative note: This parameter was added primarily to complement a similar codepoint in the ITU-T Recommendation H.245, so as to facilitate signaling gateway designs. No assumption can be made from the value of this parameter that the network is capable of handling such bitrates at any given time. In particular, no conclusion can be drawn that the signaled bitrate is possible under congestion control constraints. redundant-pic-cap: This parameter is as specified in [RFC6184]. sprop-parameter-sets: This parameter MAY be used to convey any sequence parameter set, subset sequence parameter set, and picture parameter set NAL units (herein referred to as the initial parameter set NAL units) that can be placed in the NAL unit stream to precede any other NAL units in decoding order and that are associated with the default level of profile-level-id. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The value of the parameter is a comma (',') separated list of base64 [RFC4648] representations of the parameter set NAL units as specified in Sections 7.3.2.1, 7.3.2.2, and G.7.3.2.1 of [H.264]. Note that the number of bytes in a parameter set NAL unit is typically less than 10, but a picture parameter set NAL unit can contain several hundreds of bytes. Informative note: When several payload types are offered in the SDP Offer/Answer model, each with its own sprop- parameter-sets parameter, then the receiver cannot assume that those parameter sets do not use conflicting storage locations (i.e., identical values of parameter set identifiers). Therefore, a receiver should buffer all sprop-parameter-sets and make them available to the decoder instance that decodes a certain payload type. sprop-level-parameter-sets: This parameter MAY be used to convey any sequence, subset sequence, and picture parameter set NAL units (herein referred to as the initial parameter set NAL units) that can be placed in the NAL unit stream to precede any other NAL units in decoding order and that are associated with one or more levels different than the default level of profile-level-id. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The sprop-level-parameter-sets parameter contains parameter sets for one or more levels that are different than the default level. All parameter sets targeted for use when one level of the default sub-profile is accepted by a receiver are clustered and prefixed with a three-byte field that has the same syntax as profile-level-id. This enables the receiver to install the parameter sets for the accepted level and discard the rest. The three-byte field is named PLId, and all parameter sets associated with one level are named PSL, which has the same syntax as sprop-parameter-sets. Parameter sets for each level are represented in the form of PLId:PSL, i.e., PLId followed by a colon (':') and the base64 [RFC4648] representation of the initial parameter set NAL units for the level. Each pair of PLId:PSL is also separated by a colon. Note that a PSL can contain multiple parameter sets for that level, separated with commas (','). The subset of coding tools indicated by each PLId field MUST be equal to the default sub-profile, and the level indicated by each PLId field MUST be different than the default level. Informative note: This parameter allows for efficient level downgrade or upgrade in SDP Offer/Answer and out-of-band transport of parameter sets, simultaneously. in-band-parameter-sets: This parameter MAY be used to indicate a receiver capability. The value MAY be equal to either 0 or 1. The value 1 indicates that the receiver discards out-of-band parameter sets in sprop- parameter-sets and sprop-level-parameter-sets, therefore the sender MUST transmit all parameter sets in-band. The value 0 indicates that the receiver utilizes out-of-band parameter sets included in sprop-parameter-sets and/or sprop-level-parameter- sets. However, in this case, the sender MAY still choose to send parameter sets in-band. When the parameter is not present, this receiver capability is not specified, and therefore the sender MAY send out-of-band parameter sets only, or it MAY send in-band-parameter-sets only, or it MAY send both. packetization-mode: This parameter is as specified in [RFC6184]. When the mst-mode parameter is present, the value of this parameter is additionally constrained as follows. If mst-mode is equal to "NI-T", "NI-C", or "NI-TC", packetization-mode MUST NOT be equal to 2. Otherwise, (mst-mode is equal to "I-C"), packetization-mode MUST be equal to 2. sprop-interleaving-depth: This parameter is as specified in [RFC6184]. sprop-deint-buf-req: This parameter is as specified in [RFC6184]. deint-buf-cap: This parameter is as specified in [RFC6184]. sprop-init-buf-time: This parameter is as specified in [RFC6184]. sprop-max-don-diff: This parameter is as specified in [RFC6184]. max-rcmd-nalu-size: This parameter is as specified in [RFC6184]. mst-mode: This parameter MAY be used to signal the properties of a NAL unit stream or the capabilities of a receiver implementation. If this parameter is present, multi-session transmission MUST be used. Otherwise (this parameter is not present), single- session transmission MUST be used. When this parameter is present, the following applies. When the value of mst-mode is equal to "NI-T", the NI-T mode MUST be used. When the value of mst-mode is equal to "NI-C", the NI-C mode MUST be used. When the value of mst-mode is equal to "NI-TC", the NI-TC mode MUST be used. When the value of mst-mode is equal to "I-C", the I-C mode MUST be used. The value of mst-mode MUST have one of the following tokens: "NI-T", "NI-C", "NI-TC", or "I-C". All RTP sessions in an MST MUST have the same value of mst- mode. sprop-mst-csdon-always-present: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T" or "I-C". This parameter signals the properties of the NAL unit stream. When sprop-mst-csdon-always-present is present and the value is equal to 1, packetization-mode MUST be equal to 1, and all the RTP packets carrying the NAL unit stream MUST be STAP-A packets containing a PACSI NAL unit that further contains the DONC field or NI-MTAP packets with the J field equal to 1. When sprop-mst-csdon-always-present is present and the value is equal to 1, the CS-DON value of any particular NAL unit can be derived solely according to information in the packet containing the NAL unit. When sprop-mst-csdon-always-present is present in the current RTP session, it MUST be present also in all the RTP sessions the current RTP session depends on and the value of sprop-mst- csdon-always-present is identical for the current RTP session and all the RTP sessions on which the current RTP session depends. sprop-mst-remux-buf-size: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". This parameter MUST be present when mst-mode is present and the value of mst- mode is equal to "NI-C", "NI-TC", or "I-C". This parameter signals the properties of the NAL unit stream. It MUST be set to a value one less than the minimum re- multiplexing buffer size (in NAL units), so that it is guaranteed that receivers can reconstruct NAL unit decoding order as specified in Subsection 6.2.2. The value of sprop-mst-remux-buf-size MUST be an integer in the range of 0 to 32767, inclusive. sprop-remux-buf-req: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". It MUST be present when mst-mode is present and the value of mst-mode is equal to "NI-C", "NI-TC", or "I-C". sprop-remux-buf-req signals the required size of the re- multiplexing buffer for the NAL unit stream. It is guaranteed that receivers can recover the decoding order of the received NAL units from the current RTP session and the RTP sessions the current RTP session depends on as specified in Section 6.2.2, when the re-multiplexing buffer size is of at least the value of sprop-remux-buf-req in units of bytes. The value of sprop-remux-buf-req MUST be an integer in the range of 0 to 4294967295, inclusive. remux-buf-cap: This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is equal to "NI-T". This parameter MAY be used to signal the capabilities of a receiver implementation and indicates the amount of re-multiplexing buffer space in units of bytes that the receiver has available for recovering the NAL unit decoding order as specified in Section 6.2.2. A receiver is able to handle any NAL unit stream for which the value of the sprop-remux-buf-req parameter is smaller than or equal to this parameter. If the parameter is not present, then a value of 0 MUST be used for remux-buf-cap. The value of remux-buf-cap MUST be an integer in the range of 0 to 4294967295, inclusive. sprop-remux-init-buf-time: This parameter MAY be used to signal the properties of the NAL unit stream. The parameter MUST NOT be present if mst-mode is not present or the value of mst-mode is equal to "NI-T". The parameter signals the initial buffering time that a receiver MUST wait before starting to recover the NAL unit decoding order as specified in Section 6.2.2 of this memo. The parameter is coded as a non-negative base10 integer representation in clock ticks of a 90-kHz clock. If the parameter is not present, then no initial buffering time value is defined. Otherwise, the value of sprop-remux-init-buf-time MUST be an integer in the range of 0 to 4294967295, inclusive. sprop-mst-max-don-diff: This parameter MAY be used to signal the properties of the NAL unit stream. It MUST NOT be used to signal transmitter or receiver or codec capabilities. The parameter MUST NOT be present if mst-mode is not present or the value of mst-mode is equal to "NI-T". sprop-mst-max-don-diff is an integer in the range of 0 to 32767, inclusive. If sprop-mst-max-don-diff is not present, the value of the parameter is unspecified. sprop- mst-max-don-diff is calculated same as sprop-max-don-diff as specified in [RFC6184], with decoding order number being replaced by cross-session decoding order number. sprop-scalability-info: This parameter MAY be used to convey the NAL unit containing the scalability information SEI message as specified in Annex G of [H.264]. This parameter MAY be used to signal the contained layers of an SVC bitstream. The parameter MUST NOT be used to indicate codec capability in any capability exchange procedure. The value of the parameter is the base64 [RFC4648] representation of the NAL unit containing the scalability information SEI message. If present, the NAL unit MUST contain only one SEI message that is a scalability information SEI message. This parameter MAY be used in an offering or declarative SDP message to indicate what layers (operation points) can be provided. A receiver MAY indicate its choice of one layer using the optional media type parameter scalable-layer-id. scalable-layer-id: This parameter MAY be used to signal a receiver's choice of the offers or declared operation points or layers using sprop- scalability-info or sprop-operation-point-info. The value of scalable-layer-id is a base16 representation of the layer_id[ i ] syntax element in the scalability information SEI message as specified in Annex G of [H.264] or layer-ID contained in sprop- operation-point-info. sprop-operation-point-info: This parameter MAY be used to describe the operation points of an RTP session. The value of this parameter consists of a comma-separated list of operation-point-description vectors. The values given by the operation-point-description vectors are the same as, or are derived from, the values that would be given for a scalable layer in the scalability information SEI message as specified in Annex G of [H.264], where the term scalable layer in the scalability information SEI message refers to all NAL units associated with the same values of temporal_id, dependency_id, and quality_id. In this memo, such a set of NAL units is called an operation point. Each operation-point-description vector has ten elements, provided as a comma-separated list of values as defined below. The first value of the operation-point-description vector is preceded by a '<', and the last value of the operation-point- description vector is followed by a '>'. If the sprop- operation-point-info is followed by exactly one operation- point-description vector, this describes the highest operation point contained in the RTP session. If there are two or more operation-point-description vectors, the first describes the lowest and the last describes the highest operation point contained in the RTP session. The values given by the operation-point-description vector are as follows, in the order listed: - layer-ID: This value specifies the layer identifier of the operation point, which is identical to the layer_id that would be indicated (for the same values of dependency_id, quality_id, and temporal_id) in the scalability information SEI message. This field MAY be empty, indicating that the value is unspecified. When there are multiple operation- point-description vectors with layer-ID, the values of layer-ID do not need to be consecutive. - temporal-ID: This value specifies the temporal_id of the operation point. This field MUST NOT be empty. - dependency-ID: This values specifies the dependency_id of the operation point. This field MUST NOT be empty. - quality-ID: This values specifies the quality_id of the operation point. This field MUST NOT be empty. - profile-level-ID: This value specifies the profile-level-idc of the operation point in the base16 format. The default sub-profile or default level indicated by the parameter profile-level-ID in the sprop-operation-point-info vector SHALL be equal to or lower than the default sub-profile or default level indicated by profile-level-id, which may be either present or the default value is taken. This field MAY be empty, indicating that the value is unspecified. - avg-framerate: This value specifies the average frame rate of the operation point. This value is given as an integer in frames per 256 seconds. The field MAY be empty, indicating that the value is unspecified. - width: This value specifies the width dimension in pixels of decoded frames for the operation point. This parameter is not directly given in the scalability information SEI message. This field MAY be empty, indicating that the value is unspecified. - height: This value gives the height dimension in pixels of decoded frames for the operation point. This parameter is not directly given in the scalability information SEI. This field MAY be empty, indicating that the value is unspecified. - avg-bitrate: This value specifies the average bitrate of the operation point. This parameter is given as an integer in kbits per second over the entire stream. Note that this parameter is provided in the scalability information SEI message in bits per second and calculated over a variable time window. This field MAY be empty, indicating that the value is unspecified. - max-bitrate: This value specifies the maximum bitrate of the operation point. This parameter is given as an integer in kbits per second and describes the maximum bitrate per each one-second window. Note that this parameter is provided in the scalability information SEI message in bits per second and is calculated over a variable time window. This field MAY be empty, indicating that the value is unspecified. Similarly to sprop-scalability-info, this parameter MAY be used in an offering or declarative SDP message to indicate what layers (operation points) can be provided. A receiver MAY indicate its choice of the highest layer it wants to send and/or receive using the optional media type parameter scalable-layer-id. sprop-no-NAL-reordering-required: This parameter MAY be used to signal the properties of the NAL unit stream. This parameter MUST NOT be present when mst-mode is not present or the value of mst-mode is not equal to "NI-T". The presence of this parameter indicates that no reordering of non-VCL or VCL NAL units is required for the decoding order recovery process. sprop-avc-ready: This parameter MAY be used to indicate the properties of the NAL unit stream. The presence of this parameter indicates that the RTP session, if used in SST, or used in MST combined with other RTP sessions also with this parameter present, can be processed by a [RFC6184] receiver. This parameter MAY be used with RTP sessions with media subtype H264-SVC. Encoding considerations: This media type is framed and binary; see Section 4.8 of RFC 4288 [RFC4288]. Security considerations: See Section 8 of RFC 6190. Published specification: Please refer to RFC 6190 and its Section 13. Additional information: none File extensions: none Macintosh file type code: none Object identifier or OID: none Person & email address to contact for further information: Ye-Kui Wang, yekui.wang&huawei.com Intended usage: COMMON Restrictions on usage: This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time. Interoperability considerations: The media subtype name contains "SVC" to avoid potential conflict with RFC 3984 and its potential future replacement RTP payload format for H.264 non-SVC profiles. Applications that use this media type: Real-time video applications like video streaming, video telephony, and video conferencing. Author: Ye-Kui Wang, yekui.wang&huawei.com Change controller: IETF Audio/Video Transport working group delegated from the IESG.