NHK Highlights VVC Advances for 8K
At IBC this year, NHK showed an interesting demonstration of 8K video encoding using one of the latest features of the VVC codec (H.266): multi-layer coding. The fundamental concept is that one broadcast stream acts as a ‘base layer’ (an HD layer in the NHK demo). Additional layers add the information needed for 4K, and a further layer boosts that up to 8K. All three layers are required to display 8K. (Also see this video report from IBC.)
There are several advantages to this approach:
- Broadcasters can use a single originating stream for all three resolutions
- Each layer carries only the additional information, minimizing bandwidth use compared to sending separate complete HD, 4K, and 8K streams.
- The different layers can be sent using different transport systems. For example, the base layer could be sent by terrestrial broadcast, satellite, or cable, with the enhancement layers delivered over the internet.
- The technology can also deliver additional features such as accessibility services (e.g., subtitles or signing) or ‘picture-in-picture’ features that viewers can enable or ignore as they wish.
- Enhancement layers can send the information needed to enhance the spatial resolution, extra color or dynamic range data, or higher refresh rates.
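The dependency between the layers described above can be sketched in code. This is a hypothetical illustration (layer names and fields are our own, not NHK's): each enhancement layer is only usable if every layer beneath it has arrived, since each one predicts from the reconstruction of the layer below.

```python
# Illustrative sketch of multi-layer dependency in a scalable stream.
# Layer 0 (HD base) is decodable alone; each enhancement layer needs
# all the layers below it.

LAYERS = [
    {"id": 0, "adds": "HD base"},    # decodable on its own
    {"id": 1, "adds": "4K detail"},  # needs layer 0
    {"id": 2, "adds": "8K detail"},  # needs layers 0 and 1
]

def highest_decodable(received_ids):
    """Return the highest layer id usable given the received layer ids."""
    top = -1
    for layer in LAYERS:
        # Usable only if this layer arrived AND forms an unbroken chain
        # from the base layer upwards.
        if layer["id"] in received_ids and layer["id"] == top + 1:
            top = layer["id"]
        else:
            break
    return top

# A receiver that got the base and 4K layers, but not the 8K layer,
# can show 4K but not 8K.
assert highest_decodable({0, 1}) == 1
# Without the base layer, the enhancement layers are useless.
assert highest_decodable({1, 2}) == -1
```

This captures why the base layer can travel over a reliable broadcast path while enhancement layers arrive opportunistically over the internet: losing an upper layer only degrades resolution, it never breaks decoding.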
The demo at IBC used three layers, but more layers could be used for different purposes, although NHK’s encoder has only been tested for up to three. NHK confirmed that the Amsterdam demo content was encoded using a software encoder based on the VVenC technology from the Fraunhofer HHI. As far as NHK is aware, there are no chips yet that can perform the encoding. However, some hardware systems for real-time encoding are under development by companies such as MainConcept, SpinDigital, and NEC.
The content at IBC was encoded with a single pass, although NHK is confident that it could achieve higher quality with multi-pass encoding, at the cost of a longer encoding process. Adding the extra layers means that multi-layer encoding takes longer than single-layer encoding. However, because multi-layer support was ‘built in’ to VVC from the start, most of the coding tools (including inter-layer prediction) required for the Multilayer Main 10 profile are already mandatory in the Main 10 profile, so the extra processing is thought not to be a significant burden.
Each layer can be configured with different parameters, such as block size and RD optimization, but the coding (GOP) structure needs to be the same across the three layers. NHK confirmed that temporal scalability (i.e., higher frame rates) could be implemented with an additional layer, as could enhanced bit depths and spatial resolution. Frames are allocated ‘temporal IDs’ depending on the reference structure, allowing the decoder to choose which frames to decode.
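The temporal ID mechanism can be illustrated with a short sketch. The GOP layout below is a common hierarchical pattern chosen for illustration, not NHK's actual configuration: a decoder targeting a lower sub-layer simply drops every frame whose temporal ID exceeds its target, and the remaining frames still form valid prediction chains.

```python
# Illustrative sketch of temporal scalability via temporal IDs.
# (picture_order, temporal_id) for one 8-frame hierarchical GOP:
# frames with higher IDs only ever reference frames with lower IDs,
# so discarding a whole sub-layer never breaks prediction.
gop = [(0, 0), (4, 1), (2, 2), (1, 3), (3, 3), (6, 2), (5, 3), (7, 3)]

def frames_for_sublayer(frames, max_tid):
    """Keep only frames whose temporal ID is <= the target sub-layer."""
    return sorted(order for order, tid in frames if tid <= max_tid)

full_rate = frames_for_sublayer(gop, 3)  # all 8 frames, e.g. 120 fps
half_rate = frames_for_sublayer(gop, 2)  # drop tid 3: every other frame
print(half_rate)  # [0, 2, 4, 6]
```

Halving the decoded sub-layers here halves the frame rate, which is exactly the kind of temporal scalability an extra layer can carry.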
Because the multi-layer approach is built into the specification, compliant decoders will automatically be able to process the different layers, and no additional processing is needed. (Strictly speaking, NHK confirmed the enhancement layer is built using the reconstructed base layer as a reference, so there is no separate ‘combination stage’).
Enhancement Layers are Not New
The idea of using enhancement layers is not a new one. The same approach is used in scalable video coding, where multiple layers are sent, bandwidth permitting, to provide higher image quality. This technique was used as far back as H.264/MPEG-4 AVC and was standardized more than 15 years ago. The method could have been used as far back as MPEG-2 but was not implemented, partly because of a loss in coding efficiency and increased decoder complexity.
VVC, on the other hand, had its high-level syntax specified from the start of its development to optimize for multi-layer streams and for ‘random access’ – that is, starting the stream from an arbitrary frame rather than the very beginning. Earlier codecs, by contrast, started as single-layer designs, with multi-layer capabilities added later.
VVC has also changed how a stream of mixed resolutions is handled. In HEVC, the spatial resolution of a video bitstream can only change at an instantaneous decoding refresh (IDR) picture or equivalent, as illustrated by the upper half of Fig. 3 below. VVC also allows the spatial resolution to change at inter-coded pictures (for a more detailed look at the issues of decoding IDR and clean random access (CRA) frames, look here).
Streams with Mixed Resolutions
VVC has a feature called reference picture resampling (RPR). When a resolution change happens, “the decoding process of a picture may refer to one or more previous reference pictures with a different spatial resolution for inter-picture prediction, and consequently, a resampling of the reference pictures for the operation of the inter-picture prediction process may be applied.” Compared to forcing the insertion of an IDR picture to switch resolution, allowing inter-picture prediction from reference pictures of different resolutions improves coding efficiency. It mitigates the problem of bit rate spikes associated with IDR & CRA pictures (collectively known as intra-random access point (IRAP) pictures).
The RPR feature in VVC provides the resampling functionality between the current picture and its temporal reference pictures, so no additional signal-processing features are needed to support the inter-layer prediction of spatial scalability. Instead, spatial scalability is achieved with high-level syntax changes to the single-layer coding design.
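A minimal sketch can make the RPR idea concrete. The interpolation below is nearest-neighbour purely for illustration; real VVC decoders use the interpolation filters defined in the specification:

```python
# Illustrative sketch of reference picture resampling (RPR): when the
# current picture and a reference differ in resolution, the reference is
# resampled to the current picture's size before inter prediction.

def resample(ref, out_w, out_h):
    """Nearest-neighbour resample of a 2-D list 'ref' to out_w x out_h."""
    in_h, in_w = len(ref), len(ref[0])
    return [
        [ref[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

def reference_for_prediction(ref, cur_w, cur_h):
    """Resample the reference only if its resolution differs."""
    if (len(ref[0]), len(ref)) != (cur_w, cur_h):
        ref = resample(ref, cur_w, cur_h)
    return ref  # inter prediction then proceeds from this picture

low_res = [[1, 2], [3, 4]]                    # 2x2 reference picture
up = reference_for_prediction(low_res, 4, 4)  # current picture is 4x4
assert len(up) == 4 and len(up[0]) == 4
assert up[0][:2] == [1, 1]  # each source sample covers a 2x2 area
```

The same resampling machinery is what lets an enhancement layer predict from a lower-resolution base layer, which is why spatial scalability needed only high-level syntax changes rather than new signal-processing tools.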
The scalability-specific HLS in VVC has been designed to be significantly more straightforward than in the multi-layer extensions of HEVC. Additionally, the decoded picture buffer management design and the definitions of the levels for the profiles have been made so that the same design applies to both single-layer and multi-layer bitstreams, which enables single-layer decoders to be easily adapted to support the decoding of multi-layer bitstreams.
LCEVC Also Uses Multiple Layers
Layered techniques also underpin MPEG-5 Part 2 (LCEVC – Low Complexity Enhancement Video Coding). LCEVC differs from the VVC approach in that it uses a different compression method for the enhancement layer.
The LCEVC system will be used by Brazil in its next-generation broadcast system and was recently tested during the World Cup. The Brazilian media company Globo produced a standard signal in 1080i Rec. 709 SDR but broadcast it with an enhancement layer to allow suitable equipment to create a 1080p 10-bit HDR10 signal with Rec. 2020 color encoding.
InterDigital and Technicolor also use a variation on the idea in the latter’s ‘Advanced HDR’ (SL-HDR) technology, which sends an SDR stream along with SL-HDR dynamic metadata in a single stream. Equipment that does not understand the metadata simply drops it, while equipment that does can enhance the SDR base signal to HDR.
VVC also has extra features to allow the different layers to be used for different camera angles, similar to the HEVC multiview extension. However, it does not include the specific tools for coding depth maps that were introduced with the HEVC 3D video coding extension (3D-HEVC); it is expected that depth maps can be compressed adequately with the essential tools of VVC, since they require much lower bit rates than regular pictures.
8K and VVC Multi-Layer Coding
At IBC, one of the impressive features of the NHK demonstration was the level of data transmission needed for the different signals. The HD layer was encoded at just 1 Mbps, with the 4K/UltraHD layer adding a further 9 Mbps (total 10 Mbps) and 8K adding another 25 Mbps, for a total of 35 Mbps using VVC. That’s an impressively low bit rate for what looked like well-encoded content to us (although a trade show codec demonstration is never the same as a full test including particularly challenging content).
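The incremental nature of those figures is worth spelling out: each resolution tier costs only the sum of the layers up to it, not a separate full stream. A trivial sketch using the rates quoted above:

```python
# Cumulative bit rates for the layered stream from the IBC demo figures:
# each layer adds only its incremental data on top of the layers below.
layer_mbps = {"HD": 1, "4K": 9, "8K": 25}  # incremental rates per layer

total = 0
for name, mbps in layer_mbps.items():
    total += mbps
    print(f"{name}: +{mbps} Mbps -> {total} Mbps total")
# HD: +1 Mbps -> 1 Mbps total
# 4K: +9 Mbps -> 10 Mbps total
# 8K: +25 Mbps -> 35 Mbps total
```

Compare that 35 Mbps total with sending three independent streams, where the HD and 4K rates would be duplicated on top of a full 8K encode.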
NHK told us that it did not regard the demonstration’s quality as optimal and that it really regards 50 Mbps as the benchmark for 8K/60p.
Different Transport Methods
A key feature of the multi-layer concept is using different transmission channels for each layer. In particular, broadcasters using a terrestrial transmission system (and many people worldwide rely on DTT transmissions) can enhance the DTT layer with additional content or quality boosting using OTT/internet technology.
ATSC 3.0 supports MIMO technology using simultaneous vertically and horizontally polarized antennas, allowing one layer to be transmitted over each polarization while still permitting a fallback to a single layer where the complete specification is not implemented.
Overall, the VVC specification seems to have the potential to significantly boost TV quality and add new features and services while minimizing the bandwidth of the transmission systems used.