Tests of Live VVC 8K 60fps Encoding Detailed & Demoed at IBC 2023
At IBC 2023, Spin Digital and Intel (both 8K Association members) presented a paper and a demo on live 8K 60 fps 10-bit HDR encoding using the VVC/H.266 codec.
The VVC live encoder developed by Spin Digital achieves a 24% reduction in bitrate compared to HEVC while maintaining quality. In tests on Intel’s 4th generation Xeon® Scalable Processors, the encoder performed real-time 8K 60 fps 10-bit HDR encoding at low latency (1 to 3 seconds).
As the paper highlights, 8K 60 fps 10-bit video generates a data rate of 48 Gbps when uncompressed. All of the live 8K broadcasting demonstrated so far, in services or tests, has used the HEVC/H.265 codec, but the authors point out that HEVC is reaching the point where simply adding more computation yields only marginal compression gains. That means that a new codec is needed.
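The uncompressed figure can be sanity-checked with some simple arithmetic. The sketch below computes the active-picture payload of a 7680×4320, 60 fps, 10-bit signal for the common chroma formats and compares it with the nominal capacity of four 12G-SDI links. Reading the 48 Gbps figure as a quad-link 12G-SDI interface rate is our assumption, not something the article states; the function and variable names are illustrative only.

```python
# Rough data-rate arithmetic for 8K 60 fps 10-bit video (illustrative only).

def active_video_gbps(width, height, fps, bit_depth, samples_per_pixel):
    """Payload of the visible picture only, in Gbps (no blanking or ancillary data)."""
    return width * height * fps * bit_depth * samples_per_pixel / 1e9

# Average number of samples per pixel for each chroma format.
CHROMA_SAMPLES = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}

rates = {
    fmt: active_video_gbps(7680, 4320, 60, 10, samples)
    for fmt, samples in CHROMA_SAMPLES.items()
}

# Assumption: the quoted 48 Gbps is the rounded capacity of four 12G-SDI links.
SDI_12G_GBPS = 11.88                 # nominal 12G-SDI link rate
quad_link_gbps = 4 * SDI_12G_GBPS    # about 47.5 Gbps

# Compression ratio from the quoted 48 Gbps down to the 40 Mbps live target.
compression_ratio = 48_000 / 40      # both values in Mbps

for fmt, gbps in rates.items():
    print(f"{fmt}: {gbps:.2f} Gbps active video")
print(f"quad-link 12G-SDI: {quad_link_gbps:.2f} Gbps")
print(f"48 Gbps -> 40 Mbps is a {compression_ratio:.0f}:1 reduction")
```

Whichever way the source rate is counted, the encoder has to squeeze the signal by roughly three orders of magnitude to reach the 40 Mbps operating point reported later in the article.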
Spin Digital worked closely with Intel to achieve maximum performance on the 4th gen. Xeon® CPU mentioned above. The target platform was a dual-socket 4th gen. Xeon system (aka Sapphire Rapids), which offers up to 60 cores per socket (120 threads with Hyper-Threading Technology) and up to 120 cores in dual-socket systems (240 threads). To exploit the platform efficiently, the encoder has to make use of wide Single Instruction Multiple Data (SIMD) instructions and all the CPU cores (up to 120 cores, 240 threads).
The Encoder Implementation
The high-level architecture, shown in the diagram below, has four main processing stages: input capture, lookahead, VVC encoding, and output muxing and streaming. The paper examines how parallelism can be exploited, an essential step in making full use of architectures with a large number of CPU cores. The authors found that more parallelism can be extracted by adding picture partitions or by increasing the buffer size from the default of one second to two seconds, resulting in higher efficiency without a significant impact on quality. The extra latency is not an issue for use cases such as HLS and DASH streaming, and even for broadcasting (which needs 3 to 11 seconds) the latency could be acceptable.
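To make the four-stage structure more concrete, here is a minimal sketch of a staged pipeline connected by bounded queues, where the queue depth plays the role of the buffer size (frames per second multiplied by buffer seconds). It is not Spin Digital's implementation: every stage body is a stub, all names are hypothetical, and Python threads only illustrate the structure rather than the picture-level and partition-level parallelism a real encoder would use.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            break
        outbox.put(work(item))


def run_pipeline(frames, fps=60, buffer_seconds=1.0):
    # Frames allowed in flight between two stages: a larger buffer gives the
    # stages more independent work to overlap, at the cost of extra latency.
    depth = max(1, int(fps * buffer_seconds))
    stages = [
        lambda f: {"frame": f},                            # input capture (stub)
        lambda d: {**d, "complexity": d["frame"] % 3},     # lookahead (stub)
        lambda d: {**d, "bits": 1000 + d["complexity"]},   # VVC encoding (stub)
        lambda d: (d["frame"], d["bits"]),                 # muxing/streaming (stub)
    ]
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def feed():
        for f in frames:
            queues[0].put(f)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Drain the last queue on the calling thread so no stage blocks forever.
    packets = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        packets.append(item)
    for t in threads:
        t.join()
    return packets


packets = run_pipeline(range(200), fps=60, buffer_seconds=2.0)
print(len(packets), "packets, first:", packets[0])
```

With `buffer_seconds=2.0` at 60 fps, up to 120 frames can sit between two stages, which mirrors the trade-off described above: more work available to run in parallel in exchange for about one more second of latency.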
A Complete Live Encoding Framework
Encoding live signals involves many different processes. The paper lists:
- input capture based on 12G SDI,
- pre-processing (scaling, colour conversion, tone mapping),
- core VVC (and HEVC) encoding,
- audio encoding (MPEG-H Audio, AAC),
- lookahead analysis, Constant Bitrate (CBR) control with HRD model and Variable Bitrate (VBR) control,
- perceptually optimised encoding, and
- streaming for HTTP (HLS, DASH) or TS-over-IP delivery (RTP, SRT, RIST, Zixi).
To meet all these requirements, a complete Live Encoding Framework has been developed.
The group compared a range of hardware and software encoders using HEVC, AV1 and VVC on seven video sequences (from Fraunhofer HHI, PSNC, NHK, Unigine, The Explorers). The format was 7680×4320 pixels, 60 fps, 4:2:0, 10-bit, SDR/HDR, and BT.709/BT.2020.
The video encoders were compared in terms of compression efficiency, encoding complexity and encoding performance. The Bjøntegaard Delta (BD)-rate method was used to compute compression efficiency. It computes the average bitrate increase produced by a test encoder relative to a baseline encoder at the same quality, so a negative value means the test encoder needs fewer bits. SpinHEVC was selected as the baseline encoder.
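As a rough illustration of what a BD-rate number means, the sketch below averages the difference in log-bitrate between two rate-quality curves over their common quality range. Note the simplification: it uses piecewise-linear interpolation and a midpoint-rule integral, whereas the classic Bjøntegaard method fits a cubic polynomial, so it is not the tool used in the paper. The sample points are made up for the example, not measurements from the paper.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled range")


def bd_rate(baseline, test, steps=1000):
    """Average bitrate difference of `test` versus `baseline`, in percent.

    Each argument is a list of (bitrate, quality) points. A negative result
    means the test encoder needs fewer bits for the same quality.
    """
    def as_curve(points):
        pts = sorted(points, key=lambda p: p[1])           # order by quality
        return [q for _, q in pts], [math.log(r) for r, _ in pts]

    q_base, log_base = as_curve(baseline)
    q_test, log_test = as_curve(test)

    lo = max(q_base[0], q_test[0])     # overlapping quality interval
    hi = min(q_base[-1], q_test[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")

    # Midpoint-rule average of the log-bitrate gap over [lo, hi].
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        q = lo + (i + 0.5) * width
        total += _interp(q, q_test, log_test) - _interp(q, q_base, log_base)
    return (math.exp(total / steps) - 1.0) * 100.0


# Hypothetical (bitrate in Mbps, quality score) points, for illustration only.
hevc = [(20.0, 36.0), (40.0, 38.5), (60.0, 40.0), (80.0, 41.0)]
vvc = [(rate * 0.76, quality) for rate, quality in hevc]

print(f"BD-rate: {bd_rate(hevc, vvc):.1f}%")
```

In this synthetic case the test encoder reaches every quality level at 76% of the baseline bitrate, so the result is a BD-rate of -24%, the same size as the saving reported in the paper.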
Four quality metrics were used to understand the results:
- Peak Signal-to-Noise Ratio (PSNR),
- Perceptually Weighted PSNR (XPSNR),
- Multi-Scale Structural Similarity (MS-SSIM), and
- Video Multi-method Assessment Function (VMAF).
The PSNR, XPSNR, and MS-SSIM metrics were calculated using the luma and chroma components (note that VMAF only considers the luma component). The results for each of the metrics were presented in the paper, but we have picked just one, MS-SSIM. The charts were broadly similar across the metrics, although there were differences.
BD-Rate based on MS-SSIM and CPU utilisation time for 8K video relative to the HEVC baseline (SpinHEVC – fast)
The encoding performance, in frames per second (fps), was measured at three different bitrates (20, 40, and 80 Mbps) for the proposed VVC encoder on the two target platforms: a 3rd gen. server with 2x Intel Xeon Platinum 8368 (2x 38 cores, Ice Lake – ICX), and a 4th gen. server with 2x Intel Xeon Platinum 8480+ (2x 56 cores, Sapphire Rapids – SPR). The VVC encoder was also compared to the baseline HEVC encoder.
PSNR-bitrate (left) and VMAF-bitrate (right) curves for BerlinSeqs (one of the most challenging clips)
The paper reported that SpinVVC – fast, running on the ICX server, encoded below the target real-time frame rate. With the additional resources of the SPR server it reached 70 fps at around 30 Mbps (a 1.3x speedup); 70 fps is the speed needed for 60 fps live encoding. With extra spatial or temporal parallelism, the encoder was able to better exploit the CPU resources and produce 8Kp60 VVC live video at 40 Mbps.
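The speed figures quoted above can be related to each other with a few lines of arithmetic. The sketch assumes that the 1.3x speedup is measured against the ICX server, which the article implies but does not state outright; the ICX speed it derives is therefore an inference, not a number from the paper.

```python
# Throughput arithmetic for the reported encoding speeds (illustrative only).

TARGET_FPS = 60.0        # frame rate of the live signal
SPR_FPS = 70.0           # reported SpinVVC speed on the SPR server at ~30 Mbps
SPEEDUP_OVER_ICX = 1.3   # assumption: speedup is relative to the ICX server


def frame_budget_ms(fps):
    """Average time available per frame at a given throughput, in milliseconds."""
    return 1000.0 / fps


implied_icx_fps = SPR_FPS / SPEEDUP_OVER_ICX   # inferred, not a reported figure
headroom = SPR_FPS / TARGET_FPS - 1.0          # margin above the real-time rate

print(f"implied ICX speed: {implied_icx_fps:.1f} fps")
print(f"budget at 60 fps: {frame_budget_ms(TARGET_FPS):.2f} ms per frame")
print(f"budget at 70 fps: {frame_budget_ms(SPR_FPS):.2f} ms per frame")
print(f"headroom on SPR: {headroom * 100:.1f}%")
```

Under that assumption the ICX server would have run at roughly 54 fps, which is consistent with the statement that it fell short of real time. The per-frame budget is an average throughput figure: because many frames are encoded in parallel, an individual frame can take far longer than 16.67 ms to encode.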
The VVC live encoder was demonstrated at the IBC trade show at the Spin Digital booth (Hall 1, B.32).
On the encoder side, the live encoder ran on a server with 2x Intel Xeon Platinum 8480+ processors (2x 56 cores). On the playback side, the VVC streams were live decoded and rendered using Spin Digital’s media player (Spin Player VVC) running on a compact PC with an Intel Core i9-13900K CPU. The PC was connected to a Sony 8K TV (XR-75Z9J) via HDMI 2.1.
In conclusion, the paper highlighted that the VVC encoder achieved compression efficiency gains of 24% (BD-rate VMAF) at the cost of 44% more CPU resources when compared to a highly optimised 8K HEVC baseline.
The encoder, running on a 4th Generation Intel Xeon Scalable CPU server, achieved the performance required for 8Kp60 10-bit HDR live encoding at 40 Mbps, making it a viable choice for live 8K applications over constrained bandwidths, such as terrestrial broadcasting or internet streaming. For other applications where higher bitrates are allowed, optimised HEVC encoders have proven to still be a good live encoding option for delivering high-quality 8Kp60 10-bit HDR video.
The on-site demonstration highlighted that the techniques for real-time encoding presented in the paper are ready to be used in live streaming and broadcasting applications.
As future work, quality and performance enhancements, including perceptually-optimized encoding, live ABR streaming, and content-aware live encoding, among others, will be implemented in the 8K VVC encoder and tested on state-of-the-art computing platforms.
The full paper can be downloaded here (registration required)
More information about the demonstration can be found here.