AMD Adds Media Accelerator for Live Video Applications
In the week before NAB, AMD made an interesting announcement about new media acceleration hardware and we were able to get a briefing about what it enables.
For those who don’t follow the chip industry closely, just over a year ago AMD completed the acquisition of Xilinx, a key developer of Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). These chips are used by product developers both in early-stage markets (when volumes may be lower and requirements less clear) and in professional applications where volumes will never reach the levels needed to justify the kind of high-volume semiconductors used in consumer devices. The combination of AMD’s CPUs and GPUs with this kind of custom chip should lead to some interesting developments in the next few years.
One of the first products to come out after the acquisition is the MA35D Media Accelerator Card. The card is intended for use in servers and data centers to support a growing trend: the huge increase in live streaming with reduced latency. The technology is likely to be important in meeting new needs in live streaming.
From one-to-many to many-to-many
In recent years, the main use for live streaming has been to relay one or two streams of live content, such as sports events or concerts, in a ‘one-to-many’ architecture. As has often been pointed out, this tends to introduce significant latency in encoding. Sometimes latency is added deliberately for editorial reasons, e.g. to avoid bad language or for other kinds of censorship, but for sports content it is often an irritation. For example, an app may show a score before the live stream does. Still, ‘traditional’ TCP-based architectures feeding into Content Delivery Networks (CDNs) are economical and practical for this kind of application. The new board has latency as low as 8ms for a 4K stream.

However, in recent years, much of the growth in internet-delivered content has been in multi-stream or bi-directional video. Now that many people are equipped for, and used to, Zoom and other video services, there is a growing desire for video interactivity, and there latency is a real problem. You may also need to transcode between different codecs on the input and output sides.

Technically, this has been possible for some time, but as demand grows for this kind of service and for other low-latency services such as cloud gaming and online betting, it is essential that these operations can be supported economically, in terms of the hardware as well as the energy and running costs. That’s where the new AMD MA35D card comes in.

The card has two accelerator chips, which means that the board can transcode 32 streams of 1080P60 at a cost of $1,595, or around $50 per stream. The chips are built using an efficient 5nm-class process. The half-height, half-length card also takes only 1W per stream. Because of the low power consumption and small size, AMD told us that you could put 8 or 10 boards in a single chassis and support up to 256 streams of 1080P60 AV1 in one server. In a data center, servers could also be stacked up for more capacity and very cost-effective scaling.
AMD highlighted to us that it has an existing product, the Alveo U30, that is already used by AWS. However, the new board is much more powerful, with lower power consumption and a lower cost per feed.
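As a rough check on the per-card figures quoted above, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that capacity and power scale linearly with the number of streams and cards, which the article does not confirm.

```python
# Figures quoted in the article for a single MA35D card.
CARD_PRICE_USD = 1595
STREAMS_PER_CARD = 32   # 1080p60 transcodes per card
WATTS_PER_STREAM = 1    # as quoted; treated here as an approximate figure


def cost_per_stream(price_usd=CARD_PRICE_USD, streams=STREAMS_PER_CARD):
    """Hardware cost per transcoded stream, in US dollars."""
    return price_usd / streams


def card_power_watts(streams=STREAMS_PER_CARD, watts_per_stream=WATTS_PER_STREAM):
    """Estimated card power at full load (assumes linear scaling)."""
    return streams * watts_per_stream


def server_capacity(cards, streams_per_card=STREAMS_PER_CARD):
    """Total 1080p60 streams for a chassis (assumes linear scaling)."""
    return cards * streams_per_card


if __name__ == "__main__":
    print(f"Cost per stream: ${cost_per_stream():.2f}")
    print(f"Power per card:  {card_power_watts()} W")
    print(f"8-card server:   {server_capacity(8)} streams")
```

Under these assumptions the cost works out a little under $50 per stream, and eight cards account for the 256 streams quoted for one server.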
You Can Create Output up to 8K @ 60fps
Combining multiple lower-resolution streams into a single 8K60P output can be a compelling use case. That output could include a main UltraHD image with surrounding videos for extra information or social feeds. Equally, you could support multiple video streams for cloud gaming applications. Moving the video encoding to a separate media accelerator can also take some workload off a GPU, allowing better response times, higher frame rates or higher image quality.
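To make the composite idea concrete, here is a minimal sketch of one possible layout: a centered UltraHD main picture on an 8K canvas, with 1080p tiles filling the rest. The layout, the function name and the tile size are illustrative assumptions, not anything AMD described.

```python
def surround_layout(canvas=(7680, 4320), main=(3840, 2160), tile=(1920, 1080)):
    """Return (main_rect, tile_rects) as (x, y, width, height) tuples.

    Assumes the canvas is an exact multiple of the tile size and that the
    centered main picture lines up with the tile grid, so checking each
    tile's top-left corner is enough to detect overlap with the main picture.
    """
    cw, ch = canvas
    mw, mh = main
    tw, th = tile
    cols, rows = cw // tw, ch // th

    # Center the main picture on the canvas.
    mx, my = (cw - mw) // 2, (ch - mh) // 2

    tiles = []
    for row in range(rows):
        for col in range(cols):
            x, y = col * tw, row * th
            covered = mx <= x < mx + mw and my <= y < my + mh
            if not covered:
                tiles.append((x, y, tw, th))
    return (mx, my, mw, mh), tiles


if __name__ == "__main__":
    main_rect, tiles = surround_layout()
    print("Main picture:", main_rect)
    print("Surrounding 1080p tiles:", len(tiles))
```

With the default sizes, the 8K canvas is a 4x4 grid of 1080p cells; the UltraHD picture occupies the middle four, leaving twelve cells for surrounding feeds.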

The chips support a range of codecs, from H.264 and H.265 to AV1 and VP9, and the boards have been developed over more than three years with customers. The card has also been specifically designed for infrastructure applications rather than being a re-purposed consumer device. As discussed when we spoke to Spin Digital about its live VVC encoding, the most recent codecs impose a big computational load to achieve low bitrates. AMD has done a lot of work to help the board balance different streams, with dynamic configuration so that resources can be applied where they can best be used. There is a frame-by-frame feedback loop in the system to evaluate encoding and transcoding quality for optimum results. With the dynamic configuration, there is less need to ‘configure for the worst case’, as was often done in previous generations.
AMD will be at NAB to show its new board in action.