Upscaling Reviewed (Part 1)
Currently, the production of native 8K content is still in the ramp-up stage, so getting the best viewing experience is often about making the most of ‘upscaling’ technology: taking lower-resolution content and matching it to the resolution of the display. That technology has been through several distinct phases in TVs. We are planning a later article looking at some of the practical challenges of upscaling in TVs, but in this ‘part 1’, we’ll look at the process from a theoretical point of view.
- Mathematical algorithms
- Bicubic & multi-frequency peaking
- Lookups and mapping
- AI-based upscaling
- Deep-learning upscaling
The first upscaling methods were simple, based on averaging, bicubic, or other mathematical functions. These work to ‘fill in the gaps’ between known pixels: the available pixel data is spread out over the new, larger resolution area, leaving gaps in between that need filling.
The crudest method is to take the ‘nearest neighbor’ pixel and duplicate it, but that just makes a poor image look bigger, not better. (For a more detailed look at how this might be done, look here.)
This approach doesn’t improve the image. At the very least, you need to average or calculate a suitable value for the missing pixel using the pixels you have. Approaches that use this kind of method include bilinear filtering or interpolation (Wikipedia has more on this here). Other approaches use more sophisticated algorithms, such as bicubic interpolation.
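To make the difference concrete, here is a minimal sketch of nearest-neighbor duplication versus bilinear interpolation at a 2x scale factor. Plain Python lists stand in for image data, and the pixel values are invented for illustration:

```python
def upscale_nearest(img, factor):
    """Nearest neighbor: each source pixel is simply duplicated."""
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor]
             for x in range(w * factor)]
            for y in range(h * factor)]

def upscale_bilinear(img, factor):
    """Bilinear: each new pixel is a weighted average of the four
    nearest source pixels (corner-aligned sampling)."""
    h, w = len(img), len(img[0])
    H, W = h * factor, w * factor
    out = []
    for y in range(H):
        # Map the output coordinate back into source space.
        sy = y * (h - 1) / (H - 1) if H > 1 else 0
        y0, fy = int(sy), sy - int(sy)
        y1 = min(y0 + 1, h - 1)
        row = []
        for x in range(W):
            sx = x * (w - 1) / (W - 1) if W > 1 else 0
            x0, fx = int(sx), sx - int(sx)
            x1 = min(x0 + 1, w - 1)
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

src = [[0, 100],
       [100, 200]]
# Nearest neighbor just enlarges the blocks:
# [[0, 0, 100, 100], [0, 0, 100, 100],
#  [100, 100, 200, 200], [100, 100, 200, 200]]
print(upscale_nearest(src, 2))
# Bilinear produces a smooth ramp between the known values instead.
print(upscale_bilinear(src, 2))
```

The nearest-neighbor output keeps the original hard block edges, while the bilinear output fills the gaps with intermediate values, which is exactly why it looks smoother but also softer.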
However, this upscaling tends to be quite blurry, especially when the content has already been compressed, as it is with video data. More sophisticated upscaling algorithms process the data differently at different spatial frequencies to preserve detail and sharpness. Some sophisticated tools can do a surprisingly good job of this, but they tend to be complex. In the case of video, there is only limited time and computing power available to process the data.
AI and Lookups Allow a Breakthrough
The big breakthrough in upscaling came from AI and machine learning. In this process, neural networks are fed huge numbers of high-resolution images and corresponding low-resolution equivalents. The system learns to recognize patterns in the downscaled images and substitute the high-resolution image segments that most closely match the downscaled image. This process avoids breaking one of the fundamental laws of signal processing: you can’t create information that isn’t there initially. You effectively combine the information from the low-resolution image with a high-resolution database to get excellent results.
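The lookup idea can be sketched very crudely. In this toy example the ‘database’ and all patch values are invented for illustration, and real systems encode millions of such correspondences implicitly in network weights rather than in an explicit table:

```python
# Toy "database" of low-res patch -> high-res patch pairs, the kind
# of mapping a trained upscaler encodes implicitly in its weights.
database = {
    (0, 0):     (0, 0, 0, 0),                 # flat dark
    (200, 200): (200, 200, 200, 200),         # flat bright
    (0, 200):   (0, 30, 170, 200),            # sharp rising edge
    (200, 0):   (200, 170, 30, 0),            # sharp falling edge
}

def upscale_patch(patch):
    """Substitute the stored high-res patch whose low-res key is
    closest (in squared error) to the input patch."""
    best = min(database, key=lambda k: sum((a - b) ** 2
                                           for a, b in zip(k, patch)))
    return database[best]

# A noisy low-res patch is matched to the rising-edge example:
print(upscale_patch((10, 190)))   # -> (0, 30, 170, 200)
```

Note that the reconstructed edge is sharper than any interpolation of the two input samples could be: the extra detail comes from the stored high-resolution examples, not from the input itself, which is how the information ‘law’ above is respected.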
How does the system know whether the image produced by this process is faithful to the original? Well, in the initial training, you can compare the images created with the original ‘ground truth’ images using a mathematical technique. A traditional way of doing this is with the Mean Squared Error (MSE): you can judge how well the image is scaled by minimizing the MSE. However, this measure does not consider the structure or shapes of the original image, and its scores don’t correlate well with perceived visual quality, as several different distortions, some of them quite obvious, can produce the same MSE score.
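That weakness is easy to demonstrate. In this minimal sketch (pixel values are made up for illustration), two very different distortions of the same flat grey patch produce identical MSE scores:

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

ground_truth = [50] * 16            # a flat mid-grey patch

# Distortion 1: a mild uniform brightness shift (hard to notice).
shifted = [v + 5 for v in ground_truth]

# Distortion 2: one pixel badly wrong (an obvious speckle artefact).
speckled = ground_truth[:]
speckled[0] += 20

print(mse(ground_truth, shifted))   # 25.0
print(mse(ground_truth, speckled))  # also 25.0, despite looking far worse
```

To MSE the two errors are interchangeable; to a viewer they clearly are not.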
Scientists at the University of Texas at Austin developed a different measurement called the Structural Similarity Index nearly 20 years ago (Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, 2004). The developers won an Emmy Award for the work, which was a significant breakthrough in objective image quality assessment.
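SSIM compares the means, variances, and covariance of the two images rather than raw pixel differences, so it is far more sensitive to structural damage. Here is a single-window sketch of the published formula (real implementations compute it over small local windows and average the results; the test signals are invented for illustration):

```python
def ssim(a, b, L=255):
    """Single-window Structural Similarity index for two pixel lists.
    L is the dynamic range; C1 and C2 are the standard stabilising
    constants from the SSIM paper."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / \
           ((mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))

flat     = [50] * 16
shifted  = [55] * 16          # mild uniform shift (MSE would be 25)
speckled = [70] + [50] * 15   # one glaring outlier (MSE also 25)

print(ssim(flat, shifted))    # close to 1: structure preserved
print(ssim(flat, speckled))   # much lower: structure disturbed
```

The two distortions that MSE cannot tell apart get very different SSIM scores, matching what a viewer would actually report.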
Upscalers Can be Trained to Create Better Looking Images
Upscalers can be trained to be more accurate and to look better to the human visual system. Further developments of this approach balance the ‘looks better but less accurate’ processing against the ‘less good-looking but more accurate by MSE’ approach, allowing us to find a ‘sweet spot’ for the processing.
One advantage of video upscaling over still image upscaling is that sometimes there are multiple frames of images, with slight movement between the frames. That, effectively, allows the process to use more samples of the image to create a higher resolution image. However, the processing takes more time that way, so it wouldn’t be suitable, for example, for use on gaming content (and cloud gaming is seen as a significant way for TVs to be used in the future by many analysts such as Omdia).
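A much-simplified sketch of the multi-frame idea: if a second frame of the same scene is captured with a half-pixel shift, its samples fall exactly between those of the first frame, and interleaving the two sample grids doubles the effective resolution. The values are invented for illustration, and real systems must first estimate the sub-pixel motion between frames, which is where the extra processing time goes:

```python
# Two low-res frames of the same (static) scene. Frame B is assumed
# to be captured with a known half-pixel shift relative to frame A.
frame_a = [10, 30, 50, 70]    # samples at positions 0, 1, 2, 3
frame_b = [20, 40, 60, 80]    # samples at positions 0.5, 1.5, 2.5, 3.5

# Interleaving the two sample grids recovers a double-resolution signal:
high_res = [v for pair in zip(frame_a, frame_b) for v in pair]
print(high_res)   # [10, 20, 30, 40, 50, 60, 70, 80]
```

Unlike interpolation, every value in the output here is a genuine measurement, which is why multi-frame processing can recover real detail rather than inventing it.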
Optimize and Battle it Out
You can also optimize the scaling based on an analysis of the content. For example, you may have a specific process for upscaling people or faces, and this more specialized algorithm can do a better job. Another example is the upscaling of images of grass. These are often important in sports and other content, and if the video processor can recognize that it is dealing with grass, it can optimize the output.
Generative Adversarial Networks (GANs) can be used to train the AI upscaling systems. One network is trying to upscale the image, while a second network analyzes the upscaled image to see if it ‘looks real’. This feedback helps the upscaling system develop more accurate upscaling with better and more detailed textures and even sharper images that look more realistic. You can find an excellent discussion of this process here.
Part 2 will be published shortly.