February 6, 2024

Apple Vision Pro Boosts 8K Capture

Unless you never read the technology media (and that would probably exclude you from reading this!), you will know that Apple has finally released its XR headset, the Vision Pro, in the US, so the rest of the world will have to wait for now. There have been a number of reports of Apple creating 8K immersive video content. The first four titles are:

  • Wild Life “Rhinos”
  • Adventure “Highlining”
  • Prehistoric Planet Immersive “Pterosaur Beach”
  • “Alicia Keys Rehearsal Room”

These titles are captured with Apple’s Spatial Audio and can be viewed in 3D on the Vision Pro headset. We dug a bit more into the format used for the video.

Capturing with a headset or iPhone

The Vision Pro can capture stereo 3D content with its built-in cameras. The video is reported to be stored in the MV-HEVC (Multi-View High-Efficiency Video Coding) format at around 130 Mbps. The resolution is just 1080p at 30fps. An iPhone 15 Pro or Pro Max (or later) can also capture stereo 3D video at this quality.
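To put the reported bitrate in context, here is a minimal sketch of the storage it implies. It assumes the roughly 130 Mbps figure is a constant average bitrate, which is our simplification; the function name is ours and purely illustrative.

```python
def recording_size_gb(bitrate_mbps: float, seconds: float) -> float:
    """Approximate file size in gigabytes (10^9 bytes) at a constant bitrate."""
    bits = bitrate_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


# Assumption: the reported ~130 Mbps holds as an average over the clip.
print(recording_size_gb(130, 60))  # just under 1 GB per minute
```

On that assumption, a minute of spatial video takes just under 1 GB of storage.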

This kind of content will give you a 3D experience but will only show the scene from the single viewpoint of the camera that created the content – impressive, according to reports, but relatively limited. Much more interesting to us is the 180 degree immersive content.

Capturing 180 Degree content

Now, to create truly immersive content, where the viewer can look around and select their own viewing direction, you need a means to capture two different (left eye/right eye) 180-degree views of the scene, and that needs a special lens system. One such is the Canon RF 5.2mm F2.8 L Dual Fisheye, a dual fish-eye lens designed to work with the EOS R5 or EOS R5C camera bodies, both of which support 8K Raw capture. At IBC in 2023, Canon also added support for the EOS R6 Mark II body, which is a lower cost device but captures only in 4K resolution, downscaled from 6K.

Canon’s EOS R5 captures in 8K Raw for high quality stereo 3D viewing.

As well as these Canon options, there are dedicated cameras such as the Kandao 8K VR Cam (DU1104). Unlike the Canon model, the Kandao camera uses a separate camera sensor for each eye. It can also support live streaming of VR180 content.

Once the video has been captured, it needs to be edited and post-processed. Major editors such as Final Cut Pro (of course), DaVinci Resolve and Adobe Premiere support stereo 3D formats.

When you have this kind of dual lens 8K system, the image from each lens is captured on one side of the sensor, so you have dual 4K x 4K capture. (If you are new to this kind of technology, there is a useful introductory guide here.)

Viewing the Images

Now, Apple has not released official ‘field of view’ figures for the Vision Pro, but those who have dug into it suggest that it might be around 100 degrees, slightly less than the 110 degrees of the Quest 3 from Meta. If we assume that is the case, then, based on the idea that around 4K is available for each eye using a dual lens and MV-HEVC, each eye will be receiving around 2133 pixels (3840 x 100/180; see note 1). That also assumes that the capture of the 180 degree video is reasonably linear. If there are just over 2K pixels in the visible part of the image, then the video will have to be upscaled to match the 4K per eye resolution of the Vision Pro headset. So the visual quality is not likely to be optimal, but should still be impressive.
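The arithmetic above can be sketched in a few lines. This assumes, as the text does, a linear mapping of pixels to degrees, a nominal 3840 pixel width per eye and a 100 degree field of view (an estimate, not an Apple figure). The function name is ours.

```python
def source_pixels_in_view(source_width_px: float,
                          capture_fov_deg: float,
                          headset_fov_deg: float) -> float:
    """Source pixels (per eye) that fall inside the headset's field of view,
    assuming pixels are spread linearly across the captured angle."""
    return source_width_px * headset_fov_deg / capture_fov_deg


visible = source_pixels_in_view(3840, 180, 100)
print(round(visible))            # 2133 source pixels across the view
print(round(3840 / visible, 2))  # 1.8x upscale to fill a nominal 4K-wide display
```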

By our calculation, video content that matched the resolution of the headset would need a resolution of around 7K per eye, so around 14K in total if the images were side by side on a single sensor. (That figure is also confirmed by a blog post from Tiledmedia, a specialist technology company in VR180 and VR360, that was posted when the headset was announced.) On the other hand, that level of resolution is quite possible using dual 8K cameras in a stereo pair.
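The 7K figure follows from running the field-of-view ratio in the other direction. The sketch below assumes a linear pixel-to-angle mapping, a 100 degree field of view and a nominal 3840 pixel display width per eye; all three are assumptions, and the function name is ours.

```python
def required_source_width(display_width_px: float,
                          headset_fov_deg: float,
                          capture_fov_deg: float) -> float:
    """Per-eye source width needed so that the visible window matches the
    display pixel for pixel, assuming a linear pixel-to-angle mapping."""
    return display_width_px * capture_fov_deg / headset_fov_deg


per_eye = required_source_width(3840, 100, 180)
print(round(per_eye))      # 6912, i.e. around 7K per eye
print(round(per_eye * 2))  # 13824, i.e. around 14K side by side
```

Using iFixit's later estimate of 3660 pixels across instead of 3840 gives about 6588 pixels per eye, which is still in the region of 7K.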

Send Some or Send All?

We’re assuming that the current approach is a simple ‘send everything and let the headset select which portion to show’ one. There is an alternative approach under development by organizations such as Fraunhofer HHI and Tiledmedia. In this approach, a server stores the full video, and the headset sends data identifying the area of interest, based on the position of the headset and the direction of gaze. A ‘window’ from the full video is then transmitted to the headset rather than the whole image. This requires both rapid data transmission with very low latency and extremely fast encoding of the video. Both are challenges, but the arrival of 5G technologies and better semiconductors might well help remove those barriers over time.
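To illustrate the general idea of sending only a window, here is a minimal sketch that picks which vertical strips (tiles) of a 180 degree image overlap the viewer's current view. It is not Fraunhofer HHI's or Tiledmedia's actual method. The nine-column grid, the 100 degree field of view and the function name are all our assumptions, and it only considers horizontal head rotation.

```python
import math


def tiles_in_view(yaw_deg: float,
                  headset_fov_deg: float = 100.0,
                  capture_fov_deg: float = 180.0,
                  tile_columns: int = 9) -> list[int]:
    """Return the indices of the tile columns that overlap the viewport.

    yaw_deg is the viewing direction, with 0 at the centre of the captured
    hemisphere and -90/+90 at its left and right edges.
    """
    tile_width = capture_fov_deg / tile_columns
    half = capture_fov_deg / 2
    # Clamp the viewport to the captured area.
    left = max(yaw_deg - headset_fov_deg / 2, -half)
    right = min(yaw_deg + headset_fov_deg / 2, half)
    if left >= right:
        return []
    first = int((left + half) // tile_width)
    last = math.ceil((right + half) / tile_width) - 1
    return list(range(first, min(last, tile_columns - 1) + 1))


print(tiles_in_view(0.0))   # [2, 3, 4, 5, 6]: 5 of the 9 columns
print(tiles_in_view(40.0))  # [4, 5, 6, 7, 8]
```

With these numbers, only five of the nine columns need to be sent at any moment. That is where the bandwidth saving, or the extra quality within the same bandwidth, comes from.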

Tiledmedia has technology to only transmit a portion of the image, reducing bandwidth but also allowing higher quality within a given bandwidth.

Volumetric Video

Stereo 3D can be a compelling experience, but in the longer term, researchers are working to develop volumetric video, where a 3D space is mapped entirely in video. In this case, the user is free to move about in the 3D space and view the scene from different viewing points and directions. However, that’s still very much in the R&D phase.

  1. Since we published this article, iFixit has done a ‘deep dive’ into the Vision Pro display and has estimated the resolution as 3660 px by 3200 px (albeit with the corners ‘chopped off’ by the optics). The new numbers don’t change the idea that the 2K window on the two 4K images will need to be upscaled.