AWE 2024 VR – Hypervision, Sony XR, Big Screen, Apple, Meta, & LightPolymers

22 July 2024 at 16:45

Introduction

Based on information gathered at SID Display Week and AWE, I have many articles to write based on the thousands of pictures I took and things I learned. I have been organizing and editing the pictures.

As its name implies, Display Week is primarily about display devices. My major takeaway from that conference is that many companies work on full-color MicroLEDs with different approaches, including quantum dot color conversion, stack layers, and single emitter with color shifting based on current or voltage.

AWE moved venues from the Santa Clara Convention Center in Silicon Valley to the larger Long Beach Convention Center south of LA. More than just a venue shift, I sensed a shift in direction. Historically, at AWE, I have seen many optical see-through AR/MR headsets, but there seemed to be fewer optical headsets this year. Instead, I saw many companies with software running on VR/Passthrough AR headsets, primarily on the Meta Quest 3 (MQ3) and Apple Vision Pro (AVP).

This article was partly inspired by Hypervision’s white paper discussing whether micro-OLEDs or small LCDs were the best path to 60 pixels per degree (PPD) with a wide FOV combined with the pictures I captured through Hypervision’s HO140 (140° diagonal FOV per eye) optics at AWE 2024. I have taken thousands of pictures through various headsets, and the Hypervision picture stood out in terms of FOV and sharpness. I have followed Hypervision since 2021 (see Appendix: More on Hypervision).

I took my first pictures at AWE through the Sony XR (SXR) Headset optics. At least subjectively, in a short demo, the SXR’s image quality (sharpness and contrast) seemed higher than that of the AVP, but the FOV was smaller. I had on hand thousands of pictures I had taken through the Big Screen Beyond (BSB), AVP, Meta Quest Pro (MQP), and Meta Quest 3 (MQ3) optics with the same camera and lens, plus a few of the Hypervision HO140 prototype. So, I decided to make some comparisons between various headsets.

I also want to mention LightPolymers’ new Quarter Waveplate (QWP) and Polarization technologies, which I first learned about from a poster in the Hypervision AWE booth. In April 2024, the two companies announced a joint development grant. They offer an alternative to the plastic film QWP and Polarizers, where 3M dominates today.

Hypervision’s HO140 Display

Based on my history of seeing Hypervision’s 240° prototypes for the last three years, I had, until AWE 2024, largely overlooked their single-display 140° models. I had my Canon R5 (45MP, with a 405MP “3×3 sensor pixel shift” mode) and tripod with me at AWE this year, so I took a few high-resolution pictures through the optics of the HO140. Below are pictures of the 240° (left) and 140° (right) prototypes in the Hypervision booth. Hypervision is an optics company, not a headset maker, and the demos are meant to show off their optics.

When I got home and looked at the pictures through the HO140, I was impressed by the overall image quality, particularly after having taken thousands of pictures through the Apple Vision Pro (with Micro-OLED displays), Meta’s Quest Pro and Quest 3 (both with mini-LCD displays), and the Big Screen Beyond. It usually takes me considerable time and effort, as well as multiple reshoots, to find the “sweet spot” for the other devices, but I got good pictures through the HO140 with minimal effort and only a few pictures, which suggests a very large sweet spot in Hypervision’s optical design. Note that the HO140 is a prototype of unknown cost that I am comparing to production products, and I only have this one image to go by and not a test pattern.

The picture below is from my Canon R5, with a 16mm lens netting a FOV of 97.6° horizontal by 73.7° vertical. It was shot at 405mp and then reduced to 45mp to avoid moiré effects due to the “beat frequencies” between the camera sensor and the display devices with their color subpixels. All VR optics pincushion, which causes the pixel sizes to vary across the display and increases the chance of getting moiré in some regions.

The level of sharpness throughout the HO140’s image relative to other VR headsets suggests that it could support a higher-resolution LCD panel with a smaller pixel size if it existed. Some significant chroma aberrations are visible in the outer parts of the image, but these could be largely corrected in software.

Compared to other VR-type headsets I have photographed, I was impressed by how far out into the periphery of the FOV the image maintains sharpness while supporting a significantly larger FOV than any other device I have photographed. What I can’t tell without being able to run other content, such as test patterns, is the contrast of the display and optics combination.

I suggest also reading Hypervision’s other white papers on their Technology & Research page. Also, if you want an excellent explanation of pancake optics, I recommend the one-hour and 25-minute presentation on YouTube by Arthur Rabner, CTO of Hypervision.

Sony XR (SXR)

Mechanical Ergonomics

AWE was my first time trying the new Sony XR (SXR) headset. In my CES 2024 coverage, I wrote about the ergonomic features I liked in Sony XR (and others compared to Apple Vision Pro). In particular, I liked the headband approach with the flip-up display, and my brief try with the Sony headset at AWE seemed to confirm the benefits of this design choice (which is very similar to the Lynx R1 headset), at least from the ergonomics perspective relative to the Apple Vision Pro.

Still, the SXR is pretty big and bulky, much more so than the AVP or Lynx. Having only had a short demo, I can’t say how comfortable it will be in extended use. As was the case for the HO140, I couldn’t control the content.

“Enterprise” Product

Sony has been saying that this headset primarily aims at “enterprise” (= expensive high-end) applications, and they partner with Siemens. It is much more practical than the Apple Vision Pro (AVP). The support on the head is better; it supports users wearing their glasses, and the display/visor flips up so you can see the real world directly. There is air circulation to the face and eyes. The headset also supports adjustment of the distance from the headset to the eyes. The headset allows peripheral vision but does have a light shield for full VR operation. The headset is also supposed to support video passthrough, but that capability was not demonstrated. As noted in my CES article, the SXR headset put the pass-through cameras in a much better position than the AVP.

Display Devices and Image Quality

Both the AVP and SXR use ~4K micro-OLED display devices. While Sony does the OLED Assembly (applying the OLED and packaging) for its headset and the AVP’s display devices, the AVP reportedly uses a custom silicon backplane designed by Apple. The SXR’s display has ~20% smaller 6.3-micron pixels than the AVP’s 7.5-micron. The device size is also smaller. The size factors of the SXR favor higher angular resolution and a smaller FOV, as is seen with the SXR.

The picture below was taken (handheld) with my 45MP Canon R5 camera with a 16mm lens, as with the HO140, but because I couldn’t use a tripod, I couldn’t get a 405MP picture with the camera’s sensor shifting. I was impressed that I got relatively good images handheld, which suggests the optics have a much larger sweet spot than the AVP, for example. Getting good images with the AVP requires my camera lens to be precisely aligned into the relatively small sweet spot of the AVP’s optics (using a 6-degree-of-freedom camera rig on a tripod). I believe the Apple Vision Pro’s small sweet spot and the need for eye-tracking-based lens correction, and not just for foveated rendering, are part of why the AVP has to be uncomfortably clamped against the user’s face.

Given that I was hand-holding both the headset and camera, I was rather surprised that the pictures came out so well (click on the image to see it in higher, 45mp resolution).

At least in my brief demo, the image quality of the SXR’s optics seems better than the AVP’s. The images seem sharper with less chroma (color) aberration. The AVP seems heavily dependent on eye tracking to correct problems with its optics, but it does not always succeed.

Much more Eye Relief (enabling eyeglasses) but lower FOV

I was surprised by how much eye relief the SXR optics afforded compared to the AVP and BSB, which also use Micro-OLED microdisplays. Typically, the requirement for high magnification of the micro-OLED pixels compared to LCD pixels inherently makes eye relief more difficult. The SXR magnifies less, resulting in a smaller FOV, but also makes it easier optically for them to support more eye relief. But note, taking advantage of the greater eye relief will further reduce the FOV. The SXR headset has a smaller FOV than any other VR-type headset I have tried recently.

Novel Sony controllers were not a hit

While I will credit Sony for trying something new with the controllers, I don’t think the finger trackpad and ring controllers are great solutions. I talked with several people who tried them, and no one seemed to like either controller. It is hard to judge control devices in a short demo; you must work with them for a while. Still, they didn’t make a good first impression.

VR Headset “Shootout” between AVP, MQP, Big Screen Beyond, Hypervision, and Sony XR

I have been shooting VR headsets with the Canon R5 with a 16mm lens for some time and have built up a large library of pictures. For the AVP, Big Screen Beyond (BSB), and Meta Quest Pro (MQP), I had both the headset and the camera locked down on tripods so I could center the lens in the sweet spot of the optics. For the Hypervision, while the camera and headset were on tripods, my camera was only on a travel tripod, and I had neither my 6-degree-of-freedom rig nor the time to precisely locate the headset’s optical sweet spot. The SXR picture was taken with my hand holding the headset and the camera.

Below are through-the-optics pictures of the AVP, BSB, MQP, Hypervision HO140, and SXR headsets, all taken with the same camera and lens combination and scaled identically. This is not a perfect comparison as the camera lens does not work identically to the eye (which also rotates), but it is reasonably close. The physically shorter and simpler 16mm prime (non-zoom) lens lets it get inside the eye box of the various headsets for the FOV it can capture.

FOV Comparison (AVP, SXR, BSB, HO140, MQ3/MQP)

While companies will talk about the number of horizontal and vertical pixels of the display device, the pixels in the periphery of the display are cut off by the optics, which tend to be circular. All the VR headset optics have a pincushion distortion, which results in higher resolution in the sweet spot (optical center), which is always toward the nose side and usually above the center for VR headsets.

In the figure below, I have overlaid the FOV of the left eye for the headsets on top of the HO140 picture. I had to extrapolate somewhat on the image circles on the top and bottom as the headset FOVs exceeded the extent of the camera’s FOV. The HO140 supports up to a 2.9″ diagonal LCD (that does not exist yet), but they currently use a 2.56″ 2160×2160 octagonal BOE LCD. The HO140’s FOV is so far beyond the FOV of my camera lens that I used Hypervision’s information for it.

As can be seen, the LCD-based headsets of Hypervision and Meta typically have larger FOV than the micro-OLED-based headsets of AVP, BSB, and Sony. However, as will be discussed, the micro-OLED-based headsets have smaller pixels (angularly and on the physical display device).

Center Pixels (Angular Size in PPD)

Due to handholding the SXR and having pixels smaller than the AVP, I couldn’t get a super-high-resolution (405 mp) image from the center of the FOV and didn’t have the time to use a longer focal length lens to show the pixel boundaries. The SXR has roughly the same number of pixels as the AVP but a smaller FOV, so its pixels are angularly smaller than the AVP’s. I would expect the SXR to be near 60 pixels per degree (PPD) in the center of the FOV. The BSB has about the same FOV as the AVP but has a ~2.5K micro-OLED compared to the AVP’s ~4K; thus, the BSB pixels in the center are about 1.5x bigger (linearly). The Hypervision’s display has a slightly smaller center pixel pitch than the MQP (and MQ3) but with a massively bigger FOV.

The MQP (and the very similar MQ3) rotate the display device. To make it easier to compare the pixel pitches, I included a rotated inset of the MQP pixels to match the alignment of the other devices. Note that the pictures below are all “through the optics” and thus include the headset’s optical magnification. I have indicated the angular resolution (in pixels-per-degree, PPD) for each headset’s center pixels. For the center pixel pictures below, I used a 28mm lens to get more magnification to see sub-pixel detail for the AVP, BSB, and MQP. I only took 16mm lens pictures of the HO140 and, therefore, rescaled the image based on the different focal lengths of the lenses.

The Micro-OLED-based headsets require significantly more optical magnification than the LCD models. For example, the AVP has 3.2x (linearly) smaller display device pixels than the MQP, but after the optics, its pixels are only ~1.82x smaller angularly. In other words, the AVP’s optics magnify the display ~1.76x more than the MQP’s (3.2 / 1.82 ≈ 1.76).
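The ratio above can be reproduced from the numbers quoted in this article. The following is only a minimal sketch: the function name is made up for illustration, the pixel pitches are the 24-micron (MQP) and 7.5-micron (AVP) values given elsewhere in the article, and the ~1.82x angular ratio is taken as given from my through-the-optics pictures.

```python
def relative_magnification(big_pitch_um, small_pitch_um, angular_ratio):
    """How much more the small-pixel headset magnifies its display.

    big_pitch_um / small_pitch_um is the linear pixel-size ratio on the
    display devices; angular_ratio is how much smaller the small-pixel
    headset's pixels still look after the optics. Dividing the two gives
    the extra magnification the small-pixel headset's optics must provide.
    """
    pitch_ratio = big_pitch_um / small_pitch_um
    return pitch_ratio / angular_ratio


if __name__ == "__main__":
    # MQP ~24-micron pixels, AVP ~7.5-micron pixels, ~1.82x angular ratio
    extra = relative_magnification(24.0, 7.5, 1.82)
    print(f"AVP magnifies ~{extra:.2f}x more than the MQP")
```

With these inputs, the result is about 1.76.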

Outer Pixels

I captured pixels from a similar (very approximately) distance from the optical center of the lens. The AVP’s “foveated rendering” makes it look worse than it is, but you can still see the pixel grid with the others. Of the micro-OLED headsets, the BSB and SXR seem to do the best regarding sharpness in the periphery. The Hypervision HO140 pixels seem much less distorted and blurry than those of any of the other headsets, including the MQP and MQ3, which have much smaller FOVs.

Micro-OLED vs. Mini-LCD Challenges

Micro-OLEDs are made by applying OLEDs on top of a CMOS substrate. CMOS transistors provide a high current per unit area, and all the transistors and circuitry are underneath the OLED pixels, so they don’t block light. These factors enable relatively small pixels of 6.3 to 10 microns. However, CMOS substrates are much more expensive per unit area, and modern semiconductor fabs limit the size of CMOS devices to about a 1.4-inch diagonal (ignoring expensive and low-yielding “reticle stitched” devices).

A basic issue with OLEDs is that the display device must provide the power/current to drive each OLED. In the case of LCDs, only a small amount of capacitance has to be driven to change the pixel, after which there is virtually no current. The table on the right (which I discussed in 2017) shows the transistor mobility and the process requirements for the transistors for various display backplanes. The current need for an emitting display device like OLEDs and LEDs requires crystalline silicon (e.g., CMOS) or much larger thin-film transistors on glass. There are also issues of the size and resistivity of the wires used to provide the current and heat issues.

The OLED’s requirement for significant current/power limits how small the pixels can get on a given substrate/technology. Thin-film transistors have to be physically big to supply the current. For example, the Apple Watch Ultra Thin Film transistor OLED display has 326 PPI (~78 microns), which is more than 10x larger linearly (100x the area) than the Apple Vision Pro’s pixel, even though both are “OLEDs.”

Another issue caused by trying to support large FOVs with small devices is that the higher magnification reduces eye relief. Most of the “magnification” comes from moving the device closer to the eye. Thus, LCD headsets tend to have more eye relief. Sony’s XR headset is an exception because it has enough eye relief for glasses but does so with a smaller FOV than the other headsets.

Small LCDs used in VR displays have different challenges. They are made on glass substrates, and the transistors and circuitry must be larger. Because they are transmissive, this circuitry in the periphery of each pixel blocks light and causes more of a screen door effect. The cost per unit area is much lower than that of CMOS, and LCD devices can be much larger. Thus, less aggressive optical magnification is required for the same FOV with LCDs.

LCDs face a major challenge in making the pixels smaller to support higher resolution. As the pixels get smaller, the size of the circuitry relative to the pixel size becomes bigger, blocking more light and causing a worse screen door effect. To make the pixels smaller, LCD makers must develop higher-performance thin-film transistors and lower-resistance interconnect to keep from blocking too much light. This subject is discussed in an Innolux Research Paper published by SPIE in October 2023 (free to download). Innolux discusses how to go from today’s typical “small” LCD pixel of 1200 ppi (=~21 microns) to their research device with 2117 ppi (=~12 microns) to achieve a 3840 x 3840 (4K by 4K) display in a 2.56″ diagonal device. Hypervision’s HO140 white paper discusses Innolux’s 2022 research prototype with the same pixel size but with 3240×3240 pixels and a 2.27-inch panel, as well as the current prototype. The current HO140 uses a BOE 2.56″ 2160×2160 panel with 21-micron pixels, as the Innolux panel is not commercially available.

Some micro-OLED and small LCD displays for VR

YouTuber Brad Lynch of SadlyItsBradley, in an X post, listed the PPI of some common VR headset display devices. I have added more entries and the pixel pitch in microns. Many VR panels are not rectangular and may have cut corners on the bottom (and top). The size of the panels given in inches is for the longest diagonal. As you can see, Innolux’s prototypes have significantly smaller pixels, by almost 2x linearly, than the VR LCDs in volume production today:

  • Vive: 3.6″, 1080p, ~360 PPI (70 microns)
  • Rift S*: 5.5″, 1280p, ~530 PPI (48 microns)
  • Valve Index: 3.5″, 1440p, ~600 PPI (42 microns)
  • Quest 2*: 5.5″, 1900p, ~750 PPI (34 microns)
  • Quest 3: ~2.55″ 2064 × 2208, 1050 PPI (24 microns) – Pancake Optics
  • Quest Pro: 2.5″, 1832×1920, ~1050 PPI (24 microns) – Might be BOE 2.48″ miniLED LCD
  • Varjo Aero: 3.2″, 2880p, ~1200 PPI (21 microns)
  • Pico 4: 2.5″, 2160p, 1192 PPI (21 microns)
  • BOE 2.56″ LCD, 2160×2160, 1192 PPI (21 microns) – Used in Hypervision HO140 at AWE 2024
  • Innolux 2023 Prototype 2.56″, 3840×3840, 2117 PPI (12 microns) – Research prototype
  • Apple Vision Pro 1.4″ Micro-OLED, 3,660×3,200, 3386 PPI (7.5 microns)
  • SeeYa 1.03″ Micro-OLED, 2560×2560, 3528 PPI (7.2 microns) – Used in Big Screen Beyond
  • Sony ~1.3″ Micro-OLED, 3552 x 3840, 4032 PPI (6.3 microns) – Sony XR
  • BOE 1.35″ Micro-OLED 3552×3840, 4032 PPI (6.3 microns) – Demoed at Display Week 2024
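The micron figures in the list above follow directly from the PPI values, since one inch is 25,400 microns. Below is a minimal sketch of that conversion; the function names are made up for illustration, and the diagonal-based PPI estimate assumes a square panel without cut corners, which many real VR panels are not.

```python
import math

MICRONS_PER_INCH = 25_400.0


def pitch_um_from_ppi(ppi):
    """Pixel pitch in microns for a given pixels-per-inch figure."""
    return MICRONS_PER_INCH / ppi


def ppi_from_square_panel(pixels_per_side, diagonal_in):
    """Approximate PPI of a square panel from its pixel count and diagonal.

    Assumes a full square (no cut corners), so the pixel diagonal is
    pixels_per_side * sqrt(2).
    """
    return pixels_per_side * math.sqrt(2) / diagonal_in


if __name__ == "__main__":
    # A few entries from the list above: (name, PPI)
    for name, ppi in [
        ("Quest 3", 1050),
        ("BOE 2.56-inch LCD", 1192),
        ("Innolux 2023 prototype", 2117),
        ("Apple Vision Pro", 3386),
        ("Sony XR", 4032),
    ]:
        print(f"{name}: {ppi} PPI -> {pitch_um_from_ppi(ppi):.1f} microns")

    # Cross-check of the BOE panel used in the HO140 (2160x2160, 2.56 inches)
    print(f"BOE estimate: {ppi_from_square_panel(2160, 2.56):.0f} PPI")
```

The same conversion gives roughly 78 microns for the 326 PPI Apple Watch Ultra display mentioned earlier, which is about 10x the Apple Vision Pro’s 7.5-micron pitch.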

In 2017, I wrote Near Eye Displays (NEDs): Gaps In Pixel Sizes (table from that article on the right), which talks about what I call the pixel size gap between microdisplays (on silicon) and small LCDs (on glass). While the pixel sizes have gotten smaller for both micro-OLEDs and LCDs for VR in the last ~7 years, there remains a sizable gap.

Contrast – Factoring the Display and Pancake Optics

Micro-OLEDs at the display level certainly have a better inherent black level and can turn pixels completely off. LCDs work by blocking light using cross-polarization, which results in imperfect blacks. Thus, with micro-OLEDs, a large area of black will look black, whereas with LCDs, it will be dark gray.

However, we are not looking at the displays directly but through optics, specifically pancake optics, which dominate new VR designs today. Pancake optics, which use polarized light and QWP to recirculate the image twice through parts of the optics, are prone to internal reflections that cause “ghosts” (somewhat out-of-focus reflections) and contrast loss.

Using smaller micro-OLEDs requires more “aggressive” optical designs with higher magnification to support a wide FOV. These more aggressive optical designs tend to be more expensive, less sharp, and more prone to loss of polarization. Any loss of polarization in pancake optics will cause a loss of contrast and ghosting. There seems to be a tendency with pancake optics for the stray light to bounce around and end up in the periphery of the image, causing a glow if the periphery of the image is supposed to be black.

For example, the AVP is known to have an outer “glow” when watching movie content on a black background. Most VR headsets default to a “movie or home theater” environment rather than a black background. While it may be for aesthetics, the engineer in me thinks it might help hide the glow. People online suggest turning on some background with the AVP for people bothered by the glow on a black background.

The complaints of outer glow when watching movies seem more prevalent with headsets using micro-OLEDs, but this is hardly scientific. It could be just that the micro-OLEDs have a better black level and make the glow more noticeable, but it might also be caused by their more aggressive optical magnification (something that might be or has been (?) studied). My key point is that it is not as simple as considering the display’s inherent contrast; you have to consider the whole optical system.

LightPolymers’ Alternative to Plastic Films for QWP & Polarizers

LightPolymers has a Lyotropic (water-based) Liquid Crystal (LC) material that can make optical surfaces like QWP and polarizers. Imagine Optix, whose acquisition by Meta this blog broke the news of in December 2021 (Exclusive: Imagine Optix Bought By Meta), was also developing LC-based polarized light control films.

Like Imagine Optix, LightPolymers has been coating plastic films with LCs, but LightPolymers is developing the ability to directly apply their films to flat and curved lenses, which is a potential game changer. In April 2024, LightPolymers and Hypervision announced the joint development of this lens-coating technology and had a poster in Hypervision’s booth showing it (right).

3M Dominates Polarized Light Plastic Films for Pancake Optics

3M is today the dominant player in polarized light-control plastic films and is even more dominant in these films for pancake optics. At 3M’s SID Display Week booth in June 2024, they showed the ByteDance PICO4, MQP, and MQ3 pancake optics using 3M polarization films. Their films are also used in the Fresnel lens-based Quest 2. It is an open secret (but 3M would not confirm or deny) that the Apple Vision Pro also uses 3M polarization films.

According to 3M:

3M did not invent the optical architecture of pancake lenses. However, 3M was the first company to successfully demonstrate the viability of pancake lenses in VR headsets by combining it with its patented reflective polarizer technology.

That same article supports Kopin’s (now spun out to Lightning Silicon) claims to have been the first to develop pancake optics. Kopin has been demonstrating pancake optics combined with their Micro-OLEDs for years, which are used in Panasonic-ShiftAll headsets.

3M’s 2017 SPIE Paper Folded Optics with Birefringent Reflective Polarizers discusses the use of their films (and also mentions Kopin developments) in cemented (e.g., AVP) and air gap (e.g., MQP and MQ3) pancake optics. The paper also discusses how their polarization films can be made (with heat softening) to conform to curved optics such as the AVP’s.

LightPolymers’ Potential Advantage over Plastic Films

The most obvious drawbacks of plastic films are that they are relatively thick (on the order of 70+ microns per film, and there are typically multiple films per lens) and are usually attached using adhesive coatings. The thickness, particularly when trying to conform to a curved surface, can cause issues with polarized light. The adhesives introduce some scatter, resulting in some loss of polarization.

By applying their LCs directly to the lens, LightPolymers claims they could reduce the thickness of the polarization control layers (QWP and Polarizers) by as much as 10x and eliminate the use of adhesives.

In the photos below (taken with a 5x macro lens), I used a knife to slightly separate the edges of the films from the Meta Quest 3’s eye-side and display-side lenses to show them. On the eye-side lens, there are three films, which are thought to be a QWP, absorptive polarizer, and reflective polarizer. On the display-side lens, there are two films, one of which is a QWP, and the other may be just a protective film. In the eye-side lens photo, you can see where the adhesive has bubbled up after separation. The diagram on the right shows the films and paths for light with the MQ3/MQP pancake optics.

Because LightPolymers’ LC coating is applied to each lens, it could also be applied/patterned to improve or compensate for other issues in the optics.

Current State of LightPolymers’ Technology

LightPolymers is already applying its LC to plastic films and flat glass. Their joint agreement with Hypervision involves developing manufacturable methods for directly applying the LC coatings to curved lens surfaces. This technology will take time to develop. LightPolymers’ business is making the LC materials; it then works with partners such as Hypervision to apply the LC to their lenses. They say the equipment necessary to apply the LCs is readily available and low-cost (for manufacturing equipment).

Conclusion

Hypervision has demonstrated the ability to design very wide FOV pancake optics with a large optical sweet spot and maintains a larger area of sharpness than any other design I have seen.

Based on my experience in both Semiconductors and Optics, I think Hypervision makes a good case in their white paper 60PPD: by fast LCD but not by micro OLED that getting to a wide FOV while approaching “retinal” 60 PPD is more likely to happen using LCD technology than micro-OLEDs.

Fundamentally, micro-OLEDs are unlikely to get much bigger than 1.4″ diagonally, at least commercially, for many years, if not more than a decade. While they could make the pixels smaller, today’s pancake optics struggle to resolve ~7.5-micron pixels, much less smaller ones.

On the other hand, several companies, including Innolux and BOE, have shown research prototypes of 12-micron LCD pixels, or about half the (linear) size of the pixels in today’s LCDs used in VR headsets in high volume. If BOE or Innolux went into production with these displays, it would enable Hypervision’s HO140 to reach about 48 PPD in the center with a roughly 140-degree FOV, and only small incremental changes would get them to 60 PPD with the same FOV.
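To get a feel for what “small incremental changes” might mean, here is a rough scaling sketch. It assumes, as a simplification that is mine and not Hypervision’s, that the optics and panel size stay fixed so the center PPD is inversely proportional to the pixel pitch. The function names are made up for illustration, and the only inputs taken from the text are the ~48 PPD figure at a 12-micron pitch and the 21-micron pitch of the current panel.

```python
def scaled_ppd(ref_ppd, ref_pitch_um, new_pitch_um):
    """Center PPD at a new pixel pitch, assuming the same optics.

    Assumption: PPD scales inversely with pixel pitch.
    """
    return ref_ppd * ref_pitch_um / new_pitch_um


def pitch_for_ppd(ref_ppd, ref_pitch_um, target_ppd):
    """Pixel pitch (microns) needed for a target PPD under the same assumption."""
    return ref_pitch_um * ref_ppd / target_ppd


if __name__ == "__main__":
    # Reference point from the text: ~48 PPD with 12-micron pixels
    print(f"Implied PPD at 21 microns: {scaled_ppd(48, 12, 21):.1f}")
    print(f"Pitch needed for 60 PPD:   {pitch_for_ppd(48, 12, 60):.1f} microns")
```

Under this assumption, 60 PPD would need roughly a 9.6-micron pitch if the change came from the panel alone; in practice, some of the gain could also come from the optics or a larger panel.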

Appendix: More on Hypervision

I first encountered Hypervision at AWE 2021 with their blended Fresnel lens 240-degree design, but as this blog primarily covered optical AR, it slipped under my radar. Since then, I have been covering Optical and Pass-Through mixed reality, particularly pass-through MR using Pancake Optics. By AR/VR/MR 2023, Hypervision demonstrated a single lens (per eye) 140-degree and a blended dual lens and display 240-degree FOV (diagonal) Pancake Optics designs.

These were vastly better than their older Fresnel designs and demonstrated Hypervision’s optical design capability. In May 2023, passthrough MR startup Lynx and Hypervision announced they were collaborating. For some more background on my encounters with Hypervision, see Hypervision Background.

Hypervision has been using its knowledge of pancake optics to analyze the Apple Vision Pro’s optical design, which I have reported on in Hypervision: Micro-OLED vs. LCD – And Why the Apple Vision Pro is “Blurry,” Apple Vision Pro Discussion Video by Karl Guttag and Jason McDowall, Apple Vision Pro – Influencing the Influencers & “Information Density,” and Apple Vision Pro (Part 4)—Hypervision Pancake Optics Analysis.

Apple Unveils New Immersive Video Series and Films Coming to Vision Pro

19 July 2024 at 12:03

Apple is launching a new slate of immersive video content for Vision Pro using Apple Immersive Video, a 180-degree video format combining 8K 3D video and spatial audio for immersive experiences.

Premiering on July 18th is Boundless, which features unique travel adventures, starting with a hot air balloon journey over Cappadocia, Türkiye—the episode suitably named ‘Hot Air Balloons’.

In August, Vision Pro users will get to see the next installment of Wild Life, a nature documentary series captured at Kenya’s Sheldrick Wildlife Trust. Coming in September is Elevated, which promises aerial views of stunning landscapes, beginning with Hawaii.


Additionally, Apple will release an immersive experience featuring The Weeknd, the first scripted short film Submerged by Edward Berger, and exclusive sports content like behind-the-scenes access to the 2024 NBA All-Star Weekend.

A new sports series with Red Bull called Big-Wave Surfing will also be available, showcasing elite surfers tackling massive waves in Tahiti.

Apple says it’s collaborating with Blackmagic Design to enhance the production of Apple Immersive Video with new tools and workflows, including the URSA Cine Immersive camera.

All of these immersive videos will be accessible for free via the Apple TV app in several countries.

The post Apple Unveils New Immersive Video Series and Films Coming to Vision Pro appeared first on Road to VR.

AWE 2024 Panel: The Current State and Future Direction of AR Glasses

29 June 2024 at 23:22

Introduction

At AWE 2024, I was on a panel discussion titled “The Current State and Future Direction of AR Glasses.” Jeri Ellsworth, CEO of Tilt Five, Ed Tang, CEO of Avegant, Adi Robertson, Senior Reporter at The Verge, and I were on the panel, with Jason McDowall of The AR Show moderating. Jason McDowall did an excellent job of moderating and keeping the discussion moving. Still, with only 55 minutes, including questions from the audience, we could only cover a fraction of the topics we had considered discussing. I’m hoping to reconvene this panel sometime. I also want to thank Dean Johnson, Associate Professor at Western Michigan University, who originated the idea and helped me organize this panel. AWE’s video of our panel is available on YouTube.

First, I will outline what was discussed in the panel. Then, I want to follow up on small FOV optical AR glasses and some back-and-forth discussions with AWE Legend Thad Starner.

Outline of the Panel Discussion

The panel covered many topics, and below, I have provided a link to each part of our discussion and added additional information and details for some of the topics.

  • 0:00 Introductions
  • 2:19 Apple Vision Pro (AVP) and why it has stalled. It has been widely reported that AVP sales have stalled. Just before the conference, The Information reported that Apple had suspended the Vision Pro 2 development and is now focused on a lower-cost version. I want to point out that a 1984 128K Mac 1 would cost over $7,000 adjusted for inflation, and the original 1977 Apple 2 4K computer (without a monitor or floppy drive) would cost about $6,700 in today’s dollars. I contend that utility and not price is the key problem with the AVP sales volume and that Apple is thus drawing the wrong conclusion.
  • 7:20 Optical versus Passthrough AR. The panel discusses why their requirements are so different.
  • 11:30 Mentioned Thad Starner and the desire for smaller FOV optical AR headsets. It turns out that Thad Starner attended our panel, but as I later found out, he arrived late and missed my mentioning him. Thad later questioned the panel. In 2019, I wrote the article FOV Obsession, which discussed Thad’s SPIE AR/VR/MR presentation about smaller FOV. Thad is a Georgia Institute of Technology professor and a part-time Staff Researcher at Google (including on Google Glass). He has continuously worn AR devices since his research work at MIT’s Media Lab in the 1990s.
  • 13:50 Does “tethering make sense” with cables or wirelessly?
  • 20:40 Does an AR device have to work outside (in daylight)?
  • 26:49 The need to add displays to today’s Audio-AI glasses (ex. Meta Ray-Ban Wayfarer).
  • 31:45 Making AR glasses less creepy?
  • 35:10 Does it have to be a glasses form factor?
  • 35:55 Monocular versus Biocular
  • 37:25 What did Apple Vision Pro get right (and wrong) regarding user interaction?
  • 40:00 I make the point that eye tracking and gesture recognition on the “Apple Vision Pro is magical until it is not,” paraphrasing Adi Robertson, and I then added, “and then it is damn frustrating.” I also discuss that “it’s not truly hands-free if you have to make gestures with your hands.”
  • 41:48 Waiting for the Superman [savior] company. And do big companies help or crush innovation?
  • 44:20 Vertical integration (Apple’s big advantage)
  • 46:13 Audience Question: When will AR glasses replace a smartphone (enterprise and consumer)
  • 49:05 What is the first use case to break 1 million users in Consumer AR?
  • 49:45 Thad Starner – “Bold Prediction” that the first large application will be with small FOV (~20 degrees), monocular, and not centered in the user’s vision (off to the ear side by ~8 to 20 degrees), and monochrome would be OK. A smartphone is only about 9 by 15 degrees FOV [or ~20 degrees diagonally when a phone is held at a typical distance].
  • 52:10 Audience Question: Why aren’t more companies going after OSHA (safety) certification?
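
The phone comparison in the 49:45 item above can be sanity-checked with basic trigonometry. The following is a minimal sketch, not something presented on the panel: the screen dimensions (68 mm by 121 mm, roughly a 5.5-inch 16:9 phone) and the 430 mm viewing distance are my assumptions, chosen only to illustrate the calculation.

```python
import math

def angular_size_deg(size_mm: float, distance_mm: float) -> float:
    """Angle subtended by a flat object of the given size, viewed head-on and centered."""
    return 2.0 * math.degrees(math.atan(size_mm / (2.0 * distance_mm)))

# Hypothetical phone screen and viewing distance (assumptions, not measured values).
SCREEN_W_MM = 68.0
SCREEN_H_MM = 121.0
VIEW_DIST_MM = 430.0

h_fov = angular_size_deg(SCREEN_W_MM, VIEW_DIST_MM)
v_fov = angular_size_deg(SCREEN_H_MM, VIEW_DIST_MM)
diag_fov = angular_size_deg(math.hypot(SCREEN_W_MM, SCREEN_H_MM), VIEW_DIST_MM)

print(f"{h_fov:.1f} x {v_fov:.1f} degrees, {diag_fov:.1f} degrees diagonal")
```

With these assumed numbers, the result lands near the figures quoted for a phone, with a diagonal a little under 20 degrees. Holding the phone closer or using a larger screen raises all three angles.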

Small FOV Optical AR Discussion with Thad Starner

As stated in the outline above, Thad Starner arrived late and missed my discussion of smaller FOVs that mentioned Thad, as I learned after the panel. Thad, who has been continuously wearing AR glasses and researching them since the mid-1990s, brings an interesting perspective. Since I first saw and met him in 2019, he has strongly advocated for AR headsets having a smaller FOV.

Thad also states that the AR headset should have a monocular (single-eye) display and be 8 to 20 degrees on the ear side of the user’s straight-ahead vision. He also suggests that monochrome is fine for most purposes. Thad stated that his team will soon publish papers backing up these contentions.

In the sections below, I went from the YouTube transcript and did some light editing to make what was said more readable.

My discussion from earlier in the panel:

11:30 Karl Guttag – I think a lot of the AR or Optical see-through gets confabulated with what was going on in VR because VR was cheap and easy to make a wide field of view by sticking a cell phone with some cheap Optics in front of your face. You get a wide field of view, and people went crazy about that. I made this point years ago on my blog [2019 article FOV Obsession] was the problem. Thad Starner makes this point: he’s one of our Legends at AWE, and I took that to heart many years ago at SPIE AR/VR/MR 2019.

The problem is that as soon as you say beyond about 30-degree field of view, even projecting forward [with technology advancements], as you go beyond 30-degree field of view, you’re in a helmet, something looking like Magic Leap. And Magic Leap ended up in Nowheresville. [Magic Leap] ended up with 25 to 30% see-through, so it’s not really that good see-through, and yet it’s not got the image quality that you would get of an old display shot right in your eyes. You might you could get a better image on an Xreal or something like that.

People are confabulating too many different specs, so they want a wide field of view. The problem is as soon as you say 50 degrees and then you say, yeah, and I need like spatial recognition, I want to do SLAM, and I want to do this, and I want to do that. You’ve now spiraled into the helmet. I mean, you know, Meta was talking the other day about the other panels and said they’re looking at about 50 grams [for the Meta Ray Bans], and my glasses are 23 grams. You’re out of that as soon as you say 50-degree field of view, you’re over 100 grams and heading to the Moon as you add more and more cameras and all this other stuff, so I think that’s one of our bigger problems whereas AR really Optical AR.

The experiment we’re going to see played out because many companies are working on adding displays to so-called AI audio glasses. We’re going to see if that works because companies are getting ready to make glasses that have 20- to 30-degree field of view glasses tied into AI and audio stuff.

Thad Starner’s comments and the follow-up discussion during the Q&A at the end of the panel:

AWE Legend Thad Starner Wearing Vuzix’s Ultralight Glasses – After the Panel

49:46 Hi, my name is Thad Starner. I’m Professor Georgia Tech. I’m going to make a bold prediction here that the future, at least the first system to sell over a million units, will be a small field of view monocular, non-line-of-sight display, monochrome is okay now; the reason I say that is number one I’ve done different user studies in my lab that we’ll be publishing soon on this subject but the other thing is that you know our phones which is the most popular interface out there are only 9 degrees by 16 degrees field of view. Putting something outside of the line of sight means that it doesn’t interrupt you while you’re crossing the street or driving or flying a plane, right? We know these numbers, so between 8° and 20 degrees towards the ear and plus or minus 8 degrees, I’m looking at Karl [Guttag] here so he can digest all these things.

Karl – I wrote a whole article about it [FOV Obsession]

Thad – And not having a pixel in line of sight, so now feel free to pick me apart and disagree with me.

Jeri – I want to know a price point.

Thad – I think the first market will be captioning for the hard of hearing, not for the deaf. Also, possible transcription, not translation; at that price point, you’re talking about making reading glasses for people instead of hearing aids. There’s a lot of pushback against hearing, but reading glasses people tend to do, so I’d say you’re probably in the $200 to $300 range.

Ed – I think your prediction is spot on, minus the color green. The only thing I think is that it’s not going to fly.

Thad – I said monochrome is okay.

Ed – I think the monocular field of view is going to be an entry-level product, and you see, I think you will see products that will fit that category with roughly that field of view with roughly that offset angle [not in the center of view] is what you’re going to see in the beginning. Yeah I agree with that but I don’t I think that’s the first step I think you will see a lot of products after that that’s going to do a lot more than monocular monochrome offset displays, start going to larger field of view binocular I think that will happen pretty quickly.

Adi – It does feel like somebody tries to do that every 18 months, though, like Intel tried to make a pair of glasses that did that. It’s a little bit what North did. I guess it’s just a matter of throwing the idea at the wall because I think it’s a good one until it takes.

I was a little taken aback to have Thad call me out as if I had disagreed with him when I had made the point about the advantages of a smaller FOV earlier. Only after the presentation did I find out that he had arrived late. I’m not sure what comment I made that made Thad think I was advocating for a larger FOV in AR glasses.

I want to add that there can be big differences between what consumers and experts will accept in a product. I’m reminded of a story I read in the early 1980s when there was a big debate between very high-resolution monochrome versus lower-resolution color (back then, you could only have one or the other with CRTs) that the head of IBM’s monitor division said, “Color is the least necessary and most desired feature in a monitor.” All the research suggested that resolution was more important for the tasks people did on a computer at the time, but people still insisted on color monitors. Another example is the 1985 New Coke fiasco, in which Coke’s taste studies proved that people liked New Coke better, but it still failed as a product.

In my experience, a big factor is whether the person is being trained to use the device for enterprise or military use versus whether the user is buying it for their own enjoyment. The military has used monochrome displays on devices, including night vision and heads-up displays, for decades. I like to point out that the requirements can change depending on whether the user is “paid to use versus paying to use.” Enterprises and the military care about whether the product gets the job done and pay someone to use the device. The consumer has different criteria. I will also agree that there are cases where the user is motivated to be trained, such as Thad’s hard-of-hearing example.

Conclusion on Small FOV Optical AR

First, I agree with Thad’s comments about the smaller FOV and have stated such before. There are also cases outside of enterprise and industrial use where the user is motivated to be trained, such as Thad’s hard-of-hearing example. But while I can’t disagree with Thad or his studies that show having a monocular monochrome image located outside the line of sight is technically better, I think consumers will have a tougher time accepting a monocular monochrome display. What you can train someone to use differs from what they would buy for themselves.

Thad makes a good point that having a biocular display directly in the line of sight can be problematic and even dangerous. At the same time, untrained people don’t like monocular displays outside the line of sight. It becomes (as Ed Tang said in the panel) a point of high friction to adoption.

Based on the many designs I have seen for AR glasses, we will see this all played out. Multiple companies are developing optical see-through AR glasses with monocular green MicroLEDs, color X-cube-based MicroLEDs, and LCOS-based displays with glass form-factor waveguide optics (both diffractive and reflective).

Report: Apple Focuses on More Affordable Vision Headset Over High-end Follow-up

18 June 2024 at 18:01

At $3,500, Vision Pro is undoubtedly expensive, which many are rightfully hoping will be remedied in a prospective follow-up. Now, according to a report from The Information, Apple may be ditching the ‘Pro’ aspect of its next-gen Vision headsets altogether, instead aiming to release a single “more affordable” device in late 2025.

It’s rumored that Apple was slated to release two headsets: an expensive Pro-style device and a cheaper version targeted more squarely at consumers, much like how the company positions iPhone in its lineup today.

Now, citing an employee at a manufacturer that makes key components for the Vision Pro, The Information reports Apple has suspended work on that high-end follow-up due to slowing sales of the $3,500 Vision Pro.

Image courtesy Apple

There may be hope though, at least for anyone without the budget to shell out what amounts to a good used Honda Civic. According to sources both involved in the supply chain and in the manufacturing of the headset, the company is “still working on releasing a more affordable Vision product with fewer features before the end of 2025.”

Granted, it’s important to note that Apple often leaks incorrect information in a bid to nail prospective leakers, so this (and any Apple report for that matter) should be taken with a heaping handful of salt.

This follows Apple’s announcement it was getting set to release Vision Pro outside of the US for the first time, which includes mainland China, Hong Kong, Japan, Singapore, Australia, Canada, France, Germany, and the UK.

– – — – –

Whether it’s “more affordable” or not, there’s a lot Apple can do to appeal to the masses without drastically sacrificing quality. Check out our article on the 6 Things Vision Pro Needs Before It Can Go Mainstream to see how.

The post Report: Apple Focuses on More Affordable Vision Headset Over High-end Follow-up appeared first on Road to VR.

Hypervision: Micro-OLED vs. LCD – And Why the Apple Vision Pro is “Blurry”

14 June 2024 at 19:50

Introduction

The optics R&D company Hypervision provided a detailed design analysis of the Apple Vision Pro’s optical design in June 2023 (see Apple Vision Pro (Part 4) – Hypervision Pancake Optics Analysis). Hypervision just released an interesting analysis exploring whether Micro-OLEDs, as used by the Apple Vision Pro, or LCDs, as used by Meta and most others, can support a high angular resolution of 60 pixels per degree (PPD) together with a wide FOV. Hypervision’s report is titled 60PPD: by fast LCD but not by micro OLED. I’m going to touch on some highlights from Hypervision’s analysis. Please see their report for more details.

I Will Be at AWE Next Week

AWE is next week. I will be on the PANEL: Current State and Future Direction of AR Glasses at AWE on Wednesday, June 19th, from 11:30 AM to 12:25 PM. I still have a few time slots. If you want to meet, please email meet@kgontech.com.

AWE has moved to Long Beach, CA, south of LA, from its prior venue in Santa Clara. Last year at AWE, I presented Optical Versus Passthrough Mixed Reality, which is available on YouTube. This presentation was in anticipation of the Apple Vision Pro.

An AWE speaker discount code, SPKR24D, provides a 20% discount. You can register for AWE here.

Apple Vision Pro Sharpness Study at AWE 2024 – Need Help

According to Hypervision’s analysis and reports I have received from users, the Apple Vision Pro’s sharpness varies from unit to unit. AWE 2024 is an opportunity to sample many Apple Vision Pro headsets to see how the focus varies from unit to unit. I will be there with my high-resolution camera.

While not absolutely necessary, it would be helpful if you could download my test pattern, located here, and install it on your Apple Vision Pro. If you want to help, contact me via meet@kgontech.com or flag me down at the show. I will be spending most of my time on the Expo floor. If you participate, you can remain anonymous or receive a mention of you or your company at the end of a related article thanking you for your participation. I can’t promise anything, but I thought it would be worth trying.

AVP Blurry Image Controversy

My article Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3 was the first to report that the AVP was a little blurry. I compared high-resolution pictures showing the same FOV with the AVP and the Meta Quest 3 (MQ3) in that article.

This article caused controversy and was discussed in many forums and by influencers, including Linus Tech Tips and Marques Brownlee (see Apple Vision Pro—Influencing the Influencers & “Information Density” and “Controversy” of the AVP Being a Little Blurry Discussed on Marques Brownlee’s Podcast and Hugo Barra’s Blog).

I have recently been taking pictures through Bigscreen Beyond’s (BSB) headset and decided to compare it with the same test (above right). In terms of optical sharpness, it is between the AVP and the MQ3. Interestingly, the BSB headset has a slightly lower angular resolution (~32 pixels per degree) than the AVP (~40 ppd) in the optically best part of the lens where these crops were taken. Yet, the text and line patterns look better on the BSB than AVP.

Hypervision’s Correction – The AVP is Not Out of Focus, and the Optics are Blurry

I speculated that the AVP seemed out of focus in Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3. Hypervision corrected me that the softness could not be due to being out of focus. Hypervision has found that sharpness varies from one AVP to the next. The AVP’s best focus nominally occurs with an apparent focus of about 1 meter. Hypervision pointed out that if the headset’s device focus were slightly wrong, it would simply shift the apparent focus distance as the eye/camera would adjust to a small change in focus (unless it was so far off that eye/camera focusing was impossible). Thus, the blur is not a focus problem but rather a resolution problem with the optics.

Hypervision’s Analysis – Tolerances Required Beyond that of Today’s Plastic Optics

The AVP has very aggressive and complex pancake optics for a compact form factor while supporting a wide FOV with a relatively small Micro-OLED. Most other pancake optics have two elements, which mate with a flat surface for the polarizers and quarter waveplates that manipulate the polarized light to cause the light to pass through the optics twice (see Meta example below left). Apple has a more complex three-lens optic with curved polarizers and quarter waveplates (below right).

Based on my studies of how the AVP dynamically adjusts optical imperfections like chromatic aberrations based on eye tracking, the AVP’s optics are “unstable” because, without dynamic correction, the imperfections would be seen as much worse.

Hypervision RMS Analysis

Hypervision did an RMS analysis comparing a larger LCD panel with a small Micro-OLED. It should probably come as no surprise that requiring about 1.8x (2.56/1.4) greater magnification makes everything more critical. The problem, as Hypervision points out, is that Micro-OLED on silicon can’t get bigger for many years due to semiconductor manufacturing limitations (reticle limit). Thus, the only way for Micro-OLED designs to support higher resolution and wider FOV is to make the pixels smaller and the optics much more difficult.
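
To make the magnification argument concrete, here is a rough back-of-the-envelope sketch. It is my illustration, not Hypervision’s analysis: it assumes a 60 PPD target held uniformly across a 140-degree diagonal FOV (real optics vary in resolution across the field), and it takes the 1.4-inch and 2.56-inch panel diagonals from the ratio quoted above.

```python
MM_PER_INCH = 25.4

def pixel_pitch_um(panel_diag_in: float, ppd: float, fov_deg: float) -> float:
    """Pixel pitch (micrometers) if ppd * fov_deg pixels must fit along the panel diagonal.

    Simplifying assumption: angular resolution is uniform over the whole FOV.
    """
    pixels_on_diagonal = ppd * fov_deg
    return panel_diag_in * MM_PER_INCH * 1000.0 / pixels_on_diagonal

MICRO_OLED_DIAG_IN = 1.4   # small Micro-OLED diagonal from the ratio above
LCD_DIAG_IN = 2.56         # larger LCD diagonal from the ratio above
TARGET_PPD = 60.0          # assumed uniform, which real optics are not
TARGET_FOV_DEG = 140.0

magnification_ratio = LCD_DIAG_IN / MICRO_OLED_DIAG_IN
oled_pitch = pixel_pitch_um(MICRO_OLED_DIAG_IN, TARGET_PPD, TARGET_FOV_DEG)
lcd_pitch = pixel_pitch_um(LCD_DIAG_IN, TARGET_PPD, TARGET_FOV_DEG)

print(f"Extra magnification for the Micro-OLED: {magnification_ratio:.2f}x")
print(f"Pixel pitch: Micro-OLED {oled_pitch:.1f} um, LCD {lcd_pitch:.1f} um")
```

Under these assumptions, the smaller panel needs pixels that are smaller by the same ~1.8x factor, and the optics must resolve those smaller pixels while magnifying them more.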

Hypervision Monte-Carlo Analysis

Hypervision then did a Monte-Carlo analysis factoring in optical tolerances. Remember, we are talking about fairly large plastic-molded lenses that must be reasonably priced, not something you would pay hundreds of dollars for in a large camera or microscope.

Hypervision’s 140 Degree FOV with 60PPD Approach

Hypervision believes that the only practical path to ~60PPD and ~140-degree FOV is with a 2.56″ LCD display. LCDs’ natural progression toward smaller pixels will enable higher resolution than their optics can support.

Conclusion

Overall, Hypervision makes a good case that current designs with Micro-OLED with pancake optics are already pushing the limits of reasonably priced optics. Using technology with somewhat bigger pixels makes resolving them easier, and having a bigger display makes supporting a wider FOV less challenging.

It might be that the AVP is slightly blurry because it is already beyond the limits of a manufacturable design. So the natural question is, if AVP already has problems, how could they support higher resolution and wider FOV?

The size of Micro-OLEDs built on silicon backplanes is limited by the reticle limit to a chip size of about 1.4″ diagonally, at least without resorting to multiple reticle “stitching” (which is possible but not practical for a cost-effective device). Thus, for Micro-OLEDs to increase resolution, the pixels must be smaller, requiring even more magnification out of the optics. Then, increasing the FOV will require even more optical magnification of ever-tinier pixels.

LCDs have issues, particularly with black levels and contrast. Smaller illumination LEDs with local dimming may help, but they have not proven to work as well as micro-OLEDs.

Apple Launches Vision Pro Pre-Orders in UK, Australia, Canada, France & Germany Today

28 June 2024 at 16:28

Apple launched pre-orders for Vision Pro in a number of Asian countries two weeks ago, and customers there are seeing units ship starting today. Today also marks the next slate of regions to get a crack at pre-orders too, with shipment coming in mid-July.

Apple today launched pre-orders for Vision Pro in Australia, Canada, France, Germany, and the UK; availability is set to begin on July 12th in those countries.

The company also revealed local pricing in those regions, with the cheapest 256 GB variant fetching $5,999 (AUD), $4,999 (CAD), €3,999 (EUR) and £3,499 (GBP).

According to Chinese language outlet VRtuoluo, mainland China customers are being offered 30-minute trial slots at Apple’s China-based stores, which covers all 16 provinces and cities covered by the company. On-site demos start there from its June 28th launch date.

Apple first confirmed it was launching in mainland China in late March, underlining the company’s unique access to that country’s domestic market, which, behind the US and EU, is the world’s third-largest consumer market.

Notably, Meta was hoping to collaborate with Chinese tech giant Tencent to bring Quest to China late last year; however, talks reportedly stalled, ostensibly making Meta’s access to that country a non-starter. Today, Meta’s social platforms, including Facebook, Instagram, and WhatsApp, are all blocked there.

Meanwhile, Vision Pro is also coming to Hong Kong, Japan, and Singapore today, with similar trial and pre-order schemes available through the official Apple Store in those less strictly-controlled regions. The original article announcing international availability follows below:

Original Article (June 10th, 2024): Previously only available in the United States since its initial launch in February, Apple says it’s now bringing Vision Pro to mainland China, Hong Kong, Japan, Singapore, Australia, Canada, France, Germany, and the UK, which includes keyboard support for major world languages used in those countries.

Apple says users in mainland China, Hong Kong, Japan, and Singapore can pre-order Apple Vision Pro on June 13th, with earliest shipments coming June 28th. Customers in Australia, Canada, France, Germany, and the UK can pre-order on June 28th, with availability beginning on July 12th.

Dual Loop band | Image courtesy Apple

Apple says Vision Pro will include DingTalk, Douyin VR Live, Migu Video, Taobao, Tencent Video, and Weibo in China; apps from Yahoo! JAPAN, LIFULL HOME’S, U-NEXT, and Nikkei in Japan; and Singtel CAST, StarHub TV+, and mewatch in Singapore.

Upcoming apps also include MUBI and Soul Spire in the UK; Canal+, Foxar, OQEE, and SeLoger in France; BILD, OTTO, and ZDF in Germany; Classix and Sportsnet in Canada; and Domain in Australia.

This follows earlier reports in March that Apple would indeed be launching in mainland China in addition to a number of the countries mentioned above. In practice, this gives the Cupertino tech giant a critical reach beyond Meta, which is barred from operating its apps and services in mainland China.

Earlier this year it was reported that talks between Meta and Chinese tech giant Tencent had stalled, which may have otherwise opened up some avenue for Meta hardware to launch inside the tightly-controlled country.

The post Apple Launches Vision Pro Pre-Orders in UK, Australia, Canada, France & Germany Today appeared first on Road to VR.

Cogni Trax & Why Hard Edge Occlusion Is Still Impossible (Behind the Magic Trick)

28 May 2024 at 02:33

Introduction

As I wrote in 2012’s Cynics Guide to CES—Glossary of Terms, when you see a demo at a conference, “sometimes you are seeing a “magic show” that has little relationship to real-world use.” I saw the Cogni Trax hard edge occlusion demo last week at SID Display Week 2024, and it epitomized the concept of being a “magic show.” I have been aware of Cogni Trax for at least three years (and commented about the concept on Reddit), and I discovered they quoted me (I think a bit out of context) on their website (more on this later in the Appendix).

Cogni Trax has reportedly raised $7.1 million in 3 funding rounds over the last ~7 years, which I plan to show is unwarranted. I contacted Cogni Trax’s CEO (and former Apple optical designer on the Apple Vision Pro), Sajjad Khan, who was very generous in answering questions despite his knowing my skepticism about the concept.

Soft- Versus Hard-Edge Occlusion

Soft Edge Occlusion

In many ways, this article follows up on my 2021 Magic Leap 2 (Pt. 3): Soft Edge Occlusion, a Solution for Investors and Not Users, which detailed why putting an LCD in front of glass results in very “soft” occlusion.

Nobody will notice if you put a pixel-sized (angularly) dot on a person’s glasses. If they did, every dust particle on a person’s glasses would be noticeable and distracting. That is because a dot only a few millimeters from the eye is highly out of focus, and light rays from the real world will go around the dot before they are focused by the eye’s lens. That pixel dot will insignificantly dim several thousand pixels in the virtual image. As discussed in the Magic Leap soft occlusion article, the Magic Leap 2’s dimming pixel will cover ~2,100 pixels (angularly) in the virtual image and have a dimming effect on hundreds of thousands of pixels.
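
The geometry behind this can be roughed out in a few lines. This is a simplified sketch under my own assumptions (a 4 mm pupil, a dimming layer 20 mm from the eye, a virtual image at 40 pixels per degree, and the eye focused far away); none of these are Magic Leap 2 specifications. For a far-focused eye, a point on a nearby layer intercepts rays from a range of directions roughly equal to the angle the pupil subtends as seen from that point.

```python
import math

def blur_angle_deg(pupil_diameter_mm: float, layer_distance_mm: float) -> float:
    """Approximate angular spread of a point on a near-eye layer, eye focused at infinity."""
    return 2.0 * math.degrees(math.atan(pupil_diameter_mm / (2.0 * layer_distance_mm)))

# Hypothetical values for illustration only.
PUPIL_MM = 4.0
LAYER_DISTANCE_MM = 20.0
VIRTUAL_IMAGE_PPD = 40.0

blur_deg = blur_angle_deg(PUPIL_MM, LAYER_DISTANCE_MM)
blur_diameter_px = blur_deg * VIRTUAL_IMAGE_PPD
# Treat the blurred footprint as a disc to estimate how many virtual pixels it touches.
affected_px = math.pi * (blur_diameter_px / 2.0) ** 2

print(f"Blur: {blur_deg:.1f} degrees, about {affected_px:,.0f} virtual-image pixels affected")
```

With these assumed numbers, a single dot spreads over more than ten degrees, which is why it slightly dims a very large number of pixels rather than fully blocking a few.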

Hard Edge Occlusion (Optical and Camera Passthrough)

“Hard Edge Occlusion” means precise, pixel-by-pixel light blocking. With camera passthrough AR (such as Apple Vision Pro), hard edge occlusion is trivial; one or more camera pixels are replaced by one or more pixels in the virtual image. Even though masking pixels is trivial with camera passthrough, there is still a non-trivial problem with getting the hard edge masking perfectly aligned to the real world. With passthrough mixed reality, the passthrough camera with its autofocus has focused the real world so it can be precisely masked.

With optical mixed reality hard edge occlusion, the real world must also be brought into focus before it can be precisely masked. Rather than going to a camera, the real world’s light goes to a reflective masking spatial light modulator (SLM), typically LCOS, before combining it optically with the virtual image.

In Hard Edge (Pixel) Occlusion—Everyone Forgets About Focus, I discuss Arizona State University’s (ASU) optical solution for hard edge occlusion. Their solution has a set of optics that focuses the real world onto an SLM for masking. Then, a polarizing beam-splitting cube combines the result (with a change in polarization via two passes through a quarter waveplate not shown) after masking with a micro-display. While the ASU patent mentions using a polarizing beam splitter to combine the images, the patent fails to show or mention the need for a quarter waveplate between the SLM and beam splitter to work. One of the inventors, Hong Hua, was an ASU professor and a consultant to Magic Leap, and the patent was licensed to Magic Leap.

Beyond being big and bulky, the optical problems with ASU’s hard edge occlusion include:

  • It only hard-edge occludes at the distance set by the focusing.
  • The real world is “flattened” to be at the same focus as the virtual world.
  • Polarization dims the real world by at least 50%. Additionally, viewing a polarized display device (like a typical LCD monitor or phone display) will be at least partially blocked by an amount that will vary with orientation relative to the optics.
  • The real world is dimmed by at least 2x via the polarizing beam splitter.
  • As the eye moves, the real world will move differently than it would with the eye looking directly. You are looking at the real world through two sets of optics with a much longer light path.

While Cogni Trax uses the same principle for masking the real world, it is configured differently and is much smaller and lighter. Both devices block a lot of light. Cogni Trax’s design blocks about 77% of the light, and they claim their next generation will block 50%. However, note that this is likely on top of any other light losses in the optical system.
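
Because optical losses multiply, the occlusion stage’s transmission is only the starting point. A minimal sketch: the 77% and 50% blocking figures are the ones quoted above, while the 70% transmission for the rest of the optical system is a purely hypothetical placeholder of mine.

```python
def total_transmission(*stage_transmissions: float) -> float:
    """Multiply per-stage transmissions (each between 0 and 1) into an end-to-end value."""
    total = 1.0
    for t in stage_transmissions:
        if not 0.0 <= t <= 1.0:
            raise ValueError("each transmission must be between 0 and 1")
        total *= t
    return total

GEN1_OCCLUDER_T = 1.0 - 0.77      # current design blocks about 77%
NEXT_GEN_OCCLUDER_T = 1.0 - 0.50  # claimed next generation blocks 50%
OTHER_OPTICS_T = 0.70             # hypothetical: combiner and other losses

for name, occluder_t in [("current", GEN1_OCCLUDER_T), ("next-gen", NEXT_GEN_OCCLUDER_T)]:
    t = total_transmission(occluder_t, OTHER_OPTICS_T)
    print(f"{name}: {t:.0%} of real-world light reaches the eye")
```

The point is only that any further stage, such as a combiner for the virtual image, scales down the already low see-through figure.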

Cogni Trax SID Display Week 2024 Demo

On the surface, the Cogni Trax demo makes it look like the concept works. The demo had a smartphone camera looking through the Cogni Trax optical device. If you look carefully, you will see that they block light from four areas of the real world (see arrow in the inset picture below): a Nike swoosh on top of the shoe, a QR code, the Coke in the bottle (with moving bubbles), and a partially darkened area of the wall to the right to create a shadow of the bottle.

They don’t have a microdisplay with a virtual image; thus, they can only block or darken the real world and not replace anything. Since you are looking at the image on a cell phone and not with your own eyes, you have no sense of the loss of depth and parallax issues.

When I took the picture above, I was not planning on writing an article and missed capturing the whole setup. Fortunately, Robert Scoble put out an X-video that showed most of the rig used to align the masking to the real world. The rig supports aligning the camera and Cogni Trax device with six degrees of freedom. This demo will only work if all the objects in the scene are in a precise location relative to the camera/device. This is the epitome of a canned demo.

One could hand wave that developing SLAM, eye tracking, and 3-D scaling technology to eliminate the need for the rig is a “small matter of hardware and software” (to put it lightly). However, requiring a rig is not the biggest hidden trick in these demos; it is the basic optical concept and its limitations. The “device” shown (lower right inset) is only the LCOS device and part of the optics.

Cogni Trax Gen 1 Optics – How it works

Below is a figure from Cogni Trax’s patent that will be used to diagram the light path. I have added some colorization to help you follow the diagram. The dashed-lined parts in the patent for combining the virtual image are not implemented in Cogni Trax’s current design.

The view of the real world follows a fairly torturous path. First, it goes through a polarizer where at least 50% of the light is lost (in theory, this polarizer is redundant due to the polarizing beam splitter to follow, but it is likely used to reduce any ghosting). It then bounces off of the polarizing beam splitter through a focusing element to bring the real world into focus on an LCOS SLM. The LCOS device will change the polarization of anything NOT masked so that on the return trip through the focusing element, it will pass through the polarizing beam splitter. The light then passes through the “relay optics,” then a Quarter Waveplate (QWP), off a mirror, and back through the quarter waveplate and relay optics. The two passes through the “relay optics” have to undo everything done to the light by the two passes through the focusing element. The two passes through the QWP will rotate the polarization of the light so that the light will bounce off the beam splitter and be directed at the eye via a cleanup polarizer. Optionally, as shown, the light can be combined with a virtual image from a microdisplay.
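
The double pass through the quarter waveplate is a standard way to rotate linear polarization by 90 degrees, and it can be checked with Jones calculus. The sketch below is a textbook idealization, not Cogni Trax’s design data: it assumes an ideal QWP with its fast axis at 45 degrees and an ideal mirror, and it follows the common simplification of ignoring the mirror’s coordinate flip, so the two passes together act like a half-wave plate.

```python
import math

Matrix = list[list[complex]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """2x2 complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def apply(m: Matrix, v: list[complex]) -> list[complex]:
    """Apply a 2x2 Jones matrix to a Jones vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def quarter_waveplate(theta_rad: float) -> Matrix:
    """Ideal QWP with fast axis at theta: R(theta) * diag(1, i) * R(-theta)."""
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    rot = [[c, -s], [s, c]]
    rot_inv = [[c, s], [-s, c]]
    retarder = [[1, 0], [0, 1j]]
    return matmul(matmul(rot, retarder), rot_inv)

qwp = quarter_waveplate(math.pi / 4)
double_pass = matmul(qwp, qwp)  # QWP, ideal mirror, then QWP again

horizontal = [1 + 0j, 0j]       # linearly polarized input
out = apply(double_pass, horizontal)
print([round(abs(x), 6) for x in out])  # magnitudes of the two output components
```

In this idealized model, light that was transmitted by the polarizing beam splitter on the way in comes back in the orthogonal polarization and is reflected on the way out, which matches the path described above.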

I find it hard to believe that real-world light will go through all that and behave as if nothing other than the light losses from polarization has happened to it.

Cogni Trax provided a set of diagrams showing the light path of what they call “Alpha Pix.” I edited several of their diagrams together and added some annotations in red. As stated earlier, the current prototype does not have a microdisplay for providing a virtual image. If the virtual display device were implemented, its optics and combiner would be on top of everything else shown.

I don’t see this as a practical solution to hard-edge occlusion. While much less bulky than the ASU design, it still requires polarizing the incoming light and sending it through a torturous path that will further damage/distort real-world light. And this is before they deal with adding a virtual image. There is still the issue that the hard edge occlusion only works if everything being occluded is at approximately the same focus distance. If the virtual display is implemented, it would seem that the virtual image would need to be at approximately the same focus distance for it to be occluded correctly. Then, the hardware and software are required to get everything between the virtual and real world aligned with the eye. Even if the software and eye tracking were excellent, there will still be a lag with any rapid head movement.

Cogni Trax Waveguide Design / Gen 2

Cogni Trax’s website and video discuss a “waveguide” solution for Gen 2. I found a patent (with excerpts right and below) from Cogni Trax for a waveguide approach to hard-edge occlusion that appears to agree with the diagrams in the video and on the website for their “waveguide.” I have outlined the path for the real world (in green) and the virtual image (in red).

Rather than using polarization, this method uses time-sequential modulation via a single Texas Instruments DLP/DMD. The DLP is used during part of the time to block/pass light from the real world and during the rest of the time as the virtual image display. I have included Figure 1(a), which gives the overall light path; Figures 1(c) and 1(d), which show the time multiplexing; Figure 6(a) with a front view of the design; and Figures 10(a) and 10(b), which show a side view of the waveguide with the real world and virtual light paths, respectively.

Other than not being polarized, the light follows a more torturous light path that includes a “fixed DMD” to correct for the micro-tilts of the real world by time-multiplexed displaying and masking DMD. In addition to all the problems I had with the Gen 1 design, I find putting the relatively small mirror (120 in Figure 1a) in the middle of the view very problematic, as the view above or below the mirror will look very different than the view in the mirror with all the additional optics. While it can theoretically give more light throughput and not require polarization of the real world, it can only do so by keeping the virtual display times short, which will mean more potential field sequential color breakup and lower color bit depth from the DLP.

Overall, I see Cogni Trax’s “waveguide” design as trading one set of problems for another set of probably worse image problems.

Conclusion

Perhaps my calling hard-edge occlusion a “Holy Grail” did not fully convey its impossibility. The more I have learned, examined, and observed this problem and its proposed solutions, the more clearly it seems impossible. Yes, someone can craft a demo that works for a tightly controlled setup with what is occluded at about the same distance, but it is a magic show.

The Cogni Trax demo is not a particularly good magic show, as it uses a massive 6-axis control rig to position a camera rather than letting the user put on a headset. Furthermore, the demo does not support a virtual display.

Cogni Trax’s promise of a future “waveguide” design appears to me to be at least as fundamentally flawed. According to the publicly available records, Cogni Trax has been trying to solve this problem for 7 years, and a highly contrived setup is the best they have demonstrated, at least publicly. This is more of a university lab project than something that should be developed commercially.

Based on his history with Apple and Texas Instruments, the CEO, Sajjad Khan, is capable, but I can’t understand why he is pursuing this fool’s errand. I don’t understand why over $7M has been invested, other than people blindly investing in former Apple designers without proper technical due diligence. I understand that high-risk, high-reward concepts can be worth some investment, but in my opinion, this does not fall into that category.

Appendix – Quoting Out of Context

Cogni Trax has quoted me in their video on their website as saying, “The Holy Grail of AR Displays.” The quote does not make it clear that A) I am referring to Hard Edge Occlusion (and not Cogni Trax) and B) I go on to say, “But it is likely impossible to solve for anything more than special cases of a single distance (flat) real world with optics.” The audio of me in the Cogni Trax video, which is rather garbled, comes from a MARCH 30, 2021, AR Show, “KARL GUTTAG (KGONTECH) ON MAPPING AR DISPLAYS TO SUITABLE OPTICS (PART 2),” at ~48:55 into the video (the occlusion issue is only briefly discussed).

Below, I have cited (with new highlighting in yellow) the section from my blog discussing hard edge occlusion from November 20, 2019, where Cogni Trax got my “Holy Grail” quote. This section of the article discusses the ASU design. This article discussed using a transmissive LCD for soft edge occlusion about 3 years before Magic Leap announced the Magic Leap 2 with such a method in July 2022.

Hard Edge (Pixel) Occlusion – Everyone Forgets About Focus

“Hard Edge Occlusion” is the concept of being able to block the real world with sharply defined edges, preferably to the pixel level. It is one of the “Holy Grails” of optical AR. Not having hard edge occlusion is why optical AR images are translucent. Hard Edge Occlusion is likely impossible to solve optically for all practical purposes. The critical thing most “solutions” miss (including US 20190324274) is that the mask itself must be in focus for it to sharply block light. Also, to properly block the real world, the focusing effect required depends on the distance of everything in the real world (i.e., it is infinitely complex).

The most common hard edge occlusion idea suggested is to put a transmissive LCD screen in the glasses to form “opacity pixels,” but this does not work. The fundamental problem is that the screen is so close to the eye that the light-blocking elements are out of focus. An individual opacity pixel will have a little darkening effect, with most of the light from a real-world point in space going around it and into the eye. A large group of opacity pixels will darken as a blurry blob.

Hard edge occlusion is trivial to do with pass-through AR by essentially substituting pixels. But it is likely impossible to solve for anything more than special cases of a single distance (flat) real world with optics. The difficulty of supporting even the flat-world special case is demonstrated by some researchers at the University of Arizona, now assigned to Magic Leap (the PDF at this link can be downloaded for free) shown below. Note all the optics required to bring the real world into focus onto “SLM2” (in the patent 9,547,174 figure) so it can mask the real world and solve the case for everything being masked being at roughly the same distance. None of this is even hinted at in the Apple application.

I also referred to hard edge occlusion as one of the “Holy Grails” of AR in a comment to a Magic Leap article in 2018 citing the ASU design and discussing some of the issues. Below is the comment, with added highlighting in yellow.

One of the “Holy Grails” of AR, is what is known as “hard edge occlusion” where you block light in-focus with the image. This is trivial to do with pass-through AR and next to impossible to do realistically with see-through optics. You can do special cases if all the real world is nearly flat. This is shown by some researchers at the University of Arizona with technology that is Licensed to Magic Leap (the PDF at this link can be downloaded for free: https://www.osapublishing.org/oe/abstract.cfm?uri=oe-25-24-30539#Abstract). What you see is a lot of bulky optics just to support a real world with the depth of a bookshelf (essentially everything in the real world is nearly flat).

FM: Magic Leap One – Instant Analysis in the Comment Section by Karl Guttag (KarlG) JANUARY 3, 2018 / 8:59 AM

Brilliant Labs Frame AR with AI Glasses & a Little More on the Apple Vision Pro

10 May 2024 at 04:29

Introduction

A notice in my LinkedIn feed mentioned that Brilliant Labs has started shipping its new Frame AR glasses. I briefly met with Brilliant CEO Bobak Tavangar at AWE 2023 (right) and got a short demonstration of its “Monocle” prototype. So, I investigated what Brilliant Labs was doing with its new “Frame.”

This started as a very short article, but as I put it together, I thought it would be an interesting example of making design decisions and trade-offs. So it became longer. Looking at the Frames more closely, I found issues that concerned me. I don’t mean to pick on Brilliant Labs here. Any hardware device like the Frames is a massive effort, and they talk like they are concerned about their customers; I am only pointing out the complexities of supporting AI with AR for a wide audience.

While looking at how the Frame glasses work, I came across some information related to the Apple Vision Pro’s brightness (in nits), discussed last time in Apple Vision Pro Discussion Video by Karl Guttag and Jason McDowall. In the same way that the Apple Vision Pro’s brightness is being misstated as “5,000 nits,” the Brilliant Labs Frame’s brightness has been misreported as 3,000 nits. In both cases, the nits are the “potential” out of the display and not “to the eye” after the optics.

I’m also repeating the announcement that I will be at SID’s DisplayWeek next week and AWE next month. If you want to meet, please email meet@kgontech.com.

DisplayWeek (next week) and AWE (next month)

I will be at SID DisplayWeek in May and AWE in June. If you want to meet with me at either event, please email meet@kgontech.com. I usually spend most of my time on the exhibition floor where I can see the technology.

AWE has moved to Long Beach, CA, south of LA, from its prior venue in Santa Clara, and it is about one month later than last year. Last year at AWE, I presented Optical Versus Passthrough Mixed Reality, available on YouTube. This presentation was in anticipation of the Apple Vision Pro.

At AWE, I will be on the PANEL: Current State and Future Direction of AR Glasses on Wednesday, June 19th, from 11:30 AM to 12:25 PM.

There is an AWE speaker discount code, SPKR24D, which provides a 20% discount, and it can be combined with Early Bird pricing (which ends May 9th, 2024 – Today as I post this). You can register for AWE here.

Brilliant Labs Monocle & Frame “Simplistic” Optical Designs

Brilliant Labs’ Monocle and Frame use the same basic optical architecture, but it is better hidden in the Frame design. I will start with the Monocle, as it is easier to see the elements and the light path. I was a little surprised that both designs use a very simplistic, non-polarized 50/50 beam splitter with its drawbacks.

Below (left) is a picture of the Monocle with the light path (in green). The Monocle (and Frame) both use a non-polarizing 50/50 beamsplitter. The splitter projects 50% of the display’s light forward and 50% downward to the (mostly) spherical mirror, magnifying the image and moving the apparent focus. After reflecting from the mirror, the light is split again in half, and ~25% of the light goes to the eye. The front-projected image will be a mirrored, unmagnified view of the display that will be fairly bright. Front projection or “eye glow” is generally considered undesirable in social situations and is something most companies try to reduce/eliminate in their optical designs.

The middle picture above shows a picture I took of the Monocle from the outside, and you can see the light from the beam splitter projecting forward. Figures 5A and 6 (above right) from Brilliant Labs’ patent application illustrate the construction of the optics. The Monocle is made with two solid optical parts, with the bottom part forming part of the beam splitter and the bottom surface being shaped to form the curved mirror and then mirror coated. An issue with the 2-piece Monocle construction is that the beam splitter and mirror are below eye level, which requires the user to look down to see the image or position the whole device higher, which results in the user looking through the mirror.

The Frame optics work identically in function, but the size and spacing differ. The optics are formed with three parts, which enables Brilliant to position the beam splitter and mirror nearer the center of the user’s line of sight. But as Brilliant Labs’ documentation shows (right), the new Frame glasses still have the virtual (apparent) image below the line of sight.

Having the image below the line of sight reduces the distortion/artifacts of the real world by looking through the beam splitter when looking forward, but it does not eliminate all issues. The top seam of the beam splitter will likely be visible as an out-of-focus line.

The image below shows part of the construction process from a Brilliant Labs YouTube video. Note that the two parts that form the beamsplitter with its 50/50 semi-mirror coating have already been assembled to form the “Top.”

The picture above left, taken by Forbes author Ben Sin, is of a Frame prototype from his article Frame Is The Most ‘Normal’ Looking AI Glasses I’ve Worn Yet. In this picture, the 50/50 beam splitter is evident.

Two Types of Birdbath

As discussed in Nreal Teardown: Part 1, Clones and Birdbath Basics and its Appendix: Second Type of Birdbath, there are two types of “birdbaths” used in AR. The Birdbath comprises a curved mirror (or semi-mirror) and a beamsplitter. It is called a “birdbath” because the light reflects out of the mirror. The beamsplitter can be polarized or unpolarized (more on this later). Birdbath elements are often buried in the design, such as the Lumus optical design (below left) with its curved mirror and beam splitter.

From 2023 AR/VR/MR Lumus Paper – A “birdbath” is one element of the optics

Many AR glasses today use the birdbath to change the focus and act as the combiner. The most common of these designs is where the user looks through a 50/50 birdbath mirror to see the real world (see Nreal/Xreal example below right). In this design, a polarized beam splitter is usually used with a quarter waveplate to “switch” the polarization after the reflection from the curved semi-mirror to cause the light to go through the beam splitter on its second pass (see Nreal Teardown: Part 1, Clones and Birdbath Basics for a more detailed explanation). This design is what I refer to as a “Look through the mirror” type of birdbath.

Brilliant Labs uses a “Look through the Beamsplitter” type of birdbath. Google Glass is perhaps the most famous product with this birdbath type (below left). This birdbath type has appeared in Samsung patents that were much discussed in the electronic trade press in 2019 (see my 2019 Samsung AR Design Patent—What’s Inside).

LCOS maker Raontech started showing a look through the beamsplitter reference design in 2018 (below right). The various segments of their optics are labeled below. This design uses a polarizing beam splitter and a quarter waveplate.

Brilliant Labs’ Thin Beam Splitter Causes View Issues

If you look at the RaonTech or Google Glass splitter, you should see that the beam splitter is the full height of the optics. However, in the case of the Frames and Monocle designs (right), the top and bottom beam splitter seams, the 50/50 mirror coating, and the curved mirror are in the middle of the optics and will be visible as out-of-focus blurs to the user.

Pros and Cons of Look-Through-Mirror versus Look-Through-Beamsplitter

The look-through-mirror birdbaths typically use a thin flat/plate beam splitter, and the curved semi-mirror is also thin and “encased in air.” This results in them being relatively light and inexpensive. They also don’t have to deal with the “birefringence” (polarization changing) issues associated with thick optical materials (particularly plastic). The big disadvantage of the look-through-mirror approach is that to see the real world, the user must look through both the beamsplitter and the 50/50 mirror; thus, the real world is dimmed by at least 75%.

The look-through-beamsplitter designs encase the entire design in either glass or plastic, with multiple glued-together surfaces that are either coated or covered with films. The need to encase the design in a solid means the designs tend to be thicker and more expensive. Worse yet, typical injection-molded plastics are birefringent and can’t be used with polarized optics (beamsplitters and quarter waveplates). Either heavy glass or higher-cost resin-molded plastics must be used with polarized elements. Supporting a wider FOV becomes increasingly difficult as a linear change in FOV results in a cubic increase in the volume of material (either plastic or glass) and, thus, the weight. Bigger optics are also more expensive to make. There are also optical problems when looking through very thick solid optics. You can see in the Raontech design above how thick the optics get to support a ~50-degree FOV. This approach “only” requires the user to look through the beam splitter, and thus the view of the real world is dimmed by 50% (or twice as much light gets through as the look-through-mirror method).
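
As a rough sketch of the two comparisons above (real-world dimming and the cubic growth of solid optics with FOV), the following Python uses only the idealized 50% figures from the text. The function names and the 1.5x FOV example are illustrative assumptions, and real designs will have additional losses.

```python
def real_world_transmission(design: str) -> float:
    """Ideal fraction of real-world light reaching the eye."""
    if design == "look-through-mirror":
        # The view passes through the beam splitter and the 50/50 semi-mirror.
        return 0.5 * 0.5
    if design == "look-through-beamsplitter":
        # The view passes through the beam splitter only.
        return 0.5
    raise ValueError(f"unknown design: {design}")


def relative_optics_volume(fov_scale: float) -> float:
    """Volume (and so weight) of solid optics relative to a baseline FOV,
    assuming all three dimensions scale linearly with FOV."""
    return fov_scale ** 3


print(real_world_transmission("look-through-mirror"))        # 0.25
print(real_world_transmission("look-through-beamsplitter"))  # 0.5
print(relative_optics_volume(1.5))                           # 3.375
```

Under these assumptions, a 50% wider FOV needs more than three times the material, which is why the thick solid optics get heavy so quickly.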

Pros and Cons of Polarized Beam Splitter Birdbaths

Most companies with look-through-mirror and look-through-beamsplitter designs, but not Brilliant Labs, have gone with polarizing beam splitters and then use quarter waveplates to “switch” the polarization when the light reflects off the mirror. Either method requires the display’s light to make a reflective and transmissive pass via the beam splitter. With a non-polarized 50/50 beam splitter, this means multiplicative 50% losses or only 25% of the light getting through. With a polarized beam splitter, once the light is polarized with a 50% loss, with proper use of quarter waveplates, there are no more significant losses with the polarized beamsplitter.
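
The difference in display light throughput can be sketched in a few lines of Python. This is an idealized model that follows the reasoning above: it ignores mirror reflectivity, scatter, and every other optical loss, and the function name is my own.

```python
def birdbath_display_throughput(polarized: bool) -> float:
    """Ideal fraction of the display's light that reaches the eye."""
    if polarized:
        # One 50% loss to polarize the display light; the passes via the
        # polarizing beam splitter and quarter waveplate are treated as lossless.
        return 0.5
    # Non-polarized 50/50 splitter: one reflective and one transmissive pass,
    # each losing half of the light.
    return 0.5 * 0.5


print(birdbath_display_throughput(polarized=False))  # 0.25
print(birdbath_display_throughput(polarized=True))   # 0.5
```

In this idealized case, the polarized design delivers twice the display light to the eye for the same display device.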

Another advantage of the polarized optics approach is that front-projection can be mostly eliminated (there will be only a little due to scatter). With the look-through-mirror method, this can be accomplished (as discussed in Nreal Teardown: Part 1, Clones and Birdbath Basics) with a second quarter waveplate and a front polarizer. With the look-through-beamsplitter method, a polarizer before the beamsplitter will block the light that would project forward off the polarized beamsplitter.

As mentioned earlier, using polarized optics becomes much more difficult with the thicker solid optics associated with the look-through-beamsplitter method.

Brilliant Labs Frame Design Decision Options

It seems that at every turn in the decision process for the Frame and Monocle optics, Brilliant Labs chose the simplest and most economical design possible. By not using polarized optics, they gave up brightness and will have significant front projection. Still, they can use much less expensive injection-molded plastic optics that do not require polarizers and quarter waveplates. They avoided using more expensive waveguides, which would be thinner but require LCOS or MicroLED (inorganic LED) projection engines, which may be heavier and larger. That said, the latest LCOS and MicroLED engines are getting to be pretty small and light, particularly for a >30-degree FOV (see DigiLens, Lumus, Vuzix, Oppo, & Avegant Optical AR (CES & AR/VR/MR 2023 Pt. 8)).

Frames Brightness to the Eye – Likely <25% of 3,000 nits – Same Problem as Apple Vision Pro Reporting

As discussed in the last article on the Apple Vision Pro (AVP) in the Appendix: Rumor Mill’s 5,000 Nits Apple Vision Pro, reporters/authors constantly make erroneous comparisons of the “display-out nits” of one device to the nits-to-the-eye of other devices. Also, as stated last time, the companies appear to want this confusion by avoiding specifying the nits to the eye, as they benefit from reporters and others using display device values.

I could not find an official Brilliant Labs value anywhere, but it seems to have told reporters that “the display is 3,000 nits,” which may not be a lie, but it is misleading. Most articles will dutifully give the “display number” but fail to say that they are “display device nits” and not what the user will see and leave it to the readers to make the mistake, while other reporters will make the error themselves.

Digitrends:

The display on Frame is monocular, meaning the text and graphics are displayed over the right eye only. It’s fairly bright (3,000 nits), though, so readability should be good even outdoors in sunlit areas.

Wearable:

As with the Brilliant Labs Monocle – the clip-on, open-source device that came before Frame – information is displayed in just one eye, with overlays being pumped out at around 3,000 nits brightness.

Android Central, in its article These AI glasses are being backed by the Pokemon Go CEO, at least made it clear that the figure was the display device number, but I still think most readers wouldn’t know what to do with this number. They added the tidbit that the panels were made by Sony, and they discussed pulse width modulation (also known as duty cycle). Interestingly, they talk about a short on-time duty cycle causing problems for people sensitive to flicker. In contrast, VR game fans favor a very short on-time duty cycle (what Brad Lynch of SadlyItsBradley refers to as low-persistence) to reduce blurring.

androidcentral’s These AI glasses are being backed by the Pokemon Go CEO

A 0.23-inch Sony MicroOLED display can be found inside one of the lenses, emitting 3,000 nits of brightness. Brilliant Labs tells me it doesn’t use PWM dimming on the display, either, meaning PWM-sensitive folks should have no trouble using it.

Below is a summary of Sony OLED Microdisplays aimed at the AR and VR market. On it, the 0.23 type device is listed with a max luminance of 3,000 nits. However, from the earlier analysis, we know that at most 25% of the light can get through the Brilliant Labs Frame birdbath optics, or at most 750 nits (likely less due to other optical losses). This number assumes that the device is driven at full brightness and that Brilliant Labs is not buying derated devices at a lower price.
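
The upper-bound estimate above is simple enough to write out. In this sketch, the 3,000 nits and 25% figures come from the text; the function name and the `drive_fraction` parameter (a stand-in for running or buying the display below its maximum rating) are my own assumptions.

```python
def nits_to_eye(display_nits: float,
                optical_throughput: float,
                drive_fraction: float = 1.0) -> float:
    """Upper bound on nits to the eye from display-device nits."""
    return display_nits * optical_throughput * drive_fraction


# Frame: 3,000-nit display device, at most 25% through the 50/50 birdbath.
print(nits_to_eye(3000, 0.25))  # 750.0

# Hypothetical case: the same optics with the display driven at half brightness.
print(nits_to_eye(3000, 0.25, drive_fraction=0.5))  # 375.0
```

Any further optical losses or derating only push the result below the 750-nit ceiling.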

I can’t blame Brilliant Labs because almost every company does the same in terms of hiding the ball on to-the-eye brightness. Only companies with comparatively high nits-to-the-eye values (such as Lumus) publish this spec.

Sony Specifications related to the Apple Vision Pro

The Sony specifications list a 3.5K by 4K device. The common industry understanding is that Apple designed a custom backplane for the AVP but then used Sony’s OLED process. Notice the spec of 1,000 cd/m2 (candelas per meter squared = nits) at a 20% duty ratio. While favorable for VR gamers wanting less motion blur, the low on-duty cycle time is also a lifetime issue. The display device probably can’t handle the heat from being driven for a high percentage of the time.

It would be reasonable to assume that Apple is similarly restricted to about a 20% on-duty cycle. As I reported last time in the Apple Vision Pro Discussion Video by Karl Guttag and Jason McDowall, I have measured the on-duty cycle of the AVP to be about 18.4% or close to Sony’s 20% for their own device.

The 5,000 nits cited by MIT Tech Review are for the raw displays before the optics, whereas the nits for the MQ2 were those going to the eye. The AVP’s (and all other) pancake optics transmit about 11% (or less) of the light from an OLED in the center. With pancake optics, there is the polarization of the OLED (>50% loss), a transmissive pass, and a reflective pass through a 50/50 mirror, which starts with at most 12.5% (50% cubed) before considering all the other losses from the optics. Then, there is the on-time duty cycle of the AVP, which I have measured to be about 18.4%. VR devices want the on-time duty cycle to be low to reduce motion blur with the rapid motion of the head and 3-D games. The MQ3 only has a 10.3% on-time duty cycle (shorter duty cycles are easier with LED-illuminated LCDs). So, while the AVP display devices likely can emit about 5,000 nits, the nits reaching the eye are approximately 5,000 nits x 11% x 18.4% ≈ 100 nits.
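
The same arithmetic as a short Python sketch. All of the numbers are the ones quoted above; the function and variable names are illustrative assumptions, and the result is only as good as the ~11% and 18.4% estimates.

```python
def time_averaged_nits_to_eye(display_nits: float,
                              optics_transmission: float,
                              on_time_duty_cycle: float) -> float:
    """Display-device nits scaled by optics transmission and duty cycle."""
    return display_nits * optics_transmission * on_time_duty_cycle


# Ideal ceiling for pancake optics: polarization, one transmissive pass, and
# one reflective pass at the 50/50 mirror (50% cubed).
pancake_ceiling = 0.5 ** 3
print(pancake_ceiling)  # 0.125

# AVP estimate: ~5,000 nits x ~11% optics transmission x ~18.4% duty cycle.
avp_nits = time_averaged_nits_to_eye(5000, 0.11, 0.184)
print(f"{avp_nits:.0f} nits")  # roughly 100 nits
```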

View Into the Frame Glasses

I don’t want to say that Brilliant Labs is doing anything wrong or that other companies don’t often do the same. Companies often take pictures and videos of new products using non-functional prototypes because the working versions aren’t ready when shooting or because they look better on camera. Still, I want to point out something I noticed with the pictures of the CEO, Bobak Tavangar (right), that were published in many of the articles on the Frame glasses. I didn’t see the curved mirror and the 50/50 beam splitter.

In a high-resolution version of the picture, I could see the split in the optics (below left) but not the darkened rectangle of the 50/50 mirror. So far, I have found only one picture of someone wearing the Frame glasses from Bobak Tavangar’s post on X. It is of a person wearing what appears to be a functional Frame in a clear prototype body (below right). In the dotted line box, you can see the dark rectangle from the 50/50 mirror and a glint from the bottom curved mirror.

I don’t think Brilliant Labs is trying to hide anything, as I can find several pictures that appear to be functional frames, such as the picture from another Tavangar post on X showing trays full of Frame devices being produced (right) or the Forbes picture (earlier in the Optical section).

What was I hoping to show?

I’m trying to show what the Frame looks like when worn to get an idea of the social impact of wearing the glasses. I was looking for a video of someone wearing them with the Frame turned on, but unfortunately, none have surfaced. From the design analysis above, I know they will project a small but bright mirror image of the display forward off of the 50/50 mirror, but I have not found an image showing the working device from the outside looking in.

Exploded View of the Frame Glasses

The figure below is taken from Brilliant Labs’ online manual for the Frame glasses (I edited it to reduce space and inverted the image to make it easier to view). By AR glasses standards, the Frame design is about as simple as possible. The choice of two nose bridge inserts is not shown in the figure below.

There is only one size of glasses, which Brilliant Labs described in their AMA as being between a “medium and large” type frame. They say that the temples are flexible to accommodate many head widths. Because the Frames are monocular, IPD is not the problem it would be with a binocular headset.

AddOptics is making custom prescription lenses for the Frames glasses

Brilliant Labs is partnering with AddOptics to make prescription lenses that can be ‘Precision Bonded’ to Frames using a unique optical lens casting process. For more on AddOptics, see CES 2023 (Part 3) – AddOptics Custom Optics and my short follow-up in Mixed Reality at CES & AR/VR/MR 2024 (Part 2 Mostly Optics).

Bonding to the Frames will make for a cleaner and more compact solution than the more common insert solution, but it will likely be permanent and thus a problem for people whose prescriptions change. In their YouTube AMA, Brilliant Labs said they are working with AddOptics to increase the range of prescription values and support for astigmatism.

They didn’t say anything about bifocal or progressive lens support, which is even more complicated (and may require post-mold grinding). As the virtual image is below the centerline of vision, it would typically be where bifocal and progressive lenses would be designed for reading distance (near vision). In contrast, most AR and VR glasses aim to put the virtual image at 2 meters, considered “far vision.”

The Frame’s basic specs

Below, I have collected the basic specs on the Frame glasses and added my estimate for the nits to the eye. Also shown below is their somewhat comical charging adapter (“Mister Charger”). None of these specs are out of the ordinary and are generally at the low end for the display and camera.

  • Monocular 640×400 resolution OLED Microdisplay
  • ~750 nits to the eye (based on reports of a 3,000-nit Sony Micro-OLED display device)
    • (90% on-time duty cycle using an
  • 20-Degree FOV
  • Weight ~40 grams
  • 1280×720 camera
  • Microphone
  • 6 axis IMU
  • Battery 222mAh  (plus 149mAh top-up from charging adapter)
    • (With 80mA typical power consumption when operating, 0.580 on standby)
  • CPU nRF52840 Cortex M4F (Nordic ARM)
  • Bluetooth 5.3
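
As a back-of-the-envelope sketch, the battery and typical current figures in the list above imply a rough operating time. This assumes the full rated capacity is usable, that the draw stays at the 80mA typical figure, and that the adapter's 149mAh top-up transfers without loss; none of these are confirmed, and the function name is my own.

```python
def runtime_hours(capacity_mah: float, draw_ma: float) -> float:
    """Ideal runtime: battery capacity divided by a constant current draw."""
    return capacity_mah / draw_ma


glasses_only = runtime_hours(222, 80)        # internal battery alone
with_top_up = runtime_hours(222 + 149, 80)   # plus the adapter's top-up

print(f"{glasses_only:.1f} h")  # 2.8 h
print(f"{with_top_up:.1f} h")   # 4.6 h
```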

Everything in AR Today is “AI”

Brilliant Labs is marketing the Frames as “AI Glasses.” The “AI” comes from Brilliant Labs’ Noa ChatGPT client application running on a smartphone. Brilliant Labs says the hardware is “open source” and can be used by other companies’ applications.

I’m assuming the “AI” primarily runs on the Noa cell phone application, which then connects to the cloud for the heavy-lifting AI. According to a video by Brilliant Labs, while the Monocle’s CPU only controls the display and peripherals, they plan to move some processing onto the Frame’s more capable CPU. Like other “AI” wearables, I expect simple questions will get immediate responses while complex questions will wait on the cloud.

Conclusions

To be fair, designing glasses and wearable AR products for the mass market is difficult. I didn’t intend to pick on Brilliant Labs’ Frames; instead, I am using it as an example.

With a monocular, 20-degree FOV below the center of the person’s view, the Frames are a “data snacking” type AR device. It is going to be competing with products like the Humane AI projector (which is a joke — see: Humane AI – Pico Laser Projection – $230M AI Twist on an Old Scam), the Rabbit R1, Meta’s (display-less) Ray Ban Wayfarer, other “AI” audio glasses, and many AR-AI glasses similar to the Frame that are in development.

This blog normally concentrates on display and optics, and on this score, the Frame’s optics are a “minimal effort” to support low cost and weight. As such, they have a lot of problems, including:

  • Small 20-degree FOV that is set below the eyes and not centered (unless you are lucky with the right IPD)
  • Due to the way the 50/50 beam splitter cuts through the optics, it will have a visible seam. I don’t think this will be pleasant to look through when the display is off (but I have not tried them yet). You could argue that you only put them on “when you need them,” but that negates most use cases.
  • The support for vision correction appears to lock the glasses to a single (current) prescription.
  • Regardless of flexibility, the single-size frame will make the glasses unwearable for many people.
  • The brightness to the eye of probably less than 750 nits is not bright enough for general outdoor use in daylight. It might be marginal if combined with clip-on sunglasses or if they are used in the shade.

As a consumer, I hate the charger adapter concept. Why they couldn’t just put a USB-C connector on the glasses is beyond me, and the adapter is a friction point for every user. Users typically have dozens of USB-C power cables today, but your device is dead if you forget or lose the adapter. Since these are supposed to be prescription glasses, the idea of needing to take them off to charge them is also problematic.

While I can see the future use model for AI prescription glasses, I think a display, even one with a small FOV, will add significant value. I think Brilliant Labs’ Frames are for early adopters who will accept many faults and difficulties. At least they are reasonably priced at $349, by today’s standards, and don’t require a subscription for basic services without too many complex AI queries requiring the cloud.

Apple Vision Pro Discussion Video by Karl Guttag and Jason McDowall

30 April 2024 at 14:35

Introduction

As discussed in Mixed Reality at CES and the AR/VR/MR 2024 Video (Part 1 – Headset Companies), Jason McDowall of The AR Show recorded over four hours of video discussing the 50 companies I met at CES and AR/VR/MR. The last thing we discussed for about 50 minutes was the Apple Vision Pro (AVP).

The AVP video amounts to a recap of the many articles I have written on the AVP. Where appropriate, I will give links to my more detailed coverage in prior articles and updates rather than rehash that information in this article.

It should be noted that Jason and I recorded the video on March 25th, 2024. Since then, there have been many articles from tech magazines saying the AVP sales are lagging, often citing Bloomberg’s Mark Gurman’s “Demand for demos is down” and analyst Ming-Chi Kuo reporting, “Apple has cut its 2024 Vision Pro shipments to 400–450k units (vs. market consensus of 700–800k units or more).” While many reviewers cite the price of the AVP, I have contended that price was not the problem as it was in line with a new high-tech device (adjusted for inflation, it is about the same price as the first Apple II). My criticism focuses on the utility and human factors. In high-tech, the cost is usually a fixable problem with time and effort, and people will pay more if something is of great utility.

I said the Apple Vision Pro would have utility problems before it was announced (see my 2023 AWE Presentation “Optical Versus Passthrough Mixed Reality” and my articles on the AVP). I’m not about bashing a product or concept; when I find faults, I point them out and show my homework, so to speak, on this blog and in my presentations.

Before the main article, I want to repeat the announcement that I plan to go to DisplayWeek in May and AWE in June. I have also included a short section on YouTube personality/influencer Marques Brownlee’s Waveform Podcast and Hugo Barra’s (former Head of Oculus at Meta) blog article discussing my controversial (but correct) assessment that the Apple Vision Pro’s optics are slightly out of focus/blurry.

DisplayWeek and AWE

I will be at SID DisplayWeek in May and AWE in June. If you want to meet with me at either event, please email meet@kgontech.com. I usually spend most of my time on the exhibition floor where I can see the technology.

AWE has moved to Long Beach, CA, south of LA, from its prior venue in Santa Clara, and it is about one month later than last year. Last year at AWE, I presented Optical Versus Passthrough Mixed Reality, available on YouTube. This presentation was in anticipation of the Apple Vision Pro.

At AWE, I will be on the PANEL: Current State and Future Direction of AR Glasses on Wednesday, June 19th, from 11:30 AM to 12:25 PM with the following panelists:

  • Jason McDowall – The AR Show (Moderator)
  • Jeri Ellsworth – Tilt Five
  • Adi Robertson – The Verge
  • Edward Tang – Avegant
  • Karl M Guttag – KGOnTech

There is an AWE speaker discount code, SPKR24D, which provides a 20% discount, and it can be combined with Early Bird pricing (which ends May 9th, 2024). You can register for AWE here.

“Controversy” of the AVP Being a Little Blurry Discussed on Marques Brownlee’s Podcast and Hugo Barra’s Blog

As discussed in Apple Vision Pro – Influencing the Influencers & “Information Density,” which included this blog being cited on Linus Tech Tips, this blog is read by other influencers, media, analysts, and key people at AR/VR/MR tech companies.

Marques Brownlee (MKBHD), another major YouTube personality, discussed on his Waveform Podcast/WVFRM YouTube channel (link to the YouTube discussion) my March 1st article on Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3. Marques also discussed the March 11, 2024 “Hot Take” article on the blog of Hugo Barra (former Head of Oculus at Meta), which covers my blog article about 1/3rd of the way down.

According to MKBHD and Hugo Barra, my comments about Vision Pro are controversial, but they agree that it would make sense based on my evidence and their experience. My discussion with Jason was recorded before the Waveform Podcast came out. I’m happy to defend and debate this issue.

Outline of the Video and Additional Information

The times in blue on the left of each subsection give the link to the YouTube video section discussing that subject.

00:16 Ergonomics and Human Factors

I wrote about the issues with the AVP’s human factors design in Apple Vision Pro (Part 2) – Hardware Issues Mechanical Ergonomics. In a later article in CES Part 2, I compared the AVP to the new Sony XR headset in the Sony XR (and others compared to Apple Vision Pro) section.

08:23 Lynx and Hypervision

The article I wrote comparing the new Sony XR headset to the AVP mentioned the Lynx R1, first shown in 2021. But I didn’t realize how much they were alike until I saw a post somewhere (I couldn’t find it again) by Lynx’s CEO, Stan Larroque, saying how much they were alike. It could be a matter of form following function, but how much they are alike from just about any angle is rather striking.

While on the subject of Lynx and Apple, Lynx used optics by Limbak for the Lynx R1. As I broke in December 2022 in Limbak Bought by “Large US Company” (which soon was revealed as Apple) and discussed in more detail in a 2022 Video with Brad Lynch, I don’t like the R1’s Limbak “catadioptric” (combined mirror and refractive) optics. While the R1 optics are relatively thin, like pancake optics, they cause a significant loss of resolution due to their severe distortion, and worse, they have an optical discontinuity in the center of the image unless the eye is perfectly aligned.

In May 2023, Lynx and Hypervision announced that they were working together. In Apple Vision Pro (Part 4)—Hypervision Pancake Optics Analysis, Hypervision detailed the optics of the Apple Vision Pro. That article also discusses the Hypervision pancake optics it was showing at AR/VR/MR 2023. Hypervision demonstrated single pancake optics with a 140-degree FOV (the AVP is about 90 degrees) and blended dual pancake optics with a 240-degree FOV (see below right).

10:59 Big Screen Beyond Compared to AVP Comfort Issues

When I was at the LA SID One Day conference, I stopped by Big Screen Beyond to try out their headset. I wore Big Screen’s headset for over 2 hours and didn’t have any of the discomfort issues I had with the AVP. With the AVP, my eyes start bothering me after about half an hour and are pretty sore by 1 hour. There are likely two major factors: one is that the AVP is applying pressure to the forehead, and the other is that something is not working right optically with the AVP.

Big Screen Beyond has a silicone gel-like custom interface that is 3-D printed based on a smartphone face scan. Like the AVP, they have magnetic prescription inserts. While the Big Screen Beyond was much more comfortable, the face interface has a large contact area with the face. While not that uncomfortable, I would like something that breathed more. When you remove the headset, you can feel the perspiration evaporating from where the interface was contacting your face. I can’t imagine anyone wearing makeup being happy (the same with the AVP or any headset that presses against the face).

On a side note, I was impressed by Big Screen Beyond’s statement that it is cash flow positive. It is a sign that they are not wildly spending money on frills and that they understand the market they are serving. They are focused on serving dedicated VR gamers who want to connect the headset to a powerful computer.

Related to the Big Screen Beyond interface, a tip I picked up on Reddit is that you can use a silicone face pad made for the Meta Quest 2 or 3 on the AVP’s face interface (see above right). The silicone face pad gives some grip to the face interface and reduces the pressure required to hold the AVP steady. The pad adds about 1mm, but it so happens that I had recently swapped my original AVP face interface for one that is 5mm shorter. Now, I barely need to tighten the headband. A downside to the silicone pad, like the Big Screen Beyond, is that it more or less forms a seal with your face, and you can feel the perspiration evaporating when you remove it.

13:16 Some Basic AVP Information

In the video, I provide some random information about the AVP. I wanted to go into detail here about the often misquoted brightness of the AVP.

I started by saying that I have read or watched many people state that the AVP is much brighter than the Meta Quest 3 (MQ3) or Meta Quest Pro (MQP). They are giving ridiculously high brightness/nits values for the AVP. As I reported in my March 7th, 2024, comments in the article Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3, the AVP outputs to the eye about 100 nits and is only about 5-10% brighter than the MQ3 and ~20% less than the MQP.

Misinformation on AVP brightness via a Google Search

I will explain how this came about in the Appendix at the end. And to this day, if you do a Google search (captured below), it will prominently state that the AVP has a “50-fold improvement over the Meta’s Quest 2, which hits just 100 nits,” citing MIT Technology Review.

Nits are tricky to measure in a headset without the right equipment, and even then, they vary considerably from the center (usually the highest) to the periphery.

The 5,000 nits cited by MIT Tech Review are the raw displays before the optics, whereas the nits for the MQ2 were those going to the eye. The AVP’s (and all other) pancake optics transmit about 11% (or less) of the light from an OLED in the center. With pancake optics, there is the polarization of the OLED (>50% loss), a transmissive pass, and a reflective pass through a 50/50 mirror, which starts with at most 12.5% (50% cubed) before considering all the other losses from the optics. Then, there is the on-time duty cycle of the AVP, which I have measured to be about 18.4%. VR devices want the on-time duty cycle to be low to reduce motion blur with the rapid motion of the head and 3-D games. The MQ3 only has a 10.3% on-time duty cycle (shorter duty cycles are easier with LED-illuminated LCDs). So, while the AVP display devices likely can emit about 5,000 nits, the nits reaching the eye are approximately 5,000 nits x 11% x 18.4% = 100 nits.
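The arithmetic above can be written out as a short Python sketch. The function and variable names are mine and only for illustration; the 5,000 nits, 11%, and 18.4% figures are the ones quoted above, and the 12.5% ceiling is the three 50% losses multiplied together.

```python
def nits_to_eye(display_nits, optics_transmission, duty_cycle):
    """Rough estimate of brightness reaching the eye.

    display_nits:        raw brightness of the display device
    optics_transmission: fraction of light the optics pass (0..1)
    duty_cycle:          fraction of each frame the display is lit (0..1)
    """
    return display_nits * optics_transmission * duty_cycle


# Best-case pancake transmission before any other losses:
# polarizer (50%) x transmissive pass (50%) x reflective pass (50%).
pancake_ceiling = 0.5 ** 3

# Figures quoted in the text for the AVP.
avp_estimate = nits_to_eye(5000, 0.11, 0.184)

print(f"Pancake ceiling: {pancake_ceiling:.1%}")
print(f"AVP estimate: {avp_estimate:.0f} nits to the eye")
```

The same function makes it easy to see why quoting the display’s raw nits as the headset’s brightness is off by well over an order of magnitude.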

18:59 Computer Monitor Replacement is Ridiculous

I wrote a three-part series on why I think monitor replacement by the Apple Vision Pro is ridiculous. Please see Apple Vision Pro (Part 5A) – Why Monitor Replacement is Ridiculous, Part 5B, and Part 5C. There are multiple fundamental problems that neither Apple nor anyone else is close to solving. The slide on the right summarizes some of the big issues.

Nyquist Sampling – Resampling Causes Blurring & Artifacts

I tried to explain the problem in two ways, one based on the frequency domain and the other on the spatial (pixel) domain.

19:29 Frequency Domain Discussion

Anyone familiar with signal processing may remember that a square wave has infinite odd harmonics. Images can be treated like 2-dimensional signals. A series of equally spaced, equal-width horizontal lines looks like a square wave in the vertical dimension. Thus, to represent them perfectly with a 3-D transform requires infinite resolution. Since the resolution of the AVP (or any VR headset) is limited, there will be artifacts such as blurring, wiggling, and scintillation.

As I pointed out in (Part 5A), computers tend to “cheat” and distort text and graphics to fit on the pixel grid and thus sidestep the Nyquist sampling problem that any VR headset must face when trying to make a 2-D image appear still in 3-D space. Those who know signal processing know that the Nyquist rate is 2x the highest frequency component. However, as noted above, horizontal lines have infinite frequency. Hence, some degradation is inevitable, but then we only have to beat the resolution limit of the eye, which, in effect, acts as a low-pass filter. Unfortunately, the AVP’s display is about 2-3x too low linearly (4-9x in two dimensions) in resolution for the artifacts not to be seen by a person with good vision.
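For those who want to see the square wave point numerically, here is a minimal Python sketch (my own illustration, not something from Part 5A). It rebuilds a unit square wave from only its first few odd harmonics and reports the RMS error against the ideal wave. The helper names and the 2,000-sample grid are assumptions made for the example.

```python
import math


def square_wave(t):
    """Ideal square wave with period 1: +1 for the first half, -1 for the second."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0


def partial_sum(t, n_harmonics):
    """Fourier series of the square wave truncated to the first n odd harmonics."""
    total = 0.0
    for k in range(1, n_harmonics + 1):
        n = 2 * k - 1  # only odd harmonics are present in a square wave
        total += math.sin(2 * math.pi * n * t) / n
    return 4.0 / math.pi * total


def rms_error(n_harmonics, samples=2000):
    """RMS difference between the ideal wave and its band-limited version."""
    err = 0.0
    for i in range(samples):
        t = (i + 0.5) / samples  # half-sample offset avoids landing on an edge
        err += (square_wave(t) - partial_sum(t, n_harmonics)) ** 2
    return math.sqrt(err / samples)


for n in (1, 5, 50):
    print(f"{n:3d} odd harmonics -> RMS error {rms_error(n):.3f}")
```

The error keeps shrinking as harmonics are added but never reaches zero with a finite number of them, which is the band-limiting problem in a nutshell.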

22:15 Spatial Domain Discussion

To avoid relying on signal processing theory, in (Part 5A), I gave the example of how a single display pixel can be translated into 3-D space (right). The problem is that a pixel the size of a physical pixel in the headset will always cover parts of four physical pixels. Worse yet, with the slightest movement of a person’s head, how much of each pixel and even which pixels will be constantly changing, causing temporal artifacts such as wiggling and scintillation. The only way to reduce the temporal artifacts is to soften (low pass filter) the image in the resampling process.
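The four-pixel coverage can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the virtual pixel is exactly the size of a physical pixel, axis-aligned, and only translated (a real headset also scales, rotates, and warps it). The function name and coordinates are hypothetical.

```python
import math


def coverage(x, y):
    """Area of a unit-size virtual pixel landing on each physical pixel.

    (x, y) is the lower-left corner of the virtual pixel in physical-pixel
    units. Returns {(column, row): covered_area}, omitting untouched pixels.
    """
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy  # fractional offset into the pixel grid
    weights = {
        (ix, iy): (1 - fx) * (1 - fy),
        (ix + 1, iy): fx * (1 - fy),
        (ix, iy + 1): (1 - fx) * fy,
        (ix + 1, iy + 1): fx * fy,
    }
    return {pixel: area for pixel, area in weights.items() if area > 0}


# Perfectly aligned: the virtual pixel maps to exactly one physical pixel.
print(coverage(0.0, 0.0))

# A slight head movement shifts it, and it now spreads over four pixels.
print(coverage(0.25, 0.5))
```

Only the perfectly aligned case keeps the pixel intact; any fractional offset splits its light across neighbors, and the split changes with every small movement.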

23:19 Optics Distortion

In addition to the issues with representing a 2-D image in 3-D space, the AVP’s optics are highly distorting, as discussed in Apple Vision Pro’s (AVP) Image Quality Issues—First Impressions. The optical distortions can be “digitally corrected” but face the same resample issues discussed above.

25:51 Close-Up Center Crop and Foveated Boundary

The figures shown in this part of the video come from Apple Vision Pro’s (AVP) Image Quality Issues – First Impressions, and I will refer you to that article rather than repeat it here.

28:52 AVP’s Pancake Optics and Comparison to MQ3 and Birdbath

Much of this part of the video is covered in more detail in Apple Vision Pro’s (AVP) Image Quality Issues—First Impressions.

Using Eye Tracking for Optics Has Wider Implications

A key point made in the video is that the AVP’s optics are much more “aggressive” than Meta’s, and as a result, they appear to require dynamic eye tracking to work well. I referred to the AVP optics as being “unstable.” The AVP is constantly pre-correcting for distortion and color based on eye tracking. While the use of eye tracking for Foveated Rendering and control input is much discussed by Apple and others, using eye tracking to correct the optics has much more significant implications, which may be why the AVP has to be “locked” onto a person’s face.

Eye tracking for foveated rendering does not have to be nearly as precise as eye tracking for optical correction. This leads me to speculate that the AVP requires the facial interfaces to lock the headset to the face, which is horrible regarding human factors, to support pre-correcting the optics. This follows my rule, “when smart people do something that appears dumb, it is because the alternative was worse.”

Comparison to (Nreal/Xreal) Birdbath

One part not discussed in the video or that article but shown in the associated figure (below) is the similarity of pancake optics to birdbath optics. Nreal (now Xreal) birdbath optics are discussed in my Nreal teardown series in Nreal Birdbath Overview.

Both pancake and birdbath optics start by polarizing the image from an OLED microdisplay. They use quarter waveplates to “switch” the polarization, causing it to bounce off a polarizer and then pass through it. They both use a 50/50 coated semi-mirror. They both use a combination of refractive (lens) and reflective (mirror) optics. In the case of the birdbath, the polarizer acts as a beam splitter to the OLED display so it does not block the view out, whereas with pancake optics, everything is inline.

31:34 AVP Color Uniformity Problem

The color uniformity and the fact that the color shift moves around with eye movement were discussed in Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3.

32:11 Comparing Resolution vs a Monitor

In Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3, I compared the resolution of the AVP (below left) to various computer monitors (below right) and the Meta Quest 3.

Below is a close-up crop of the center of the same image shown on the AVP, a 28″ monitor, and the Meta Quest 3. See the article for an in-depth explanation.

33:03 Vision OS 1.1 Change in MacBook mirror processing

I received and saw some comments about my Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3 article saying that Vision OS 1.1 MacBook mirroring was sharper. I had just run a side-by-side comparison of displaying an image from a file on the AVP versus displaying the same image via mirroring a MacBook in Apple Vision Pro Displays the Same Image Differently Depending on the Application. So, I downloaded Vision OS 1.1 to the AVP and reran the same test, and I found a clear difference in the rendering of the MacBook mirroring (but not the display from the AVP file). However, it was not that the MacBook mirror image was sharper per se, but it was less bold. This is visible even in the thumbnails below (click on them to see the full-size images); note how the text looks less bold on the right side of the left image (OS 1.1) versus the right side of the right image.

Below are crops from the two images above, with the OS 1.1 image on the top and OS 1.0 on the bottom. The MacBook mirroring comes from the right sides of both images. Note how much less bold the text and lines are in the OS 1.1 crop.

35:57 AVP Passthrough Cameras in the Wrong Location

38:43 AVP’s Optics are Soft/Blurry

As stated in Apple Vision Pro’s Optics Blurrier & Lower Contrast than Meta Quest 3, the AVP optics are a little soft. According to Marques Brownlee (see above) and others, my statement has caused controversy. I have heard others question my methods, but I have yet to see any evidence to the contrary.

I have provided my photographic evidence (right) and have seen it with my eyes by swapping headsets back and forth with high-resolution content. For comparison, the same image was displayed on the Meta Quest 3, and the MQ3 was clearly sharper. The “blur” on the AVP is similar to what one would see with a Gaussian blur with a radius of about 0.5 to 1 pixel.
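To give a feel for what a blur of that size does, here is a small Python sketch (my illustration, not a measurement of the AVP). It blurs a single-pixel-wide bright line with a 1-D Gaussian and prints how much of the peak survives. I am assuming the blur “radius” corresponds to the Gaussian’s sigma, which is how many image editors define it; the function names and the 19-pixel test line are made up for the example.

```python
import math


def gaussian_kernel(sigma, half_width=4):
    """Normalized 1-D Gaussian kernel sampled at integer pixel offsets."""
    weights = [
        math.exp(-(x * x) / (2 * sigma * sigma))
        for x in range(-half_width, half_width + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Convolve a 1-D signal with the kernel, treating out-of-range pixels as 0."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# A one-pixel-wide white line on a black background.
line = [0.0] * 9 + [1.0] + [0.0] * 9

for sigma in (0.5, 1.0):
    blurred = blur(line, gaussian_kernel(sigma))
    print(f"sigma {sigma}: peak drops from 1.00 to {max(blurred):.2f}")
```

Even a sub-pixel blur noticeably lowers the contrast of single-pixel detail, which is the kind of content (fine text and thin lines) where the softness shows up.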

Please don’t confuse “pixel resolution” with optical sharpness. The AVP has more pixels per degree, but the optics are a bit out of focus and, thus, a little blurry/soft. One theory is that it is being done to reduce the screen door effect (seeing the individual pixels) and make the images on the AVP look smoother.

The slight blurring of the AVP may reduce the screen door effect as the gap between pixels is thinner on the OLED displays than on the MQ3’s LCDs. But jaggies and scintillation are still very visible on the AVP.

41:41 Closing Discussion: “Did Apple Move the Needle?”

The video wraps up with Jason asking the open-ended question, “Did Apple Move the Needle?” I discuss whether it will replace a cell phone, home monitor(s), laptop on the road, or home TV. I think you can guess that I am more than skeptical that the AVP now or in the future will change things for more than a very small fraction of the people who use cell phones, laptops, and TVs. As I say about some conference demos, “Not everything that would make a great theme park experience is something you will ever want in your home to use regularly.”

Appendix: Rumor Mill’s 5,000 Nits Apple Vision Pro

When I searched the Internet to see if anyone had independently reported on the brightness of the AVP, I got the Google search answer in big, bold letters: “5,000 Nits” (right). Then, I went to the source of this answer, and it was none other than the MIT Technology Review. I then thought they must be quoting the display’s brightness, not the headset’s, but it reports that it is a “50-fold improvement over Meta Quest 2,” which is ridiculous.

I see this all the time when companies quote a spec for the display device, and it gets reported as the headset’s brightness/nits to the eye. The companies are a big part of the problem because most headset makers won’t give a number for the brightness to the eye in their specs. I should note that with almost all headset optics, the peak nits in the center will be much higher than those in the periphery. Through the years, one thing I have found that all companies exaggerate in their marketing is the brightness, either in lumens for projectors or nits for headsets.

An LCOS or DLP display engine can output over a million nits into a waveguide, but that number is so big (almost never given) that it is not confused with the nits to the eye. Nits are a function of light output (measured in Lumens) and the ability to collimate the light (a function of the size of the light source and illumination optics).

The “5,000 nits” source was a tweet by Ross Young of DSCC. Part of the Tweet/X thread is copied on the right. A few responders understood that this could not be the nits to the eye. Responder BattleZxeVR even got the part about the duty cycle being a factor, but that didn’t stop many other later responders from getting it wrong.

Citing some other publications that didn’t seem to understand the difference between nits-in versus nits-out:

Quoting from The Daejeon Chronicles (June 2023): Apple Vision Pro Screens: 5,000 Nits of Wholesome HDR Goodness (with my bold emphasis):

Dagogo Altraide of ColdFusion has this to say about the device’s brightness capability:

“The screens have 5,000 nits of peak brightness, and that’s a lot. The Meta Quest 2, for example, maxes out at about 100 nits of brightness and Sony’s PS VR, about 265 nits. So, 5,000 nits is crazy. According to display analyst Ross Young, this 5,000 nits of peak brightness isn’t going to blind users, but rather provide superior contrast, brighter colors and better highlights than any of the other displays out there today.”

Quoting from Mac Rumors (May 2023): Apple’s AR/VR Headset Display Specs: 5000+ Nits Brightness for HDR, 1.41-Inch Diagonal Display and More:

With ~5000 nits brightness or more, the AR/VR headset from Apple would support HDR or high dynamic range content, which is not typical for current VR headsets on the market. The Meta Quest 2, for example, maxes out around 100 nits of brightness and it does not offer HDR, and the HoloLens 2 offers 500 nits brightness. Sony’s PSVR 2 headset has around 265 nits of brightness, and it does have an advertised HDR feature when connected to an HDR display.

The flatpanelshd (June 2023): Apple Vision Pro: Micro-OLEDs with 3800×3000 pixels & 90/96Hz – a paradigm shift did understand that the 5,000 nits was for the display device and not to the eye:

DSCC has previously said that the micro-OLED displays deliver over 5000 nits of brightness but a good portion of that is typically lost due to the lenses and the display driving method.

As I wrote in Apple Vision Pro (Part 1) – What Apple Got Right Compared to The Meta Quest Pro, Snazzy Labs had an excellent explanation of the issues with the applications shown by Apple at the AVP announcement (it is a fun and informative video). In another otherwise excellent video, What Reviewers Aren’t Telling You About Apple Vision Pro, I have to give him credit for recognizing that the MIT Tech Review had conflated the display’s brightness with the headset’s brightness. But he then hazarded a guess that it would be “after the optics, I bet it’s around 1,000 nits.” His guess was “just a bit outside” by about 10x. I do not want to pick on Snazzy Labs, as I love the videos I have seen from them, but I want to point out how much even technically knowledgeable people without a background in optics underestimate the light losses in headset optics.

Apple Vision Pro (AVP), It Begins and iFixit’s “Extreme Unboxing”

4 February 2024 at 06:04

Introduction

Today, I picked up my Apple Vision Pro (AVP) at the Apple Store. I won’t bother you with yet another unboxing video. When you pick it up at the store, they give you a nice custom-made shopping bag for the AVP’s box (left). They give you about a 30-minute guided tour with a store-owned demo headset, and when you are all done with the tour, they give you yours in a sealed box.

iFixit asked if I would help identify some of the optics during their AVP “Extreme Unboxing” (it is Apple; we need a better word for “teardown”). I have helped iFixit in the past with their similar efforts on the Magic Leap One and Meta Quest Pro and readily agreed to help in any way that I could.

iFixit’s “Extreme Unboxing”

As per iFixit’s usual habit, they took the unboxing of a new product to the extreme. They published the first of several videos of their extreme unboxing of the AVP today (Feb. 3rd, 2024). You can expect more videos to follow.

Perhaps the most unexpected thing iFixit showed in the first iFixit video is that the EyeSight (front display) has more than a single lenticular lens in front of the EyeSight’s OLED display. There is a second lens-like element and/or a brightness enhancement film (BEF). BEF films have a series of triangular refraction elements that act in one direction, similar to a lenticular lens.

iFixit also showed a glimpse of the AVP’s pancake optics and the OLED microdisplay used for each eye toward the end of the video. The AVP uses pancake optics as described in Apple Vision Pro (Part 4) – Hypervision Pancake Optics Analysis.

Closing

That’s it for today. I mostly wanted to let everyone know about the iFixit extreme unboxing. I have a lot of work to do to analyze the Apple Vision Pro.

Apple Vision Pro Part 6 – Passthrough Mixed Reality (PtMR) Problems

27 September 2023 at 05:09

Introduction

I planned to wrap up my first pass coverage of the Apple Vision Pro (AVP) with my summary and conclusions based on prior articles. But the more I thought about it, Apple’s approach to Passthrough Mixed Reality (PtMR) seems like it will be so egregiously bad that it should be broken out and discussed separately.

Apple Prioritized EyeSight “Gimmick” Over Ergonomics and Functionality

There are some features, particularly surrounding camera passthrough, where there should have been an internal battle between those who wanted the EyeSight™ gimmick and what I would consider more important functionality. The backers of EyeSight must have won and forced the horrible location of the passthrough cameras, optical distortion from the curved glass in front of all the forward-facing cameras and sensors, put a fragile piece of hard-to-replace glass on the front where it can be easily scratched and broken, and added weight to the front where it is least desired. Also, as discussed later, there are negative effects on the human visual system caused by misaligning the passthrough cameras with the eyes.

The negative effects of EyeSight are so bad for so many fundamental features that someone in power with little appreciation for the technical difficulties must have forced the decision (at least, that is the only way I can conceive of it happening).  People inside the design team must have known it would cause serious problems. Supporting passthrough mixed reality (PtMR) is hard enough without deliberately creating problems.

Meta Quest 3 Camera Location

As noted in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, Meta is locating the soon-to-be-released Quest 3 main passthrough camera closer to the center of view of the eyes. Fixed cameras in front of the eyes won’t be perfect and will still require digital correction for better functional use. It does appear that Meta is taking the PtMR more seriously than it did with the Meta Quest Pro and Quest 2.

I’m going to be looking forward to getting a Meta Quest 3 to test out when it is released soon.

Definitions of AR/VR/MR and PtMR

The terms used to describe mixed reality have been very fluid over the last few years. Before the introduction of Hololens, augmented reality meant any headset that displayed virtual content on a see-through display. For example, just before Hololens went on sale, Wired in 2015 titled their article (with my bold emphasis): Microsoft Shows HoloLens’ Augmented Reality Is No Gimmick. With the introduction of Hololens, the term “Mixed Reality” was used to distinguish AR headsets with SLAM to lock the virtual to the real world. “AR” headsets without SLAM are sometimes called AR Heads-Up Displays (HUDs), but these get confused with automotive HUDs. Many today refer to a see-through headset without SLAM as “AR” and one with SLAM as “MR,” whereas previously, the term “AR” covered both with and without SLAM.

Now we have the added confusion of optical see-through (e.g., Hololens) and camera passthrough “Mixed Reality.” While they may be trying to accomplish similar goals, they are radically different in their capabilities. Rather than constantly typing “passthrough” before MR, I abbreviated it as PtMR.

In Optical AR, the Virtual Content Augments the Real World – With PtMR, the Real World Augments the Virtual Content

Optical MR prioritizes seeing the real world at the expense of the virtual content. The real world is in perfect perspective, at the correct focus distance, with no limitation by a camera or display on brightness, with zero lag, etc. If done well, there is minimal light blocking and distortion of the real world and little blocking of the real-world FOV.

PtMR, on the other hand, prioritizes virtual image quality at the expense of the real world, both in how things behave in 3-D space (focus perspective) and in image quality.

We are likely many decades away, if ever, from passing what Douglas Lanman of Meta calls their Visual Turing Test (see also the video linked here).

Meta’s demonstrations at Siggraph 2023 of their Flamera with perspective-correct passthrough and Butterscotch addressing vergence-accommodation conflict (VAC) served to show how far PtMR is from optical passthrough. They can only address each problem individually, each with a large prototype, and even then, there are severe restrictions. The Flamera has a very low-resolution passthrough, and Butterscotch only supports a 50-degree FOV.

It is also interesting that Butterscotch moves back from Half Dome 3’s electronic LCD variable focus to electro-mechanical focusing to address VAC. As reported in Mixed Reality News, “However, the technology presented problems with light transmission and image quality [of the electronic LCD approach], so Meta discarded it for Butterscotch Varifocal at the expense of weight and size.”

All of this work is to try and solve some of the many problems created by PtMR that don’t exist with optical MR. PtMR does not “solve” the issues with optical MR. It just creates a long list of massively hard new problems. Optical AR has issues with the image quality of the virtual world, very large FOV, and hard-edge occlusion (see my article Magic Leap 2 (Pt. 3): Soft Edge Occlusion, a Solution for Investors and Not Users). I often say, “What is hard in optical MR is easy in PtMR and vice versa.”

Demo or Die

Meta and others seem to use Siggraph to show off research work that is far from practical. As stated by Lanman of Meta, of their Flamera and Butterscotch VAC demos at Siggraph 2023, Meta’s Reality Labs has a “Demo or Die” philosophy. They will not be tipping off their competition on concepts they will use within a few years. To be clear, I’m happy to see companies showing off their technical prowess, but at the same time, I want to put it in perspective.

Cosmetic vs. Functional Passthrough PtMR

JayzTwoCents video on the HTC Vive XR Elite has a presentation by Phil on what he calls “3D Depth Projection” (others refer to it as “perspective correct“). In the video (sequence of clips below), Phil demonstrates that because the passthrough video was not corrected in scale, position, and perspective in 3-D space, it deprives him of hand-eye coordination to catch a bottle tossed to him.

This tradeoff is discussed in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough in the section The method in the Madness: MQP prioritizes 3-D spatial over image quality.

Phil demonstrated in the video (and in a sequence of clips below) that with the Meta Quest Pro, even though the image quality is much worse and distorted due to the 3D projection, he can at least catch the bottle.

I would classify the HTC Vive XR Elite as having a “Cosmetic Passthrough.” While the image quality is better (but still not very good), it is non-functional. While Meta Quest Pro’s image quality is lousy, it is at least somewhat functional.

Something else to notice in the MQP frame sequence above is that there are both lag and accuracy errors in hand tracking.

Effects on Vision with Long-Term Use

It is less obvious that the human visual system will start adapting to any camera placement and then have to re-adapt after the headset is removed. This was briefly discussed in AVP Part 2 in the section titled Centering correctly for the human visual system, which references Steve Mann in his March 2013 IEEE Spectrum article, “What I’ve learned from 35 years of wearing computerized eyewear.” In the early days with Steve Mann, they had no processing power to attempt to move the effect of the camera images digitally. At the same time, I’m not sure how well the correction will work or how a distorted view will affect people’s visual perception during and after long exposure. As with most visual effects, it will vary from one individual to another.

Meta Flamera Light Field Camera at Siggraph 2023

As discussed in AVP Part 2 and Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, having the passthrough cameras as close as possible to being coaxial to the eyes (among other things) is highly desirable.

To reduce the negative effects on human vision caused by cameras not aligning with the eyes, some devices, such as the Quest 2 and Quest Pro from Meta, use processing to create what I will call “virtual cameras” with a synthesized view for each eye. The farther the physical cameras are from the eye’s location, the larger the correction required and the larger the distortion in the final result.

Meta at Siggraph 2023 presented the paper “Perspective-Correct VR Passthrough Without Reprojection” (and IEEE article) and showed their Flamera prototype with a light field camera (right). The figure below shows how the camera receives light rays from the same angle as the eye with the Light Field Passthrough Camera.

Below are a couple of still frames (with my annotations) from the related video that show how, with the Meta Quest 2, the eye and camera views differ (below left), resulting in a distorted image (below right). The distortion/error increases as the distance from the eye decreases.

It should be noted that while Flamera’s light field camera approach addresses the angular problems of the camera location, it does so with a massive loss in resolution (by at least “n,” where n is the number of light field subviews). So, while interesting in terms of research and highlighting the problem, it is still a highly impractical approach.

The Importance of “Perspective Correct” PtMR

In preparing this article, I returned to a thread on Hacker News about my Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough article. In that article, in the section “The method in the Madness: MQP prioritizes 3-D spatial over image quality,” I was trying to explain why Meta was distorting the image.

Poster Zee2 took exception to my article and seemed to feel I was understating the problem of 3-D perspective. I think Zee2 missed what I meant by “pyrrhic victory.” I was trying to say they were correct to address the 3D depth issue but that doing so with a massive loss in image quality was not the solution. I was not dismissing the importance of perspective-correct passthrough.

Below, I am copying his comment from that thread (with my bold highlighting), including a quote from my article. Interestingly, Zee2 comments on Varjo having good image quality with its passthrough, but it is not perspective-correct.

I also really don’t know why he [referring to my article] decided to deemphasize the perspective and depth correctness so much. He mentions it here:

>[Quoting Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough] In this case, they were willing to sacrifice image quality to try to make the position of things in the real world agree with where virtual objects appear. To some degree, they have accomplished this goal. But the image quality and level of distortion, particularly of “close things,” which includes the user’s hands, is so bad that it seems like a pyrrhic victory.

I don’t think this is even close to capturing how important depth and perspective correct passthrough is.

Reprojecting the passthrough image onto a 3D representation of the world mesh to reconstruct a perspective-correct view is the difference between a novelty that quickly gives people headaches and something that people can actually wear and look through for an extended period of time.

Varjo, as a counterexample, uses incredibly high-resolution cameras for their passthrough. The image quality is excellent, text is readable, contrast is good, etc. However, they make no effort to reproject their passthrough in terms of depth reconstruction. The result is a passthrough image that is very sharp, but is instantly, painfully, nauseatingly uncomfortable when walking around or looking at closeup objects alongside a distant background.

The importance of depth-correct passthrough reprojection (essentially, spacewarp using the depth info of the scene reconstruction mesh) absolutely cannot be understated and is a make or break for general adoption of any MR device. Karl is doing the industry a disservice with this article.

From: Hacker News Meta Quest Pro – Bad AR Passthrough comment by Zee2 

Does the AVP have Cosmetic or Functional PtMR or Something Else?

With the AVP’s passthrough cameras being so poorly located (thanks to EyeSight™), severe distortion would seem inevitable if it is to support functional PtMR. I don’t believe there is some magic (perhaps a pun on Magic Leap) that Apple could employ that Meta couldn’t that would simultaneously support good image quality without serious distortion, given the terrible camera placement due to the EyeSight™ feature.

So, based on the placement of the cameras, I have low expectations for the functionality of the AVP’s PtMR. The “instant experts” who got to try out the AVP would be more impressed by a cosmetically better-looking passthrough. Since there are no reports of distortion like the MQP, I’m left to conclude that, at least for the demo, they were only doing a cosmetic passthrough.

As I often say, “Nobody will volunteer information, but everyone will correct you.” Thus, it is better to take a position based on the current evidence and then wait for a correction or confirmation from the many developers with AVPs who read this blog.

Conclusion

I’m not discounting the technical and financial power of Apple. But then I have been writing about the exaggerated claims for Mixed Reality products by giant companies such as Google, Meta, and Microsoft, not to mention the many smaller companies, including the over $3B spent by Magic Leap, for the last ten years. The combined sunk cost of these companies, not including Apple, is about $50B. As I’m fond of saying, “If all it took were money and smart people, it would already be solved.”

Apple doesn’t fully appreciate the difficulties with Passthrough Mixed Reality, or they wouldn’t prioritize the EyeSight gimmick over core capabilities. I’m not saying the AVP would work well for passthrough AR without EyeSight, but it is hard enough without digging big technical holes to support a novelty feature.

Apple Vision Pro (Part 5C) – More on Monitor Replacement is Ridiculous

21 August 2023 at 00:17

Introduction

In this series about the Apple Vision Pro, this sub-series on Monitor Replacement and Business/Text applications started with Part 5A, which discussed scaling, text grid fitting, and binocular overlap issues. Part 5B starts by documenting some of Apple’s claims that the AVP would be good for business and text applications. It then discusses the pincushion distortion common in VR optics and likely in the AVP and the radial effect of distortion on resolution in terms of pixels per degree (ppd).

The prior parts, 5A and 5B, provide setup and background information for what started as a simple “Shootout” between a VR virtual monitor and physical monitors. As discussed in 5A, my office setup has a 34″ 22:9 3440×1440 main monitor with a 27″ 4K (3840×2160) monitor on the right side, which is a “modern” multiple monitor setup that costs ~$1,000. I will use these two monitors plus a 15.5″ 4K OLED Laptop display to compare to the Meta Quest Pro (MQP) since I don’t have an Apple AVP and then extrapolate the results to the AVP.

My Office Setup: 34″ 22:9 3440×1440 (left) and 27″ 4K (right)

I will be saving my overall assessment, comments, and conclusions about VR for Office Applications for Part 5D rather than somewhat burying them at the end of this article.

Office Text Applications and “Information Density” – Font Size is Important

A point to be made by using spreadsheets to generate the patterns is that if you have to make text bigger to be readable, you are lowering the information density and are less productive. Lowering the information density with bigger fonts is also true when reading documents, particularly when scanning web pages or documents for information.

Improving font readability is not solely about increasing their size. VR headsets will have imperfect optics that cause distortions, focus problems, chromatic aberrations, and loss of contrast. These issues make it harder to read fonts below a certain size. In Part 5A, I discussed how scaling/resampling and the inability to grid fit when simulating virtual monitors could cause fonts to appear blurry and scintillate/wiggle when locked in 3-D space, leading to reduced readability and distraction.

Meta Quest Pro Horizon Worktop Desktop Approach

As discussed in Part 5A, with Meta’s Horizon Desktop, each virtual monitor is reported to Windows as 1920 by 1200 pixels. When sitting at the nominal position of working at the desktop, the center virtual monitor fills about 880 physical pixels of the MQP’s display. So roughly 1200 virtual pixels are resampled into 880 vertical pixels in the center of view or by about 64%. As discussed in Part 5B, the scaling factor is variable due to severe pincushion distortion of the optics and the (impossible to turn off) curved screen effect in Meta Horizons.

The picture below shows the whole FOV captured by the camera before cropping shot through the left eye. The camera was aligned for the best image quality in the center of the virtual monitor.

Analogous to Nyquist sampling, when you scale a pixel-rendered image, you want the display to have about 2X (linearly) the number of pixels of the source image to render it reasonably faithfully. Below left is a 1920 by 1200 pixel test pattern (a 1920×1080 pattern padded on the top and bottom), “native” to what the MQP reports to Windows. On the right is the picture cropped to that same center monitor.

1920×1200 Test Pattern
Through the optics picture
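As a rough sketch of the 2X rule of thumb above, the snippet below compares the display rows one would want for a 1200-row source against the roughly 880 rows quoted earlier for the MQP's center virtual monitor. The function names are hypothetical, and the 2X factor is only the rule of thumb from the text, not a hard limit.

```python
def display_pixels_wanted(source_pixels: int, oversample: float = 2.0) -> float:
    """Display pixels wanted (along one axis) to resample a source faithfully."""
    return source_pixels * oversample


def fraction_of_wanted(source_pixels: int, display_pixels: int,
                       oversample: float = 2.0) -> float:
    """Fraction of the wanted display pixels that are actually available."""
    return display_pixels / display_pixels_wanted(source_pixels, oversample)


# Figures quoted in this article for the MQP's center virtual monitor.
virtual_rows = 1200    # rows reported to Windows
physical_rows = 880    # approximate display rows those rows land on

wanted_rows = display_pixels_wanted(virtual_rows)
available = fraction_of_wanted(virtual_rows, physical_rows)
print(f"wanted ~{wanted_rows:.0f} rows, have ~{physical_rows} "
      f"({available:.0%} of the rule-of-thumb target)")
```

Under these assumptions, the display has well under half the rows the rule of thumb asks for, which is in line with the blurred 1-pixel lines seen in the through-the-optics pictures.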

The picture was taken at 405mp, then scaled down by 3X linearly and cropped. When taking high-resolution display pictures, some amount of moiré in color and intensity is inevitable. The moiré is also affected by scaling and JPEG compression.

Below is a center crop from the original test pattern that has been 2x pixel-replicated to show the detail in the pattern.

Below is a crop from the full-resolution image with reduced exposure to show sub-pixel (color element) detail. Notice how the 1-pixel wide lines are completely blurred, and the text is just becoming fully formed at about Arial 11 point (close to, but not the same scale as used in the MS Excel Calibri 11pt tests to follow). Click on the image to see the full resolution that the camera captured (3275 x 3971 pixels).

The scaling process might lose a little detail for things like pictures and videos of the real world (such as the picture of the elf in the test pattern), but it will be almost impossible for a human to notice most of the time. Pictures of the real world don’t have the level of pixel-to-pixel contrast and fine detail caused by small text and other computer-generated objects.

Meta Quest Pro Virtual Versus Physical Monitor “Shootout”

For the desktop “shootout,” I picked the 34” 22:9 and 27” 4K monitors I regularly use (side by side as shown in Part 5A), plus a Dell 15.5” 4K laptop display. An Excel spreadsheet is used with the various displays to demonstrate the amount of content that can be seen at one time on a screen. The spreadsheet allows for flexible changing of how the screen is scaled for various resolutions and text sizes, and the number of cells measures the information density. For repeatability, a screen capture of each spreadsheet was taken and then played back in full-screen mode (Appendix 1 includes the source test patterns).

The Shootout

The pictures below show the relative FOVs of the MQP and various physical monitors taken with the same camera and lens. The camera was approximately 0.5 meters from the center of the physical monitors, and the headset was at the initial position at the MQP’s Horizon Desktop. All the pictures were cropped to the size of a single physical or virtual monitor.

The following is the basic data:

  • Meta Quest Pro – Central Monitor (only) ~43.5° horizontal FOV. Used an 11pt font with Windows Display Text Scaling at 150% (100% and 175% also taken and included later)
  • 34″ 22:9 3440×1440 LCD – 75° FOV and 45ppd from 0.5m. 11pt font with 100% scaling
  • 27″ 4K (3840 x 2160) LCD – 56° FOV and 62ppd from 0.5m. 11pt font with 150% scaling (results in text the same size as on the 34″ 3440×1440 at 100% – 2160/1440 = 150%)
  • 15.5″ 4K OLED – 32° FOV from 0.5m. Shown below is 11pt with 200% scaling, which is what I use on the laptop (a later image shows 250% scaling, which is what Windows “recommends” and would result in approximately the same size fonts as on the 34″ 22:9 at 100%).
Composite image showing the relative FOV – Click to see in higher resolution (9016×5641 pixels)
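For readers who want to sanity-check numbers like these, here is a minimal sketch of the geometry, assuming a flat panel viewed on-axis from exactly 0.5m. The helper name is hypothetical. The values listed above come from the actual setup, where the camera was only approximately 0.5 meters away, so this idealized estimate should land in the same ballpark rather than match exactly.

```python
import math


def flat_monitor_fov_ppd(diagonal_in: float, res_x: int, res_y: int,
                         distance_m: float) -> tuple:
    """Horizontal FOV (degrees) and average ppd of a flat monitor viewed on-axis."""
    # Panel width from the diagonal and the pixel aspect ratio (square pixels).
    width_in = diagonal_in * res_x / math.hypot(res_x, res_y)
    width_m = width_in * 0.0254
    fov_deg = 2 * math.degrees(math.atan((width_m / 2) / distance_m))
    # Average ppd is simply horizontal pixels divided by horizontal FOV.
    return fov_deg, res_x / fov_deg


monitors = [
    ("34in 3440x1440", 34.0, 3440, 1440),
    ("27in 4K", 27.0, 3840, 2160),
    ("15.5in 4K", 15.5, 3840, 2160),
]
for name, diagonal, rx, ry in monitors:
    fov, ppd = flat_monitor_fov_ppd(diagonal, rx, ry, distance_m=0.5)
    print(f"{name}: ~{fov:.0f} deg horizontal, ~{ppd:.0f} ppd average")
```

Changing `distance_m` shows how sensitive both figures are to the viewing distance, which is one reason to treat "from 0.5m" numbers as approximate.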

The pictures below show the MQP with MS Windows display text scaling set to 100% (below left) and 175% (below middle). The 175% scaling would result in fonts with about the same number of pixels per font as the Apple Vision Pro (but with a larger angular resolution). Also included below (right) is the 15.5″ 4K display with 250% scaling (as recommended by Windows).

MQP – 11pt scaled=100%
MQP – 11pt scaled=175%
15.5″ – 11pt scale=250%

The camera was aimed and focused at the center of the MQP, the best case for it, as the optical quality falls off radially (discussed in Part 5B). The text sharpness is the same for the physical monitors from center to outside, but they have some brightness variation due to their edge illumination.

Closeup Look at the Displays

Each picture above was initially taken 24,576 x 16,384 (405mp) by “pixel shifting” the 45MP R5 camera sensor to support capturing the whole FOV while capturing better than pixel-level detail from the various displays. In all the pictures above, including the composite image with multiple monitors, each image was reduced linearly by 3X.

The crops below show the full resolution (3x linearly the images above) of the center of the various monitors. As the camera, lens, and scaling are identical, the relative sizes are what you would see looking through the headset for the MQP sitting at the desktop and the physical monitors at about 0.5 meters. I have also included a 2X magnification of the MQP’s images.

With Windows 100% text scaling, the 11pt font on the MQP is about the same size as it is on the 34” 22:9 monitor at 100%, the 27” 4K monitor at 150% scaling, and the 15.5” 4K monitor at 250% scaling. But while the fonts are readable on the physical monitors, they are a blurry mess on the MQP at 100%. The MQP at 150% and 175% is “readable” but certainly does not look as sharp as the physical monitors.

Extrapolating to Apple Vision Pro

Apple’s AVP has about 175% of the linear pixel density of the MQP. Thus the 175% case gives a reasonable idea of how text should look on the AVP. For comparison below, the MQP’s 175% case has been scaled to match the size of the 34” 22:9 and 27” 4K monitors at 100%. While the text is “readable” and about the same size, it is much softer/blurrier than on the physical monitor. Some of this softness is due to optics, but a large part is due to scaling. While the AVP may have better optics and a better text rendering pipeline, it still doesn’t have the resolution to compete on content density and readability with a relatively inexpensive physical monitor.

Reportedly, Apple Vision Pro Directly Rendering Fonts

Thomas Kumlehn had an interesting comment on Part 5B (with my bold highlighting) that I would like to address:

After the VisionPro keynote in a Developer talk at WWDC, Apple mentioned that they rewrote the entire render stack, including the way text is rendered. Please do not extrapolate from the text rendering of the MQP, as Meta has the tech to do foveated rendering but decided to not ship it because it reduced FPS.

From Part 5A, “Rendering a Pixel Size Dot.”

Based on my understanding, the AVP will “render from scratch” instead of rendering an intermediate image that is then rescaled as is done with the MQP discussed in Part 5A. While rendering from scratch has a theoretical advantage regarding text image quality, it may not make a big difference in practice. With an ~40 pixels per degree (ppd) display, the strokes and dots of what should be readable small text will be on the order of 1 pixel wide. The AVP will still have to deal with approximately pixel-width objects straddling four or more pixels, as discussed in Part 5A: Simplified Scaling Example – Rendering a Pixel Size Dot.
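To illustrate the straddling problem, below is a minimal sketch that spreads a one-pixel-square dot over the display pixel grid using simple area coverage (a box filter). This is only an illustration of the concept. It is not how the AVP or the MQP actually renders, and the function name is hypothetical.

```python
import math


def dot_coverage(x0: float, y0: float, size: float = 1.0) -> dict:
    """Fraction of a size-by-size square dot landing on each display pixel.

    (x0, y0) is the dot's top-left corner in display-pixel units, and pixel
    (i, j) covers [i, i+1) x [j, j+1). Uses simple area (box-filter) coverage.
    """
    coverage = {}
    for i in range(math.floor(x0), math.ceil(x0 + size)):
        for j in range(math.floor(y0), math.ceil(y0 + size)):
            overlap_x = min(x0 + size, i + 1) - max(x0, i)
            overlap_y = min(y0 + size, j + 1) - max(y0, j)
            if overlap_x > 0 and overlap_y > 0:
                coverage[(i, j)] = (overlap_x * overlap_y) / (size * size)
    return coverage


aligned = dot_coverage(3.0, 5.0)      # dot sits exactly on one pixel
straddling = dot_coverage(3.5, 5.5)   # dot centered on a pixel corner
print(aligned)      # one pixel at full intensity
print(straddling)   # four pixels at one quarter intensity each
```

Because a virtual monitor locked in 3-D space shifts by fractions of a pixel with every small head movement, the same dot keeps moving between these two cases, which is the source of the softening and scintillation discussed in Part 5A.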

Some More Evaluation of MQP’s Pancake Optics Using Immersed Virtual Monitor

I wanted to evaluate the MQP pancake optics more than I did in Part 5B. Meta’s Horizon Desktop interface was very limiting. So I decided to try out the Immersed Virtual Desktop software. Immersed has much more flexibility in the resolution, size, and placement of monitors, and the ability to select flat or curved monitors. Importantly for my testing, I could create a large, flat virtual 4K monitor that could fill the entire FOV with a single test pattern (the pattern is included in Appendix 1).

Unfortunately, while the Immersed software had the basic features I wanted, I found it difficult to precisely control the size and positioning of the virtual monitor (more on this later). Due to these difficulties, I just tried to fill the display with the test pattern with the monitor only roughly perpendicular to the headset/camera. It was a painfully time-consuming process, and I never could get the monitor to where it seemed perfectly perpendicular.

Below is a picture of the whole (camera) FOV taken at 405mp and then scaled down to 45mp. The image is a bit underexposed to show the sub-pixel (color) detail when viewed at full resolution. In taking the picture, I determined that the focus of the MQP’s pancake optics appears to be “dished,” with the focus in the center slightly different than on the outsides. The picture was taken focusing between the center and outside focus and using f11 to increase the photograph’s depth of focus. For a person using the headset, this dishing of the focus is likely not a problem as their eye will refocus based on their center of vision.

As discussed in Part 5B, the MQP’s pancake optics have severe pincushion distortion, requiring significant digital pre-correction to make the net result flat/rectilinear. Most notably, the outside areas of the display have about 1/3rd the linear pixels per degree of the center.

Next are shown 9 crops from the full-resolution (click to see) picture at the center, the four corners, top, bottom, left, and right of the camera’s FOV.

The main things I learned from this exercise are the apparent dish in the focus of the optics and the falloff in brightness. I had already determined the change in resolution in the studies shown in Part 5B.

Some feedback on Immersed (and all other VR/AR/MR) virtual monitor placement control.

While Immersed had the features I wanted, it was difficult to control the setup of the monitors. The software feels very “beta,” and the interface I got differed from most of the help documentation and videos, suggesting it is a work in progress. In particular, I couldn’t figure out how to pin the screen, as the control for pinning shown in the help guides/videos didn’t seem to exist on my version. So I had to start from scratch on each session and often within a session.

Trying to orient and resize the screen with controllers or hand gestures was needlessly difficult. I would highly suggest that Immersed look at how some 3-D CAD software handles the control of 3-D models. For example, it would be great to have a single (virtual) button that would position the center monitor directly in front of and perpendicular to the user. It would also be a good idea to allow separate control for tilt, virtual distance, and zoom/resize while keeping the monitor centered.

It seemed to be “aware” of things in the room, which only served to fight what I wanted to do. I was left contorting my wrist to try and get the monitor roughly perpendicular and then playing with the corners to try and both resize and center the monitor. The interface also appears to conflate “resizing” with moving the monitor closer. While moving the virtual monitor closer and resizing both affect the size of everything, the effect will be different when the head moves. I would have a home (perpendicular and center) “button,” and then left-right, up-down, tilt, distance, and size controls.

To be fair, I only wanted to set up the screen for a few pictures, and I may have overlooked something. Still, I found the user interface could be vastly better for setting up the monitors, and the controller or gesture monitor sizing and positioning were a big fail in my use.

BTW, I don’t want to just pick on Immersed for this “all-in-one” control problem. On every VR and AR/MR headset I have tried that supports virtual monitors, placing the monitors in 3-D space has been a pain; none gives the user good, simple, intuitive controls. Meta Horizon Desktop goes to the extreme of giving no control and overly curved screens.

Other Considerations and Conclusions in Part 5D

This series-within-a-series on the use of VR and the AVP as an “office monitor replacement” has become rather long, with many pictures and examples. I plan to wrap up this series within the series on the AVP with a separate article on issues to consider and my conclusions.

Appendix 1: Test Patterns

Below is a gallery of PNG file test patterns used in this article. Click on each thumbnail to see the full-resolution test pattern.

22:9 3440×1440 100% 11pt
MQP 1920×1200 100% 11pt
MQP 1920×1200 150% 11pt
MQP 1920×1200 175% 11pt
4K 150% 11pt
4K 200% 11pt
4K 250% 11pt
MQP 1920×1200 “Tuff Test” on Black
MQP 3840×2160 “immersed” lens test

Appendix 2: Some More Background Information

More Comments on Font Sizes with Windows

As discussed in Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History, a font “point” is defined as 1/72nd of an inch (some use 1/72.272 or thereabout – it is a complicated history). Microsoft throws in the concept of 96 dots per inch (dpi) as 100%. But it is not that simple.

I wanted to share measurements regarding the Calibri 11pt font size. After measuring it on my monitor with a resolution of 110 pixels per inch (PPI), I found that it translates to approximately 8.44pt (8.44/72 inches). However, when factoring in the monitor PPI of 110 and Windows DPI of 96, the font size increases to ~9.67pt. Alternatively, when using a monitor PPI of 72, the font size increases to ~12.89pt. Interestingly, if printed assuming a resolution of 96ppi, the font reaches the standard 11pt size. It seems Windows applies some additional scaling on the screen. Nevertheless, I regularly use the 11pt 100% font size on my 110ppi monitor, which is the Windows default in Excel and Word, and it is also the basis for the test patterns.
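The ~9.67pt and ~12.89pt figures above can be reproduced with a simple rescaling, assuming they come from multiplying the measured size by the ratio of the monitor's PPI to an assumed reference PPI. That reading of the arithmetic is an assumption, and the function name is hypothetical.

```python
def rescale_points(measured_pt: float, monitor_ppi: float,
                   assumed_ppi: float) -> float:
    """Re-express a size measured on a monitor_ppi screen as if shown at assumed_ppi."""
    return measured_pt * monitor_ppi / assumed_ppi


measured_pt = 8.44    # Calibri 11pt as measured on the 110 PPI monitor
monitor_ppi = 110.0

print(round(rescale_points(measured_pt, monitor_ppi, 96.0), 2))   # 9.67
print(round(rescale_points(measured_pt, monitor_ppi, 72.0), 2))   # 12.89
```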

How pictures were shot and moiré

As discussed in 5A’s Appendix 2: Notes on Pictures, some moiré issues will be unavoidable when taking high-resolution pictures of a display device. As noted in that Appendix, all pictures in Lens Shootout were taken with the same camera and lens, and the original images were captured at 405 megapixels (Canon R5 “IBIS sensor shift” mode) and then scaled down by 3X. All test patterns used in this article are included in Appendix 1 above.

Apple Vision Pro (Part 5B) – More on Monitor Replacement is Ridiculous.

10 August 2023 at 03:21

Introduction – Now Three Parts 5A-C

I want to address feedback in the comments and on LinkedIn from Part 5A about whether Apple claimed the Apple Vision Pro (AVP) was supposed to be a monitor replacement for office/text applications. Another theory/comment from more than one person is that Apple is hiding the good “spatial computing” concepts so they will have a jump on their competitors. I don’t know whether Apple might be hiding “the good stuff,” but it would seem better for Apple to establish the credibility of the concept. Apple is, after all, a dominant high-tech company and could stomp any competitor.

After studying the MQP’s images in more detail, I found it was too simplistic to use the average pixels per degree (ppd), given by dividing the resolution by the FOV, for the MQP (and likely the AVP).

As per last time, since I don’t have an AVP, I’m using the Meta Quest Pro (MQP) and extrapolating the results to the AVP’s resolution. I will show a “shootout” comparing the text quality of the MQP to existing computer monitors. I will then wrap up with miscellaneous comments and my conclusions.

I have also included some discussion of Gaze-Contingent Ocular Parallax (GCOP) from some work by the Stanford Computational Imaging Lab (SCIL) that a reader of this blog asked about. These videos and papers suggest that some amount of depth perception is conveyed to a person by the movement of each eye in addition to vergence (binocular disparity) and accommodation (focus distance).

I’m pushing out a set of VR versus Physical Monitor “Shootout” pictures and some overall conclusions to Part 5C to discuss the above.

Yes, Apple Claimed the AVP is a Monitor Replacement and Good for High-Resolution Text

Apple Vision Pro Concept

In Apple Vision Pro (Part 5A) – Why Monitor Replacement is Ridiculous, I tried to lay a lot of groundwork for why the Apple Vision Pro (AVP), and VR headsets in general, will not be a good replacement for a monitor. I thought it was obvious, but apparently not, based on some feedback I got.

So to be specific, below are direct quotes from Apple’s WWDC 2023 presentation (YouTube transcript) with timestamps, my bold emphasis added, and my in-line comments about resolution:

1:22:33 Vision Pro is a new kind of computer that augments reality by seamlessly blending the real world with the digital world.

1:31:42 Use the virtual keyboard or Dictation to type. With Vision Pro, you have the room to do it all. Vision Pro also works seamlessly with familiar Bluetooth accessories, like Magic Trackpad and Magic Keyboard, which are great when you’re writing a long email or working on a spreadsheet in Numbers.

Seamless makes many lists of the most overused high-tech marketing words. Marketeers seem to love it because it is imprecise, suggests the product works well, and is unfalsifiable (how do you measure “seamless?”). Seamlessly was used eight times in the WWDC23 to describe the AVP and by Meta to describe the Meta Quest Pro (MQP) twice at Meta Connect 2022. From Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, Meta also used “seamless” to describe the MQP’s MR passthrough:

Apple claims the AVP is good for text-intensive “writing a long email or working on a spreadsheet in Numbers.”

1:32:10 Place your Mac screen wherever you want and expand it–giving you an enormous, private, and portable 4K display. Vision Pro is engineered to let you use your Mac seamlessly within your ideal workspace. So you can dial in the White Sands Environment, and use other apps in Vision Pro side by side with your Mac. This powerful Environment and capabilities makes Apple Vision Pro perfect for the office, or for when you’re working remote.

Besides the fact that it is not 4K wide, it is stretching those pixels over about 80 degrees so that there are only about 40 pixels per degree (ppd), much lower than is typical with a TV or movie theater. There are also the issues discussed in Part 5A: if you are going to make the display stationary in 3-D, the virtual monitor must be inscribed in the viewable area of the physical display with some margin for head movement, and content must be resampled, causing a loss of resolution. Movies are typically in a wide format, whereas the AVP’s FOV is closer to square. As discussed in Apple Vision Pro (Part 3) – Why It May Be Lousy for Watching Movies On a Plane, the AVP’s horizontal FOV is ~80°, whereas movies are designed for about 45 degrees.

Here, Apple claims that the “Apple Vision Pro; perfect for the office, or for when you’re working remote.”

1:48:06 And of course, technological breakthroughs in displays. Your eyes see the world with incredible resolution and color fidelity. To give your eyes what they need, we had to invent a display system with a huge number of pixels, but in a small form factor. A display where the pixels would disappear, creating a smooth, continuous image.

The AVP’s expected average of 40ppd is well below the angular resolution “where the pixels would disappear.” It is below Apple’s “retinal resolution.” If the AVP has a radial distortion profile similar to the MQP (discussed in the next section), then the center of the image will have about 60ppd or almost “retinal.” But most of the image will have jaggies that a typical eye can see, particularly when they move/ripple causing scintillation (discussed in part 5A).

1:48:56 We designed a custom three-element lens with incredible sharpness and clarity. The result is a display that’s everywhere you look, delivering jaw-dropping experiences that are simply not possible with any other device. It enables video to be rendered at true 4K resolution, with wide color and high dynamic range, all at massive scale. And fine text looks super sharp from any angle. This is critical for browsing the web, reading messages, and writing emails.

WWDC 2023 video at 1:56:08 with Excel shown

As stated above, the video will not be a “true 4K resolution.” Here is the claim, “fine text looks super sharp from any angle,” which is impossible with resampled text onto 40ppd displays.

1:56:08 Microsoft apps like Excel, Word, and Teams make full use of the expansive canvas and sharp text rendering of Vision Pro.

Here again, is the claim that there will be “sharp text” in text-intensive applications like Excel and Word.

I’m not sure how much clearer it can be that Apple was claiming that the AVP would be a reasonable monitor replacement, used even when a laptop display is present. Also, they were very clear that the AVP would be good for heavily text-based applications.

Meta Quest Pro (likely AVP) Pincushion Distortion and its Effect on Pixels Per Degree (ppd)

While I was aware, as discussed in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, that the MQP, like almost all VR optics, had significant pincushion distortion, I didn’t quantify the amount of distortion and its effect on the angular resolution, aka ppd. Below is the video capture from the MQP developers app (left) and the resultant image as seen through the optics (middle).

Particularly note above how small the white wall to the left of the left bookcase is relative to its size after the optics; it looks more than 3X as wide.

For a good (but old) video explaining how VR headsets map source pixels into the optics (among other concepts), I recommend watching How Barrel Distortion Works on the Oculus Rift. The image on the right shows how equal size rings in the display are mapped into ever-increasing width rings after the optics with a severe pincushion distortion.

Mapping Pixels Per Degree (ppd)

I started with a 405mp camera picture through the MQP optics (right – scaled down 3x linearly), where I could see most of the FOV and zoom in to see individual pixels. I then picked a series of regions in the image to evaluate. Since the pixels in the display device are of uniform size, any change in their apparent size/spacing must be due to the optics.

The RF16f2.8 camera lens has a known optical barrel distortion that was digitally corrected by the camera, so the camera pixels are roughly linear. The camera and lens combination has a horizontal FOV of 98 degrees and 24,576 pixels or ~250.8ppd.

The MQP display processing pre-compensates for the optics plus adds a cylindrical curvature effect to the virtual monitors. These corrections change the shape of objects in the image but not the physical pixels.

The cropped sections below demonstrate the process. For each region, 8 by 8 pixels were marked with a grid. The horizontal and vertical width of the 8 pixels was counted in terms of the camera pixels. The MQP display is rotated by about 20 degrees to clear the nose of the user, so the rectangular grids are rotated. In addition to the optical distortion in size, chroma aberrations (color separation) and focus worsen with increasing radii.
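The measurement described above boils down to a unit conversion, sketched below. It assumes, as the text does, that the corrected camera pixels are roughly uniform in angle, so camera pixels convert to degrees with one constant. The 80-camera-pixel span is a made-up input for illustration only, not one of my measurements, and the function names are hypothetical.

```python
CAMERA_H_PIXELS = 24_576    # pixel-shifted capture width, from the text
CAMERA_H_FOV_DEG = 98.0     # camera and lens horizontal FOV, from the text


def camera_ppd() -> float:
    """Camera pixels per degree, treating the camera pixels as uniform in angle."""
    return CAMERA_H_PIXELS / CAMERA_H_FOV_DEG


def display_ppd(camera_pixels_spanned: float, display_pixels: int = 8) -> float:
    """Local display ppd from how many camera pixels a run of display pixels spans."""
    degrees_spanned = camera_pixels_spanned / camera_ppd()
    return display_pixels / degrees_spanned


print(f"camera: {camera_ppd():.1f} ppd")    # 250.8, matching the figure in the text
# Hypothetical measurement: the 8-pixel grid spans 80 camera pixels.
print(f"display: {display_ppd(80.0):.1f} ppd")
```

The more camera pixels the same 8 display pixels span, the more the optics have magnified that region and the lower its local ppd. Because the grids are rotated, the horizontal and vertical spans would be measured along the grid axes.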

The image below shows the ppd at a few selected radii. Unlike the Oculus Rift video that showed equal rings, the stepping between these rings below is unequal. The radii are given in terms of angular distance from the optical center.

The plots below show the ppd versus radius for the MQP (left); interestingly, the relationship turns out to be close to linear. The right-hand plot assumes the AVP has a similar distortion profile and FOV but three times the pixels, as reported. It should be noted that ppd is not the only factor affecting resolution; other factors include focus, chroma aberrations, and contrast, which worsen with increasing radii.

The display on the MQP is 1920×1800 pixels, and the FOV is about 90° per eye diagonally across a roughly circular image, which works out to about 22 to 22.5 ppd. The optical center has about 1/3rd higher ppd with the pincushion distortion optics. For the MQP Horizon Desktop application shown, the center monitor is mostly within the 25° circle, where the ppd is at or above average.

Gaze-Contingent Ocular Parallax

While a bit orthogonal to the discussion of ppd and resolution, Gaze-Contingent Ocular Parallax (GCOP) is another issue that may cause problems. A reader and VR user, who claims to have noticed GCOP, brought to my attention the Stanford Computational Imaging Lab’s (SCIL) work on GCOP. SCIL has put out multiple videos and articles, including Eye Tracking Revisited by Gordon Wetzstein and Gaze-Contingent Ocular Parallax Rendering for Virtual Reality (associated paper link). I’m a big fan of Wetzstein’s general presentations; per his usual standard, his video explains the concept and related issues well.

The basic concept is that because the center of projection (where the image lands on the retina) and the center of rotation of the eye are different, the human visual system can detect some amount of 3-D depth in each eye. A parallax and occlusion difference occurs when the eye moves (stills from some video sequences below). Since the eyes constantly move and fixate (saccades), depth can be detected.

GCOP may not be as big a factor as vergence and accommodation. I put it in the category of one of the many things that can cause people to perceive that they are not looking at the real world and may cause problems.

Conclusion

The marketing spin (I think I have heard this before) on VR optics is that they have “fixed foveated optics” in that there is a higher resolution in the center of the display. There is some truth that severe pincushion optical distortion improves the pixel density in the center, but it makes a mess of the rest of the display.

While MQP’s optics have a bigger sweet spot, and the optical quality falls off less rapidly than the Quest 2’s Fresnel optics, they are still very poor by camera standards (optical diagram for the 9-element RF16f2.8 lens, a very simple camera lens, used to take the main picture on the right). VR optics must compromise due to space, cost, and, perhaps most importantly, supporting a very wide FOV.

With a monitor, there is only air between the eye and the display device with no loss of image quality, and there is no need to resample the monitor’s image when the user’s head moves like there is with a VR virtual monitor.

As the MQP, other pancake optics, and most, if not all, other VR optics have major pincushion distortion, I fully expect the AVP will also. Regardless of the ppd, however, the MQP virtual monitor’s far left and right sides become difficult to read due to other optical problems. The image quality can be no better than its weakest link. If the AVP has 3X the pixels and roughly 1.75x the linear ppd, the optics must be much better than the MQP to deliver the same small readable text that a physical monitor can deliver.

Apple Vision Pro (Part 5A) – Why Monitor Replacement is Ridiculous

5 August 2023 at 17:53

Introduction

As I wrote in Apple Vision Pro (Part 1) regarding the media coverage of the Apple Vision Pro, “Unfortunately, I saw very little technical analysis and very few with deep knowledge of the issues of virtual and augmented reality. At least they didn’t mention what seemed to me to be obvious issues and questions.”

I have been working for the last month on an article to quantify why it is ridiculous to think that a VR headset, even one from Apple, will be a replacement for a physical monitor. In writing the article, I felt the need to include a lot of background material and other information as part of the explanation. As the article was getting long, I decided to break it into two parts, this being the first part.

The issues will be demonstrated using the Meta Quest Pro (MQP) because that is the closest headset available, and it also claims to be for monitor replacement and uses similar pancake optics. I will then translate these results to the higher, but still insufficient, resolution of the Apple Vision Pro (AVP). The AVP will have to address all the same issues as the MQP.

Office applications, including word processing, spreadsheets, presentations, and internet browsing, mean dealing with text. As this article will discuss, text has always been treated as a special case with some “cheating” (“hints” for grid fitting) to improve sharpness and readability. This article will also deal with resolution issues with trying to fit a virtual monitor in a 3-D space.

For this set of articles, I will be suspending my disbelief in many other human factor problems caused by trying to simulate a fixed monitor in VR to concentrate on the readability of text.

Back to the Future with Very Low Pixels Per Degree (ppd) with the Apple Vision Pro

Working on this article reminded me of lessons learned in the mid-1980s when I was the technical leader of the TMS34010, the first fully programmable graphics processor. The TMS340 development started in 1982 before an Apple Macintosh (1984) or Lisa (1983) existed (and they were only 1-bit per pixel). But like those products, my work on the 34010 was influenced by Xerox PARC. At that time, only very expensive CAD and CAM systems had “bitmapped graphics,” and all PC/Home Computer text was single-size and monospaced. They were very low resolution if they had color graphics (~320×200 pixels). IBM introduced VGA (640×480) and XGA (1024×768) in 1987, which were their first IBM PC square pixel color monitors.

The original XGA monitor, considered “high resolution” at the time, had a 16” diagonal and 82ppi, which translated to 36 to 45 pixels per degree (ppd) from 0.5 meters to 0.8 meters (typical monitor viewing distance), respectively. Factoring in the estimated FOV and resolutions, the Apple Vision Pro is between 35 and 40 ppd or about the same as a 1987 monitor.

So it is time to dust off the DeLorean and go Back to the Future of the mid-1980s and the technical issues with low ppd displays. Only it is worse this time because, in the 1980s, we didn’t have to resample/rescale everything in 3-D space when the user’s head moves to give the illusion that the monitor isn’t moving.

For more about my history in 1980s computer graphics and GPUs, see Appendix 1: My 1980s History with Bitmapped Fonts and Multiple Monitors.

The question is, “Would People?” Not “Could People?” Use an Apple Vision Pro (AVP) as a Computer Monitor

With their marketing and images (below), Apple and Meta suggest that their headsets will work as a monitor replacement. Yes, they will “work” as a monitor if you are desperate and have nothing else, but having multiple terrible monitors is not a solution many people will want. These marketing concepts fail to convey that each virtual monitor will have low effective resolution forcing the text to be blown up to be readable and thus have less content per monitor. They also fail to convey that the text looks grainy and shimmers (more on this in a bit).

Meta Quest Pro (left) and Apple Vision Pro (right) have similar multiple monitor concepts.

Below is a through-the-lens picture of MQP’s Horizons Virtual Desktop. It was taken through the left eye’s optics with the camera centered for best image quality and showed more of the left side of the binocular FOV. Almost all the horizontal FOV for the left eye is shown in the picture, but the camera slightly cuts off the top and bottom.

MQP Horizon Desktop – Picture via the Left Eye Optics (camera FOV 80°x64°)

Below for comparison is my desktop setup with a 34” 22:9 3440×1440 monitor on the left and a 27” 4K monitor on the right. The combined cost of the two monitors is less than $1,000 today. The 22:9 monitor display setting is 100% scale (in Windows display settings) and has 11pt fonts in the spreadsheet. The right-hand monitor is set for 150% scaling with 11pt fonts, netting fonts that are physically the same size.

My office setup – 34” 22:9 3440×1440 (110 PPI) widescreen (left) & 27” 16:9 4K (163 PPI) Monitor (right)

Sitting 0.5 to 0.8 meters away (typical desktop monitor distance), I would judge the 11pt font on either of the physical monitors as much more easily readable than the 11pt font on the Meta Quest Pro with the 150% scaling, even though the MQP’s “11pt” is angularly about 1.5x bigger (as measured via the camera). The MQP’s text is fuzzier, grainier, and scintillates/shimmers. I could fit over six times the legible text on the 34” 22:9 monitor and over four times on the 27” 4K as on the MQP. With higher angular resolution, the AVP will be better than the MQP but still well below the physical monitors in the amount of legible text.

Note on Windows Scaling

In Windows, 100% means a theoretical 96 dots per inch. Windows factors in the information reported to it by the monitor (in this case, from the MQP’s software) to give a “Scale and Layout” recommendation (right). The resolution reported to Windows by the MQP’s Horizon’s virtual monitor is 1920×1200, and the recommended scaling was 150%. This setting is what I used for most pictures other than for the ones called out as being at 100% or 175%.
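As a rough illustration of how the scaling percentage turns a point size into pixels, here is a small Python sketch. It assumes the conventional 72 points per inch and the 96 dots per inch baseline mentioned above; the function name is hypothetical, and the result is the nominal em height, not the height of any particular glyph.

```python
def font_em_pixels(point_size: float, scale_percent: float,
                   base_dpi: float = 96.0) -> float:
    """Nominal em height in pixels for a font at a given Windows scale.

    Assumes 1 point = 1/72 inch and that 100% scaling means base_dpi.
    """
    return point_size / 72.0 * base_dpi * scale_percent / 100.0


for scale in (100, 150, 175):
    print(f"11pt at {scale}%: {font_em_pixels(11, scale):.1f} px")
```

Under these assumptions, an 11pt font has a nominal height of about 14.7 pixels at 100%, 22 pixels at 150%, and about 25.7 pixels at 175%.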

For more on the subject of how font “points” are defined, see Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History.

Optics

I’m not going to go into everything wrong with VR optics, and this article deals with being able to read text in office applications. VR optics have a lot of constraints in terms of cost, space, weight, and wide FOV. While pancake optics are a major improvement over the more common Fresnel lenses, to date, they still are poor optically (we will have to see about the AVP).

While not bad in the center of the FOV, they typically have severe pincushion distortion and chroma (color) aberrations. Pancake optics are more prone to collecting and scattering light, causing objects to glow on dark backgrounds, contrast reduction, and ghosts (out-of-focus reflection). I discussed these issues with Pancake Optics in Meta (aka Facebook) Cambria Electrically Controllable LC Lens for VAC. With computer monitors, there are no optics to cause these problems.

Optical Distortion

As explained in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, the Meta Quest Pro rotates the two displays for the eyes ~20° to clear the nose. The optics also have very large pincushion distortion. The display processor on the MQP pre-corrects digitally for the display optics’ severe pincushion distortion. This correction comes at some loss of fidelity in the resampling process.

The top right image shows the video feed to the displays. The distortion and rotation have been digitally corrected in the lower right image, but other optical problems are not shown (see through-the-lens pictures in this article).

There is also an optical “cropping” of the left and right eye displays, indicated by the Cyan and Red dashed lines, respectively. The optical cropping shown is based on my observations and photographs.

The pre-distortion correction is certainly going to hurt the image quality. It is likely that the AVP, using similar pancake optics, will have similar needs for pre-correction. Even though the MQP displays are rotated (no word on the AVP), there are so many other transforms/rescalings, including the transforms in 3-D space required to make the monitor(s) appear stationary, that if the rotation is combined with them (rather than done as a separate transform), the effect of the display’s rotation on resolution may be negligible. The optical distortion and the loss of text resolution, when transformed in 3-D space, are more problematic.

Binocular Overlap and Rivalry

One of the ways to improve the overall FOV with a binocular system is to have the FOV of the left and right eye only partially overlap (see figure below). The paper Perceptual Guidelines for Optimizing Field of View in Stereoscopic Augmented Reality Displays and the article Understanding Binocular Overlap and Why It’s Important for VR Headsets discuss the issues with binocular overlap (also known as “Stereo Overlap”). Most optical AR/MR systems have a full or nearly full overlap, whereas VR headsets often have a significant amount of partial overlap.

Partial overlap increases the total FOV when combining both eyes. The problem with partial overlap occurs at the boundary where one FOV ends in the middle of the other eye’s FOV. One eye sees the image fade out to black, whereas the other sees the image. This is a form of Binocular Rivalry, and it is left to the visual cortex to sort out what is seen. The visual cortex will mostly sort it out in a desirable way, but there will be artifacts. Most often, the visual cortex will pick the eye that appears brighter (i.e., the cortex picks one and does not average), but there can be problems with the transition area. Additionally, where one is concentrating can affect what is seen/perceived.

In the case of the MQP, the region of binocular overlap is slightly less than the width of the center monitor in Meta’s Horizon’s Desktop when viewed from the starting position. Below left shows the view through the left eye when centering the monitor in the binocular FOV.

When concentrating on a cell in the center, I didn’t notice a problem, but when I took in the whole image, I could see these rings, particularly in the lighter parts of the image.  

The Meta Quest 2 appears to have substantially more overlap. On the left is a view through the left eye with the camera positioned similarly to the MQP (above left). Note how the left eye’s FOV overlaps the whole central monitor. I didn’t notice the transition “rings” with the Meta Quest 2 as I did with the MQP.

Binocular overlap is not one of those things VR companies like to specify; they would rather talk about the bigger FOV.

In the case of the AVP, it will be interesting to see the amount of binocular overlap in their optics and if it affects the view of the virtual monitors. One would like the overlap to be more than the width of a “typical” virtual monitor, but what does “typical” mean if the monitors can be of arbitrary size and positioned anywhere in 3-D space, as suggested in the AVP’s marketing material?

Inscribing a virtual landscape-oriented monitor uses about half of the vertical pixels of the headset.

The MQP’s desktop illustrates the basic issues of inscribing a virtual monitor into the VR FOV while keeping the monitor stationary. There is some margin for allowing head movement without cutting off the monitor, which would be distracting. Additionally, the binocular overlap cutting off the monitor is discussed above.

As discussed in more detail later, the MQP uses a 16:10 aspect ratio, 1920×1200 pixel “virtual monitors” (the size it reports to Windows). The multiple virtual monitors are mapped into the MQP’s 1920×1800 physical display. Looking straight ahead, sitting at the desktop, you see the central monitor and about 30% of the two side monitors.

The center monitor’s center uses about 880 pixels, or about half of the 1800 vertical pixels of the MQP’s physical display. The central monitor behaves like it is about 1.5 meters (5 feet) away or about 2 to 3 times the distance of a typical computer monitor. This makes “head zooming” (leaning in to make the image bigger) ineffective.

Apple’s AVP has a similar FOV and will have similar limitations in fitting virtual monitors. There is the inevitable compromise between showing the whole monitor with some latitude for the user moving their head while avoiding cutting off the sides of the monitor.

Simplified Scaling Example – Rendering a Pixel Size Dot

Typical readable text has a lot of high-resolution, high-contrast features that will be on the order of one pixel wide, such as the stroke and dot in the letter “i.” The problems with drawing a single pixel size dot in 3-D space illustrate some of the problems.

Consider drawing a small circular dot that, after all the 3-D transforms, is the size of about one pixel. In the figure below, the pixel boundaries are shown with blue lines. The four columns in the figure show a few of an infinite number of relationships between a rendered dot and the pixel grid.

The first row shows the four dots relative to the grid. The nearest pixel is turned on in the second row based on the centroid. In row three, a simple average is used to draw the pixel where the average of 4 pixels should equal the brightness of one pixel. The fourth row shows a low-pass filter of the virtual dots. The fifth row renders the pixels based on the average value of the low-pass filtered version of the dots.

The centroid method is the sharpest and keeps the size of the dot the same, but the location will tend to jump around with the slightest head movement. If many dots formed an object, the shape would appear to wriggle. With the simple average, the “center of mass” is more accurate than the centroid method, but the dot changes shape dramatically based on alignment/movement. The average of the low-pass filter method is better in terms of center of mass, and the shape changes less based on alignment, but now a single pixel size circle is blurred out over 9 pixels.
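The tradeoff between the first two methods can be sketched in a few lines of Python. This is a simplified illustration, not the processing any headset uses: it assumes a one-pixel square dot instead of a circle (so the overlap areas are exact), it omits the low-pass variants, and the function names are hypothetical.

```python
def nearest_pixel(cx: float, cy: float, size: int = 4) -> list:
    """Centroid method: light only the pixel that contains the dot's center."""
    grid = [[0.0] * size for _ in range(size)]
    grid[int(cy)][int(cx)] = 1.0
    return grid


def area_coverage(cx: float, cy: float, size: int = 4) -> list:
    """Simple average: each pixel's brightness is the fraction of the
    1x1 dot that overlaps it. Pixel (px, py) covers [px, px+1) x [py, py+1)."""
    grid = [[0.0] * size for _ in range(size)]
    x0, y0 = cx - 0.5, cy - 0.5  # top-left corner of the dot
    for py in range(size):
        for px in range(size):
            w = max(0.0, min(px + 1, x0 + 1) - max(px, x0))
            h = max(0.0, min(py + 1, y0 + 1) - max(py, y0))
            grid[py][px] = w * h
    return grid


# Dot centered on a pixel vs. centered on the corner shared by four pixels.
for center in ((1.5, 1.5), (2.0, 2.0)):
    print(center, "nearest:", nearest_pixel(*center))
    print(center, "average:", area_coverage(*center))
```

With the dot centered on a pixel, both methods light one pixel at full brightness. Shifted half a pixel diagonally, the nearest-pixel version still lights exactly one pixel (so it stays sharp but jumps), while the averaged version spreads the same total brightness over four pixels at a quarter brightness each.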

There are many variations to resampling/scaling, but they all make tradeoffs. A first-order tradeoff is between wiggling (changing in shape and location) with movement versus sharpness. A big problem with text when rendered on low ppd displays, including the Apple Vision Pro, is that many features, from periods to the dots of letters to the stroke width of small text fonts, will be close to 1 pixel.

Scaling text – 40+ Years of Computer Font Grid Fitting (“Cheating”) Exposed

Since the beginning, personal computers have dealt with low pixels-per-inch monitors, translating into low pixels per degree based on typical viewing distances. Text is full of fine detail and often has perfectly horizontal and vertical strokes that, even with today’s higher PPI monitors, cause pixel alignment issues. Text is so important and so common that it gets special treatment. Everyone “cheats” to make text look better.

The fonts need to be recognizable without making them so big that the eye has to move a lot to read words and make content less dense with less information on a single screen. Big fonts produce less content per display and more eye movement, making the muscles sore.

In the early to mid-1980s, PCs moved from rough-looking fixed-space text to proportionally spaced text with carefully hand-crafted fonts, and only a few font sizes were available. Font edges are also smoothed (antialiased) to make them look better. Today, most fonts are rendered from a model with “hints” that help the fonts look better on a pixel grid. TrueType, originally developed by Apple as a workaround to paying royalties to Adobe, is used by both Apple and MS Windows and includes “Hints” in the font definitions for grid fitting (see: Windows hinting and Apple hinting).

Simplistically, grid fitting tries to make horizontal and vertical strokes of a font land on the pixel grid by slightly modifying the shape and location (vertical and horizontal spacing) of the font. Doing so requires less smoothing/antialiasing without making the font look jagged. This works because computer monitor pixels are on a rectangular grid, and in most text applications, the fonts are drawn in horizontal rows.
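The idea can be shown with a one-dimensional toy example in Python. This is only a sketch of the principle under simplifying assumptions, not how TrueType hinting is implemented: a vertical stroke is reduced to a left and right edge in pixel units, and the helper names are hypothetical.

```python
def stroke_coverage(left: float, right: float, n_pixels: int) -> list:
    """Fraction of each pixel column covered by a stroke spanning
    [left, right), where pixel column px covers [px, px + 1)."""
    return [max(0.0, min(px + 1, right) - max(px, left))
            for px in range(n_pixels)]


def grid_fit(left: float, right: float) -> tuple:
    """Snap the stroke to whole pixels: round the left edge to the grid
    and round the width to at least one pixel."""
    width = max(1, round(right - left))
    new_left = round(left)
    return float(new_left), float(new_left + width)


stroke = (2.3, 3.5)  # a 1.2-pixel-wide stroke that straddles two columns
print("unfitted:", [round(v, 2) for v in stroke_coverage(*stroke, 6)])
print("fitted:  ", stroke_coverage(*grid_fit(*stroke), 6))
```

The unfitted stroke lights two adjacent columns at partial brightness, while the fitted stroke lights a single column fully, at the cost of moving the stroke slightly and changing its width.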

Almost all font rendering is grid fit, just some more than others (see, from 2007, Font rendering philosophies of Windows & Mac OS X). Apple (and Adobe) have historically tried to keep the text size and spacing more accurate at some loss in font sharpness and readability on low PPI monitors (an easy solution for Apple as they expect you to buy a higher PPI monitor). MS Windows with ClearType and Apple with their LCD font smoothing have options to try and improve fonts further by taking advantage of LCDs with side-by-side red-green-blue subpixels.

But this whole grid fitting scheme falls apart when the monitors are virtualized. Horizontal and vertical strokes transform into diagonal lines. Because grid fitting won’t work, the display of a virtual monitor needs to be much higher in angular resolution than a physical monitor to show a font of the same size with similar sharpness. Yet today and for the foreseeable future, VR displays are much lower resolution.

For more on the definition of font “Points” and their history with Windows and Macs, see Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History.

Rendering Options: Virtual Monitors Fixed in 3-D Space Breaks the “Pixel Grid.”

The slightest head movement means that everything has to be re-rendered. The “grid” to which you want to render text is not the virtual monitor but that of the headset’s display. There are at least two main approaches:

  1. Re-render everything from scratch every frame – This will give the best theoretical image quality but is very processor intensive and will not be supported by most legacy applications. Simply put, these applications are structured to draw in terms of physical pixels of a fixed size and orientation rather than everything drawn virtually.
  2. Render to a “higher” resolution (if possible) and then scale to the headset’s physical pixels.
    • One would like the rendering to be at least 2X (linearly, 4X the pixels) of the physical pixels of the headset covering the same area to keep from having significant degradation in image quality after the scaling-down process.
    • The higher-resolution virtual image is then transformed onto the surface (which might be curved itself) of the virtual monitor in 3-D space. Virtual monitor processing can become complex if the user can put multiple monitors here, there, and everywhere that can be viewed from any angle and distance. The rendering resolution needed for each virtual monitor depends on the virtual distance from the eye.
    • Even with this approach, there are “application issues” from the legacy of 40+ years of PCs dealing with fixed pixel grids.
    • The grid stretching (font hinting) becomes counterproductive since it is stretching to the virtual rather than the physical display.

Systems will end up with a hybrid of the two approaches mixing “new” 3-D applications with legacy office applications.
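To put a number on the "at least 2X linearly" guideline in the second approach, here is a small Python sketch. The helper names are hypothetical, and the 1229×768 covered region is an assumption derived from applying the roughly 64% scaling figure quoted for the center of the MQP's 1920×1200 virtual monitor.

```python
import math


def oversampling_factor(rendered_pixels: int, covered_headset_pixels: int) -> float:
    """Linear ratio of rendered source pixels to the headset pixels they land on."""
    return rendered_pixels / covered_headset_pixels


def render_size_for_target(covered_w: int, covered_h: int,
                           target: float = 2.0) -> tuple:
    """Render resolution needed to reach a target linear oversampling factor."""
    return math.ceil(covered_w * target), math.ceil(covered_h * target)


# Assumed covered region: about 64% of 1920x1200 in each direction.
covered_w, covered_h = 1229, 768
print("linear oversampling:", oversampling_factor(1200, covered_h))
print("render size for 2X: ", render_size_for_target(covered_w, covered_h))
```

Under that assumption, the 1920×1200 virtual monitor is oversampled by only about 1.56X linearly, short of the 2X guideline.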

Inscribing a virtual landscape-oriented monitor uses about half of the vertical pixels of the headset.

The MQP’s Horizons appears to render the virtual monitor(s) and then re-render them in 3-D space along with the cylindrical effect plus pre-correction for their Pancake lens distortion.

The MQP’s desktop illustrates the basic issues of inscribing a virtual monitor into the VR FOV while keeping the monitor stationary. There is some margin for allowing head movement without cutting off the monitor, which would be distracting. Additionally, the binocular overlap cutting off the monitor is discussed above.

The MQP uses a 16:10 aspect ratio, 1920×1200 pixel “virtual monitors.” The multiple virtual monitors are mapped into the MQP’s 1920×1800 physical display. Looking straight ahead, sitting at the desktop, you see the central monitor and about 30% of the two side monitors.

The virtual monitor’s center uses about 880 pixels, or about half of the 1800 vertical pixels of the MQP’s physical display or 64% of the 1200 vertical pixels reported to Windows with the user at the desktop.

The central monitor behaves like it is about 1.5 meters (5 feet) away or about 2 to 3 times the distance of a typical computer monitor. This makes “head zooming” (leaning in to make the image bigger) much less effective (by a factor of 2 to 3X).

Apple’s AVP has a similar FOV and will have similar limitations in fitting virtual monitors. There is the inevitable compromise between showing the whole monitor with some latitude for the user moving their head while avoiding cutting off the sides of the monitor.

The pre-distortion correction is certainly going to hurt the image. It is possible that the AVP, using similar pancake optics, will have similar needs for pre-correction (most, if not all, VR optics have significant pincushion distortion – a side effect of trying to support a wide FOV). The MQP displays are rotated to clear the nose (no word on the AVP). However, this can be rolled into the other transformations and probably does not significantly impact the processing requirement or image quality.

A simplified example of scaling text

The image below, one cell of a test pattern with two lines of text and some 1- and 2-pixel-wide lines, shows a simulation (in Photoshop) of the scaling process. For this test, I chose a 175% scaled 11pt font, which should have roughly the same number of pixels as an 11pt font at 100% on an Apple Vision Pro. This simulation greatly simplifies the issue but shows what is happening with the pixels. The MQP and AVP must support resampling with 6 degrees of freedom in the virtual world and pre-correct for the distortion of the optics (and, in the case of MQP’s Horizons, curve the virtual monitor).

Source Cell (left), Simulated 64% scaling (right)
  • Sidenote: This one test pattern accidentally has an “i” rather than a “j” between the g & k that I discovered late into editing.

The pixels have been magnified by 600% (in the full-size image), and a grid has been shown to see the individual pixels. On the top right, the source has been scaled by 64%, about the same amount MQP Horizons scales the center of the 1920×1200 virtual monitor when sitting at the desktop. The bottom right image scales by 64% and rotates by 1° to simulate some head tilt.

If you look carefully at the scaled one- and two-pixel-wide lines in the simulation, you will notice that sometimes the one-pixel-wide lines are as wide as the 2-pixel lines but dimmer. You will also see that what started as identical fonts from line to line look different when scaled, even without any rotation. In the through-the-lens cells, the fonts have further degradation/softening as they are displayed on color subpixels.
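The widening and dimming of thin lines can be reproduced with a one-dimensional Python sketch. It assumes a simple area-averaging (box) filter, which is not necessarily what Photoshop or the MQP uses, and the function name and the 16-pixel test row are made up for the illustration.

```python
def downscale_box(src: list, scale: float) -> list:
    """Area-average resample: each output pixel averages the span of
    source pixels it covers (1/scale source pixels wide)."""
    step = 1.0 / scale
    n_out = int(len(src) * scale)
    out = []
    for i in range(n_out):
        start, end = i * step, (i + 1) * step
        total = 0.0
        for j in range(int(start), min(len(src), int(end) + 1)):
            overlap = min(j + 1, end) - max(j, start)
            if overlap > 0:
                total += src[j] * overlap
        out.append(total / step)
    return out


# 16 source pixels: a 1-pixel line at index 4, a 2-pixel line at 10 and 11.
row = [0.0] * 16
row[4] = 1.0
row[10] = row[11] = 1.0

print([round(v, 2) for v in downscale_box(row, 0.64)])
```

With this particular alignment, the one-pixel line lands on two output pixels at roughly 0.44 and 0.20 brightness, while the two-pixel line also lands on two output pixels at roughly 0.60 and 0.68. Both end up the same width, and the thinner one is just dimmer. Shifting the lines by a pixel or two changes the split, which is why identical glyphs come out looking different from place to place.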

Below is what the 11pt 175% fonts look like via the lens of the MQP in high enough resolution to see the color subpixels. By the time the fonts have gone through all the various scaling, they are pretty rounded off. If you look closely at the same font in different locations (say the “7” for the decimal point), you will notice every instance is different, whereas, on a conventional physical monitor, they would all be identical due to grid fitting.

MQP 175% Scaled 11pt fonts

For reference, the full test pattern and the through-the-lens picture of the virtual monitor are given below (Click on the thumbnails to see the full-resolution images). The camera’s exposure was set low so the subpixels would not blow out and lose all their color.

Test Pattern Cells Replicated
A full test pattern with a center cell and an off-center cell indicated by red rectangles (exposed to show subpixels)

Scintillating Text

When looking through the MQP, the text scintillates/sparkles. This occurs because no one can keep their head perfectly still, and every text character is being redrawn on each frame with slightly different alignments to the physical pixels causing the text to wriggle and scintillate.

Scaling/resampling can be done with sharper or softer processing. Unfortunately, the sharper the image after resampling, the more it will wriggle with movement. The only way to avoid this wriggling and have sharp images is to have a much higher ppd. MQP has only 22.5ppd, and the AVP has about 40ppd and should be better, but I think they would need about 80ppd (about the limit of good vision and what Apple Retina monitors support) to eliminate the problems.

The MQP (and most displays) uses spatial color with individual red, green, and blue subpixels, so the wriggling is at the subpixel level. The picture below shows the same text with the headset moving slightly between shots.

Below is a video from two pictures taken with the headset moved slightly between shots to demonstrate the scintillation effect. The 14pt font on the right has about the same number of pixels as an 11pt font with the resolution of the Apple Vision Pro.

Scintillation/wiggle of two frames (right-click > “loop” -> play triangle to see the effect)

Conclusion

This will not be a close call, and using any VR headset, including the MQP and Apple Vision Pro, as a computer monitor replacement fails any serious analysis. It might impress people who don’t understand the issues and can be wowed by a flashy short demo, and it might be better than nothing. But it will be a terrible replacement for a physical monitor/display.

I can’t believe Apple seriously thinks a headset display with about 40ppd will make a good virtual monitor. Even if some future VR headset has 80ppd and over 100-degree FOV, double the AVP linearly or 4X the pixels, it will still have problems.

Part 5B of this series will include more examples and more on my conclusions.

Appendix 1: My 1980s History with Bitmapped Fonts and Multiple Monitors

All this discussion of fonts and 3-D rendering reminded me of those early days when the second-generation TMS34020 almost got designed into the color Macintosh (1985 faxed letter from Steve Perlman from that era – right). I also met with Steve Jobs at NeXT and mentioned Pixar to him before Jobs bought them (discussed in my 2011 blog article) and John Warnock, a founder of Adobe, who was interested in doing a Port of Postscript to the 34010 in that same time frame.

In the 1980s, I was the technical leader for a series of programs that led to the first fully programmable graphics processor, the TMS34010, and the Multi-ported Video DRAM (which led to today’s SDRAM and GDRAM) at Texas Instruments (TI) (discussed a bit more here and in Jon Peddie’s 2019 IEEE article and his 2022 book “The History of the GPU – Steps to Invention”).

In the early 1980s, Xerox PARC’s work influenced my development of the TMS34010, including Warnock’s 1980 paper (while still at PARC), “The Display of Characters Using Gray Level Sample Arrays,” and the series of PARC’s articles in BYTE Magazine, particularly the August 1981 edition on Smalltalk which discussed bit/pixel aligned transfers (BitBlt) and the use of a “mouse” which had to be explained to BYTE readers as, “a small mechanical box with wheels that lets you quickly move the cursor around the screen.”

When defining the 34010, I had to explain to TI managers that the Mouse would be the next big input device for ergonomic reasons, not the lightpen (used on CAD terminals at TI in the early 1980s), which requires the user to keep their arm floating in the air, which quickly becomes tiring. Most AR headset user interfaces make users suffer with having to float their hands to point, select, and type, so the lessons of the past are being relearned.

In the late 1980s, a systems engineer for a company I had never heard of called “Bloomberg,” who wanted to support 2 to 4 monitors per PC graphics board, came to see us at TI. In a time when a single 1024×768 graphics card could cost over $1,200 (about $3,000 in 2023 dollars), this meeting stood out. The Bloomberg engineer explained how Wall Street traders would pay a premium to get as much information as possible in front of them, and a small advantage on a single trade would pay for the system. It was my first encounter with someone wanting multiple high-resolution monitors per PC.

I used to have a life designing cutting-edge products from blank sheets of paper (back then, it was physical paper) through production and marketing; in contrast, I blog about other people’s designs today. And I have dealt with pixels and fonts for over 40 years.

1982

Below is one of my early presentations on what was then called the “Intelligent Graphics Controller” (for internal political reasons, we could not call it a “processor”), which became the TMS34010 Graphics System Processor. You can also see the state of 1982 presentation technology with a fixed-spaced font and the need to cut and paste hand drawings. This slide was created in Feb 1982. The Apple Lisa didn’t come out until 1983, and the Mac in 1984.

1986 and the Battle with Intel for Early Graphics Processor Dominance

We announced the TMS34010 in 1986, and our initial main competitor was the Intel 82786. But the Intel chip was “hardware” and lacked the 34010’s programmability, and to top it off, the Intel chip had many bugs. In just a few months, the 82786 was a non-factor. The copies of a few of the many articles below capture the events.

In 1986, we wrote two articles on the 34010 in the IEEE CG&A magazine. You can see from the front pages of the articles the importance we put on drawing text. Copies of these articles are available online (click on the thumbnails below to be linked to the full articles). You may note the similarity of the IEEE CG&A article’s first figure, where we discussed extending “BitBlt” to the color “PixBlt,” to the one in the 1981 Byte Smalltalk article.

Around 1980 we started publishing a 3rd party guide of all the companies developing hardware and software for the 340 family of products, and the June 1990 4th Edition contained over 200 hardware and software products.

Below is a page from the TMS340 TIGA Graphics Library, including the font library. In the early 1980s, everyone had to develop their own font libraries. There was insufficient power to render fonts with “hints” on the fly. We were also doing well to have bitmapped fonts with little or no antialiasing/smoothing. From about

Sadly, we were a bit before our time, and Texas Instruments had, by the late 1980s, fallen far behind TSMC and many other companies in semiconductor technology for making processors. Our competitors, such as ATI (NVidia wasn’t founded until 1993), could get better semiconductor processing at a lower cost from the then-new semiconductor 3rd party fabs such as TSMC (founded in 1987).

Appendix 2: Notes on Pictures

All the MQP pictures in these two articles were taken through the left eye optics using either the Canon R5 (45mp) with an RF16mmf2.8 or 28mmf2.8 “pancake” lens or the lower resolution Olympus E-M5D-3 (20mp) with 9-18mm zoom lens at 9mm. Both cameras have a “pixel shift” feature that moves the sensor, giving 405mp (24,576 x 16,384) for the R5 and 80mp (10,368 x 7,776 pixels) for the M5D-3. All the pictures used this feature, as it gave better resolution even if the images were later scaled down.

High-resolution pictures of computer monitors with color subpixels and any scaling or compression cause issues with color and intensity moiré (false patterning) due to the “beat frequency” between the camera’s color sensor and the display device. In this case, there are many different beat frequencies between both the pixels and color subpixels of the MQP’s displays and the cameras. Additionally, the issues of the MQP’s optics (which are poor compared to a camera lens) vary the resolution radially. I found for the whole FOV image, the lower-resolution Olympus camera didn’t have nearly as severe a moiré issue (only a little in intensity and almost none in color). In contrast, it was unavoidable with the R5 with the 16mm lens (see comparison below).

Lower Resolution Olympus D3 with very little moiré
Higher Resolution Canon R5 “catches” moiré

The R5 with the 28mmf2.8 lens and pixel shift mode could capture the MQP’s individual red, green, and blue subpixels (right). In the picture above, the two “7s” on the far right have horizontal and diagonal strokes a little over 1 pixel wide. The two 7s are formed by different subpixels because they are aligned slightly differently in 3D space. The MQP’s displays are rotated by about 20°; thus, the subpixels are on a 20° diagonal (about the same as the lower stroke on the 7s). Capturing at this resolution, where the individual red, green, and blue subpixels are visible, necessitated underexposing the overall image by about 8X (3 camera stops). Otherwise, some color dots (particularly green) will “blow out” and shift the color balance.

As seen in the full-resolution crop above, each color dot in the MQP’s display device covers about 1/8th of the area of a pixel, with the other two colors and black filling the rest of the area of a pixel. Note how the scaled-down version of the same pixels on the right look dim when the subpixels are averaged together. The camera exposure had to be set about three stops lower (8 times in brightness as stops are a power of two) to avoid blowing out the subpixels.

Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History

Making a monitor appear locked in 3-D space breaks everything about how PCs have dealt with rendering text and most other objects. Since the beginning of PC bitmap graphics, practical compromises (and shortcuts) have been made to reduce processing and to make images look better on affordable computer monitors. A classic compromise is the font “point,” defined (since 1517) at ~1/72nd of an inch.

So, in theory, when rendering text, a computer should consider the physical size of the monitor’s pixels. Early bitmapped graphics monitors in the mid-1980s had about 60 to 85 ppi, so the PC developers (except Adobe with their Postscript printers, with founders from Xerox PARC, that also influenced Apple), lacking the processing power to deal with it and needing to get on with making products, confabulated “points” and “pixels.” Display font “scaling” helps correct this early transgression.

Many decades ago, MS Windows decided that a (virtual) 96 dots per inch (DPI) would be their default “100%” font scaling. An interesting Wikipedia article on the convoluted logic that led to Microsoft’s decision is discussed here. Conversely, Apple stuck with 72 PPI as their basis for fonts and made compromises with font readability on lower-resolution monitors with smaller fonts. Adherence to 72 PPI may explain why a modern Apple Mac 27” monitor is 5K to reach 218 ppi (within rounding of 3×72=216). In contrast, the much more common and affordable 27” 4K monitor has 163 ppi, not an integer multiple of 72, and Macs have scaling issues with 3rd party monitors, including the very common 27” 4k.
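The ppi figures above follow from a panel's pixel dimensions and its diagonal. The sketch below is only a rough check: it assumes "5K" means 5120×2880 and "4K" means 3840×2160 (neither resolution is stated above) and that the diagonal is exactly 27 inches. The function name is made up for illustration.

```python
import math

def pixels_per_inch(h_px, v_px, diagonal_in):
    """Pixel density measured along the diagonal of a flat panel."""
    return math.hypot(h_px, v_px) / diagonal_in

# Assumed panel resolutions (not stated in the article).
ppi_5k = pixels_per_inch(5120, 2880, 27.0)
ppi_4k = pixels_per_inch(3840, 2160, 27.0)

# Show each density and how it relates to the 72-per-inch font basis.
print(f"27-inch 5K: {ppi_5k:.1f} ppi ({ppi_5k / 72:.2f} x 72)")
print(f"27-inch 4K: {ppi_4k:.1f} ppi ({ppi_4k / 72:.2f} x 72)")
```

Under these assumptions, the 5K panel lands very close to three times 72, while the 4K panel sits at a non-integer multiple (a bit under 2.3), which is the scaling mismatch described above.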

Microsoft and Apple have tried to improve the text by varying the intensity of the color subpixels. Below is an example from MS Windows with “ClearType” for a series of different-size fonts. Note particularly the horizontal strokes at the bottom of the numbers 1, 2, and 7 below. They stay 1 pixel wide with no smoothing from Calibri 9 to 14pt; at 18pt, the strokes jump to 2 pixels wide with a little smoothing; and at 20pt, they become 2 pixels wide with no smoothing vertically.

Apple has a similar function known as “LCD Font Smoothing.” Apple has put low-ppd text rendering issues in its rearview mirror with “retinal resolution” displays for Mac laptops and monitors. “Retinal resolution” translates to more than 80ppd when viewed normally, which is from about 12” (0.3 meters) for handheld devices (ex. iPhone) or about 0.5 to 0.8 meters for a computer.

The chart was edited for space, and ppd information was added.

Apple today sells “retina monitors” with a high 218 PPI, which makes text grid fitting less of an issue. But as the chart from Mac external displays for designers and developers (right) shows, Mac systems have resolution and performance issues with in-between resolution monitors.

The Apple Vision Pro has less than 40 ppd, much lower than any of these monitors at normal viewing distance. And that is before all the issues with making the virtual monitor seem stationary as the user moves.

Apple Vision Pro (Part 4) – Hypervision Pancake Optics Analysis

26 June 2023 at 23:07

Introduction

Hypervision, a company making a name for itself by developing very wide field of view VR pancake optics, just released a short article analyzing the Apple Vision Pro’s pancake on their website titled, First Insights about Apple Vision Pro Optics. I found the article very interesting from a company that designs pancake optics. I will give a few highlights and key points from Hypervision’s article, but I recommend going to their website for more information.

Hypervision has demonstrated a single pancake 140° VR and an innovative 240° dual pancake per eye optical design. I will briefly discuss Hypervision’s designs after the Apple Vision Pro optics information.

Apple Vision Pro’s Pancake Optical Design

Hypervision’s article starts with a brief description of the basics of pancake optics (this blog also discussed how pancake optics work as part of the article Meta (aka Facebook) Cambria Electrically Controllable LC Lens for VAC?).

Hypervision points out that an important difference in the Apple Pancake optics shown in the WWDC 2023 video and other pancake optics, such as the Meta Quest Pro, is that the Quarter Waveplate (QWP) retarder 2, as shown above, must be curved. Hypervision shows both Meta (Facebook) and Apple patent applications showing pancake optics with a curved QWP. Below are Figs 8 and 9 from Apple’s patent application and Hypervision’s translation into some solid optics.

Hypervision’s Field of View Analysis

Hypervision has also made a detailed field-of-view analysis. They discuss how VR experts who have seen the AVP say they think the AVP FOV is about 110°. Hypervision’s analysis suggests the AVP’s FOV “wishfully” could be as high as 120°. Either value is probably within the margin of error due to assumptions. Below is a set of diagrams from Hypervision’s analysis.

Pixels Per Degree (ppd)

Hypervision’s analysis shows 34 pixels per degree (ppd) on the lower end. The lower ppd comes from Hypervision’s slightly wider FOV calculations. Hypervision notes that this calculation is rough and may vary across the field of view as the optics may have some non-linear magnification.

I have roughly measured the Meta Quest Pro’s (MQP) ppd in the center and come up with about 22 ppd. Adjusting for about 1.8X more pixels linearly and for the difference between the MQP’s 106° FOV and the AVP’s 110°, I get an estimate of about 39 ppd. Once again, with my estimate, there are a lot of assumptions. Considering everything, depending on the combination of high and low estimates, the AVP has between 34 ppd and 39 ppd.
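The scaling in that estimate can be written out as a few lines of arithmetic. This is only a sketch of the back-of-the-envelope method described above, using the rough numbers from the text; it assumes ppd scales linearly with pixel count and inversely with FOV, which ignores any non-linear magnification in the optics. The variable names are mine.

```python
# Rough inputs taken from the text above.
mqp_center_ppd = 22.0      # measured MQP pixels per degree (center)
linear_pixel_ratio = 1.8   # AVP has ~1.8X more pixels linearly than the MQP
mqp_fov_deg = 106.0        # MQP field of view
avp_fov_deg = 110.0        # reported AVP field of view

# More pixels raise ppd; spreading them over a wider FOV lowers it.
avp_ppd_estimate = mqp_center_ppd * linear_pixel_ratio * (mqp_fov_deg / avp_fov_deg)

print(f"Estimated AVP ppd: {avp_ppd_estimate:.1f}")
```

With these inputs the result comes out in the high 30s, in line with the upper end of the 34 to 39 ppd range, and small changes to any of the assumed inputs move it by a ppd or two.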

Eye Box

Hypervision makes the point that because the AVP’s Micro-OLEDs have a smaller pixel size and thus require more magnification, the eye box (and thus the sweet spot) of the AVP is likely to be smaller than that of some other headsets that use pancake optics with LCDs.

Hypervision

Hypervision clearly has some serious optical design knowledge. I first saw them in 2022, but as their optics have been aimed at VR, I have not previously written about them. But at AR/VR/MR 2023, they showed a vastly improved optical quality design using pancake optics to support 140° with a single pancake optics and 240° with what I call a dual pancake (per eye) design. I took more notice with pancake optics becoming all the rage in VR headsets with MR passthrough.

AR/VR/MR 2022 with Dual Fused Fresnel Lenses and 270°

I first saw Hypervision at AR/VR/MR in January 2022. At the time, they were demonstrating a 270° headset based on what I call a fused dual Fresnel optical design using two LCDs. I took some pictures (below), but I was not covering much about VR at the time unless it was related to passthrough mixed reality. While the field of view was very impressive, there were the usual problems with Fresnel optics and the seam between the dual Fresnel lenses was pretty evident.

AR/VR/MR 2023 Pancake Optics

Below are pictures I took at AR/VR/MR 2023 of Hypervision’s 140° single pancake and 240° dual pancake designs. The pancake designs were optically much better than their earlier Fresnel-based designs. The “seam” with the dual pancakes seemed barely noticeable (Brad Lynch also reported a barely visible seam in his video). Hypervision has some serious optical design expertise.

I mentioned Hypervision to Brad Lynch of SadlyItsBradley, who covers VR in more detail. Brad had the chance to see them at Display Week 2023 and recorded a video discussing them. Brad said that multiple companies, including Lynx, were impressed by Hypervision.

Closing

Hypervision is a company with impressive optical design expertise, and they demonstrated that they understand pancake optics with their designs. I appreciate that they contacted me to let me know they had analyzed the Apple Vision Pro. It is one thing for me, with an MSEE who picked up some optics through my industry exposure, to try and figure out what is going on with a given optical design; it is something else to have the analysis from a company that has designed that type of optics. So once again, I would recommend reading the whole article on Hypervision’s site.

Apple Vision Pro (Part 3) – Why It May Be Lousy for Watching Movies On a Plane

22 June 2023 at 01:58

Introduction

Part 1 and Part 2 of this series on the Apple Vision Pro (AVP) primarily covered the hardware. Over the next several articles, I plan to discuss the applications Apple (and others) suggest for AVP. I will try to show the issues with human factors and provide data where possible.

I started working in head-mounted displays in 1998, and we bought a Sony Glasstron to study. Sony’s 1998 Glasstron had an 800×600 (SVGA) display, about the same as most laptop computers in that year, and higher resolution than almost everyone’s television in the U.S. (HDTVs first went on sale in 1998). The 1998 Glasstron even had transparent (sort of) LCD and LCD shutters to support see-through operation.

In the past 25 years, many companies have introduced headsets with increasingly better displays. According to some reports, the installed base of VR headsets will be ~25 million units in 2023. Yet I have never seen anyone on an airplane or a train wear a head-mounted display. I first wrote about this issue in 2012 in an article on the then-new Google Glass with what I called “The Airplane Test.”

I can’t say I was surprised to see Apple showing the movie watching on airplanes VR app, as I have seen it again and again over the last 25 years. It makes me wonder how well Apple verified the concepts they showed. As Snazzy Labs explained, there were no new apps that Apple showed that had not failed before, and it is not clear they failed due to not having better hardware.

Since the technology for watching videos on a headset has been available for decades, there must be reasons why almost no one (Brad Lynch of SadlyItsBradley says he has) uses a headset to watch movies on a plane. I also realize that some VR fans will watch movies on their headsets, but this, like VR, does not mean it will support mass market use.

As will be shown, the angular resolution (pixels per degree) of the AVP, while not horrible, is not particularly good for watching movies. But then, the resolution has not been what has stopped people from using VR on airplanes; it has been other human factors. So the question becomes, “Has the AVP solved the human factors problems that prevent people from using headsets to watch movies on airplanes?”

Some Relevant Movie Watching Human Factors Information

In 2019 in FOV Obsession, I discussed an excellent Photonics West AR/VR/MR Conference presentation by Thad Starner of the Georgia Institute of Technology, a long-time AR advocate and user.

First, the eye only has high resolution in the fovea, which covers only ~2°. The eye goes through a series of movements and fixations known as saccades. What a person “sees” results from the human vision system piecing together a series of “snapshots” at each saccade. The saccadic movement is a function of the activity and the person’s attention. Also, vision is partially, but not completely, blanked when the eye is moving (see: We thought our eyes turned off when moving quickly, but that’s wrong, and Intrasaccadic motion streaks jump-start gaze correction)

Starner shows the results from a 2017 Thesis by Haynes, which included a study on FOV and eye discomfort. Haynes’ thesis states (page 8 of 303 pages and 275 megabytes – click here to download it):

Thus, eye physiology provides some basic parameters for potential HWD design. A display can be no more than 55° horizontally from the normal line of sight based on oculomotor mechanical limits. However, the effective oculomotor range places a de facto limit at 45°. Further, COMR and saccadic accuracy suggest visually comfortable display locations may be no more than [plus or minus] 10-20° from the primary position of gaze.

The encyclopedic Optical Architectures for Augmented-, Virtual-, and Mixed-Reality Headsets by Bernard Kress writes about a “fixed foveated region” of about 40-50° (right). But in reality, the eyes can’t see 40-50° with high resolution for more than a few minutes without becoming tired.

The bottom line is that the human eye will want to stay within about 20° of the center when watching a movie. Generally, if a user wants to see something more than about 30° from the center of their vision, they will turn their head rather than use just their eyes. This is also true when watching a movie or using a large computer monitor for office-type work.

The Optimum Movie Watching FOV is about 30-40 Degrees

It may shock many VR game players who want 120+ degree FOVs, but SMPTE, which sets the recommendations for movie theaters, says the optimal viewing angle for HDTV is only 30°. THX specifies 40 degrees (Wikipedia and many other sources). These same optimum seating location angles apply to normal movie theaters as well.

The front row of a “normal” movie theater is about 60°, which is usually the last row in a theater where people will want to sit. Most people don’t want to sit in the front rows of a theater because of the “head pong” (as Thad Starner called it) required to watch a movie that is ~60° wide.

While 30°-40° may seem small, it comes back to human factors and a feedback loop of the content generated to work well with typical theater setups. A person in the theater will naturally only see what is happening in the center ~30° of the Screen most of the time, except for some head-turning fast action.

The image content generated outside of ~30° helps give an immersive feel but costs money to create and will not be seen in any detail 99.999% of the time. If you take content generated assuming a nominal 30° to 40° viewing angle and enlarge it to fill 90°, it will cause eye and head discomfort for the user to watch it.

AVP’s Pixels Per Degree Are Below “Retinal Resolution”

Another factor is “angular resolution.” The bands in the chart on the right show how far back from a TV of a given size and resolution a viewer must sit before they can’t see the pixels. The metric they use for being “beneficial” is 60ppd or more. Also shown on the chart with the dotted white lines are the SMPTE 30° and THX 40° recommendations.

Apple has not given the exact resolution but stated 23 Million (pixels for both eyes). Assuming a square display, this computes to about 3,400 pixels in each direction. The images in the video look to be about a 7:6 aspect ratio which would work out to about ~3680 by ~3150. Also, the optics cut off some of the display’s pixels for each eye, yet often companies count all the display’s pixels.
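The per-eye pixel counts above can be reproduced from the single stated figure. The sketch below assumes the 23 million pixels are split evenly between the two eyes and that every pixel counts; the 7:6 aspect ratio is the eyeballed value from the text, not an Apple specification.

```python
import math

total_pixels = 23_000_000          # Apple's stated figure for both eyes
pixels_per_eye = total_pixels / 2  # assumes an even split between eyes

# Square display: width == height.
square_side = math.sqrt(pixels_per_eye)

# 7:6 display with the same pixel count: width/height = 7/6.
aspect = 7 / 6
width_7_6 = math.sqrt(pixels_per_eye * aspect)
height_7_6 = math.sqrt(pixels_per_eye / aspect)

print(f"Square: {square_side:.0f} x {square_side:.0f}")
print(f"7:6:    {width_7_6:.0f} x {height_7_6:.0f}")
```

The 7:6 result comes out slightly below the ~3680 by ~3150 quoted above, which is consistent with that figure being an approximation from the video rather than a derived number.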

Apple didn’t specify the field of view (FOV). One big point of confusion on FOV is that VR headsets are typically quoted for both eyes, including the binocular view combining both eyes. The FOV also varies based on the eye relief from person to person (people’s eye insets, foreheads, and other physical features are different). Reports are that the FOV is “similar” to the Meta Quest Pro, which has a binocular FOV of about 106 degrees. The single-eye FOV is about 90°.

Combining the information from various sources, the net result is about 35 to 42 pixels per degree (ppd). Good human 20/20 vision is said to be ~60ppd. Steve Jobs, with the iPhone 4, called 300 pixels per inch at reading distance (which works out to ~60ppd) “retinal resolution.” For the record, people with very good eyesight can see 80ppd.
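The relationship between pixels per inch and pixels per degree depends on viewing distance. The sketch below is a small illustration of that conversion at the center of a flat screen; the viewing distances are my assumptions (the text above only says "reading distance"), and the function name is hypothetical.

```python
import math

def pixels_per_degree(ppi, distance_in):
    """Pixels spanned by one degree of visual angle at the screen center."""
    # Width on the screen subtended by 1 degree at the given distance.
    inches_per_degree = 2 * distance_in * math.tan(math.radians(0.5))
    return ppi * inches_per_degree

# 300 ppi viewed from a few assumed reading distances (in inches).
for distance_in in (10, 12, 15):
    ppd = pixels_per_degree(300, distance_in)
    print(f"300 ppi at {distance_in} in: {ppd:.1f} ppd")
```

At an assumed 12 inches, 300 ppi works out to a little over 60 ppd, which matches the ~60ppd figure above; holding the screen closer lowers the ppd and holding it farther away raises it.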

Some people wearing the AVP commented that they could make out some screen door effect consistent with about 35-40ppd. The key point is that the AVP is below 60, so jagged line effects will be noticeable.

Using the THX 40° horizontal FOV standard and assuming the AVP is about 90° horizontally (per eye, 110 for both eyes), ~3680 pixels horizontally, and almost no pixels get cropped, this leaves 3680 x (40/90) = ~1635 pixels horizontally. Using the SMPTE 30° gives about 3680 x (30/90) = ~1226 pixels wide.
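The same arithmetic can be sketched in code. It assumes, as the text does, ~3680 horizontal pixels spread over ~90° per eye with no cropping, and additionally assumes the pixels are spread uniformly across the FOV (real optics will not be perfectly linear). The function name is made up for illustration.

```python
def pixels_in_window(display_px, display_fov_deg, window_fov_deg):
    """Pixels available to a virtual screen spanning part of the FOV,
    assuming uniform pixels per degree across the display."""
    return display_px * window_fov_deg / display_fov_deg

avp_px = 3680        # assumed horizontal pixels per eye
avp_fov_deg = 90.0   # assumed horizontal FOV per eye

thx_px = pixels_in_window(avp_px, avp_fov_deg, 40.0)    # THX 40 degrees
smpte_px = pixels_in_window(avp_px, avp_fov_deg, 30.0)  # SMPTE 30 degrees

print(f"Average ppd: {avp_px / avp_fov_deg:.1f}")
print(f"THX 40 deg window:   {thx_px:.0f} px wide")
print(f"SMPTE 30 deg window: {smpte_px:.0f} px wide")
print(f"Below full HD width (1920)? {thx_px < 1920}")
```

Both windows come out narrower than the 1920 pixels of full HD, which is the basis for the resolution comparison that follows.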

If the AVP is used for watching movies and showing the movie content “optimally,” the image will be lower than full HD (1920×1080) resolution, and since there are ~40ppd, jaggies will be visible.

While the AVP has “more pixels than a 4K TV,” as claimed, they can’t deliver those pixels to an optimally displayed movie’s 40° or 30° horizontal FOV. Using the full FOV would, in effect, put you visually closer than the front row of a movie theater, not where most people would want to watch a movie.

Still, resolution and jaggies alone are not so bad as they would not, and have not, stopped people from using a VR headset for movies.

Vestibulo–Ocular Reflex (VOR) – Stabilizing the View with Head Movement – Simple Head Tracking Fails

The vestibulo-ocular reflex (VOR) stabilizes a person’s gaze during head movement. The inner ear detects the rotation, and if one is gazing, it causes the eyes to rotate to counter the movement to stay fixed on where the person is gazing. In this way, a person can, for example, read a document even if their head is moving. People with a VOR deficiency have problems reading.

Human vision will automatically suppress the VOR when it is counterproductive. For example, the VOR will be suppressed if one is tracking an object with a combination of head and eye movement, where VOR would work against the tracking. The key point is that the display system must account for the combined head and eye movement to generate the image without causing a vestibular (motion sickness) problem where the inner ear does not agree with the eyes.

Quoting from the WWDC 2023 video at ~1:51:18:

Running in parallel is a brand-new chip called R1. This specialized chip was designed specifically for the challenging task of real-time sensor processing. It processes input from 12 cameras, five sensors, and six microphones.

In other head-worn systems, latency between sensors and displays can contribute to motion discomfort. R1 virtually eliminates lag, streaming new images to the displays within 12 milliseconds. That’s eight times faster than the blink of an eye!

Apple did not say if the “12 cameras” included eye-tracking cameras, as they only showed the cameras on the front, but likely they are included. Complicating matters further is the saccadic movement of the eye. Eye tracking can know where the eye is aimed, but not what is seen. The AVP is known to have superior eye tracking for selecting things from a menu. But we don’t know if the eye tracking coupled with the head tracking deals with VOR, and if so, whether it is accurate and fast enough not to cause VOR-related problems for the user.

Movies on AVP (and VR) – Choose Your Compromises

Now consider some options for displaying a virtual screen on a headset below. Apple has shown locking the Screen in the 3-D space. For their demos, they appear to have gone with a very large (angularly) virtual screen for the demo impact. But, as outlined below, making a very large virtual screen is not the best thing to do for more normal movie and video watching. No matter which option is chosen below, jaggies and “zipper/ripple” antialiasing artifacts will be visible at times due to the angular resolution (ppd) of the AVP.

  1. Simplistic Option: Scale the image to full Screen for the maximum size and have the Screen move with the headset (not locked in the virtual 3-D space). This option is typically chosen for headsets with smaller FOVs, but it is a poor choice for headsets with large FOVs.
    • It is like sitting in a movie theater’s front row (or worse).
    • The screen moves unnaturally with head motion as it follows any head motion.
  2. Lock the Virtual Screen but nearly fill the FOV: This is what I will call “Head-Lock for Demos Only Mode.” If the virtual Screen nearly fills the FOV, then a small head movement will cause the Screen to be cut off, which, in turn, will trigger a person’s peripheral vision, causing some distraction. To avoid distraction, the user must limit head movement and eye movement; perhaps doable in a short demo, but not a comfortable way to watch a movie.
  3. Locking Screen in 3-D space with the Screen at SMPTE 30° to THX 40°: With ~40° FOV, there is room for the head to turn and tilt without cutting off the Screen or forcing the user to keep their head rigidly held in one location.
    • This will test the ability of the system to track head motion without causing motion sickness. There will always be some motion-to-photon lag and some measurement errors. There is also the VOR issue discussed earlier and whether it is solvable.
    • Some additional loss in resolution and potential for motion/temporal artifacts as the flat or 3-D movie is resampled into the virtual space.
    • Add motion blur to deal with head and eye movement (unlikely as it would be really complex).
    • The AVP reshows a 24 fps movie four times at 96Hz – does each frame get corrected at 96Hz, and what about visual artifacts when doing so?
    • What does it do for 30 fps and 60 fps video?
    • The Screen will still unnaturally be cut off if the user’s head turns too far. It does not “degrade gracefully” as a real-world screen would when you turn away from it.
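The frame-rate questions in option 3 come down to whether the content rate divides evenly into the display rate. The sketch below is only an illustration of that cadence arithmetic, assuming the display stays fixed at 96Hz; it does not model whatever refresh-rate switching or resampling the AVP may actually do, and the function name is hypothetical.

```python
from fractions import Fraction

def refreshes_per_frame(display_hz, content_fps):
    """Exact number of display refreshes per content frame."""
    return Fraction(display_hz, content_fps)

DISPLAY_HZ = 96  # assumed fixed display rate, per the 24 fps x 4 case above

for fps in (24, 30, 60):
    ratio = refreshes_per_frame(DISPLAY_HZ, fps)
    even = ratio.denominator == 1  # True when each frame repeats a whole number of times
    print(f"{fps} fps: {float(ratio):.1f} refreshes per frame, even cadence: {even}")
```

Under this assumption, only 24 fps maps to a whole number of repeats (four); 30 fps and 60 fps content would need an uneven cadence, frame blending, or a different display rate.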

Apple showed (above) images that might fill about 70 to 90 degrees of the FOV in its short Avatar demos (case 2 above). This will “work” in a demo to be something new and different, but as discussed in #2 above, it is not what you would want to do for a long movie.

And You Are on a Plane and Wearing A Heavy Headset Pressed Against Your Face with a Cord to Snag

On top of all the other issues, the headset processing and sensor must address vestibular-related motion sickness problems caused by being in a moving vehicle while displaying an image.

You then have the ergonomic issues of wearing a somewhat heavy, warm headset sealed against your face with no air circulation for hours while on a plane. Then you have the snag hazard of the cord, which will catch on just about everything.

There will be flight attendants or others tapping you to get your attention. Certainly, you don’t want the see-through mode to come on each time somebody walks by you in the aisle.

A more basic practical problem is that a headset takes up more room/volume due to its shape and the need to protect the glass front than a smartphone, tablet, or even a moderately sized laptop.

Conclusions

It is important to note that humans understand what behaves as “real” versus virtual. The AVP is still cutting off much of a person’s peripheral vision. Something like VOR and Vergence-Accommodation Conflict (VAC discussed in Part 2) and the way focus behaves are well-known issues with VR, but many more subtle issues can cause humans to sense there is something just not right.

In visual human factors, I like to bring up the 90/90 rule, which states, “it takes 90% of the effort to get 90% of the way there, and then the other 90% of the effort to solve the last 10%.” Sometimes this rule has to be applied recursively where multiples of the “90%” effort are required. Apple could do a vastly better job of head and eye tracking with faster response time, and yet people would still prefer to watch movies and videos on a direct-view display.

Certainly, nobody will be the wiser in a short flashy demo. The question is whether it will work for most people watching long movies on an airplane. If it does, it will break a 25+ year losing streak for this application.

Apple Vision Pro (Part 2) – Hardware Issues

16 June 2023 at 20:52

Introduction

This part will primarily cover the hardware and related human physical and visual issues with the Apple Vision Pro (AVP). In Part 3, I intend to discuss my issues with the applications Apple has shown for the AVP. In many cases, I won’t be able to say that the AVP will definitely cause problems for most people, but I can see and report on many features and implementation issues and explain why they may cause problems.

It is important to note that there is a wide variation between humans in their susceptibility and discomfort with visual issues. All display technologies are based on an illusion, and different people have different issues with various imperfections in the illusions. Some people may be able to adapt to some ill effects, whereas others can’t or won’t. Based on over 40 years of working with graphics and display devices, this article points out problems I see with the hardware that might not be readily apparent in a short demo. I can’t always say there will be problems, but some things concern me.

The Appendix has some “cleanup/corrections” on Part 1 of this series on the Apple Vision Pro (AVP).

Demos are a “Magic Show” and “Sizzle Reels”

Things a 30-minute demo won’t show

I’m constantly telling people that “Demos are Magic Shows”: what you see has been carefully selected not to show any problems and only what they want you to see. Additionally, it is impossible to find all the human factor physical and optical issues in the cumulative ~30-minute demo sessions at WWDC. Each session was further broken into short “Sizzle Reels” of various potential applications.

The experience that people can tolerate and enjoy with a short theme park ride or movie clip might make them sick if they endure it for more than a few minutes. In recent history, we have seen how 3-D movies reappeared, migrated to home TVs, and later disappeared after the novelty wore off and people discovered the limitations and downsides of longer-term use.

It will take months of studies with large populations as it is well known that problems with the human visual perception of display technologies vary widely from person to person. Maybe Apple has done some of these studies, but they have not released them. There are some things that Apple looks like they are doing wrong from a human and visual factors perspective (nothing is perfect), but how severe the effects will be on humans will vary from person to person. I will try to point out things I see that Apple is doing that may cause issues and claims that may be “incomplete” and gloss over problems.

Low Processing Lag Time and High Frame Rate are Necessary but not Sufficient to Solve Visual Issues

Apple employed a trick that gets the observer to focus on one aspect of a problem that is a known issue and where they think they do well. Quoting from the WWDC 2023 video at ~1:51:34:

In other head-worn systems, latency between sensors and displays can contribute to motion discomfort. R1 virtually eliminates lag, streaming new images to the displays within 12 milliseconds. That’s eight times faster than the blink of an eye!

I will give them credit for saying that the delay “can contribute” rather than saying it is the whole cause. But they were also very selective with the wording “streaming new images to the displays within 12 milliseconds,” which is only a part of the “motion to photon” latency problem. They didn’t discuss the camera or display latency. Assuming the camera and display are both at 90Hz frame rates and are working one frame at a time, this would roughly triple the total latency, and there may be other buffering delays not mentioned. We then have any errors that will occur.
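The "roughly triple" estimate can be sketched as a simple latency budget. This assumes, as stated above, that the camera and display each run at 90Hz and each add one full frame time; the stage names and the idea that the delays simply add are my simplifications, and any extra buffering is ignored.

```python
FRAME_RATE_HZ = 90    # assumed camera and display frame rate
PROCESSING_MS = 12.0  # Apple's stated sensor-to-display streaming time

frame_ms = 1000.0 / FRAME_RATE_HZ  # one frame time in milliseconds

# Hypothetical budget: one frame to capture, the stated processing
# time, and one frame to scan out to the display.
latency_budget_ms = {
    "camera capture": frame_ms,
    "R1 processing": PROCESSING_MS,
    "display scan-out": frame_ms,
}

total_ms = sum(latency_budget_ms.values())

for stage, ms in latency_budget_ms.items():
    print(f"{stage:>16}: {ms:5.1f} ms")
print(f"{'total':>16}: {total_ms:5.1f} ms ({total_ms / PROCESSING_MS:.1f}x the quoted 12 ms)")
```

Under these assumptions the total lands in the mid-30 millisecond range, a bit under three times the quoted figure, before counting any additional buffering or tracking error.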

The statement, “That’s eight times faster than the blink of an eye!” is pure marketing fluff as it does not tell you if it is fast enough.

In some applications, even 12 milliseconds could be marginal. Some very low latency systems process scan lines from the camera to the display with near zero latency rather than frames to reduce the motion-to-photon time. But this scan line processing becomes even more difficult when you add virtual content and requires special cameras and displays that work line by line synchronously. Even systems that work on scan lines rather than frames may not be fast enough for intensive applications. Specifically, this issue is well-known in the area of night vision. The US and other militaries still prefer monochrome (green or b&w) photomultiplier tubes in Enhanced Night Vision Goggles (ENVG) over cameras with displays. They still use the photomultiplier tubes (improved 1940s-era technology) and not semiconductor cameras because the troops find even the slightest delay disorienting.

Granted, troops making military maneuvers outdoors for long periods may be an extreme case, but at least in this application, it shows that even the slightest delay causes issues. What is unknown is who, what applications, and which activities might have problems with the level of delays and tracking errors associated with the AVP.

The militaries also use photomultiplier tubes because they still work with less light (just starlight) than the best semiconductor sensors. But I have been told by night vision experts that the delay is the biggest issue.

Poor location of main cameras relative to the user’s eye due to the Eyesight Display

The proper location of the cameras would be coaxial with the user’s two eyes. Still, as seen in the figure (right), the Main Cameras and all the other cameras and sensors are in fixed locations well below the eyes, which is not optimal, as will be discussed. This is very different than other passthrough headsets, where the passthrough cameras are roughly located in front of the eyes.

It appears the main cameras and all the other sensors are placed so low relative to the eyes to stay out of the way of the “Eyesight Display.” The Eyesight display (right) has a glass cover that contributes a lot of weight to the headset while also inhibiting heat from escaping, on top of the power/heat caused by the display itself. I hear the glass cover is also causing some calibration problems with the various cameras and sensors, as there is variation in the glass, and its placement varies from unit to unit.

It seems Apple wanted the Eyesight Display so much that they were willing to significantly hurt other design aspects.

Centering correctly for the human visual system

The importance of centering the (actual or “virtual”) camera with the user’s eye for long-term comfort was a major point made by mixed reality (optical and passthrough) headset user and advocate Steve Mann in his March 2013 IEEE Spectrum article, “What I’ve learned from 35 years of wearing computerized eyewear“. Quoting from the article, “The slight misalignment seemed unimportant at the time, but it produced some strange and unpleasant results. And those troubling effects persisted long after I took the gear off. That’s because my brain had adjusted to an unnatural view, so it took a while to readjust to normal vision.” 

I don’t know if or how well Apple has corrected the misalignment with “virtual cameras” (transforming the image to match what the eye should see) as Meta attempted (poorly) with the MQP. Still, they seem to have made the problem much more difficult by locating the cameras so far away from the center of the eyes.

Visual Coordination and depth perception

Having the cameras and sensors in poor locations would make visual depth sensing and coordination more difficult and less accurate, particularly at short distances. Any error will be relatively magnified as things like one’s hands get close to the eyes. In the extreme case, I don’t see how it would work if the user’s hands were near and above the eyes.
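To illustrate why the error grows at short distances, here is a small sketch of the angular difference between what an offset camera sees and what the eye would see, if the offset were left uncorrected. The 3cm offset is a hypothetical number chosen for illustration, not a measurement of the AVP.

```python
import math

def angular_offset_deg(camera_offset_m: float, object_distance_m: float) -> float:
    """Angle between the eye's and the camera's line of sight to the same object."""
    return math.degrees(math.atan2(camera_offset_m, object_distance_m))

OFFSET_M = 0.03  # hypothetical camera-to-eye offset, for illustration only

for distance_m in (2.0, 1.0, 0.5, 0.25):
    angle = angular_offset_deg(OFFSET_M, distance_m)
    print(f"object at {distance_m:4.2f} m -> {angle:4.1f} degrees of parallax")
```

The same physical offset that is under a degree at 2m grows to several degrees at hand-working distances, which is why any residual error in the "virtual camera" correction matters most close up.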

The demos in the video that indicated some level of depth perception (stills below) were contrived/simple. I have not heard of any demos stressing coordinated hand movement with a real object. Any offset error in the virtual camera location might cause coordination problems. Nobody may notice or have serious problems in a short demo, particularly if they don’t do anything close up, but I am curious about what will happen with prolonged use.

Vergence Accommodation Conflict or Variable Focus

There must be on the order of a thousand papers and articles on the issue of vergence-accommodation conflict (VAC). Everyone in the AR/VR and 3-D movie industries knows about the problem. The 3-D stereo effect is caused by having a different view for each eye which causes the eyes to rotate and “verge,” but the muscles in the eye will adjust focus, “accommodate,” based on what it takes to focus. If the perceived distances are different, it causes discomfort, referred to as VAC.

Figure From: Kieran Carnegie, Taehyun Rhee, “Reducing Visual Discomfort with HMDs Using Dynamic Depth of Field,” IEEE Computer Graphics & Applications, Sept.-Oct. 2015, doi:10.1109/MCG.2015.98

Like most other VR headsets, the AVP most likely has a fixed focus at about 2 meters (+/- 0.5m). From multiple developer reports, Apple seems to be telling developers to put things further away from the eyes. Two meters is a good compromise distance for video games where things are on walls or further away. VAC is more of a problem when things get inside 1m, such as when the user works with their hands, which can be 0.5m or less away.
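One common way to put a number on VAC is the difference, in diopters (1/meters), between where the eyes verge and where the optics force them to focus. The sketch below uses the roughly 2-meter fixed focus mentioned above; the 63mm interpupillary distance is an assumed typical value, and the function names are mine.

```python
import math

FIXED_FOCUS_M = 2.0   # approximate fixed focus distance discussed above
IPD_M = 0.063         # assumed typical interpupillary distance

def vac_mismatch_diopters(focus_m: float, object_m: float) -> float:
    """Gap between accommodation demand (optics) and vergence demand (object)."""
    return abs(1.0 / object_m - 1.0 / focus_m)

def vergence_angle_deg(object_m: float, ipd_m: float = IPD_M) -> float:
    """Angle between the two eyes' lines of sight when fixating the object."""
    return math.degrees(2.0 * math.atan2(ipd_m / 2.0, object_m))

for object_m in (4.0, 2.0, 1.0, 0.5):
    print(f"object at {object_m} m: "
          f"mismatch {vac_mismatch_diopters(FIXED_FOCUS_M, object_m):.2f} D, "
          f"vergence {vergence_angle_deg(object_m):.1f} deg")
```

With a 2m focus, an object at 1m is 0.5 diopters off and one at 0.5m is 1.5 diopters off, which is consistent with the problem getting worse for hand-distance work.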

When there is a known problem with many papers on the subject and no products solving it, it usually means there aren’t good solutions. The Magic Leap 1 tried a dual-focus waveguide solution but at the expense of image quality and cost and abandoned it on Magic Leap 2. Meta regularly presents papers and videos about their attempts to address VAC, including Half Dome 1, 2, and 3, focus surfaces, and a new paper using varifocal at Siggraph in August 2023.

There are two main approaches to VAC; one involves trying to solve for focus everywhere, including light fields, computational holograms, or simultaneous focus planes (ex. CREAL3D, VividQ, & Lightspace3D), and the other uses eye tracking to control varifocal optics. Each requires more processing and hardware complexity and costs some absolute image quality. But just because the problem is hard does not make it disappear.

From bits and pieces I have heard from developers at WWDC 2023, it sounds like Apple is trying to nudge developers to make objects/screens bigger but with more virtual distance. In essence, to design the interfaces to reduce the VAC issue from close-up objects.

Real-World Monitors are typically less than 0.5m away

Consider a virtual computer monitor placed 2m away; it won’t behave like a real-world monitor less than 1/2 meter away. You can blow up the monitor to have the text be the same size, but if working properly in the virtual space, the text and other content won’t vary in size the same way when you lean in, let alone being able to point at something with your finger. Many subtle things you do with a close-up monitor won’t work with a virtual, far-away large monitor. If you make the virtual monitor act like it is the size and distance of a real-world monitor, you have a VAC problem.
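The lean-in difference can be sketched with simple geometry. The monitor width and the 0.2m lean below are hypothetical numbers for illustration; the virtual monitor is scaled so that it subtends the same angle as the real one before leaning.

```python
import math

def angular_width_deg(width_m: float, distance_m: float) -> float:
    """Horizontal angle subtended by a flat screen viewed head-on."""
    return math.degrees(2.0 * math.atan2(width_m / 2.0, distance_m))

REAL_WIDTH_M, REAL_DIST_M = 0.6, 0.5       # hypothetical desk monitor
VIRTUAL_DIST_M = 2.0                       # virtual monitor distance from the text
VIRTUAL_WIDTH_M = REAL_WIDTH_M * VIRTUAL_DIST_M / REAL_DIST_M  # same initial angle
LEAN_M = 0.2                               # hypothetical lean toward the screen

real_before = angular_width_deg(REAL_WIDTH_M, REAL_DIST_M)
real_after = angular_width_deg(REAL_WIDTH_M, REAL_DIST_M - LEAN_M)
virtual_before = angular_width_deg(VIRTUAL_WIDTH_M, VIRTUAL_DIST_M)
virtual_after = angular_width_deg(VIRTUAL_WIDTH_M, VIRTUAL_DIST_M - LEAN_M)

print(f"real:    {real_before:.1f} -> {real_after:.1f} degrees")
print(f"virtual: {virtual_before:.1f} -> {virtual_after:.1f} degrees")
```

Both screens start at the same apparent size, but the same lean enlarges the near monitor far more than the distant virtual one.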

I know some people have suggested using large TVs from further away as computer monitors to relax the eyes, but I have not seen this happening much in practice. I suspect it does not work very well. I have also seen “Ice Bucket challenges,” where people have worn a VR headset as a computer monitor for a week or month, but I have yet to see anyone say they got rid of their monitors at the end of the experiment. Granted, the AVP has more resolution and better motion sensing and tracking than other VR headsets, but these may be necessary but not sufficient. I don’t see a Virtual workspace as efficient for business applications compared to using one or more monitors (I am open to seeing studies that could prove otherwise).

A related point that I plan to discuss in more detail in Part 3 is that there have been near-eye “glasses” for TVs (such as Sony Glasstron) and computer use for the last ~30 years. Yet, I have never seen one used on an airplane or train or in an office in all these years. It is not that the displays didn’t work or were too expensive for an air traveler (who will spend $350 on noise-canceling earphones), and they had sufficient resolution for at least watching movies. But 100% of people decide to use a much smaller (effective) image; there must be a reason.

The inconvenient real world with infinite focus distances and eye saccades

VAC is only one of many image generation issues I put in the class of “things not working right,” causing problems for the human visual system. The real world is also “inconvenient” because it has infinite focus distances, and objects can be any distance from the user.

The human eye works very differently from a camera or display device. The eye jumps around in “saccades,” with vision semi-blanked during each jump. Where the eye looks is a combination of voluntary and involuntary movement and varies if one is reading or looking, for example, at a face. Only the center of vision has a significant resolution and color differentiation, and a sort of variable resolution “snapshot” is taken at each saccade. The human visual system then pieces together what a person “sees” from a combination of objective things captured by each saccade and subjective information (eyewitnesses can be highly unreliable). Sometimes the human vision pieces together some display illusions “wrong,” and the person sees an artifact; often, it is just a flash of something the eye is not meant to see.

Even with great eye tracking, a computer system might know where the eye is pointing, but it does not know what was “seen” by the human visual system. So here we have the human eye taking these “snapshots,” and the virtual image presented does not change quite the way the real world does. There is a risk that the human visual system will know something is wrong at a conscious (you see an artifact that may flash, for example) or unconscious level (over time, you get a headache). And once again, everybody is different in what visual problems most affect them.

Safety and Peripheral Vision

Anyone who has put on a VR headset from a major manufacturer gets bombarded with messages at power-up to make sure they are in a safe place. Most have some form of electronic “boundaries” to warn you when you are straying from your safe zone. As VR evangelist Bradley Lynch told me, the issue is known as “VR to the ER,” for when an enthusiastic VR user accidentally meets a real-world object.

I should add that the warnings and virtual boundaries with VR headsets are probably more of a “lawyer thing” than true safety. As I’m fond of saying, “No virtual boundary is small enough to keep you safe or large enough not to be annoying.”

Those in human visual factors say (to the effect), “Your peripheral vision is there to keep you from being eaten by the tigers.” Translated to the modern world, it keeps you from getting hit by cars and running into things in your house. Human vision and anatomy (how your neck wants to bend) are biased in favor of looking down. As the saying goes, there are many more dangerous things on the ground than in the air.

Peripheral vision has very low resolution and almost no sense of color, but it is very motion and flicker-sensitive. It lets you sense things you don’t consciously see to make you turn your head to see them before you run into them. The two charts on the right illustrate a typical person’s human vision for the Hololens 2 and the AVP. The lightest gray areas are for the individual right and left eye; the central rounded triangular mid-gray area is where the eye has binocular overlap, and you have stereo/depth vision. The near-black areas are where the headset blocks your vision. The green area shows the display’s FOV.

What is concerning from a safety perspective is that with the AVP, essentially all peripheral vision is lost, even if the display is in full passthrough mode with no content. It is one thing to have a demo in a safe demo room with “handlers/wranglers,” as Apple did at the WWDC; it is another thing to let people loose in a real house or workplace.

Battery Cable and No Keep-Alive Battery

Almost as a topper on safety, the AVP has the battery on an external cable which is a snag hazard. By all reports, the AVP does not have a small “keep-alive” battery built into the headset if the battery is accidentally disconnected or deliberately swapped (this seems like an oversight). So if the cable gets pulled, the user is completely blinded; you better hope it doesn’t happen at the wrong time. Another saying I have is, “There is no release strength on a breakaway cable that is weak enough to keep you safe and still strong enough not to release when you don’t want it to.”

Question: which is worse?

A) To have the pull force so high that you risk pulling the head into something dangerous, or

B) To have the cord pull out needlessly blinding the person so they trip or run into something

This makes me wonder what warnings, if any, will come with the AVP.

Mechanical Ergonomics

When it comes to the physical design of the headset, it appears that Apple strongly favored style over functionality. Even from largely favorable reviewers, there were many complaints about physical comfort being a problem.

Terrible Weight Distribution

About 90% of the weight of the AVP appears to be in front of the eyes, making the unit very front-heavy. The AVP’s “solution” is to clamp the headset to the face with the “Light Seal” face adapter applying pressure to the face. Many users with just half-hour wear periods discussed the unit’s weight and pressure on the face. Wall Street Journal reporter Joanna Stern discussed the problem and even showed how it left red marks on her face. Apple was making the excuse that they only had limited face adapters and that better adapters would fix or improve the problem. There is no way a better Light Seal shape will fix the problem with so much weight sitting beyond the eyes and without any overhead support.

Estimation of the battery size and weight

Experienced VR users who tried on the AVP report that they think the AVP headset weighs at least 450 grams, with some thinking it might be over 500 grams. Based on the battery cable size, I think the cable weighs about 60 grams, pulling asymmetrically on the headset. Based on a similar size but slightly differently shaped battery, the AVP’s battery is about 200 grams. While a detachable battery gives options for larger batteries or a direct power connection, it only saves about 200-60 = 140 grams of weight on the head in the current configuration.

Many test users commented on there being an over-the-head strap, and one was shown in the videos (see lower right above). Still, the strap shown is very far behind the unit’s center of gravity and will do little to take the weight off the front that could help reduce the clamping force required against the face. This is basic physics 101.

I have seen reports that several strap types will be available, including ones made out of leather. I expect there will have to be front-to-back straps built-in to relieve pressure on the user’s face.

I thought they could clip a battery pack with a shorter cable to the back of the headset, similar to the Meta Quest Pro and Hololens 2 (below), but this won’t work as the back headband is flexible and thus will not transfer the force to help balance the front. Perhaps Apple or 3rd parties will develop a different back headband without as much flexibility, incorporating a battery to help counterbalance the front. Of course, all this talk of straps will be problematic with some hairstyles (ex., right) where neither a front-to-back nor side-to-side strap will work.

Meta Quest Pro is 722 grams (including a ~20Wh battery), and Hololens 2 is 566 grams (including a ~62Wh battery). Even with the forehead pad, the Hololens 2 comes with a front-to-back strap (not shown in the picture above), and the Meta Quest Pro needs one if worn for prolonged periods (and there are multiple aftermarket straps). Even most VR headsets lighter than the AVP with face seals have overhead straps.

If Apple integrated the battery into the back headband, they would only add about 200 grams or a net 140 grams, subtracting out the weight of the cable. This would place the AVP between the Meta Quest Pro and Hololens 2 in weight.
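The weight comparison can be checked with the figures quoted in this section. This is only a sketch of the arithmetic: the headset, battery, and cable weights are the rough estimates given above, not measured values, and the function name is mine.

```python
# All AVP figures are the estimates quoted in the text, not measurements.
AVP_HEADSET_G = (450, 500)   # estimated headset weight range as worn
BATTERY_G = 200              # estimated battery weight
CABLE_G = 60                 # estimated cable weight
MQP_G = 722                  # Meta Quest Pro, battery included
HOLOLENS2_G = 566            # Hololens 2, battery included

def integrated_weight_g(headset_g: int) -> int:
    """Head-worn weight if the battery moved into the headband and the cable went away."""
    return headset_g + BATTERY_G - CABLE_G

low, high = (integrated_weight_g(g) for g in AVP_HEADSET_G)
print(f"net added weight: {BATTERY_G - CABLE_G} g")
print(f"integrated AVP estimate: {low} to {high} g")
print(f"between Hololens 2 and MQP: {HOLOLENS2_G < low and high < MQP_G}")
```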

Apple denies physics and the shape of human heads to think they won’t need better support than they have shown for the AVP. I don’t think the net 140 grams of a battery is the difference between needing and not needing head straps.

Conclusions

I see many of the problems with the AVP as stemming from how hard it is to do Passthrough AR well and from the trade-offs and compromises Apple made between features and looks. I think Apple made some significant compromises to support the Eyesight feature, which even many fans of the technology say will have Uncanny Valley problems with people.

As I wrote in Part 1, the AVP blows away the Meta Quest Pro (MQP) and has a vastly improved passthrough. The MQP is obsolete by comparison. Still, I am not convinced it is good enough for long-term use. There are also a lot of basic safety issues.

Next time, I plan to explore more about the applications Apple presented and whether they are realistic regarding hardware support and human factors.

Appendix: Some Cleanup on Part 1

I had made some size comparisons and estimated that the AVP’s battery was about 35Wh to 50Wh, and then I found that someone had leaked (falsely) 36Wh, so I figured that must be it. It is not a big difference, as other reports now estimate the battery at about 37Wh. My main point is that the power was higher than some reported, and my power estimate seems close to correct.

All the pre- and post-announcement rumors suggested that the AVP uses pancake optics. I jumped to the erroneous conclusion, from how the WWDC 2023 video made the optics look, that they were aspheric refractive. In watching the flurry of reports and concentrating on the applications, I missed circling back to check on this assumption. It turns out that Apple’s June 5th news release states, “This technological breakthrough, combined with custom catadioptric lenses that enable incredible sharpness and clarity . . . ” Catadioptric means a combination of refractive and reflective optical elements, which includes pancake optics. Apple recently bought Limbak, an optics design company known for catadioptric designs, including those used in Lynx (which are catadioptric, but not pancake optics, and not what the AVP uses). They also had what they called “super pancake” designs. Apple eschews using any word used by other companies, as they avoided saying MR, XR, AR, VR, and Metaverse, and we can add “pancake optics” to that list.

From Limbak’s Website: Left shows their 2-element “Super-Pancake,” and the middle two show Lynx’s optics.

Apple Vision Pro (Part 1) – What Apple Got Right Compared to The Meta Quest Pro

14 June 2023 at 03:21

Update June 14, 2023 PM: It turns out that Apple’s news release states, “This technological breakthrough, combined with custom catadioptric lenses that enable incredible sharpness and clarity . . . ” Catadioptric means a combination of refractive and reflective optical elements. This means that they are not “purely refractive” as I first guessed (wrongly). They could be pancake or some variation of pancake optics. Apple recently bought Limbak, an optics design company known for catadioptric designs including those used in Lynx. They also had what they called “super pancake” designs. Assuming Apple is using a pancake design, then the light and power output of the OLEDs will need to be about 10X higher.

UPDATE June 14, 2023 AM: The information on the battery used as posted by Twitter User Kosutami turned out to be a hoax/fake. The battery shown was that of a Meta Quest 2 Elite as shown in a Reddit post of a teardown of the Quest 2 Elite. I still think the battery power of the Apple Vision Pro is in the 35 to 50Wh range based on the size of the AVP’s battery pack. I want to thank reader Xuelei Zhang for pointing out the error. I have red-lined and X-ed out the incorrect information in the original article. Additionally, based on the battery’s size, Charger Labs estimates that the Apple Vision Pro could be in the 74Wh range, but I think this is likely too high based on my own comparison.

I have shot a picture with a Meta Quest Pro (as a stand-in to judge size and perspective) to compare against Apple’s picture of the battery pack. In the picture is a known 37Wh battery pack. This battery pack is in a plastic case with two USB-A ports and one USB-micro port, which are not in the Apple battery pack (there are likely some other differences internally).

I tried to get the picture with a similar setup and perspective, but this is all very approximate to get a rough idea of the battery size. The Apple battery pack looks a little thinner, less wide, and longer than the 37Wh “known” battery pack. The net volume appears to be similar. Thus I would judge the Apple battery to be between about 35Wh and 50Wh.

Introduction

I’ve been watching and reading the many reviews by those invited to try (typically for about 30 minutes) the Apple Vision Pro (AVP). Unfortunately, I saw very little technical analysis and very few with deep knowledge of the issues of virtual and augmented reality. At least they didn’t mention what seemed to me to be obvious issues and questions. Much of what I saw were people that were either fans or grateful to be selected to get an early look at the AVP and wanted (or needed) to be invited back by Apple.

Unfortunately, I didn’t see a lot of “critical thinking” or understanding of the technical issues rather than having “blown minds.” Specifically, while many discussed the issue of the uncanny valley with the face capture and Eyesight Display, no one even mentioned the issues of variable focusing and Vergence Accommodation Conflict (VAC). The only places I have seen it mentioned are in the Reddit AR/VR/MR and Y-Combinator forums. On June 4th, Brad Lynch reported on Twitter that Meta would present their “VR headset with a retinal resolution varifocal display” paper at Siggraph 2023.

As I mentioned in my AWE 2023 presentation video (and full slides set here), I was doubtful based on what was rumored that Apple would address VAC. Like many others, Apple appears to have ignored the well-known and well-documented human mechanical and visual problem with VR/MR. As I said many times, “If all it took were money and smart people, it would be here already. Apple, Meta, etc. can’t buy different physics,” and I should add, “they are also stuck with humans as they exist with their highly complex and varied visual systems.”

Treat the above as a “teaser” for some of what I will discuss in Part 2. Before discussing the problems I see with the Apple Vision Pro and its prospective applications in Part 2, this part will discuss what the AVP got right over the Meta Quest Pro (MQP).

I know many Apple researchers and executives read this blog; if you have the goods, how about arranging for someone that understands the technology and human factor issues to evaluate the AVP?

Some Media with some Critical Thinking about the AVP

I want to highlight three publications that brought up some good issues and dug at least a little below the surface. SadlyIsBradley had an hour and 49-minute live stream discussing many issues, particularly the display hardware and the applications relative to VR (the host, Brad Lynch, primarily follows VR). The Verge Podcast had a pre-WWDC (included some Meta Quest 3) and post-WWDC discussion that brought up issues with the presented applications. I particularly recommend listening to Adi Robertson’s comments in the “pre” podcast; she is hilarious in her take. Finally, I found Snazzy Lab’s 13-minute explanation about the applications put into words some of the problems with the applications Apple showed; in short, there was nothing new that had not failed before and was not just because the hardware was not good enough.

What Apple got right that Meta Quest (Half)-Pro got wrong.

Apple’s AVP has shown up Meta’s MQP in just about everyone’s opinion. The Meta Quest Pro is considered expensive, with many features poorly executed. The MQP cost less than half as much at introduction (less than 1/3rd after the price drop) but is a bridge to nowhere. The MQP perhaps would better be called the Quest 2.5 (i.e., halfway to the Quest 3). Discussed below are specific hardware differences between the AVP and MQP.

People Saying the AVP’s $3,499 price is too high lack historical perspective

I will be critical of many of Apple’s AVP decisions, but I think all the comments I have seen about the price being too high completely miss the point. The price is temporal and can be reduced with volume. Apple or Meta must prove that a highly useful MR passthrough headset can be made at any price. I’m certainly not convinced yet, based on what I have seen, that the AVP will succeed in proving the future of passthrough MR, but the MQP has shown that halfway measures fail.

The people commenting on the AVP’s price have been spoiled by looking at mature rather than new technology. To take just one example, the original retail price of the Apple II computer with 4 KB of RAM was US$1,298 (equivalent to $6,268 in 2022) and US$2,638 (equivalent to $12,739 in 2022) with the maximum 48 KB of RAM (source Wikipedia). As another example, I bought my first video tape recorder in 1979 for about $1,000, which is more than $4,400 adjusted for inflation, and a blank 1.5-hour tape was about $10 (~$44 in 2023 dollars). The problem is not price but whether the AVP is something people will use regularly.

Passthrough

The Meta Quest Pro (MQP) looks like a half-baked effort compared to the AVP. The MQP’s passthrough mode is comically bad, as shown in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough. Apple’s AVP passthrough will not be “perfect” (more on that in part 2), but Apple didn’t make something with so many obvious problems.

The MQP used two IR cameras with a single high-resolution color camera in the middle to try and synthesize a “virtual camera” for each eye with 3-D depth perception. The article above shows that the MQP’s method resulted in a low-resolution and very distorted view. The AVP has a high-resolution camera per eye, with more depth-sensing cameras/sensors and much more processing to create virtual camera-per-eye views.

I should add that there are no reports I have seen on how accurately the AVP creates 3-D views of the real world, but by all reports, the AVP’s passthrough is vastly better than that of the MQP. A hint that all is not well with the AVP’s passthrough is that the forward main cameras are poorly positioned (to be discussed in Part 2).

Resolution for “Business Applications” (including word processing) – Necessary, but not sufficient

The next issue is that if you target “business applications” and computer monitor replacement, you need at least 40 pixels per degree (ppd), preferably more. The MQP has only about 20 pixels per degree, meaning much less readable text can fit in a given area. Because the fonts are bigger, the eyes must move further to read the same amount of text, thus slowing down reading speed. The FOV of the AVP has been estimated to be about the same as the MQP, but the AVP has more than 2X the horizontal and vertical pixels, resulting in about 40 ppd.

A note on measuring Pixels per Degree: Typically, VR headset measurement of FOV includes the binocular overlap from both eyes. When it comes to measuring “pixels per degree,” the measurement is based on the total visible pixels divided by the FOV in the same direction for a single eye. The single eye FOV is often not specified, and there may be pixels that are cut off based on the optics and the eye location. Additionally, the measurement has a degree of variability based on the amount of eye relief assumed.
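The measurement described in the note reduces to a simple division. The sketch below uses hypothetical pixel counts and a hypothetical single-eye FOV chosen only to reproduce the roughly 20 and 40 ppd figures discussed above; they are not the headsets’ specifications. It also gives only an average, since optics can make the pixels per degree vary across the field.

```python
def pixels_per_degree(visible_pixels: float, single_eye_fov_deg: float) -> float:
    """Average ppd: visible pixels in one direction over the single-eye FOV in that direction."""
    if single_eye_fov_deg <= 0:
        raise ValueError("FOV must be positive")
    return visible_pixels / single_eye_fov_deg

SINGLE_EYE_FOV_DEG = 90.0   # hypothetical single-eye horizontal FOV
BASE_PIXELS = 1800          # hypothetical visible horizontal pixels

base_ppd = pixels_per_degree(BASE_PIXELS, SINGLE_EYE_FOV_DEG)
doubled_ppd = pixels_per_degree(2 * BASE_PIXELS, SINGLE_EYE_FOV_DEG)
print(f"{base_ppd:.0f} ppd vs {doubled_ppd:.0f} ppd at the same FOV")
# prints "20 ppd vs 40 ppd at the same FOV"
```

Doubling the pixels in one direction at the same FOV doubles the ppd, which is the basis of the 20 versus 40 ppd comparison.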

Having at least 40 pixels per degree is “necessary but not sufficient” for supporting business applications. I believe that other visual human factors will make the AVP unsuitable for business applications beyond “emergency” situations and what I call the “Ice Bucket Challenges,” where someone wears a headset for a week or a month to “prove” it could be done and then goes back to a computer monitor/laptop. I have not seen any study (having looked for many years), and Apple presented none, that suggests the long-term use of virtual desktops is good for humans (if you know of one, please let me know).

The watchOS WWDC message to watch screens less

Ironically, in the watchOS video, only a few minutes before the AVP announcement, Apple discussed (linked in WWDC 2023 video) how they implemented features in watchOS to encourage people to go outside and stop looking at screens, as it may be a cause of myopia. I’m not the only one to catch this seeming contradiction in messaging.

AVP Micro-OLED vs. MQP’s LCD with Mini-LED Local Dimmable Backlight

The AVP’s Micro-OLED should give better black levels/contrast than the MQP’s LCD with a mini-LED local dimmable backlight. Local dimming is problematic and based on scene content. While the mini-LEDs are more efficient in producing light, much of that light is lost when going through the LCD, and typically only about 3% to 6% of the backlight makes it through the LCD.

While Apple claims to be making the Micro-OLED CMOS “backplane,” by all reports, Sony is applying the OLEDs and performing the Micro-OLED assembly. Sony has long been the leader in Micro-OLEDs used in camera viewfinders and birdbath AR headsets, including Xreal (formerly Nreal; see Nreal Teardown: Part 2, Detailed Look Inside).

Micro-Lens Array for Added Efficiency

The color sub-pixel arrangement in the WWDC videos shows a decidedly smaller light emission area, with more black space between pixels, than the older Sony ECX335 (shown with pixels roughly to scale above). This suggests that Apple didn’t need to push the light output (see optic efficiency in next section) and supported more efficient light collection (semi-collimation) with the use of micro-lens-arrays (MLAs), which are reportedly used on top of the AVP’s Micro-OLED.

MQP’s LCD with Mini-LED with Local Dimming

John Carmack, former Meta Consulting CTO, gave some of the limitations and issues with MQP’s Local Dimming feature in his unscripted talk after the MQP’s introduction (excerpts from his discussion):

21:10 Quest Pro has a whole lot of back lights, a full grid of them, so we can kind of strobe them off in rows or columns as we scan things out, which lets us sort of get the ability of chasing a rolling shutter like we have on some other things, which should give us some extra latency. But unfortunately, some other choices in this display architecture cost us some latency, so we didn’t wind up really getting a win with that.

But one of the exciting possible things that you can do with this is do local dimming, where if you know that an area of the screen has nothing but black in it, you could literally turn off the bits of the backlight there.  . . .

Now, it’s not enabled by default because to do this, we have to kind of scan over the screens and that costs us some time, and we don’t have a lot of extra time here. But a layer can choose to enable this extra local dimming. . . .

And if you’ve got an environment like I’m in right now, there’s literally no complete, maybe a little bit on one of those surfaces over there that’s a complete black. On most systems, most scenes, it doesn’t wind up actually benefiting you. . . .

There’s still limits where you’re not going to get, on an OLED, you can do super bright stars on a completely black sky. With local dimming, you can’t do that because if you’ve got a max value star in a min value black sky, it’s still gotta pick something and stretch the pixels around it. . . . We do have this one flag that we can set up for layer optimization.

John Carmack Meta Connect 2022 Unscripted Talk
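
The star-on-black-sky limitation Carmack describes can be illustrated with a toy model. This is only a minimal sketch and not Meta's algorithm: the function name, the one-dimensional row of pixels, the zone size, the "use the brightest pixel in the zone" policy, and the leakage figure are all assumptions made for illustration.

```python
def local_dimming(frame, zone_size, leakage=0.001):
    """Toy local-dimming model for one row of pixels (all values hypothetical).

    frame: target pixel values, 0.0 (black) to 1.0 (max brightness)
    zone_size: number of pixels sharing one backlight LED (assumed)
    leakage: fraction of backlight leaking through a fully closed LCD pixel (assumed)
    Returns (backlight level per zone, displayed luminance per pixel).
    """
    backlights = []
    displayed = []
    for start in range(0, len(frame), zone_size):
        zone = frame[start:start + zone_size]
        # Simple policy: the zone's LED must be bright enough for its brightest pixel.
        level = max(zone)
        backlights.append(level)
        for target in zone:
            if level == 0.0:
                # LED fully off: the pixel is truly black.
                displayed.append(0.0)
            else:
                # The LCD can only attenuate the backlight down to its leakage floor.
                transmittance = max(target / level, leakage)
                displayed.append(level * transmittance)
    return backlights, displayed


# One max-value "star" in an otherwise black row, with 4 pixels per zone.
row = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
zone_levels, shown = local_dimming(row, zone_size=4)
print(zone_levels)
print(shown)
```

In this model, the zone containing the star runs its LED at full brightness, so the black pixels next to the star leak a little light, while the all-black zone switches off entirely. An OLED has no shared backlight, so it avoids the problem pixel by pixel.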

Pancake (MQP) versus Aspherical Refractive Optics (AVP)

Update June 14, 2023 PM: It turns out that Apple’s news release states, “This technological breakthrough, combined with custom catadioptric lenses that enable incredible sharpness and clarity . . . ” Catadioptric means a combination of refractive and reflective optical elements. This means that they are not “purely refractive” as I first guessed (wrongly). They could be pancake or some variation of pancake optics. Apple recently bought Limbak, an optics design company known for catadioptric designs, including those used in Lynx. They also had what they called “super pancake” designs. Assuming Apple is using a pancake design, then the power output of the OLEDs will need to be about 10X higher.

Apple used a 3-element aspherical optic rather than the Pancake optics used in the MQP and many other new VR designs. See this blog’s article Meta (aka Facebook) Cambria Electrically Controllable LC Lens for VAC?, which discusses the efficiency issues with Pancake Optics. Pancake optics are particularly inefficient with Micro-OLED displays, as used in the AVP, because they require the unpolarized OLED light to be polarized for the optics to work. This polarization typically loses about 55% of the light (45% transmissive). Then there is a 50% loss on the transmissive pass and another 50% loss on the reflection of a 50/50 semi-mirror in the pancake optics. Combined with the polarization loss, only about 11% of the OLED’s light (0.45 × 0.5 × 0.5 ≈ 11%) makes it through pancake optics, before counting any other losses. It should be noted that the MQP currently uses LCDs that output polarized light, so it doesn’t suffer the polarization loss with pancake optics but still has the 50/50 semi-mirror losses.
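
The loss chain above is simple multiplication, and a short sketch makes the OLED-versus-LCD difference explicit. The function and parameter names are hypothetical, and the model is idealized: it counts only the polarizer and the two encounters with the 50/50 semi-mirror, ignoring every other loss in a real optical stack.

```python
def pancake_throughput(polarizer_transmission, mirror_fraction=0.5, mirror_passes=2):
    """Idealized fraction of display light that exits a pancake optic.

    polarizer_transmission: fraction surviving the initial polarizer
        (about 0.45 for unpolarized OLED light, 1.0 for already-polarized LCD light)
    mirror_fraction: fraction kept at each encounter with the 50/50 semi-mirror
    mirror_passes: one transmissive pass plus one reflection
    """
    return polarizer_transmission * mirror_fraction ** mirror_passes


oled_fraction = pancake_throughput(0.45)  # unpolarized Micro-OLED source
lcd_fraction = pancake_throughput(1.0)    # LCD light is already polarized

print(f"Micro-OLED through pancake: {oled_fraction:.1%}")
print(f"LCD through pancake:        {lcd_fraction:.1%}")
```

Under these assumptions, the Micro-OLED case lands at roughly 11% and the LCD case at 25%, which is why a Micro-OLED behind pancake optics has to be driven so much harder for the same brightness at the eye.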

AVP’s Superior Hand Tracking

The AVP uses four hand-tracking cameras, with the two extra cameras supporting the tracking of hands at about waist level. Holding your hand up to be tracked has been a major ergonomic complaint of mine since I first tried the HoloLens 1. Anyone who knows anything about ergonomics knows that humans are not designed to hold their hands up for long periods. Apple seems to be the first company to address this issue. Additionally, by all reports, the hand tracking is very accurate and likely much better than the MQP’s.

AVP’s Exceptionally Good Eye Tracking

According to all reports, the AVP’s eye tracking is exceptionally good and accurate. Part of the reason is likely better algorithms and processing. On the hardware side, it is interesting that the AVP’s IR illuminator and cameras go through the eyepiece optics. In contrast, on the Meta Quest Pro, the IR illuminator and cameras are closer to the eye on a ring outside the optics. The result is that the AVP cameras have a more straight-on look at the eyes. {Brad Lynch of SadlyIsBradley pointed out the difference in IR illuminator and camera location between the AVP and MQP in an offline discussion.}

Processing and power

As many others have pointed out, the AVP uses a computer-level CPU+GPU (M2) and a custom-designed R1 “vision processor,” whereas the MQP uses high-end smartphone processors. Apple has pressed its advantage in hardware design over Meta or anyone else.

The AVP (below left) has two squirrel-cage fans situated between the M2 and R1 processor chips and the optics. The AVP appears to have a roughly 37 Watt-hour battery (see next section) to support the two-hour rated battery life. This suggests that the AVP “typically” consumes about 18.5 Watts, which is consistent with people noticing very warm or hot air coming out of the top vent holes. The MQP (below right) has similar dual-fan cooling. The MQP has a 20.58 Watt-hour battery, with the MQP rated by Meta as lasting 2-3 hours.

Because the AVP uses a Micro-OLED and a much more efficient optical design, I would expect the AVP’s OLED to consume less than 1W per eye and much less when not viewing mostly white content. I, therefore, suspect that much of the power in the AVP is going to the M2 and R1 processing. In the case of Meta’s MQP, I suspect that a much higher percentage of the system power goes to pushing light through the inefficient optical architecture.

It should be noted that the AVP displays about 3.3 times the pixels, has more and higher resolution cameras, and supports much higher resolution passthrough. Thus the AVP is moving massively more data, which also consumes power. So while it looks like the AVP consumes about double the power, the power “per pixel” is roughly 40% lower than the MQP’s (2 ÷ 3.3 ≈ 0.6) and probably much lower when considering all factors. Considering that the processing done by the AVP seems much more advanced, this demonstrates Apple’s processing efficiency.
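
The per-pixel comparison is a ratio of two ratios. Here is a minimal sketch using the rough figures from the paragraph above (about 2X the power, about 3.3X the pixels); the function name is hypothetical, and both inputs are this article's estimates, not measurements.

```python
def relative_power_per_pixel(power_ratio, pixel_ratio):
    """Power per pixel of headset A relative to headset B.

    power_ratio: A's total power divided by B's (estimated)
    pixel_ratio: A's displayed pixel count divided by B's (estimated)
    """
    return power_ratio / pixel_ratio


avp_vs_mqp = relative_power_per_pixel(power_ratio=2.0, pixel_ratio=3.3)
print(f"AVP power per pixel relative to MQP: {avp_vs_mqp:.2f}")
print(f"Reduction per pixel: {1 - avp_vs_mqp:.0%}")
```

With these inputs, the AVP comes out at about 0.6 of the MQP's power per pixel. The result is only as good as the battery-based power estimates behind it, and it does not separate display power from camera and processing power.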

AVP appears to take about 2X the power of the MQP – And 2X what most others are reporting

CORRECTION (June 14, 2023): Based on information from reader Xuelei Zhang, I was able to confirm that the widely reported tweet of the so-called Apple Vision Pro battery was a hoax and that what was shown is the battery used in a Meta Quest 2 Elite. You can see in the picture on the right how the number is the same and there is the metal slug with the hole, just like the supposed AVP battery. I still think, based on its size, that the battery pack is similar to a 37Wh battery or perhaps larger. In an article published today, Charger Labs estimates that the Apple Vision Pro battery could be in the 74Wh range, which is certainly possible but appears to me to be too big. It looks to me like the battery is between 35Wh and 50Wh.

Based on the available information, I would peg the battery to be in the 35 to 50Wh range and thus the “typical” power consumption of the AVP to be in the 17.5W to 25W range, or about two times the Meta Quest Pro’s ~10W.
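
The power figures here come from dividing battery capacity (Watt-hours) by rated runtime (hours), which yields average Watts. A minimal sketch of that arithmetic follows. The function name is hypothetical, and it assumes the battery is fully drained over the rated runtime, which "typical use" ratings only approximate.

```python
def average_power_watts(battery_wh, runtime_hours):
    """Average power draw implied by draining battery_wh over runtime_hours."""
    return battery_wh / runtime_hours


# AVP: estimated 35 to 50 Wh pack, two-hour rated battery life.
avp_low = average_power_watts(35, 2)
avp_high = average_power_watts(50, 2)

# MQP: 20.58 Wh pack, rated by Meta at 2 to 3 hours.
mqp_high = average_power_watts(20.58, 2)
mqp_low = average_power_watts(20.58, 3)

print(f"AVP: {avp_low:.1f} W to {avp_high:.1f} W")
print(f"MQP: {mqp_low:.1f} W to {mqp_high:.1f} W")
```

Note that Wh divided by hours gives Watts, not "Watts per hour". The MQP's ~10W figure corresponds to the two-hour end of Meta's rating; at three hours, the implied draw is closer to 7W.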

Numerous articles and videos report, erroneously I think, that the AVP has a 4789mAh/18.3Wh battery. Going back to the source of those reports, a tweet by Kosutami, it appears that the word “dual” was missed. Looking at the original follow-up tweets, the report is clear that two cells are folded about a metal slug and, when added together, would total 36.6Wh. Additionally, in comparing the AVP’s battery to scale with the headset, it appears to be about the same size as a 37Wh battery I own, which is what I was estimating before I saw Kosutami’s tweet.

Importantly, if the AVP’s battery capacity is doubled, as I think is correct, then the estimated power consumption of the AVP is about double what others have reported, or about 18.5 Watts.

The MQP battery was identified by iFixit (above left) to have two cells that combine to form a 20.58Wh battery pack, or just over half that of the AVP.

With both the MQP and AVP claiming similar battery life (big caveat, as both are talking “typical use”), it suggests the AVP is consuming about double the power.

Based on my quick analysis of the optics and displays, I think the AVP’s displays consume less than 1W per eye, or less than 2W in total. This suggests that the bulk of the ~18W is used by the two processors (M2, R1), data/memory movement (often ignored), the many cameras, and IR illuminators.

In part 2 of this series, I plan to discuss the many user problems I see with the AVP’s battery pack.

Audio

This blog does not seriously follow audio technology, but by all accounts, the AVP’s audio hardware and spatial sound processing capability will be far superior to that of the MQP.

Conclusions

In many ways, the AVP can be seen as the “Meta Quest Pro done much better.” If you are doing more of a “flagship/Pro product,” it better be a flagship. The AVP is 3.5 times the current price of the MQP and about seven times that of the Meta Quest 3, but that is largely irrelevant in the long run. The key to the future is whether anyone can prove that the “vision” for passthrough VR at any price is workable for a large user base. I can see significant niche applications for the AVP (support for people with low vision is just one, although the display resolution is overkill for this use). But as I will discuss next time, there are giant holes in the applications presented.

If the MQP or AVP solved the problems they purport to solve, the price would not be the major stumbling block. As Apple claimed in the WWDC 2023 video, the feature set of the AVP would be a bargain for many people. Time and volume will cure the cost issues. My problem (teaser for Part 2) is that neither will be able to fulfill the vision they paint, and the gap is not a matter of a few thousand dollars or a few more years of development.
