Canon R5 Mk ii Drops Pixel Shift High Res. – Is Canon Missing the AI Big Picture?

23 August 2024 at 03:31

Introduction

Sometimes, companies make what seem, on the surface, to be technically poor decisions. I consider this the case with Canon’s new R5 Mark ii (and R1) dropping support for sensor Pixel Shifting High Resolution (what Canon calls IBIS High Res). Canon removed the IBIS High Res mode, which captures (as I will demonstrate) more real information, and seemingly replaced it with in-camera AI upscaling, which creates fake information. AI upscaling, if desired, can be done better and more conveniently on a computer, but Pixel Shift/IBIS High Res cannot.

The historical reason for pixel shift is to give higher resolution in certain situations. Still, because the cameras combine the images “in-camera” with the camera’s limited processing and memory resources plus simple firmware algorithms, they can’t deal with either camera or subject motion. Additionally, while the Canon R5 can take 20 frames per second (the R5 Mark ii can take 30 frames per second), so capturing the nine frames takes about half a second, the camera then needs another ~8 seconds to process them. Rather than putting more restrictions on shooting, it would have been much easier and faster to save the raw frames (with original sensor subpixels) to the memory card for processing later by a much more capable computer using better algorithms that can constantly be improved.
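As a rough check on those timings, the sketch below works through the arithmetic. The frame rates, the nine-frame count, and the ~8-second processing time come from the paragraph above; the function name is only illustrative, the burst is assumed to run at the camera's full frame rate, and the R5 Mark ii figure is hypothetical since that camera has no IBIS High Res mode.

```python
def burst_seconds(frames: int, frames_per_second: float) -> float:
    """Time to capture a burst at a steady frame rate."""
    return frames / frames_per_second

FRAMES = 9                 # frames in one IBIS High Res capture
PROCESSING_SECONDS = 8.0   # approximate in-camera combining time on the R5

capture_r5 = burst_seconds(FRAMES, 20)     # R5 at 20 frames per second
capture_r5m2 = burst_seconds(FRAMES, 30)   # hypothetical: R5 Mark ii at 30 frames per second
total_r5 = capture_r5 + PROCESSING_SECONDS
processing_share = PROCESSING_SECONDS / total_r5

print(f"R5 capture: {capture_r5:.2f} s, R5 Mark ii capture: {capture_r5m2:.2f} s")
print(f"R5 total with in-camera processing: {total_r5:.2f} s "
      f"({processing_share:.0%} of it spent processing)")
```

On these numbers, nearly all of the wait is in-camera processing rather than capture, which is the time that saving raw frames would avoid.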

Canon’s competitors, Sony and Nikon, are already saving raw files with their pixel-shift modes. I hoped Canon would see the light with the new R5 mark ii (R5m2) and support IBIS High Res with the raw frames saved. Instead, Canon went in the wrong direction; they dropped IBIS High Res altogether and added in-camera “AI upscaling.” The first-generation R5 didn’t have IBIS High Res at launch, but a firmware release later added this capability. I’m hoping the same will happen with the R5 Mark ii, only this time saving the RAW frames rather than creating an in-camera JPEG.

Features Versus Capabilities

I want to distinguish between a “feature” and a “capability.” Take, for example, high dynamic range. The classical photography problem is taking a picture in a room with a window with a view; you can expose for the inside of the room, in which case the view out the window will be blown out, or you can expose for the view out the window, in which case the room will look nearly black. The Canon R5 has an “HDR Mode” that takes multiple frames at different exposure settings and lets you save either the single processed image alone or the processed image along with all the source frames. The “feature” was making a single HDR image, and the “capability” was rapidly taking multiple frames with different exposures and saving those frames.

The Canon R5 made IBIS High Res a feature when it only offered a single JPEG output without the capability of saving the individual frames with the sensor shifted by sub-pixel amounts. With the raw frames saved, software on a computer could combine the frames better. Additionally, the software could deal with camera and subject motion, which leave unfixable artifacts in an IBIS High Res JPEG. As such, when I use IBIS High Res, I typically take three pictures just in case, as one of the pictures often will have unfixable problems that can only be seen once viewed on a computer monitor. It would also be desirable to select how many frames to save; for example, saving more than one cycle of frames would help deal with subject or camera motion.
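To make concrete what “combining” saved frames could look like, here is a minimal sketch that interleaves sub-pixel-shifted frames onto a finer grid. It is an idealized model and not Canon's algorithm: it assumes exact shifts, no camera or subject motion, and single-channel point samples, and the names `sample_shifted` and `interleave_shifted_frames` are hypothetical. Real software would also need alignment, demosaicing, and motion rejection.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]

def sample_shifted(scene: Frame, dy: int, dx: int, steps: int) -> Frame:
    """Simulate one capture: sample a fine-grid scene every `steps` samples,
    starting at the sub-pixel offset (dy, dx)."""
    return [row[dx::steps] for row in scene[dy::steps]]

def interleave_shifted_frames(frames: Dict[Tuple[int, int], Frame], steps: int) -> Frame:
    """Place each low-resolution frame onto a grid `steps` times finer.

    `frames` maps a shift (dy, dx), counted in 1/steps-pixel units, to the
    frame captured at that shift.
    """
    first = next(iter(frames.values()))
    height, width = len(first), len(first[0])
    out = [[0.0] * (width * steps) for _ in range(height * steps)]
    for (dy, dx), frame in frames.items():
        for y in range(height):
            for x in range(width):
                out[y * steps + dy][x * steps + dx] = frame[y][x]
    return out

# Nine frames at 1/3-pixel shifts rebuild a 6x6 scene from 2x2 captures.
STEPS = 3
scene = [[float(y * 6 + x) for x in range(6)] for y in range(6)]
frames = {(dy, dx): sample_shifted(scene, dy, dx, STEPS)
          for dy in range(STEPS) for dx in range(STEPS)}
combined = interleave_shifted_frames(frames, STEPS)
print(combined == scene)
```

With the individual frames on hand, software could also compare a second cycle of frames against the first to spot where something moved, which is not possible once the camera has baked everything into one JPEG.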

Cameras today support some aspects of “computational photography.” Saving multiple images can be used for panoramic stitching, high dynamic range, focus stacking (to get a larger depth of field than is possible with a single picture), and astrophotography image stacking (using interval timers to take many shots that are added together). Many cameras, like the R5, have even added modes to support taking multiple pictures for focus stacking, high dynamic range, and interval timers. So for the R5 mk. ii to have dropped sensor pixel shifting seems like a step backward in the evolution of photography.

This Blog’s Use of Pixel Shifting for Higher Resolution

In the past year, I started using my “personal camera,” the Canon R5 (45MP “full frame” 35mm), to take pictures of VR/Passthrough-AR and optical AR glasses (where possible). I also use my older Olympus D5 Mark iii (20MP Micro 4/3rd) because it is a smaller camera with smaller lenses, which lets it get into the optimum optical location in smaller form factor AR glasses.

Both cameras have “In-Body Image Stabilization” (IBIS) that normally moves the camera sensor based on motion detection to reduce camera/lens motion blur. They both also support a high-resolution mode where, instead of using the IBIS for stabilization, they use it to shift the sensor by a fraction of a pixel to take a higher-resolution image. Canon called this capability “IBIS High Res.” The R5 combines nine images in-camera, each shifted by 1/3rd of a pixel, to make a roughly 400MP JPEG image. The D5 combines four images, each shifted by a half pixel.
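The pixel counts follow from simple arithmetic, sketched below. The nine frames, the 1/3-pixel shift, the 45MP sensor, and the 24,576 by 16,384 output size are figures from this article; the function names are illustrative only, and the sketch assumes the shifts form a square grid (3 by 3 for nine frames, 2 by 2 for four).

```python
def linear_scale(frames: int) -> int:
    """A square grid of sub-pixel shifts multiplies each axis by sqrt(frames)."""
    root = round(frames ** 0.5)
    if root * root != frames:
        raise ValueError("frames must be a perfect square")
    return root

def megapixels(width: int, height: int) -> float:
    """Pixel count in millions."""
    return width * height / 1_000_000

NATIVE_MP = 45   # nominal R5 sensor resolution
FRAMES = 9       # frames combined in IBIS High Res

scale = linear_scale(FRAMES)             # samples per native pixel along each axis
nominal_mp = NATIVE_MP * FRAMES          # nine times the nominal sensor count
output_mp = megapixels(24_576, 16_384)   # the JPEG dimensions quoted in this article

print(f"{scale}x per axis, nominal {nominal_mp}MP, actual output {output_mp:.1f}MP")
```

The nominal figure of nine times 45MP is slightly higher than the pixel count of the actual output dimensions, which is why the mode is variously quoted at around 400MP.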

The cameras and lenses I use most are shown on the right, except for the large RF15-35mm lens on the R5 camera, which is shown for comparison. To take pictures through the optics and get inside the eye box/pupil, the lens has to be physically close to the image sensor in the camera, which limits lens selection. Thus, while the RF15-35mm lens is “better” than the fixed-focal-length 28mm and 16mm lenses, it won’t work to take a headset picture. The RF28mm and RF16mm lenses are the only full-frame Canon lenses I found to work. Cell phones with small lenses “work,” but they lack the resolution of a dedicated camera as well as the aperture and shutter speed control necessary to get good pictures through headsets.

Moiré

In addition to photography being my hobby, I take tens of thousands of pictures a year via the optics of AR and VR headsets, which pose particular challenges for this blog. Because I’m shooting at displays with a regular pattern of pixels using a camera that has its own regular pattern of pixels, there is a constant chance of moiré due to the beat frequencies between the pixels and color subpixels of the camera and those of the display device as magnified by the camera and headset optics (left). To keep within the eye box/pupil of the headset, I am limited to simpler lenses that are physically short to keep the distance from the headset optics to the camera short, which limits the focal lengths and thus the magnification available to combat moiré. In-camera pixel shifting has proven to be a way to not only improve resolution but also greatly reduce moiré effects.
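The beat-frequency effect can be illustrated with the standard aliasing formula. In this sketch the display pattern frequency of 0.9 cycles per camera pixel is an assumed value chosen for illustration, and `alias_frequency` is a hypothetical helper. It treats each pixel as a point sample, so it ignores the blur from the photosite's own area and from the lens.

```python
def alias_frequency(pattern_freq: float, sample_rate: float) -> float:
    """Apparent frequency of a repeating pattern after sampling.

    Both arguments use the same unit, here cycles per native camera pixel.
    Frequencies above half the sample rate fold back to a lower beat frequency.
    """
    nearest_multiple = round(pattern_freq / sample_rate) * sample_rate
    return abs(pattern_freq - nearest_multiple)

DISPLAY_FREQ = 0.9  # assumed: display pixel pattern at 0.9 cycles per camera pixel

native = alias_frequency(DISPLAY_FREQ, 1.0)    # one sample per camera pixel
shifted = alias_frequency(DISPLAY_FREQ, 3.0)   # three samples per pixel (1/3-pixel shifts)

print(f"native sampling: pattern appears at {native:.2f} cycles/pixel "
      f"(a band every {1 / native:.0f} pixels)")
print(f"3x sampling: pattern appears at {shifted:.2f} cycles/pixel (its true frequency)")
```

At native sampling the fine display pattern folds down into a coarse, highly visible band, while the denser sampling from 1/3-pixel shifts puts the same pattern below the Nyquist limit, so it is recorded at its true frequency.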

Issues with moiré are not limited to taking pictures via AR and VR headsets; it is a problem with real-world pictures that include things like patterns in clothing, fences (famously, from a distance where they form a small pattern), and other objects with a regular pattern (see typical photographic moiré problems below).

Anti-Aliasing

Those who know signal theory know that a low-pass filter applied before sampling reduces/avoids aliasing (moiré is a form of aliasing). Cameras have also used “anti-aliasing” filters, which very slightly blur the image to reduce aliasing, but this comes at the expense of resolution. In the past, with lower-resolution sensors, the chance of encountering real-world things in a picture that would cause aliasing was higher, and the anti-aliasing filters were more necessary.
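The resolution cost of an anti-aliasing filter can be sketched with a simple model. Here the filter is approximated as a box blur one pixel wide, which is an assumption made for illustration (real optical low-pass filters behave differently), and `box_blur_attenuation` is a hypothetical helper. With one sample per pixel, the Nyquist limit is 0.5 cycles per pixel.

```python
import math

def box_blur_attenuation(freq: float, blur_width: float) -> float:
    """Contrast remaining after averaging a sinusoid over `blur_width`.

    `freq` is in cycles per pixel and `blur_width` in pixels. This is the
    magnitude of the sinc response of a box filter, an idealized stand-in
    for a real optical anti-aliasing filter.
    """
    if freq == 0 or blur_width == 0:
        return 1.0
    phase = math.pi * freq * blur_width
    return abs(math.sin(phase) / phase)

for freq in (0.25, 0.5, 0.9):
    kept = box_blur_attenuation(freq, blur_width=1.0)
    print(f"{freq:.2f} cycles/pixel keeps {kept:.0%} of its contrast")
```

In this model, the 0.9 cycles-per-pixel pattern that would alias is cut to roughly a tenth of its contrast, but legitimate detail at and below the Nyquist limit is also softened, which is the trade-off described above.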

As the resolution of sensors has increased, it has become less likely that something in focus in a typical picture will be at the point where it aliases. This, combined with better algorithms that can detect and reduce the effect of moiré, has made the filters less necessary. Still, while moiré can sometimes be fixed in post-processing, in critical or difficult situations, it would be better if additional frames were stored to clue the software into treating it as aliasing/moiré rather than “real” information.

Camera Pixels and Bayer Filter (and misunderstanding)

Most cameras today (including Canon) use a Bayer Filter pattern (below right) with two green-filtered pixels for each red or blue pixel. When producing an image for a person to view, the camera or a computer’s RAW conversion software, in a process often called “debayering” or “demosaicing,” generates each full-color pixel by combining the information from many (8 or more) surrounding single-color pixels, with the total number of full-color pixels equaling the number of photosites.
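A minimal sketch of the idea, assuming an RGGB layout and the simplest possible interpolation (averaging same-color photosites in the surrounding 3 by 3 neighborhood). Actual camera and RAW-converter algorithms are considerably more sophisticated, and the function names here are hypothetical.

```python
from typing import List, Tuple

def bayer_color(y: int, x: int) -> str:
    """RGGB layout: R at even row/even column, B at odd/odd, G elsewhere."""
    if y % 2 == 0 and x % 2 == 0:
        return "R"
    if y % 2 == 1 and x % 2 == 1:
        return "B"
    return "G"

def demosaic_bilinear(mosaic: List[List[float]]) -> List[List[Tuple[float, float, float]]]:
    """Return one (R, G, B) pixel per photosite."""
    height, width = len(mosaic), len(mosaic[0])
    if height < 2 or width < 2:
        raise ValueError("mosaic must be at least 2x2")
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Gather the photosite and its (up to 8) neighbors by filter color.
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    color = bayer_color(ny, nx)
                    sums[color] += mosaic[ny][nx]
                    counts[color] += 1
            pixel = {c: sums[c] / counts[c] for c in "RGB"}
            pixel[bayer_color(y, x)] = float(mosaic[y][x])  # keep the measured value
            row.append((pixel["R"], pixel["G"], pixel["B"]))
        out.append(row)
    return out

flat_gray = [[100.0] * 4 for _ in range(4)]
rgb = demosaic_bilinear(flat_gray)
print(len(rgb), len(rgb[0]), rgb[0][0])
```

Note that the output has exactly as many full-color pixels as there are photosites: each pixel keeps its one measured color and borrows estimates of the other two from its neighbors.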

Camera makers count every photosite as a pixel even though the camera only captured “one color” at that photosite. Some people, somewhat mistakenly, think the real resolution is one-quarter of what is claimed since only one-quarter of the photosites are red and one-quarter are blue. After all, with a color monitor, we don’t count the red, green, and blue subpixels as 3 pixels but just one. However, Microsoft’s ClearType does gain some resolution from the color subpixels to refine text better.

It turns out that except for extreme image cases, including special test patterns, the effective camera resolution is close to the number of photosites (and not 1/4th or 1/2). There are several reasons why this is true. First, note the red, green, and blue filters’ frequency responses for the color camera sensor (above left – taken from a Sony sensor as it was available). Notice how their spectra are wide and overlapping. The wide spectral nature of these filters is necessary to capture all the continuous spectra of color in the real world (not every color we call “red” has the same wavelength). If the filters were very narrow and only captured a single wavelength, then any colors that are not that wavelength would be black. Each photosite captures intensity information for all colors, but the filtering biases it toward bands of colors.

Almost everything (other than spectral lines from plasmas, lasers, and some test patterns) that can be seen in the real world is not a single wavelength but a mix of wavelengths. There is even the unusual case of magenta, which does not have a wavelength (and thus, many claim it is not a color) but is a mix of blue and red. With a typical photo, we have wide-spectrum filters capturing wide-spectrum colors.

It turns out that humans sense resolution mostly in intensity and not color. This fact has been exploited to reduce the bandwidth of color television since its early days and to reduce data in essentially all video and image compression algorithms. Thanks to the overlap of the camera’s color filters, there is considerable intensity information in the various color pixels.
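As one common illustration of how compression exploits this, the sketch below uses the Rec. 601 luma weights and 4:2:0 chroma subsampling. These are examples chosen here and are not something the article's cameras are claimed to use; the helper names are hypothetical.

```python
def luma_rec601(red: float, green: float, blue: float) -> float:
    """Rec. 601 luma: an intensity signal built as a weighted sum of R, G, and B."""
    return 0.299 * red + 0.587 * green + 0.114 * blue

def samples_per_pixel(chroma_fraction: float) -> float:
    """One full-resolution luma sample plus two chroma channels stored at
    `chroma_fraction` of the pixel count."""
    return 1.0 + 2.0 * chroma_fraction

full_color = samples_per_pixel(1.0)    # 4:4:4, every channel at full resolution
subsampled = samples_per_pixel(0.25)   # 4:2:0, chroma halved in both directions

print(f"white has luma {luma_rec601(255, 255, 255):.0f}")
print(f"4:2:0 stores {subsampled / full_color:.0%} of the samples of 4:4:4")
```

Keeping intensity at full resolution while storing color at a quarter of the pixel count halves the data, and viewers rarely notice, which is the same property that lets a Bayer sensor get away with one color filter per photosite.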

Human Vision and Color

If you think the camera sensor’s Bayer pattern and color filter spectral overlaps are bad, then consider the human retina. On average, humans have 7 million cones in the retina, of which ~64% are long (L) wavelength (red), ~32% medium (M – green), and ~2% short (S – blue). However, these percentages vary widely from person to person, particularly the percentage of short/blue cones. The cones, which sense color and support high resolution, are concentrated in the center of vision.

Notice the spectral response of the so-called red, green, and blue cones (below left) and compare it to the camera sensor filters’ response above. Note how much the “red” and “green” responses overlap. On the right is a typical distribution of cones near the fovea (center) of vision, and note there are zero “blue”/short cones in the very center of the fovea; it makes the Bayer pattern look great😁.

Acuity of the Eye

Next, we have the fact that the cones are concentrated in the center of vision and that visual acuity falls off rapidly. The charts below show the distribution of rods and cones in the eye (left) and the sharp fall-off in visual acuity from the center of vision.

Saccadic Eye Movement – The Eyes’ “Pixel Shifting”

Looking at the distribution of cones and the lack of visual acuity outside the fovea, you might wonder how humans see anything in detail. The eye constantly moves in a mix of large and small steps known as saccades. The eye tends to blank while it moves and then takes a metaphorical snapshot. The visual cortex takes the saccade’s “snapshots” and forms a composite image. In effect, the human visual system is doing “pixel shifting.”

My Use of Pixel Shifting (IBIS High-Res)

I am a regular user of IBIS High-Resolution on this blog. Taking pictures of displays with their regular patterns is particularly prone to moiré. Plus, since the few lenses I can use are all wide-angle (and thus low magnification), it helps to get some more resolution. A single roughly 400MP (24,576 by 16,384 pixels) IBIS High-Resolution image can capture a ~100-degree-wide FOV and still show details of individual pixels from a 4K display device.

The mode seems a bit of an afterthought on the R5, with its JPEG-only output. Even with the camera on a tripod, it sometimes messes up, so I usually take three shots just in case, because I will only know later, when I look at the results blown up on a monitor, whether one of them went wrong. The close-in crops (right) are from two back-to-back shots with IBIS high-res. In the bad shot, you can see how the edges look feathered/jagged (particularly comparing vertical elements like the “l” in Arial). I would much rather have had the IBIS HR output the 9 RAW images.

IBIS High-Res Comparison to Native Resolution

IBIS High Res helps provide higher resolution and can significantly reduce moiré. Even when I scale the IBIS High Res image down to the size of a “native” resolution picture, it often has much less moiré and is a bit sharper, as shown below.

The crops below show the IBIS High Res image at full resolution and the native resolution scaled up to match, along with insets of the IBIS High Res picture scaled down to match the native resolution.

The image below was taken in IBIS High Resolution and then scaled down to 33.33% for publication on this blog (from the article AWE 2024 VR – Hypervision, Sony XR, Big Screen, Apple, Meta, & LightPolymers).

The crops below compare the IBIS High Res at full resolution to a native image upscaled by 300%. Notice how the IBIS High Res has better color detail. If you look at the white tower on a diagonal in the center of the picture (pointed to by the red arrow), you can see the red (on the left) and blue chromatic aberrations caused by the headset’s optics, but these and other color details are lost in the native shot.

Conclusions

While my specific needs are a little special, I think Canon is missing out on a wealth of computational photography options by not supporting IBIS High-Res with RAW output. The obvious benefits are helping with moiré and getting higher-resolution still lifes. By storing RAW, there is also the opportunity to deal with movement in the scene and perhaps even with hand-held shooting. It would be great to have the option to control the shift amount (shift by 1/3 and 1/2 would be good options) and the number of pictures. For example, it would be good to capture more than one “cycle” to help deal with motion.

Smartphones are cleaning up on dedicated cameras in “computational photography,” making small sensors with mediocre optics look very good. Imagine what could be done with better lenses and cameras. Sony, a leader in cell phone sensors, knows this and has pixel shift with RAW output. I don’t understand why Canon is ceding pixel shift to Sony and Nikon. Hopefully, it will come in a firmware update like it did on the original R5. Only this time, please save the RAW/cRAW files.

In related news, I’m working on an article about Texas Instruments’ renewed thrust into AR with DLP. TI DLP has been working with poLight to support Pixel Shift (link to video with poLight) for resolution enhancement with AR glasses (see also Cambridge Mechatronics and poLight Optics Micromovement (CES/PW Pt. 6)).
