Augmented Reality Technology: An Overview

This post will be a quick but comprehensive summary of the technology behind AR and some of its applications as of September 2020. We will investigate Microsoft HoloLens, Facebook Spark AR, Google ARCore, and Apple ARKit. After all, that’s what we’ve mostly worked with since 2016 at R2U (at the time called Real2U).

Before proceeding, it’s important to make clear what exactly is defined as AR technology. Briefly speaking, Augmented Reality is a combination of multiple techniques from computer vision and motion processing that fuse information from the device’s camera and sensors in order to place 2D or 3D content on top of the real world. The creators of Pokémon Go have one of the best definitions I could find:

Most of the augmented reality technology today actually comes from the Artificial Intelligence [Robotics] background, because for Augmented Reality there’s the reality part — you need to understand reality in order to augment it. (Ross Finman, Head of AR, Niantic)

Target Tracking

Perhaps one of the first consumer applications of computer vision in augmented reality, Target Tracking was developed 10 years ago as a way to place 3D content on top of a 2D plane with specific characteristics, such as a QR Code, often referred to as a Marker.

From independent SDKs to Vuforia, we can say it all began with QR Codes

Image Tracking

Image Tracking (also called Image Anchor or Augmented Image) is the evolution of the QR Code. Instead of placing an object on top of a dull black-and-white marker, we can now use any image we want, provided it has enough feature points to be reliably detected by the device.

I’m still waiting for the day when we’ll all be wearing AR glasses and playing Yu-Gi-Oh! just like in the anime
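As a rough illustration of what "feature points" means, here is a toy corner detector in Python (my own sketch, not any SDK's actual algorithm): it flags pixels where brightness changes sharply in both directions, which a good marker image has plenty of and a blank wall has none of.

```python
def feature_points(img, threshold=50):
    """Toy 'feature point' detector: flag pixels whose intensity
    changes sharply in both x and y (a corner-like structure).
    Real trackers use far more robust detectors, but the idea is
    the same: distinctive points the device can lock onto."""
    h, w = len(img), len(img[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = abs(img[y][x + 1] - img[y][x - 1])
            dy = abs(img[y + 1][x] - img[y - 1][x])
            if dx > threshold and dy > threshold:
                points.append((x, y))
    return points

flat = [[128] * 8 for _ in range(8)]   # featureless gray: a bad marker
square = [[0] * 8 for _ in range(8)]   # white square on black: has corners
for r in range(3, 5):
    for c in range(3, 5):
        square[r][c] = 255

print(len(feature_points(flat)))    # 0 - nothing to track
print(len(feature_points(square)))  # 4 - the square's corners
```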

Spatial Mapping

Spatial mapping provides a detailed representation of real-world surfaces in the environment around the user. This is more evident in powerful devices such as the HoloLens, where the world mesh is comprehensive enough to allow complex applications such as physics simulation and interaction of virtual objects with the real world.

A dense triangle mesh representation on the HoloLens

Spatial Awareness

Spatial Awareness or Scene Geometry is real-world environmental understanding in augmented reality applications. It takes spatial mapping a step further by actually understanding what has been mapped and attaching labels to the objects encountered.

Both the HoloLens and ARKit have a basic knowledge of objects detected and can understand simple things such as windows, doors, chairs, and couches

Plane Tracking

Now that we can map and understand the world around us, let’s keep track of horizontal and vertical planes, such as floors and walls, so that we can place virtual objects there and be sure they will not move in relation to the device.

At first, only horizontal planes were supported by most SDKs, but now vertical surfaces are also a reality
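To give an intuition of what plane detection does under the hood, here is a deliberately simplified Python sketch of my own (assuming a gravity-aligned point cloud with y pointing up; real SDKs use far more robust estimators over many camera frames):

```python
def detect_horizontal_plane(points, tolerance=0.02, min_inliers=3):
    """Toy horizontal-plane detection: given gravity-aligned (x, y, z)
    points with y up, find the height that the most points agree on,
    then refine it by averaging the inliers' heights."""
    best_inliers = []
    for _, candidate_y, _ in points:
        inliers = [p for p in points if abs(p[1] - candidate_y) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < min_inliers:
        return None  # not enough evidence for a plane yet
    return sum(p[1] for p in best_inliers) / len(best_inliers)

cloud = [(0.1, 0.00, 0.2), (0.5, 0.01, 0.8), (0.9, -0.01, 0.3),
         (0.4, 0.00, 0.6), (0.2, 1.20, 0.5)]  # last point: not on the floor
print(detect_horizontal_plane(cloud))  # ~0.0, a floor plane at height zero
```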

Spatial Anchors

After mapping our surroundings with a device, we can track any arbitrary position and rotation in that space, not only planes. That specific pose will be used as an anchor, just like our old friend the QR Code, to store where the virtual object is located.

A virtual object is mapped to a fixed Spatial Anchor on the floor
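A minimal sketch of the anchor idea in Python (the `SpatialAnchor` class is my own invention, not any SDK's API): content is stored relative to the anchor's pose, so converting back to world space keeps the object glued to the same physical spot.

```python
import math

class SpatialAnchor:
    """Toy anchor: a fixed pose (position plus a yaw rotation about
    the y axis) in world space. Virtual content is stored *relative*
    to the anchor, so it stays put across sessions even if the world
    origin drifts."""

    def __init__(self, x, y, z, yaw):
        self.x, self.y, self.z, self.yaw = x, y, z, yaw

    def to_world(self, local):
        """Transform an offset expressed in anchor space to world space."""
        lx, ly, lz = local
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (self.x + c * lx + s * lz,
                self.y + ly,
                self.z - s * lx + c * lz)

anchor = SpatialAnchor(2.0, 0.0, 3.0, math.pi / 2)  # anchor on the floor
world = anchor.to_world((1.0, 0.0, 0.0))  # object 1 m off to the anchor's side
print(world)
```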

Cloud Anchors

Cloud Anchors (also called Shared Experiences) are Spatial Anchors powered by the cloud. They enable the interaction between multiple devices and platforms, allowing HoloLens, iPhones, and Android devices to see the same mixed reality world.

A model is seen by both an iPad and a HoloLens at exactly the same place

Location Anchors

Popularized by Snapchat Landmarkers in 2019, Location Anchors are when Augmented Reality experiences are tied to a specific geolocation. They are now coming to ARKit 4, so we’ll probably see more applications of this tech in the near future.

An AR model fixed at a specific location in the real world, defined by latitude, longitude, and elevation
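Under the hood, a geolocated anchor has to be converted into the device's local coordinate frame. A back-of-the-envelope version of that conversion, using an equirectangular approximation I'm assuming here (fine for the short distances AR experiences cover, not what any SDK literally ships):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def geo_to_local(anchor_lat, anchor_lon, lat, lon):
    """Approximate east/north offsets in meters from an anchor's
    latitude/longitude to another point (equirectangular projection,
    good enough at AR-experience scale)."""
    d_lat = math.radians(lat - anchor_lat)
    d_lon = math.radians(lon - anchor_lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(anchor_lat))
    return east, north

# A point 0.001 degrees of latitude north of the anchor: roughly 111 m.
east, north = geo_to_local(37.7749, -122.4194, 37.7759, -122.4194)
print(round(north))  # 111
```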

Object Tracking

Object Tracking is very similar to Image Tracking in the sense that you first need to scan a physical marker in order for it to be recognized by the device. This time, the marker is a 3D object in the real world, which can be detected after its spatial features are recorded.

The scanning process encodes three-dimensional spatial features of known real-world objects

Hand Tracking

Beginning with Hand Tracking, we exit the realm of positioning 2D or 3D content on top of static/fixed things such as floors and walls and start interacting with the human body. From here on, the technology becomes much more platform-dependent and we see a big gap in feature support between different SDKs.

HoloLens hand joint representation with 25 labels

For example, while Microsoft’s HoloLens can map your hand with a great degree of accuracy (which can even be boosted with external sensors), Facebook’s Spark AR will only recognize the palm of your hand when it faces the camera.

Want to wear a virtual watch on Instagram? Well, I guess you’ll have to wait for some time (pun intended)

Hand Gestures

In order to provide an accurate interpretation of Hand Gestures, you first need precise tracking of hand movement. This is why only the HoloLens currently has good support for this technology, allowing the user to touch, grab, point to, and focus on a target.

Excerpt of the HoloLens input system
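Once you have joint positions, a gesture can be as simple as a geometric rule. A toy Python sketch of my own (real systems track dozens of joints and use much richer models, but a distance threshold is a common first step):

```python
import math

def is_pinching(thumb_tip, index_tip, threshold_m=0.02):
    """Toy 'pinch' gesture: fire when the thumb tip and index tip
    (3D positions in meters) come within about 2 cm of each other."""
    return math.dist(thumb_tip, index_tip) < threshold_m

open_hand = is_pinching((0.00, 0.00, 0.30), (0.08, 0.02, 0.30))
pinch = is_pinching((0.00, 0.00, 0.30), (0.01, 0.00, 0.30))
print(open_hand, pinch)  # False True
```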

Face Tracking

While mobile devices have generally poor support for Hand Tracking capabilities, they surely excel when it comes to Face Tracking. Facebook’s Spark AR is perhaps the one with the largest userbase, surpassing even Apple’s and Google’s offerings (ARKit Face Tracking and ARCore Augmented Faces), which are more restricted in terms of device support.

Who would think that tagging your friends on Facebook would end up on a virtual mustache?

Face Gestures

In the same way that it’s possible to understand Hand Gestures, we can also understand Face Gestures. The Facebook SDK can tell you when a user blinks, raises or lowers their eyebrows, moves their head, or opens their mouth. It can even understand the general context of the facial expression and tell whether you have a happy, kissing, smiley, or surprised face — something we could maybe call Face Awareness, borrowing and extending the definition from Spatial Awareness.

The Flappy Bird blink clone which went viral on Instagram
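Spark AR exposes blinks as ready-made events, but under the hood face gestures boil down to tracking face-landmark geometry. A toy blink detector of my own in Python, based on the ratio of eyelid opening to eye width:

```python
def is_blinking(eye_opening, eye_width, threshold=0.2):
    """Toy blink detector: the ratio of eyelid opening to eye width
    (both distances between face landmarks) drops toward zero when
    the eye closes, so a simple threshold catches the blink."""
    return (eye_opening / eye_width) < threshold

print(is_blinking(0.011, 0.030))  # eye open -> False
print(is_blinking(0.002, 0.030))  # eye shut -> True
```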

Eye Tracking

Perhaps more common in the VR world than in AR, Eye Tracking provides developers with the ability to use information about what the user is looking at. On HoloLens 2, you can track the attention heatmap of the eye movement and even interact with the application by just looking at the UI.

Developers can log and visualize what users have been looking at in their app

Body Tracking

Body Tracking can be used to follow a person in the physical environment and visualize their motion by applying the same body movements to a virtual character. The current number of joints detected is much lower than what we see on the HoloLens hand tracking, but it’s good enough for most consumer applications.

Both ARKit and SparkAR have decent body tracking capabilities, but we’re yet to see a killer app exploring this technology

Background Segmentation

SparkAR is the platform that has popularized Background Segmentation both on Facebook and on Instagram. It’s like having a green screen behind you in the real world and then applying any kind of animation to the environment.

Background Segmentation is what happens when people are embarrassed by their room on a Zoom call 🏖
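Once the segmentation model produces a person mask, applying it is plain per-pixel compositing. A tiny Python sketch of that last step (the mask here is hand-written; in practice it comes from the segmentation model):

```python
def composite(frame, mask, background):
    """Per-pixel compositing: keep the camera pixel where the person
    mask is 1, otherwise substitute the virtual background - the same
    idea as a green screen, with segmentation providing the mask."""
    return [[frame[y][x] if mask[y][x] else background[y][x]
             for x in range(len(frame[0]))]
            for y in range(len(frame))]

frame = [["p", "p"], ["r", "r"]]     # p = person pixel, r = room pixel
mask = [[1, 1], [0, 0]]              # segmentation: top row is the person
beach = [["b", "b"], ["b", "b"]]     # b = virtual beach background
print(composite(frame, mask, beach)) # [['p', 'p'], ['b', 'b']]
```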


Occlusion

An Occlusion is an event that occurs when one object is hidden by another object that passes between it and the observer. In Augmented Reality, it happens when real-world surfaces correctly hide parts of a 3D object, so that it blends seamlessly into the environment where it’s been placed. Depending on the platform, it can be achieved through a variety of different technologies, such as Depth APIs, Depth Maps, or Depth Images.

Blue is further away than Red
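At its core, depth-based occlusion is just a per-pixel depth test. A minimal sketch of the idea in Python (my own simplification; real renderers do this on the GPU with a full depth buffer):

```python
def render_pixel(camera_color, real_depth, virtual_color, virtual_depth):
    """Depth test for one pixel: draw the virtual object only where it
    is closer to the viewer than the real surface seen by the camera.
    A Depth API provides real_depth per pixel; the renderer already
    knows virtual_depth from its own geometry."""
    return virtual_color if virtual_depth < real_depth else camera_color

# A virtual chair 2 m away, seen against a wall 3 m away: visible.
print(render_pixel("wall", 3.0, "chair", 2.0))    # chair
# The same chair behind a person standing 1 m away: occluded.
print(render_pixel("person", 1.0, "chair", 2.0))  # person
```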

People Occlusion

People Occlusion covers an app’s virtual content with people perceived in the camera feed. It is currently only available on iOS, for iPhone XS or more recent devices. So, pretty powerful, but not widely used yet.

ARKit 3 brought People Occlusion to the masses, but it was Pokemon Go that pioneered it

Instant Placement

I don’t really like to say that Instant Placement (or Instant AR) is a new technology per se, since this isn’t something you actually use in your augmented reality application. It’s rather the progress of the underlying AR technology itself (meaning computer vision and sensor fusion), which will inevitably get better over time.

LiDAR scanners and software techniques enable incredibly quick plane detection and faster placement of AR content

In 2018, we were happy to wait 10 seconds for a virtual chair to appear on the floor, but now we complain if it takes more than 3 seconds. It will eventually be as fast as lightning, so do we really need yet another fancy name every time technology improves? Apple and Google think we do.

Light Estimation

Light Estimation APIs on AR applications analyze a given image for discrete visual cues (such as shadows, ambient light, shading, specular highlights, and reflections) and provide detailed information about the lighting in a given scene. Developers can then use this information when rendering virtual objects to light them under the same conditions as the scene they’re placed in, making these objects feel more realistic and enhancing the immersive experience for users.

Which of these mannequins is real and which is not?
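The simplest of those cues is overall scene brightness. A toy version of my own in Python, estimating ambient intensity as the mean luminance of the frame (real Light Estimation APIs also return color temperature, dominant light direction, and more):

```python
def estimate_ambient_intensity(pixels):
    """Estimate scene brightness as the mean luminance of the camera
    frame, using Rec. 709 luma weights, normalized to the 0..1 range.
    pixels is a flat list of (r, g, b) tuples in 0..255."""
    total = sum(0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels)
    return total / (255 * len(pixels))

dim_room = [(40, 40, 40)] * 4
sunny_room = [(220, 210, 180)] * 4
print(estimate_ambient_intensity(dim_room))    # dim scene, low intensity
print(estimate_ambient_intensity(sunny_room))  # bright scene, high intensity
```

A renderer can then scale the virtual object's lighting by this value so a couch placed in a dim room doesn't glow as if it were under studio lights.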

Spatial Sound

Spatial Sound breathes life into holograms and gives them a presence in the world — so that if users happen to lose sight of the virtual objects, they can find them with the help of sound cues positioned in 3D space. This is still restricted to headsets such as the HoloLens, but hopefully one day we’ll see more integration with glasses and other wearable devices.

Spatial Sound is best used in conjunction with Spatial Mapping
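A crude taste of how a sound gets a direction, using constant-power stereo panning in a toy 2D setup of my own (real spatial audio uses HRTFs, distance attenuation, and reverb on top of this):

```python
import math

def stereo_gains(listener_pos, source_pos):
    """Constant-power panning: derive left/right speaker gains from the
    horizontal angle of the sound source relative to the listener, who
    is assumed to face the +z direction. Returns (left_gain, right_gain)."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    azimuth = math.atan2(dx, dz)                         # 0 = straight ahead
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))   # -1 left .. +1 right
    angle = (pan + 1) * math.pi / 4                      # map to 0..pi/2
    return math.cos(angle), math.sin(angle)

left, right = stereo_gains((0, 0), (1, 0))  # hologram directly to the right
print(round(left, 3), round(right, 3))      # almost all signal goes right
```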


Augmented Reality is very much a reality in our daily lives. Whether we shop for furniture online or we use face filters on Instagram, it happens so seamlessly that we barely notice what’s going on under the hood. The idea of this post was to provide an overview of some of its applications, so please let me know in the comments if something’s missing 🚀