3D Object Tracking in AR
July 7, 2022
One of the most powerful use-cases for AR is augmenting objects to add context-relevant information, highlights, overlays, instructions or textures. Imagine walking up to a complex machine and seeing a highlight around the gear you need to replace or the dial you need to turn.
Tracking a real-life object is useful for many applications:
- Safety and training for machinery, engines or any mechanical device
- Augmenting artwork or sculptures to visualize information that is otherwise hidden
- Instructions and user manuals for tools or medical devices
Now that we understand the value, how do we go about tracking an object? Well, it depends on the required accuracy, the size of the object, whether it's indoors or outdoors, the target platform and other restrictions. In this blog post we'll review the different ways to achieve object tracking and the pros and cons of each. We'll go from easy to complex, and from free to paid. Let's begin.
2D Markers
The easiest and most "familiar" way to track a 3D object is to slap a 2D marker on it, or place one nearby. Once the user scans the marker, the app knows where the phone is in 3D space and can render the virtual 3D content on top of the physical object. This only works if the marker is fixed firmly in a known position and never moves relative to the object. For example, in the image below the marker is on the base of the sculpture.
Pros
This method is easy, and almost any AR library can track 2D markers in 3D space. If the library supports "SLAM" (or "pose tracking"), you can move the camera away from the marker and it will keep tracking. Tracking is also instant: it starts as soon as the marker is in the camera's view, which is not the case with some of the other methods.
As of July 2022, this is one of the few methods for 3D object tracking that works in a browser (as opposed to a native app that has to be downloaded and installed).
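The marker-plus-SLAM approach comes down to chaining a few poses. Below is a minimal Python sketch of that pose math. It is not any AR library's API. It assumes the marker tracker reports the marker's pose in camera space, the SLAM system reports the camera's pose in world space, and you have measured the marker-to-object offset once by hand. All poses are 4x4 rigid transforms in a y-up frame with the camera looking down its -z axis. The yaw-only `pose()` helper and every number are illustrative assumptions.

```python
import math

Mat4 = list[list[float]]

def pose(yaw_deg: float, x: float, y: float, z: float) -> Mat4:
    """Rigid transform: rotate about the vertical (y) axis, then translate."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, 0.0, s, x],
            [0.0, 1.0, 0.0, y],
            [-s, 0.0, c, z],
            [0.0, 0.0, 0.0, 1.0]]

def compose(*mats: Mat4) -> Mat4:
    """Chain transforms left to right: compose(A, B, C) == A @ B @ C."""
    out = mats[0]
    for m in mats[1:]:
        out = [[sum(out[i][k] * m[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return out

def invert(m: Mat4) -> Mat4:
    """Inverse of a rigid transform [R | t] is [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(rt[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Frame in which the marker is detected (all values are illustrative).
world_T_camera_then = pose(0.0, 0.0, 1.4, 0.0)   # from SLAM: phone held 1.4 m up
camera_T_marker = pose(0.0, 0.0, -0.6, -2.0)     # from the marker tracker: 2 m ahead
marker_T_object = pose(0.0, 0.0, 0.5, 0.0)       # measured once: object origin 0.5 m above the marker

# Lock the object into the SLAM world frame; from now on the marker is no longer needed.
world_T_object = compose(world_T_camera_then, camera_T_marker, marker_T_object)

# A later frame: the user walked 2 m to the side and turned 90 degrees to face the object.
world_T_camera_now = pose(90.0, 2.0, 1.4, -2.0)
camera_T_object_now = compose(invert(world_T_camera_now), world_T_object)
position_now = [camera_T_object_now[i][3] for i in range(3)]  # ≈ [0.0, -0.1, -2.0]
```

Once `world_T_object` is stored, the overlay can be drawn every frame from the SLAM camera pose alone. That is why tracking survives the marker leaving the view. Any error in the measured `marker_T_object` offset, however, carries straight into the overlay.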
Cons
Big objects require big 2D markers: the smaller the marker, the larger the tracking error. It's hard to say exactly how large the error will be, because it depends on many factors, including the device itself (camera quality, CPU speed). As a rule of thumb, though, tracking an object the size of a person requires a marker of at least A4 (US Letter) size. Note that you can "disguise" a marker as an information banner or some other graphic; it doesn't need to be a visually unappealing QR code. Also, place the marker as close as possible to the object, because any error in the marker's estimated orientation grows with distance from the marker.
The tracking accuracy (how "tightly" the augmentation fits on top of the real object) is going to be low, at least compared to the other methods we'll list next. Whether it's good enough depends on the specific use case.
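You can estimate how marker size affects error with a simple pinhole-camera model. This is a back-of-the-envelope sketch, not a measurement. The focal length (1500 px) and the 1 px measurement error are assumed values, and real trackers combine many corner points rather than a single width.

```python
import math

def depth_error_m(distance_m: float, marker_width_m: float,
                  focal_px: float = 1500.0, pixel_error: float = 1.0) -> float:
    """Approximate depth error when distance is inferred from a marker's apparent size.

    Pinhole model: a marker of real width W at distance Z appears w = f * W / Z pixels
    wide, so Z = f * W / w and |dZ/dw| = Z^2 / (f * W).
    """
    return distance_m ** 2 * pixel_error / (focal_px * marker_width_m)

def lever_arm_error_m(offset_m: float, angle_error_deg: float) -> float:
    """Position error at a point offset_m away from the marker, caused by a small
    error in the marker's estimated orientation."""
    return offset_m * math.tan(math.radians(angle_error_deg))

for width in (0.05, 0.21):  # a 5 cm sticker vs. the short side of an A4 sheet
    err_cm = depth_error_m(2.0, width) * 100
    print(f"{width * 100:.0f} cm marker at 2 m: ~{err_cm:.1f} cm depth error")
print(f"1 degree of rotation error, 1 m from the marker: ~{lever_arm_error_m(1.0, 1.0) * 100:.1f} cm")
```

Under these assumptions, a 5 cm marker at 2 m gives roughly 5 cm of depth error, while an A4 sheet gives about 1.3 cm. A 1° orientation error shifts a point 1 m from the marker by about 1.7 cm. The error grows with the square of the viewing distance and shrinks linearly with marker size, which is why big objects, seen from further away, need big markers placed close to them.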
ARCore Cloud Anchors
ARCore provides a way to place persistent content tied to a specific location, through a feature called Cloud Anchors. They are designed for location-based AR, but they can also be used to augment specific objects, as long as the object is immovable, like a sculpture or a structure.
Cloud Anchors work a little like 3D scanning. To create one, you "capture" the location as if you were scanning it, by moving the phone around in a sweeping lateral motion. The anchor is stored on Google's servers, and an app can later "resolve" it to get an accurate position in 3D space, which is all that's needed to augment an object in that space. It's a bit "backwards", since you're tracking the space around the object rather than the object itself, but it works.
Pros
Cloud Anchors are free and work on both Android and iOS. They are easy to implement in your app and work surprisingly well both outdoors (in most weather conditions) and indoors. In our tests, tracking was accurate to about 20 cm (8") outdoors and 3-5 cm (1-2") indoors.
Cons
Cloud Anchors have a lifespan of 1 year, which can be extended via their API (but you need to remember to do that every year!). They can't be stored locally and require an internet connection to work. As mentioned, this method only works for objects that are fixed in place and relatively large (fridge-sized or bigger). Cloud Anchors also won't work if the environment is very busy (for example, full of people), "noisy" (surrounded by trees or vegetation), or has changed (for example, someone moved the furniture around). Lastly, resolving an anchor ("localization") is not instant; the user has to move their phone around a bit.
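Anchors expire, so it helps to keep track of their expiry dates yourself. Below is a minimal sketch that assumes you record each anchor's expiry time when you host it. The anchor IDs and dates are made up. The call that actually extends an anchor through Google's API is deliberately left out rather than guessed.

```python
from datetime import datetime, timedelta, timezone

def triage_anchors(expiries: dict[str, datetime], now: datetime,
                   margin_days: int = 30) -> tuple[list[str], list[str]]:
    """Split anchor IDs into (already expired, expiring within margin_days), each sorted."""
    cutoff = now + timedelta(days=margin_days)
    expired = sorted(a for a, exp in expiries.items() if exp <= now)
    due_soon = sorted(a for a, exp in expiries.items() if now < exp <= cutoff)
    return expired, due_soon

# Hypothetical anchor IDs and expiry times, as recorded when each anchor was hosted.
utc = timezone.utc
expiries = {
    "lobby-sculpture": datetime(2022, 7, 20, tzinfo=utc),
    "plant-room-boiler": datetime(2023, 5, 1, tzinfo=utc),
    "old-demo": datetime(2022, 6, 30, tzinfo=utc),
}
expired, due_soon = triage_anchors(expiries, now=datetime(2022, 7, 7, tzinfo=utc))
```

Here `expired` is `["old-demo"]`. That anchor can no longer be resolved and would have to be captured again on site. `due_soon` is `["lobby-sculpture"]`, which should be extended now. Running a check like this on a schedule is one way to avoid relying on remembering.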
ARKit Object Tracking
ARKit introduced a feature that allows apps to scan and detect objects. It's very straightforward to use and does exactly what you'd expect. The scan relies on feature points, so the object you're scanning needs to be high-contrast for the best recognition.
Pros
Free to use. Easy to implement in your app. Tracks quickly and accurately.
Cons
iOS only (sadly, that's a deal breaker for most of our projects). Scanning the object has to be done on an iOS device running the ARKit scanning app; you can't supply a 3D model or an existing scan.
The scan captures color data, so if the light hitting your object changes, you'll want to scan it several times throughout the day. The sample app allows you to combine multiple scans.
The sample app can hit memory limits, so if you're scanning a large object, use the most powerful device available (probably an iPad Pro).
Wikitude
Now we're entering the realm of paid AR libraries, and the first one we'll review is Wikitude. Its Object Tracking feature can detect and track a 3D object based on a given 3D model, which can either be generated from CAD or 3D scanned. The license is around $2500 USD as a one-time fee.
Pros
The cheapest of the paid AR libraries. Works with almost any model. Tracking quality and accuracy are OK: not amazing, not bad.
Cons
The object tracking feature is still in beta. To track an object, you need to send a 3D model to the Wikitude team, and they send back an asset that you then add to your project, which can be time-consuming. The basic license can only track one object at a time; tracking multiple objects requires a much more expensive license.
VisionLib
VisionLib is unique on this list because it's a library dedicated to 3D object tracking and nothing else. The company, based in Germany, is new to the market and only recently released its first public version. As with Wikitude and Vuforia (next on the list), you need to provide a 3D model of the object you want to track.
Tracking is quite fast and accuracy is good (but not the best on this list). Accuracy can be improved with a "camera calibration", but that means you need to know in advance which device will run your app. Another very useful feature is the ability to record a video on the device and then play it back in the editor as a simulated camera. Being able to test and debug in the editor saves a lot of time during development.
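Why does calibration matter? The overlay is drawn by projecting the 3D model through the camera's intrinsics (focal length and principal point). This simplified sketch holds the pose fixed and changes only the focal length, comparing hypothetical "calibrated" values with a generic default. It isn't VisionLib's API. A real tracker would also fit a slightly wrong pose, so treat this as an illustration rather than a prediction. It uses the computer-vision convention of x right, y down, z forward.

```python
import math

Intrinsics = tuple[float, float, float, float]  # (fx, fy, cx, cy) in pixels

def project(point_cam: tuple[float, float, float], k: Intrinsics) -> tuple[float, float]:
    """Pinhole projection of a camera-space point (metres) to pixel coordinates."""
    fx, fy, cx, cy = k
    x, y, z = point_cam
    return fx * x / z + cx, fy * y / z + cy

def overlay_shift_px(point_cam, true_k: Intrinsics, assumed_k: Intrinsics) -> float:
    """Pixel distance between where a model point really appears and where it is drawn."""
    u1, v1 = project(point_cam, true_k)
    u2, v2 = project(point_cam, assumed_k)
    return math.hypot(u2 - u1, v2 - v1)

# Hypothetical intrinsics for a 1920x1080 image.
calibrated = (1450.0, 1450.0, 960.0, 540.0)  # what a calibration of this phone might measure
generic = (1500.0, 1500.0, 960.0, 540.0)     # a one-size-fits-all default

for point in [(0.0, 0.0, 1.0), (0.3, 0.0, 1.0)]:  # image centre vs. 30 cm off-axis, 1 m away
    print(point, f"{overlay_shift_px(point, calibrated, generic):.1f} px")
```

In this simplified model, the point at the image center doesn't move at all. A point 30 cm off-axis at 1 m, however, is drawn about 15 px away from where it really appears. The edges of a large model are therefore the first to drift off the real object when the intrinsics are wrong.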
Cost is around $3700 USD per app per year (as of July 2022), plus a yearly developer license (no limit on number of apps).
Pros
Excellent tracking quality, if you can dictate which device the app will run on. Cheaper than Vuforia if you publish only a few apps a year. The video recording and playback feature is very useful.
Cons
Still in beta. The API is clunky and requires manually editing configuration files. Documentation is also a bit lacking. Without camera calibration, tracking quality is not as good as Vuforia's.
Vuforia
The gold standard of 3D object tracking, Vuforia is one of the first (if not the first) AR frameworks and has a substantial track record. It has an interesting history: it was the de facto leader of the mobile AR industry and then lost that title to ARCore and ARKit. It went through several confusing licensing changes, has now settled on a model that's simple and coherent, and is starting to regain its user base.
Being focused on enterprise use, Vuforia prices its Model Targets (its 3D object tracking feature) accordingly. As with all the paid libraries, you provide a 3D model of the object you want to track. You then prepare it in a special desktop app (downloaded from Vuforia), send it for cloud processing, download the resulting asset and add it to your Unity project. This process takes about an hour.
Tracking is instant, the quality is the best we've seen, and it runs on almost any device. My guess is that Vuforia (which is owned by PTC) has enough resources to generate camera calibration files for every device on the market and bundles them in its SDK. Vuforia also has the very handy video recording and playback feature.
Pros
The best tracking on the market, in terms of both detection speed and accuracy (the "fit" of the augmentation over the physical object). Runs on almost any device that isn't very old. The video recording and playback feature is very useful. Pre-processing a model is clunky, but once that's done, using the API in Unity is very straightforward.
Cons
An eye-watering $25k USD per year license fee (for the "Premium" plan, which is required for object tracking), although it covers an unlimited number of apps. If your studio publishes more than 5-6 apps a year, this is actually cheaper than VisionLib.
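You can work out the break-even point from the prices quoted in this post. Here is a minimal sketch. VisionLib's developer license fee isn't given here, so it is a parameter. All prices are the July 2022 figures and will change.

```python
import math

# Prices quoted in this post (July 2022); they will change.
VUFORIA_PREMIUM_PER_YEAR = 25_000   # unlimited apps
VISIONLIB_PER_APP_PER_YEAR = 3_700  # per published app

def visionlib_yearly_cost(apps: int, developer_license: float = 0.0) -> float:
    """Yearly VisionLib cost: per-app fees plus the (unquoted) developer license."""
    return apps * VISIONLIB_PER_APP_PER_YEAR + developer_license

def break_even_apps(developer_license: float = 0.0) -> int:
    """Smallest number of apps per year at which Vuforia costs no more than VisionLib."""
    remaining = VUFORIA_PREMIUM_PER_YEAR - developer_license
    if remaining <= 0:
        return 0
    return math.ceil(remaining / VISIONLIB_PER_APP_PER_YEAR)

for fee in (0, 3_000, 6_500):
    n = break_even_apps(fee)
    print(f"developer fee ${fee:,}: Vuforia is no more expensive from {n} apps/year "
          f"(VisionLib would cost ${visionlib_yearly_cost(n, fee):,.0f})")
```

With no developer fee, Vuforia becomes the cheaper option at 7 apps a year. A developer license of roughly $3,000 to $6,500 brings that down to the 5-6 apps quoted above.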
Conclusion
3D object tracking is tricky if you want to do it for free, or in a browser, where the options are very limited. If your budget allows it, Vuforia or VisionLib offer excellent solutions.
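As a recap, the trade-offs in this post can be condensed into a small decision helper. This is a hypothetical sketch that only encodes the rules of thumb above; for example, the VisionLib threshold ignores its unquoted developer fee. It is not a substitute for testing each library on your actual object.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    browser: bool             # must run in a web browser, not a native app
    fixed_large_object: bool  # object never moves and is fridge-sized or bigger
    ios_only_ok: bool         # shipping on iOS only is acceptable
    online: bool              # devices will have an internet connection
    budget_usd: float         # first-year licensing budget

def candidate_methods(req: Requirements) -> list[str]:
    """List the methods from this post that fit, ordered from easy/free to complex/paid."""
    if req.browser:
        # Of the options reviewed here, only 2D markers are described as browser-capable.
        return ["2D marker"]
    options = ["2D marker"]  # works almost anywhere, at lower accuracy
    if req.fixed_large_object and req.online:
        options.append("ARCore Cloud Anchors")
    if req.ios_only_ok:
        options.append("ARKit Object Tracking")
    if req.budget_usd >= 2_500:   # Wikitude: one-time fee
        options.append("Wikitude")
    if req.budget_usd >= 3_700:   # VisionLib: per app per year, developer fee not included
        options.append("VisionLib")
    if req.budget_usd >= 25_000:  # Vuforia Premium: per year, unlimited apps
        options.append("Vuforia")
    return options

print(candidate_methods(Requirements(browser=False, fixed_large_object=True,
                                     ios_only_ok=False, online=True, budget_usd=5_000)))
```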