ARKit is quite good at tracking images, but it struggles to disambiguate similar compositions. Core ML can help fill in the gaps.

ARKit Image Tracking
ARKit is a powerful framework for building augmented reality apps. It comes loaded with image detection and tracking functionality, which lets apps “anchor” virtual content contextually onto real-world surfaces.
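For context, a minimal sketch of enabling image detection in an ARKit session might look like this (the asset catalog group name "PlayingCards" is a placeholder assumption):

```swift
import ARKit

// Load reference images from an asset catalog group and start detecting them.
func startImageDetection(in sceneView: ARSCNView) {
    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "PlayingCards", bundle: nil) else { return }

    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = referenceImages

    sceneView.session.run(configuration)
}
```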
Tracking/Detection Trade-offs
For the best experience, image detection should be robust across lighting conditions, orientation, and other printing/reproduction irregularities. ARKit prioritizes that robust, uninterrupted tracking experience over fine disambiguation between reference images. Consequently, ARKit is fairly “lenient” when it comes to image detection.
An Example: Identifying Playing Cards
Consider an application where a different AR experience is triggered by each playing card. (Perhaps we learn the story of the different Queens and their path to royalty.)
Unfortunately, ARKit considers the Queen of Clubs and the Queen of Diamonds to be compositionally too similar to track separately.
How Core ML Can Help
Core ML can be employed to help disambiguate the playing cards using a simple image classifier. Differentiating a few static compositions is a trivial task for machine learning. Using Create ML, Custom Vision, Watson, or any other drag-and-drop service capable of generating a .mlmodel file, you can have a robust image classifier with as few as 5 training images per classification.
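If you want to stay in Swift, here is a minimal sketch of training such a classifier with Create ML in a macOS playground (the folder paths and label names are placeholder assumptions; organize your training images into one subfolder per card):

```swift
import CreateML
import Foundation

// Training images live in one subfolder per label, e.g.
// TrainingImages/QueenOfClubs, TrainingImages/QueenOfDiamonds, ...
let trainingData = MLImageClassifier.DataSource.labeledDirectories(
    at: URL(fileURLWithPath: "/path/to/TrainingImages")
)

// Train the classifier and export a .mlmodel to drop into the Xcode project.
let classifier = try MLImageClassifier(trainingData: trainingData)
try classifier.write(to: URL(fileURLWithPath: "/path/to/PlayingCards.mlmodel"))
```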
How To Use Core ML With ARKit
The general workflow for employing Core ML alongside ARKit is simple (a rough sketch of the manual version follows the list):
- ARKit informs you that it has detected a reference image coming in from the camera
- Grab a snapshot of this real-world object
- Feed it into your machine learning classifier
- Use the results to show the correct content
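To give a sense of what that involves without any helper, here is a rough sketch of the manual version using Vision (it assumes a generated PlayingCards model class and skips the cropping and deskewing that the MLRecognizer class below takes care of):

```swift
import ARKit
import Vision

// Rough sketch: snapshot the scene, run it through the Core ML model via
// Vision, and hand back the top label.
func classifyCurrentFrame(of sceneView: ARSCNView,
                          completion: @escaping (String?) -> Void) {
    guard let cgImage = sceneView.snapshot().cgImage,
          let visionModel = try? VNCoreMLModel(for: PlayingCards().model) else {
        completion(nil)
        return
    }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // The observations arrive sorted by confidence; take the top one.
        let best = (request.results as? [VNClassificationObservation])?.first
        completion(best?.identifier)
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        try handler.perform([request])
    } catch {
        completion(nil)
    }
}
```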
MLRecognizer
Lucky for you, we’ve done the heavy lifting over at https://github.com/Raizlabs/ARKit-CoreML. We’ve abstracted the tricky functionality behind a simple MLRecognizer class.
Instantiate it with a reference to your MLModel and your ARSCNView:
```swift
lazy var recognizer = MLRecognizer(
    model: PlayingCards().model,
    sceneView: sceneView
)
```
Then, use the classify method to receive a classification for a given ARImageAnchor:
```swift
func classify(imageAnchor: ARImageAnchor, completion: @escaping (Result<String>) -> Void)
```
For example, in the ARSCNViewDelegate renderer(_:didAdd:for:) callback, we can forward the image anchor to the MLRecognizer to be snapshotted, cropped, deskewed, and classified.
```swift
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    guard let imageAnchor = anchor as? ARImageAnchor else { return }

    // send off anchor to be snapshotted, cropped, deskewed, and classified
    recognizer.classify(imageAnchor: imageAnchor) { [weak self] result in
        if case .success(let classification) = result {
            // update app with classification
            self?.attachLabel(classification, to: node)
        }
    }
}
```
That’s it! Go build something cool.