Enhancing ARKit Image Detection with CoreML

ARKit is quite good at tracking images, but it struggles to disambiguate similar compositions. Core ML can help fill in the gaps.

[Demo: detecting and tracking a playing card]

ARKit Image Tracking

ARKit is a powerful framework for building augmented reality apps. It comes loaded with image detection and tracking functionality, which lets apps “anchor” virtual content contextually onto real-world surfaces.
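For reference, here's a minimal sketch of how image detection is typically set up. The "Cards" asset catalog group name is just an assumption for illustration:

import ARKit

// Minimal sketch: load reference images from an asset catalog group
// (hypothetically named "Cards") and start detecting them.
func startImageDetection(on sceneView: ARSCNView) {
    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "Cards", bundle: nil) else { return }

    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = referenceImages
    sceneView.session.run(configuration)
}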

Tracking/Detection Trade-offs

For the best experience, image detection should be robust across lighting conditions, orientation, and printing/reproduction irregularities. ARKit prioritizes a strong, uninterrupted tracking experience over fine disambiguation between reference images. Consequently, ARKit is fairly “lenient” when it comes to image detection.

An Example: Identifying Playing Cards

Consider an application where a different AR experience is triggered by each playing card. (Perhaps we learn the story of the different Queens and their paths to royalty.) Unfortunately, ARKit considers the Queen of Clubs and the Queen of Diamonds to be compositionally too similar to track separately.
[Images: the Queen of Clubs and the Queen of Diamonds]
[Screenshot: Xcode warns that it can’t tell the difference between these two images]
This ambiguity makes it impossible to build the above experience using ARKit alone: both queens are recognized when either card is in view. By inspection, however, these two images should be easy for a machine to differentiate. Their colors and compositions differ substantially.

How Core ML Can Help

Core ML can be employed to disambiguate the playing cards using a simple image classifier. Compared to machine learning’s vast capabilities, differentiating a few static compositions is a trivial task. Using Create ML, Custom Vision, Watson, or any other drag-and-drop service capable of generating a .mlmodel file, you can have a robust image classifier with as few as 5 training images per classification.
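With Create ML, for instance, the whole training step can be a few lines in a macOS playground. A minimal sketch, assuming your training images are sorted into one folder per card (the paths here are placeholders):

import CreateML
import Foundation

// Train an image classifier from labeled folders, e.g.
// Cards/QueenOfClubs/*.jpg, Cards/QueenOfDiamonds/*.jpg, ...
let trainingDir = URL(fileURLWithPath: "/path/to/Cards")
let classifier = try MLImageClassifier(
    trainingData: .labeledDirectories(at: trainingDir)
)

// Export the trained model; drag the result into your Xcode project.
try classifier.write(to: URL(fileURLWithPath: "/path/to/PlayingCards.mlmodel"))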

How To Use Core ML With ARKit

The general workflow for employing Core ML alongside ARKit is simple:
  • ARKit informs you that it has detected a reference image in the camera feed
  • Grab a snapshot of the real-world object
  • Feed it into your machine learning classifier
  • Use the results to show the correct content
While the high-level approach isn’t complicated, the low-level execution is more difficult.
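To make that concrete, here’s a rough sketch of the manual path using Vision, assuming the generated PlayingCards model class. It deliberately skips the hard parts: cropping the snapshot down to the detected image and correcting its perspective, both of which matter a lot for classification accuracy.

import ARKit
import Vision

// Rough sketch: classify a full snapshot of the AR view.
// A real implementation would first crop and deskew the region
// covered by the detected ARImageAnchor.
func classifySnapshot(of sceneView: ARSCNView) throws {
    guard let cgImage = sceneView.snapshot().cgImage else { return }

    let visionModel = try VNCoreMLModel(for: PlayingCards().model)
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let top = (request.results as? [VNClassificationObservation])?.first
            else { return }
        print("Saw \(top.identifier) (confidence: \(top.confidence))")
    }
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}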


Luckily, we’ve done the heavy lifting for you over at https://github.com/Raizlabs/ARKit-CoreML.

We’ve abstracted the tricky functionality behind a simple MLRecognizer class. Instantiate it with a reference to your MLModel and your ARSCNView:
lazy var recognizer = MLRecognizer(
    model: PlayingCards().model,
    sceneView: sceneView
)
Then, use the classify method to receive a classification for a given ARImageAnchor:
func classify(imageAnchor: ARImageAnchor, completion: @escaping (Result<String>) -> Void)
For example, in the ARSCNViewDelegate renderer(_:didAdd:for:) callback, we can forward the image anchor to the MLRecognizer to be snapshotted, cropped, deskewed, and classified.
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {

    guard let imageAnchor = anchor as? ARImageAnchor else { return }

    // send off anchor to be snapshotted, cropped, deskewed, and classified
    recognizer.classify(imageAnchor: imageAnchor) { [weak self] result in
        if case .success(let classification) = result {

            // update app with classification
            self?.attachLabel(classification, to: node)
        }
    }
}
That’s it! Go build something cool.
