Machine vision is getting really good, and Apple's Core ML offers some pretty accessible machine vision.

There are also tools that make it easier to train custom Core ML models.

Request: a Core ML node that uses the Core ML framework and outputs results so that they can be used in the composition.

Could be a Pro feature?




jmcc:
Feature status:
Waiting for review by Team Vuo
Waiting for more information from reporter


Just to begin with: Core ML currently has 6 very different model types. Our goal with Vuo is to keep nodes understandable and task-focused, and we think this is too broad for a single Vuo feature request. Can you submit requests for the specific things you'd like to be able to do using ML? We have several existing requests for tracking in images (Camera tracking, Tracking blobs) that seem similar to the kinds of things you might want to do with Core ML.

keithlang:

I am specifically interested in hand tracking. That is, imagine a node similar to the Find Faces in Image node. 'Hand and Body Pose detection' arrived as part of Big Sur.

One current workaround, running the HandPose OSC standalone app, is OK, but it's limited to one hand and doesn't have the option of sending single frames for recognition. It also has the downside that it can't be included in App Export, and I assume it has some performance tradeoffs as a separate app.

jmcc:

Vuo has built-in support for hand tracking using the Leap Motion, but not with normal RGB (non-depth) cameras.

HandPose OSC uses TensorFlow's handpose model (not CoreML). That model uses the Apache license, so we could potentially integrate it into Vuo (though, as you noted, it currently only tracks a single hand at a time).

Apple provides VNDetectHumanHandPoseRequest (only available in macOS 11.0+). We briefly tested their sample code; it's slower than HandPose OSC, and it also only detects a single hand at a time.
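For anyone curious what using that API looks like, here's a minimal Swift sketch (macOS 11.0+). Returning only the index fingertip, and the 0.3 confidence cutoff, are illustrative choices, not anything Vuo-specific:

```swift
import Vision
import CoreImage

// Run Apple's hand-pose request on a single frame and return the index
// fingertip in normalized (0–1) image coordinates, or nil if no hand found.
func indexFingertip(in image: CIImage) throws -> CGPoint? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1   // it only detects one hand reliably anyway

    try VNImageRequestHandler(ciImage: image, options: [:]).perform([request])

    guard let hand = request.results?.first else { return nil }
    let tip = try hand.recognizedPoint(.indexTip)
    return tip.confidence > 0.3 ? tip.location : nil
}
```

The same observation exposes the other joints (thumb, wrist, etc.) the same way, so extending this to a full per-hand skeleton is mostly a matter of querying more joint names.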

So, for single-hand detection, we have two options, each at two-dot complexity (and Pro only).

For multiple-hand detection, we could probably train a model to segment the image into separate per-hand images and run the existing model on each of those sub-images, but that would bump it up into 3-dot complexity.
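The crop-and-rerun idea could be sketched roughly like this (hypothetical: it assumes some upstream detector has already produced one normalized bounding box per hand; the function and parameter names are made up):

```swift
import Vision
import CoreImage

// For each normalized (0–1) per-hand bounding box, crop that region out of
// the frame and run the single-hand detector on just the sub-image.
func handPoses(in image: CIImage, handBoxes: [CGRect]) throws -> [VNHumanHandPoseObservation] {
    var poses: [VNHumanHandPoseObservation] = []
    for box in handBoxes {
        // Scale the normalized box to pixel coordinates and crop.
        let e = image.extent
        let crop = image.cropped(to: CGRect(x: box.minX * e.width,
                                            y: box.minY * e.height,
                                            width: box.width * e.width,
                                            height: box.height * e.height))

        let request = VNDetectHumanHandPoseRequest()
        request.maximumHandCount = 1
        try VNImageRequestHandler(ciImage: crop, options: [:]).perform([request])
        if let pose = request.results?.first { poses.append(pose) }
    }
    return poses
}
```

Note that joint coordinates returned this way are relative to each crop, so they'd need remapping back into full-image space.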

Let us know if you want single or multiple-hand detection, and we'll modify the FR accordingly and open it for voting.

Bodysoulspirit:

Cool topic, thank you for opening it, Keith!

I see on TensorFlow's hand pose detection model page that they write multiple-hand tracking is coming.

Jumping on the train with my limited understanding of all this :) I have some questions; maybe someone can answer a few.

There are already some blob tracking feature requests, and this topic is about hand tracking, but are there any existing requests I might have missed regarding:

  1. 🏞 Image segmentation for people / background removal, for example?
    Using a regular non-depth camera (without a Kinect), for example to extract just the person from the background of a webcam feed (would something like TensorFlow's BodySegmentation be better for this, or TensorFlow's Semantic Segmentation?)

  2. 💀 Also, some skeletal tracking without a Kinect (for example, TensorFlow's Pose Estimation?).

  3. 👨 Face Landmark Detection
    We already have "Find Faces" with eye recognition, but what about a 3D mesh of the face, like TensorFlow's "Face Landmark Detection"? Would that be useful in Vuo? I don't know ;) Would we want to create some AR masks with Vuo? ;)

  4. Can I create some feature requests for those ? Or have I missed some that already exist ?

  5. A general question, from my limited understanding of the topic: do those TensorFlow models, for example, come as pre-trained algorithms? Or do you have to train them yourself? Or is training them yourself also a possibility if you wanted to? I see on the Apple Core ML page that you can load trained models but also create some with Create ML?

  6. Regarding Apple models or libraries vs. TensorFlow's models, I guess it's again kind of the Vulkan/OpenGL vs. Metal question? If a Windows version is still in the pipeline, the Vuo team would have to implement Core ML for Mac users and then take on extra workload to implement different techniques for other platforms, so open-source multi-platform tools require less effort?
    Of course, some Apple tools are really optimized for Apple products, so I guess, as Jean Marie seems to say, it's about testing performance and possibilities and finding the right balance? For example, how does Apple's new skeletal tracking perform vs. TensorFlow's?
    I've stumbled across what seem to be very efficient technologies, like the Banuba ones, which of course come with a paid license; maybe those are the technologies used by Zoom for background removal (I thought Snapchat would use them too, but I see they acquired another startup in this domain for their technology).
    I mention this because sometimes the open-source and free libraries seem less performant, and I guess it's up to the team to find those that work best, but the TensorFlow ones seem to work pretty great!?
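On question 5: the models discussed in this thread ship pre-trained, so in the simplest case you just load and run one; Create ML (or retraining a TensorFlow model) only matters if you want custom behavior. A minimal Core ML loading sketch, with an illustrative model file name:

```swift
import CoreML
import Foundation

// Load a compiled Core ML model from disk. "HandPose.mlmodelc" is a
// hypothetical path; any compiled .mlmodelc works the same way.
let url = URL(fileURLWithPath: "HandPose.mlmodelc")
let model = try MLModel(contentsOf: url)

// The model describes its own inputs and outputs, which is what a
// hypothetical Vuo node would need to expose as ports.
print(Array(model.modelDescription.inputDescriptionsByName.keys))
print(Array(model.modelDescription.outputDescriptionsByName.keys))
```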

Of course I'd love to be able to implement such libraries and models into Vuo myself, but I can barely create stateless nodes using the given API functions; I can't even create stateful nodes or custom functions ;)
I remember Martinus asked some questions about implementing stuff; can't wait to see what he comes up with ;)
And there's also a feature request for a tutorial on implementing libraries. Can't wait ;)
