Single hand tracking in Vuo with non-depth cameras

Machine vision is getting really awesome, and Apple’s Core ML makes it pretty accessible.

Tools like https://docs.lobe.ai/docs/export/export/ make it easier to train custom Core ML models.

Request: A Core ML node that uses the Core ML framework and outputs results so they can be used in the composition.

Could be a Pro feature?

@keithlang,

Just to begin with, Core ML currently has 6 very different model types. Our goal with Vuo is to keep nodes understandable and task-focused, so we think this is too broad for a single Vuo feature request. Can you submit requests for specific things you’d like to be able to do using ML? We have several requests for tracking in images (Camera tracking, Tracking blobs) that seem similar to the kinds of things you might want to do with Core ML.

I am specifically interested in hand tracking. I.e., imagine a node similar to the Find Faces in Image node. ‘Hand and Body Pose detection’ came as part of Big Sur.

One current workaround, running the HandPose OSC standalone app, is OK, but it is limited to one hand and doesn’t have the option of sending single frames for recognition. It also has the downside that it can’t be included in App export, and I assume it has some performance tradeoffs from running as a separate app.

Vuo has built-in support for hand tracking using the Leap Motion, but not with normal RGB (non-depth) cameras.

HandPose OSC uses TensorFlow’s handpose model (not Core ML). That model uses the Apache license, so we could potentially integrate it into Vuo (though, as you noted, it currently only tracks a single hand at a time).

Apple provides VNDetectHumanHandPoseRequest (only available in macOS 11.0+). We briefly tested their sample code; it’s slower than HandPose OSC, and it also only detects a single hand at a time.
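For reference, using Vision’s hand-pose API from Swift looks roughly like this. A minimal sketch, assuming you already have a `CGImage` named `cgImage`; the joint names and confidence threshold are illustrative, not taken from Apple’s sample code:

```swift
import Vision
import CoreGraphics

// Sketch: detect a hand pose in a still image (macOS 11.0+ / Big Sur).
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 1  // matching the single-hand case discussed above

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([request])

if let observation = request.results?.first {
    // recognizedPoints(.all) maps joint names to normalized image points.
    let points = try observation.recognizedPoints(.all)
    if let indexTip = points[.indexTip], indexTip.confidence > 0.3 {
        print("Index fingertip at", indexTip.location)  // (0–1) normalized coords
    }
}
```

A Vuo node wrapping this would presumably convert the normalized points into Vuo’s coordinate system and output them as a point list.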

So, for single-hand detection, we have two options, at two-dot complexity (and Pro only).

For multiple-hand detection, we could probably train a model to segment the image into separate per-hand sub-images and run the existing single-hand model on each of them, but that would bump it up to three-dot complexity.
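The multi-hand pipeline described above could be sketched like this. Everything here is hypothetical: `HandPose`, `detectHandBoxes`, and `estimateSingleHandPose` are placeholder names, not real Vuo or Vision APIs:

```swift
import CoreGraphics

// Placeholder result type: joint name → normalized point.
struct HandPose { var joints: [String: CGPoint] }

// Stub: a detector model that returns one bounding box per hand.
func detectHandBoxes(in image: CGImage) -> [CGRect] { [] }

// Stub: the existing single-hand pose model, run on a cropped sub-image.
func estimateSingleHandPose(in crop: CGImage) -> HandPose? { nil }

func detectAllHandPoses(in image: CGImage) -> [HandPose] {
    // 1. find a bounding box per hand; 2. crop and run the one-hand model on each.
    detectHandBoxes(in: image).compactMap { box in
        image.cropping(to: box).flatMap { estimateSingleHandPose(in: $0) }
    }
}
```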

Let us know if you want single or multiple-hand detection, and we’ll modify the FR accordingly and open it for voting.

Cool topic, thank you for opening it, Keith!

I see on TensorFlow’s hand pose detection model page that they write multiple-hand tracking is coming.

Opened a somewhat related discussion: Questions about ML models & techniques for feature requests