Hey everyone, I've started working on a little personal project. My kids both recently started at a new school, and they are both taking music classes for the first time ever. One is playing trumpet and the other clarinet; since the rest of the students have a year or two of practice on them, I was hoping I could build them something in Vuo to help them out a little. My idea was to make them a Visual Electronic Music Tuner that would show them the note they were playing and give them visual feedback on how close they are to the note they are trying to play.

A few nodes of interest were Split Audio by Frequency, Calculate Loudness, and Find Maximum (so I can pull the dominant frequency from the microphone audio sample). I've also made an XML file with the frequencies of every note (the full chromatic scale) in every octave, numbered to match the MIDI nodes that Vuo has (I figured it would keep things simple if I used the same numbering scheme in my lists that MIDI uses for its notes), so there are a total of 128 notes, almost 11 full octaves, although the instruments they are playing will only span about 6 of those octaves.
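For reference, if the XML file simply maps MIDI note numbers to equal-tempered frequencies, the same table can be computed from the standard formula f = 440 × 2^((n - 69) / 12), where MIDI note 69 is A4 = 440 Hz. A minimal Python sketch, assuming A440 tuning and the common convention that middle C (note 60) is called C4 (some software calls it C3); the function names are made up for illustration:

```python
# Pitch-class names in MIDI order (MIDI note 0 is a C).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_freq(note: int, a4: float = 440.0) -> float:
    """Equal-tempered frequency of a MIDI note, with A4 (note 69) tuned to a4 Hz."""
    return a4 * 2.0 ** ((note - 69) / 12.0)


def midi_to_name(note: int) -> str:
    """Note name plus octave number, using the 'middle C = 60 = C4' convention."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


# The 128-entry table the XML file would otherwise hold: (number, name, Hz).
NOTE_TABLE = [(n, midi_to_name(n), midi_to_freq(n)) for n in range(128)]

if __name__ == "__main__":
    for number, name, freq in (NOTE_TABLE[0], NOTE_TABLE[60], NOTE_TABLE[69], NOTE_TABLE[127]):
        print(f"{number:3d}  {name:4s} {freq:9.2f} Hz")
```

Generating the table in code also makes it easy to retune (for example, a4=442.0) without editing 128 XML entries.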

What I was hoping to do was take the input microphone audio sample, find the dominant frequency, cross-reference that to the nearest frequency in the XML file, and pull the matching note and octave to display on the screen, along with some kind of indicator showing how close they are to hitting that note. It would hold the result for a set period of time, or until a new sound that broke the threshold was heard. I've attached a file showing the direction I was going during testing, but it's not working for some reason, and doing it this way is going to be very node- and cable-heavy. Is there a simpler way of doing this that I'm overlooking? Also, can anyone see the flaw(s) in my composition? Any help or advice is appreciated! Thanks.
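The lookup and the hold behaviour described above can be sketched without a table at all: converting a frequency to a fractional MIDI note number, 69 + 12 × log2(f / 440), gives both the nearest note (by rounding) and how far off it is in cents (hundredths of a semitone). A hypothetical Python sketch, assuming A440 equal temperament; `HeldReading`, the 2-second hold, and the 0.05 loudness threshold are illustrative placeholders, not Vuo nodes or values from the post:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz: float, a4: float = 440.0):
    """Return (midi_number, name, cents) for the equal-tempered note nearest freq_hz.

    cents is how far freq_hz sits above (+) or below (-) that note, within about +/-50.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    exact = 69 + 12 * math.log2(freq_hz / a4)  # fractional MIDI note number
    number = round(exact)
    cents = (exact - number) * 100.0
    name = f"{NOTE_NAMES[number % 12]}{number // 12 - 1}"
    return number, name, cents


class HeldReading:
    """Keeps the last detected note on screen until hold_s seconds pass
    or a new sound louder than `threshold` replaces it (hypothetical helper)."""

    def __init__(self, hold_s: float = 2.0, threshold: float = 0.05):
        self.hold_s = hold_s
        self.threshold = threshold
        self.reading = None
        self.last_time = -math.inf

    def update(self, now_s: float, loudness: float, freq_hz: float):
        if loudness >= self.threshold and freq_hz > 0:
            # A new sound broke the threshold: replace the displayed note.
            self.reading = nearest_note(freq_hz)
            self.last_time = now_s
        elif now_s - self.last_time > self.hold_s:
            # Nothing loud enough for too long: clear the display.
            self.reading = None
        return self.reading
```

A tuner display could then map `cents` (roughly -50 to +50) onto a needle or colour indicator.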


Submitted by jstrecker

Pitch detection is an interesting and difficult problem :) The way that brains perceive sound is weird, and not always what you would expect from looking at an FFT. Sample FFTs of a clarinet and a trumpet are here (pp. 125-126).

Rather than trying to tackle the whole MIDI range, I'd recommend starting with an octave or less, and see how that goes. Like, if you know the note being played is somewhere within that octave, does the highest-amplitude frequency within that octave correctly detect the note?
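One way to try that octave experiment outside Vuo is to evaluate the spectrum on a fine frequency grid inside the chosen band and pick the loudest point. A rough Python sketch using a plain (slow) Hann-windowed DFT rather than Vuo's nodes; the function names, the 1 Hz grid, and the synthetic test tone are all assumptions for illustration:

```python
import math


def hann(samples):
    """Apply a Hann window to reduce spectral leakage between nearby frequencies."""
    n = len(samples)
    return [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(samples)]


def dft_magnitude(windowed, freq_hz, sample_rate):
    """Magnitude of the DFT of `windowed` evaluated at one (not necessarily bin-aligned) frequency."""
    step = 2 * math.pi * freq_hz / sample_rate
    re = im = 0.0
    for i, x in enumerate(windowed):
        re += x * math.cos(step * i)
        im -= x * math.sin(step * i)
    return math.hypot(re, im)


def loudest_in_band(samples, sample_rate, f_lo, f_hi, step_hz=1.0):
    """Frequency on a step_hz grid within [f_lo, f_hi) with the largest magnitude."""
    windowed = hann(samples)
    best_f, best_mag = f_lo, -1.0
    f = f_lo
    while f < f_hi:
        mag = dft_magnitude(windowed, f, sample_rate)
        if mag > best_mag:
            best_f, best_mag = f, mag
        f += step_hz
    return best_f
```

With a synthetic 220 Hz tone whose second harmonic (440 Hz) is louder than its fundamental, scanning 100-1000 Hz picks the harmonic, while restricting the scan to the 200-400 Hz band picks the fundamental. That is the kind of octave confusion a narrower range helps expose.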

In your composition, I'm suspicious that maybe Split Audio by Frequency doesn't know what to do with all those zeroes. It might be assuming the input will be a list of positive, increasing numbers.
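If that's the problem, one way to build a cutoff list that is guaranteed positive and strictly increasing is to derive it from the note frequencies themselves, putting a cutoff a quarter-tone (half a semitone) above and below each note so that every band holds exactly one note. A hypothetical Python sketch of the arithmetic only; whether Split Audio by Frequency expects its cutoffs in this form is an assumption to check against the node's documentation:

```python
def band_edges(first_note: int, last_note: int, a4: float = 440.0):
    """Cutoffs a quarter-tone either side of each MIDI note in [first_note, last_note],
    so each band between consecutive cutoffs holds exactly one note."""

    def freq(note: int) -> float:
        return a4 * 2.0 ** ((note - 69) / 12.0)

    quarter_tone = 2.0 ** (1 / 24)  # geometric midpoint between adjacent semitones
    # One edge below the first note, then one edge above every note.
    return [freq(n) * quarter_tone for n in range(first_note - 1, last_note + 1)]


def is_valid_cutoff_list(cutoffs):
    """True if every cutoff is positive and the list is strictly increasing."""
    return all(c > 0 for c in cutoffs) and all(a < b for a, b in zip(cutoffs, cutoffs[1:]))
```

For example, `band_edges(57, 69)` covers A3 through A4 with 14 cutoffs, and a list padded with zeroes fails the check.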

Submitted by unicode

Here's a weird suggestion which might yield some results:

You can use the Make Image From Channels node and some image processing to implement an auto-correlation algorithm. Here's roughly how that might work.

If the input waveform is very periodic, then that means its peaks and troughs will be spaced fairly regularly apart in time. When you feed a periodic waveform into the Make Image From Channels node, that will mean the output image will form evenly-spaced bright bars, and it's going to be very similar to a horizontally-shifted copy of itself.

The critical question is, how far horizontally shifted? Well, that may require trial and error. Try shifting the image over by some offset, say 50 pixels, cropping appropriately, and combining it with the unshifted version using a "Difference" blending mode. If the peaks and troughs of this waveform have lined up very well with its shifted copy, then they should cancel out and the output of "Difference" should be a very dark image, which you can test using the Sample Color From Image node. If they don't line up, then the output image will be brighter.
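In one dimension, the shift-and-Difference test is the average of |x[i] - x[i + lag]| over a window, a quantity known in signal processing as the average magnitude difference function (AMDF). A minimal Python sketch working directly on a list of samples, assuming one pixel per sample as in the post and treating image brightness as proportional to the absolute difference (an assumption about how Make Image From Channels renders the waveform):

```python
import math


def shifted_difference(samples, lag, window):
    """Mean of |x[i] - x[i + lag]| over `window` samples.

    1-D stand-in for Difference-blending the waveform image with a copy shifted
    by `lag` pixels and then sampling the average brightness (AMDF).
    Small values mean the signal nearly repeats after `lag` samples.
    """
    if lag + window > len(samples):
        raise ValueError("need at least lag + window samples")
    return sum(abs(samples[i] - samples[i + lag]) for i in range(window)) / window


if __name__ == "__main__":
    sr = 48000
    tone = [math.sin(2 * math.pi * 960 * i / sr) for i in range(400)]  # period: 50 samples
    print(shifted_difference(tone, 50, 200))  # near 0: copies line up
    print(shifted_difference(tone, 25, 200))  # large: half-period shift, peaks meet troughs
```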

So, you can rig up a "build list" loop which tries this comparison for every offset within some range under consideration (50 pixels, then 51, then 52, up to maybe 100) and records the brightness of each resulting image. The darkest images indicate the offsets with the highest degree of autocorrelation. A little bit of math will transform those offset distances into wavelengths, which in turn map to audio frequencies: 50 pixels represent 50/48000 of a second, so a 50-pixel correlation corresponds to a frequency of 1 / (50/48000 s) = 48000 / 50 = 960 Hz.
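The whole sweep, including the conversion from offset to frequency, might look like this in Python. It is a sketch assuming 48 kHz audio and one sample per pixel; `estimate_pitch` and its default range of 50-100 samples (960 Hz down to 480 Hz) simply mirror the numbers above:

```python
import math


def shifted_difference(samples, lag, window):
    """Mean |x[i] - x[i + lag]|: the 'darkness' test for one offset."""
    return sum(abs(samples[i] - samples[i + lag]) for i in range(window)) / window


def estimate_pitch(samples, sample_rate=48000, min_lag=50, max_lag=100):
    """Try every offset in [min_lag, max_lag]; return (frequency_hz, best_lag)."""
    window = len(samples) - max_lag  # the largest window every lag can use
    scores = {lag: shifted_difference(samples, lag, window)
              for lag in range(min_lag, max_lag + 1)}
    best_lag = min(scores, key=scores.get)  # the 'darkest image'
    # A period of best_lag samples lasts best_lag / sample_rate seconds,
    # so the frequency is sample_rate / best_lag (48000 / 50 = 960 Hz).
    return sample_rate / best_lag, best_lag


if __name__ == "__main__":
    sr = 48000
    tone = [math.sin(2 * math.pi * 800 * i / sr) for i in range(800)]
    print(estimate_pitch(tone, sr))  # prints (800.0, 60)
```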

(Of course, if the autocorrelation is strong at an offset of 1x the wavelength of the musical pitch, it will also be strong at 2x, 3x, 4x, etc. Some work to detect these cases might be necessary.)
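One common heuristic for those multiples (used, in a different form, by the YIN pitch estimator) is to scan from short offsets to long ones and take the first dip that comes close to the best score, instead of the single darkest one. A Python sketch assuming 48 kHz audio and one sample per pixel; `pick_fundamental_lag` and the 10% tolerance are arbitrary illustrative choices:

```python
import math


def shifted_difference(samples, lag, window):
    """Mean |x[i] - x[i + lag]| over `window` samples."""
    return sum(abs(samples[i] - samples[i + lag]) for i in range(window)) / window


def pick_fundamental_lag(scores, tolerance=0.1):
    """Given {lag: score}, return the shortest lag that is a local minimum and
    scores within `tolerance` (as a fraction of the score range) of the best.
    This prefers 1x the period over its 2x, 3x, ... multiples."""
    lags = sorted(scores)
    values = [scores[lag] for lag in lags]
    best, worst = min(values), max(values)
    limit = best + tolerance * (worst - best)
    for k in range(1, len(lags) - 1):
        is_dip = values[k] <= values[k - 1] and values[k] <= values[k + 1]
        if is_dip and values[k] <= limit:
            return lags[k]
    return lags[values.index(best)]  # fall back to the single darkest offset


if __name__ == "__main__":
    sr = 48000
    tone = [math.sin(2 * math.pi * 800 * i / sr) for i in range(1000)]  # period: 60 samples
    window = len(tone) - 200
    scores = {lag: shifted_difference(tone, lag, window) for lag in range(40, 201)}
    print(pick_fundamental_lag(scores))  # dips at 60, 120 and 180; reports 60
```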

I hope this exploration is fruitful! :)

Submitted by unicode

Here's a little proof-of-concept I whipped up, which generates a sine wave and then tries to detect its pitch! It works "sorta" well!

Since it only tests 128 different offsets over a 1-octave range (wavelengths from 127 to 255 pixels), it doesn't have very good pitch resolution, so it's probably not quite up to snuff as a musical tuner yet, but you might improve on my methods by:

  • gluing together several successive audio frames into a much longer audio frame so you can crop and compare bigger windows
  • checking much larger ranges of offsets, maybe refining your pitch estimates by checking the 2x, 3x, etc. correlations
  • other stuff, I guess?
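On resolution: at 48 kHz, neighbouring integer lags near 127 samples are about 13.6 cents apart, and near 255 samples about 6.8 cents apart, which is coarse for tuning. A standard refinement (not from the post) is parabolic interpolation: fit a parabola through the scores at the best lag and its two neighbours, and take its vertex as a fractional lag. It suits a squared difference better than an absolute one, because the squared version is smooth and roughly parabolic near its minimum, so this Python sketch uses squared rather than absolute differences; the 321 Hz test tone and all names are illustrative:

```python
import math


def squared_difference(samples, lag, window):
    """Sum of (x[i] - x[i + lag])^2: smooth and roughly parabolic near a true period."""
    return sum((samples[i] - samples[i + lag]) ** 2 for i in range(window))


def estimate_pitch_refined(samples, sample_rate=48000, min_lag=127, max_lag=255):
    """Return (best_integer_lag, refined_frequency_hz) using parabolic interpolation."""
    window = len(samples) - max_lag
    scores = {lag: squared_difference(samples, lag, window)
              for lag in range(min_lag, max_lag + 1)}
    best = min(scores, key=scores.get)
    refined = float(best)
    if min_lag < best < max_lag:  # need a neighbour on each side
        a, b, c = scores[best - 1], scores[best], scores[best + 1]
        curvature = a - 2 * b + c
        if curvature > 0:
            # Vertex of the parabola through (-1, a), (0, b), (1, c).
            refined = best + 0.5 * (a - c) / curvature
    return best, sample_rate / refined


if __name__ == "__main__":
    sr = 48000
    tone = [math.sin(2 * math.pi * 321.0 * i / sr) for i in range(2048 + 255)]
    best, freq = estimate_pitch_refined(tone, sr)
    print(best, sr / best, round(freq, 2))
```

For the 321 Hz tone, the best integer lag is 150 samples (320 Hz, a full 1 Hz flat), while the interpolated lag lands within a small fraction of a sample of the true period of about 149.53 samples. A longer buffer, such as several frames glued together, also gives a larger comparison window.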

Submitted by cwilms-loyalist

Wow, that's an interesting idea, Unicode! Thanks. I quickly reworked your composition to use the built-in mic on my MacBook, and while it isn't displaying the correct frequency for the tone I'm playing from my tone generator, it IS displaying a fairly stable reading, so I think it could be made to display the correct reading with some calibration!

I will admit I didn't get much farther with this project: one of my kids turned out to be close to a musical genius with the clarinet and caught up with the class in only 3-4 weeks' worth of lessons, and my other one is doing alright with the trumpet as well, so the urgency of the project kind of fell away. It is still an interesting idea, though, so I may keep working on it using your approach. Thanks so much!