Serious performance drop in macOS 10.12

Vuo 1.2.4 composition CPU load under 10.11

Vuo 1.2.4 composition CPU load under 10.12

 

Vuo 1.2.4 performance within the composition under OS 10.11

Vuo 1.2.4 performance within the composition under OS 10.12

 

Looking at the numbers on the Build List, Sort List, and Process List nodes (either half or a quarter of what they are under OS 10.11), maybe it has something to do with Vuo not properly detecting the CPU cores and threads?

@Kewl, I appreciate your taking the time to investigate the problem.

Some of our team have been running Vuo frequently on macOS 10.12 and haven’t noticed any blatant performance drops like the one you’re experiencing. It could possibly be related to the combination of 10.12 and your GPU. What GPU do you have?

The slowdown with Build List and Process List may be due to whatever is happening within their feedback loop, rather than those nodes themselves. Can you identify any particular part of the composition that is slow (i.e., removing it from the composition significantly improves performance)?

I have seen the slowdown on these Mac models:

http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.3-15-dual-graphics-late-2013-retina-display-specs.html

http://www.everymac.com/systems/apple/mac_pro/specs/mac-pro-quad-core-3.7-xeon-e5-gray-black-cylinder-late-2013-specs.html

http://www.everymac.com/systems/apple/mac_pro/specs/mac-pro-eight-core-3.0-xeon-e5-gray-black-cylinder-late-2013-specs.html

As for the GPU, on the MacBook Pro, it’s the NVIDIA GeForce GT 750M, and on the Mac Pros, it’s the AMD FirePro D300. The screenshots I posted are from the 2013 Mac Pro 8-core 3.0 GHz.

In my problematic composition, there are two sections where the performance is cut in half when I use OS 10.12.

The 1st section is circumscribed by a Build List: going into that section, the performance is at 25 events/sec, going out is between 12 and 16 events/sec. Following the 1st section, the 2nd section is circumscribed by a Process List: going into that section, the performance is between 12 and 16 events/sec, going out is between 6 and 8 events/sec.

What is similar in these two sections:

  • 1st section has two Calculate nodes, one Average and one Make 4D point;
  • 2nd section has six Calculate nodes and one Make 2D point.

Any chance it’s just affecting a specific node and that’s slowing down the whole composition?

My hunch is that it’s a specific node type.

We were able to reproduce the performance drop in this composition in 10.12 as compared to 10.11. (Thanks for emailing it, @Kewl.)

After experimenting with different variations on the composition, and taking time profiles to see how much time was spent in each part of the code, we found that the problem is not specific to any node or node type. It has to do with low-level code that is used to synchronize different parts of the composition as they run in parallel. A data structure called a semaphore, which is part of Apple’s Grand Central Dispatch library, is now taking about 3x longer to do its job in 10.12. We’ll need to improve our code to avoid relying on semaphores so much.

Thanks for looking into it!

Since the semaphore is now taking three times as long in 10.12, maybe that could be considered a bug and reported as such to Apple?

Ha. In further testing, we found that the dispatch semaphore was 12x to 27x slower on 10.12 than on 10.11 when under heavy usage by multiple threads. We have reported the bug to Apple (rdar://29473993).

Not knowing when or if they’ll fix it, we went ahead and switched to an alternative synchronization thingy called an atomic spinlock in the parts of the code that were most affected by the slowdown. This fixes the slowdown on 10.12 and also improves performance in 10.11. To be included in Vuo 1.2.5.
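For readers wondering what an atomic spinlock looks like, here is a minimal sketch in portable C++11. It is not Vuo's actual code: the `SpinLock` class and the `countWithSpinLock` helper are hypothetical names made up for illustration, and the `yield()` call inside the loop is just one possible way of waiting.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical illustration of an atomic spinlock; not taken from Vuo's source.
class SpinLock {
public:
    void lock() {
        // test_and_set() returns the previous value, so keep retrying
        // until this thread is the one that flips the flag from clear to set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // let other threads run while we wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Starts `threads` workers that each increment a shared counter
// `iterations` times while holding the spinlock, then returns the total.
long countWithSpinLock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section: kept very short on purpose
                lock.unlock();
            }
        });
    }
    for (auto &worker : workers) worker.join();
    return counter;
}
```

A spinlock like this tends to suit very short critical sections, because a waiting thread keeps retrying instead of going to sleep. If the lock were held for a long time, that retrying would waste CPU.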

Wow! Good news! Thanks.

Version 1.2.5, oh, Gaaaawwwwd, thank you.

Is the performance similar now on macOS 10.11 and macOS 10.12, or is 10.11 still a bit better?

:) They’re now similar.

So I’m guessing you have eliminated any dependency on Apple’s Grand Central Dispatch library?

I created a 10.11 partition on my MacBook Pro just to run Vuo, but if the performance is the same with 10.12, I’ll ditch the 10.11 partition…

No, we haven’t eliminated the dependency on Grand Central Dispatch. The other parts of it besides semaphores (dispatch queues, dispatch groups, etc.) are still working fine and are quite useful. Semaphores are sometimes OK, too; an occasionally-used one doesn’t make any noticeable difference. The noticeable slowdown is when you have a frequently-used semaphore that is under heavy usage by multiple threads. That is what we fixed in Vuo 1.2.5.
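To make "a frequently-used semaphore that is under heavy usage by multiple threads" concrete, here is a rough sketch of that access pattern. It assumes a portable stand-in semaphore built from a mutex and a condition variable, not Grand Central Dispatch's `dispatch_semaphore`, so it will not reproduce the 10.12 slowdown. The names `Semaphore`, `ContentionResult`, and `hammerSemaphore` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for a counting semaphore (assumption: this is NOT
// how Grand Central Dispatch implements its semaphores).
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}

    void wait() {
        std::unique_lock<std::mutex> guard(mutex_);
        available_.wait(guard, [this] { return count_ > 0; });
        --count_;
    }

    void signal() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++count_;
        }
        available_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    int count_;
};

struct ContentionResult {
    long counter;         // total increments performed by all threads
    double milliseconds;  // wall-clock time for the whole run
};

// Many threads repeatedly wait on and signal one shared semaphore.
ContentionResult hammerSemaphore(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Semaphore gate(1);  // initial value 1, so it acts as a lock
    long counter = 0;
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                gate.wait();
                ++counter;
                gate.signal();
            }
        });
    }
    for (auto &worker : workers) worker.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return {counter, elapsed.count()};
}
```

Calling `hammerSemaphore(1, n)` and then `hammerSemaphore(8, n)` and comparing `milliseconds` gives a feel for how much contention costs on a given machine. The numbers depend heavily on the OS and hardware.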

In any case, I’m glad that my composition highlighted the problem in such an obvious manner. And good timing on the update since I’m doing a presentation on my A/V work on Thursday. Cheers!

Has the dispatch semaphore bug reared its ugly head again?

I have a composition where I add these nodes: Adjust Image Colors, Blur Image, and Apply Mask. Without these nodes, the composition uses around 300% CPU and there are no dropped events. When I insert the three nodes, CPU load drops to about 100% to 150%, but now with a lot of dropped events.

Without the three nodes:

[Screenshots: test without mask blur 2.png, test without mask blur 3.png]

With the three nodes:

[Screenshots: test mask blur 2.png, test mask blur 3.png]

That sounds like a different issue. Since the nodes cause CPU load to go down and events to be dropped, the bottleneck now is probably the GPU.

You could try the image filters one at a time to see which one is affecting performance the most. I’d guess the Blur with radius 16. If so, you could use Resize Image to reduce the amount of work that Blur has to do.
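As a rough back-of-the-envelope check on why resizing first could help, here is a small sketch. It assumes the blur's cost grows roughly in proportion to the number of pixels it processes, which is a simplification rather than a statement about how Vuo's Blur Image node is implemented. The 1920x1080 size in the comments is an example, not the size used in the composition above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of pixels in an image of the given dimensions.
std::uint64_t pixelCount(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::uint64_t>(width) * height;
}

// Fraction of the original pixels left after scaling both dimensions
// by `scale` (for example, 0.5 halves the width and the height).
double remainingPixelFraction(double scale) {
    assert(scale > 0.0 && scale <= 1.0);
    return scale * scale;
}

// Example with an assumed 1920x1080 image:
//   pixelCount(1920, 1080)      is 2073600 pixels
//   pixelCount(960, 540)        is 518400 pixels
//   remainingPixelFraction(0.5) is 0.25, a quarter of the pixels
```

If the image is scaled down before blurring, the blur radius would presumably need to be scaled down by the same factor (for example, 16 becomes 8 at half size) to keep a similar look.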

OK, thanks.