The Scenario

At Northeast, we are currently split between Dante for sending audio over the network and NDI for sending video. However, the companies behind both of these protocols are interested in taking the other’s turf, with Dante AV adding video support and NDI Audio pushing audio-only NDI streams. As an architect, I am keenly interested in seeing if we can simplify our tech stack by standardizing on one or the other long term.

NDI 5 shipped earlier this week, and with it, NewTek has begun their push into audio. One of the things I’ve really enjoyed about NDI over Dante is that NDI can be run for free in software. I can easily throw together proofs of concept on my desk at home using machines and software that I already have. With that, I wanted to try out the new NDI audio features to see how they worked and get a feel for how they might function at our church.

The Testing

At Northeast, the two applications we use most heavily with NDI are ProPresenter and Wirecast. I wanted to simulate generating a multi-channel audio signal, capturing it on my Mac, and then streaming it via NDI over the “network” to these applications. Additionally, we occasionally have ad-hoc NDI clients, so I wanted to get a feeling for how they would perform.

I did most of my testing on an M1 MacBook Air, with the NDI streams never leaving the Mac. This gives NDI the best chance to perform without my home wifi network interfering. When I wanted to try with an external client, I used an iPad running the third-party Sienna NDI Monitor app.

I used three sources in my testing: a mic plugged into my MOTU M4, Apple’s built-in Music app, and Overcast playing a podcast. Using two audio apps let me simulate having two different “singers”, while the mic gave me a clearer idea of my own latency.

I used Rogue Amoeba’s Loopback to create a virtual splitter. This would take the outputs from my mic and the two apps and then send them to my bookshelf speakers, the MacBook Air’s speakers, and the NDI Audio virtual output. Further, I had the Music app sent to channels 3 and 4 of the NDI Audio output so that I could dive into testing multi-channel audio. Loopback has toggle switches for each of the three outputs, so I can easily switch between them to listen for differences in quality and latency.

I then configured Wirecast and ProPresenter to listen for the NDI stream from the NDI Audio virtual output. In Wirecast, I set up two shots: One with channels 1 and 2 of the NDI stream (so the mic and Overcast), and one with channels 3 and 4 (so Music). I put these shots in different layers.

wirecast_ndi_layers.png

In ProPresenter, I set up a new audio input with the source set to the NDI stream. ProPresenter incorrectly identified this NDI stream as having 16 channels. I manually selected channels 1 - 4 as being active, mapping the odds to left and the evens to right.

propresenter_ndi_input.png
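To make that channel mapping concrete, here is a minimal sketch of an odds-to-left, evens-to-right downmix. It assumes the active channels are simply summed with no gain or panning applied, which is my guess at the behavior rather than anything ProPresenter documents, and the function name `downmix_to_stereo` is made up for illustration.

```python
def downmix_to_stereo(frame, active_channels=(1, 2, 3, 4)):
    """Mix one multi-channel frame down to a (left, right) pair.

    `frame` holds one sample per channel. Channel numbers are 1-based,
    matching how they are labelled in the text. Odd channels go left and
    even channels go right. Plain summing is an assumption.
    """
    left = sum(frame[ch - 1] for ch in active_channels if ch % 2 == 1)
    right = sum(frame[ch - 1] for ch in active_channels if ch % 2 == 0)
    return left, right


# A 16-channel frame, as ProPresenter reported, with only 1-4 carrying signal.
frame = [0.10, 0.20, 0.30, 0.40] + [0.0] * 12
left, right = downmix_to_stereo(frame)
print(f"left={left:.2f} right={right:.2f}")
```

With this mapping, the mic (channel 1) and the left side of Music (channel 3) land in the left output, while Overcast (channel 2) and the right side of Music (channel 4) land in the right.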

The Results

Does it actually work?

Everything works exactly as expected! Once everything is “wired” up in software, I can toggle the switches in Loopback to send the audio from the mic and the two apps to any combination of my bookshelf speakers, my laptop speakers, or the NDI virtual out. If the NDI virtual out is on, then Wirecast and ProPresenter will receive a signal. Turning on their own outputs will send that audio through my Mac’s main system out (and thus to my bookshelf speakers).

Wirecast correctly detects that this is an audio-only feed and behaves correctly. In the shots UI, it looks exactly the same as a Dante Virtual Soundcard input. ProPresenter is not so smart. While I am able to add this feed as an “audio” feed, it still appears under the sources list as a “video” source.

Additionally, on my iPad’s NDI client, I am able to tune in to the NDI Virtual Out over wifi exactly as one would expect. I am able to stream audio to the iPad, and the Mac is none the wiser that there is a client on the other end. It is exactly as you would expect for NDI video, except, well, audio.

How is the lag?

As a control, playing through both the bookshelf speakers and the laptop speakers is in perfect sync. It’s hard for me to hear that both are playing without going out of my way to listen to each set of speakers.

Unsurprisingly, with both the bookshelf speakers and NDI output turned on, there is a noticeable delay between when the bookshelves play a sound and when the NDI output plays the same sound. Likewise, without any other outputs, the delay between pressing play and hearing music from NDI is noticeable. I don’t have the tools to precisely measure it. However, I wouldn’t be surprised if the latency is in the ballpark of 10 ms. It’s very short, but it’s extant.
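For anyone who wants to put a number on the delay, one common approach is to record the direct signal and the NDI signal at the same time and cross-correlate the two recordings. The sketch below is a rough, pure-Python version of that idea run on synthetic data. It is not something I ran against real recordings, and the function names and the 48 kHz sample rate are assumptions.

```python
import random


def estimate_delay_samples(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Only compare the region where both signals overlap at this lag.
        n = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def samples_to_ms(samples, sample_rate=48_000):
    """Convert a sample count to milliseconds at the given sample rate."""
    return 1000.0 * samples / sample_rate


# Synthetic check: a noise burst, and the same burst arriving 480 samples late.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 480 + reference

lag = estimate_delay_samples(reference, delayed, max_lag=1000)
print(lag, "samples =", samples_to_ms(lag), "ms")
```

The brute-force loop is fine for a short clip. For longer recordings, an FFT-based correlation would be the usual choice.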

Much more surprising to me is that the latency is inconsistent within the Mac. Several times, I had both Wirecast and ProPresenter active at the same time. Seeing as the network traffic is never leaving the machine, I hypothesized this would have just resulted in hearing my music twice as loud as both apps output the same signal with negligible time difference. This was completely wrong.

The time difference between the two apps is enough to be both noticeable and aggravating. Once Dante Virtual Soundcard for M1 ships later this year, I want to try the same test to see if I get similar results. Is this a quirk of NDI, or just differences in the processing pipelines of the two apps? Neither app was really doing anything with the signal other than bouncing it to the main system out, but who knows what’s happening in the bowels of those giant applications.

How is reliability?

But here we get to the big one. This stuff has to work, it has to work every time, and it has to work every instant. If it’s not rock solid, I can’t deploy it, full stop. Seeing as the traffic is not even leaving my machine, and my machine is a laughably overpowered M1, this should be NDI Audio’s time to shine.

Unfortunately, shine it does not. To test this, I just left an album playing in Music, piped through NDI to ProPresenter to my speakers. I went about my business as it played. I didn’t do anything that would push my machine hard: I just read RSS, messaged some people on Discord, and checked Twitter. Nothing any harder than what an operator using a streaming or presenting machine would be doing on a Sunday morning. I would expect the signal from my speakers to be just as solid as if I were playing it normally instead of through this crazy pipeline.

Several times over the course of half an hour, the sound cut out. Just a blip, just for a few milliseconds, but you couldn’t miss it. Enough that if it happened to me in production, I’d be getting text messages. Enough that I can’t ship it. A little blip in video is easy to look past. A little blip in audio is much more difficult.
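A test like this could be made less subjective by recording the NDI output and scanning the recording for short runs of silence. Here is a minimal sketch of that idea. The threshold, the 2 ms minimum, and the function name `find_dropouts` are all assumptions for illustration, and genuine quiet passages in the music would be flagged too.

```python
def find_dropouts(samples, sample_rate=48_000, threshold=1e-4, min_ms=2.0):
    """Return (start_seconds, duration_ms) for each silent run of at least min_ms."""
    min_len = int(sample_rate * min_ms / 1000)
    dropouts = []
    run_start = None
    # A trailing loud sentinel closes any silent run that reaches the end.
    for i, s in enumerate(list(samples) + [1.0]):
        silent = abs(s) < threshold
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            length = i - run_start
            if length >= min_len:
                dropouts.append((run_start / sample_rate, 1000 * length / sample_rate))
            run_start = None
    return dropouts


# Synthetic check: one second of steady signal with a 5 ms gap at the halfway mark.
signal = [0.5] * 48_000
signal[24_000:24_240] = [0.0] * 240
print(find_dropouts(signal))
```

Single-sample zero crossings are ignored because they fall well under the minimum run length.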

Conclusion

I still think this is a really interesting technology and one that I want to keep my eye on. We’ve been well served by NDI video since we switched to it around a year ago. We’ve been investing more and more into it, using it to send feeds in all directions across the church. It’s been doing a great job there. Using it for audio looks very promising as well, and the routing is all there. It just needs to be rock solid, not almost rock solid.