Kunquat Developer Blog: Performance benefits of rendering audio in larger chunks

I recently ran into performance issues in Kunquat as I was working on some new music (hopefully not too many months away from release). Large bursts of near-simultaneous notes produced load peaks high enough to cause audio drop-outs even on my high-performance laptop. In this post, I explain the bottleneck and how I was able to alleviate it.

Understanding the performance issue requires some background knowledge of the Kunquat audio rendering system. The playback logic is divided into two main systems: the sequencer and the signal processing system. The sequencer is responsible for reading event triggers (i.e. notation) from pattern data and processing the events at the right times. The signal processing system keeps track of internal states of all the processors that create or modify signal data (these states include information such as current waveform phase, sample position, filter states, etc.). The signal processing system also produces the final audio output by combining all the signal data.

In principle, the sequencer of Kunquat splits the pattern data into slices that may contain event triggers only at the very beginning:

An example of splitting notation data into four slices at locations where event triggers are found. Observe how the notes F5, B♭2 and F3 span multiple slices.

The basic process of converting each slice into audio is fairly straightforward: first process all the event triggers at the beginning of the slice, and then use the signal processing system to produce audio for the duration of the slice.
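The slice-by-slice process can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Kunquat code: the function names, the `(frame, event)` trigger representation and the two callbacks are assumptions made for this example, and all trigger positions are assumed to lie inside the rendered range.

```python
from collections import defaultdict


def split_into_slices(triggers, total_frames):
    """Split [0, total_frames) at every distinct trigger position.

    `triggers` is a list of (frame, event) pairs with
    0 <= frame < total_frames (a hypothetical representation).
    Returns a list of (start, stop, events_at_start) tuples.
    """
    events_at = defaultdict(list)
    for frame, event in triggers:
        events_at[frame].append(event)

    # Slice boundaries: the start, every trigger position, and the end.
    bounds = sorted({0, total_frames, *events_at})
    return [(start, stop, events_at.get(start, []))
            for start, stop in zip(bounds, bounds[1:])]


def render(triggers, total_frames, process_event, render_audio):
    """Process the triggers of each slice, then render audio for its duration."""
    for start, stop, events in split_into_slices(triggers, total_frames):
        for event in events:
            process_event(event)   # sequencer side
        render_audio(start, stop)  # signal processing side


# Made-up example: two notes at frame 0, then one note each at 480 and 960.
example = [(0, "F5"), (0, "Bb2"), (480, "C4"), (960, "F3")]
for piece in split_into_slices(example, 1440):
    print(piece)
```

Every distinct trigger position adds one more call to `render_audio`, which is the detail that matters for the rest of this post.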

A key performance characteristic of the Kunquat audio rendering system (and just about any complex audio synthesiser) is the significant amount of overhead associated with each call of the signal processing system. Whenever we request more audio data, the system needs to run a lot of set-up code in order to calculate new signal inputs, traverse the internal data structures, and find out how to continue audio rendering after the previous call. Therefore, it takes less work to render a single large chunk of audio in one call than to render the same amount of audio in several small chunks.
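A toy cost model makes the effect of the per-call overhead concrete. The formula and the numbers below are assumptions chosen purely for illustration; they are not measurements from Kunquat.

```python
def render_cost(total_frames, chunk_frames, call_overhead, frame_cost):
    """Estimate the work needed to render `total_frames` in fixed-size chunks.

    Each call pays a constant `call_overhead` on top of `frame_cost` per
    frame. Both values are in arbitrary, made-up work units.
    """
    calls = -(-total_frames // chunk_frames)  # ceiling division
    return calls * call_overhead + total_frames * frame_cost


# 2048 frames in one call versus the same frames in 64-frame chunks.
print(render_cost(2048, 2048, call_overhead=200, frame_cost=1))  # 2248
print(render_cost(2048, 64, call_overhead=200, frame_cost=1))    # 8448
```

In this model the per-frame work is identical in both cases; the whole difference comes from paying the set-up cost 32 times instead of once.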

Now, consider what happens when we render the following section which might represent a guitar strum:

A zoomed-in representation of a guitar strum, and the way Kunquat 0.9.2 and earlier versions divide signal processing work.

In this case, rendering of each note is interrupted at locations where the sequencer needs to process the next note, causing additional overhead. Furthermore, each of these events would also interrupt processing of other notes played at the same time with different instruments, slowing audio processing down even further.

However, we can process this section much more efficiently. In Kunquat, a note playing in one channel cannot alter the state of notes playing in other channels. Therefore, it is possible to process the audio associated with each channel separately, and we can generate the audio with fewer splits:

The new approach in dividing signal processing work in Kunquat.
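To get a feel for the difference, here is a simplified model that counts render calls for a strum under both strategies. It assumes one note per channel, that every note keeps sounding until the end of the buffer, and that each active note costs one call per slice. The function names and the `{channel: start_frame}` input are made up for this sketch and do not reflect the internals of Kunquat.

```python
def voice_calls_global(note_starts, total_frames):
    """Count render calls when every trigger splits all channels."""
    bounds = sorted({0, total_frames, *note_starts.values()})
    calls = 0
    for start, _stop in zip(bounds, bounds[1:]):
        # A note is rendered in this slice if it has already started.
        calls += sum(1 for note_start in note_starts.values()
                     if note_start <= start)
    return calls


def voice_calls_per_channel(note_starts, total_frames):
    """Count render calls when each channel is split only at its own trigger."""
    # Each note is rendered in a single call from its start to the buffer end.
    return sum(1 for note_start in note_starts.values()
               if note_start < total_frames)


# A six-note strum: one note per channel, 100 frames apart.
strum = {channel: channel * 100 for channel in range(6)}
print(voice_calls_global(strum, 2048))       # 21
print(voice_calls_per_channel(strum, 2048))  # 6
```

In this model the global strategy grows quadratically with the number of notes in the strum (1 + 2 + ... + n calls), while the per-channel strategy grows linearly.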

In practice, modifying the sequencer of Kunquat to support this more efficient slicing strategy was not straightforward. First, many event types in Kunquat notation affect more than the state of a single channel (e.g. tempo adjustments and instrument-specific commands), in which case we still need to introduce a global breakpoint in signal processing. Second, the event system of Kunquat can generate almost any event on the fly based on current environment state, which means we cannot easily preprocess the pattern data into a more convenient form. Finally, audio rendering of Kunquat is designed to operate without memory allocations, which causes a number of issues in resource management.
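The first complication can be illustrated with a small sketch that computes split points per channel. It deliberately ignores the other two: it assumes all events are known up front and it allocates memory freely, neither of which holds in the real audio rendering code. The `(frame, channel, is_global)` event tuples and the function name are hypothetical.

```python
def channel_boundaries(events, total_frames):
    """Return {channel: sorted split points} for one audio buffer.

    `events` is a list of (frame, channel, is_global) tuples. An event
    marked as global (e.g. a tempo change) splits every channel; any
    other event splits only the channel it belongs to.
    """
    global_points = {frame for frame, _, is_global in events if is_global}
    channels = sorted({channel for _, channel, _ in events})

    boundaries = {}
    for ch in channels:
        own_points = {frame for frame, channel, _ in events if channel == ch}
        boundaries[ch] = sorted({0, total_frames} | global_points | own_points)
    return boundaries


# Three channel-local notes and one global event at frame 300.
events = [(0, 0, False), (100, 1, False), (200, 2, False), (300, 0, True)]
for ch, points in channel_boundaries(events, 1000).items():
    print(ch, points)
```

Here the notes at frames 100 and 200 split only their own channels, whereas the global event at frame 300 appears as a split point in all three.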

In the end, I believe the performance gains justify the increased complexity in the sequencer. Here is a comparison of time usage with the most demanding use case that I currently have:

Performance comparison between old and new slicing strategies (single-threaded). Around the -50 second mark is the peak utilisation of the signal processing system (over 800 calls to signal processing units during a single audio data request!).

While the comparison certainly looks impressive, the graphs are also misleading. This optimisation specifically targets the highest peaks in computation time while leaving the rest of the performance characteristics mostly unaffected. In any case, the benefits for real-time playback are clear.

In conclusion, maximising the amount of audio data you produce in a single pass is one of the most important optimisations you can make in your audio system. The impact may not be obvious in profiler output, but it is significant where performance matters the most.

2019-01-26
