Audio Engine Architecture
This document defines the high-level architectural decisions for the real-time audio engine. The engine must coordinate real-time audio processing with file I/O and network operations without introducing timing glitches, while managing unlimited-length recordings without real-time memory allocation.
Overview
The architecture uses two threads to maintain real-time performance guarantees. The real-time thread handles all audio processing including capture, playback, mixing, and state machine updates. This thread cannot allocate memory, perform file operations, or make blocking system calls.
The I/O thread runs a Tokio runtime and manages async tasks for file operations and the OSC server. These subsystems communicate with the real-time thread through lock-free ring buffers.
```mermaid
graph LR
    RT[Real-Time Thread<br/>JACK Process<br/>Audio Processing]
    IO[I/O Thread<br/>Tokio Runtime<br/>File Operations<br/>OSC Server]
    IO -->|Pre-allocated Buffers| RT
    IO -->|Loaded / Saved Audio| RT
    RT -->|Tracks for Saving| IO
    RT -->|Status Updates| IO
```
The system maintains audio state across three levels of organization. Individual tracks contain the actual recorded audio data organized as linked lists of pre-allocated buffers. Columns group tracks that share common timing behavior. Global state coordinates overall system behavior including volume and timing synchronization.
Buffer management employs a pre-allocated pool strategy to eliminate real-time memory allocation. Once buffers have been written, ownership of a track's audio data is shared between threads by reference counting. The I/O thread can therefore save a track without copying its audio, and commands and data transfers never block the critical audio processing path.
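The pool strategy can be sketched as follows. The sketch assumes fixed-size chunks and uses a `Vec` as a free-list stack whose capacity is reserved once. `AudioChunk`, `BufferPool` and `CHUNK_FRAMES` are hypothetical names. In the engine, refills would arrive over a channel from the I/O thread rather than through a direct method call. `pop`, and any `push` that stays within the reserved capacity, never reallocate, so the RT-side calls do not touch the allocator.

```rust
use std::sync::Arc;

/// Frames per chunk (an assumed value for illustration).
pub const CHUNK_FRAMES: usize = 1024;

/// Hypothetical fixed-size chunk of mono samples.
pub struct AudioChunk {
    pub samples: Box<[f32]>,
}

/// Free list of pre-allocated chunks.
pub struct BufferPool {
    free: Vec<Arc<AudioChunk>>,
}

impl BufferPool {
    /// I/O thread: performs every allocation up front.
    pub fn with_capacity(count: usize) -> Self {
        let mut free = Vec::with_capacity(count);
        for _ in 0..count {
            free.push(Arc::new(AudioChunk {
                samples: vec![0.0; CHUNK_FRAMES].into_boxed_slice(),
            }));
        }
        BufferPool { free }
    }

    /// RT thread: O(1) and allocation-free. Returns None when the pool is
    /// empty instead of falling back to the allocator.
    pub fn take(&mut self) -> Option<Arc<AudioChunk>> {
        self.free.pop()
    }

    /// Restock the pool. Chunks beyond the reserved capacity are handed back
    /// (not dropped here) so the Vec never reallocates.
    pub fn refill(&mut self, chunk: Arc<AudioChunk>) -> Result<(), Arc<AudioChunk>> {
        if self.free.len() < self.free.capacity() {
            self.free.push(chunk);
            Ok(())
        } else {
            Err(chunk)
        }
    }

    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

Returning `None` on exhaustion, rather than allocating a fresh chunk, leaves the caller to decide how to degrade.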
Multi-Track Matrix Organization
The system implements a 5×5 matrix of tracks organized into columns and rows:
- Columns (5): Share synchronized timing behavior. The first recording in a column establishes the beat count for all subsequent recordings in that column.
- Rows (5): Individual tracks within each column that can be recorded and played independently.
- Total Tracks (25): Each cell in the matrix can contain one audio loop with independent state and volume.
Column Synchronization and Sync Offsets
When multiple tracks are recorded in the same column at different times, they must maintain their relative timing relationships. This is achieved through sync offsets that track where each recording started relative to the column's beat cycle.
Detailed Example: Understanding Sync Offsets
Step 1: First Recording (Row 1)
```
Column Beat: | 1 | 2 | 3 | 4 |
Row 1:       | A | Bb| B | C |
```
- Buffer contains: [A, Bb, B, C]
- Sync offset: 0 (defines beat 1 for the column)
- Column beat count: 4
Step 2: Second Recording (Row 2, starting at column beat 3)
```
Column Beat: | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 |
Row 1:       | A | Bb| B | C | A | Bb| B | C |
Row 2:       |   |   | Dm|Ebm| Em| Fm|   |   |
                       ^ recording starts
```
- Row 2 buffer contains: [Dm, Ebm, Em, Fm] (recording order)
- Row 2 sync offset: 2 (buffer[0] corresponds to column beat 3)
Step 3: Understanding Playback Mapping
Row 2's logical relationship to column beats:
```
Column Beat:   | 1 | 2 | 3 | 4 |
Row 2 Content: | Em| Fm| Dm|Ebm|
Buffer Index:  | 2 | 3 | 0 | 1 |
```
Key Insight: When Row 2 plays alone, it should start with Em (the chord recorded during column beat 1), not Dm (the first recorded chord).
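The mapping in the table reduces to modular arithmetic. This sketch works in beats to match the example; the engine would presumably apply the same formula to sample positions. `buffer_index` is a hypothetical helper. Beats are 0-based, so index 0 is column beat 1.

```rust
/// Maps a column beat (0-based: 0 is "beat 1") to an index into a track's
/// buffer, which is stored in recording order. `sync_offset` is the column
/// beat at which the recording started; `beats` is the column beat count.
pub fn buffer_index(column_beat: usize, sync_offset: usize, beats: usize) -> usize {
    // Adding `beats` before subtracting keeps the unsigned arithmetic from underflowing.
    (column_beat % beats + beats - sync_offset % beats) % beats
}
```

For Row 2 (offset 2, four beats), column beats 1 to 4 map to buffer indices 2, 3, 0, 1. That gives Em, Fm, Dm, Ebm.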
Synchronized Playback Scenarios
Scenario 1: Row 2 starts alone (from stopped state)
- Always starts at column beat 1 = Em
- Plays: Em → Fm → Dm → Ebm → (loop)
Scenario 2: Row 1 joins while Row 2 plays Fm
- Row 2 playing Fm = column beat 2
- Next beat = column beat 3
- Row 1 starts at beat 3 = B
- We hear: B (Row 1) + Dm (Row 2)
Scenario 3: Both stopped, both start together
- Both start at column beat 1
- Row 1 plays A, Row 2 plays Em simultaneously
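The three scenarios follow from one rule. A track switched on in a running column joins on the next column beat. A track started in a stopped column begins at column beat 1. The hypothetical `join_beat` helper below captures this rule. Beats are 0-based. The test uses the column-aligned Row 2 order from Step 3, so the array index equals the column beat.

```rust
/// Column beat (0-based, 0 = "beat 1") at which a track that is switched on
/// will start. `sounding_beat` is the column beat currently playing in the
/// column, or `None` if no track in the column is playing.
pub fn join_beat(sounding_beat: Option<usize>, beats: usize) -> usize {
    match sounding_beat {
        // Column is running: state changes are quantized to the next beat.
        Some(beat) => (beat + 1) % beats,
        // Column is stopped: playback always starts at column beat 1.
        None => 0,
    }
}
```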
Audio Buffer Management and Ownership
Audio buffers use Arc-wrapped chunks to enable safe sharing between threads without sacrificing RT performance.
The IO thread pre-allocates all Arc<AudioChunk> instances and stocks the buffer pool, moving allocation costs away from the RT thread.
During recording, the RT thread builds linked lists of these Arc-wrapped chunks. When recording completes, the RT thread clones the root Arc and sends it to the IO thread for saving while retaining the original reference for immediate playback. This allows seamless transition from recording to playback without waiting for file operations.
The RT thread can replace track data at any time by swapping the root pointer, even during active save operations. If recording over existing data, the old root reference can be retained in case the recording is cancelled, enabling undo functionality.
The IO thread optimizes data layout for performance. When loading files, it creates single long buffers rather than linked lists. After saving, it consolidates linked lists into single buffers. This means only unsaved or actively-recording tracks have linked-list overhead, while frequently-played loops benefit from optimized layout.
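The following is a sketch of the sharing and root-swap pattern. A bounded `std::sync::mpsc::sync_channel` stands in for the real channel. The `Track` fields and method names are illustrative assumptions. On undo, the discarded take is handed back instead of dropped. This assumes that freeing memory should also stay off the RT thread.

```rust
use std::sync::mpsc::SyncSender;
use std::sync::Arc;

/// Hypothetical node of a track's recorded audio (singly linked list).
pub struct AudioChunk {
    pub samples: Box<[f32]>,
    pub next: Option<Arc<AudioChunk>>,
}

/// Hypothetical per-track slot on the RT thread.
pub struct Track {
    /// Root of the list currently used for playback.
    pub root: Option<Arc<AudioChunk>>,
    /// Previous take, retained so an overdub can be cancelled.
    pub undo_root: Option<Arc<AudioChunk>>,
}

impl Track {
    /// Recording finished: send a clone of the root to the I/O thread for
    /// saving and keep the original for immediate playback. Cloning an Arc
    /// only increments a reference count; no audio is copied.
    pub fn send_for_saving(&self, to_io: &SyncSender<Arc<AudioChunk>>) -> bool {
        match &self.root {
            Some(root) => to_io.try_send(Arc::clone(root)).is_ok(),
            None => false,
        }
    }

    /// Swap in a new root, e.g. when recording over existing data. Safe even
    /// if the I/O thread still holds a clone of the old list for saving.
    /// (A real implementation would also avoid dropping an older undo take here.)
    pub fn swap_root(&mut self, new_root: Arc<AudioChunk>) {
        self.undo_root = self.root.replace(new_root);
    }

    /// Cancel: restore the previous take. The discarded list is returned
    /// rather than dropped, so the caller can pass it to the I/O thread and
    /// keep deallocation off the RT thread.
    pub fn undo(&mut self) -> Option<Arc<AudioChunk>> {
        let discarded = self.root.take();
        self.root = self.undo_root.take();
        discarded
    }
}
```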
AudioData Enum for Sync Offset Support
To efficiently handle sync offsets while maintaining performance, audio data is represented by an enum that supports both unconsolidated (with offset) and consolidated (pre-aligned) formats:
- Unconsolidated: Contains Arc-wrapped linked list chunks with a sync offset. Used during and immediately after recording for instant playback capability.
- Consolidated: Contains a single optimized buffer where the sync offset has been applied during reordering. Used for long-term playback performance.
- Empty: No audio data present.
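The three variants map naturally onto a Rust enum. The field names follow the state hierarchy below. `sample_at` is a hypothetical accessor added to show how each variant resolves a column position to a sample; position 0 is column beat 1. Offsets and lengths are assumed to be in samples.

```rust
use std::sync::Arc;

/// Hypothetical node of recorded audio.
pub struct AudioChunk {
    pub samples: Box<[f32]>,
    pub next: Option<Arc<AudioChunk>>,
}

pub enum AudioData {
    /// No audio recorded or loaded.
    Empty,
    /// Chunk list in recording order; `sync_offset` is the column position
    /// at which the recording started.
    Unconsolidated { chunks: Arc<AudioChunk>, sync_offset: usize, length: usize },
    /// Single buffer already reordered so index 0 is column beat 1.
    Consolidated { buffer: Box<[f32]>, length: usize },
}

impl AudioData {
    /// Sample at `position` in the column cycle (position 0 = column beat 1).
    pub fn sample_at(&self, position: usize) -> f32 {
        match self {
            AudioData::Empty => 0.0,
            AudioData::Consolidated { buffer, length } => buffer[position % *length],
            AudioData::Unconsolidated { chunks, sync_offset, length } => {
                let len = *length;
                // Undo the offset, then walk the list to the chunk holding that index.
                let mut index = (position % len + len - *sync_offset % len) % len;
                let mut chunk: &AudioChunk = chunks;
                while index >= chunk.samples.len() {
                    index -= chunk.samples.len();
                    chunk = chunk.next.as_deref().expect("chunk list shorter than length");
                }
                chunk.samples[index]
            }
        }
    }
}
```

This sketch walks the chunk list from the root on every call. A real playback path would more likely keep a cursor into the current chunk. The extra offset arithmetic and traversal are why the Unconsolidated variant is the slower of the two.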
Consolidation Process
Workflow:
- Recording completes → Track has Unconsolidated data with sync offset
- RT thread sends Arc-wrapped chunks + sync offset to IO thread
- RT thread continues using Unconsolidated data (works fine, just slower)
- IO thread consolidates in background (reorders audio to remove offset)
- IO thread sends back Consolidated data via channel
- RT thread swaps in the Consolidated data at the start of the next process() call
The consolidation process reorders the audio data so that the first sample corresponds to column beat 1, eliminating the need for offset calculations during playback. Frequently played loops therefore avoid both the offset arithmetic and the linked-list traversal, while new recordings can still play immediately.
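The reordering step can be sketched as flattening the chunk list and rotating it right by the sync offset. The first recorded sample belongs at column position `sync_offset`, and a right rotation by that amount moves it there. `consolidate` is a hypothetical function intended to run on the I/O thread, and the offset is assumed to be in samples.

```rust
use std::sync::Arc;

/// Hypothetical node of recorded audio.
pub struct AudioChunk {
    pub samples: Box<[f32]>,
    pub next: Option<Arc<AudioChunk>>,
}

/// Runs on the I/O thread: flattens the chunk list into one buffer and
/// rotates it so that index 0 corresponds to column beat 1.
pub fn consolidate(root: &AudioChunk, sync_offset: usize, length: usize) -> Box<[f32]> {
    if length == 0 {
        return Vec::new().into_boxed_slice();
    }
    let mut flat: Vec<f32> = Vec::with_capacity(length);
    let mut chunk = Some(root);
    // Copy chunks in recording order until `length` samples are collected.
    while let Some(c) = chunk {
        let take = (length - flat.len()).min(c.samples.len());
        flat.extend_from_slice(&c.samples[..take]);
        chunk = c.next.as_deref();
    }
    assert_eq!(flat.len(), length, "chunk list shorter than track length");
    // flat[0] was recorded at column position `sync_offset`; rotating right by
    // that amount moves it there, so playback needs no offset arithmetic.
    flat.rotate_right(sync_offset % length);
    flat.into_boxed_slice()
}
```

Consider the Row 2 example with one value per beat. Rotating [Dm, Ebm, Em, Fm] right by 2 gives [Em, Fm, Dm, Ebm], which matches the Step 3 table.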
System State Hierarchy:
```
GlobalState
├── samples_per_beat: f32
├── sample_rate: u32
├── selected_cell: (usize, usize)
├── click_track_samples: [f32]
└── columns: ColumnState[]
    ├── columns[0]
    │   ├── beats: usize
    │   ├── column_position: usize (current sample position in column cycle)
    │   └── tracks: TrackState[]
    │       ├── tracks[0]
    │       │   ├── current_state: Idle | Recording | Playing | Solo
    │       │   ├── next_state: Idle | Recording | Playing | Solo
    │       │   ├── volume: f32
    │       │   └── audio_data: AudioData
    │       │       ├── AudioData::Unconsolidated
    │       │       │   ├── chunks: Arc<AudioChunk>
    │       │       │   ├── sync_offset: usize
    │       │       │   └── length: usize
    │       │       ├── AudioData::Consolidated
    │       │       │   ├── buffer: Box<[f32]>
    │       │       │   └── length: usize
    │       │       └── AudioData::Empty
    │       └── tracks[1] to tracks[4] ...
    └── columns[1] to columns[4] ...
```
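The hierarchy translates into plain structs. This sketch makes three assumptions: fixed-size arrays hold the 5×5 matrix, `selected_cell` is read as (column, row), and `audio_data` is left out to keep it short. `TrackMode`, `advance` and `selected_track_mut` are hypothetical names.

```rust
/// Matrix dimensions from the design: 5 columns × 5 rows.
pub const COLUMNS: usize = 5;
pub const ROWS: usize = 5;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode { Idle, Recording, Playing, Solo }

#[derive(Clone, Copy)]
pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
    pub volume: f32,
    // audio_data: AudioData is omitted in this sketch.
}

#[derive(Clone, Copy)]
pub struct ColumnState {
    /// Beat count, set by the first recording in the column (0 = not set yet).
    pub beats: usize,
    /// Current sample position in the column cycle.
    pub column_position: usize,
    pub tracks: [TrackState; ROWS],
}

pub struct GlobalState {
    pub samples_per_beat: f32,
    pub sample_rate: u32,
    /// Assumed to be (column, row).
    pub selected_cell: (usize, usize),
    pub click_track_samples: Box<[f32]>,
    pub columns: [ColumnState; COLUMNS],
}

impl ColumnState {
    /// Advance the column cycle by `frames`, wrapping at the loop length.
    pub fn advance(&mut self, frames: usize, samples_per_beat: f32) {
        let cycle = (self.beats as f32 * samples_per_beat) as usize;
        if cycle > 0 {
            self.column_position = (self.column_position + frames) % cycle;
        }
    }
}

impl GlobalState {
    /// Mutable access to the track in the currently selected matrix cell.
    pub fn selected_track_mut(&mut self) -> &mut TrackState {
        let (column, row) = self.selected_cell;
        &mut self.columns[column].tracks[row]
    }
}
```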
Ring buffers
Communication between real-time and non-real-time threads requires lock-free data structures to avoid blocking the RT thread. The Rust Kanal crate fulfills this requirement and provides both synchronous and asynchronous interfaces, making it ideal for bridging real-time and async-based systems.
The RT thread uses the synchronous, non-blocking interface for immediate data transfer. The async I/O thread uses the asynchronous interface, allowing it to efficiently wait for data availability without busy-wait loops.
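The calling pattern on the RT side looks like this. The sketch uses std's bounded `sync_channel` purely as a stand-in. It has the same try-send/try-receive shape but is not guaranteed to be lock-free. A real engine would use Kanal, whose API differs in detail, or another lock-free queue. `StatusUpdate`, `rt_send` and `rt_drain` are hypothetical.

```rust
use std::sync::mpsc::{Receiver, SyncSender};

/// Hypothetical status messages from the RT thread to the I/O thread.
#[derive(Debug, PartialEq)]
pub enum StatusUpdate {
    Beat(u64),
    XrunRecovered { missed_frames: u32 },
}

/// RT side: never blocks. A full (or disconnected) queue returns false so the
/// caller can drop or coalesce the message instead of waiting.
pub fn rt_send<T>(tx: &SyncSender<T>, msg: T) -> bool {
    tx.try_send(msg).is_ok()
}

/// RT side: handle whatever is queued right now, without waiting for more.
pub fn rt_drain<T>(rx: &Receiver<T>, mut handle: impl FnMut(T)) {
    while let Ok(item) = rx.try_recv() {
        handle(item);
    }
}
```

The I/O thread, by contrast, is free to wait. Using Kanal's asynchronous interface, it would await incoming messages rather than poll.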
Beat quantization and command execution
Musical timing relies on beat-quantized state changes rather than immediate command execution. Each track maintains both current state (playing, recording, solo, idle) and next beat state. Commands update the next beat state, which becomes active at the next beat boundary.
Audio processing follows these major steps:
1. Process updated audio buffers: Store audio data that the I/O thread has loaded from disk or consolidated.
2. Process MIDI commands: Update track volumes (immediately) and next beat states. If multiple commands arrive for the same track during a buffer, the last one wins.
3. Beat detection: Check whether a beat occurs during the current buffer and calculate its exact sample index.
4. Process audio:
   - Process samples up to the beat index
   - Copy next beat state to current state for all tracks at the beat boundary
   - Handle saving and deleting
   - Process remaining samples with the new states
5. MIDI output: Send transport control and song position pointer messages.
This approach avoids per-sample state checking while maintaining beat-accurate timing. Commands arriving near a beat boundary apply at that beat, providing musically appropriate timing even if not sample-accurate.
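The per-buffer flow can be sketched as follows. The sketch assumes beat positions come from an absolute sample counter and a fractional `samples_per_beat`. `Command`, `apply_commands`, `beat_index` and `process` are hypothetical. Saving and deleting at the boundary are omitted, and the `render` closure stands in for the real mixing work. Only the first beat in a buffer is handled, which assumes a beat is longer than a buffer.

```rust
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode { Idle, Recording, Playing, Solo }

#[derive(Clone, Copy)]
pub struct Track {
    pub current: TrackMode,
    pub next: TrackMode,
    pub volume: f32,
}

/// Hypothetical commands decoded from MIDI input.
pub enum Command {
    SetVolume { track: usize, volume: f32 },
    SetNext { track: usize, mode: TrackMode },
}

/// Command handling: volume applies immediately; state changes only touch
/// `next`, so a later command for the same track overwrites an earlier one.
pub fn apply_commands(tracks: &mut [Track], commands: &[Command]) {
    for command in commands {
        match *command {
            Command::SetVolume { track, volume } => { tracks[track].volume = volume }
            Command::SetNext { track, mode } => { tracks[track].next = mode }
        }
    }
}

/// Beat detection: index of the first beat in [buffer_start, buffer_start + frames).
/// Flooring the beat position assigns every beat to exactly one buffer.
pub fn beat_index(buffer_start: u64, frames: usize, samples_per_beat: f64) -> Option<usize> {
    let beat = (buffer_start as f64 / samples_per_beat).ceil();
    let beat_sample = (beat * samples_per_beat).floor() as u64;
    let offset = beat_sample.saturating_sub(buffer_start) as usize;
    (offset < frames).then_some(offset)
}

/// Render up to the beat with the old states, promote `next` to `current`,
/// then render the rest of the buffer with the new states.
pub fn process(
    tracks: &mut [Track],
    buffer_start: u64,
    frames: usize,
    samples_per_beat: f64,
    mut render: impl FnMut(&[Track], Range<usize>),
) {
    match beat_index(buffer_start, frames, samples_per_beat) {
        Some(beat) => {
            render(&*tracks, 0..beat);
            for track in tracks.iter_mut() {
                track.current = track.next;
            }
            render(&*tracks, beat..frames);
        }
        None => render(&*tracks, 0..frames),
    }
}
```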
Click track generation and routing
The click track provides an audible timing reference by playing a pre-computed sine wave tone at beat boundaries. JACK port configuration includes a separate mono click output, allowing users to route the click independently of the main program audio. Click generation runs alongside beat detection and triggers the pre-computed waveform when a beat occurs. Click volume and enable/disable are controlled through the standard command system, so both can be adjusted in real time.
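The following is a minimal click sketch. It assumes a linear fade-out on the pre-computed tone (not specified above) and a beat index supplied by beat detection. `make_click` and `ClickPlayer` are hypothetical. The tone is built once outside the RT thread, and `render` only reads from it.

```rust
use std::f32::consts::TAU;

/// Pre-compute the click once, outside the RT thread: a sine burst of
/// `frames` samples with an assumed linear fade-out to avoid a hard edge.
pub fn make_click(sample_rate: u32, freq_hz: f32, frames: usize) -> Box<[f32]> {
    (0..frames)
        .map(|i| {
            let t = i as f32 / sample_rate as f32;
            let fade = 1.0 - i as f32 / frames as f32;
            (TAU * freq_hz * t).sin() * fade
        })
        .collect()
}

/// Hypothetical RT-side click state.
pub struct ClickPlayer {
    /// Read position inside the pre-computed click, if it is sounding.
    pos: Option<usize>,
    pub volume: f32,
    pub enabled: bool,
}

impl ClickPlayer {
    pub fn new() -> Self {
        ClickPlayer { pos: None, volume: 1.0, enabled: true }
    }

    /// Fill the mono click output for one buffer. `beat` is the sample index
    /// reported by beat detection, if a beat falls inside this buffer.
    pub fn render(&mut self, click: &[f32], beat: Option<usize>, out: &mut [f32]) {
        for (i, sample) in out.iter_mut().enumerate() {
            if self.enabled && beat == Some(i) {
                self.pos = Some(0); // restart the tone on the beat
            }
            let current = self.pos;
            *sample = match current {
                Some(p) if p < click.len() => {
                    self.pos = Some(p + 1);
                    click[p] * self.volume
                }
                _ => {
                    self.pos = None;
                    0.0
                }
            };
        }
    }
}
```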
Xrun detection and recovery
Audio buffer underruns and overruns (xruns) disrupt the continuous flow of audio data and can break musical timing if not handled properly. JACK reports xruns only through a callback that runs on a separate thread. The RT thread therefore detects xruns itself, using the timing information available in its process scope, and recovers from them there.
Xrun recovery happens in stage 0 of audio processing, before the normal stages:
Stage 0 - Xrun Recovery:
- Check for time glitches: Detect if an xrun occurred and calculate missed sample count
- Calculate sample positions: Update sample index for every column based on missed time
- Maintain recording continuity: Add empty buffers to tracks currently recording to preserve timeline integrity
- Trigger saves: Send completed tracks to the I/O thread for saving
This approach maintains musical timing relationships across all tracks while accepting the temporal disruption. Recording tracks receive silent buffers for the missed duration, preventing timeline gaps. Playback positions advance by the missed sample count, wrapping at loop boundaries as needed.
If beats were missed during the xrun, the normal beat detection in subsequent stages will handle state transitions appropriately, since the missed time has already been accounted for in the position calculations.
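Stage 0 could look roughly like the sketch below. It assumes the process callback can read JACK's frame time for the start of each cycle; the C API exposes this as `jack_last_frame_time()`, a wrapping 32-bit `jack_nframes_t`. The expected value is the previous cycle's frame time plus its frame count. `Column`, `missed_frames` and `recover` are hypothetical, and the save trigger is omitted.

```rust
/// Hypothetical per-column state touched by xrun recovery.
pub struct Column {
    /// Sample position in the column cycle.
    pub position: usize,
    /// Loop length in samples (0 while nothing has been recorded).
    pub cycle_len: usize,
    /// Recorded length in samples of each track currently recording.
    pub recording_lengths: Vec<usize>,
}

/// Frames skipped between cycles. JACK frame time is a wrapping 32-bit
/// counter, so wrapping subtraction handles the overflow case.
pub fn missed_frames(expected: u32, actual: u32) -> u32 {
    actual.wrapping_sub(expected)
}

/// Stage 0: advance every column and pad every recording by the missed time.
pub fn recover(columns: &mut [Column], missed: usize) {
    if missed == 0 {
        return;
    }
    for column in columns.iter_mut() {
        if column.cycle_len > 0 {
            // Treat the missed frames as played, wrapping at the loop boundary.
            column.position = (column.position + missed) % column.cycle_len;
        }
        for length in column.recording_lengths.iter_mut() {
            // Real code would append silent chunks taken from the pre-allocated
            // pool; this sketch only extends the length to show the bookkeeping.
            *length += missed;
        }
    }
}
```

Here the expected frame time would be computed as the previous frame time `wrapping_add` the previous cycle's frame count.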
Appendix: Full recording example
Let's say I first record the chords A, Bb, B, C, one chord per beat. Then I switch to Row 2 and start recording the minor chords Dm, Ebm, Em, Fm. Switching rows and starting the recording take some time, so the result looks like this.
I press the record button. Let's call the first beat after the press beat 1.
During this beat I play the A chord.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  |     |     |     |
Row 2 |     |     |     |     |
```
On the next beat I play Bb.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |     |     |
Row 2 |     |     |     |     |
```
During beat 3 I play B.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |     |
Row 2 |     |     |     |     |
```
Finally I play C and press the record button again.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |  C  |
Row 2 |     |     |     |     |
```
This leaves us with the first track recorded and looping.
I now switch to Row 2 and press the record button again while playback is in beat 2.
Recording actually starts only at beat 3 because of the beat sync.
I play the Dm chord here.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |  C  |
Row 2 |     |     | Dm  |     |
```
Next I play the Ebm chord.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |  C  |
Row 2 |     |     | Dm  | Ebm |
```
The column wraps around and starts at beat 1.
During this beat I play the Em chord.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |  C  |
Row 2 | Em  |     | Dm  | Ebm |
```
Finally I play Fm.
I don't need to press the record button to stop because the track length is already set.
```
Beat  |  1  |  2  |  3  |  4  |
Row 1 |  A  | Bb  |  B  |  C  |
Row 2 | Em  | Fm  | Dm  | Ebm |
```
After a while I stop playback for both tracks.
Some time later, I start Row 2 playback again. It should start with the Em chord because it was recorded during beat 1.
It continues playing Fm. If I start playback of Row 1 while Fm is playing, it will start on beat 3 and I should hear B and Dm.
Say I stop both tracks again and start recording Row 3. I play the G chord first.
This G chord is recorded on beat 1, independently of the other tracks, assuming no tracks in this column were playing. However, the recording stops after 4 beats because the very first recording in the column set the track length to 4 beats.