Audio Engine Architecture
This document defines the high-level architectural decisions for the real-time audio engine. The engine must coordinate real-time audio processing with file I/O and network operations without introducing timing glitches, while managing unlimited-length recordings without real-time memory allocation.
Overview
The architecture uses two threads to maintain real-time performance guarantees. The real-time thread handles all audio processing including capture, playback, mixing, and state machine updates. This thread cannot allocate memory, perform file operations, or make blocking system calls.
The I/O thread runs tokio and manages async tasks for file operations and the OSC server. These subsystems communicate with the real-time thread through lock-free ring buffers.
graph LR
    RT[Real-Time Thread<br/>JACK Process<br/>Audio Processing] 
    IO[I/O Thread<br/>Tokio Runtime<br/>File Operations<br/>OSC Server]
    
    IO -->|Pre-allocated Buffers| RT
    IO -->|Loaded / Saved Audio| RT
    RT -->|Tracks for Saving| IO
    RT -->|Status Updates| IO
The system maintains audio state across three levels of organization. Individual tracks contain the actual recorded audio data organized as linked lists of pre-allocated buffers. Columns group tracks that share common timing behavior. Global state coordinates overall system behavior including volume and timing synchronization.
Buffer management employs a pre-allocated pool strategy to eliminate real-time memory allocation. Once buffers are written, track ownership is shared between threads through reference counting, so the I/O thread can save audio without copying it. Commands and data transfers use non-blocking channel operations, so they never block the critical audio processing path.
Audio Buffer Management and Ownership
Audio buffers use Arc-wrapped chunks to enable safe sharing between threads without sacrificing RT performance.
The I/O thread pre-allocates all Arc<AudioChunk> instances and stocks the buffer pool, which keeps allocation costs off the RT thread.
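A minimal sketch of such a pool, under stated assumptions: the `AudioChunk` fields follow the state hierarchy shown later in this document, while `BufferPool`, `preallocate`, `take`, `available`, and the `CHUNK_SAMPLES` size are hypothetical names chosen for illustration. The sketch keeps the pool in a plain `Vec`; in the engine the stock would be delivered to the RT thread over a channel.

```rust
use std::sync::Arc;

/// Samples per chunk; a hypothetical size chosen for illustration.
pub const CHUNK_SAMPLES: usize = 4096;

/// Mirrors the AudioChunk fields shown in the state hierarchy.
pub struct AudioChunk {
    pub samples: Box<[f32]>,
    pub sample_count: usize,
    pub next: Option<Arc<AudioChunk>>,
}

/// Stock of empty, ready-to-use chunks (hypothetical type).
pub struct BufferPool {
    free: Vec<Arc<AudioChunk>>,
}

impl BufferPool {
    /// Intended for the I/O thread: every heap allocation happens here.
    pub fn preallocate(count: usize) -> Self {
        let free = (0..count)
            .map(|_| {
                Arc::new(AudioChunk {
                    samples: vec![0.0; CHUNK_SAMPLES].into_boxed_slice(),
                    sample_count: 0,
                    next: None,
                })
            })
            .collect();
        BufferPool { free }
    }

    /// Intended for the RT thread: popping from a Vec never allocates.
    /// Returns None when the pool is exhausted.
    pub fn take(&mut self) -> Option<Arc<AudioChunk>> {
        self.free.pop()
    }

    /// Number of chunks still in stock.
    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

An exhausted pool is reported as `None` rather than triggering an allocation, so the caller decides how to degrade.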
During recording, the RT thread builds linked lists of these Arc-wrapped chunks. When recording completes, the RT thread clones the root Arc and sends the clone to the I/O thread for saving while retaining the original reference for immediate playback. The transition from recording to playback is therefore seamless and never waits on file operations.
The RT thread can replace track data at any time by swapping the root pointer, even during active save operations. If recording over existing data, the old root reference can be retained in case the recording is cancelled, enabling undo functionality.
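The root swap and the undo slot could look roughly like the sketch below. `Track`, `previous`, `commit`, and `undo` are hypothetical names, and the chunk type is simplified. Every root that leaves the track is returned to the caller instead of being dropped, on the assumption that the caller forwards it to the I/O thread.

```rust
use std::sync::Arc;

/// Simplified chunk; field names follow the state hierarchy in this document.
pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub next: Option<Arc<AudioChunk>>,
}

/// Hypothetical per-track slot holding the root of the chunk list.
pub struct Track {
    pub audio: Option<Arc<AudioChunk>>,
    /// Earlier take, kept so a cancelled recording can be undone.
    pub previous: Option<Arc<AudioChunk>>,
}

impl Track {
    /// Swap in a freshly recorded list. Returns the handle to send to the
    /// I/O thread for saving, plus any root evicted from the undo slot.
    /// Nothing is dropped inside this function.
    pub fn commit(
        &mut self,
        root: Arc<AudioChunk>,
    ) -> (Arc<AudioChunk>, Option<Arc<AudioChunk>>) {
        let for_saving = Arc::clone(&root); // bumps a counter, copies no audio
        let old = self.audio.replace(root);
        let evicted = std::mem::replace(&mut self.previous, old);
        (for_saving, evicted)
    }

    /// Cancel the latest take and restore the earlier one. The discarded
    /// root is returned so the caller can forward it for deletion.
    pub fn undo(&mut self) -> Option<Arc<AudioChunk>> {
        let restored = self.previous.take();
        std::mem::replace(&mut self.audio, restored)
    }
}
```

Because the I/O thread holds its own `Arc`, a save in progress keeps the old list alive even after the RT thread has swapped in a new root.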
The I/O thread optimizes data layout for playback performance. When loading a file, it creates a single long buffer rather than a linked list. After saving, it consolidates the linked list into a single buffer and passes the result back to the RT thread. As a result, only unsaved or actively recording tracks carry linked-list overhead, while frequently played loops benefit from the optimized layout.
Arc destruction happens on the I/O thread: the RT thread sends the reference over the Tracks channel in a Delete message, which keeps deallocation costs out of the RT thread. The RT thread's buffer pool operations involve only taking and returning Arc references, with minimal atomic operations.
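The deferred drop can be sketched as follows. This is an illustration, not the engine's code: `TrackMsg`, `io_loop`, and `delete_via_io_thread` are hypothetical names, and a bounded `std::sync::mpsc` channel with a plain thread stands in for the Kanal channel and the tokio task so that the example has no dependencies.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::Arc;
use std::thread;

pub struct AudioChunk {
    pub samples: Vec<f32>,
}

/// Hypothetical messages on the RT -> I/O "Tracks" channel.
pub enum TrackMsg {
    Save(Arc<AudioChunk>),
    Delete(Arc<AudioChunk>),
}

/// I/O-side loop: the final drop, and therefore the free, happens here.
/// Returns how many Delete messages were handled.
pub fn io_loop(rx: Receiver<TrackMsg>) -> usize {
    let mut freed = 0;
    for msg in rx {
        match msg {
            TrackMsg::Save(_root) => { /* writing to disk is omitted */ }
            TrackMsg::Delete(root) => {
                drop(root); // last reference: memory is released on this thread
                freed += 1;
            }
        }
    }
    freed
}

/// Drives one Delete through the channel and reports whether the chunk
/// had been released by the time the I/O loop finished.
pub fn delete_via_io_thread() -> (usize, bool) {
    let (tx, rx) = sync_channel::<TrackMsg>(16);
    let io = thread::spawn(move || io_loop(rx));

    let root = Arc::new(AudioChunk { samples: vec![0.0; 1024] });
    let probe = Arc::downgrade(&root); // observes the chunk without owning it

    // RT side: try_send returns immediately instead of waiting for room.
    assert!(tx.try_send(TrackMsg::Delete(root)).is_ok(), "queue unexpectedly full");
    drop(tx); // closing the channel ends the I/O loop

    let freed = io.join().unwrap();
    (freed, probe.upgrade().is_none())
}
```

The RT thread only moves the `Arc` into the message; it never runs the destructor of the audio data itself.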
System State Hierarchy:
GlobalState
├── samples_per_beat: f32
├── sample_rate: u32  
├── selected_cell: (usize, usize)  
├── click_track_samples: [f32]
└── columns: [ColumnState]
    │
    ├── columns[0]
    │   ├── beats: usize
    │   └── tracks: [TrackState]
    │       │
    │       ├── tracks[0]
    │       │   ├── current_state: Idle | Recording | Playing | Solo
    │       │   ├── next_state: Idle | Recording | Playing | Solo
    │       │   ├── volume: f32
    │       │   └── audio: Option<Arc<AudioChunk>>
    │       │       │
    │       │       └── AudioChunk
    │       │           ├── samples: [f32]
    │       │           ├── sample_count: usize
    │       │           └── next: Option<Arc<AudioChunk>>
    │       │               │
    │       │               └── AudioChunk (next in linked list)
    │       │
    │       ├── tracks[1] ...
    │       └── tracks[2] ...
    │
    ├── columns[1] ...
    └── columns[2] ...
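Expressed as Rust types, the hierarchy above might look like the sketch below. Field names and types are taken from the tree. The `TrackMode` enum name, the `new` constructor, its `bpm` parameter, and the default values (4 beats per column, unit volume) are assumptions made for illustration.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Recording,
    Playing,
    Solo,
}

pub struct AudioChunk {
    pub samples: Box<[f32]>,
    pub sample_count: usize,
    pub next: Option<Arc<AudioChunk>>,
}

pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
    pub volume: f32,
    pub audio: Option<Arc<AudioChunk>>,
}

pub struct ColumnState {
    pub beats: usize,
    pub tracks: Vec<TrackState>,
}

pub struct GlobalState {
    pub samples_per_beat: f32,
    pub sample_rate: u32,
    pub selected_cell: (usize, usize),
    pub click_track_samples: Vec<f32>,
    pub columns: Vec<ColumnState>,
}

impl GlobalState {
    /// Builds an empty grid. All allocation happens here, before the
    /// RT thread starts using the state.
    pub fn new(sample_rate: u32, bpm: f32, columns: usize, tracks_per_column: usize) -> Self {
        let columns = (0..columns)
            .map(|_| ColumnState {
                beats: 4, // assumed default
                tracks: (0..tracks_per_column)
                    .map(|_| TrackState {
                        current_state: TrackMode::Idle,
                        next_state: TrackMode::Idle,
                        volume: 1.0,
                        audio: None,
                    })
                    .collect(),
            })
            .collect();
        GlobalState {
            // One beat lasts 60 / bpm seconds.
            samples_per_beat: sample_rate as f32 * 60.0 / bpm,
            sample_rate,
            selected_cell: (0, 0),
            click_track_samples: Vec::new(),
            columns,
        }
    }
}
```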
Ring buffers
Communication between real-time and non-real-time threads requires lock-free data structures to avoid blocking the RT thread. The Rust Kanal crate fulfills this requirement and provides both synchronous and asynchronous interfaces, making it ideal for bridging real-time and async-based systems.
The RT thread uses the synchronous, non-blocking interface for immediate data transfer. The async I/O thread uses the asynchronous interface, allowing it to efficiently wait for data availability without busy-wait loops.
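The RT side of this pattern can be sketched as below. To keep the example free of dependencies, a bounded `std::sync::mpsc` channel stands in for Kanal; this is a substitution for illustration and makes no claim about the real-time suitability of the standard library channel. `Command`, `Status`, `RtEndpoint`, and its methods are hypothetical names.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TryRecvError, TrySendError};

/// Hypothetical command sent from the I/O thread to the RT thread.
#[derive(Debug)]
pub enum Command {
    SetVolume { track: usize, volume: f32 },
}

/// Hypothetical status update sent from the RT thread to the I/O thread.
#[derive(Debug)]
pub struct Status {
    pub frames_processed: u64,
}

/// The RT thread's ends of both channels.
pub struct RtEndpoint {
    pub commands: Receiver<Command>,
    pub status: SyncSender<Status>,
    pub dropped_status: u64,
}

impl RtEndpoint {
    /// Drain everything queued so far and return immediately when the
    /// queue is empty. Returns the number of commands applied.
    pub fn drain_commands(&mut self, mut apply: impl FnMut(Command)) -> usize {
        let mut applied = 0;
        loop {
            match self.commands.try_recv() {
                Ok(cmd) => {
                    apply(cmd);
                    applied += 1;
                }
                Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => return applied,
            }
        }
    }

    /// Report status without waiting: if the queue is full, the update is
    /// counted as dropped instead of stalling the audio callback.
    pub fn report(&mut self, status: Status) {
        match self.status.try_send(status) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped_status += 1;
            }
        }
    }
}
```

Dropping a status update is acceptable here because `Status` holds no heap data and a newer update will follow; audio buffers would need a different policy.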
Beat quantization and command execution
Musical timing relies on beat-quantized state changes rather than immediate command execution. Each track maintains both current state (playing, recording, solo, idle) and next beat state. Commands update the next beat state, which becomes active at the next beat boundary.
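A minimal sketch of this two-slot scheme, assuming a `TrackMode` enum and the hypothetical method names `request` and `on_beat`:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Recording,
    Playing,
    Solo,
}

pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
}

impl TrackState {
    /// A command only touches the pending slot; a later command for the
    /// same track simply overwrites it.
    pub fn request(&mut self, mode: TrackMode) {
        self.next_state = mode;
    }

    /// Called once at the beat boundary: the pending state becomes active.
    pub fn on_beat(&mut self) {
        self.current_state = self.next_state;
    }
}
```

Between beats, `current_state` stays untouched no matter how many commands arrive, which is what keeps the transitions aligned to the beat.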
Audio processing follows these major steps:
- Process updated audio buffers: Store loaded and optimized audio data.
- Process MIDI commands: Update track volumes (immediate) and next beat states. Last command wins if multiple commands arrive for the same track during a buffer.
- Beat detection: Check if a beat occurs during the current buffer and calculate the exact sample index.
- Process audio:
  - Process samples up to beat index
  - Copy next beat state to current state for all tracks at beat boundary
  - Handle saving and deleting
  - Process remaining samples with new states
- MIDI output: Send transport control and song position pointer messages.
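The beat detection and split-processing steps above can be sketched as follows. Several simplifications are assumptions of this example: `samples_per_beat` is an integer rather than the `f32` in the state hierarchy, at most one beat falls inside a buffer, a single track is processed, and `render` writes a constant level instead of real audio. All function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Playing,
}

pub struct Track {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
}

/// Offset within the current buffer at which a beat falls, or None if no
/// beat starts inside this buffer. `position` is the absolute sample index
/// of the buffer's first frame.
pub fn beat_offset(position: u64, frames: usize, samples_per_beat: u64) -> Option<usize> {
    let into_beat = position % samples_per_beat;
    let until_next = if into_beat == 0 { 0 } else { samples_per_beat - into_beat };
    if until_next < frames as u64 {
        Some(until_next as usize)
    } else {
        None
    }
}

/// Placeholder for real audio processing: writes 1.0 while playing.
fn render(track: &Track, out: &mut [f32]) {
    let level = if track.current_state == TrackMode::Playing { 1.0 } else { 0.0 };
    out.fill(level);
}

/// Process one buffer, switching state exactly at the beat sample.
pub fn process(track: &mut Track, position: u64, out: &mut [f32], samples_per_beat: u64) {
    match beat_offset(position, out.len(), samples_per_beat) {
        Some(beat) => {
            let (before, after) = out.split_at_mut(beat);
            render(track, before); // samples up to the beat use the old state
            track.current_state = track.next_state; // latch at the boundary
            render(track, after); // remaining samples use the new state
        }
        None => render(track, out),
    }
}
```

The state is checked twice per buffer at most, rather than once per sample, while the transition still lands on the exact beat sample.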
This approach avoids per-sample state checking while maintaining beat-accurate timing. Because commands are processed before beat detection, a command that arrives in the same buffer as a beat takes effect at that beat. The result is musically appropriate timing, even though it is not sample-accurate.
Click track generation and routing
The click track provides audible timing reference using a pre-computed sine wave tone at beat boundaries. JACK port configuration includes a separate mono click output, allowing users to route the click independently of the main program audio. Click generation operates alongside the beat detection system, triggering the pre-computed waveform when beats occur. Click volume and enable/disable control operate through the standard command system for real-time adjustment.
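As an illustration, the pre-computed click and its playback into the click port buffer might look like the sketch below. The function names, the tone frequency and duration parameters, the `cursor` convention, and the linear fade-out are assumptions of this example; the document only specifies a pre-computed sine tone.

```rust
use std::f32::consts::TAU;

/// Pre-compute the click once, outside the RT thread.
/// The linear fade-out is an assumption, added to avoid an abrupt end.
pub fn make_click(sample_rate: u32, freq_hz: f32, duration_ms: u32) -> Vec<f32> {
    let len = (sample_rate as u64 * duration_ms as u64 / 1000) as usize;
    (0..len)
        .map(|n| {
            let t = n as f32 / sample_rate as f32;
            let fade = 1.0 - n as f32 / len as f32;
            (TAU * freq_hz * t).sin() * fade
        })
        .collect()
}

/// Mix the click into `out` starting at position `cursor` of the click
/// waveform. Returns the new cursor, so a click longer than the remaining
/// buffer continues in the next process cycle.
pub fn mix_click(click: &[f32], cursor: usize, gain: f32, out: &mut [f32]) -> usize {
    let remaining = &click[cursor.min(click.len())..];
    let n = remaining.len().min(out.len());
    for (o, c) in out[..n].iter_mut().zip(remaining) {
        *o += c * gain;
    }
    cursor + n
}
```

On a beat, the caller would reset the cursor to 0 and pass the slice of the click output that starts at the beat's sample offset; setting `gain` to 0.0 disables the click without any branching in the mix loop.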
Xrun detection and recovery
Audio buffer underruns and overruns (xruns) disrupt the continuous flow of audio data and can break musical timing if not handled properly. JACK reports xruns only through a callback that runs in a separate thread, so the RT thread cannot rely on it. Instead, the RT thread detects xruns itself by checking the timing information in the process scope, and recovers from them before doing any other processing.
Xrun recovery happens in stage 0 of audio processing, before the normal stages:
Stage 0 - Xrun Recovery:
- Check for time glitches: Detect if an xrun occurred and calculate missed sample count
- Calculate sample positions: Update sample index for every column based on missed time
- Maintain recording continuity: Add empty buffers to tracks currently recording to preserve timeline integrity
- Trigger saves: Send completed tracks to the I/O thread for saving
This approach maintains musical timing relationships across all tracks while accepting the temporal disruption. Recording tracks receive silent buffers for the missed duration, preventing timeline gaps. Playback positions advance by the missed sample count, wrapping at loop boundaries as needed.
If beats were missed during the xrun, the normal beat detection in subsequent stages will handle state transitions appropriately, since the missed time has already been accounted for in the position calculations.
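The position arithmetic of stage 0 can be sketched as below, under stated assumptions: the expected and actual frame times are plain `u64` inputs that the caller derives from the process scope, and `Column`, `recover`, and `silent_chunks` are hypothetical names.

```rust
/// Hypothetical per-column playback cursor.
pub struct Column {
    pub position: u64,
    pub loop_len: u64,
}

/// Detect a time glitch and advance every column past it.
/// `expected_frame` is where the previous cycle ended; `actual_frame` is
/// where the current cycle starts. Returns the missed sample count.
pub fn recover(columns: &mut [Column], expected_frame: u64, actual_frame: u64) -> u64 {
    let missed = actual_frame.saturating_sub(expected_frame);
    if missed > 0 {
        for col in columns.iter_mut() {
            if col.loop_len > 0 {
                // Advance by the missed time, wrapping at the loop boundary.
                col.position = (col.position + missed) % col.loop_len;
            }
        }
    }
    missed
}

/// Number of silent chunks a recording track needs to cover the gap.
/// `chunk_samples` must be non-zero.
pub fn silent_chunks(missed: u64, chunk_samples: u64) -> u64 {
    (missed + chunk_samples - 1) / chunk_samples
}
```

For example, a column at position 900 in a 1000-sample loop that misses 256 samples resumes at position 156, so it stays aligned with columns that wrapped differently.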