# Audio Engine Architecture

This document defines the high-level architectural decisions for the real-time audio engine.
The engine must coordinate real-time audio processing with file I/O and network operations without introducing timing glitches,
while managing unlimited-length recordings without real-time memory allocation.

## Overview

The architecture uses two threads to maintain real-time performance guarantees.
The real-time (RT) thread handles all audio processing including capture, playback, mixing, and state machine updates.
This thread cannot allocate memory, perform file operations, or make blocking system calls.

The I/O thread runs Tokio and manages async tasks for file operations and the OSC server.
These subsystems communicate with the real-time thread through lock-free ring buffers.

```mermaid
graph LR
    RT[Real-Time Thread<br/>JACK Process<br/>Audio Processing]
    IO[I/O Thread<br/>Tokio Runtime<br/>File Operations<br/>OSC Server]

    IO -->|Pre-allocated Buffers| RT
    IO -->|Loaded / Saved Audio| RT

    RT -->|Tracks for Saving| IO
    RT -->|Status Updates| IO
```

The system maintains audio state across three levels of organization.
Individual tracks contain the actual recorded audio data organized as linked lists of pre-allocated buffers.
Columns group tracks that share common timing behavior.
Global state coordinates overall system behavior including volume and timing synchronization.

Buffer management employs a pre-allocated pool strategy to eliminate real-time memory allocation.
Track ownership is shared between threads after buffers are written.
This enables file operations without copying audio data,
and keeps commands and data transfers from blocking the critical audio processing path.

## Audio Buffer Management and Ownership

Audio buffers use `Arc`-wrapped chunks to enable safe sharing between threads without sacrificing RT performance.
The I/O thread pre-allocates all `Arc<AudioChunk>` instances and stocks the buffer pool, moving allocation costs away from the RT thread.

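A minimal sketch of such a pool, under stated assumptions: the chunk length `CHUNK_LEN`, the `BufferPool` type, and its method names are hypothetical and not taken from the engine, and the pool is shown as a plain `Vec` rather than whatever hand-off mechanism the engine uses between threads.

```rust
use std::sync::Arc;

/// Hypothetical fixed chunk length in samples; the document does not specify one.
pub const CHUNK_LEN: usize = 1024;

pub struct AudioChunk {
    pub samples: [f32; CHUNK_LEN],
    pub sample_count: usize,
    pub next: Option<Arc<AudioChunk>>,
}

/// Hypothetical pool of chunks that were all allocated ahead of time.
pub struct BufferPool {
    free: Vec<Arc<AudioChunk>>,
}

impl BufferPool {
    /// I/O thread side: every heap allocation happens here, up front.
    pub fn with_chunks(count: usize) -> Self {
        let free = (0..count)
            .map(|_| {
                Arc::new(AudioChunk {
                    samples: [0.0; CHUNK_LEN],
                    sample_count: 0,
                    next: None,
                })
            })
            .collect();
        BufferPool { free }
    }

    /// RT thread side: popping from a `Vec` neither allocates nor frees.
    /// `None` means the pool is exhausted and must be restocked.
    pub fn take(&mut self) -> Option<Arc<AudioChunk>> {
        self.free.pop()
    }

    /// Number of chunks still available.
    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

A freshly taken chunk has a reference count of one, so the RT thread could fill it through `Arc::get_mut` before linking it into a track.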
During recording, the RT thread builds linked lists of these `Arc`-wrapped chunks.
When recording completes, the RT thread clones the root `Arc` and sends it to the I/O thread for saving while retaining the original reference for immediate playback.
This allows a seamless transition from recording to playback without waiting for file operations.

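The hand-off described above might look like the following sketch. The `Track` type and `finish_recording` function are hypothetical, and `std::sync::mpsc::SyncSender` stands in for the engine's real channel so that the example has no external dependencies.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::sync::Arc;

pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub next: Option<Arc<AudioChunk>>,
}

/// Hypothetical, reduced track type holding only the root of the chunk list.
pub struct Track {
    pub audio: Option<Arc<AudioChunk>>,
}

/// Called when recording completes: a second reference to the root goes to
/// the I/O side, while the track keeps the original for playback.
/// Only the reference count changes; no sample data is copied.
pub fn finish_recording(
    track: &Track,
    to_io: &SyncSender<Arc<AudioChunk>>,
) -> Result<(), TrySendError<Arc<AudioChunk>>> {
    match &track.audio {
        // `try_send` returns immediately instead of waiting for queue space.
        Some(root) => to_io.try_send(Arc::clone(root)),
        None => Ok(()),
    }
}
```

If the queue is full, the error carries the cloned reference back to the caller. Dropping it there only decrements the count, because the track still holds the root.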
The RT thread can replace track data at any time by swapping the root pointer, even during active save operations.
If recording over existing data, the old root reference can be retained in case the recording is cancelled, enabling undo functionality.

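One way to picture the root swap with an undo slot is sketched below. The `undo` field and both method names are assumptions made for illustration; the document does not say where the old root is stored.

```rust
use std::sync::Arc;

pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub next: Option<Arc<AudioChunk>>,
}

pub struct Track {
    pub audio: Option<Arc<AudioChunk>>,
    /// Hypothetical slot holding the previous root while an overdub is active.
    pub undo: Option<Arc<AudioChunk>>,
}

impl Track {
    /// Swap in a new root and remember the old one for undo.
    /// Returns any stale undo root so the caller can forward it to the
    /// I/O thread instead of dropping it on the RT thread.
    pub fn replace_audio(&mut self, new_root: Arc<AudioChunk>) -> Option<Arc<AudioChunk>> {
        let old_root = self.audio.replace(new_root);
        std::mem::replace(&mut self.undo, old_root)
    }

    /// Restore the remembered root. Returns the cancelled recording, which
    /// should likewise be handed to the I/O thread for disposal.
    pub fn cancel_recording(&mut self) -> Option<Arc<AudioChunk>> {
        match self.undo.take() {
            Some(old_root) => self.audio.replace(old_root),
            None => None,
        }
    }
}
```

A save that is still in progress is unaffected by either call, because the I/O thread holds its own reference to the root it was sent.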
The I/O thread optimizes data layout for playback performance.
When loading files, it creates single long buffers rather than linked lists.
After saving, it consolidates linked lists into single buffers.
This means only unsaved or actively-recording tracks have linked-list overhead,
while frequently-played loops benefit from the optimized layout.

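Consolidation can be sketched as a walk over the list that copies the valid samples of each chunk into one contiguous buffer. The function name `consolidate` is hypothetical, chunks are shown with `Vec<f32>` storage for brevity, and the sketch assumes `sample_count` never exceeds `samples.len()`.

```rust
use std::sync::Arc;

pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub sample_count: usize,
    pub next: Option<Arc<AudioChunk>>,
}

/// I/O thread only: this allocates, so it must never run on the RT thread.
/// Builds a single chunk holding the valid samples of the whole list.
pub fn consolidate(root: &Arc<AudioChunk>) -> Arc<AudioChunk> {
    // First pass: count valid samples so the buffer is allocated once.
    let mut total = 0;
    let mut cursor = Some(root);
    while let Some(chunk) = cursor {
        total += chunk.sample_count;
        cursor = chunk.next.as_ref();
    }

    // Second pass: copy only the filled part of each chunk.
    let mut samples = Vec::with_capacity(total);
    let mut cursor = Some(root);
    while let Some(chunk) = cursor {
        samples.extend_from_slice(&chunk.samples[..chunk.sample_count]);
        cursor = chunk.next.as_ref();
    }

    Arc::new(AudioChunk { samples, sample_count: total, next: None })
}
```

The consolidated root would then travel back to the RT thread, which can swap it in for the linked list.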
`Arc` destruction happens on the I/O thread: the RT thread sends the reference over the Tracks channel in a Delete message,
keeping deallocation costs out of the RT thread.
The RT thread's buffer pool operations involve only taking and returning `Arc` references, which costs a few atomic operations.

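The Delete path could look roughly like this. The `TrackMessage` enum and `run_io` loop are hypothetical names, file writing is elided, and a blocking `std::sync::mpsc::Receiver` stands in for the async receiver the I/O thread would actually use.

```rust
use std::sync::mpsc::Receiver;
use std::sync::Arc;

pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub next: Option<Arc<AudioChunk>>,
}

/// Hypothetical message type for the Tracks channel.
pub enum TrackMessage {
    Save(Arc<AudioChunk>),
    Delete(Arc<AudioChunk>),
}

/// I/O side loop: runs until every sender is gone and returns how many
/// Delete messages it handled.
pub fn run_io(rx: Receiver<TrackMessage>) -> usize {
    let mut deleted = 0;
    for message in rx {
        match message {
            TrackMessage::Save(_root) => {
                // File writing elided in this sketch.
            }
            TrackMessage::Delete(root) => {
                // If this is the last reference, the memory is freed here,
                // on the I/O thread.
                drop(root);
                deleted += 1;
            }
        }
    }
    deleted
}
```

Dropping the head of a very long list frees its chunks recursively by default, so a real implementation might unlink chunks one at a time for unlimited-length recordings.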
```
System State Hierarchy:

GlobalState
├── samples_per_beat: f32
├── sample_rate: u32
├── selected_cell: (usize, usize)
├── click_track_samples: [f32]
└── columns: ColumnState[]
    │
    ├── columns[0]
    │   ├── beats: usize
    │   └── tracks: TrackState[]
    │       │
    │       ├── tracks[0]
    │       │   ├── current_state: Idle | Recording | Playing | Solo
    │       │   ├── next_state: Idle | Recording | Playing | Solo
    │       │   ├── volume: f32
    │       │   └── audio: Option<Arc<AudioChunk>>
    │       │       │
    │       │       └── AudioChunk
    │       │           ├── samples: [f32]
    │       │           ├── sample_count: usize
    │       │           └── next: Option<Arc<AudioChunk>>
    │       │               │
    │       │               └── AudioChunk (next in linked list)
    │       │
    │       ├── tracks[1] ...
    │       └── tracks[2] ...
    │
    ├── columns[1] ...
    └── columns[2] ...
```

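Read as Rust types, the hierarchy above might translate as follows. This is a sketch, not the engine's definitions: the enum name `TrackMode`, the use of `Vec` for the array-like fields, the constructor, and its default of four beats per column are all assumptions.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Recording,
    Playing,
    Solo,
}

pub struct AudioChunk {
    pub samples: Vec<f32>,
    pub sample_count: usize,
    pub next: Option<Arc<AudioChunk>>,
}

pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
    pub volume: f32,
    pub audio: Option<Arc<AudioChunk>>,
}

pub struct ColumnState {
    pub beats: usize,
    pub tracks: Vec<TrackState>,
}

pub struct GlobalState {
    pub samples_per_beat: f32,
    pub sample_rate: u32,
    pub selected_cell: (usize, usize),
    pub click_track_samples: Vec<f32>,
    pub columns: Vec<ColumnState>,
}

impl GlobalState {
    /// Hypothetical constructor: an empty grid with every track idle.
    pub fn new(sample_rate: u32, bpm: f32, columns: usize, tracks_per_column: usize) -> Self {
        let columns = (0..columns)
            .map(|_| ColumnState {
                beats: 4, // assumed default, not specified by the document
                tracks: (0..tracks_per_column)
                    .map(|_| TrackState {
                        current_state: TrackMode::Idle,
                        next_state: TrackMode::Idle,
                        volume: 1.0,
                        audio: None,
                    })
                    .collect(),
            })
            .collect();
        GlobalState {
            samples_per_beat: sample_rate as f32 * 60.0 / bpm,
            sample_rate,
            selected_cell: (0, 0),
            click_track_samples: Vec::new(),
            columns,
        }
    }
}
```

Construction allocates, so in this sketch it would happen before the audio callback starts running.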
## Ring buffers

Communication between real-time and non-real-time threads requires lock-free data structures to avoid blocking the RT thread.
Tokio bounded channels fulfill this requirement and provide both synchronous and asynchronous interfaces,
making them ideal for bridging real-time and async-based systems.

The RT thread uses the synchronous, non-blocking interface for immediate data transfer.
The async I/O thread uses the asynchronous interface,
allowing it to efficiently wait for data availability without busy-wait loops.

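The calling pattern on each side can be sketched with the standard library's bounded channel as a stand-in, so that the example compiles without dependencies. This is only an illustration of the non-blocking pattern, not a real-time-safe channel. With Tokio, the corresponding calls would be `try_send` on the RT side and `recv().await` on the async side. The `Status` type and both function names are hypothetical.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// Hypothetical status message sent from the RT thread to the I/O thread.
#[derive(Debug, PartialEq)]
pub enum Status {
    Beat(u64),
    Xrun { missed_samples: u64 },
}

/// RT side: never waits. Returns `false` if the message was dropped because
/// the queue was full or the receiver is gone.
pub fn publish_status(tx: &SyncSender<Status>, status: Status) -> bool {
    match tx.try_send(status) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// I/O side: waits for the next message without spinning.
/// Returns `None` once every sender has been dropped.
pub fn next_status(rx: &Receiver<Status>) -> Option<Status> {
    rx.recv().ok()
}
```

Dropping a status update when the queue is full is a design choice made for this sketch: the RT thread loses a notification rather than waiting for space.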
## Beat quantization and command execution

Musical timing relies on beat-quantized state changes rather than immediate command execution.
Each track maintains both current state (playing, recording, solo, idle) and next beat state.
Commands update the next beat state, which becomes active at the next beat boundary.

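The two-state scheme reduces to a pair of small operations, sketched here with hypothetical names (`TrackMode`, `request`, `on_beat`) and a track type stripped down to its two state fields.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Recording,
    Playing,
    Solo,
}

pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
}

impl TrackState {
    /// A command only records the intent; audible behavior is unchanged.
    /// A later command before the beat overwrites an earlier one.
    pub fn request(&mut self, mode: TrackMode) {
        self.next_state = mode;
    }

    /// Called at the beat boundary: the pending state becomes active.
    pub fn on_beat(&mut self) {
        self.current_state = self.next_state;
    }
}
```

For example, requesting `Recording` and then `Playing` before the boundary leaves the track idle until the beat, at which point it starts playing.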
Audio processing follows these major steps:

- **Process updated audio buffers**: Store loaded and optimized audio data.

- **Process MIDI commands**: Update track volumes (immediate) and next beat states.
  Last command wins if multiple commands arrive for the same track during a buffer.

- **Beat detection**: Check if a beat occurs during the current buffer and calculate the exact sample index.

- **Process audio**:
  - Process samples up to the beat index
  - Copy next beat state to current state for all tracks at the beat boundary
  - Handle saving and deleting
  - Process remaining samples with the new states

- **MIDI output**: Send transport control and song position pointer messages.

This approach avoids per-sample state checking while maintaining beat-accurate timing.
Commands arriving near a beat boundary apply at that beat,
providing musically appropriate timing even if not sample-accurate.

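The beat detection and split processing steps above can be sketched as follows. The function names are hypothetical, mixing is elided, and the sketch assumes that beat `k` falls on sample `floor(k * samples_per_beat)` counted from the start of the timeline.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TrackMode {
    Idle,
    Recording,
    Playing,
    Solo,
}

pub struct TrackState {
    pub current_state: TrackMode,
    pub next_state: TrackMode,
}

/// Index within the current buffer at which a beat falls, if any.
/// `position` is the timeline sample at the start of the buffer.
pub fn beat_index(position: u64, frames: usize, samples_per_beat: f32) -> Option<usize> {
    let spb = samples_per_beat as f64;
    if spb <= 0.0 || frames == 0 {
        return None;
    }
    // First beat at or after the start of this buffer.
    let next_beat = (position as f64 / spb).ceil();
    let boundary = (next_beat * spb).floor() as u64;
    let offset = boundary.checked_sub(position)? as usize;
    if offset < frames {
        Some(offset)
    } else {
        None
    }
}

/// Splits one buffer at the beat. Returns how many samples were processed
/// with the old states and how many with the new ones.
pub fn process_buffer(
    tracks: &mut [TrackState],
    position: u64,
    frames: usize,
    samples_per_beat: f32,
) -> (usize, usize) {
    match beat_index(position, frames, samples_per_beat) {
        Some(split) => {
            // Mixing of samples [0, split) with the current states is elided.
            for track in tracks.iter_mut() {
                track.current_state = track.next_state;
            }
            // Mixing of samples [split, frames) with the new states is elided.
            (split, frames - split)
        }
        None => (frames, 0),
    }
}
```

State is checked at most once per buffer, and a beat that lands exactly on the end of a buffer is picked up at index 0 of the following one.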
## Click track generation and routing

The click track provides an audible timing reference using a pre-computed sine wave tone at beat boundaries.
JACK port configuration includes a separate mono click output,
allowing users to route the click independently of the main program audio.
Click generation operates alongside the beat detection system,
triggering the pre-computed waveform when beats occur.
Click volume and enable/disable control operate through the standard command system for real-time adjustment.

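A sketch of the two halves of click generation: computing the tone once outside the audio callback, then mixing it into the click output from the beat offset onward. The function names, tone frequency, and duration are illustrative assumptions.

```rust
use std::f32::consts::TAU;

/// Pre-compute a sine tone. This allocates, so it belongs outside the
/// real-time callback.
pub fn make_click(sample_rate: u32, frequency_hz: f32, duration_ms: u32) -> Vec<f32> {
    let len = (sample_rate as u64 * duration_ms as u64 / 1000) as usize;
    (0..len)
        .map(|n| {
            let t = n as f32 / sample_rate as f32;
            (TAU * frequency_hz * t).sin()
        })
        .collect()
}

/// RT side: mix the click into the mono click output, starting at the
/// sample index where the beat falls. Returns the number of click samples
/// written so the caller can continue the tone in the next buffer.
pub fn write_click(out: &mut [f32], click: &[f32], start: usize, gain: f32) -> usize {
    if start >= out.len() {
        return 0;
    }
    let n = (out.len() - start).min(click.len());
    for (dst, src) in out[start..start + n].iter_mut().zip(&click[..n]) {
        *dst += src * gain;
    }
    n
}
```

A `gain` of zero would serve as the disabled state, which fits the idea of controlling the click through ordinary commands.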
### Xrun detection and recovery

Audio buffer underruns and overruns (xruns) disrupt the continuous flow of audio data and can break musical timing if not handled properly.
JACK reports xruns only through a callback mechanism that runs in a separate thread.
The RT thread therefore detects xruns itself by monitoring the process scope, and recovers from them in the same place.

Xrun recovery happens in stage 0 of audio processing, before the normal stages:

**Stage 0 - Xrun Recovery:**
- **Check for time glitches**: Detect if an xrun occurred and calculate the missed sample count
- **Calculate sample positions**: Update the sample index for every column based on the missed time
- **Maintain recording continuity**: Add empty buffers to tracks currently recording to preserve timeline integrity
- **Trigger saves**: Send completed tracks to the I/O thread for saving

This approach maintains musical timing relationships across all tracks while accepting the temporal disruption.
Recording tracks receive silent buffers for the missed duration, preventing timeline gaps.
Playback positions advance by the missed sample count, wrapping at loop boundaries as needed.

If beats were missed during the xrun,
the normal beat detection in subsequent stages will handle state transitions appropriately,
since the missed time has already been accounted for in the position calculations.
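The arithmetic behind stage 0 might be sketched as below. All three function names are hypothetical. The sketch assumes the callback can compare the frame time it expected with the frame time it actually observed, both as wrapping 32-bit counters, and that recordings are stored in fixed-length chunks.

```rust
/// Samples skipped between the end of the previous callback and the start
/// of this one. Zero means no xrun. Wrapping subtraction keeps the result
/// correct when the 32-bit frame counter rolls over.
pub fn missed_samples(expected_frame: u32, actual_frame: u32) -> u32 {
    actual_frame.wrapping_sub(expected_frame)
}

/// New playback position after skipping `missed` samples in a loop of
/// `loop_len` samples.
pub fn advance_playback(position: usize, missed: usize, loop_len: usize) -> usize {
    if loop_len == 0 {
        0
    } else {
        (position + missed) % loop_len
    }
}

/// Number of extra silent chunks a recording track needs, given the free
/// space left in its current chunk.
pub fn silent_chunks_needed(missed: usize, free_in_current: usize, chunk_len: usize) -> usize {
    if chunk_len == 0 {
        return 0;
    }
    let overflow = missed.saturating_sub(free_in_current);
    // Ceiling division: a partially filled chunk still has to be taken.
    (overflow + chunk_len - 1) / chunk_len
}
```

Because positions are advanced by the full missed count before normal processing begins, the later stages see a consistent timeline and need no special xrun handling.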