Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time


Type: HOA
Wednesday, July 1
 

10:00am CEST

(P) Complex Ratio Mask Ambisonics-to-Binaural Rendering with Intensity Vector Features and Perceptual Multi-Objective Optimization
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This paper presents a neural network architecture for binaural rendering of first-order Ambisonics (FOA) signals, enabling headphone listeners to perceive immersive spatial audio from Ambisonic content without requiring individualized Head-Related Transfer Function measurements at inference time. The model operates in the STFT domain using Complex Ratio Masks (CRM). Unlike magnitude-mask methods that process only the omnidirectional channel and discard phase, the proposed model predicts a shared CRM pair (left and right ear) applied via complex-valued multiplication to all four FOA channels with directional weighting. The omnidirectional channel W contributes at unit weight while the directional channels Y, Z, X are weighted at a reduced level, preserving both magnitude and phase information from the full soundfield. The input representation extends standard spectral features with three intensity vector channels that encode sound arrival direction at each time-frequency bin, providing the network with explicit spatial information alongside magnitude and phase cues. Training uses a multi-objective loss that combines waveform-level accuracy (SI-SDR), multi-resolution spectral reconstruction at three complementary time-frequency scales, and interaural level and phase difference terms to jointly optimize signal fidelity and spatial cue preservation. The encoder-decoder backbone is a four-level UNet with residual convolutional blocks and channel-spatial attention at every level, totaling approximately four million parameters. Evaluation against a prior magnitude-masking architecture with 28 million parameters shows that the CRM variant achieves comparable spatial cue preservation with a seven-fold parameter reduction while gaining access to phase information. Processing the signal in a single STFT-domain forward pass avoids the sequential inference of autoregressive time-domain models, yielding computational efficiency suitable for real-time virtual reality deployment.
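As a rough illustration of the masking step described in this abstract, the sketch below applies a shared CRM pair to one time-frequency bin and computes the three intensity vector features. It is not the authors' implementation. The function names, the value of `DIRECTIONAL_WEIGHT`, the choice to sum the weighted channels before masking, and the unnormalised intensity formula Re{conj(W)·C} are all assumptions made for the example.

```python
# Hypothetical value: the abstract only says Y, Z, X are weighted "at a reduced level".
DIRECTIONAL_WEIGHT = 0.5


def apply_shared_crm(foa_bin, mask_left, mask_right,
                     directional_weight=DIRECTIONAL_WEIGHT):
    """Apply one CRM pair to a single time-frequency bin.

    foa_bin holds four complex STFT coefficients in W, Y, Z, X order.
    Assumed reading: the channels are summed with W at unit weight and
    Y, Z, X at directional_weight, then multiplied by each ear's mask.
    """
    if len(foa_bin) != 4:
        raise ValueError("expected four FOA channels (W, Y, Z, X)")
    weights = (1.0, directional_weight, directional_weight, directional_weight)
    mixed = sum(w * c for w, c in zip(weights, foa_bin))
    # Complex multiplication changes both magnitude and phase of the mix.
    return mask_left * mixed, mask_right * mixed


def intensity_vector(foa_bin):
    """Unnormalised intensity features Re{conj(W) * C} for C in Y, Z, X."""
    w = foa_bin[0]
    return tuple((w.conjugate() * c).real for c in foa_bin[1:])
```

In a full model the two masks would come from the network for every bin, and the same two functions would run over the whole STFT grid; a vectorised array library would be the natural choice there.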
Speakers
Szymon Zaporowski

Teaching and Research Assistant, Gdańsk University of Technology
Researcher at the Department of Multimedia Systems, Gdańsk University of Technology, with a focus on audio machine learning, psychoacoustics, signal processing, automatic speech recognition, and deepfake audio detection. His work sits at the intersection of immersive audio and AI...
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) Perceptual Limits of Ambisonic Order for Beamforming in Complex Acoustic Scenes
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This study investigates the perceptually sufficient ambisonic order for beamforming in complex acoustic scenes, defined as the minimum spatial resolution above which no audible improvement is perceived. Two beamforming methods were evaluated: hypercardioid and MVDR beamforming. In contrast to previous studies, the case of an ideal microphone array was considered, in order to evaluate the beamforming methods independently of ambisonic encoding error. Sound scenes were generated using room acoustic simulations and encoded into ambisonic signals. A perceptual evaluation was conducted using a three-interval/two-alternative forced choice (3I/2AFC) test design with an adaptive procedure. The experiment used a production-constrained reference (7th-order) and a high-order reference (19th-order). Results showed that the required order depends on the beamforming method and the characteristics of the sound scene. Diffuseness profiles can be used to analyze the influence of the ambisonic order on the sound field diffuseness and to evaluate whether the directional information available is sufficient to support effective adaptive beamforming.
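To give a feel for how ambisonic order affects a fixed beamformer, the sketch below evaluates the axisymmetric beam pattern of an order-N maximum-directivity beam, which is what "hypercardioid" usually denotes in the Ambisonics literature. Whether the study uses exactly this definition is an assumption, and the function names are illustrative only.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_order(x) via the three-term recurrence."""
    previous, current = 1.0, x
    if order == 0:
        return previous
    for n in range(1, order):
        previous, current = (
            current,
            ((2 * n + 1) * x * current - n * previous) / (n + 1),
        )
    return current


def hypercardioid_pattern(order, angle):
    """Gain at `angle` radians off the look direction, normalised to 1 on axis."""
    x = math.cos(angle)
    total = sum((2 * n + 1) * legendre(n, x) for n in range(order + 1))
    return total / (order + 1) ** 2
```

For order 1 this reduces to the familiar 0.25 + 0.75 cos θ pattern. Raising the order narrows the main lobe, which is the kind of spatial resolution gain whose audibility the listening test probes.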
Jussieu:Room 1 4, place Jussieu Paris 5e
 
Friday, July 3
 

11:00am CEST

(P) Ambimix: a Scene-Based Approach to Interactive Mixing
Friday July 3, 2026 11:00am - 12:30pm CEST
Channel-based mixing has long been the standard paradigm for audio professionals in both studio and live performance contexts, owing to its intuitive, signal-oriented workflow. While this approach excels in conventional stereo and multichannel formats, it offers limited native support for advanced spatial applications. As immersive audio formats become increasingly popular for virtual and augmented reality, new mechanisms are needed to allow engineers to work directly within spatial domains rather than adapting channel-based setups to fit immersive content. This paper presents an interactive software system for mixing First-Order Ambisonic (FOA) soundfields in real time. The system accepts A-format recordings from tetrahedral microphones, including the Sennheiser Ambeo and the Core Sound TetraMic. It converts them to B-format using a directional beamforming approach, in which the individual dimensions (left, right, front, back, top, bottom) are independently crossfaded between sources. Each directional beam is extracted via a dot product with a corresponding spherical harmonic steering vector, crossfaded between the two input soundfields, and re-encoded into a new B-format using an outer product reconstruction. The program architecture is designed to accommodate future extensions to additional Ambisonic microphone formats, including the Røde SoundField, the Zylia ZM-1, and the Eigenmike. This moves the system toward a microphone-agnostic encoding and mixing platform. By parameterizing the encoding stage through a gain-factored matrix, the directional steering vectors can be reconfigured to reflect the capsule geometry of any tetrahedral target microphone array. Future iterations of the program can extend the beamforming framework beyond the first-order WXYZ dimensions, enabling the integration of higher-order ambisonic encoding to improve spatial resolution and directional accuracy.
The primary contribution of this program is a streamlined interface for ambisonic processing, designed to make scene-based mixing accessible to engineers and sound designers working in immersive audio production.
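The extract, crossfade, and re-encode chain described in this abstract can be sketched for a single B-format sample as follows. This is an illustration under stated assumptions rather than the Ambimix code: the axis convention (x front, y left, z up), the unit-gain steering vectors, the extraction weights in `beam_weights`, and all function names are hypothetical. The weights are chosen so that six axis-aligned beams re-encode to the original soundfield when no fade is applied.

```python
# Unit vectors (x, y, z), assuming x = front, y = left, z = up.
DIRECTIONS = {
    "front": (1, 0, 0), "back": (-1, 0, 0),
    "left": (0, 1, 0), "right": (0, -1, 0),
    "top": (0, 0, 1), "bottom": (0, 0, -1),
}


def steering(u):
    """Re-encoding vector in W, X, Y, Z order (assumed unit-gain normalisation)."""
    return (1.0, u[0], u[1], u[2])


def beam_weights(u):
    """Extraction weights that make the six beams sum back to the input."""
    return (1 / 6, u[0] / 2, u[1] / 2, u[2] / 2)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def crossfade_frame(frame_a, frame_b, fades):
    """Mix two WXYZ samples; fades maps a direction name to 0 (A) .. 1 (B)."""
    out = [0.0, 0.0, 0.0, 0.0]
    for name, u in DIRECTIONS.items():
        g = fades.get(name, 0.0)
        w = beam_weights(u)
        # Dot product extracts the beam, then the two sources are crossfaded.
        beam = (1 - g) * dot(w, frame_a) + g * dot(w, frame_b)
        # Adding steering(u) * beam is the outer-product matrix applied to the input.
        for i, s in enumerate(steering(u)):
            out[i] += s * beam
    return out
```

For example, `crossfade_frame(a, b, {"left": 1.0})` takes the left-facing beam entirely from soundfield `b` while the other five directions stay with `a`.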
IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e
 