AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

10:00am CEST

(P) Complex Ratio Mask Ambisonics-to-Binaural Rendering with Intensity Vector Features and Perceptual Multi-Objective Optimization

Wednesday July 1, 2026 10:00am - 12:30pm CEST

Jussieu:Room 1

This paper presents a neural network architecture for binaural rendering of first-order Ambisonics (FOA) signals, enabling headphone listeners to perceive immersive spatial audio from Ambisonic content without requiring individualized Head-Related Transfer Function measurements at inference time. The model operates in the STFT domain using Complex Ratio Masks (CRM). Unlike magnitude-mask methods that process only the omnidirectional channel and discard phase, the proposed model predicts a shared CRM pair (left and right ear) applied via complex-valued multiplication to all four FOA channels with directional weighting. The omnidirectional channel W contributes at unit weight while directional channels Y, Z, X are weighted at a reduced level, preserving both magnitude and phase information from the full soundfield. The input representation extends standard spectral features with three intensity vector channels that encode sound arrival direction at each time-frequency bin, providing the network with explicit spatial information alongside magnitude and phase cues. Training uses a multi-objective loss that combines waveform-level accuracy (SI-SDR), multi-resolution spectral reconstruction at three complementary time-frequency scales, and interaural level and phase difference terms to jointly optimize signal fidelity and spatial cue preservation. The encoder-decoder backbone is a four-level UNet with residual convolutional blocks and channel-spatial attention at every level, totaling approximately four million parameters. Evaluation against a prior magnitude-masking architecture with 28 million parameters shows that the CRM variant achieves comparable spatial cue preservation with a seven-fold parameter reduction while gaining access to phase information. Processing the signal in a single STFT-domain forward pass avoids the sequential inference of autoregressive time-domain models, yielding computational efficiency suitable for real-time virtual reality deployment.
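
As a rough illustration of two ideas in the abstract above, the sketch below computes three intensity vector channels per time-frequency bin and applies a left/right complex ratio mask pair to a weighted FOA mix. It is not the authors' implementation. The function names, the 0.5 directional weight, the pseudo-intensity formula Re{conj(W) · [Y, Z, X]}, and the choice to sum the weighted channels before masking are all assumptions made for illustration; the abstract states only that W has unit weight and that Y, Z, X are weighted at a reduced level.

```python
import numpy as np


def intensity_features(foa):
    """Pseudo-intensity per time-frequency bin (assumed formulation).

    foa: complex STFT array of shape (4, F, T), channel order W, Y, Z, X.
    Returns a real array of shape (3, F, T), ordered Y, Z, X.
    """
    w = foa[0]
    # Real part of conj(W) times each directional channel.
    return np.real(np.conj(w)[None, :, :] * foa[1:])


def apply_crm(foa, crm_left, crm_right, directional_weight=0.5):
    """Apply a shared complex ratio mask pair to a weighted FOA mix.

    crm_left, crm_right: complex masks of shape (F, T).
    directional_weight: hypothetical reduced weight for Y, Z, X.
    Returns the (left, right) complex STFTs, each of shape (F, T).
    """
    weights = np.array([1.0] + [directional_weight] * 3)
    # Weighted sum over the channel axis; W enters at unit weight.
    mix = np.tensordot(weights, foa, axes=(0, 0))
    # Complex multiplication changes both magnitude and phase.
    return crm_left * mix, crm_right * mix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 257, 100)  # 4 FOA channels, 257 frequency bins, 100 frames
    foa = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    mask = np.ones((257, 100), dtype=complex)  # identity mask placeholder
    left, right = apply_crm(foa, mask, mask)
    print(intensity_features(foa).shape, left.shape, right.shape)
```

In a trained system the two masks would come from the network output rather than the identity placeholder used here, and the masked spectra would be passed through an inverse STFT to obtain the binaural waveforms.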

Speakers

Szymon Zaporowski

Teaching and Research Assistant, Gdańsk University of Technology

Researcher at the Department of Multimedia Systems, Gdańsk University of Technology, with a focus on audio machine learning, psychoacoustics, signal processing, automatic speech recognition, and deepfake audio detection. His work sits at the intersection of immersive audio and AI...

Bartłomiej Mróz

Jussieu:Room 1 4, place Jussieu Paris 5e

HOA, Poster

10:00am CEST

(P) Perceptual Limits of Ambisonic Order for Beamforming in Complex Acoustic Scenes

Wednesday July 1, 2026 10:00am - 12:30pm CEST

Jussieu:Room 1

This study investigates the perceptually sufficient ambisonic order for beamforming in complex acoustic scenes, defined as the minimum spatial resolution above which no audible improvement is perceived. Two beamforming methods were evaluated: hypercardioid and MVDR beamforming. In contrast to previous studies, the case of an ideal microphone array was considered, in order to evaluate the beamforming methods independently of ambisonic encoding error. Sound scenes were generated using room acoustic simulations and encoded into ambisonic signals. A perceptual evaluation was conducted using a three-interval/two-alternative forced choice (3I/2AFC) test design with an adaptive procedure. The experiment used a production-constrained reference (7th-order) and a high-order reference (19th-order). Results showed that the required order depends on the beamforming method and the characteristics of the sound scene. Diffuseness profiles can be used to analyze the influence of the ambisonic order on the sound field diffuseness and to evaluate whether the directional information available is sufficient to support effective adaptive beamforming.

Speakers

Francois Salmon

Julian Palacino

Charles Verron

Mathieu Paquier
