Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time


Type: HOA

Wednesday, July 1

10:00am CEST

(P) Complex Ratio Mask Ambisonics-to-Binaural Rendering with Intensity Vector Features and Perceptual Multi-Objective Optimization
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This paper presents a neural network architecture for binaural rendering of first-order Ambisonics (FOA) signals, enabling headphone listeners to perceive immersive spatial audio from Ambisonic content without requiring individualized Head-Related Transfer Function measurements at inference time. The model operates in the STFT domain using Complex Ratio Masks (CRMs). Unlike magnitude-mask methods, which process only the omnidirectional channel and discard phase, the proposed model predicts a shared CRM pair (left and right ear) that is applied via complex-valued multiplication to all four FOA channels with directional weighting. The omnidirectional channel W contributes at unit weight, while the directional channels Y, Z, and X are weighted at a reduced level, preserving both magnitude and phase information from the full soundfield. The input representation extends standard spectral features with three intensity vector channels that encode the sound arrival direction at each time-frequency bin, giving the network explicit spatial information alongside magnitude and phase cues. Training uses a multi-objective loss that combines waveform-level accuracy (SI-SDR), multi-resolution spectral reconstruction at three complementary time-frequency scales, and interaural level and phase difference terms to jointly optimize signal fidelity and spatial cue preservation. The encoder-decoder backbone is a four-level UNet with residual convolutional blocks and channel-spatial attention at every level, totaling approximately four million parameters. Evaluation against a prior magnitude-masking architecture with 28 million parameters shows that the CRM variant achieves comparable spatial cue preservation with a seven-fold parameter reduction while gaining access to phase information. Processing the signal in a single STFT-domain forward pass avoids the sequential inference of autoregressive time-domain models, yielding computational efficiency suitable for real-time virtual reality deployment.
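The abstract does not spell out the feature or masking equations. As a minimal sketch, assuming ACN channel order (W, Y, Z, X) with SN3D normalisation, the per-bin intensity feature can be taken as the normalised active intensity Re(W* · [X, Y, Z]), and the shared mask pair can be applied to every channel with W at unit weight and Y, Z, X at a reduced weight `alpha`. The value `alpha = 0.5`, summing the weighted channels into each ear, and the helper names are illustrative assumptions, not the authors' implementation.

```python
import math

# Assumed ACN channel order for first-order Ambisonics (AmbiX): W, Y, Z, X.
ACN_W, ACN_Y, ACN_Z, ACN_X = 0, 1, 2, 3


def intensity_features(bins):
    """Unit active-intensity direction (x, y, z) for each time-frequency bin.

    `bins` is a list of 4-tuples of complex STFT values in ACN order.
    For an ACN/SN3D plane wave the vector points toward the source.
    """
    feats = []
    for b in bins:
        w_conj = b[ACN_W].conjugate()
        ix = (w_conj * b[ACN_X]).real
        iy = (w_conj * b[ACN_Y]).real
        iz = (w_conj * b[ACN_Z]).real
        norm = math.sqrt(ix * ix + iy * iy + iz * iz) + 1e-12  # guard silent bins
        feats.append((ix / norm, iy / norm, iz / norm))
    return feats


def apply_weighted_crm(bin_values, mask_left, mask_right, alpha=0.5):
    """Apply one shared complex mask pair to all four FOA channels of one bin.

    W is weighted by 1 and Y, Z, X by the hypothetical reduced weight `alpha`;
    the masked, weighted channels are summed into each ear (an assumption).
    """
    weights = (1.0, alpha, alpha, alpha)
    left = sum(g * (mask_left * c) for g, c in zip(weights, bin_values))
    right = sum(g * (mask_right * c) for g, c in zip(weights, bin_values))
    return left, right
```

Because the mask is shared across channels, multiplying each channel and then summing is equivalent to masking the weighted channel sum; the per-channel form is kept to mirror the description.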
Speakers
Szymon Zaporowski

Teaching and Research Assistant, Gdańsk University of Technology
Researcher at the Department of Multimedia Systems, Gdańsk University of Technology, with a focus on audio machine learning, psychoacoustics, signal processing, automatic speech recognition, and deepfake audio detection. His work sits at the intersection of immersive audio and AI...
Jussieu: Room 1 (4, place Jussieu, Paris 5e)
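The training objective in the first abstract combines SI-SDR, multi-resolution spectral terms, and interaural cue terms. The sketch below covers only the SI-SDR and interaural parts, using the common zero-mean SI-SDR definition and, as an assumption, an absolute ILD difference in dB plus a wrap-safe 1 − cos IPD distance; the paper's exact formulation and loss weights are not given in the abstract, and the function names are hypothetical.

```python
import cmath
import math


def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length real signals."""
    n = len(ref)
    mean_e, mean_r = sum(est) / n, sum(ref) / n
    e = [v - mean_e for v in est]  # remove DC so the measure ignores offsets
    r = [v - mean_r for v in ref]
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]  # projection of the estimate onto the reference
    noise = [a - t for a, t in zip(e, target)]
    return 10 * math.log10(
        (sum(t * t for t in target) + eps) / (sum(v * v for v in noise) + eps)
    )


def interaural_loss(est_l, est_r, ref_l, ref_r, eps=1e-8):
    """Mean ILD error (dB) and mean IPD error (1 - cos) over complex STFT bins."""
    ild_err = 0.0
    ipd_err = 0.0
    for el, er, rl, rr in zip(est_l, est_r, ref_l, ref_r):
        ild_est = 20 * math.log10((abs(el) + eps) / (abs(er) + eps))
        ild_ref = 20 * math.log10((abs(rl) + eps) / (abs(rr) + eps))
        ild_err += abs(ild_est - ild_ref)
        ipd_est = cmath.phase(el * er.conjugate())
        ipd_ref = cmath.phase(rl * rr.conjugate())
        ipd_err += 1 - math.cos(ipd_est - ipd_ref)  # insensitive to 2*pi wrapping
    n = len(est_l)
    return ild_err / n, ipd_err / n
```

A combined loss could then be formed as, for example, `-si_sdr(...) + lam_ild * ild + lam_ipd * ipd`, where `lam_ild` and `lam_ipd` are hypothetical weights.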

10:00am CEST

(P) Perceptual Limits of Ambisonic Order for Beamforming in Complex Acoustic Scenes
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This study investigates the perceptually sufficient ambisonic order for beamforming in complex acoustic scenes, defined as the minimum spatial resolution above which no audible improvement is perceived. Two beamforming methods were evaluated: hypercardioid and MVDR (minimum variance distortionless response) beamforming. In contrast to previous studies, an ideal microphone array was assumed, in order to evaluate the beamforming methods independently of ambisonic encoding error. Sound scenes were generated using room acoustic simulations and encoded into ambisonic signals. A perceptual evaluation was conducted using a three-interval/two-alternative forced choice (3I/2AFC) test design with an adaptive procedure. The experiment used a production-constrained reference (7th order) and a high-order reference (19th order). Results showed that the required order depends on the beamforming method and on the characteristics of the sound scene. Diffuseness profiles can be used to analyze the influence of the ambisonic order on sound field diffuseness and to evaluate whether the available directional information is sufficient to support effective adaptive beamforming.
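For the hypercardioid case, one common definition of the order-N ambisonic hypercardioid is the axisymmetric maximum-directivity beam B(θ) = Σ_{n=0..N} (2n+1) P_n(cos θ) / (N+1)², whose directivity factor is (N+1)². The sketch evaluates that pattern with the Legendre recurrence and checks the directivity factor numerically. It assumes this definition and does not reproduce the study's MVDR beamformer, whose standard weights are w = R^-1 d / (d^H R^-1 d) for a steering vector d and spatial covariance matrix R.

```python
import math


def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def hypercardioid_gain(order, theta):
    """Max-directivity beam of the given ambisonic order, unit gain on-axis."""
    p = legendre(order, math.cos(theta))
    return sum((2 * n + 1) * p[n] for n in range(order + 1)) / (order + 1) ** 2


def directivity_factor(order, steps=20000):
    """Numerically evaluate 2 / integral of B(theta)^2 sin(theta) over [0, pi]."""
    h = math.pi / steps
    integral = sum(
        hypercardioid_gain(order, (k + 0.5) * h) ** 2 * math.sin((k + 0.5) * h) * h
        for k in range(steps)  # midpoint rule
    )
    return 2.0 / integral
```

At order 1 this reduces to the familiar 0.25 + 0.75 cos θ hypercardioid, and the directivity factor grows from 64 at 7th order to 400 at 19th order, the two reference orders used in the study.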
Jussieu: Room 1 (4, place Jussieu, Paris 5e)
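The second abstract says only that an adaptive procedure was used. As a hypothetical illustration, a 1-up/2-down staircase (which converges on the 70.7%-correct point) can track the candidate ambisonic order: two consecutive correct identifications of the odd interval raise the order, making the difference from the reference harder to hear, and one miss lowers it. The step rule, start order, bounds, reversal count, and the `respond` callback are assumptions, not the study's protocol.

```python
def run_staircase(respond, start=1, lo=1, hi=19, max_reversals=8, max_trials=500):
    """Hypothetical 1-up/2-down staircase over ambisonic order.

    `respond(order)` returns True when the listener correctly picks the
    interval that differs from the reference at that candidate order.
    Returns the mean order at the reversals after the first two.
    """
    order = start
    correct_run = 0
    last_step = 0
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= max_reversals:
            break
        if respond(order):
            correct_run += 1
            if correct_run < 2:
                continue  # need a second consecutive correct answer before moving
            correct_run = 0
            step = 1  # two correct in a row: raise the order (harder)
        else:
            correct_run = 0
            step = -1  # one miss: lower the order (easier)
        if last_step and step != last_step:
            reversals.append(order)  # direction changed: record a reversal
        last_step = step
        order = min(hi, max(lo, order + step))
    usable = reversals[2:]  # discard the first two reversals as warm-up
    return sum(usable) / len(usable) if usable else float(order)
```

With a simulated listener who hears the difference only below order 5, `run_staircase(lambda order: order < 5)` oscillates between orders 4 and 5 and returns 4.5.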

2:30pm CEST

Understanding Ambisonics Through Practical Decisions and Listening
Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Ambisonics is a scene-based spatial audio format that has been around since the 1970s. In recent years its popularity has increased, with inclusion in game engines (such as Unity and Unreal) and distribution standards (like ADM or IAMF). Despite this, many practitioners view Ambisonics as complex, mathematical, or academic. This tutorial explains Ambisonics through the lens of practical decision-making. Instead of equations, it covers the choices audio professionals must make when working on a project, with particular emphasis on the audible consequences of those choices. The tutorial enables attendees to develop an intuitive understanding of Ambisonics through explanations of theory combined with listening examples and workflow demonstrations. The topics covered in this tutorial are:
• Fundamentals: What Ambisonics is, how it differs from channel-based and object-based formats, and why it is well suited to VR, AR, and game audio.
• Encoding: How audio sources are converted to Ambisonics, and how to choose the Ambisonic order based on perceptual and computational trade-offs as well as delivery constraints.
• Conventions: Common channel-ordering (ACN vs FuMa) and gain-normalisation (SN3D vs N3D) conventions, and what happens when they are mismatched.
• Processing: Which effects can be applied to Ambisonic signals while preserving spatial integrity.
• Decoding and binaural rendering: How Ambisonic signals are converted to loudspeaker or binaural signals, and the impact of head-tracking and HRTF selection on binaural rendering.
• Mixed-order projects: The options available when working with mixed-order sources, and the audible artefacts that can arise.
The tutorial will provide brief practical demonstrations of setting up an Ambisonics project in Pro Tools and Reaper, two widely used DAWs for immersive audio.
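The Conventions topic can be made concrete with a few lines of arithmetic. This sketch uses the standard ACN index n² + n + m, the SN3D-to-N3D gain √(2n+1), and the first-order FuMa-to-AmbiX (ACN/SN3D) conversion, which restores FuMa's −3 dB W channel and reorders W, X, Y, Z to W, Y, Z, X. Function names are illustrative, and higher-order FuMa channels need additional per-channel factors that are not covered here.

```python
import math


def acn_index(n, m):
    """ACN channel index for degree n and order m, with -n <= m <= n."""
    return n * n + n + m


def sn3d_to_n3d_gain(n):
    """Gain that converts an SN3D-normalised channel of degree n to N3D."""
    return math.sqrt(2 * n + 1)


def fuma_to_ambix_foa(w, x, y, z):
    """Convert one first-order FuMa sample (W, X, Y, Z) to AmbiX (W, Y, Z, X).

    FuMa stores W scaled by 1/sqrt(2) (-3 dB), so it is restored here;
    at first order X, Y, Z keep their gains and are only reordered.
    """
    return (w * math.sqrt(2), y, z, x)
```

Reading FuMa data as if it were ACN puts X in the Y slot, Y in the Z slot, and Z in the X slot, so the scene's directions are scrambled and W is 3 dB low: one of the mismatches the tutorial refers to.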
By the end of the tutorial, attendees will have a practical understanding of the main concepts of Ambisonics and will know how the practical choices they make affect the final audio. They will also be familiar with the main workflow pitfalls and how to avoid them. The tutorial assumes familiarity with general audio production concepts (DAW use, signal routing, mixing); no prior experience with Ambisonics or spatial audio formats is required. It is suitable for sound designers, composers, and audio engineers working in or interested in immersive media.
Speakers
Jussieu: Conf 2 (Binaural) (4, place Jussieu, Paris 5e)
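For the tutorial's Encoding topic, a full-sphere order-N stream has (N+1)² channels, which drives the computational and delivery trade-offs mentioned there. This sketch also encodes a mono sample as a first-order plane wave in AmbiX (ACN/SN3D), assuming azimuth measured counter-clockwise from the front and elevation measured upward from the horizontal plane; the function names are illustrative.

```python
import math


def channel_count(order):
    """Number of channels for a full-sphere ambisonic stream of the given order."""
    return (order + 1) ** 2


def encode_foa(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample as a first-order AmbiX (ACN/SN3D) plane wave.

    Azimuth is counter-clockwise from the front (positive = left),
    elevation is upward from the horizon. Returns (W, Y, Z, X).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        sample,                                  # W: omnidirectional, unit gain in SN3D
        sample * math.sin(az) * math.cos(el),    # Y: left-right figure-of-eight
        sample * math.sin(el),                   # Z: up-down figure-of-eight
        sample * math.cos(az) * math.cos(el),    # X: front-back figure-of-eight
    )
```

For example, `encode_foa(1.0, 90.0, 0.0)` places a source hard left, giving W = 1 and Y ≈ 1 with Z and X ≈ 0.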
 