AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

11:30am CEST

Thursday July 2, 2026 11:30am - 12:30pm CEST

The tools for building social capital in any career are based on networking, mentorship, and role models. For immersive audio, underrepresented groups are upskilling, teaching others, and innovating in order to pursue their ambitions. Dr. Leslie Gaston-Bird talks about her initiative "Immersive and Inclusive Audio", which has been running for over five years, and how the Pro Tools | Dolby Atmos Certification plays a role in the efforts of women and minorities to "leak up, not out" of the immersive audio career pipeline.

Speakers

Leslie Gaston-Bird

Thursday July 2, 2026 11:30am - 12:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Immersive audio, Workshop

1:30pm CEST

The SONICOM Ecosystem

Thursday July 2, 2026 1:30pm - 2:30pm CEST

IRCAM:Stravinsky

The SONICOM Ecosystem is a repository dedicated to spatial hearing and binaural audio. It provides means to store data as databases and tools (including their metadata), to create relations between them, and to enable specific data visualization tailored to the needs of the auditory community. It also enables persistent publications via digital object identifiers (DOIs) and supports the authors along their typical process of publishing scientific articles. In this workshop, we will guide the participants through the key features of the SONICOM Ecosystem and show how the Ecosystem can support researchers during their publication workflow.

Speakers

Piotr Majdak

Michael Mihocic

Jonathan Stuefer

Herwig Stöger

Thursday July 2, 2026 1:30pm - 2:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Immersive audio, Tutorial

2:30pm CEST

The Role of Source Directivity in Spatial Audio Rendering for AR/VR/XR Environments

Thursday July 2, 2026 2:30pm - 3:30pm CEST

IRCAM:Stravinsky

Source directivity constitutes a fundamental acoustic property of musical instruments, describing the variation of radiated sound pressure as a function of direction. This behavior is dependent on the geometry, material properties, and excitation mechanisms of the instrument, and plays an important role in spatial sound perception. In the real world, the directional characteristics of a source contribute significantly to how sound is localized, how timbre is perceived across different listening positions, how sound is captured with different microphone techniques and placements, and how sound interacts with the surrounding environment. Yet, despite its importance, source directivity is often simplified or neglected in contemporary spatial audio rendering approaches, particularly within AR/VR/XR applications where computational constraints and system complexity frequently dictate design choices.

Directivity describes the angular dependence of radiated sound pressure and constitutes a defining acoustic signature of each instrument. Acoustic directivity measurements are based on demanding and carefully controlled procedures. Typically, they are conducted in anechoic or low-reverberation environments using dense microphone arrays, and rely on excitation mechanisms in order to improve measurement accuracy and repeatability. It should be acknowledged, however, that there exists a gap between acoustic research and its practical integration into immersive media technologies. Many current XR applications rely on simplified or generic source models, prioritizing computational efficiency and ease of implementation over acoustic accuracy. While there is a clear benefit in using simplified directivity approaches, such practices reduce the perceptual realism and fidelity of the reproduced sound field. This raises critical questions: To what extent does accurate directivity contribute to perceptual realism? Are approximations sufficient, and under what conditions do they compromise the experience?

This workshop addresses these questions by exploring both the scientific foundations and practical implications of incorporating source directivity into AR/VR/XR systems. It is structured in three parts, offering theoretical information and practical perspectives on the role of sound source directivity in immersive audio applications. The first part discusses source directivity and its importance in sound emission, perception, and spatial realism. Emphasis will be placed on recent research involving the capture and analysis of directivity patterns of the human singing voice across different music genres and traditional Greek musical instruments. Two directivity databases dedicated to this research, which are publicly available through the SONICOM Ecosystem repository (https://ecosystem.sonicom.eu/), will also be presented, along with an overview of their structure, content, and potential applications. The second part focuses on the integration of directivity data into spatial audio rendering pipelines for AR/VR/XR environments. Participants will be introduced to the latest updates of the SOFA (Spatially Oriented Format for Acoustics) conventions specifically created for storing and exchanging directivity information. In addition, the Binaural Rendering Toolbox (BRT), developed within the SONICOM project, will be presented as a practical tool that facilitates the implementation of directivity-aware rendering workflows. The third part concerns a critical discussion on the practical implications of using accurate or approximated directivity data in immersive audio applications. Drawing on results from selected case studies, the session will evaluate the perceptual and computational trade-offs involved, offering guidance on when high-precision data is necessary and when simplified models may suffice in AR/VR/XR applications.
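
For readers unfamiliar with what a "simplified or generic source model" can look like, the following is a minimal sketch of a first-order directivity pattern, a common textbook approximation. It is not taken from the workshop material, the BRT, or the SOFA conventions; the function names and the `alpha` parameter are hypothetical and chosen only for illustration.

```python
import math


def first_order_gain(theta_rad: float, alpha: float = 0.5) -> float:
    """Linear gain of a first-order directivity pattern (illustrative only).

    theta_rad is the angle between the source's main axis and the listener.
    alpha = 1.0 gives an omnidirectional source, 0.5 a cardioid,
    and 0.0 a figure-of-eight.
    """
    return abs(alpha + (1.0 - alpha) * math.cos(theta_rad))


def gain_db(theta_rad: float, alpha: float = 0.5, floor_db: float = -60.0) -> float:
    """Same gain in decibels, clamped to floor_db to avoid log10(0)."""
    g = first_order_gain(theta_rad, alpha)
    if g <= 0.0:
        return floor_db
    return max(20.0 * math.log10(g), floor_db)


if __name__ == "__main__":
    # Print the cardioid attenuation at a few angles off the main axis.
    for deg in (0, 45, 90, 135, 180):
        print(f"{deg:3d} deg: {gain_db(math.radians(deg)):7.2f} dB")
```

A measured directivity, by contrast, would typically be frequency dependent and sampled on a dense spherical grid, which is the kind of data the workshop's databases are described as providing.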

Speakers

Areti Andreopoulou

David Poirier-Quinot

Konstantinos Bakogiannis

Thursday July 2, 2026 2:30pm - 3:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Immersive audio, Workshop

10:30am CEST

Perceptual Modeling of Binaural vs. Stereo Music Mixes: A Pairwise Differential Approach with Dimension-wise Attention

Friday July 3, 2026 10:30am - 11:00am CEST

IRCAM:Stravinsky

Evaluating binaural rendering against stereo mixes is frequently confounded by "content bias," where listeners' inherent musical preferences obscure spatial quality assessments. To address this, we propose an interpretable predictive model utilizing a pairwise differential approach (Delta Strategy) and a dimension-wise attention neural network. The model achieves a competitive sign accuracy of 68.4%, outperforming traditional baselines. Crucially, the attention mechanism provides retrospective interpretability, revealing fundamental acoustic trade-offs in spatial upmixing: aggressive decorrelation for image widening compromises localization precision and timbral fullness, whereas successful externalization heavily depends on mid-side energy redistribution. This framework offers a robust evaluation tool for spatial algorithms and actionable psychoacoustic guidance for immersive audio production.
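
The abstract does not define "sign accuracy". One plausible reading, sketched below as an assumption, is the fraction of stimulus pairs for which the predicted rating difference has the same sign as the listener's rating difference. The function names are hypothetical and the numbers in the usage example are made up for illustration, not taken from the paper.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_accuracy(true_deltas, pred_deltas) -> float:
    """Fraction of pairs whose predicted difference has the correct sign.

    Pairs with a true difference of exactly zero are skipped, since they
    carry no preference direction (an assumption of this sketch).
    """
    pairs = [(t, p) for t, p in zip(true_deltas, pred_deltas) if t != 0]
    if not pairs:
        raise ValueError("no pairs with a non-zero true difference")
    hits = sum(1 for t, p in pairs if sign(t) == sign(p))
    return hits / len(pairs)


if __name__ == "__main__":
    # Made-up rating differences (binaural minus stereo) for four pairs.
    true_deltas = [1.0, -0.5, 0.3, -2.0]
    pred_deltas = [0.2, 0.1, 0.4, -1.0]
    print(sign_accuracy(true_deltas, pred_deltas))
```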

Speakers

Jiarui Liang

Yizhen Wang

Haitian Zhang

Huanhe Li

Yizhen Hou

Friday July 3, 2026 10:30am - 11:00am CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

11:00am CEST

The Impact of User Expertise on Immersion and Usability in an Interactive VR Music Experience

Friday July 3, 2026 11:00am - 11:30am CEST

IRCAM:Stravinsky

Designing interactive music systems in Virtual Reality (VR) requires balancing intuitive entry points with expressive depth, yet it remains unclear how domain-specific knowledge (Music Expertise) and medium-specific experience (VR Familiarity) distinctly shape the user experience within these environments. This paper investigates how user expertise impacts engagement with an interactive VR music experience. We conducted a mixed-methods study with 32 participants, categorized by these two factors, to systematically evaluate their influence on perceived usability, immersion, and interaction behavior. Results indicate that Music Expertise significantly enhanced perceived usability, whereas VR Familiarity had no significant effect. Perceived immersion was reported as universally high across all groups, regardless of background. Behavioral data revealed distinct engagement patterns: Experts and VR-familiar users focused more on 6DoF spatial mixing controls, while novices required significantly more time and physical exploration. These findings suggest that for creative VR tools, domain knowledge is a stronger predictor of usability than technical fluency. We discuss the success of a ‘Low Floor, High Ceiling, and Wide Walls’ design and propose critical design implications for onboarding, interaction metaphors, and aligning user intent in embodied music systems.

Speakers

Jacob Hedges

Robert Sazdov

Andrew Johnston

Friday July 3, 2026 11:00am - 11:30am CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

11:30am CEST

The Influence of Listener's Background on Virtual Source Detection in a 6DoF Spatial Audio Task

Friday July 3, 2026 11:30am - 12:00pm CEST

IRCAM:Stravinsky

The perceptual evaluation of spatial and immersive audio systems commonly relies on listening tests, where the role of listener-related factors is often treated as secondary. While previous studies have shown that listener expertise can influence performance in virtual audio tasks, this has not been systematically investigated in more complex mixed real–virtual and dynamic listening scenarios. This study examines the role of listener background in a six-degrees-of-freedom (6DoF) spatial detection task involving virtual and real sound sources. Eighteen participants identified the presence of a virtual speech source among concurrent targets and distractors while freely navigating a loudspeaker-based scene. Listener background was characterised by years of musical training and self-reported experience with spatial audio technologies, used to categorise participants as expert or naïve. Results show above-chance performance, with reduced accuracy in spatially adjacent conditions. Listeners with greater musical training and spatial audio experience achieved higher percent-correct scores. These findings are consistent with prior work on listener-dependent localisation performance, and extend them to a 6DoF mixed real–virtual context. The results highlight the importance of explicitly considering and reporting participant expertise in the design, analysis, and interpretation of spatial audio perception studies.

Speakers

Rahul Roy Chowdhury

Julie Meyer

Lorenzo Picinali

Friday July 3, 2026 11:30am - 12:00pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

12:00pm CEST

Choir Performance in Virtual Versus Real Rooms: The Influence of Acoustic Modality on Singers’ Performance and Perception

Friday July 3, 2026 12:00pm - 12:30pm CEST

IRCAM:Stravinsky

Several studies suggest that singers adapt their vocal production to room acoustics, and virtual reality (VR) has increasingly been used to investigate such interactions under controlled conditions. However, questions remain regarding the ecological validity of virtual acoustic environments for studying musicians’ behavior. While prior research has primarily focused on solo singers, the present study explores the impact of acoustic modality (real vs. virtual) on choral performance. A professional four-singer ensemble performed five different choral pieces across five acoustic conditions. Recordings were conducted both in situ, within different spaces of a church, and under corresponding virtual acoustic simulations using auralization techniques. Acoustic and physiological data were collected using close microphones and electroglottography, while subjective perceptions were assessed through questionnaires. Comparative analyses between real and virtual conditions aim to examine how acoustic modality (real or virtual) influences singers’ musical and physiological adaptations, as well as their subjective perceptions.

Speakers

Charlotte Fernandez

Nathalie Henrich

Brian F.G. Katz

Friday July 3, 2026 12:00pm - 12:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

2:00pm CEST

On the influence of headphone cup acoustics on individual pinna cues

Friday July 3, 2026 2:00pm - 2:30pm CEST

IRCAM:Stravinsky

In head-related transfer functions (HRTFs), spectral cues due to the individual pinna geometry are known to contribute to elevation perception and externalization. The pinna component of an HRTF is referred to as a pinna-related transfer function (PRTF). Some headphone concepts aim to excite individual PRTF cues by placing the headphone transducer away from the traditional position on the interaural axis, e.g. tilted in front of the pinna. However, it is not clear to what extent the individual PRTF is preserved when the pinna is placed inside a headphone cup enclosed by a baffle and a cushion. In this study, multiple prototype setups successively approximating a headphone cup and allowing for variable transducer positions are analyzed using a set of silicone pinna replicas. PRTF perturbations are analyzed in near-field measurements and the impact of headphone cup acoustics is discussed. Based on the observation that the perturbations are systematic, an equalization scheme restoring the free-field PRTF based on the median of measurements with several pinnae is proposed.

Speakers

Roman Kiyan

Stephan Preihs

Jürgen Peissig

Friday July 3, 2026 2:00pm - 2:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

2:30pm CEST

Personalized Head-Related Transfer Function Modeling Using a Neural Operator

Friday July 3, 2026 2:30pm - 3:00pm CEST

IRCAM:Stravinsky

Virtual, augmented, and mixed reality experiences are becoming more commonplace as consumer-grade devices proliferate. Head-Related Transfer Functions (HRTFs) are used to create realistic spatial audio in virtual and augmented environments. Mathematically, HRTFs represent solutions to acoustic boundary-value scattering problems governed by the Helmholtz equation. Neural operators are neural networks designed to learn the solutions of partial differential equations (PDEs). The present work proposes an operator-learning framework based on the Deep Operator Network (DeepONet) for individualized HRTF prediction. By implementing a non-uniform sampling strategy for 3-D head meshes and data compression along the frequency axis, the framework achieves high-fidelity predictions while reducing data dimensionality. Our method shows low log-spectral distortion, generalizes to unseen spatial grids, and infers an entire head’s HRTF field in ~0.3 seconds. Objective evaluations demonstrate the framework's effectiveness in personalization and spatial interpolation. Furthermore, robust performance on unseen subjects and coordinates highlights the model's generalization capability, offering a computationally efficient alternative for HRTF personalization.
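
Log-spectral distortion (LSD) is a widely used objective metric for comparing a predicted HRTF with a reference. Several variants exist, and the abstract does not say which one the authors use; the sketch below assumes the common root-mean-square form over frequency bins for a single direction and ear. The function name is hypothetical.

```python
import math


def log_spectral_distortion(h_ref, h_est) -> float:
    """RMS difference, in dB, between two magnitude spectra.

    h_ref and h_est are equal-length sequences of (possibly complex)
    frequency responses for one direction and one ear. All magnitudes
    must be non-zero.
    """
    if len(h_ref) != len(h_est) or len(h_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared = [
        (20.0 * math.log10(abs(r) / abs(e))) ** 2
        for r, e in zip(h_ref, h_est)
    ]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    reference = [1.0, 0.5 + 0.5j, 0.25]
    estimate = [0.9, 0.5 + 0.4j, 0.30]
    print(f"LSD = {log_spectral_distortion(reference, estimate):.2f} dB")
```

In practice the per-direction values are usually averaged across directions, ears, and subjects to give a single figure.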

Speakers

Chenshen Lu

Kyla McMullen

Friday July 3, 2026 2:30pm - 3:00pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

3:00pm CEST

The Influence of Binauralizer and HRTF Preprocessing on Objective Loudness in Ambisonics

Friday July 3, 2026 3:00pm - 3:30pm CEST

IRCAM:Stravinsky

Accurate loudness estimation is essential for audio production, quality control, and loudness compliance, but no established recommendation exists for binaural playback over headphones. This paper investigates the influence of binauralizers and HRTF processing on objective loudness estimation for binauralized Ambisonics content. Two experiments were conducted using 163 Ambisonics clips binauralized with two open-source renderers and three HRTF sets under three HRTF preprocessing conditions. Objective loudness metrics were compared against ground truth loudness data derived from 7.1+4 loudspeaker feeds according to ITU-R BS.1770. Results reveal small to moderate differences in Integrated Loudness and larger differences in the True Peak values between the evaluated binauralizers, and that diffuse-field equalization can effectively eliminate loudness and True Peak differences across binauralizers and across sets of HRTFs. The findings can help to better predict and ensure loudness compliance in binauralized audio consumption in XR and gaming, especially when importing 3rd-party HRTFs is supported.
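
As background for the ITU-R BS.1770 reference above, the sketch below shows only the final channel-summation step of that measurement: loudness is -0.691 plus ten times the base-10 logarithm of the weighted sum of per-channel mean-square values. It assumes K-weighting has already been applied to each channel and omits gating entirely, so it is not a compliant meter. The function name is hypothetical, and the channel weights are left to the caller because they depend on the loudspeaker layout.

```python
import math


def loudness_from_mean_squares(mean_squares, weights) -> float:
    """Ungated loudness in LKFS from K-weighted per-channel mean squares.

    mean_squares: mean-square value of each K-weighted channel signal.
    weights: per-channel weighting factors defined by BS.1770
             (the LFE channel is excluded from the measurement).
    """
    if len(mean_squares) != len(weights) or len(weights) == 0:
        raise ValueError("need one weight per channel")
    total = sum(g * z for g, z in zip(weights, mean_squares))
    if total <= 0.0:
        raise ValueError("total weighted energy must be positive")
    return -0.691 + 10.0 * math.log10(total)


if __name__ == "__main__":
    # Two channels with unit weights and mean-square 0.5 each.
    print(loudness_from_mean_squares([0.5, 0.5], [1.0, 1.0]))
```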

Speakers

Nils Peters

Friday July 3, 2026 3:00pm - 3:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

3:30pm CEST

Direction-Dependent Ear Canal Transmission at High Frequencies: A Multi-Subject Study using 3D-Printed Replicas

Friday July 3, 2026 3:30pm - 4:00pm CEST

IRCAM:Stravinsky

Head-Related Transfer Functions (HRTFs) are commonly measured at the blocked ear canal entrance, assuming that the ear canal transfer function is direction-independent. While this assumption holds well at low and mid frequencies, its validity at high frequencies has been questioned. A recent pilot study on a single pair of 3D-printed ear replicas found evidence of directional effects above 9 kHz, but was limited in scope. This study extends that work using 3D-printed ear replicas of ten subjects from the IHA database, mounted on a dummy head. Ear canal transfer functions were measured across a full spherical grid of 1944 incidence angles. Results reveal significant directional variability above 6–7 kHz, with standard deviations of 6–8 dB at resonant frequencies. High measurement repeatability confirms these are genuine directional effects rather than measurement artifacts. The directional behavior is consistently observed across all subjects and appears linked to the second and higher ear canal resonances. These findings suggest that current state-of-the-art blocked-canal HRTF measurements may omit spatially relevant spectral information above 7 kHz.
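
To make the "standard deviation across directions" figure concrete, here is a minimal sketch of how such a per-frequency statistic could be computed from magnitude responses in dB. It is an assumption about the general approach, not the authors' analysis code; the function name is hypothetical, and the choice of population rather than sample standard deviation is arbitrary.

```python
import statistics


def directional_std_db(magnitudes_db):
    """Per-frequency standard deviation across incidence directions.

    magnitudes_db: list of rows, one row per direction, each row holding
    the transfer-function magnitude in dB for every frequency bin.
    Returns one standard deviation (in dB) per frequency bin.
    """
    if not magnitudes_db:
        raise ValueError("need at least one direction")
    # zip(*rows) transposes the data so each column is one frequency bin.
    return [statistics.pstdev(column) for column in zip(*magnitudes_db)]


if __name__ == "__main__":
    # Made-up data: 3 directions, 2 frequency bins.
    rows = [
        [0.0, -3.0],
        [0.0, 0.0],
        [0.0, 3.0],
    ]
    print(directional_std_db(rows))
```

A direction-independent ear canal transfer function would give values near zero in every bin; the study reports that this stops being the case at high frequencies.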

Speakers

Baptiste Fourrier

Daniel Sinev

Benjamin Pries

Stephan Preihs

Jürgen Peissig

Friday July 3, 2026 3:30pm - 4:00pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

4:00pm CEST

Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality

Friday July 3, 2026 4:00pm - 4:30pm CEST

IRCAM:Stravinsky

Head-related transfer functions (HRTFs) underpin spatial hearing in virtual and augmented reality systems. Whilst individual HRTFs capture listener-specific morphology, their practical limitations have led to widespread use of generic HRTFs and growing interest in synthetic approaches. Yet their relative perceptual impact is rarely compared within a single study. In this study, twenty listeners completed two virtual reality sound localisation experiments with complementary subsets of interleaved HRTF conditions enabling within-subject comparison of five conditions: individually measured, KEMAR, randomly selected non-individual measured, high-resolution scan-based synthetic and photogrammetry-based synthetic HRTFs. Test–retest stability of the individually measured baseline across sessions supported pooling across experiments and attributing differences to perceptual rather than session effects. Across HRTF conditions, lateral localisation metrics were largely insensitive to HRTF type, whereas polar-domain metrics and confusion rates showed strong HRTF dependence. Random HRTFs outperformed KEMAR on several polar metrics. High-resolution synthetic HRTFs matched individual measured performance, whilst photogrammetry-based synthetic HRTFs, alongside KEMAR, showed the greatest degradation. These findings clarify practical choices for non-individual baselines and highlight the importance of mesh resolution when using numerical synthesis for elevation-dependent localisation tasks.

Speakers

Ludovic Pirard

Katarina C. Poole

Friday July 3, 2026 4:00pm - 4:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture
