AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change



9:30am CEST

Coffee

Friday July 3, 2026 9:30am - 10:00am CEST

IRCAM:Gallery

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

10:00am CEST

Vaulted Harmonies: Archaeoconcert at Notre-Dame (film projection)

Friday July 3, 2026 10:00am - 11:00am CEST

IRCAM:ESPRO (HOA)

The Vaulted Harmonies project reconstructs the acoustic and musical heritage of Notre-Dame de Paris through immersive audio-visual experiences spanning multiple centuries of the cathedral's architectural evolution, developed as part of the Past Has Ears at Notre-Dame (PHEND) research project. Building upon the dome screening presented earlier at AVARIG 2026, this presentation focuses on the spatial audio production workflow underpinning the project and the perceptual trade-offs involved in adapting it across dissemination formats. Starting from room impulse responses derived from geometrical acoustic simulations and convolved with anechoic multichannel musical recordings using RoomZ, a dynamic room impulse response panner developed as part of the project, higher-order ambisonic renderings were generated through continuous interpolation along cinematically designed camera trajectories. The 360° dome version, premiered at the Planetarium of the Cité des Sciences et de l'Industrie, decoded these renderings to a 5.1+1 loudspeaker layout. This presentation complements that screening by examining the underlying HOA production chain and presenting selected scenes in third-order ambisonic reproduction, enabling direct perceptual comparison between the two output formats, though in a conventional frontal video projection context. Topics discussed include the design of spatially coherent auralisation trajectories, maintaining a coherent audio-visual narrative across successive historical reconstructions of the cathedral, and the trade-offs related to output format and deployment context, situated within a broader workflow designed to support wide public dissemination of immersive heritage experiences.

Speakers

David Poirier-Quinot

Brian F.G. Katz

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Immersive audio, Demo

10:00am CEST

Sponsor demos

Friday July 3, 2026 10:00am - 12:30pm CEST

IRCAM:Studio 5

Come see the newest developments of our sponsors

IRCAM:Studio 5 1, place Igor Stravinsky Paris 4e

Sponsor demos

10:30am CEST

Perceptual Modeling of Binaural vs. Stereo Music Mixes: A Pairwise Differential Approach with Dimension-wise Attention

Friday July 3, 2026 10:30am - 11:00am CEST

IRCAM:Stravinsky

Evaluating binaural rendering against stereo mixes is frequently confounded by "content bias," where listeners' inherent musical preferences obscure spatial quality assessments. To address this, we propose an interpretable predictive model utilizing a pairwise differential approach (Delta Strategy) and a dimension-wise attention neural network. The model achieves a competitive sign accuracy of 68.4%, outperforming traditional baselines. Crucially, the attention mechanism provides retrospective interpretability, revealing fundamental acoustic trade-offs in spatial upmixing: aggressive decorrelation for image widening compromises localization precision and timbral fullness, whereas successful externalization heavily depends on mid-side energy redistribution. This framework offers a robust evaluation tool for spatial algorithms and actionable psychoacoustic guidance for immersive audio production.
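The sign accuracy quoted above can be read as the share of excerpt pairs for which the predicted binaural-minus-stereo difference points in the same direction as the listeners' rating difference. A minimal Python sketch under that reading; the function name, the tie handling (ground-truth ties are skipped) and the example numbers are assumptions, not taken from the paper:

```python
def sign_accuracy(predicted_deltas, rated_deltas):
    """Fraction of pairs whose predicted preference direction matches the listeners'.

    Each delta is a difference for one excerpt, e.g. rating(binaural) - rating(stereo).
    Assumption: pairs with a zero rated delta (no preference) are excluded.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    hits = total = 0
    for p, r in zip(predicted_deltas, rated_deltas):
        if r == 0:
            continue  # tie in the ground truth: no direction to match
        total += 1
        hits += sign(p) == sign(r)  # bool counts as 0 or 1
    return hits / total if total else float("nan")


# Hypothetical predicted and rated deltas for four excerpts
print(sign_accuracy([0.4, -0.1, 0.2, -0.3], [1.0, 0.5, 0.3, -0.2]))  # 0.75
```

Working on differences rather than absolute ratings is what lets a per-excerpt content preference cancel out, which is the motivation the abstract gives for the Delta Strategy.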

Speakers

Jiarui Liang

Yizhen Wang

Haitian Zhang

Huanhe Li

Yizhen Hou

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

11:00am CEST

The Impact of User Expertise on Immersion and Usability in an Interactive VR Music Experience

Friday July 3, 2026 11:00am - 11:30am CEST

IRCAM:Stravinsky

Designing interactive music systems in Virtual Reality (VR) requires balancing intuitive entry points with expressive depth, yet it remains unclear how domain-specific knowledge (Music Expertise) and medium-specific experience (VR Familiarity) distinctly shape the user experience within these environments. This paper investigates how user expertise impacts engagement with an interactive VR music experience. We conducted a mixed-methods study with 32 participants, categorized by these two factors, to systematically evaluate their influence on perceived usability, immersion, and interaction behavior. Results indicate that Music Expertise significantly enhanced perceived usability, whereas VR Familiarity had no significant effect. Perceived immersion was reported as universally high across all groups, regardless of background. Behavioral data revealed distinct engagement patterns: Experts and VR-familiar users focused more on 6DoF spatial mixing controls, while novices required significantly more time and physical exploration. These findings suggest that for creative VR tools, domain knowledge is a stronger predictor of usability than technical fluency. We discuss the success of a ‘Low Floor, High Ceiling, and Wide Walls’ design and propose critical design implications for onboarding, interaction metaphors, and aligning user intent in embodied music systems.

Speakers

Jacob Hedges

Robert Sazdov

Andrew Johnston

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

11:00am CEST

(P) Convolving the Convoluted: Acoustigrammetry for Immersive Virtual Reality

Friday July 3, 2026 11:00am - 12:30pm CEST

IRCAM:ESPRO (HOA)

Numerous approaches have been taken to address the problem of generating navigable virtual models for multi-volume acoustic spaces. The general practice for creating empirically informed interactive models of multi-volume acoustic spaces, as embodied by the Spatially Oriented Format for Acoustics, is to discretely sample emitter-receiver pair positions. For a user to then navigate between these discrete positions involves cross-fading, blending, or otherwise perceptually interpolating between corresponding zones. This paper outlines a new approach which instead involves the continuous three-dimensional sampling of acoustic spaces, much as is done with 3D visual spaces in photogrammetry. To achieve this result, a first-of-its-kind consolidated ambisonic impulse response capturing apparatus has been designed and built. This apparatus combines a 3rd-order ambisonic microphone array with a 2nd-order ambisonic loudspeaker array and is designed to be moved through a space with maximal ease. AD/DA conversion, playback, and recording are all handled on a central compute platform. In parallel, a software workflow has been developed which can be implemented in Unreal Engine, as well as other game engines. To solve general issues of spatial audio in game engines, a custom encoding and decoding framework has been implemented. Then, to map the continuous ambisonic impulse response onto a virtual space, a spline mirroring the sampling path is drawn through the space. On the DSP side, an impulse response is extracted from any arbitrary point along the spline by way of the Common-slope Model for coupled spaces. Future work for better addressing early reflections and minimizing the theoretical intermediary of the Common-slope Model is discussed. Additionally, a special use case for visualizing acoustic energy in architectural acoustics is explored.

Speakers

Justin Alley

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Auralization / 6DoF, Poster

11:00am CEST

(P) Immersive Drum Circle: A Tool For Performing and Composing Spatial Music

Friday July 3, 2026 11:00am - 12:30pm CEST

IRCAM:ESPRO (HOA)

Despite significant advances in the development and adoption of spatial audio, many musicians do not embed the technology within their creative processes. Instead, spatial audio technologies are more often used to create immersive adaptations of fundamentally frontal compositions or performances. This paper presents and evaluates a means of spatial music making, referred to as the immersive drum circle. The system facilitates group performance and composition: participants stand in a circle and perform on electronic percussion pads, with sound spatialised so that the listener experiences the music as if positioned within the ensemble. The system’s design is presented alongside implementation details, as well as feedback from musicians obtained as part of an educational workshop which aimed to inform how spatial audio can be used creatively in music. In addition to interacting with the system, participants auditioned the resulting spatial music across three playback scenarios representing gaming, tracked and non-tracked headphone-based music consumption, and a live concert environment. The results show that the immersive drum circle system is a viable tool for music creation and a practical means of inspiring future compositional techniques.

Speakers

Joseph Williams

Matt Barnard

Katarzyna Sochaczewska

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Authoring, Poster

11:00am CEST

(P) Ambimix: a Scene-Based Approach to Interactive Mixing

Friday July 3, 2026 11:00am - 12:30pm CEST

IRCAM:ESPRO (HOA)

Channel-based mixing has long been the standard paradigm for audio professionals in both studio and live performance contexts, owing to its intuitive, signal-oriented workflow. While this approach excels in conventional stereo and multichannel formats, it offers limited native support for advanced spatial applications. As immersive audio formats become increasingly popular for virtual and augmented reality, new mechanisms are needed to allow engineers to work directly within spatial domains rather than adapting channel-based setups to fit immersive content. This paper presents an interactive software system for mixing First-Order Ambisonic (FOA) soundfields in real time. The system accepts A-format recordings from tetrahedral microphones, including the Sennheiser Ambeo and the Core Sound TetraMic. It converts them to B-format using a directional beamforming approach, in which the individual dimensions (left, right, front, back, top, bottom) are independently crossfaded between sources. Each directional beam is extracted via a dot product with a corresponding spherical harmonic steering vector, crossfaded between the two input soundfields, and re-encoded into a new B-format signal using an outer-product reconstruction. The program architecture is designed to accommodate future extensions to additional Ambisonic microphone formats, including the Røde SoundField, the Zylia ZM-1, and the Eigenmike, moving the platform toward microphone-agnostic encoding and mixing. By parameterizing the encoding stage through a gain-factored matrix, the directional steering vectors can be reconfigured to reflect the capsule geometry of any tetrahedral target microphone array. Future iterations of the program can extend the beamforming framework beyond the first-order WXYZ dimensions, enabling the integration of higher-order ambisonic encoding to improve spatial resolution and directional accuracy.
The primary contribution of this program is a streamlined interface for ambisonic processing, designed to make scene-based mixing accessible to engineers and sound designers working in immersive audio production.
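The extract, crossfade, re-encode chain described above can be sketched for a single B-format sample frame. This is a hypothetical illustration, not Ambimix code: it assumes both inputs are already first-order B-format in ACN order (W, Y, Z, X) with SN3D normalization, uses six axis-aligned look directions, and adds a per-channel gain (an assumption of this sketch) so that re-encoding without any crossfade returns the input unchanged. With SN3D, the dot product of a steering vector with a plane-wave encoding gives a cardioid-shaped response, 1 + cos θ.

```python
import math

# Six axis-aligned look directions (azimuth, elevation) in radians.
DIRECTIONS = {
    "front": (0.0, 0.0),
    "left": (math.pi / 2, 0.0),
    "back": (math.pi, 0.0),
    "right": (-math.pi / 2, 0.0),
    "top": (0.0, math.pi / 2),
    "bottom": (0.0, -math.pi / 2),
}


def steering_vector(az, el):
    """First-order real spherical harmonics, ACN order (W, Y, Z, X), SN3D."""
    return [1.0,
            math.sin(az) * math.cos(el),
            math.sin(el),
            math.cos(az) * math.cos(el)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# For these six directions, sum_d y(d) y(d)^T = diag(6, 2, 2, 2); dividing by it
# makes "extract beams, then re-encode" an identity when nothing is crossfaded.
RECON_GAIN = [1 / 6, 1 / 2, 1 / 2, 1 / 2]


def mix_foa(frame_a, frame_b, fade):
    """Crossfade two FOA sample frames [W, Y, Z, X] beam by beam.

    fade maps a direction name to the weight of frame_b (0 = all A, 1 = all B);
    missing directions default to 0.
    """
    out = [0.0, 0.0, 0.0, 0.0]
    for name, (az, el) in DIRECTIONS.items():
        y = steering_vector(az, el)
        alpha = fade.get(name, 0.0)
        # Beam extraction by dot product, then a linear crossfade between inputs
        beam = (1 - alpha) * dot(y, frame_a) + alpha * dot(y, frame_b)
        for ch in range(4):  # outer-product re-encoding: beam * y
            out[ch] += beam * y[ch]
    return [g * v for g, v in zip(RECON_GAIN, out)]
```

For example, `mix_foa(a, b, {"left": 1.0})` takes the left-facing beam from `b` and every other beam from `a`. A real-time version would apply the same matrices to whole sample blocks rather than looping per sample.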

Speakers

Tom Yaniv

Agnieszka Roginska

Na Na

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

HOA, Poster

11:00am CEST

(P) Perceptual Evaluation of Higher-Order Ambisonic Codecs on Both Synthetic Mixing and Native Recordings

Friday July 3, 2026 11:00am - 12:30pm CEST

IRCAM:ESPRO (HOA)

Spatial audio is increasingly used in applications such as virtual and augmented reality and immersive games, and the higher-order ambisonic (HOA) format is particularly useful in this context. Transmitting spatial information requires multiple channels, e.g., 16 channels for third-order ambisonics, which increases memory requirements for storage and bitrates for communication. Efficient compression algorithms are therefore necessary for such content. The recently standardized IVAS codec allows the coding of HOA content for communication use cases. Here, we evaluate it in comparison with a basic multi-mono approach across a variety of contents and spatialization methods. Results show that IVAS outperforms the multi-mono approach at the same bitrate. In particular, this codec exploits inter-channel correlation to reduce the bitrate, which makes it especially robust for signals with high inter-channel correlation, such as those composed of a limited number of plane waves. Conversely, the multi-mono approach cannot exploit this correlation and performs poorly on this type of signal.
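The 16-channel figure follows from the channel count of a full-sphere ambisonic scene of order N, which is (N + 1)². A short sketch of that count and of the bitrate a multi-mono coder needs when every channel is coded independently; the 32 kbit/s per-channel rate in the example is purely hypothetical and is not an IVAS operating point:

```python
def hoa_channels(order):
    """Number of channels of a full-sphere (3D) ambisonic scene of the given order."""
    return (order + 1) ** 2


def multi_mono_bitrate(order, kbps_per_channel):
    """Total bitrate in kbit/s when each HOA channel is coded as an independent mono stream."""
    return hoa_channels(order) * kbps_per_channel


print([hoa_channels(n) for n in range(4)])  # [1, 4, 9, 16]
print(multi_mono_bitrate(3, 32))  # 512 (hypothetical 32 kbit/s per channel)
```

Because the multi-mono cost grows linearly with the channel count regardless of how correlated the channels are, a codec that exploits inter-channel correlation can spend fewer bits on, for instance, a scene made of a few plane waves.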

Speakers

Adrien Llave

Grégory Pallone

Jérôme Daniel

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Perception, Poster

11:30am CEST

The Influence of Listener's Background on Virtual Source Detection in a 6DoF Spatial Audio Task

Friday July 3, 2026 11:30am - 12:00pm CEST

IRCAM:Stravinsky

The perceptual evaluation of spatial and immersive audio systems commonly relies on listening tests, where the role of listener-related factors is often treated as secondary. While previous studies have shown that listener expertise can influence performance in virtual audio tasks, this has not been systematically investigated in more complex mixed real–virtual and dynamic listening scenarios. This study examines the role of listener background in a six-degrees-of-freedom (6DoF) spatial detection task involving virtual and real sound sources. Eighteen participants identified the presence of a virtual speech source among concurrent targets and distractors while freely navigating a loudspeaker-based scene. Listener background was characterised by years of musical training and self-reported experience with spatial audio technologies, used to categorise participants as expert or naïve. Results show above-chance performance, with reduced accuracy in spatially adjacent conditions. Listeners with greater musical training and spatial audio experience achieved higher percent-correct scores. These findings are consistent with prior work on listener-dependent localisation performance, and extend them to a 6DoF mixed real–virtual context. The results highlight the importance of explicitly considering and reporting participant expertise in the design, analysis, and interpretation of spatial audio perception studies.

Speakers

Rahul Roy Chowdhury

Julie Meyer

Lorenzo Picinali

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

12:00pm CEST

Choir Performance in Virtual Versus Real Rooms: The Influence of Acoustic Modality on Singers’ Performance and Perception

Friday July 3, 2026 12:00pm - 12:30pm CEST

IRCAM:Stravinsky

Several studies suggest that singers adapt their vocal production to room acoustics, and virtual reality (VR) has increasingly been used to investigate such interactions under controlled conditions. However, questions remain regarding the ecological validity of virtual acoustic environments for studying musicians’ behavior. While prior research has primarily focused on solo singers, the present study explores the impact of acoustic modality (real vs. virtual) on choral performance. A professional four-singer ensemble performed five different choral pieces across five acoustic conditions. Recordings were conducted both in situ, within different spaces of a church, and under corresponding virtual acoustic simulations using auralization techniques. Acoustic and physiological data were collected using close microphones and electroglottography, while subjective perceptions were assessed through questionnaires. Comparative analyses between real and virtual conditions aim to examine how acoustic modality (real or virtual) influences singers’ musical and physiological adaptations, as well as their subjective perceptions.

Speakers

Charlotte Fernandez

Nathalie Henrich

Brian F.G. Katz

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

Perception, Lecture

12:30pm CEST

Lunch

Friday July 3, 2026 12:30pm - 1:30pm CEST

IRCAM:Gallery

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

1:30pm CEST

On the influence of headphone cup acoustics on individual pinna cues

Friday July 3, 2026 1:30pm - 2:00pm CEST

IRCAM:Stravinsky

In head-related transfer functions (HRTFs), spectral cues due to the individual pinna geometry are known to contribute to elevation perception and externalization. The pinna component of an HRTF is referred to as a pinna-related transfer function (PRTF). Some headphone concepts aim to excite individual PRTF cues by placing the headphone transducer away from the traditional position on the interaural axis, e.g. tilted in front of the pinna. However, it is not clear to what extent the individual PRTF is preserved when the pinna is placed inside a headphone cup enclosed by a baffle and a cushion. In this study, multiple prototype setups successively approximating a headphone cup and allowing for variable transducer positions are analyzed using a set of silicone pinna replicas. PRTF perturbations are analyzed in near-field measurements and the impact of headphone cup acoustics is discussed. Based on the observation that the perturbations are systematic, an equalization scheme restoring the free-field PRTF based on the median of measurements with several pinnae is proposed.
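A median-based equalization of the kind mentioned above might look like the following sketch. It assumes, as a hypothetical formulation not taken from the paper, that each perturbation is the in-cup PRTF minus the free-field PRTF in dB on a shared frequency grid, and that the correction is added (in dB) to in-cup measurements. The median is used per frequency bin, which makes the curve less sensitive to a single pinna with an unusual notch than a mean would be.

```python
import statistics


def median_eq_db(perturbations_db):
    """Equalization curve in dB, one value per frequency bin.

    perturbations_db[p][k] is (in-cup PRTF - free-field PRTF) in dB for
    pinna p at bin k; the correction is the negated per-bin median.
    """
    n_bins = len(perturbations_db[0])
    return [-statistics.median(p[k] for p in perturbations_db) for k in range(n_bins)]


def apply_eq_db(in_cup_db, eq_db):
    """Add the correction (dB) to an in-cup PRTF magnitude (dB)."""
    return [m + e for m, e in zip(in_cup_db, eq_db)]


# Hypothetical perturbations for three pinnae at two frequency bins
eq = median_eq_db([[1.0, -2.0], [3.0, 0.0], [2.0, -1.0]])
print(eq)  # [-2.0, 1.0]
```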

Speakers

Roman Kiyan

Stephan Preihs

Jürgen Peissig

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

1:30pm CEST

Streaming for mixed reality venues

Friday July 3, 2026 1:30pm - 2:30pm CEST

IRCAM:ESPRO (HOA)

A demonstration of a newly developed network spatial audio engine and its client software, showing how an object-based audio performance can be presented simultaneously locally and in a virtual venue. I’ll play multiple tracks of audio and position data from a laptop, as a surrogate for a local performance, and stream this object-based audio into a 6DoF virtual audio space. I’ll then show how a remote audience can join the same audio space via web browsers, listen to the music, and explore the space. Finally, I’ll show a resolved ambisonic mix of the original performance, together with the sounds of the remote audience, being streamed back into the auditorium and played out. I’ll walk through each step and show how all audio and data transfer is done with simple data structures and standard, non-proprietary streaming protocols.

Speakers

Paul Harter

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Application / Gaming, Demo

1:30pm CEST

(P) Acoustic and Perceptual Evaluation of Integrated Near-Ear Speakers vs. Over Head Headphones in VR Environments

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Virtual reality (VR) technologies have become increasingly widespread, extending beyond their traditional military and professional training applications to areas such as education, simulation, gaming, and entertainment. Most modern VR headsets are equipped with built-in near-ear speakers, commonly called nearphones. Sitting between conventional headphones and loudspeakers, nearphones offer a convenient and lightweight audio solution without physically enclosing the ear. However, their impact on spatial audio perception and localization performance remains underexplored. This study explores how nearphones integrated into head-mounted displays (HMDs) perform relative to traditional headphones, focusing on identifying the specific acoustic and perceptual factors that enhance or hinder immersive audio experiences in virtual reality. Using the Oculus Quest 3 as a test platform, the research was divided into two parts. First, the frequency responses of both the headphones and the nearphones were measured to assess differences in sound quality. Second, a VR first-person shooter game was developed in Unreal Engine to evaluate sound localization. Participants identified targets based on audio cues alone, and performance metrics such as target accuracy and reaction time were collected to compare localization effectiveness. Besides localization accuracy, this research explored which audio device users prefer. The results suggest that while traditional headphones generally offer more accurate spatial localization, nearphones provide greater comfort and convenience, highlighting a trade-off between acoustic precision and user ergonomics in VR applications.

Speakers

Zhinuo Li

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) Increasing Accessibility of Auditory Research: A 6-DoF Motion-Capture-Based Interface for Localisation Testing

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Perceptual evaluation of auditory localisation typically relies on graphical user interfaces, pointing devices, or touch screens to capture listener responses. These modalities implicitly require functional vision and/or manual dexterity, excluding participation of, for instance, people with visual impairments. This paper presents a solution for absolute sound-source localisation testing that uses head rotation, tracked by a six-degrees-of-freedom (6-DoF) optical motion-capture system, as the response interface and relies solely on auditory cues for calibration and pointing. The paradigm builds on the natural coupling between auditory spatial attention and head orientation. Individual systematic bias is characterised via a mandatory training block in which stimuli are presented at discrete loudspeaker positions. A per-participant linear regression fitted to head-centred training responses provides a bias model that is applied to main-experiment trials, enabling decomposition of localisation error (LE) into constant error (CE, reflecting accuracy) and random error (RE, reflecting precision), following the established accuracy–precision framework for spatial hearing assessment. The specific use case simulates off-sweet-spot listening positions to inform development of a renderer aimed at enhancing the experience of visually impaired audiences consuming audio-described broadcast content. Preliminary data from the control group consisting of sighted participants are presented. The interface design, calibration procedure, and analysis pipeline are offered as a contribution towards inclusive spatial audio evaluation practice.
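The calibration and error analysis described above can be sketched as follows. The specifics are assumptions of this sketch, not the authors' pipeline: the bias model is a least-squares line response = intercept + slope × target fitted on training trials and inverted for main trials; CE is the mean signed error, RE the population standard deviation of signed errors, and LE the RMS error, so that LE² = CE² + RE². Azimuths are in degrees and assumed to stay within a frontal range, so wrap-around at ±180° is not handled.

```python
import statistics


def fit_bias_model(targets, responses):
    """Least-squares line: response = intercept + slope * target (training block)."""
    mt, mr = statistics.fmean(targets), statistics.fmean(responses)
    sxx = sum((t - mt) ** 2 for t in targets)
    sxy = sum((t - mt) * (r - mr) for t, r in zip(targets, responses))
    slope = sxy / sxx
    return mr - slope * mt, slope


def correct(responses, intercept, slope):
    """Invert the fitted line to map raw head azimuths back to target space."""
    return [(r - intercept) / slope for r in responses]


def decompose_error(targets, responses):
    """Return (LE, CE, RE): RMS error, mean signed error, SD of signed errors."""
    errors = [r - t for t, r in zip(targets, responses)]
    ce = statistics.fmean(errors)
    re = statistics.pstdev(errors)
    le = statistics.fmean(e * e for e in errors) ** 0.5
    return le, ce, re


# Hypothetical training block: a participant overshoots by 5 deg and compresses angles by 0.8
train_t = [-60, -30, 0, 30, 60]
train_r = [5 + 0.8 * t for t in train_t]
intercept, slope = fit_bias_model(train_t, train_r)
```

Using the population standard deviation for RE is what makes the identity LE² = CE² + RE² exact; a sample standard deviation would break it slightly for small trial counts.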

Speakers

Tomasz Rudzki

Katarzyna Sochaczewska

Michael McLoughlin

Gavin Kearney

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) The impact of audio spatialisation reproduction on the neurophysiological responses of music listeners

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Research into listeners’ emotional experience of different audio formats relies heavily on subjective, self-report measures, and little is known about the neural and physiological responses involved. This feasibility study therefore used electroencephalography (EEG), heart rate (HR), and galvanic skin response (GSR) to explore the objective neurophysiological impacts of mono, stereo, and spatial audio formats across different music genres. In a within-subjects design, participants listened to 27 randomised stimuli, each a 30-second music excerpt, spanning the three audio formats. Results were not statistically significant, but trends emerged in the data: mono formats tended to elevate cognitive load and arousal, while spatial audio tended to decrease physiological arousal, promoting a more relaxed state. However, the effects were strongly genre-dependent. Differences in physiological response between static and dynamic spatial reproduction of different music genres are discussed. Although limited by the lack of subjective validation and by sample size, this study highlights interesting relationships between audio format and the physiological responses of music listeners.

Speakers

Laney Haywood

Sam Hobern

Isao Nambu

Nicolas Epain

Helena Daffern

Gavin Kearney

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) Virtualising SPHERE: active listening in 3D sound localisation

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Spatial hearing emerges from the integration of auditory, multisensory, and motor information, and is enhanced in natural conditions through active listening, where head and body movements provide dynamic cues that improve localisation accuracy and perceptual stability. This principle is central to immersive audio research in Virtual and Augmented Reality (VR/AR), where binaural rendering based on Head-Related Transfer Functions (HRTFs) and room acoustic cues enables the reproduction of interaural, monaural, and distance information. Beyond acoustics, bodily engagement (e.g., reaching toward sound sources) further supports spatial adaptation. These technologies enable controlled experimental protocols for assessment and training in both normal-hearing and hearing-impaired populations. One such paradigm is SPHERE, originally developed to study three-dimensional sound localisation in ecologically valid conditions and later applied to training and rehabilitation, including for cochlear implant users. In its original implementation, participants localise sounds presented via a physically moved loudspeaker and respond either through active exploration or under static listening constraints, while head, eye, and hand movements are tracked to analyse localisation accuracy, motor behaviour, and search strategies. However, reliance on a human operator limits reproducibility and scalability. This work introduces a fully virtualised SPHERE implementation using an immersive binaural rendering framework, preserving the original spatial configuration while enabling real-time multimodal tracking. The system also evaluates the impact of HRTF individualisation by comparing generic and personalised filters. Performance is validated against the original loudspeaker-based paradigm to assess ecological validity. Preliminary results regarding the system’s effectiveness as a research and clinical tool will be presented at the conference.

Speakers

Nicola La Magna

Lisa Lever

Clément Desoche

Nils Meyer-Kahlen

Valerie Gaveau

Lorenzo Picinali

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

Sponsor demos

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Studio 5

Come see the newest developments of our sponsors

IRCAM:Studio 5 1, place Igor Stravinsky Paris 4e

Sponsor demos

2:00pm CEST

Personalized Head-Related Transfer Function Modeling Using a Neural Operator

Friday July 3, 2026 2:00pm - 2:30pm CEST

IRCAM:Stravinsky

Virtual, augmented, and mixed reality experiences are becoming more commonplace as consumer-grade devices proliferate. Head-Related Transfer Functions (HRTFs) are used to create realistic spatial audio in virtual and augmented environments. Mathematically, HRTFs represent solutions to acoustic boundary-value scattering problems governed by the Helmholtz equation. Neural operators are neural networks designed to learn the solutions of partial differential equations (PDEs). The present work proposes an operator-learning framework based on the Deep Operator Network (DeepONet) for individualized HRTF prediction. By implementing a non-uniform sampling strategy for 3-D head meshes and data compression along the frequency axis, the framework achieves high-fidelity predictions while reducing data dimensionality. Our method shows low log-spectral distortion, generalizes to unseen spatial grids, and infers an entire head’s HRTF field in ~0.3 seconds. Objective evaluations demonstrate the framework's effectiveness in personalization and spatial interpolation. Furthermore, robust performance on unseen subjects and coordinates highlights the model's generalization capability, offering a computationally efficient alternative for HRTF personalization.
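Log-spectral distortion (LSD), the objective metric cited above, is commonly defined as the RMS over frequency bins of the dB difference between a reference and an estimated magnitude response, then averaged over directions. A minimal sketch of that common definition; the paper's exact frequency range, bin weighting and averaging are not stated in the abstract, and the small `eps` guard against log of zero is an assumption:

```python
import math


def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """LSD in dB between two magnitude responses sampled on the same frequency bins."""
    sq = [(20 * math.log10((abs(a) + eps) / (abs(b) + eps))) ** 2
          for a, b in zip(h_ref, h_est)]
    return math.sqrt(sum(sq) / len(sq))


def mean_lsd(ref_by_direction, est_by_direction):
    """Average the per-direction LSD over all measured directions."""
    values = [log_spectral_distortion(r, e)
              for r, e in zip(ref_by_direction, est_by_direction)]
    return sum(values) / len(values)
```

A uniform 10× magnitude error at every bin gives an LSD of 20 dB, while identical responses give 0 dB.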

Speakers

Chenshen Lu

Kyla McMullen

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

2:30pm CEST

An investigation of AI integration in sound designer workflows and experiences

Friday July 3, 2026 2:30pm - 3:00pm CEST

IRCAM:ESPRO (HOA)

Artificial intelligence is increasingly being integrated into professional audio production workflows, yet a gap persists between the tools developers produce and the requirements of practising sound designers. This paper investigates this gap through a mixed-methods study comprising a survey of 76 practitioners and follow-up semi-structured interviews with 20 industry professionals. Results were analysed using descriptive statistical analysis and thematic analysis to identify patterns across both datasets. Five themes emerged from our analysis: Context, Workflow, Potential, Risks, and Right Use. Our work indicates that current AI tools perform adequately in fast-consumption media contexts but lack the narrative sophistication required for high-end sound design (e.g., films and immersive experiences). Practitioners demonstrate a preference for assistive, task-specific applications, particularly in audio restoration and library management, over end-to-end generative systems. This work contributes to the ongoing discussion on the use of AI and AI-enhanced tools in the creative industries. We report on the current status of the field from the point of view of sound designers and creative audio practitioners, and offer a set of recommendations for sound technologists and developers, based on our findings, to guide the development of more informed AI tools for sound design.

Speakers

Nelly Garcia

Joshua Reiss

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Authoring, Lecture

2:30pm CEST

The Influence of Binauralizer and HRTF Preprocessing on Objective Loudness in Ambisonics

Friday July 3, 2026 2:30pm - 3:00pm CEST

IRCAM:Stravinsky

Accurate loudness estimation is essential for audio production, quality control, and loudness compliance, but no established recommendation exists for binaural playback over headphones. This paper investigates the influence of binauralizers and HRTF processing on objective loudness estimation for binauralized Ambisonics content. Two experiments were conducted using 163 Ambisonics clips binauralized with two open-source renderers and three HRTF sets under three HRTF preprocessing conditions. Objective loudness metrics were compared against ground truth loudness data derived from 7.1+4 loudspeaker feeds according to ITU-R BS.1770. Results reveal small to moderate differences in Integrated Loudness and larger differences in True Peak values between the evaluated binauralizers; they also show that diffuse-field equalization can effectively eliminate loudness and True Peak differences across binauralizers and across HRTF sets. The findings can help to better predict and ensure loudness compliance in binauralized audio consumption in XR and gaming, especially when importing third-party HRTFs is supported.
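The Integrated Loudness values compared above follow the gating procedure of ITU-R BS.1770: 400 ms blocks with 75 % overlap, per-block loudness -0.691 + 10·log10(Σ Gᵢ·zᵢ) from the K-weighted mean square zᵢ of each channel, an absolute gate at -70 LKFS, and a relative gate 10 LU below the loudness of the absolutely gated blocks. A sketch of the gating stage only; K-weighting, block segmentation and True Peak measurement are not implemented, and the channel weights are passed in (for 5.1 the standard uses 1.0 for L, R and C and 1.41 for Ls and Rs, with LFE excluded; weights for a 7.1+4 layout are left to the caller):

```python
import math


def block_loudness(mean_squares, weights):
    """Loudness in LKFS from per-channel K-weighted mean squares z_i and weights G_i."""
    power = sum(g * z for g, z in zip(weights, mean_squares))
    return -0.691 + 10 * math.log10(power) if power > 0 else float("-inf")


def integrated_loudness(blocks, weights):
    """Gated integrated loudness from a list of per-block, per-channel mean squares."""
    absolute = [z for z in blocks if block_loudness(z, weights) > -70.0]
    if not absolute:
        return float("-inf")

    def pooled(selection):
        # Average each channel's mean square over the selected blocks, then weight and sum.
        n = len(selection)
        means = [sum(z[i] for z in selection) / n for i in range(len(weights))]
        return block_loudness(means, weights)

    threshold = pooled(absolute) - 10.0  # relative gate, 10 LU below
    relative = [z for z in absolute if block_loudness(z, weights) > threshold]
    return pooled(relative)
```

Pooling mean squares before taking the logarithm, rather than averaging block loudness values in dB, is what the standard prescribes, and it guarantees that the loudest block always survives the relative gate.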

Speakers

Nils Peters

IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

3:00pm CEST

Audio Formgiving: Sound Zones as Spatial Structures in Mixed Reality

Friday July 3, 2026 3:00pm - 3:30pm CEST

IRCAM:ESPRO (HOA)

Spatial audio in extended reality (XR) has traditionally been framed as a localization tool, guiding users toward discrete virtual objects or events. This paper reframes this object-centered paradigm by presenting audio formgiving, an approach in which sound defines continuous zones demarcated by boundaries that users encounter through embodied movement. We present a mixed-reality study that investigates how participants perceive, reconstruct, and navigate such sound zones. We report findings on reconstruction accuracy and boundary ambiguities across sound zones of different shapes and sizes, on how movement trajectories relate to zone recognition, and on participants’ strategies for navigating and identifying different types of sound zones.

Speakers

Hyunkyung Shin

Anıl Çamcı

Friday July 3, 2026 3:00pm - 3:30pm CEST
IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Authoring, Lecture

3:00pm CEST

Direction-Dependent Ear Canal Transmission at High Frequencies: A Multi-Subject Study using 3D-Printed Replicas

Friday July 3, 2026 3:00pm - 3:30pm CEST

IRCAM:Stravinsky

Head-Related Transfer Functions (HRTFs) are commonly measured at the blocked ear canal entrance, assuming that the ear canal transfer function is direction-independent. While this assumption holds well at low and mid frequencies, its validity at high frequencies has been questioned. A recent pilot study on a single pair of 3D-printed ear replicas found evidence of directional effects above 9 kHz, but was limited in scope. This study extends that work using 3D-printed ear replicas of ten subjects from the IHA database, mounted on a dummy head. Ear canal transfer functions were measured across a full spherical grid of 1944 incidence angles. Results reveal significant directional variability above 6–7 kHz, with standard deviations of 6–8 dB at resonant frequencies. High measurement repeatability confirms that these are genuine directional effects rather than measurement artifacts. The directional behavior is consistently observed across all subjects and appears linked to the second and higher ear canal resonances. These findings suggest that current state-of-the-art blocked-canal HRTF measurements may omit spatially relevant spectral information above 7 kHz.

Speakers

Baptiste Fourrier

Daniel Sinev

Benjamin Pries

Stephan Preihs

Jürgen Peissig

Friday July 3, 2026 3:00pm - 3:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

3:30pm CEST

An Open-Source Auracast Platform for Selective Listening in Assistive Hearing Applications

Friday July 3, 2026 3:30pm - 4:00pm CEST

IRCAM:ESPRO (HOA)

Modern auditory rehabilitation faces significant challenges in speech discrimination within complex, noisy acoustic environments. Augmented Reality interfaces based on "virtual sound objects" have been proposed to separate and selectively enhance audio sources, while the Auracast standard (Bluetooth LE Audio) emerges as the ideal mechanism to distribute these independent streams with low latency. However, the advancement of such selective listening strategies is strictly limited by proprietary commercial ecosystems and a complete lack of open-source research platforms that adhere to the physical and power constraints of wearable devices. To bridge this gap, this work develops an open-source Auracast application on the Tiresias open-hardware platform, establishing an accessible "front-end" infrastructure for auditory interaction. The architecture was implemented on the Nordic nRF5340 SoC using the Zephyr RTOS. Preliminary evaluations on a development kit successfully validated the protocol stack integration and demonstrated stream stability. Ongoing work focuses on porting the firmware to the Tiresias board and integrating the ADAU1787 audio codec, aiming to empirically quantify the end-to-end latency and energy efficiency of the embedded system.

Friday July 3, 2026 3:30pm - 4:00pm CEST
IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Application / Gaming, Lecture

3:30pm CEST

A survey of HRTF dataset use in academia and industry reveals no de facto standard

Friday July 3, 2026 3:30pm - 4:00pm CEST

IRCAM:Stravinsky

Head-related transfer functions (HRTFs) are crucial for plausible binaural audio playback in virtual, augmented, and mixed-reality applications. In such applications, listeners show higher sound-localisation accuracy, higher perceived externalisation, and less colouration when using their individual HRTFs compared to non-individual HRTFs. Because high-quality individual HRTFs require cumbersome measurements in specialised facilities, applications often use non-individual or dummy-head HRTFs as a practical alternative. Humans are able to adapt to non-individual HRTFs, which leads to localisation performance comparable to that achieved with individual HRTFs. Adaptation to non-individual HRTFs could therefore be a practical alternative whenever individual HRTFs are unavailable; however, this would only be possible if the same non-individual standard HRTF were used across different applications. To find out whether this is the case, we conducted a survey on HRTF usage among 76 professionals working in the field of spatial audio. The findings suggest that there is currently no de facto standard HRTF. Surprisingly, only half of those with access to individual HRTFs actually use them, and most would be willing to switch to a default HRTF set if one were established.

Speakers

Fabian Brinkmann

Katharina Pollack

Nils Meyer-Kahlen

Pedro Lladó

Friday July 3, 2026 3:30pm - 4:00pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

4:00pm CEST

VR Unseen: An Audio-Haptic Data Framework for Accessible Virtual Storytelling for visually impaired audiences

Friday July 3, 2026 4:00pm - 4:30pm CEST

IRCAM:ESPRO (HOA)

While Virtual Reality offers transformative potential for immersive storytelling, its heavy reliance on visual stimuli often excludes Blind and Visually Impaired audiences. Conventional accessibility methods, such as linear Audio Description, frequently struggle to keep pace with the non-linear, explorative nature of virtual environments, resulting in an "accessibility chasm" where traditional two-dimensional solutions fail to support non-visual navigation. This research addresses these limitations through a User-Centred Design approach, centred on the thematic analysis of semi-structured focus groups involving twelve experienced Blind and Visually Impaired videogame players from the Royal National Institute of Blind People. The inquiry explored four themes: spatial sound navigation, audio description integration, haptic efficacy, and the social dimensions of virtual interfaces. Findings indicate that non-visual spatial exploration requires a multifaceted auditory system utilizing 3D sound, predictable sound effects, and abstract sound signifiers, paired with a hybrid audio description model balancing functional and affective narration. To mitigate the risk of cognitive overload, participants identified haptic feedback as a critical tool for tactile confirmation and attentional guidance, serving as a non-auditory anchor that complements the primary soundscape. These user-led insights, together with real-life examples from accessible video games, inform the development of the ‘Description Spheres’: interactive virtual objects embedded within virtual environments that serve as multi-sensory hubs. By integrating spatialized audio, localized haptics, and experimental audio description, the system enables a transition to a dynamic, exploratory model that translates complex visual-spatial data into intuitive, non-visual sensory ecosystems, offering a scalable blueprint for inclusive design.

Speakers

Cesar Portillo

Friday July 3, 2026 4:00pm - 4:30pm CEST
IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Application / Gaming, Lecture

4:00pm CEST

Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality

Friday July 3, 2026 4:00pm - 4:30pm CEST

IRCAM:Stravinsky

Head-related transfer functions (HRTFs) underpin spatial hearing in virtual and augmented reality systems. Whilst individual HRTFs capture listener-specific morphology, their practical limitations have led to widespread use of generic HRTFs and growing interest in synthetic approaches. Yet their relative perceptual impact has rarely been compared within a single study. In this study, twenty listeners completed two virtual reality sound localisation experiments with complementary subsets of interleaved HRTF conditions, enabling within-subject comparison of five conditions: individually measured, KEMAR, randomly selected non-individual measured, high-resolution scan-based synthetic, and photogrammetry-based synthetic HRTFs. Test–retest stability of the individually measured baseline across sessions supported pooling across experiments and attributing differences to perceptual rather than session effects. Across HRTF conditions, lateral localisation metrics were largely insensitive to HRTF type, whereas polar-domain metrics and confusion rates showed strong HRTF dependence. Random HRTFs outperformed KEMAR on several polar metrics. High-resolution synthetic HRTFs matched individual measured performance, whilst photogrammetry-based synthetic HRTFs, alongside KEMAR, showed the greatest degradation. These findings clarify practical choices for non-individual baselines and highlight the importance of mesh resolution when using numerical synthesis for elevation-dependent localisation tasks.

Speakers

Ludovic Pirard

Katarina C. Poole

Friday July 3, 2026 4:00pm - 4:30pm CEST
IRCAM:Stravinsky 1, place Igor Stravinsky Paris 4e

HRTFs, Lecture

4:30pm CEST

Closing ceremony

Friday July 3, 2026 4:30pm - 5:00pm CEST

IRCAM:ESPRO (HOA)

Friday July 3, 2026 4:30pm - 5:00pm CEST
IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

Social event
