AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

All times are shown in CEST (Central European Summer Time).

9:30am CEST

Coffee

Thursday July 2, 2026 9:30am - 10:00am CEST

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

12:30pm CEST

Lunch

Thursday July 2, 2026 12:30pm - 1:30pm CEST

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

1:30pm CEST

(P) A Compact Inverse Auditory Model for Binaural Signal Reconstruction

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

Binaural signal synthesis is typically formulated as forward modelling using head-related transfer functions (HRTFs). We explore an inverse auditory modelling perspective in which binaural ear signals are estimated directly from a source signal and its azimuth. We present a lightweight complex-valued neural network that predicts frequency-domain binaural filters from the input source spectrum and azimuthal direction, which are then applied to synthesize binaural signals. Controlled experiments evaluate how excitation bandwidth and angular sampling density affect reconstruction and generalization. Results show accurate spectral reconstruction and interpolation to unseen source directions even when training uses sparse angular grids, while bandwidth strongly influences problem conditioning and error behaviour. This work focuses on characterizing compact signal-conditioned inverse models as efficient components for binaural signal generation.
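The abstract describes predicting one complex-valued filter per ear in the frequency domain and then applying it to the source spectrum. The sketch below illustrates only that application step, not the network. It assumes one complex gain per DFT bin per ear. The function names, the naive DFT, and the hand-made 2-sample delay that stands in for a predicted right-ear filter are hypothetical choices made for illustration.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft(); returns complex time-domain samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def apply_binaural_filters(source, h_left, h_right):
    """Multiply the source spectrum bin by bin with per-ear complex filters.

    h_left / h_right are hypothetical stand-ins for model-predicted filters.
    Multiplication in the DFT domain equals circular convolution in time.
    """
    spectrum = dft(source)
    left = [v.real for v in idft([h * s for h, s in zip(h_left, spectrum)])]
    right = [v.real for v in idft([h * s for h, s in zip(h_right, spectrum)])]
    return left, right

n = 8
source = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
h_left = [1.0 + 0j] * n                                               # identity filter
h_right = [cmath.exp(-2j * cmath.pi * k * 2 / n) for k in range(n)]   # 2-sample delay
left, right = apply_binaural_filters(source, h_left, h_right)
```

Because bin-wise multiplication corresponds to circular convolution, a practical implementation would normally zero-pad the source before the transform to avoid wrap-around. The toy source here is short enough that the delayed copy in the right ear does not wrap.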

Speakers

Vlad Paul

Philip Nelson

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Binaural, Poster

1:30pm CEST

(P) Short-Term VR Sound-Localization Training under Simulated Single-Sided Deafness: Evaluation of an Enhanced HRTF

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

Single-sided deafness (SSD) reduces access to binaural cues and can make spatial-audio localization difficult in virtual reality (VR). This study investigated short-term localization training under simulated SSD in a VR task using generic, non-individualized head-related transfer function (HRTF) rendering with head-movement-contingent auditory updating, and examined whether an enhanced HRTF could improve performance by emphasizing monaurally available spectral cues at the better-hearing ear. The rationale was that, although directional judgment in normal binaural listening depends strongly on interaural differences, monaural listening must rely more heavily on direction-dependent spectral characteristics that remain available at the better-hearing ear. Twenty normal-hearing participants performed a 13-source horizontal-plane localization task using a VR headset and headphones under simulated SSD. Participants were assigned to either normal-HRTF training or enhanced-HRTF training (n = 10 each). The experiment comprised a pre-test, three training sessions, and a post-test, and all participants were tested with both normal and enhanced HRTFs, yielding four train-test combinations. Performance was evaluated using accuracy (ACC), mean absolute error (MAE), and response time (RT). Localization performance improved with training under the present VR simulated-SSD condition: ACC increased and MAE decreased from pre-test to post-test, whereas RT showed no clear change. No significant overall between-group difference in cumulative improvement was observed. However, during training, the enhanced-HRTF group showed a significant first-session advantage, and matched train-test combinations showed descriptively larger gains than mismatched combinations.
These results suggest that short-term VR localization training can improve directional judgment under simulated SSD and that enhancing monaural spectral cues may provide an early benefit by making direction-specific patterns easier to associate with source direction. The findings are limited to localization performance in the present VR task under simulated SSD and should not be directly generalized to clinical SSD populations, real-world auditory rehabilitation, or broader everyday 3D spatial-audio experience.
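The abstract reports accuracy (ACC) and mean absolute error (MAE) but does not define them or give the loudspeaker layout. As a hedged illustration only, the sketch below makes three assumptions: ACC is the fraction of trials on which the chosen source matches the target, MAE is the mean absolute azimuth difference in degrees, and the 13 sources sit 15° apart from -90° to +90°. The layout, the function name `score_block`, and the trial data are hypothetical and do not come from the study.

```python
from statistics import mean

# Hypothetical layout: 13 source azimuths from -90 to +90 degrees in 15-degree steps.
AZIMUTHS = [-90 + 15 * i for i in range(13)]

def score_block(targets, responses):
    """Return (ACC, MAE) for one block of trials.

    targets / responses are source indices (0..12). ACC is the fraction of
    trials where the chosen source equals the target; MAE is the mean absolute
    azimuth difference in degrees. Both definitions are assumptions.
    """
    if len(targets) != len(responses) or not targets:
        raise ValueError("need equally long, non-empty trial lists")
    acc = mean(1.0 if t == r else 0.0 for t, r in zip(targets, responses))
    mae = mean(abs(AZIMUTHS[t] - AZIMUTHS[r]) for t, r in zip(targets, responses))
    return acc, mae

# Invented toy trials, for illustration only.
pre = score_block([0, 3, 6, 9, 12], [1, 3, 8, 9, 10])
post = score_block([0, 3, 6, 9, 12], [0, 3, 7, 9, 12])
```

With these toy trials, the post-test block has a higher ACC and a lower MAE than the pre-test block. That matches the direction of change the abstract reports, but the numbers themselves are invented.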

Speakers

Kentaro Fujii

Ryugo Kijima

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Binaural, Poster

1:30pm CEST

(P) An evaluation benchmark of artificial intelligence models for estimating head-related transfer functions (HRTFs) from ear shape representations

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

Head-related transfer functions (HRTFs) are fundamental to spatial audio via binaural rendering. Personalized HRTFs have been shown to improve localization accuracy and reduce perceptual artifacts and directional ambiguities. However, acquiring such HRTFs is time-consuming and requires costly measurement setups. To address this limitation, this article investigates the use of deep learning models to estimate personalized HRTFs from ear shape representations. We propose and evaluate three different architectures with various types of input data and identify the minimum achievable spectral distance error when predicting true HRTF magnitude spectra. The best model we evaluated achieves a test log-spectral distortion (LSD) of 4.93 dB. We also establish a performance ranking based on input data types and architectural choices.
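Log-spectral distortion is computed in slightly different ways across the HRTF literature, and the abstract does not say which variant produced the 4.93 dB figure. The sketch below assumes one common form. For each direction, it takes the root-mean-square over frequency bins of the dB difference between true and predicted magnitudes, then averages that value over directions. The function names are illustrative.

```python
import math

def lsd_db(h_true, h_pred):
    """Log-spectral distortion (dB) between two linear magnitude spectra.

    Computes sqrt(mean_k (20 * log10(|H_k| / |H_hat_k|)) ** 2) over the bins
    both spectra share. All magnitudes must be strictly positive.
    """
    if len(h_true) != len(h_pred) or not h_true:
        raise ValueError("spectra must be non-empty and of equal length")
    sq = [(20.0 * math.log10(t / p)) ** 2 for t, p in zip(h_true, h_pred)]
    return math.sqrt(sum(sq) / len(sq))

def mean_lsd_db(true_by_direction, pred_by_direction):
    """Average the per-direction LSD over a set of measured directions."""
    if len(true_by_direction) != len(pred_by_direction) or not true_by_direction:
        raise ValueError("need matching, non-empty sets of directions")
    values = [lsd_db(t, p) for t, p in zip(true_by_direction, pred_by_direction)]
    return sum(values) / len(values)

# A prediction 10x too small in one of two bins: 20 dB error there, 0 dB in the other.
example = lsd_db([10.0, 1.0], [1.0, 1.0])
```

In this toy call the result is about 14.1 dB, because the single 20 dB miss is averaged over only two bins. A measured HRTF has many more bins, so an error confined to a few of them contributes proportionally less.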

Speakers

Alexandre Philippon

Loïc Reboursière

Thierry Dutoit

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

HRTFs, Poster

1:30pm CEST

(P) Investigating the Effect of Sample Rate Variation on the Accuracy of Sound Source Localisation Using a Neural Network

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

This paper describes an experiment investigating how the localisation performance of a neural network for Sound Source Localisation named 'SampleDOA_SR' is affected by reducing the sample rate of the audio training data. Reducing the sample rate has several benefits, most notably a reduction in training time. The goal is to determine an appropriate sample rate that balances localisation accuracy and training time. This information will inform the future training of a neural network for Sound Source Localisation that will be used in a stereo upmixing pipeline. The results of this experiment indicate that reducing the sample rate from 48 kHz to below 4 kHz results in a significant decrease in localisation accuracy. However, above 4 kHz, the decrease in localisation accuracy is minimal whilst training time is reduced significantly. This suggests that, provided the particular application does not require the highest level of accuracy, a minimal reduction in localisation performance may be acceptable in exchange for a large reduction in training time, which would also reduce the environmental impact of model training. A sample rate of 16 kHz is suggested as a suitable balance between accuracy and training time.

Speakers

Samuel Hobern

Alan Archer-Boyd

Damian Murphy

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

HRTFs, Poster

1:30pm CEST

(P) Optimising HRTFs to Improve Spatial Release from Masking

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

Binaural hearing supports effective communication in complex acoustic environments by enabling listeners to segregate spatially separated sound sources, a benefit referred to as spatial release from masking (SRM). The spatial cues that give rise to SRM are determined by the head-related transfer function (HRTF). Although individual HRTFs are generally considered optimal for accurate localisation, prior work suggests they do not necessarily maximise performance across all aspects of spatial perception, including SRM. This motivates the concept of application-specific HRTFs. Here, we propose an application-specific HRTF augmentation method to improve speech intelligibility in cocktail-party scenarios, focusing on front–back configurations where SRM is limited. HRTFs are parameterised using principal component analysis and optimised via a differentiable auditory-model-based objective to enhance spectral cues while constraining interaural level differences. The method yields model-predicted SRM gains of 4–9 dB without inducing substantial predicted lateralisation artefacts.
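Spatial release from masking is conventionally expressed as the improvement in speech reception threshold (SRT) when target and masker are separated rather than co-located. The abstract does not say how the interaural-level-difference constraint is formulated, so the ILD check below is a hypothetical illustration. It compares an energy-based broadband ILD before and after augmentation against an assumed tolerance. All function names, the 1 dB tolerance, and the example values are assumptions.

```python
import math

def srm_db(srt_colocated_db, srt_separated_db):
    """Spatial release from masking: how much lower (better) the speech
    reception threshold is when target and masker are spatially separated."""
    return srt_colocated_db - srt_separated_db

def broadband_ild_db(mag_left, mag_right):
    """Energy-based interaural level difference in dB (left re right)."""
    e_left = sum(m * m for m in mag_left)
    e_right = sum(m * m for m in mag_right)
    return 10.0 * math.log10(e_left / e_right)

def ild_within_tolerance(orig_lr, aug_lr, tol_db=1.0):
    """Hypothetical check that an augmented HRTF pair keeps its broadband ILD
    within tol_db of the original pair. Each argument is (mag_left, mag_right)."""
    return abs(broadband_ild_db(*orig_lr) - broadband_ild_db(*aug_lr)) <= tol_db

srm = srm_db(srt_colocated_db=-2.0, srt_separated_db=-8.0)   # 6 dB of release
original = ([2.0, 2.0], [1.0, 1.0])       # left/right magnitudes in two bins
boost_both = ([4.0, 2.0], [2.0, 1.0])     # same gain on both ears in bin 0
boost_left = ([4.0, 2.0], [1.0, 1.0])     # gain on the left ear only in bin 0
```

In this toy case, boosting both ears equally in one bin reshapes the monaural spectrum but leaves the ILD unchanged. Boosting only one ear shifts the ILD by about 4 dB and fails the check.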

Speakers

Nils Marggraf-Turley

Niels Pontoppidan

Lorenzo Picinali

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

HRTFs, Poster

1:30pm CEST

(P) Which Tracking Characteristics from an Audio Only VR Onboarding Predict the Best Performance HRTFs?

Thursday July 2, 2026 1:30pm - 3:00pm CEST

IRCAM:Gallery

This study is motivated by an ambition to determine the ‘best’-matching HRTFs during an onboarding task for an audio-only virtual reality (VR) experience using a ‘shooting down sound sources’ task. It is further motivated by the needs of blind and visually impaired gamers, who may rely more heavily on accurate rendering of auditory spatial cues to succeed in the audio-only VR experience. We present an exploratory study applying an experimental VR test platform that renders ‘target’ sound sources in a virtual environment and logs tracking characteristics of the head, hand-held controller, and body while participants localise and ‘shoot’ audible ‘targets’ that are either visible (for task familiarisation) or invisible. Four game-relevant sound stimuli and three different HRTFs were tested across eight sessions on two separate days. We show data collected from fifteen sighted participants, which demonstrate an ability to localise the sound sources accurately. The tracking data suggest various search patterns (e.g. hemisphere swaps and direction reversals) associated with ‘weak’ localisation cues and possible ambiguities. These search patterns are likely all quantifiable via angular error, response time, path length, search directions, number of reversals, and search speed as determined from the tracking characteristics.
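The abstract lists path length, number of reversals, search speed, and angular error as candidate measures derived from tracking data, but it does not give exact definitions. The sketch below assumes one yaw-angle trajectory per trial and one plausible definition of each measure. The jitter threshold, the function names, and the example trajectory are hypothetical.

```python
def wrap_deg(a):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (a + 180.0) % 360.0 - 180.0

def search_metrics(yaw_deg, times_s, target_deg, min_step_deg=1.0):
    """Summarise a 1-D yaw trajectory from one trial.

    Returns path length (deg), number of direction reversals, mean search
    speed (deg/s) and final angular error (deg). Steps smaller than
    min_step_deg are treated as jitter and ignored when counting reversals.
    """
    steps = [wrap_deg(b - a) for a, b in zip(yaw_deg, yaw_deg[1:])]
    path = sum(abs(s) for s in steps)
    reversals, last_sign = 0, 0
    for s in steps:
        if abs(s) < min_step_deg:
            continue
        sign = 1 if s > 0 else -1
        if last_sign and sign != last_sign:
            reversals += 1
        last_sign = sign
    duration = times_s[-1] - times_s[0]
    speed = path / duration if duration > 0 else 0.0
    error = abs(wrap_deg(yaw_deg[-1] - target_deg))
    return {"path_deg": path, "reversals": reversals,
            "speed_deg_s": speed, "error_deg": error}

# Toy trial: overshoot the target, swing back past it, then turn towards it again.
m = search_metrics([0.0, 20.0, 40.0, 30.0, 10.0, 15.0, 25.0],
                   [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0], target_deg=30.0)
```

The toy trajectory changes direction twice, so it counts two reversals and ends 5° short of the target. Wrapping each step keeps a turn across ±180° from being read as a near-full rotation.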

Speakers

Max Væhrens

Stefania Serafin

Flemming Christensen

Dorte Hammershøi

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

HRTFs, Poster

3:00pm CEST

Coffee

Thursday July 2, 2026 3:00pm - 3:30pm CEST

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

9:30am CEST

Coffee

Friday July 3, 2026 9:30am - 10:00am CEST

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

12:30pm CEST

Lunch

Friday July 3, 2026 12:30pm - 1:30pm CEST

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Social event

1:30pm CEST

(P) Acoustic and Perceptual Evaluation of Integrated Near-Ear Speakers vs. Over Head Headphones in VR Environments

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Virtual reality (VR) technologies have become increasingly widespread, extending beyond their traditional military and professional training applications to areas such as education, simulation, gaming, and entertainment. Most modern VR headsets are equipped with built-in near-ear speakers, commonly called nearphones. Sitting between conventional headphones and loudspeakers, nearphones offer a convenient and lightweight audio solution without physically enclosing the ear. However, their impact on spatial audio perception and localization performance remains underexplored. This study explores how nearphones integrated into head-mounted displays (HMDs) perform relative to traditional headphones, focusing on identifying the specific acoustic and perceptual factors that enhance or hinder immersive audio experiences in VR. Using the Oculus Quest 3 as a test platform, the research was divided into two parts. First, the frequency responses of both the headphones and the nearphones were measured to assess differences in sound quality. Second, a VR first-person shooter game was developed in Unreal Engine to evaluate sound localization. Participants identified targets based on audio cues alone, and performance metrics such as target accuracy and reaction time were collected to compare localization effectiveness. Besides localization accuracy, this research explored which audio device users prefer. The results suggest that while traditional headphones generally offer more accurate spatial localization, nearphones provide greater comfort and convenience, highlighting a trade-off between acoustic precision and user ergonomics in VR applications.

Speakers

Zhinuo Li

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) Increasing Accessibility of Auditory Research: A 6-DoF Motion-Capture-Based Interface for Localisation Testing

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Perceptual evaluation of auditory localisation typically relies on graphical user interfaces, pointing devices, or touch screens to capture listener responses. These modalities implicitly require functional vision and/or manual dexterity, excluding, for instance, people with visual impairments from participation. This paper presents a solution for absolute sound-source localisation testing that uses head rotation, tracked by a six-degrees-of-freedom (6-DoF) optical motion-capture system, as the response interface and relies solely on auditory cues for calibration and pointing. The paradigm builds on the natural coupling between auditory spatial attention and head orientation. Individual systematic bias is characterised via a mandatory training block in which stimuli are presented at discrete loudspeaker positions. A per-participant linear regression fitted to head-centred training responses provides a bias model that is applied to main-experiment trials, enabling decomposition of localisation error (LE) into constant error (CE, reflecting accuracy) and random error (RE, reflecting precision), following the established accuracy–precision framework for spatial hearing assessment. The specific use case simulates off-sweet-spot listening positions to inform development of a renderer aimed at enhancing the experience of visually impaired audiences consuming audio-described broadcast content. Preliminary data from the control group consisting of sighted participants are presented. The interface design, calibration procedure, and analysis pipeline are offered as a contribution towards inclusive spatial audio evaluation practice.
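The abstract describes a per-participant linear regression that is fitted in the training block and then applied to main-experiment trials to separate constant error (CE) from random error (RE). The exact formulation is not given. The sketch below assumes one plausible reading: the regression line predicts each trial's systematic bias (CE), and the remainder is treated as random (RE). Each component is summarised as a root-mean-square value in degrees. The function names and data are hypothetical.

```python
import math
from statistics import fmean

def fit_bias_model(targets, responses):
    """Ordinary least-squares fit of response = a + b * target (degrees)."""
    mt, mr = fmean(targets), fmean(responses)
    sxx = sum((t - mt) ** 2 for t in targets)
    if sxx == 0:
        raise ValueError("training targets must not all be identical")
    b = sum((t - mt) * (r - mr) for t, r in zip(targets, responses)) / sxx
    return mr - b * mt, b

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(fmean(v * v for v in values))

def decompose_errors(model, targets, responses):
    """Split signed errors into a bias-model part (CE) and a residual (RE)."""
    a, b = model
    predicted = [a + b * t for t in targets]
    le = [r - t for t, r in zip(targets, responses)]      # total signed error
    ce = [p - t for t, p in zip(targets, predicted)]      # systematic part
    re = [r - p for r, p in zip(responses, predicted)]    # residual part
    return {"LE": rms(le), "CE": rms(ce), "RE": rms(re)}

# Training block: this toy participant overshoots by 20% with a 2-degree offset.
train_t = [-60.0, -30.0, 0.0, 30.0, 60.0]
train_r = [-70.0, -34.0, 2.0, 38.0, 74.0]
model = fit_bias_model(train_t, train_r)

# Main-experiment trials scored with that participant's bias model.
scores = decompose_errors(model, [-45.0, 15.0], [-54.0, 22.0])
```

Under this reading, LE² equals CE² + RE² only on the training data used for the fit, where least-squares residuals are uncorrelated with the target. On main-experiment trials the three values need not combine that way.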

Speakers

Tomasz Rudzki

Katarzyna Sochaczewska

Michael McLoughlin

Gavin Kearney

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) The impact of audio spatialisation reproduction on the neurophysiological responses of music listeners

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Research into listeners’ emotional experience of different audio formats relies heavily on subjective, self-report measures. However, little is known about the associated neural and physiological responses. As such, this feasibility study used electroencephalography (EEG), heart rate (HR), and galvanic skin response (GSR) to explore the objective neurophysiological impacts of mono, stereo, and spatial audio formats across different music genres. In a within-subjects design, participants listened to 27 randomised stimuli, each a 30-second music excerpt, spread across the three audio formats. Results were not statistically significant, but trends emerged in the data. Mono formats tended to elevate cognitive load and arousal, whereas spatial audio tended to elicit a decrease in physiological arousal, promoting a more relaxed state. However, the effects were strongly genre-dependent overall. Differences in physiological response between static and dynamic spatial reproduction of different music genres are discussed. While limited by the lack of subjective validation and by sample size, this study highlights interesting relationships between audio format and the physiological responses of music listeners.

Speakers

Laney Haywood

Sam Hobern

Isao Nambu

Nicolas Epain

Helena Daffern

Gavin Kearney

IRCAM:Gallery 1, place Igor Stravinsky Paris 4e

Perception, Poster

1:30pm CEST

(P) Virtualising SPHERE: active listening in 3D sound localisation

Friday July 3, 2026 1:30pm - 4:00pm CEST

IRCAM:Gallery

Spatial hearing emerges from the integration of auditory, multisensory, and motor information, and is enhanced in natural conditions through active listening, where head and body movements provide dynamic cues that improve localisation accuracy and perceptual stability. This principle is central to immersive audio research in Virtual and Augmented Reality (VR/AR), where binaural rendering based on Head-Related Transfer Functions (HRTFs) and room acoustic cues enables the reproduction of interaural, monaural, and distance information. Beyond acoustics, bodily engagement (e.g., reaching toward sound sources) further supports spatial adaptation. These technologies enable controlled experimental protocols for assessment and training in both normal-hearing and hearing-impaired populations. One such paradigm is SPHERE, originally developed to study three-dimensional sound localisation in ecologically valid conditions and later applied to training and rehabilitation, including for cochlear implant users. In its original implementation, participants localise sounds presented via a physically moved loudspeaker and respond either through active exploration or under static listening constraints, while head, eye, and hand movements are tracked to analyse localisation accuracy, motor behaviour, and search strategies. However, reliance on a human operator limits reproducibility and scalability. This work introduces a fully virtualised SPHERE implementation using an immersive binaural rendering framework, preserving the original spatial configuration while enabling real-time multimodal tracking. The system also evaluates the impact of HRTF individualisation by comparing generic and personalised filters. Performance is validated against the original loudspeaker-based paradigm to assess ecological validity. Preliminary results regarding the system’s effectiveness as a research and clinical tool will be presented at the conference.

Speakers
