AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

12:00pm CEST

Benchmarking Spatial Audio Reproduction Systems in Smart Glasses and XR Headsets: An Application-Driven Measurement Framework

Tuesday June 30, 2026 12:00pm - 12:30pm CEST

Jussieu:Conf 1

This paper presents an application-driven objective measurement framework for benchmarking spatial audio reproduction in smart glasses and extended reality (XR) headsets. Wearable XR devices render virtual spatial audio while users simultaneously perceive the physical acoustic environment, creating evaluation challenges distinct from conventional headphone-based playback. Existing approaches are often inconsistent, focusing on limited device classes or metrics, and do not support unified cross-device benchmarking. The proposed framework derives benchmark attributes from two application dimensions: the acoustic role of the device and the usage context. Measurements are organized into four groups: baseline playback checks, cue fidelity, sound leakage, and robustness to wearing variability. The framework adopts a system-level methodology that characterizes observable device behavior without requiring access to proprietary internal parameters, enabling reproducible cross-device comparison. An illustrative application of the framework is presented in a companion paper.

Speakers

Yi Wu

Garrett Treanor

Agnieszka Roginska

Tuesday June 30, 2026 12:00pm - 12:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

MicArray / HeadWorn, Lecture

12:30pm CEST

Investigating the perceptual impact of head-worn devices for augmented reality using a dynamic task with continuous head–eye tracking

Tuesday June 30, 2026 12:30pm - 1:00pm CEST

Jussieu:Conf 1

Augmented reality (AR) systems require listeners to wear head-worn devices (HWDs) such as headphones and head-mounted displays (HMDs), which can alter spatial hearing by modifying the acoustic cues reaching the listeners’ ears. Although acoustical and perceptual effects have been reported for isolated HWDs, most studies rely on simplified paradigms such as static sound source localisation tasks, providing limited insight into spatial perception in more ecological settings. In everyday listening, spatial perception is an active multisensory process in which listeners coordinate head and eye movements to build a stable representation of the environment, which may be disrupted by altered auditory cues. In this work, the perceptual impact of wearing HWDs was investigated using an auditory-aided visual search task with continuous tracking of head and eye movements. Multiple HWD configurations were compared, including two pairs of headphones with and without an HMD, to assess how scattering introduced by these devices affects spatial hearing in ecologically relevant AR scenarios. Results showed small but statistically significant effects of HWDs on exploration behaviour, primarily reflected in increased eye-movement search time, while head movements were only marginally affected. Across conditions, eye movements preceded head movements, with subtle differences in movement onset timing but limited impact on overall search performance. Overall, the findings indicate that HWDs introduce measurable but moderate changes in eye–head coordination, while largely preserving spatial search performance in ecologically valid listening conditions.

Speakers

Fulvio Missoni

Julie Meyer

Andrea Canessa

Tuesday June 30, 2026 12:30pm - 1:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

MicArray / HeadWorn, Lecture

3:00pm CEST

Perceptual Assessment of Real-Time Diffraction Modelling in Augmented Reality

Tuesday June 30, 2026 3:00pm - 3:30pm CEST

Jussieu:Conf 1

Including diffraction modelling in an acoustic simulation is known to improve the plausibility of rendered room acoustics in Virtual Reality (VR). In VR, acoustic rendering only needs to satisfy the expectations raised by the visual room impression. In Augmented Reality (AR), however, the user’s natural acoustic environment provides an additional reference, which typically increases the perceptual demands. This study assesses a selection of diffraction modelling approaches in an AR setting in an L-shaped corridor. The participants rated the plausibility and similarity using a paired-comparison paradigm. Comparisons were included between acoustic simulations and between simulations and a real sound source. This is, to the best of the authors’ knowledge, the first experiment investigating diffraction perception in an AR context. The results indicated that room auralisation including diffraction was rated as more plausible than auralisation without, similar to VR experiments. However, the real sound source was rated as more plausible than all of the simulations. These observations suggest that the relative performance of room acoustic modelling is perceived similarly in VR and AR experiments, but needs further improvement to be suitable for occlusion scenarios in AR, where diffraction modelling might not be the main limitation. In general, perceptually accurate acoustic modelling of a complex real environment remains a challenge in AR.

Speakers

Joshua Mannall

Annika Neidhardt

Lauri Savioja

Paul Calamia

Russell Mason

Enzo De Sena

Tuesday June 30, 2026 3:00pm - 3:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

3:30pm CEST

Assessing Efficient Auralization Methods in Architectural Virtual Environments

Tuesday June 30, 2026 3:30pm - 4:00pm CEST

Jussieu:Conf 1

Auralization enables multisensory evaluation of architectural designs in Virtual Reality (VR), yet physically accurate acoustic simulations remain computationally prohibitive for interactive workflows. This study investigates efficient artificial reverberation methods as lightweight proxies for different stages of VR-based architectural design. After assessing the predictive capabilities of geometrically informed models, a hierarchical 3-Alternative-Forced-Choice listening experiment with a transferring task paradigm was conducted in VR using binaural audio. In this experiment, the measured room impulse responses of a physical space in untreated and acoustically treated conditions were compared with those from five auralization techniques. These techniques ranged from industry-standard simulations to artificial reverberators, all calibrated to the measured Energy Decay Curves. Statistical analysis revealed that the pure Image-Source Method was easily detected, likely because the late reverberation's temporal density was insufficient. Conversely, when incorporating a dense late reverberant tail, computationally efficient methods achieved perceptual comparability with high-fidelity simulations. Participants prioritized the timbral quality of late reverberation over geometric early reflections. This suggests that computationally efficient models can serve as convincing, scalable rendering tools for interactive design and presents this audiovisual VR paradigm as an ecologically valid platform for multisensory architectural assessment.

Speakers

Achilleas Xydis

Georgios Papadimitriou

Nils Meyer-Kahlen

Hanna Järveläinen

Carlotta Daro

Matthias Kohler

Rama Gottfried

Tuesday June 30, 2026 3:30pm - 4:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

10:30am CEST

A survey of HRTF dataset use in academia and industry reveals no de facto standard

Wednesday July 1, 2026 10:30am - 11:00am CEST

Jussieu:Conf 1

Head-related transfer functions (HRTFs) are crucial for plausible binaural audio playback for virtual, augmented, and mixed-reality applications. In such applications, humans show higher sound-localisation accuracy, perceive higher externalisation, and experience less colouration when using their individual HRTFs compared to non-individual HRTFs. Because high-quality individual HRTFs require cumbersome measurements in specialised facilities, applications often use non-individual or dummy-head HRTFs as a practical alternative. Humans are able to adapt to non-individual HRTFs, which leads to a localisation performance comparable to that achieved with individual HRTFs. Therefore, adaptation to non-individual HRTFs could be a practical alternative whenever individual HRTFs are unavailable; however, this would only be possible if the same non-individual standard HRTF was used across different applications. To find out if this is the case, we conducted a survey on HRTF usage among 76 professionals working in the field of spatial audio. The findings suggest that there is currently no de facto standard HRTF. Surprisingly, only half of those with access to individual HRTFs are actually using them, and most would be willing to switch to a default HRTF set if one was established.

Speakers

Fabian Brinkmann

Katharina Pollack

Nils Meyer-Kahlen

Pedro Lladó

Wednesday July 1, 2026 10:30am - 11:00am CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

HRTFs, Lecture

11:00am CEST

Assessing localisation and localisation uncertainty for off-centre listening in a stereo loudspeaker setup

Wednesday July 1, 2026 11:00am - 11:30am CEST

Jussieu:Conf 1

In loudspeaker-based reproduction, the spatial quality deteriorates when the listeners move outside the sweet spot. While this seems well known in the spatial audio community, perceptual data that allow this effect to be quantified are not common, which prevents suggesting solutions for off-centre listening. In this study, we collected perceptual data to test three main hypotheses: (H1) that localisation for stereo reproduction over loudspeakers and over headphones with binaural recordings of a dummy head would result in similar perceptual outcomes, (H2) that translation of the listener is equivalent to adding the corresponding interchannel time and level differences in the sweet spot, and (H3) that the spread of localisation responses is correlated to the localisation uncertainty perceived by the listener. Regarding H1, the responses for binaural recordings and loudspeakers were equivalent within a 2° margin. Regarding H2, localisation off-centre produced only a shift in the responses compared to interchannel time and level differences in the sweet spot. Regarding H3, the spread in localisation responses strongly correlated with the perceived uncertainty ratings. Altogether, the results suggest that a localisation test using binaural recordings in the sweet spot (including interaural time and level differences) may be sufficient to characterise off-centre localisation and localisation uncertainty for stereo reproduction.

Speakers

Pedro Llado

Rapolas Daugintis

Enzo De Sena

Wednesday July 1, 2026 11:00am - 11:30am CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Stereo, Lecture

11:30am CEST

Crosstalk Cancellation in Loudspeaker Arrays: Effects of Directivity, Array Size, and Listener Position

Wednesday July 1, 2026 11:30am - 12:30pm CEST

Jussieu:Conf 1

Crosstalk Cancellation (CTC) is a technology that enables binaural audio reproduction over loudspeakers. The performance of a CTC system depends on multiple factors, including the geometry of the system, the characteristics of the loudspeakers, and the accuracy of the plant models used to design the CTC filters. While previous studies have examined some of these factors, the combined influence of loudspeaker directivity, array size and listener position has received limited attention. This study models loudspeakers with a spherical pole cap and uses interpolated Neumann KU 100 head-related transfer functions to generate accurate plant responses. CTC filters are computed using a Tikhonov-regularised pseudoinverse approach, and numerical simulations are performed to evaluate the impact of directivity, array geometry and listener orientation on CTC performance.
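As a rough illustration of the Tikhonov-regularised pseudoinverse mentioned in the abstract, the sketch below computes CTC filters for a single frequency bin of a two-loudspeaker, two-ear system using the standard form H = (C^H C + beta I)^-1 C^H. It is not the authors' implementation: the function names, the plant values, and the regularisation weight beta are all assumptions chosen for the example.

```python
# Minimal sketch, assuming a 2x2 plant at one frequency bin.
# The plant values and beta below are made up for illustration.

def ctc_filters_2x2(plant, beta):
    """Return H = (C^H C + beta*I)^-1 C^H for a 2x2 complex plant matrix C.

    plant[i][j] is the transfer function from loudspeaker j to ear i.
    beta is a non-negative regularisation weight (hypothetical value).
    """
    (a, b), (c, d) = plant
    # Hermitian (conjugate) transpose of the plant matrix
    ch = [[a.conjugate(), c.conjugate()],
          [b.conjugate(), d.conjugate()]]
    # Gram matrix C^H C with the Tikhonov term beta*I added on the diagonal
    g = [[sum(ch[i][k] * plant[k][j] for k in range(2)) + (beta if i == j else 0)
          for j in range(2)] for i in range(2)]
    # Closed-form inverse of the 2x2 regularised Gram matrix
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # Multiply by C^H to obtain the filter matrix
    return [[sum(g_inv[i][k] * ch[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matmul_2x2(x, y):
    """Plain 2x2 complex matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Symmetric example plant: direct paths of 1, crosstalk paths of 0.4 - 0.2j
plant = [[1.0 + 0.0j, 0.4 - 0.2j],
         [0.4 - 0.2j, 1.0 + 0.0j]]
filters = ctc_filters_2x2(plant, beta=1e-3)

# Plant times filters should be close to the identity (small crosstalk)
reproduced = matmul_2x2(plant, filters)
```

A larger beta trades cancellation accuracy for lower filter gain, which is the usual reason for regularising the inversion.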

Speakers

Francesco Veronesi

Filippo Fazi

Jacob Hollebon

Wednesday July 1, 2026 11:30am - 12:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Transaural, Lecture

12:00pm CEST

The illusion of elevated lateral sources using loudspeakers in the horizontal plane

Wednesday July 1, 2026 12:00pm - 12:30pm CEST

Jussieu:Conf 1

Spectral manipulation techniques offer a means of generating virtual sound-source elevation using horizontal loudspeakers. In comparison to cross-talk cancellation systems, these techniques can be more flexible and operate with even a single loudspeaker. However, the azimuthal stability of such approaches remains uncharacterised. This study evaluates the effectiveness of magnitude-based difference-spectrum filtering across lateral source positions, including intermediate positions rendered via amplitude panning, in loudspeaker-based reproduction. Direction-dependent filters derived from a mean HRTF magnitude response were applied over a horizontal-plane loudspeaker array, with physically elevated loudspeakers at matched azimuths serving as perceptual references. Perceived virtual elevation was quantified using the illusion ratio, a novel metric expressing virtual elevation shift as a proportion of the physical elevation shift at each azimuth. Virtual elevation reached approximately 50% of the physical elevation shift at central azimuths, decreasing significantly with lateral displacement, consistent with the reduced effectiveness of monaural spectral cues at lateral positions. A greater virtual elevation effect was observed for ipsilateral rather than contralateral source positions relative to the filter ear. Stimulus class did not significantly alter the azimuth-dependent structure of the effect. These results demonstrate that magnitude-based spectral elevation synthesis produces a measurable and robust elevation effect, most pronounced for central sources.
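The illusion ratio described in the abstract can be read as a simple quotient per azimuth. The sketch below shows one possible reading, assuming both shifts are perceived elevation changes in degrees relative to the unfiltered horizontal source; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
# Minimal sketch of the illusion ratio as a per-azimuth quotient.
# Assumption: both shifts are in degrees, measured relative to the same
# unfiltered horizontal-plane baseline. Example values are invented.

def illusion_ratio(virtual_shift_deg, physical_shift_deg):
    """Virtual elevation shift as a proportion of the physical elevation shift."""
    if physical_shift_deg == 0:
        raise ValueError("physical elevation shift must be non-zero")
    return virtual_shift_deg / physical_shift_deg


# Hypothetical shifts per azimuth: (virtual shift, physical shift) in degrees
shifts_by_azimuth = {0: (15.0, 30.0), 60: (6.0, 30.0)}
ratios = {az: illusion_ratio(v, p) for az, (v, p) in shifts_by_azimuth.items()}
```

A ratio of 1.0 would mean the filtered horizontal loudspeaker is perceived as high as the physically elevated reference.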

Speakers

Rapolas Daugintis

Enzo De Sena

Monty Bland

Pedro Lladó

Wednesday July 1, 2026 12:00pm - 12:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Transaural, Lecture

12:30pm CEST

Trading of interaural time and level differences for stimuli presented using a novel two-listener virtual imaging system

Wednesday July 1, 2026 12:30pm - 1:00pm CEST

Jussieu:Conf 1

Extensive research has investigated the relative influence of interaural level and time differences (ILDs and ITDs) on the perceived position of aural stimuli. Historically, these cues have been compared using trading methods with stimuli presented over headphones. For the purpose of virtual audio applications using multichannel techniques, it is important to establish whether interaural cues are exploited similarly in such listening conditions. In this work, trading experiments were carried out both with stimuli presented over headphones and using a novel two-listener crosstalk cancellation array. Listener responses revealed similar trading behaviour in the crosstalk cancellation case when compared to the headphones case. At the lowest frequency tested, the measured trading behaviour is considered less reliable due to inaccuracies in reproduction of the target stimuli. With this exception, this work demonstrates that the general trends observed in historical ILD/ITD trading experiments also apply to stimuli presented using crosstalk cancellation, namely increased sensitivity to ILD and decreased sensitivity to ITD with increasing frequency.

Speakers

Isaac Lambert

Vlad Paul

Philip Nelson

Wednesday July 1, 2026 12:30pm - 1:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Transaural, Lecture

2:00pm CEST

Primary Source Dominance and Acoustic Scene Complexity in 6DoF VR Audio Evaluation

Wednesday July 1, 2026 2:00pm - 2:30pm CEST

Jussieu:Conf 1

In virtual reality (VR) experiences, a primary source often refers to a sound object designated for a central role within a scene, contrasted with contextual background sources. While such sources are typically assumed to guide perceptual attention, it remains unclear whether a designated primary source maintains dominance in overall audio quality evaluation as acoustic scene complexity increases, particularly in six-degrees-of-freedom (6DoF) scenarios. This study investigates how per-source rendering quality and scene complexity influence overall audio quality evaluation in 6DoF VR. Rendering quality was manipulated independently for a primary source and background sources, and scene complexity was varied based on the number of sources. Rank-order elimination-by-aspects (EBA) was applied to test dominance patterns across conditions. Results indicate that under low scene complexity, overall evaluation mainly depended on primary source rendering quality. However, in high-complexity multisource scenes, this dominance was no longer observed, and evaluation dependence became distributed across sources.

Speakers

Haowen Zhao

Damian Murphy

Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Cognition, Lecture

2:30pm CEST

Investigating the Perceptual Relevance of Voice Directivity in Virtual Vocal Instruction Environments

Wednesday July 1, 2026 2:30pm - 3:00pm CEST

Jussieu:Conf 1

Given the limited research on the use of extended reality (XR) technologies in remote music instruction from the perspective of music tutors, this work examines the perceptual importance of voice directivity within a virtual reality (VR) environment. In particular, the perceptual ability to discriminate differences between a measured vocal directivity pattern and a slightly modified omnidirectional directivity pattern is investigated. Two listening tests were conducted to probe directivity perception under (i) static and (ii) dynamic listener conditions within a simulated music practice room, integrating 3rd order Ambisonics Room Impulse Responses (RIRs) and head-tracked binaural reproduction within an interactive Unity-based interface. The static test used an ABX discrimination model to assess directivity detectability as a function of location and stimulus content. The dynamic test involved free navigation around a virtual singer and the evaluation of the perceived directional plausibility, naturalness of the sound emission, and the adequacy of the experience for the assessment of the singer’s vocal characteristics. The results suggest that while listeners can detect differences between vocal directivity patterns under controlled listening conditions, such differences may become less perceptually salient during dynamic interaction within a virtual environment. Nevertheless, the overall positive evaluations in the dynamic listening test indicate that the implemented spatial audio approach provides a plausible and effective auditory experience, supporting its potential use in XR-based applications for remote music instruction and performance evaluation.

Speakers

Eleni Tavelidou

Konstantinos Bakogiannis

Areti Andreopoulou

Wednesday July 1, 2026 2:30pm - 3:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Cognition, Lecture

3:00pm CEST

Personality Perception in Acoustic Virtual Reality

Wednesday July 1, 2026 3:00pm - 3:30pm CEST

Jussieu:Conf 1

This study investigates how perceived auditory distance in Virtual Reality (VR) influences social perception, specifically personality attribution. Building on research linking social and physical distance, the work explores whether speakers who sound closer or farther away are judged differently in terms of personality traits. Using the SONICOM 3D Speaker Personality Corpus, the study analysed 360 spatialised speech samples from 120 speakers. Each sample was evaluated by 10 listeners, who rated both perceived auditory distance and five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Ratings were averaged to obtain a single score per recording. The analysis proceeded in two stages. First, correlations between perceived distance and personality traits were examined. Results showed significant relationships for two traits: Extraversion and Agreeableness. Specifically, speakers perceived as closer were judged as less extraverted but more agreeable, while more distant speakers were perceived as more extraverted and less agreeable. Second, machine learning models were developed to predict personality trait scores from the speech data. A feedforward neural network achieved above-chance classification performance across all traits. When the model was extended to jointly predict both personality traits and perceived distance, performance improved significantly for all traits. This suggests that perceived distance and personality attribution are linked, and that this relationship includes non-linear patterns not captured by simple correlation analysis. Overall, this is the first study to demonstrate an interaction between perceived physical distance and speech-based personality judgments. The findings highlight the importance of spatial audio in shaping social perception in VR and Extended Reality (XR). 
They suggest that manipulating the perceived distance of virtual speakers could influence how users interpret social cues, potentially enhancing the design of virtual agents for roles such as teachers, assistants, or companions.

Speakers

Eva Fringi

Nisreen Alshubaily

Stephen A. Brewster

Lorenzo Picinali

Tanaya Guha

Alessandro Vinciarelli

Wednesday July 1, 2026 3:00pm - 3:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Cognition, Lecture

4:30pm CEST

The Effect of Authentic Spatial Sound on Verbal Working Memory in Online Virtual Reality Learning Environments

Wednesday July 1, 2026 4:30pm - 5:00pm CEST

Jussieu:Conf 1

This paper investigates the impact of authentic spatial audio on verbal working memory (WM) within a WebXR-based virtual reality learning environment (VRLE). While prior virtual reality (VR) research has predominantly focused on visual modalities, the influence of auditory realism, particularly authentic spatialised sound, on cognitive performance remains underexplored. To address this gap, a controlled within-subjects experiment was conducted using an adapted automated operation span (AOSPAN) task under two conditions: with and without authentic spatial sound. A total of 40 participants completed the study using a head-mounted display in a controlled laboratory setting. The VRLE was implemented using web-based technologies, incorporating ambisonics audio capture and real-time spatial sound rendering. Statistical analysis revealed no significant differences in WM performance across conditions for all measured metrics, including OSPAN score, total correct recall, and error rates. However, results consistently showed a non-significant trend toward improved performance in the presence of authentic spatial sound. In contrast, subjective measures indicated substantial enhancements in perceived presence, immersion, realism, and user preference when spatial ambient audio was enabled. These findings suggest that while authentic spatial sound does not significantly influence verbal WM performance, it enhances experiential quality without increasing cognitive load. The study highlights the importance of incorporating realistic auditory environments in VR design for education, supporting user engagement while maintaining cognitive neutrality.

Speakers

Vincent Russell

David Murphy

Wednesday July 1, 2026 4:30pm - 5:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Cognition, Lecture

5:00pm CEST

Assessing the Impact of Spatial Audio on Cognitive Load and Memory Retention for Virtual Training Simulation in Virtual Reality

Wednesday July 1, 2026 5:00pm - 5:30pm CEST

Jussieu:Conf 1

This paper examines whether spatialised audio improves cognitive load and memory retention in Virtual Reality training. Using a commercial VR public speaking module developed by BODYSWAPS, 1,350 real-world users were randomly assigned to either a standard audio (control) or fully spatialised audio with virtual acoustics (study) condition. The study ran over a three-year period, making this the largest study of its kind. Participants completed an exit survey rating five data points: comfort, concentration, realism, retention, and simulation. The spatialised audio group reported consistently higher scores overall, with a statistically significant improvement in perceived comfort (p = 0.006, d ≈ 0.44). Directional improvements were also observed in realism and retention, though these did not reach statistical significance. Gaze-time analysis revealed that the spatialised audio group spent more time looking at the primary coaching figures, suggesting that spatial audio may support sustained attentional focus on key instructional sources. The findings indicate that spatial audio design is a meaningful contributor to VR training quality, particularly in comfort and perceived realism, with promising trends for learning efficacy.

Speakers

Oliver Kadel

Tomasz Rudzki

Tom Szirtes

Gavin Kearney

Wednesday July 1, 2026 5:00pm - 5:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

Cognition, Lecture

5:30pm CEST

SCHuBERT: a real-time end-to-end model for piano music emotion recognition

Wednesday July 1, 2026 5:30pm - 6:00pm CEST

Jussieu:Conf 1

In this study, we present SCHuBERT, a real-time end-to-end Piano Music Emotion Recognition (PMER) system that operates directly on raw audio and fine-tunes DistilHuBERT for short-window classification on the Valence–Arousal (V–A) plane. Designed for low latency and high responsiveness, the system is particularly well suited for immersive applications such as Virtual and Augmented Reality (VR/AR). Compared with both audio- and symbolic-domain baselines, SCHuBERT achieves strong accuracy in four-quadrant (4Q) classification as well as in binary arousal and valence tasks, while maintaining low computational overhead for real-time operation.

Speakers
