AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

2:00pm CEST

Primary Source Dominance and Acoustic Scene Complexity in 6DoF VR Audio Evaluation

Wednesday July 1, 2026 2:00pm - 2:30pm CEST

In virtual reality (VR) experiences, a primary source often refers to a sound object assigned a central role within a scene, in contrast to contextual background sources. While such sources are typically assumed to guide perceptual attention, it remains unclear whether a designated primary source continues to dominate overall audio quality evaluation as acoustic scene complexity increases, particularly in six-degrees-of-freedom (6DoF) scenarios. This study investigates how per-source rendering quality and scene complexity influence overall audio quality evaluation in 6DoF VR. Rendering quality was manipulated independently for a primary source and for background sources, and scene complexity was varied through the number of sources. Rank-order elimination-by-aspects (EBA) analysis was applied to test dominance patterns across conditions. Results indicate that under low scene complexity, overall evaluation depended mainly on the rendering quality of the primary source. In high-complexity multisource scenes, however, this dominance was no longer observed, and evaluation dependence became distributed across sources.

Speakers

Haowen Zhao

Damian Murphy

Jussieu:Conf 1, 4 place Jussieu, Paris 5e

Cognition, Lecture

2:30pm CEST

Investigating the Perceptual Relevance of Voice Directivity in Virtual Vocal Instruction Environments

Wednesday July 1, 2026 2:30pm - 3:00pm CEST

Jussieu:Conf 1

Given the limited research on the use of extended reality (XR) technologies in remote music instruction from the perspective of music tutors, this work examines the perceptual importance of voice directivity within a virtual reality (VR) environment. In particular, it investigates whether listeners can discriminate between a measured vocal directivity pattern and a slightly modified omnidirectional directivity pattern. Two listening tests were conducted to probe directivity perception under (i) static and (ii) dynamic listener conditions within a simulated music practice room, integrating third-order Ambisonic room impulse responses (RIRs) and head-tracked binaural reproduction within an interactive Unity-based interface. The static test used an ABX discrimination paradigm to assess directivity detectability as a function of location and stimulus content. The dynamic test involved free navigation around a virtual singer and evaluation of perceived directional plausibility, naturalness of the sound emission, and adequacy of the experience for assessing the singer’s vocal characteristics. The results suggest that although listeners can detect differences between vocal directivity patterns under controlled listening conditions, these differences may become less perceptually salient during dynamic interaction within a virtual environment. Nevertheless, the overall positive ratings in the dynamic listening test indicate that the implemented spatial audio approach provides a plausible and effective auditory experience, supporting its potential use in XR-based applications for remote music instruction and performance evaluation.

Speakers

Eleni Tavelidou

Konstantinos Bakogiannis

Areti Andreopoulou

Jussieu:Conf 1, 4 place Jussieu, Paris 5e

Cognition, Lecture

3:00pm CEST

Personality Perception in Acoustic Virtual Reality

Wednesday July 1, 2026 3:00pm - 3:30pm CEST

Jussieu:Conf 1

This study investigates how perceived auditory distance in Virtual Reality (VR) influences social perception, specifically personality attribution. Building on research linking social and physical distance, the work explores whether speakers who sound closer or farther away are judged differently in terms of personality traits. Using the SONICOM 3D Speaker Personality Corpus, the study analysed 360 spatialised speech samples from 120 speakers. Each sample was evaluated by 10 listeners, who rated both perceived auditory distance and five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Ratings were averaged to obtain a single score per recording. The analysis proceeded in two stages. First, correlations between perceived distance and personality traits were examined. Results showed significant relationships for two traits: Extraversion and Agreeableness. Specifically, speakers perceived as closer were judged as less extraverted but more agreeable, while more distant speakers were perceived as more extraverted and less agreeable. Second, machine learning models were developed to predict personality trait scores from the speech data. A feedforward neural network achieved above-chance classification performance across all traits. When the model was extended to jointly predict both personality traits and perceived distance, performance improved significantly for all traits. This suggests that perceived distance and personality attribution are linked, and that this relationship includes non-linear patterns not captured by simple correlation analysis. Overall, this is the first study to demonstrate an interaction between perceived physical distance and speech-based personality judgments. The findings highlight the importance of spatial audio in shaping social perception in VR and Extended Reality (XR). They suggest that manipulating the perceived distance of virtual speakers could influence how users interpret social cues, potentially enhancing the design of virtual agents for roles such as teachers, assistants, or companions.

Speakers

Eva Fringi

Nisreen Alshubaily

Stephen A. Brewster

Lorenzo Picinali

Tanaya Guha

Alessandro Vinciarelli

Jussieu:Conf 1, 4 place Jussieu, Paris 5e

Cognition, Lecture

4:30pm CEST

The Effect of Authentic Spatial Sound on Verbal Working Memory in Online Virtual Reality Learning Environments

Wednesday July 1, 2026 4:30pm - 5:00pm CEST

Jussieu:Conf 1

This paper investigates the impact of authentic spatial audio on verbal working memory (WM) within a WebXR-based virtual reality learning environment (VRLE). While prior virtual reality (VR) research has predominantly focused on the visual modality, the influence of auditory realism, particularly authentic spatialised sound, on cognitive performance remains underexplored. To address this gap, a controlled within-subjects experiment was conducted using an adapted automated operation span (AOSPAN) task under two conditions: with and without authentic spatial sound. A total of 40 participants completed the study using a head-mounted display in a controlled laboratory setting. The VRLE was implemented using web-based technologies, incorporating Ambisonic audio capture and real-time spatial sound rendering. Statistical analysis revealed no significant differences in WM performance between conditions for any measured metric, including OSPAN score, total correct recall, and error rates. However, results consistently showed a non-significant trend toward improved performance in the presence of authentic spatial sound. In contrast, subjective measures indicated substantial improvements in perceived presence, immersion, realism, and user preference when spatial ambient audio was enabled. These findings suggest that while authentic spatial sound does not significantly influence verbal WM performance, it enhances experiential quality without increasing cognitive load. The study highlights the importance of incorporating realistic auditory environments in VR design for education, supporting user engagement while maintaining cognitive neutrality.

Speakers

Vincent Russell

David Murphy

Jussieu:Conf 1, 4 place Jussieu, Paris 5e

Cognition, Lecture

5:00pm CEST

Assessing the Impact of Spatial Audio on Cognitive Load and Memory Retention for Virtual Training Simulation in Virtual Reality

Wednesday July 1, 2026 5:00pm - 5:30pm CEST

Jussieu:Conf 1

This paper examines the effect of spatialised audio on cognitive load and memory retention in Virtual Reality training. Using a commercial VR public speaking module developed by BODYSWAPS, 1,350 real-world users were randomly assigned to either a standard audio condition (control) or a fully spatialised audio condition with virtual acoustics (study). The study ran over a three-year period, making it the largest study of its kind. Participants completed an exit survey rating five items: comfort, concentration, realism, retention, and simulation. The spatialised audio group reported consistently higher scores overall, with a statistically significant improvement in perceived comfort (p = 0.006, d ≈ 0.44). Directional improvements were also observed in realism and retention, though these did not reach statistical significance. Gaze-time analysis revealed that the spatialised audio group spent more time looking at the primary coaching figures, suggesting that spatial audio may support sustained attentional focus on key instructional sources. The findings indicate that spatial audio design is a meaningful contributor to VR training quality, particularly in comfort and perceived realism, with promising trends for learning efficacy.

Speakers

Oliver Kadel

Tomasz Rudzki

Tom Szirtes

Gavin Kearney

Jussieu:Conf 1, 4 place Jussieu, Paris 5e

Cognition, Lecture

5:30pm CEST

SCHuBERT: a real-time end-to-end model for piano music emotion recognition

Wednesday July 1, 2026 5:30pm - 6:00pm CEST

Jussieu:Conf 1

In this study, we present SCHuBERT, a real-time end-to-end Piano Music Emotion Recognition (PMER) system that operates directly on raw audio and fine-tunes DistilHuBERT for short-window classification on the Valence–Arousal (V–A) plane. Designed for low latency and high responsiveness, the system is particularly well suited for immersive applications such as Virtual and Augmented Reality (VR/AR). Compared with both audio- and symbolic-domain baselines, SCHuBERT achieves strong accuracy in four-quadrant (4Q) classification as well as in binary arousal and valence tasks, while maintaining low computational overhead for real-time operation.

Speakers

Massimiliano Zanoni

Riccardo Rossi

Alice Sironi

Paolo Belluco

Cristina Rottondi
