Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time


Wednesday, July 1
 

9:30am CEST

Coffee
Wednesday July 1, 2026 9:30am - 10:00am CEST
Cafe / Lunch

10:00am CEST

Designing Spatial Audio Environments for Autonomic Regulation: Toward a Framework of Social Sonic Design
Wednesday July 1, 2026 10:00am - 10:30am CEST
Research in spatial audio has traditionally focused on localization accuracy, spatial realism, and rendering algorithms. Comparatively little work has examined how intentionally designed spatial audio environments may influence listener physiological regulation and emotional perception. This paper introduces the concept of Social Sonic Design, a framework that examines how spatially organized sonic information within and external to XR contexts may affect autonomic nervous system response and autobiographical memory. Spatial audio cues such as proximity, elevation, and diffuse reverberation influence listener perception of environmental stability and safety. Building on these perceptual principles, the present study investigates whether object-based audio environments incorporating temporally structured and personally meaningful sound materials can influence listener physiological state. Spatial audio infrastructures were constructed from lullabies, caregiving vocalizations, and environmental sonic textures. Postpartum mothers were selected as an initial participant group because caregiving sound environments and lullaby traditions play a central role in maternal–infant interaction and emotional regulation. Immersive sonic infrastructures were produced using spatial audio capture and design techniques including Dolby Atmos multichannel rendering (7.1.2) and binaural headphone reproduction. Sound sources were modified to activate three targeted neuro-cognitive nodes and spatially distributed across the listening environment to create immersive auditory scenes incorporating foreground vocal sources, diffuse environmental textures, and spatialized reverberant fields. Participants experienced these environments in brief episodic listening sessions accompanied by visual media. 
The episodic presentation structure draws inspiration from media models such as those developed by Miguel Sabido, in which repeated exposure to idealized sensory environments may influence perception and behavioral response over time. Physiological responses were monitored using measures associated with autonomic nervous system activity, including heart rate and heart rate variability (HRV), alongside reports of perceived calm, emotional response, and autobiographical memory recall. Preliminary observations suggest that the sonic infrastructure of immersive lullaby environments evokes caregiving memories and perceived emotional grounding among participants, indicating that spatially designed sonic environments may contribute to changes in listener perception and physiological regulation.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

10:00am CEST

Is now the right time for Procedural Audio?
Wednesday July 1, 2026 10:00am - 11:00am CEST
Procedural audio, sometimes known as digital Foley, is the real-time and controllable generation of sound effects. It is an alternative to sourcing sound effects from vast libraries of pre-recorded samples. It may be used to have sounds adapt to the changing game state, and to dynamically generate all the sounds of a virtual world. However, there are challenges concerning the diversity of sounds that may be generated, the controllability of procedural audio models and the quality of the sounds that it produces. We address all of these aspects in this presentation. We showcase the opportunities that procedural audio offers and how the challenges can be surmounted, while providing demonstrations of these concepts. The session opens with an introduction to the presenters before moving into a broad review of procedural audio and its history in game sound design, covering core concepts, prior uses, and how the technology has developed over time. A video presentation accompanies this overview before the workshop turns to an honest examination of the key challenges facing the field: the diversity of sounds that can be generated, the controllability of procedural models, and the quality of their output. Recent advances tackling these limitations are then discussed, followed by live demonstrations of state-of-the-art procedural audio systems from Nemisindo, which generate dynamic, immersive soundscapes in real time. The session closes with an open questions and answers segment. Attendees will leave with practical insight into how procedural audio can enhance and expand the creative process for game sound designers, a clearer understanding of how to implement dynamic and adaptive sound in their own projects, hands-on exposure to interactive soundscape techniques, and concrete tips and tricks for improving their game audio practice.
This session is suitable for sound designers, game developers, and anyone curious about the future of game audio; no prior knowledge of procedural audio is needed.
Jussieu:Room 3 4, place Jussieu Paris 5e

10:00am CEST

(P) Comparing Immersive VR and Tablet-Based Music Experience: Subjective and Physiological Responses Across Presentation Formats
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Immersive virtual reality (VR) is increasingly used to simulate concert experiences, yet it remains unclear whether its experiential advantages are accompanied by corresponding physiological changes when audiovisual content is held constant. The present study compared head-mounted immersive VR concert playback with tablet-based video in a fully counterbalanced within-subject design. Musical content and audio reproduction were identical across conditions, isolating the effect of visual immersion. Results showed consistently higher subjective ratings in VR across all measures, including music-induced affect, chills, perceived presence, and performance liking. In contrast, physiological differences were comparatively small: electrodermal activity showed only a modest increase in VR, and heart rate variability did not reliably differentiate between conditions. These findings suggest that immersive VR substantially enhances subjective music experience, particularly in terms of presence and affective engagement, while corresponding changes in autonomic physiology are limited under controlled conditions. The results indicate that subjective and physiological responses were differentially sensitive to the presentation-format manipulation.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) Effects of Navigation and Perspective on Presence and Localization in Audio Augmented Reality
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Audio Augmented Reality (AAR) can be experienced through different navigation techniques that may influence presence and spatial perception. This paper investigates the effects of navigation type and listener perspective on exploration behavior, presence, and localization accuracy in AAR systems built with consumer hardware. A within-subjects study compared four conditions: virtual navigation, virtual navigation with head tracking, physical navigation, and physical navigation with head tracking. Fifteen participants completed exploration and sound localization tasks in each condition. Results show that physical navigation increased presence and improved exploration behavior and localization performance compared to virtual navigation, while head tracking paired with a non-individualized HRTF for binaural rendering did not produce significant effects.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) Effects of Spatial Audiovisual Incongruency on Mental Workload and Task Performance in Immersive VR
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Virtual Reality has emerged as a promising medium for high-stakes training, yet its predominantly visual design places disproportionate demands on attentional resources, limiting capacity for other task-relevant information. Spatial audio cues exploit the underutilized auditory channel to redistribute this load, with demonstrated improvements in reaction time, search efficiency, and situational awareness. However, when audio cues are spatially incongruent with visual targets, task performance degrades. The cognitive and behavioral costs of such incongruency, particularly under increasing visual complexity, remain underexplored. This pilot study examines how audiovisual spatial incongruency affects mental workload and task performance through a within-subjects VR experiment in which 15 participants complete a search-and-respond task across congruent and incongruent audiovisual conditions at three levels of visual complexity. Reaction time, target accuracy, timeouts, and subjective workload are measured across 10 trials per participant. Audiovisual incongruency is hypothesized to increase mental workload and impair performance, with effects amplified under higher visual complexity. Findings will inform spatial audio design for immersive training systems and motivate further investigation into tolerance thresholds for audiovisual misalignment.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) MAV-C: A Framework for the Joint Objective Estimation of Audio-Visual Complexity in Immersive Virtual Environments
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This paper introduces MAV-C, an offline, signal-based framework for the joint objective estimation of Audio-Visual Complexity (AVC) in locally-rendered interactive games. MAV-C integrates entropy-based Acoustic Scene Complexity (ASC) features with multi-scale visual complexity metrics adapted to video via optical flow variance, and fuses modality-specific scores via Minkowski pooling. Features are normalized to a common scale relative to analytical bounds, ensuring cross-sequence comparability. We present the framework architecture, report initial verification results on synthetic stimuli with known complexity properties, and outline a parametric sensitivity analysis evaluating the effect of Entropy Weight Method (EWM) regularization, motion scaling, and pooling exponent on discriminability across gameplay sequences of varying complexity.
Jussieu:Room 1 4, place Jussieu Paris 5e
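As a rough illustration of the Minkowski pooling step named in the MAV-C abstract above, the sketch below fuses per-modality complexity scores that are assumed to be already normalized to [0, 1]. The function name `minkowski_pool`, the equal weighting of modalities, and the example exponents are assumptions made for illustration; the abstract does not specify them.

```python
def minkowski_pool(scores, p=2.0):
    """Fuse normalized per-modality scores with a Minkowski (power) mean.

    scores: iterable of non-negative numbers, e.g. (audio_score, visual_score).
    p: pooling exponent; p = 1 is the arithmetic mean, and larger values
       let the highest score dominate the fused result.
    """
    values = list(scores)
    if not values:
        raise ValueError("at least one score is required")
    if p <= 0:
        raise ValueError("pooling exponent p must be positive")
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


# Hypothetical scores for one gameplay sequence (not data from the paper).
audio_score, visual_score = 0.2, 0.8
fused_mean = minkowski_pool([audio_score, visual_score], p=1.0)
fused_peaky = minkowski_pool([audio_score, visual_score], p=4.0)
```

With equal inputs the pooled value equals the inputs for any exponent, which is why the exponent only matters when the audio and visual scores disagree.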

10:00am CEST

(P) Complex Ratio Mask Ambisonics-to-Binaural Rendering with Intensity Vector Features and Perceptual Multi-Objective Optimization
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This paper presents a neural network architecture for binaural rendering of first-order Ambisonics (FOA) signals, enabling headphone listeners to perceive immersive spatial audio from Ambisonic content without requiring individualized Head-Related Transfer Function measurements at inference time. The model operates in the STFT domain using Complex Ratio Masks (CRM). Unlike magnitude-mask methods that process only the omnidirectional channel and discard phase, the proposed model predicts a shared CRM pair (left and right ear) applied via complex-valued multiplication to all four FOA channels with directional weighting. The omnidirectional channel W contributes at unit weight while directional channels Y, Z, X are weighted at a reduced level, preserving both magnitude and phase information from the full soundfield. The input representation extends standard spectral features with three intensity vector channels that encode sound arrival direction at each time-frequency bin, providing the network with explicit spatial information alongside magnitude and phase cues. Training uses a multi-objective loss that combines waveform-level accuracy (SI-SDR), multi-resolution spectral reconstruction at three complementary time-frequency scales, and interaural level and phase difference terms to jointly optimize signal fidelity and spatial cue preservation. The encoder-decoder backbone is a four-level UNet with residual convolutional blocks and channel-spatial attention at every level, totaling approximately four million parameters. Evaluation against a prior magnitude-masking architecture with 28 million parameters shows that the CRM variant achieves comparable spatial cue preservation with a seven-fold parameter reduction while gaining access to phase information. Processing the signal in a single STFT-domain forward pass avoids the sequential inference of autoregressive time-domain models, yielding computational efficiency suitable for real-time virtual reality deployment.
Speakers
Szymon Zaporowski

Teaching and Research Assistant, Gdańsk University of Technology
Researcher at the Department of Multimedia Systems, Gdańsk University of Technology, with a focus on audio machine learning, psychoacoustics, signal processing, automatic speech recognition, and deepfake audio detection. His work sits at the intersection of immersive audio and AI...
Jussieu:Room 1 4, place Jussieu Paris 5e
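To make the complex ratio mask idea in the abstract above more concrete, here is a minimal sketch for a single time-frequency bin. It assumes the four FOA bins are combined as W plus a reduced-weight sum of Y, Z and X before masking; the function name `apply_crm`, that exact combination rule, and the default weight of 0.5 are illustrative assumptions, since the abstract does not give them.

```python
import cmath


def apply_crm(foa_bin, mask_left, mask_right, directional_weight=0.5):
    """Apply a left/right complex ratio mask pair to one FOA STFT bin.

    foa_bin: tuple of complex values (W, Y, Z, X) for one time-frequency bin.
    mask_left, mask_right: complex masks; multiplying by a complex number
        scales the magnitude and rotates the phase in one operation.
    directional_weight: hypothetical reduced weight for Y, Z and X.
    """
    w, y, z, x = foa_bin
    mix = w + directional_weight * (y + z + x)
    return mask_left * mix, mask_right * mix


# A mask of magnitude 0.5 and phase +90 degrees halves the level and
# shifts the phase of the bin, which a real-valued magnitude mask cannot do.
mask = cmath.rect(0.5, cmath.pi / 2)
left, right = apply_crm((1 + 0j, 0j, 0j, 0j), mask, 1 + 0j)
```

In a full renderer the same multiplication would run over every bin of the STFT, followed by an inverse STFT per ear.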

10:00am CEST

(P) Perceptual Limits of Ambisonic Order for Beamforming in Complex Acoustic Scenes
Wednesday July 1, 2026 10:00am - 12:30pm CEST
This study investigates the perceptually sufficient ambisonic order for beamforming in complex acoustic scenes, defined as the minimum spatial resolution above which no audible improvement is perceived. Two beamforming methods were evaluated: hypercardioid and MVDR beamforming. In contrast to previous studies, the case of an ideal microphone array was considered, in order to evaluate the beamforming methods independently of ambisonic encoding error. Sound scenes were generated using room acoustic simulations and encoded into ambisonic signals. A perceptual evaluation was conducted using a three-interval/two-alternative forced choice (3I/2AFC) test design with an adaptive procedure. The experiment used a production-constrained reference (7th-order) and a high-order reference (19th-order). Results showed that the required order depends on the beamforming method and the characteristics of the sound scene. Diffuseness profiles can be used to analyze the influence of the ambisonic order on the sound field diffuseness and to evaluate whether the directional information available is sufficient to support effective adaptive beamforming.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) The role of room acoustics in 6DoF interactive audio for XR
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Binaural rendering in extended reality (XR) often employs static acoustic profiles that may not correspond to the user’s visual environment, potentially leading to cross-modal incongruence and the room divergence effect. However, the influence of acoustic–visual mismatch on immersion and cognitive load in interactive six-degrees-of-freedom (6DoF) environments remains unclear. This study investigated the impact of acoustic–visual divergence on presence and subjective workload during real-time object interaction. An ITU-R BS.1116-3 compliant critical listening room was reconstructed at 1:1 scale in Unreal Engine 5. Ten critical listeners navigated the environment using a Meta Quest 3 headset while performing a 6DoF hand-tracking task. Spatial audio with virtual acoustics was rendered through OSC. Three acoustic conditions were evaluated: acoustically matched (RT60 = 0.21 s), anechoic (RT60 = 0 s), and highly reverberant (RT60 = 2.0 s). Presence and workload were assessed using the IPQ and NASA-TLX. Results showed a significant reduction in Spatial Presence only between the matched and highly reverberant conditions, while workload remained unaffected. The findings suggest that excessive reverberation disrupts environmental plausibility, whereas reflection absence can be partially compensated by visual and sensorimotor cues.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

(P) AT_WaveSpace: a Wave Field Synthesis Engine for Research and Authoring, Applied to Near-Field Distance Perception of Focused Sources
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Wave Field Synthesis (WFS) is a well-established spatialization technique that solves the sweet-spot limitation of conventional sound reinforcement and uniquely allows the synthesis of focused sources, that is, virtual sources positioned between the loudspeaker array and the listener. Despite its potential for extended reality (XR), WFS has remained confined to specialized environments such as live performance, installation or post-production workflows, with no accessible open-source tooling for research and creative authoring. We present AT_WaveSpace, an open-source WFS engine built on JUCE, distributed under the MIT licence, and integrated into the Unity game engine, designed to democratize WFS for researchers, developers and creators. Building on the methodological framework of the SoundScape Renderer, which combined WFS engine development with a perceptual research platform, AT_WaveSpace serves simultaneously as a spatial audio delivery tool and as an experimental tool. A perceptual evaluation of near-field distance perception of focused WFS sources, a dimension absent from prior literature, was conducted using this framework. Using a Midpoint Comparison procedure, participants were unable to rank sources at 40–100 cm consistently, while they ranked sources at 120–150 cm in correct order. Spectral centroid analysis reveals a distance-dependent timbral variation in the proximal zone whose physical origin remains unclear. Low-frequency ILD remains the primary candidate cue for correct ranking at 120–150 cm. Perspectives for further studies are outlined.
Jussieu:Room 1 4, place Jussieu Paris 5e

10:00am CEST

Sponsor demos
Wednesday July 1, 2026 10:00am - 12:30pm CEST
Come see the newest developments from our sponsors.
Jussieu:Room 2 4, place Jussieu Paris 5e

10:30am CEST

Sound as Cultural Memory: Participatory Immersive Audio Production for the Witness Blanket VR Environment
Wednesday July 1, 2026 10:30am - 11:00am CEST
This case study examines the Witness Blanket VR Experience to explore how Indigenous‑led immersive audio production can support the safeguarding of intangible cultural heritage in virtual environments. Grounded in Indigenous epistemologies of listening, the study draws on participatory sound collection, documentation of the audio production workflow, and subjective evaluation through community‑engaged events. Results demonstrate how spatial audio and culturally grounded production protocols can enable relational storytelling, ethical engagement, and protocol‑informed VR design.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

11:00am CEST

Mémoire Vive: Exploring the use of authentic rendering as a narrative process in an AR fiction
Wednesday July 1, 2026 11:00am - 11:30am CEST
This paper presents an exploratory mixed reality prototype investigating how low-cost spatial audio and XR technologies may already enable partially believable augmented reality experiences. Rather than pursuing maximal realism across all modalities, the project relied on selective auditory realism, intentionally degraded sounds and visuals, and progressive perceptual trust building in order to create plausible auditory events within a mixed reality environment. Participants were introduced to a fictional experimental protocol progressively constructing increasingly dense auditory reconstruction around them. At the end of the experience, all virtual reconstructions abruptly disappeared, leaving participants alone in the now silent physical room, revealing the extent to which virtual events had progressively contaminated their perception of reality. Qualitative observations suggest that coherent multimodal staging and expectation shaping play an important role in perceptual acceptance alongside rendering realism itself. Beyond the presented prototype, the project highlights how current XR and spatial audio tools already enable new forms of immersive narrative experiences based on persistent ambiguity between reality and virtual reconstruction.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

11:00am CEST

Assessing localisation and localisation uncertainty for off-centre listening in a stereo loudspeaker setup
Wednesday July 1, 2026 11:00am - 11:30am CEST
In loudspeaker-based reproduction, the spatial quality deteriorates when the listeners move outside the sweet spot. While this seems well known in the spatial audio community, perceptual data that allows quantifying this effect is not common, which prevents suggesting solutions for off-centre listening. In this study, we collected perceptual data to test three main hypotheses: (H1) that localisation for stereo reproduction over loudspeakers and over headphones with binaural recordings of a dummy head would result in similar perceptual outcomes, (H2) that translation of the listener is equivalent to adding the corresponding interchannel time and level differences in the sweet spot, and (H3) that the spread of localisation responses is correlated to the localisation uncertainty perceived by the listener. Regarding H1, the responses for binaural recordings and loudspeakers were equivalent within a 2° margin. Regarding H2, localisation off-centre produced only a shift in the responses compared to interchannel time and level differences in the sweet spot. Regarding H3, the spread in localisation responses strongly correlated with the perceived uncertainty ratings. Altogether, the results suggest that a localisation test using binaural recordings in the sweet spot, including interaural time and level differences, may be sufficient to characterise off-centre localisation and localisation uncertainty for stereo reproduction.
Jussieu:Conf 1 4, place Jussieu Paris 5e

11:30am CEST

The Binaural Rendering Toolbox (BRT)
Wednesday July 1, 2026 11:30am - 12:30pm CEST
This workshop introduces the Binaural Rendering Toolbox (BRT), a set of open-source (GPLv3) software libraries, applications, and definitions intended as a virtual laboratory for spatial psychoacoustic experimentation. The BRT provides a flexible and modular framework for binaural spatialisation, supporting multiple rendering models, including convolution-based and geometric approaches, as well as advanced features such as source directivity, several room acoustics models, individual HRTFs, BRIRs, near-field simulation, and real-time control via OSC.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

11:30am CEST

Crosstalk Cancellation in Loudspeaker Arrays: Effects of Directivity, Array Size, and Listener Position
Wednesday July 1, 2026 11:30am - 12:30pm CEST
Crosstalk Cancellation (CTC) is a technology that enables binaural audio reproduction over loudspeakers. The performance of a CTC system depends on multiple factors, including the geometry of the system, the characteristics of the loudspeakers, and the accuracy of the plant models used to design the CTC filters. While previous studies have examined some of these factors, the combined influence of loudspeaker directivity, array size and listener position has received limited attention. This study models loudspeakers with a spherical pole cap and uses interpolated Neumann KU 100 head-related transfer functions to generate accurate plant responses. CTC filters are computed using a Tikhonov-regularised pseudoinverse approach, and numerical simulations are performed to evaluate the impact of directivity, array geometry and listener orientation on CTC performance.
Jussieu:Conf 1 4, place Jussieu Paris 5e
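The Tikhonov-regularised pseudoinverse mentioned in the abstract above can be sketched for the simplest case: two loudspeakers, two ears, and a single frequency bin. With plant matrix C (ears by loudspeakers) and regularisation parameter beta, the filters are H = (C^H C + beta I)^-1 C^H. The helper names, the 2x2 restriction, and the example plant values are assumptions for illustration, not the study's actual setup.

```python
def conj_transpose(m):
    """Return the conjugate (Hermitian) transpose of a matrix given as nested lists."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def ctc_filters(plant, beta):
    """Tikhonov-regularised inverse of a 2x2 plant at one frequency bin.

    plant[i][j] is the transfer function from loudspeaker j to ear i.
    beta >= 0 trades cancellation accuracy against filter gain.
    """
    plant_h = conj_transpose(plant)
    gram = matmul(plant_h, plant)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul(inverse_2x2(gram), plant_h)


# Hypothetical symmetric plant: direct paths 1.0, crosstalk paths 0.5.
plant = [[1.0, 0.5], [0.5, 1.0]]
filters = ctc_filters(plant, beta=0.0)
# plant x filters should be close to identity, meaning each ear
# receives only its own binaural channel.
ear_response = matmul(plant, filters)
```

Raising beta above zero moves the result away from a perfect inverse but limits filter gain at frequencies where the plant is ill-conditioned, which is the usual reason for regularising.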

12:00pm CEST

The illusion of elevated lateral sources using loudspeakers in the horizontal plane
Wednesday July 1, 2026 12:00pm - 12:30pm CEST
Spectral manipulation techniques offer a means of generating virtual sound-source elevation using horizontal loudspeakers. In comparison to cross-talk cancellation systems, these techniques can be more flexible and operate with even a single loudspeaker. However, the azimuthal stability of such approaches remains uncharacterised. This study evaluates the effectiveness of magnitude-based difference-spectrum filtering across lateral source positions, including intermediate positions rendered via amplitude panning, in loudspeaker-based reproduction. Direction-dependent filters derived from a mean HRTF magnitude response were applied over a horizontal-plane loudspeaker array, with physically elevated loudspeakers at matched azimuths serving as perceptual references. Perceived virtual elevation was quantified using the illusion ratio, a novel metric expressing virtual elevation shift as a proportion of the physical elevation shift at each azimuth. Virtual elevation reached approximately 50% of the physical elevation shift at central azimuths, decreasing significantly with lateral displacement, consistent with the reduced effectiveness of monaural spectral cues at lateral positions. A greater virtual elevation effect was observed for ipsilateral rather than contralateral source positions relative to the filter ear. Stimulus class did not significantly alter the azimuth-dependent structure of the effect. These results demonstrate that magnitude-based spectral elevation synthesis produces a measurable and robust elevation effect, most pronounced for central sources.
Jussieu:Conf 1 4, place Jussieu Paris 5e

12:00pm CEST

Lunch A
Wednesday July 1, 2026 12:00pm - 1:00pm CEST
Cafe / Lunch

12:30pm CEST

Extending Audio Accessibility Toward Structured and Immersive Auditory Interaction in Gameplay
Wednesday July 1, 2026 12:30pm - 1:00pm CEST
This work investigates how audio accessibility in games can be reconceptualized as a structured and immersive auditory interaction paradigm, rather than a collection of discrete assistive cues. While existing approaches have improved access to gameplay information for visually impaired players, they remain largely event-driven and fragmented, often presenting auditory signals as isolated notifications. Such approaches may limit perceptual continuity and fail to reflect the dynamic, layered nature of interactive environments. The proposed system introduces an auditory information architecture that organizes gameplay information into four continuous layers: navigation, interaction, salience, and environment. Each layer represents a distinct yet interrelated perceptual function. Navigation encodes spatial orientation and movement, interaction conveys player actions and system responses, environment reflects ambient and contextual information, and salience integrates perceptually relevant events—including hazards, state transitions, and attention-driven signals—into a unified and context-sensitive layer. By structuring auditory output across these layers, gameplay information is represented as continuously evolving auditory processes rather than discrete cues. The system is implemented using Unreal Engine and Audiokinetic Wwise, with Max/MSP and RNBO used to extend real-time audio processing. Rather than prioritizing novelty through fully generative synthesis, the approach focuses on transforming and reorganizing existing sound materials through continuous parameter mapping. This enables adaptive auditory behavior while maintaining perceptual clarity and consistency with the game’s sonic identity. Interaction design is extended through a multi-source input framework. 
A camera-based input layer combines webcam-based motion capture with analysis of screen-mediated interactions, including controller inputs (e.g., mouse, gamepad, touch) and player-character movement within the game environment. These inputs are translated into perceptual features and mapped to auditory parameters, forming a bidirectional interaction loop in which player behavior directly influences auditory output. A user study is planned to evaluate the effectiveness of the proposed system in non-visual navigation tasks. The study will compare the layered auditory architecture with conventional cue-based approaches. Evaluation metrics will include objective measures such as navigation accuracy and task completion time, as well as subjective measures including perceived spatial awareness, perceptual continuity, and immersion. In addition, the study will examine the impact of camera-based interaction on engagement and perceived agency. The evaluation is designed to investigate whether continuous auditory representation improves coherence between auditory feedback and gameplay experience. A central contribution of this work lies in the aesthetic integration of accessibility. Rather than functioning as an external assistive layer, accessibility-oriented audio is embedded within the core sound design. Informational signals emerge through transformations of existing sound materials, allowing perceptual clarity to be achieved without disrupting immersion. This reframes audio accessibility as an integral component of auditory interaction design. From a practical perspective, the system is structured as a modular and parameter-driven framework, allowing scalable implementation across different platforms. Potential constraints related to computational load—particularly in real-time processing and camera-based input—are considered, with an emphasis on efficient parameter mapping and system optimization for resource-limited environments such as mobile and virtual reality.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

12:30pm CEST

Trading of interaural time and level differences for stimuli presented using a novel two-listener virtual imaging system
Wednesday July 1, 2026 12:30pm - 1:00pm CEST
Extensive research has investigated the relative influence of interaural level and time differences (ILDs and ITDs) on the perceived position of aural stimuli. Historically, these cues have been compared using trading methods with stimuli presented over headphones. For the purpose of virtual audio applications using multichannel techniques, it is important to establish whether interaural cues are exploited similarly in such listening conditions. In this work, trading experiments were carried out both with stimuli presented over headphones and using a novel two-listener crosstalk cancellation array. Listener responses revealed similar trading behaviour in the crosstalk cancellation case when compared to the headphones case. At the lowest frequency tested, the measured trading behaviour is considered less reliable due to inaccuracies in reproduction of the target stimuli. With this exception, this work demonstrates that the general trends observed in historical ILD/ITD trading experiments also apply to stimuli presented using crosstalk cancellation, namely increased sensitivity to ILD and decreased sensitivity to ITD with increasing frequency.
Jussieu:Conf 1 4, place Jussieu Paris 5e

1:00pm CEST

Effect of the template on the predicted sagittal-plane sound-localization performance
Wednesday July 1, 2026 1:00pm - 1:30pm CEST
In state-of-the-art models of sagittal-plane sound localization, template head-related transfer functions (HRTFs) are used to reflect the listener's internal calibration of auditory space decoding, and thus determine the prediction quality. The effect of the template HRTFs has not yet been investigated directly. Here, a model was calibrated separately to two HRTF measurements of the same listeners and its predictions were compared to behavioral localization responses of these listeners obtained in three listening conditions: two acoustically measured HRTF sets (those used for model calibration), and an additional condition (unseen during the calibration) used to test the model's ability to generalize. We analyzed the quadrant error rates (QEs) and local polar errors (PEs) from eight listeners. The predicted errors were similar in both calibration conditions and increased in the unseen condition. The quality of the predictions, however, varied significantly with the template, more for PE than for QE, slightly preferring one template over the other when predicting the unseen condition. Our findings suggest that small differences in HRTFs used for the template may influence the prediction quality, especially when applied to unseen listening conditions.
Wednesday July 1, 2026 1:00pm - 1:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

1:00pm CEST

Lunch B
Wednesday July 1, 2026 1:00pm - 2:00pm CEST
Cafe / Lunch

1:00pm CEST

(P) Listening from the Booth: Multi-Perspective Auralisation and Theatrical Sound Engineering
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Sound engineers doing live mixing for theatre must manage the balance between on-stage acoustic sources and electroacoustic sounds diffused in the hall, triggering, spatialising and mixing pre-recorded and live sounds while actors perform. Typically confined to the control booth of unfamiliar venues, they need to adapt to a listening perspective that differs significantly from the audience's experience in the stalls or the balconies. This work engages with sound studies, virtual acoustics, and archival practices to investigate complementary questions. How does the acoustic dissociation between the control booth and the rest of the venue influence the technical and aesthetic decisions of sound engineers? How would contemporary engineers interpret archived theatrical soundtracks when guided by annotated scripts? To address these questions, the research unfolds in four stages: capture multiple High Order Ambisonics Impulse Responses from emblematic theatres in São Paulo, Brazil, combining flexible source setups and multiple listening positions; use them to build a real-time convolution engine, integrating the IRs with actors' voices and archival soundtracks from the collection of Brazilian theatrical sound designer Tunica Teixeira; invite sound engineers to perform mixing tasks in a virtual acoustic environment, guided by Tunica's annotated scripts; use the task metrics and structured questionnaires to assess the impact of multi-perspective listening on their technical and aesthetic decisions.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) Now We're Getting Somewhere – A First Prototype for a Co-Designed, Blind-accessible Auditory Navigation Toolkit for 3D Open-World Video Game Environments
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Navigation tasks are often used as a fun and engaging method of exploring and interacting with video game environments. 3D open-world games afford curiosity-driven navigation for players, providing opportunities to follow their agency and interact with points of interest within an environment. However, this is commonly a visually-motivated task that is seldom accessible to Blind and Low Vision (BLV) gamers. Given the impact of this barrier, it is imperative to design navigation systems for games that are driven by auditory information to provide equal opportunities for BLV gamers to engage with open-world game environments. There is currently a noted lack of understanding among game developers, which evidences the need for dialogue between researchers, BLV gamers and developers. In collaboration with both BLV gamers and developers, we present the first co-designed prototypes for a customisable, Blind-accessible auditory navigation toolkit in 3D open-world video game environments. We build on a series of dialogic discussions with Disabled gamers who have experienced barriers in their gameplay experiences and present three navigation tools. We document the design of these tools and present theme explorations from analysis of both co-design phases. We present discussions on player agency, action precision, gameplay fluidity and cognitive load, categorisation and identification, sound preference, and tutorialisation and learnability. From these themes, we derive design insights that highlight the barriers and considerations for auditory navigation in video games.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) Pleyel.exe
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Pleyel.exe is an interactive documentary presented as a video game, exploring the evolving landscape of the Carrefour Pleyel district in Saint-Denis. Through free navigation within immersive 3D scans generated from gaussian splatting, visitors can wander through sites in transition. As they explore, they encounter residents’ testimonies, drawn from in-situ recorded and carefully edited interviews, offering personal perspectives on the neighborhood and its ongoing transformations.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) A framework for 6DoF 4pi reverberation generation using wave-based-derived virtual sound sources.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Realistic reproduction of spatial reverberation is essential for immersive audio applications, including virtual reality and interactive gaming. While geometrical acoustics methods enable efficient rendering, they do not fully capture wave phenomena such as low-frequency modal behavior and diffraction, which are particularly significant in small spaces. Wave-based simulations provide higher physical accuracy but at substantial computational cost. This paper extends VSVerb, a 4pi sampling reverberator based on virtual sound sources (VS) extracted via sound intensity analysis, to use pressure and three-axis particle velocity computed by a discontinuous Galerkin finite element method (dG-FEM) simulation, enabling reverberation that reflects the wave-based acoustic characteristics of virtual spaces to be generated. Experiments conducted in a university lecture room demonstrate that simulation-based VS distributions and their corresponding impulse responses closely match those derived from actual measurements. Comparison with measured impulse responses and geometrical acoustics ray tracing shows that the proposed method produces room acoustic parameters, including clarity and definition, closer to the measured reference across most metrics and frequency bands. A tendency to underestimate reverberation time was observed, which may be addressed through improved simulation modeling or post-processing. Furthermore, the VS distribution extracted from a single simulation can be adapted to different receiver positions by re-estimating the geometric contribution of each VS, enabling 6DoF navigation support without additional simulation. These results indicate the potential of the proposed framework for wave-based interactive reverberation in virtual spaces.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) An Immersive Sequencer in Virtual Reality with Natural Interaction and Hybrid Audio Reproduction
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
We present Music of the Spheres (MOTS), an immersive virtual reality (VR) sequencer that integrates natural interaction with hybrid spatial audio reproduction for music composition and performance. MOTS enables users to create and manipulate sound objects arranged in a 3D step sequencer surrounding the user. Using hand gestures, users can instantiate, position, and remove sounds, simultaneously composing both temporal and spatial musical structures. The system combines binaural reproduction for private preview in the headset and Ambisonic loudspeaker reproduction for shared listening in audience-oriented experiences. In this paper, we discuss the implementation of MOTS and highlight design considerations for intuitive musical interfaces that are uniquely crafted for VR. We also present the results of a survey of 27 participants at a public exhibition, which indicate positive responses in terms of immersion and usability, as well as a coherent spatial audio experience across the hybrid reproduction system. Finally, we outline future directions, including expanded controls, collaborative functionality, and improved spatial audio rendering.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) An Open-Source Toolchain for Atmos Transcoding and Immersive Playback on Irregular Loudspeaker Arrays
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Audio Definition Model (ADM) metadata is central to contemporary object-based audio production and sits at the core of Dolby Atmos workflows. Yet in open research, rapid prototyping, immersive media development, and playback on irregular loudspeaker arrays, Atmos-derived material remains difficult to inspect, translate, and deploy without relying on proprietary tooling. This creates a persistent gap between the spatial audio formats used in industry and the open systems available to researchers, developers, and immersive venues. This paper presents CULT DSP, an open-source spatial audio toolchain designed to address that gap by separating transcoding, scene exchange, playback, and authoring into distinct but interoperable roles. CULT ingests spatial audio exports, extracts and normalizes scene metadata, and exports it for later use; in the authoring direction, the same module packages LUSID scene data and mono stems back into ADM/BWF output. LUSID provides a stable scene and package structure shared across the stack. Spatial Root is a layout-agnostic playback engine for real-time and offline rendering on custom loudspeaker arrays. Its EngineSession API exposes the runtime as a C++ interface used by the GUI, CLI, and external host applications. Four implementation projects extend the toolchain: Spatial Seed uses CULT and LUSID for procedural authoring from stems; LUSIDstreamer treats LUSID frames as lightweight scene-state packets; immersive-allo-root embeds Spatial Root in an AlloLib audiovisual application; and ue-root prototypes a game-engine-facing host path. Together, they show how Atmos-derived metadata can be reused for playback, authoring, inspection, and immersive media development rather than used only for final delivery.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) Benchmarking Spatial Audio Reproduction Systems in Smart Glasses and XR Headsets: Illustrative Results and Interpretation
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
This paper presents an illustrative cross-device evaluation of spatial audio reproduction in smart glasses and XR headsets using binaural in-ear recordings and external sound-level measurements on four anonymized commercial devices. The evaluation is organized around baseline playback behavior, cue fidelity, sound leakage, and robustness to wearing variability, with metrics derived from broadband-noise and swept-sine measurements. The results reveal distinct device behaviors, including differences in channel balance, interchannel signal behavior, preservation of HRTF-encoded binaural cues, perturbation of real-world acoustic cues, external sound radiation, and sensitivity to reseating. Rather than establishing a product ranking, this study demonstrates how the benchmark supports structured cross-device interpretation of wearable XR spatial audio systems.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:00pm CEST

(P) Vikk64: A planar microphone array for audio-visual reproduction in virtual reality
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
A virtual artificial head (VAH) can be used to imprint a listener’s head-related transfer functions (HRTFs) onto a recording using a filter-and-sum beamforming approach. The previous version of the so-called Vikk, consisting of 24 microphones, was able to recreate HRTFs with low interaural errors, including temporal and spectral distortions up to 5 kHz. A simulation of a revised topology demonstrated an increased frequency range up to 8 kHz, motivating us to examine whether the range could be extended further, ideally beyond the audible range. We simulated two microphone topologies with different arrangement strategies based on either a Golomb ruler or Vogel’s spiral. In addition, scaling and weighting were applied to create a denser microphone placement in the centre of the array. Vogel’s spiral achieved results comparable to the Golomb ruler with 24 microphones and is easier to rescale with a larger number of microphones and parametric weighting. For this reason, we selected a weighted Vogel’s spiral to investigate how the number of microphones affects temporal and spectral distortions. Increasing the number of microphones to 32 reduced temporal and spectral distortions, although spectral distortions on the contralateral ear remained above 10 kHz. Further increasing the number to 64 microphones reduced spectral distortions and extended the usable frequency range up to 16 kHz. These results demonstrate the suitability of the Vikk64 for high-quality reproduction of binaural auralisations in the horizontal plane. Additionally, we outline how combining the Vikk64 with a VR180 camera enables the recording of audiovisual scenes that can be reproduced in virtual reality.
Wednesday July 1, 2026 1:00pm - 4:30pm CEST
Jussieu:Room 1 4, place Jussieu Paris 5e

1:30pm CEST

Eliciting the Effectiveness of Binaural Renderer Enhancement on the Horizontal Plane
Wednesday July 1, 2026 1:30pm - 2:00pm CEST
Binaural rendering is central to spatial audio reproduction via headphones and wearable devices, yet systematic evaluation of enhancement techniques remains methodologically inconsistent across the literature. This paper presents and applies a subjective evaluation methodology designed to consistently elicit five perceptual attributes of headphone-based spatial audio: externalization, elevation fidelity, front/back distinction, room tone, and spectral coloration. The methodology combines absolute judgment and relative comparison protocols, taking advantage of their complementary capabilities to capture both the absolute perceptual quality of individual rendering conditions, and the salience of perceived differences between them. It is applied in a controlled experiment comparing conventional HRTF-based binaural rendering against two enhancement variants that each superpose a masked spatially diffuse sound field component into the binaural output. Stimuli were spatialized across eleven azimuthal positions in the horizontal plane using a generic dummy-head dataset, and presented over closed-back headphones to participants. The results validate the proposed methodology as a tool for revealing perceptually relevant differences among binaural rendering conditions. Relative comparison tests reveal additional performance difference details between rendering methods. Both enhancement methods significantly improve externalization and front/back distinction relative to unenhanced binaural rendering, with the largest gains at lateral azimuths, and without a statistically significant increase in perceived spectral coloration beyond the baseline effect of HRTF filtering. However, the results do not indicate conclusively whether the binaural rendering methods examined here exhibit or mitigate a "spurious elevation" artifact associated with frontal sound presentation.
Wednesday July 1, 2026 1:30pm - 2:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

1:30pm CEST

Sponsor demos
Wednesday July 1, 2026 1:30pm - 5:30pm CEST
Come see the newest developments from our sponsors.
Wednesday July 1, 2026 1:30pm - 5:30pm CEST
Jussieu:Room 2 4, place Jussieu Paris 5e

2:00pm CEST

Primary Source Dominance and Acoustic Scene Complexity in 6DoF VR Audio Evaluation
Wednesday July 1, 2026 2:00pm - 2:30pm CEST
In virtual reality (VR) experiences, a primary source often refers to a sound object designated for a central role within a scene, contrasted with contextual background sources. While such sources are typically assumed to guide perceptual attention, it remains unclear whether a designated primary source maintains dominance in overall audio quality evaluation as acoustic scene complexity increases, particularly in six-degrees-of-freedom (6DoF) scenarios. This study investigates how per-source rendering quality and scene complexity influence overall audio quality evaluation in 6DoF VR. Rendering quality was manipulated independently for a primary source and background sources, and scene complexity was varied based on the number of sources. Rank-order elimination-by-aspects (EBA) was applied to test dominance patterns across conditions. Results indicate that under low scene complexity, overall evaluation mainly depended on primary source rendering quality. However, in high-complexity multisource scenes, this dominance was no longer observed, and evaluation dependence became distributed across sources.
Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

2:00pm CEST

How representative are Dummy head HRTFs? A subjective comparison of mannequin and human datasets
Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Head-related transfer functions (HRTFs) are central to convincing binaural rendering in virtual and augmented reality applications. While individual HRTFs offer the highest perceptual fidelity, the practical difficulty of personal HRTF acquisition drives widespread use of dummy head (mannequin) measurements as a non-individualized substitute. Despite their ubiquity, systematic perceptual benchmarking of dummy head HRTFs against human HRTFs remains limited, particularly with respect to whether consistent trends emerge across listeners irrespective of individual HRTF preference. This study extends prior work on subjective HRTF evaluation methodologies and perceptual ranking by applying an established trajectory-based quality assessment paradigm to a mixed set of dummy head and human HRTFs. Participants were presented with predefined auditory trajectories rendered via binaural synthesis and asked to rate the perceptual quality of each rendering with respect to adherence to the prescribed trajectory. HRTFs were presented in randomised order across two sets of eight, with repeated items serving as inter-set normalisation anchors. The HRTF pool encompassed human measurements alongside a range of dummy head types: simplified head-only geometries, head-and-torso simulators (HATS), and models incorporating absorptive materials (hair, clothing analogues). The primary research question is whether, despite well-documented listener-dependent variability in HRTF suitability, population-level trends differentiate dummy head HRTFs from human ones, and further, whether acoustic complexity of the mannequin (torso, absorptive surfaces) correlates with perceptual performance. Results are discussed in terms of implications for HRTF database design and substitute HRTF selection strategies for immersive audio applications.
Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

2:30pm CEST

Investigating the Perceptual Relevance of Voice Directivity in Virtual Vocal Instruction Environments
Wednesday July 1, 2026 2:30pm - 3:00pm CEST
Given the limited research on the use of extended reality (XR) technologies in remote music instruction from the perspective of music tutors, this work examines the perceptual importance of voice directivity within a virtual reality (VR) environment. In particular, the perceptual ability to discriminate differences between a measured vocal directivity pattern and a slightly modified omnidirectional directivity pattern is investigated. Two listening tests were conducted to probe directivity perception under (i) static and (ii) dynamic listener conditions within a simulated music practice room, integrating 3rd order Ambisonics Room Impulse Responses (RIRs) and head-tracked binaural reproduction within an interactive Unity-based interface. The static test used an ABX discrimination model to assess directivity detectability as a function of location and stimulus content. The dynamic test involved free navigation around a virtual singer and the evaluation of the perceived directional plausibility, naturalness of the sound emission, and the adequacy of the experience for the assessment of the singer’s vocal characteristics. The results suggest that while listeners can detect differences between vocal directivity patterns under controlled listening conditions, such differences may become less perceptually salient during dynamic interaction within a virtual environment. Nevertheless, the overall positive evaluations in the dynamic listening test indicate that the implemented spatial audio approach provides a plausible and effective auditory experience, supporting its potential use in XR-based applications for remote music instruction and performance evaluation.
Wednesday July 1, 2026 2:30pm - 3:00pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

2:30pm CEST

Understanding Ambisonics Through Practical Decisions and Listening
Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Ambisonics is a scene-based spatial audio format that has been around since the 1970s. In recent years its popularity has increased, with inclusion in game engines (such as Unity and Unreal) and distribution standards (like ADM or IAMF). Despite this, many practitioners view Ambisonics as being complex, mathematical, or academic. This tutorial explains Ambisonics through the lens of practical decision-making. Instead of equations, it covers the choices audio professionals are required to make when working on a project, with a particular emphasis on the audible consequences of those choices. The tutorial enables attendees to develop an intuitive understanding of Ambisonics through explanations of theory, combined with listening examples and workflow demonstrations. The topics covered in this tutorial are:
• Fundamentals: What Ambisonics is and how it differs from channel-based and object-based formats, and why it is well suited to VR, AR, and game audio.
• Encoding: How audio sources are converted to Ambisonics, and how to choose the Ambisonic order based on perceptual and computational trade-offs, as well as delivery constraints.
• Conventions: Common channel ordering (ACN vs FuMa) and gain normalisation (SN3D vs N3D) conventions, and what happens when things get mismatched.
• Processing: What kinds of effects can be used on Ambisonic signals while preserving the spatial integrity.
• Decoding and binaural rendering: How Ambisonic signals are converted to loudspeaker or binaural signals. The impact of head-tracking and HRTF selection on the binaural rendering.
• Mixed-order projects: What the options are when working with mixed-order sources, and the audible artefacts that can arise.
The tutorial will provide brief practical demonstrations of setting up an Ambisonics project in Pro Tools and Reaper, two widely used DAWs for immersive audio.
By the end of the tutorial attendees will have a practical understanding of the main concepts of Ambisonics, as well as knowing how the practical choices they make will impact the final audio. They will also be familiar with the main workflow pitfalls and how to avoid them. The tutorial assumes familiarity with general audio production concepts (DAW use, signal routing, mixing). However, no prior experience with Ambisonics or spatial audio formats is required. It is suitable for sound designers, composers, and audio engineers working in or interested in immersive media.
Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

2:30pm CEST

Coffee A
Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Cafe / Lunch

3:00pm CEST

Personality Perception in Acoustic Virtual Reality
Wednesday July 1, 2026 3:00pm - 3:30pm CEST
This study investigates how perceived auditory distance in Virtual Reality (VR) influences social perception, specifically personality attribution. Building on research linking social and physical distance, the work explores whether speakers who sound closer or farther away are judged differently in terms of personality traits. Using the SONICOM 3D Speaker Personality Corpus, the study analysed 360 spatialised speech samples from 120 speakers. Each sample was evaluated by 10 listeners, who rated both perceived auditory distance and five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Ratings were averaged to obtain a single score per recording. The analysis proceeded in two stages. First, correlations between perceived distance and personality traits were examined. Results showed significant relationships for two traits: Extraversion and Agreeableness. Specifically, speakers perceived as closer were judged as less extraverted but more agreeable, while more distant speakers were perceived as more extraverted and less agreeable. Second, machine learning models were developed to predict personality trait scores from the speech data. A feedforward neural network achieved above-chance classification performance across all traits. When the model was extended to jointly predict both personality traits and perceived distance, performance improved significantly for all traits. This suggests that perceived distance and personality attribution are linked, and that this relationship includes non-linear patterns not captured by simple correlation analysis. Overall, this is the first study to demonstrate an interaction between perceived physical distance and speech-based personality judgments. The findings highlight the importance of spatial audio in shaping social perception in VR and Extended Reality (XR). 
They suggest that manipulating the perceived distance of virtual speakers could influence how users interpret social cues, potentially enhancing the design of virtual agents for roles such as teachers, assistants, or companions.
Wednesday July 1, 2026 3:00pm - 3:30pm CEST
Jussieu:Conf 1 4, place Jussieu Paris 5e

3:30pm CEST

Rendering 6DoF Audio in Augmented Reality: Perceptual Evaluation of Game Engines, Plugins, and Middleware
Wednesday July 1, 2026 3:30pm - 4:00pm CEST
Contemporary game engines such as Unreal Engine and Unity are widely used for Extended Reality (XR), yet their native audio pipelines often rely on simplified spatialization with limited acoustic control. In Augmented Reality (AR), virtual sound sources must integrate coherently with the physical environment to maintain perceptual plausibility, making accurate Six-Degrees-of-Freedom (6DoF) rendering critical. This study carried out a perceptual evaluation of multiple 6DoF audio rendering approaches, including Audiokinetic Wwise (Reflect and RoomVerb), Steam Audio, Meta XR Audio SDK, a dense 6DoF Room Impulse Response (RIR) interpolation method, and APLVirtuoso XR. A measured physical room was reconstructed in Unreal Engine 5, and the rendering pipelines were calibrated by matching the reverberation time (RT60) and direct-to-reverberant ratio (DRR) to the measured room within their respective just noticeable difference (JND) thresholds. The results showed significant differences in perceived spatial and timbral fidelity as well as plausibility and overall listening experience across the tested renderers. Despite the high accuracy achieved by the 6DoF interpolation method, an algorithmic renderer demonstrated comparable or superior performance. However, some other algorithmic renderers exhibited tradeoffs in terms of computational overhead and acoustic modelling accuracy. Our findings indicate that an optimised system prioritising a plausible auditory representation, rather than strict physical replication, may be sufficient and, in some cases, can yield superior perceptual outcomes.
Wednesday July 1, 2026 3:30pm - 4:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

3:30pm CEST

Coffee B
Wednesday July 1, 2026 3:30pm - 4:30pm CEST
Cafe / Lunch

4:00pm CEST

Audiovisual Coherence Thresholds for Direct-to-Reverberant Energy Ratio
Wednesday July 1, 2026 4:00pm - 4:30pm CEST
Holographic calling systems aim to create the perceptual illusion that a remote interlocutor is physically present by combining augmented reality (AR) visualizations with spatial audio rendering. A key challenge in such systems is achieving audiovisual coherence when room-acoustic properties must be inferred, since this can lead to greater inaccuracies than when they are estimated from measurements. In this study, we investigate perceptual tolerance to mismatches in the direct-to-reverberant energy ratio (DRR), a critical cue for auditory distance perception. We conducted a listening experiment in which participants judged whether the audio and the visual presentation of a stimulus were coherent using a yes/no task. Stimuli included both a volumetric capture of a speaking human avatar, rendered at the correct direct sound level, and a loudspeaker reproducing wideband noise. For the loudspeaker, level roving was introduced to assess the influence of intensity cues on listener decisions. Results show that audiovisual coherence is maintained for half of the presentations within a DRR mismatch range of approximately 3 dB for too dry and 4.3 dB for too reverberant renderings. Within the limited number of participants included in the study, no evidence for significant differences was found between speaking avatars and loudspeakers reproducing noise, nor between different ranges of level roving in the loudspeaker condition. Nevertheless, the findings help to understand the consequences of DRR estimation mismatches for holographic AR calling experiences.
Wednesday July 1, 2026 4:00pm - 4:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

4:00pm CEST

Combining Synthetic Sound with Spatial Audio Rendering — Insights and Opportunities
Wednesday July 1, 2026 4:00pm - 5:00pm CEST
Procedural audio is an essential part of building interactive media. Game engines have long supported procedural techniques for designers and engineers. The demand for sounds that feel natural and realistic continues to grow, and spatial and interactive sound design play a key role in creating plausible auralizations. Many factors go into creating realistic virtual collision sounds across multiple objects. Speed of motion, size, material, and design intent from game mechanics all intertwine into a web of interactivity, spatialization, modeling, storytelling, and logic. The sound of footsteps changes with movement and with the materials beneath, pressing buttons or fabrics might require tactility, and explosives in a variety of rooms and different listening positions pose spatialization subtleties. What are the right tools to realize these intricacies? What does the current landscape look like for addressing remaining challenges? What developments are emerging in machine learning and AI? In this workshop, organized by the AES TC Spatial Audio and AES TC Interactive Media and Gaming, speakers will share workflows, systems, and their experience developing spatial, interactive, and generative sound design. The session is intended for those looking to expand their toolkit, rethink existing approaches, and better understand the practical and technical considerations behind procedural and spatial audio systems for virtual sounds.
Speakers
Can Murtezaoglu

Research Assistant, Istanbul Technical University
Immersive audio recording and mixing techniques, audio design for visual media
Wednesday July 1, 2026 4:00pm - 5:00pm CEST
Jussieu:Room 3 4, place Jussieu Paris 5e

4:30pm CEST

The Effect of Authentic Spatial Sound on Verbal Working Memory in Online Virtual Reality Learning Environments
Wednesday July 1, 2026 4:30pm - 5:00pm CEST
This paper investigates the impact of authentic spatial audio on verbal working memory (WM) within a WebXR-based virtual reality learning environment (VRLE). While prior virtual reality (VR) research has predominantly focused on visual modalities, the influence of auditory realism, particularly authentic spatialised sound, on cognitive performance remains underexplored. To address this gap, a controlled within-subjects experiment was conducted using an adapted automated operation span (AOSPAN) task under two conditions: with and without authentic spatial sound. A total of 40 participants completed the study using a head-mounted display in a controlled laboratory setting. The VRLE was implemented using web-based technologies, incorporating ambisonics audio capture and real-time spatial sound rendering. Statistical analysis revealed no significant differences in WM performance across conditions for all measured metrics, including OSPAN score, total correct recall, and error rates. However, results consistently showed a non-significant trend toward improved performance in the presence of authentic spatial sound. In contrast, subjective measures indicated substantial enhancements in perceived presence, immersion, realism, and user preference when spatial ambient audio was enabled. These findings suggest that while authentic spatial sound does not significantly influence verbal WM performance, it enhances experiential quality without increasing cognitive load. The study highlights the importance of incorporating realistic auditory environments in VR design for education, supporting user engagement while maintaining cognitive neutrality.
Jussieu:Conf 1 4, place Jussieu Paris 5e

4:30pm CEST

Assessing the Plausibility of Measurement-Based Auralization of Sound Transmission through Walls
Wednesday July 1, 2026 4:30pm - 5:00pm CEST
In building acoustics, modeling and auralizing sound transmission through walls is a relevant area of research related to residential well-being and workplace productivity. While most existing studies use auralization methods that focus on accurately reproducing the spectrum and amplitude of the transmitted sound in accordance with ISO standards, almost no studies have explicitly evaluated the perceptual quality of wall transmission auralization. To address this, the present study applies the plausibility paradigm to a measurement-based auralization approach, with the aim of validating it for further psychoacoustic experiments on residential well-being. Binaural Room Impulse Responses (BRIRs) were measured for three loudspeakers placed in rooms adjacent to a central listening room. In a subsequent listening test, participants were asked to evaluate the overall plausibility of the auralization. Results show that high plausibility, with scores close to the guessing rate, was achieved for all adjacent rooms, owing to the absence of visual cues, the lower sound pressure level, the reduced signal-to-noise ratio, and the diffuse radiation characteristic of the source. These findings suggest that auralization of wall transmission using BRIRs is a plausible method for psychoacoustic research on residential well-being, without the need for complex physical simulations.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

5:00pm CEST

Assessing the Impact of Spatial Audio on Cognitive Load and Memory Retention for Virtual Training Simulation in Virtual Reality
Wednesday July 1, 2026 5:00pm - 5:30pm CEST
This paper examines whether spatialised audio improves cognitive load and memory retention in Virtual Reality training. Using a commercial VR public speaking module developed by BODYSWAPS, 1,350 real-world users were randomly assigned to either a standard audio (control) condition or a fully spatialised audio with virtual acoustics (study) condition. The study ran over a three-year period, making this the largest study of its kind. Participants completed an exit survey rating five data points: comfort, concentration, realism, retention, and simulation. The spatialised audio group reported consistently higher scores overall, with a statistically significant improvement in perceived comfort (p = 0.006, d ≈ 0.44). Directional improvements were also observed in realism and retention, though these did not reach statistical significance. Gaze-time analysis revealed that the spatialised audio group spent more time looking at the primary coaching figures, suggesting that spatial audio may support sustained attentional focus on key instructional sources. The findings indicate that spatial audio design is a meaningful contributor to VR training quality, particularly in comfort and perceived realism, with promising trends for learning efficacy.
Jussieu:Conf 1 4, place Jussieu Paris 5e

5:00pm CEST

The quest for decorrelation continues -- from stereo to immersive
Wednesday July 1, 2026 5:00pm - 6:00pm CEST
Using pitch, delay and modulation effects to perceptually spread a mono source across additional, adjacent playback channels is a staple of music production, born in stereo and extended to immersive. This tutorial begins with the historical origins of the effect in the mid-1970s, showing the signal processing chains used, measuring its impact on the signal, discussing its psychoacoustic merit, and demonstrating the resulting sound. The evolution from stereo through surround sound to immersive formats using contemporary production tools and techniques is demonstrated. The effect is still evolving as new tools are developed and creators explore what is possible across all immersive artforms – music, AR/VR, and games. It is hoped that a deep dive into the first 50 years of this effect will inform and inspire immersive mixers for the future.
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

5:30pm CEST

SCHuBERT: a real-time end-to-end model for piano music emotion recognition
Wednesday July 1, 2026 5:30pm - 6:00pm CEST
In this study, we present SCHuBERT, a real-time end-to-end Piano Music Emotion Recognition (PMER) system that operates directly on raw audio and fine-tunes DistilHuBERT for short-window classification on the Valence–Arousal (V–A) plane. Designed for low latency and high responsiveness, the system is particularly well suited for immersive applications such as Virtual and Augmented Reality (VR/AR). Compared with both audio- and symbolic-domain baselines, SCHuBERT achieves strong accuracy in four-quadrant (4Q) classification as well as in binary arousal and valence tasks, while maintaining low computational overhead for real-time operation.
Jussieu:Conf 1 4, place Jussieu Paris 5e

6:30pm CEST

Banquet @ La Coupole
Wednesday July 1, 2026 6:30pm - 10:00pm CEST
Address: 102, boulevard du Montparnasse Paris 14e
Off-site via Metro
 