BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:aes2026avarigconference
X-WR-CALDESC:Event Calendar
METHOD:PUBLISH
CALSCALE:GREGORIAN
PRODID:-//Sched.com AES 2026 AVARIG Conference//EN
X-WR-TIMEZONE:UTC
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T070000Z
DTEND:20260630T080000Z
SUMMARY:Registration
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Jussieu Auditorium\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:bf09aa943274c56242cdeda780e899ef
URL:http://aes2026avarigconference.sched.com/event/bf09aa943274c56242cdeda780e899ef
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T073000Z
DTEND:20260630T080000Z
SUMMARY:Coffee
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:04583e2cca6ab9ce82e41073d8f0e767
URL:http://aes2026avarigconference.sched.com/event/04583e2cca6ab9ce82e41073d8f0e767
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T080000Z
DTEND:20260630T083000Z
SUMMARY:Opening ceremony
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Jussieu Auditorium\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c45d5e02d8f1c5d0d8c0bd541005595f
URL:http://aes2026avarigconference.sched.com/event/c45d5e02d8f1c5d0d8c0bd541005595f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T083000Z
DTEND:20260630T090000Z
SUMMARY:From AVAR to AVARIG: 10 Years of XR Audio with the AES
DESCRIPTION:This year marks the tenth anniversary of the biennial AES International Conference on Audio for Virtual and Augmented Reality. Starting with the inaugural event co-located with the 2016 AES Convention in Los Angeles\, we’ll recap highlights from the previous five AVAR conferences to celebrate where we’ve been\, where we are\, and where we’re going.
CATEGORIES:SOCIAL EVENT
LOCATION:Jussieu Auditorium\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:2d6e0a24e8f4c059bc15c43c2ef0c3be
URL:http://aes2026avarigconference.sched.com/event/2d6e0a24e8f4c059bc15c43c2ef0c3be
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T090000Z
DTEND:20260630T100000Z
SUMMARY:Immersive Sound Volume II - authors' roundtable
DESCRIPTION:The rapid mainstreaming of spatial audio has created a need for design frameworks that translate research into production-ready practice. This roundtable brings together contributing authors from Immersive Sound Volume II: The Design and Practice of Binaural and Multi-Channel Experiences for an open\, cross-disciplinary conversation about where the field stands and where it is headed. Participants will draw on the text's core themes: perceptual grounding\, system design\, and the creative practice of building immersive sound experiences. The session is structured to encourage dialogue between contributors and attendees\, surfacing points of debate\, unresolved questions\, and divergent perspectives across binaural and multi-channel workflows. Through case studies and cross-disciplinary perspectives\, the session offers a practical roadmap for audio engineers\, sound designers\, and researchers navigating the evolving immersive audio industry. Attendees will leave with concrete frameworks applicable to both studio production and real-time XR deployment.
CATEGORIES:BINAURAL
LOCATION:Jussieu Auditorium\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:aa3ce78ad0626a39c56ec884ec266ec8
URL:http://aes2026avarigconference.sched.com/event/aa3ce78ad0626a39c56ec884ec266ec8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T100000Z
DTEND:20260630T110000Z
SUMMARY:Eclipsa Audio: Bringing Immersive Audio to Everyone
DESCRIPTION:Eclipsa Audio\, based on the Immersive Audio Model and Format (IAMF) specification developed by members of the Alliance for Open Media\, represents an open and royalty-free approach to immersive audio creation and delivery. Eclipsa Audio provides a growing ecosystem for producing and distributing spatial audio content\, with hardware integration and streaming platform support\, including YouTube\, actively being rolled out. This panel brings together practitioners\, researchers\, and engineers directly involved in the development of IAMF and Eclipsa Audio to inform the audio engineering community about the current state of the format and its evolving toolkit for immersive audio production and delivery. The presenters will discuss how the Eclipsa Audio ecosystem can continue growing in the live and interactive realms\, including 360 videos\, streaming\, gaming\, and the combination of both\, e.g. in e-sports. Future directions for development will also include developers' perspectives on how Eclipsa Audio can be embraced by interactive environments.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:ceebc60c8bad90107ea38bf5a91388bf
URL:http://aes2026avarigconference.sched.com/event/ceebc60c8bad90107ea38bf5a91388bf
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T100000Z
DTEND:20260630T103000Z
SUMMARY:Benchmarking Spatial Audio Reproduction Systems in Smart Glasses and XR Headsets: An Application-Driven Measurement Framework
DESCRIPTION:This paper presents an application-driven objective measurement framework for benchmarking spatial audio reproduction in smart glasses and extended reality (XR) headsets. Wearable XR devices render virtual spatial audio while users simultaneously perceive the physical acoustic environment\, creating evaluation challenges distinct from conventional headphone-based playback. Existing approaches are often inconsistent\, focusing on limited device classes or metrics\, and do not support unified cross-device benchmarking. The proposed framework derives benchmark attributes from two application dimensions: the acoustic role of the device and the usage context. Measurements are organized into four groups: baseline playback checks\, cue fidelity\, sound leakage\, and robustness to wearing variability. The framework adopts a system-level methodology that characterizes observable device behavior without requiring access to proprietary internal parameters\, enabling reproducible cross-device comparison. An illustrative application of the framework is presented in a companion paper.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:6694db17f5665af57e6603fed1924cb7
URL:http://aes2026avarigconference.sched.com/event/6694db17f5665af57e6603fed1924cb7
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T100000Z
DTEND:20260630T110000Z
SUMMARY:Lunch A
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:3f426a2a7b606c4ad857712ef2f27e81
URL:http://aes2026avarigconference.sched.com/event/3f426a2a7b606c4ad857712ef2f27e81
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T103000Z
DTEND:20260630T110000Z
SUMMARY:Investigating the perceptual impact of head-worn devices for augmented reality using a dynamic task with continuous head–eye tracking
DESCRIPTION:Augmented reality (AR) systems require listeners to wear head-worn devices (HWDs) such as headphones and head-mounted displays (HMDs)\, which can alter spatial hearing by modifying the acoustic cues reaching the listeners’ ears. Although acoustical and perceptual effects have been reported for isolated HWDs\, most studies rely on simplified paradigms such as static sound source localisation tasks\, providing limited insight into spatial perception in more ecological settings. In everyday listening\, spatial perception is an active multisensory process in which listeners coordinate head and eye movements to build a stable representation of the environment\, which may be disrupted by altered auditory cues. In this work\, the perceptual impact of wearing HWDs was investigated using an auditory-aided visual search task with continuous tracking of head and eye movements. Multiple HWD configurations were compared\, including two pairs of headphones with and without an HMD\, to assess how scattering introduced by these devices affects spatial hearing in ecologically relevant AR scenarios. Results showed small but statistically significant effects of HWDs on exploration behaviour\, primarily reflected in increased eye-movement search time\, while head movements were only marginally affected. Across conditions\, eye movements preceded head movements\, with subtle differences in movement onset timing but limited impact on overall search performance. Overall\, the findings indicate that HWDs introduce measurable but moderate changes in eye–head coordination\, while largely preserving spatial search performance in ecologically valid listening conditions.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:669521fa982350a0106e8d6e56497a8b
URL:http://aes2026avarigconference.sched.com/event/669521fa982350a0106e8d6e56497a8b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T110000Z
DTEND:20260630T120000Z
SUMMARY:From Personalised Spatial Audio to Real-World XR Experiences – Results\, Tools\, and Open Research Outputs
DESCRIPTION:The rapid growth of extended reality (XR) technologies has highlighted the critical role of immersive audio in enabling natural\, effective\, and socially meaningful interactions. While visual realism has traditionally dominated XR development\, auditory perception remains a key driver of presence\, communication\, and behavioural response. The EU-funded SONICOM project (GA 101017743)\, which ended in June 2026\, aimed to address this challenge by advancing the scientific\, technological\, and perceptual foundations of immersive audio for augmented and virtual reality\, with a particular focus on personalisation\, interaction\, and reproducibility. This workshop presents a comprehensive overview of SONICOM’s outcomes\, structured around its core Work Packages (WPs)\, and delivered directly by WP leaders. It is designed as an interactive session\, where each WP will briefly introduce not only scientific findings\, but also tangible outputs including datasets\, software tools\, hardware prototypes\, and open research resources. In addition\, SONICOM will be represented across the broader conference through multiple contributions\, including demonstrations\, papers\, posters\, and tutorials. This workshop will explicitly connect to these activities\, guiding participants towards relevant sessions where specific aspects of the project can be explored in greater depth. The first part of the workshop will focus on WP1 (Immersion)\, which has advanced the modelling of both listeners and environments for high-fidelity spatial audio rendering. Key contributions include novel approaches to Head-Related Transfer Function (HRTF) personalisation\, combining physical modelling\, machine learning\, and perceptual validation. The SONICOM HRTF dataset\, comprising over 300 subjects with associated 3D morphological data\, will be presented as a major open resource for the community\, supporting reproducible research and data-driven approaches to auditory modelling.\nAdditional outputs include parametric pinna models\, numerical simulation frameworks\, and techniques for real/virtual acoustic blending\, addressing one of the central challenges of audio augmented reality. The workshop will then move to WP2 (Interaction)\, which investigates how spatial audio influences human behaviour\, perception\, and social interaction. This includes studies on auditory proxemics\, adaptation to non-individual HRTFs\, and speech intelligibility in complex environments. WP2 has generated a range of experimental paradigms and datasets linking acoustic features to behavioural and cognitive responses\, providing new insights into how sound shapes social dynamics in XR. In WP3 (Integration)\, research outcomes from WP1 and WP2 are consolidated into a unified technological framework. Central to this effort is the Binaural Rendering Toolbox (BRT)\, a flexible and extensible software platform enabling real-time\, personalised spatial audio rendering. The workshop will briefly present the architecture and capabilities of the BRT\, as well as its integration with novel hardware prototypes\, including self-personalising headphones equipped with multiple microphones and loudspeakers. WP4 (Experience) extends these technologies into ecologically valid scenarios\, evaluating immersive audio in realistic use cases ranging from social communication to professional XR applications. The workshop will present the design of these scenarios and early insights into how personalised audio affects user experience\, task performance\, and perceptual plausibility. The final technical section will focus on WP5 (Beyond)\, which ensures long-term impact through open science and community engagement. SONICOM has contributed to and expanded key infrastructures such as the Auditory Modelling Toolbox and SOFA standard\, while also developing its own ecosystem linking datasets\, models\, and tools.\nA key highlight is the Listener Acoustic Personalisation (LAP) challenge\, which engaged the international community in advancing data-driven approaches to HRTF personalisation. Throughout the workshop\, emphasis will be placed on open outputs and reuse: participants will be guided on how to access SONICOM datasets\, software\, and models\, and how these can be integrated into their own research and development pipelines. The session will conclude with a panel discussion involving all WP leaders\, focusing on future challenges in immersive audio\, including scalability of personalisation\, as well as integration with emerging AI technologies.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:Jussieu:Room 3\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:7b62f3a540f070be0b8dd4dddd4fff38
URL:http://aes2026avarigconference.sched.com/event/7b62f3a540f070be0b8dd4dddd4fff38
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T110000Z
DTEND:20260630T113000Z
SUMMARY:Field-of-View Informed Binaural Signal Matching for Head-Worn Arrays
DESCRIPTION:Capturing acoustic scenes with head-worn microphone arrays is challenging due to a limited number of sensors and constrained placement flexibility. Nevertheless\, binaural reproduction based on these arrays has recently been proposed using binaural signal matching (BSM)\, showing high robustness and computational efficiency\, but inferior performance compared to the more computationally complex signal-dependent methods\, in particular under low-reverberation conditions. To address this gap\, this paper investigates the Field-of-View informed Binaural Signal Matching (FoVi-BSM) method for far-field sources. FoVi-BSM incorporates a diagonal spatial weight matrix directly into the error formulation\, redefining the filter solution to prioritize spectral and spatial fidelity within a predefined FoV\, showing performance comparable to signal-dependent methods but with the same computational complexity as BSM. The performance of the method is evaluated through objective single-source anechoic simulations\, multi-source reverberant Monte Carlo simulations\, and a subjective MUSHRA listening test. Results demonstrate that prioritizing the FoV improves rendering accuracy within the targeted region over the standard baseline BSM method\, achieving perceptual quality comparable to signal-dependent parametric methods while maintaining baseline-level performance outside the FoV.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:be2ceda87677d360a366d91ecab8ddd9
URL:http://aes2026avarigconference.sched.com/event/be2ceda87677d360a366d91ecab8ddd9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T110000Z
DTEND:20260630T120000Z
SUMMARY:Lunch B
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:ebc54e29443404a32d2590570c17a6c3
URL:http://aes2026avarigconference.sched.com/event/ebc54e29443404a32d2590570c17a6c3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T113000Z
DTEND:20260630T120000Z
SUMMARY:Interpolation of Sparsely Sampled Array Transfer Functions for Head-Worn Microphone Arrays
DESCRIPTION:Integrating microphone arrays into head-worn devices\, such as augmented reality (AR) and virtual reality (VR) headsets\, as well as hearing aids\, has become increasingly popular for capturing and reproducing acoustic scenes. A common requirement in many such systems is a dense set of array transfer functions (ATFs). However\, dense ATFs are cumbersome to measure\, and practical setups commonly yield sparse grids rather than the uniform dense sampling often required. This motivates the use of interpolation to reconstruct dense ATF sets from sparse measurements. This paper evaluates spherical harmonics and natural-neighbor interpolation\, each combined with onset-based time-alignment and post-interpolation magnitude correction\, for a head-worn array across sampling densities. To examine how interpolation errors propagate to binaural rendering\, the interpolated ATFs are substituted into two recent filter design methods: signal-independent binaural signal matching (BSM) and a signal-dependent method combining COMPASS parametric spatial coding with BSM (COM). Results show that BSM remains largely robust to interpolation errors\, while COM substantially degrades under sparse sampling conditions with errors comparable to BSM at the lowest density\, but achieves considerably lower errors than BSM as the sampling grid density increases. This is because BSM averages errors across all steering directions\, while COM relies on individual steering vectors for source-directed beamforming.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:855126ec9a31ace06dca08f84c3acafe
URL:http://aes2026avarigconference.sched.com/event/855126ec9a31ace06dca08f84c3acafe
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T113000Z
DTEND:20260630T153000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors.
CATEGORIES:SPONSOR DEMOS
LOCATION:Jussieu:Room 2\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c5937d385e12991b975818e6cef969dd
URL:http://aes2026avarigconference.sched.com/event/c5937d385e12991b975818e6cef969dd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T123000Z
SUMMARY:A Baffleless Equatorial Ambisonic Microphone Array of Arbitrary Order
DESCRIPTION:We propose a baffleless circular array of radially outward facing cardioid microphones that produces standard ambisonic signals. The array produces an Nth-order ambisonic signal from 2N+1 microphones. It can be seen as a baffleless variant of the previously proposed equatorial microphone array\, which uses a rigid spherical baffle. The simplicity of the microphone arrangement comes at the price of not being able to extract certain spatial information from the captured sound field. The ambisonic output signal represents a horizontal projection of the captured sound field. We demonstrate that interaural elevation cues are maintained when binaural rendering is performed despite the horizontal projection.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:83f8d6c2ed722f40a9e6c84092431e6e
URL:http://aes2026avarigconference.sched.com/event/83f8d6c2ed722f40a9e6c84092431e6e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(D) BRT Application
DESCRIPTION:Demonstration of the Binaural Rendering Toolbox (BRT) capabilities through the BeRTA standalone application
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:02ed8fd983f2d3483179df1f620e2413
URL:http://aes2026avarigconference.sched.com/event/02ed8fd983f2d3483179df1f620e2413
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(D) BRT Explorer
DESCRIPTION:A standalone VR experience for Meta Quest 3 showcasing practical applications of the Binaural Rendering Toolbox.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:73ce2880785848a9a475b1cbb2cc766b
URL:http://aes2026avarigconference.sched.com/event/73ce2880785848a9a475b1cbb2cc766b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(D) Notre-Dame Whispers
DESCRIPTION:Immersive binaural audio guide around Notre-Dame Cathedral. Bring your phone if you can. Headphones supplied if needed.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:ee49a48df0bf4750bc37d7bfb0b86560
URL:http://aes2026avarigconference.sched.com/event/ee49a48df0bf4750bc37d7bfb0b86560
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(D) Remote music instruction demo
DESCRIPTION:A VR demo showcasing the impact of various low-level immersive features (such as HRTF individualization\, reverberation\, and source directivity) on the quality of the experience in a remote music lesson.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:8526130c08ebd2a407121773b5ca95ef
URL:http://aes2026avarigconference.sched.com/event/8526130c08ebd2a407121773b5ca95ef
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(D) Spatial Audio Teleconferencing Demo
DESCRIPTION:Interactive demo showcasing spatialized audio rendering for multiple participants over a teleconference tool.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:a02532941e76f40cb9b93a3162df8793
URL:http://aes2026avarigconference.sched.com/event/a02532941e76f40cb9b93a3162df8793
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(P) Assessing Spatial Coherence and Plausibility in a Multi-Position Virtual Choir Experience
DESCRIPTION:This work reports an exploratory perceptual assessment of a binaurally rendered virtual choir\, focusing on how spatial\, audiovisual\, and performative (including conductor-related) dimensions shape perceived plausibility of the depicted musical event. A four-part a cappella passage from Purcell's "Dido and Aeneas" was recorded at six positions\, five within the choir and one at the conductor\, using first-order Ambisonic recorders in a controlled acoustic environment. These recording positions were later matched to six teleportation points in a virtual performance space. The experience was reproduced over open-back headphones via a dual-engine system coupling Unity (visuals\, interaction) with REAPER (audio rendering) via OSC\; head-tracked scene rotation and algorithmically added late reverberation were applied in real time\, with binaural decoding using an individualized KEMAR HRTF set. 24 participants with choral backgrounds wore a Meta Quest 3 headset and navigated among the teleportation points. Seven items were rated inside the virtual environment on 5-point scales: source–position connection\, conductor–music temporal coherence\, voice-part localizability\, reverberation–room congruence\, co-presence\, conductor naturalness\, and event plausibility. Results suggested an uneven perceptual profile where audio-spatial aspects of the scene were experienced as comparatively coherent and the depicted musical event as plausible\, whereas the animated conductor was the clearest limitation. The match between reverberation and visible room size was rated as moderate\, and co-presence with the choir and conductor varied widely across participants.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:63b78cd33d44bd686c0d7cadd9f53576
URL:http://aes2026avarigconference.sched.com/event/63b78cd33d44bd686c0d7cadd9f53576
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(P) Spatial Audio–Based Indoor Navigation for Blind and Visually Impaired Users: A SONICOM Demonstration
DESCRIPTION:Indoor navigation remains a significant challenge for blind and visually impaired (BVI) individuals\, particularly in complex environments where conventional assistive solutions rely on visual interfaces or dedicated infrastructure. Spatial audio offers a non-visual alternative by encoding navigational information as virtual sound sources that can be perceived and followed in three-dimensional space. This work presents a demonstration of an indoor navigation system developed within the SONICOM project\, using spatial audio rendering on standard smartphones with open-ear headphones. The system provides continuous directional guidance through virtual auditory targets\, updated in real time based on user motion and orientation\, without requiring additional physical infrastructure. A core research objective of SONICOM has been to improve the perceptual realism of spatial audio and to investigate its impact on navigation performance. In this context\, controlled experiments were conducted to evaluate how different rendering parameters (such as head-related transfer function individualization and rendering fidelity) affect user performance\, confidence\, and cognitive load during navigation tasks. The results of these experiments are presented in the accompanying poster. The demonstration allows participants to experience multiple audio rendering conditions\, including high-fidelity spatial audio\, simplified spatial cues\, and non-spatialized audio. By directly comparing these conditions\, participants can perceive how spatial audio realism influences orientation and movement. This work highlights the potential of spatial audio as an infrastructure-independent approach to accessible indoor navigation and provides an experiential complement to experimental findings on the role of auditory spatialization in assistive technologies.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c90a46729bdcf488cd3b00216dcdc0d0
URL:http://aes2026avarigconference.sched.com/event/c90a46729bdcf488cd3b00216dcdc0d0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T120000Z
DTEND:20260630T150000Z
SUMMARY:(P) Vaulted Harmonies: from film to VR and Dome architectures
DESCRIPTION:The Vaulted Harmonies project explores the co-evolution of architecture\, acoustics\, and music in the cathedral Notre-Dame de Paris through immersive audiovisual experiences. Following the production of a one-hour animated feature film combining historically informed visual reconstructions and dynamic spatial audio renderings\, three additional dissemination formats were developed: a dome version for planetarium diffusion\, a standalone virtual reality experience\, and a web-based interactive experience. While derived from the same core assets and archaeoacoustic reconstruction workflow\, each platform required specific perceptual\, technical\, and distribution-oriented adaptations. This paper presents the shared adaptation workflow\, platform-specific constraints and rendering strategies\, and discusses scalable dissemination approaches for immersive heritage experiences.
CATEGORIES:SONICOM POSTER/DEMO SESSION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:3d6741ebe3a1dde3599e59cb5abb652c
URL:http://aes2026avarigconference.sched.com/event/3d6741ebe3a1dde3599e59cb5abb652c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T123000Z
DTEND:20260630T133000Z
SUMMARY:Time\, Gender and Auditory Envelopment in Reproduced Sound: A Review
DESCRIPTION:We summarise recent clinical studies on neurosensory differences between humans across age and gender\, in particular as they relate to Auditory Envelopment and reproduced sound. In audio engineering\, listening quality is generally explained and tested using only snapshot (frequency-domain) metrics: frequency response\, sound pressure level\, distortion\, and direction of sound. However\, the two elusive dimensions\, time and change\, play the most important roles\, not only in hearing but in all five of our primary senses. Human sensory physiology is summarised\, along with recent studies on a particular time-domain dimension\, Auditory Envelopment (AEV). Naive and professional subjects of any age and across genders point to AEV as a coherent and universal inducer of emotion in sound reproduction. With the conference held in Paris\, attendees are furthermore able to experience AEV in one of the most conducive places for it to emerge naturally: inside the newly renovated Notre-Dame cathedral. Using the quadrant model from Nordic universities\, we discuss whether audio engineering\, like medicine and other “objective” sciences\, may have been tending primarily to adult male needs and preferences. Our literature abounds with investigations of frequency-domain attributes such as power\, punch\, and mobilisation\, but ignores the physiological and mental effects of the time-domain modulation that happens on basically any time scale as we listen. It appears that stonemasons centuries ago knew important things about humans and hearing that would have been lost by now\, had it not been for their magnificent monuments. Because of those efforts\, however\, we still have guides to what sound might achieve on new platforms\, when we remember to listen slowly.
CATEGORIES:PERCEPTION
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:9ce2bfc2e960062fcdda60da4c0ee9f1
URL:http://aes2026avarigconference.sched.com/event/9ce2bfc2e960062fcdda60da4c0ee9f1
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T123000Z
DTEND:20260630T133000Z
SUMMARY:Coffee A
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:aedee0c5c4443a09cf6da1c584c409b0
URL:http://aes2026avarigconference.sched.com/event/aedee0c5c4443a09cf6da1c584c409b0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T130000Z
DTEND:20260630T133000Z
SUMMARY:Perceptual Assessment of Real-Time Diffraction Modelling in Augmented Reality
DESCRIPTION:Including diffraction modelling in an acoustic simulation is known to improve the plausibility of rendered room acoustics in Virtual Reality (VR). In VR\, acoustic rendering only needs to satisfy the expectations raised by the visual room impression. In Augmented Reality (AR)\, however\, the user’s natural acoustic environment provides an additional reference\, which typically increases the perceptual demands. This study assesses a selection of diffraction modelling approaches in an AR setting in an L-shaped corridor. The participants rated the plausibility and similarity using a paired-comparison paradigm. Comparisons were included between acoustic simulations and between simulations and a real sound source. This is\, to the best of the authors’ knowledge\, the first experiment investigating diffraction perception in an AR context. The results indicated that room auralisation including diffraction was rated as more plausible than auralisation without\, similar to VR experiments. However\, the real sound source was rated as more plausible than all of the simulations. These observations suggest that the relative performance of room acoustic modelling is perceived similarly in VR and AR experiments\, but needs further improvement to be suitable for occlusion scenarios in AR\, where diffraction modelling might not be the main limitation. In general\, perceptually accurate acoustic modelling of a complex real environment remains a challenge in AR.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:93f99bf667ef175f11d22db652d5b7c8
URL:http://aes2026avarigconference.sched.com/event/93f99bf667ef175f11d22db652d5b7c8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T133000Z
DTEND:20260630T140000Z
SUMMARY:Assessing Efficient Auralization Methods in Architectural Virtual Environments
DESCRIPTION:Auralization enables multisensory evaluation of architectural designs in Virtual Reality (VR)\, yet physically accurate acoustic simulations remain computationally prohibitive for interactive workflows. This study investigates efficient artificial reverberation methods as lightweight proxies for different stages of VR-based architectural design. After assessing the predictive capabilities of geometrically informed models\, a hierarchical 3-Alternative-Forced-Choice listening experiment with a transferring task paradigm was conducted in VR using binaural audio. In this experiment\, the measured room impulse responses of a physical space in untreated and acoustically treated conditions were compared with those from five auralization techniques. These techniques ranged from industry-standard simulations to artificial reverberators\, all calibrated to the measured Energy Decay Curves. Statistical analysis revealed that the pure Image-Source Method was easily detected\, likely because the late reverberation's temporal density was insufficient. Conversely\, when incorporating a dense late reverberant tail\, computationally efficient methods achieved perceptual comparability with high-fidelity simulations. Participants prioritized the timbral quality of late reverberation over geometric early reflections. This suggests that computationally efficient models can serve as convincing\, scalable rendering tools for interactive design and presents this audiovisual VR paradigm as an ecologically valid platform for multisensory architectural assessment.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c836d09db2742bec811a4c85634421e3
URL:http://aes2026avarigconference.sched.com/event/c836d09db2742bec811a4c85634421e3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T133000Z
DTEND:20260630T140000Z
SUMMARY:The M480: High-Resolution Spatial Audio Capture Using a 7th-Order Ambisonic Microphone
DESCRIPTION:The Brittany-based consortium HDaudio3D (LabSTICC\, Brest\; Noise Makers\, Rennes\; and Feichter Audio\, Lannion) has been working for five years on the development of a professional 3D audio recording solution: one offering sufficient spatial accuracy and signal-to-noise ratio\, compatibility with headphones and speakers\, and the ability to manipulate the audio scene in post-production. The solution is now operational and includes a 7th-order ambisonic sensor (with 480 MEMS on the surface of a 16 cm icosahedral tetrahedron)\, signal processing and a software suite. Pascal Rueff\, a sound engineer specialising in binaural sound\, and François Salmon\, a research engineer at Noise Makers\, both members of the HDaudio3D consortium\, will present the system and play excerpts from recordings made for various use cases.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:8f1a8b54b87847d298fdf3387e0ce8cf
URL:http://aes2026avarigconference.sched.com/event/8f1a8b54b87847d298fdf3387e0ce8cf
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T133000Z
DTEND:20260630T143000Z
SUMMARY:Coffee B
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:5c9b2d54659921298ea047e2c9d27c6f
URL:http://aes2026avarigconference.sched.com/event/5c9b2d54659921298ea047e2c9d27c6f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T133000Z
DTEND:20260630T143000Z
SUMMARY:A Tutorial Guide on Sparse Sampling in Audio for Virtual and Augmented Reality
DESCRIPTION:In audio for virtual and augmented reality systems it is often necessary to measure very large data sets. Examples include HRTFs from different spatial angles\, spatial impulse responses in a room as a function of angle or position\, and even training sets for neural networks. This can result in a large number of measurements being required to collect the dataset with adequate fidelity. Even then\, compromises are often necessary to make the data set manageable. Sparse sampling is a technique that can overcome these limitations. Sparse Sampling allows one to sample the signal at apparently less than the Nyquist/Shannon limit of two times the highest frequency\, without losing any signal fidelity. If some loss of fidelity is allowed\, the signal can be sampled at an even lower average rate. How can this be? The answer is that the effective information rate is actually lower than the highest frequency. The purpose of this tutorial is to give a (mostly) non-mathematical introduction to Sparse Sampling\, and its application to audio for virtual and augmented reality systems. We will examine the difference between “Sparse” and “Dense” signals\, define what is meant by “rate of innovation”\, and see how it relates to sample rate. We will then go on to see how we can create sparse signals\, either via transforms or filters\, to provide signals that can be sampled at much lower rates. We will then show how some of these methods are already used in audio\, and suggest other areas of application\, such as measurements in audio for virtual and augmented reality systems\, and parameter sets for neural networks. The tutorial will be accessible to everyone. You will not have to be an electronic engineer to understand the principles behind this alternative approach to sampling for audio for virtual and augmented reality systems.
CATEGORIES:SPARSE SAMPLING
LOCATION:Jussieu:Room 3\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:7edc749376a8fb1e2f2a9381c8a19862
URL:http://aes2026avarigconference.sched.com/event/7edc749376a8fb1e2f2a9381c8a19862
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T140000Z
DTEND:20260630T143000Z
SUMMARY:Adapting to Manipulated Acoustic Distance Laws
DESCRIPTION:This study investigates whether humans can adapt to manipulated auditory distance cues in virtual environments. While adaptation to remapped auditory localization cues is well established\, it remains unclear whether similar processes apply to distance perception\, particularly when natural acoustic cues are systematically modified. Virtual reality (VR) systems often employ non-ecological distance laws to improve the intelligibility of distant sound sources\, which can introduce conflicts between auditory and visual information. To examine perceptual adaptation under such conditions\, we modified a binaural real-time room acoustic simulation engine with rendering in six-degrees-of-freedom. The manipulation consisted of holding the direct sound level constant across distance while applying a distance-dependent low-pass filter. Over four consecutive days\, participants completed a training protocol combining alternating testing phases with gamified training sessions. Results show that participants successfully adapted to the altered distance cues\, with most learning occurring within the first two days. Initial exposure to the manipulation severely disrupted distance perception\, rendering participants unable to make reliable judgments. However\, following training\, perceived distance functions approached veridical performance for far distances\, exhibiting slopes close to unity. In contrast\, judgments at close distances remained highly variable\, suggesting that the available spectral cues were insufficient for accurate estimation in this range. These findings demonstrate that auditory distance perception can be recalibrated through short-term perceptual learning\, even when initial perceptual mappings are strongly degraded. Adaptation generalizes across contexts within the virtual environment\, although limitations persist for near-field perception.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:eb4a1e7a680e1a6a5a159dbbd4bd5989
URL:http://aes2026avarigconference.sched.com/event/eb4a1e7a680e1a6a5a159dbbd4bd5989
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T143000Z
DTEND:20260630T150000Z
SUMMARY:Real-Time Spatial Auralization for Collaborative Architectural Design in Virtual Reality
DESCRIPTION:Acoustic performance remains insufficiently addressed in early-stage architectural design\, where visual and spatial considerations typically guide decision-making. The present research examines the integration of real-time spatial auralization into multi-user virtual reality environments to facilitate collaborative evaluation of architectural acoustic performance during design exploration. A multi-user VR framework has been developed that embeds real-time binaural auralization within a collaborative design evaluation context. The framework supports shared inhabitation of virtual models\, auditory exploration of spatialized sound sources\, and systematic investigation of how spatial configurations and material properties affect perceived acoustic characteristics. Temporal soundscapes and acoustic implications of alternative materials and spatial arrangements can be evaluated through a gesture-driven interface. Spatialized voice communication further supports co-inhabitation of virtual environments and acoustically informed discussion. By embedding real-time auralization within collaborative virtual environments\, the framework repositions acoustic performance as an experiential dimension of architectural design\, enabling earlier incorporation of auditory considerations into the design process. An exploratory user study indicates that immersive\, real-time auralization can integrate acoustic feedback into collaborative architectural design workflows\, supporting multisensory evaluation and reducing dependence on late-stage corrective interventions.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:5fae3f7fde45e6daf23d03b37e7c738c
URL:http://aes2026avarigconference.sched.com/event/5fae3f7fde45e6daf23d03b37e7c738c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T150000Z
DTEND:20260630T153000Z
SUMMARY:Comparing Three Movement Simulation Algorithms from Discrete Impulse Responses
DESCRIPTION:Reproducing the acoustic consequences of source or receiver motion from a set of discrete static room impulse responses (RIRs) is a fundamental challenge in spatial audio processing\, with direct relevance to the generation of training and evaluation data for machine learning systems operating on reverberant speech and audio. This paper presents a comparative evaluation of three offline algorithms for acoustic motion simulation\, two of which are ports of previously published methods and one of which is an independent implementation conceptually related to prior work. All three methods are evaluated against a common reference using two complementary validation approaches: an objective analysis based on interaural time difference (ITD) estimation from synthetic binaural signals\, and a perceptual evaluation conducted under the MUSHRA protocol using stimuli drawn from a controlled moving-receiver database. Results indicate that frequency-domain interpolation between neighbouring impulse responses provides the most accurate binaural cue reproduction and the highest perceptual similarity to the reference under spectrally demanding stimuli\, while nearest-neighbour switching produces the most pronounced artefacts under broadband excitation. Time-domain crossfading between fully convolved signals yields intermediate performance\, achieving parity with frequency-domain interpolation for speech and noise but falling significantly behind for music. The combination of ITD-based objective analysis and MUSHRA perceptual evaluation proved informative in characterising method differences\, and the two measures converged on a consistent performance ordering across methods.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:327e02b586be4c1a5830471a129c0e6c
URL:http://aes2026avarigconference.sched.com/event/327e02b586be4c1a5830471a129c0e6c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T150000Z
DTEND:20260630T160000Z
SUMMARY:Head-tracked loudspeaker beamforming for spatial audio reproduction
DESCRIPTION:Multi-channel compact linear loudspeaker arrays combined with Crosstalk Cancellation (CTC) can deliver binaural audio to a user by directing sound precisely at the listener’s ears. This enables the reproduction of binaural\, and thus all given spatial audio formats\, without the need for headphones. The technique traditionally suffers from a small sweet-spot\, which can be overcome by combining real-time head-tracking and position-adaptive beamforming. This workshop introduces the concept of CTC and beamforming combined with user head-tracking\, covering both its theoretical foundations and the practical considerations of incorporating head tracking. A live demonstration will showcase head-tracked binaural audio delivered through a position-adaptive CTC soundbar.
CATEGORIES:BINAURAL
LOCATION:Jussieu:Room 3\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:ec992bc390897e54693315b56853045c
URL:http://aes2026avarigconference.sched.com/event/ec992bc390897e54693315b56853045c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T153000Z
DTEND:20260630T160000Z
SUMMARY:Immersive recording with a virtual 3D microphone array using spatial information of virtual sound sources sampled in a target space
DESCRIPTION:We have proposed a method called V2MA (VSVerb Virtual Microphone Array). This method virtually generates spatial room impulse responses (SRIRs) captured by a conventional microphone array using only a set of four impulse responses (IRs) measured by an A-format microphone in the target space. V2MA is based on the concept of geometrical acoustics\, that involves a virtual sound source\, also known as a mirror source. After measuring the four IRs using an A-format microphone\, we calculate the instantaneous sound intensities in the x\, y\, and z directions. The “source intensities\,” that contain sound source information\, are detected from these sound intensities. Then\, we estimate the locations\, strengths\, and phase characteristics of the sound sources. The spatial properties of the obtained virtual sound sources can be considered the fingerprint of a target space's reverberant characteristics. Using the manner of geometrical acoustics\, we can update the spatial properties of the virtual sound sources to match a neighboring receiver position and a desired directivity. Lastly\, we can obtain the SRIRs at any receiver (microphone) position with any directivity in the target room by translating the spatial information of the updated virtual sound sources into time responses. For immersive recording\, large microphone arrays are often used. Using V2MA\, we can virtually make an immersive recording using such a virtual microphone array\, provided that we measure four IRs using an A-format microphone in the target space. In our previous study\, we developed the framework of V2MA. To verify the plausibility of V2MA\, this paper compares the responses of virtual (V2MA) and real (conventional) microphone arrays using measurement results collected in a practice hall under the various conditions. The results show similar overall characteristics\, but also suggest the difficulty of a detailed evaluation.\nWe also introduce practical examples of immersive recording using V2MA.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:fc9d6d022914e4385d5cf9a67fc0df2c
URL:http://aes2026avarigconference.sched.com/event/fc9d6d022914e4385d5cf9a67fc0df2c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260630T163000Z
DTEND:20260630T190000Z
SUMMARY:Welcome cocktail
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Jussieu Auditorium\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:3eca03f4072910b10a3015470d3d3b3a
URL:http://aes2026avarigconference.sched.com/event/3eca03f4072910b10a3015470d3d3b3a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T073000Z
DTEND:20260701T080000Z
SUMMARY:Coffee
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:c52f9bf3439978feb797a432dc5834df
URL:http://aes2026avarigconference.sched.com/event/c52f9bf3439978feb797a432dc5834df
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T090000Z
SUMMARY:Is now the right time for Procedural Audio?
DESCRIPTION:Procedural audio\, sometimes known as digital Foley\, is the real-time and controllable generation of sound effects. It is an alternative to sourcing sound effects from vast libraries of pre-recorded samples. It may be used to have sounds adapt to the changing game state\, and to dynamically generate all the sounds of a virtual world. However\, there are challenges concerning the diversity of sounds that may be generated\, the controllability of procedural audio models and the quality of the sounds that it produces. We address all of these aspects in this presentation. We showcase the opportunities that procedural audio offers and how the challenges can be surmounted\, while providing demonstrations of these concepts. The session opens with an introduction to the presenters before moving into a broad review of procedural audio and its history in game sound design\, covering core concepts\, prior uses\, and how the technology has developed over time. A video presentation accompanies this overview before the workshop turns to an honest examination of the key challenges facing the field: the diversity of sounds that can be generated\, the controllability of procedural models\, and the quality of their output. Recent advances tackling these limitations are then discussed\, followed by live demonstrations of state-of-the-art procedural audio systems from Nemisindo\, which generate dynamic\, immersive soundscapes in real time. The session closes with an open questions and answers segment. Attendees will leave with practical insight into how procedural audio can enhance and expand the creative process for game sound designers\, a clearer understanding of how to implement dynamic and adaptive sound in their own projects\, hands-on exposure to interactive soundscape techniques\, and concrete tips and tricks for improving their game audio practice.\nThis session is suitable for sound designers\, game developers\, and anyone curious about the future of game audio. No prior knowledge of procedural audio is needed.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Room 3\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:3c595ea7e81357fbce79868488253ef9
URL:http://aes2026avarigconference.sched.com/event/3c595ea7e81357fbce79868488253ef9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T083000Z
SUMMARY:Designing Spatial Audio Environments for Autonomic Regulation: Toward a Framework of Social Sonic Design
DESCRIPTION:Research in spatial audio has traditionally focused on localization accuracy\, spatial realism\, and rendering algorithms. Comparatively little work has examined how intentionally designed spatial audio environments may influence listener physiological regulation and emotional perception. This paper introduces the concept of Social Sonic Design\, a framework that examines how spatially organized sonic information within and external to XR contexts may affect autonomic nervous system response and autobiographical memory. Spatial audio cues such as proximity\, elevation\, and diffuse reverberation influence listener perception of environmental stability and safety. Building on these perceptual principles\, the present study investigates whether object-based audio environments incorporating temporally structured and personally meaningful sound materials can influence listener physiological state. Spatial audio infrastructures were constructed from lullabies\, caregiving vocalizations\, and environmental sonic textures. Postpartum mothers were selected as an initial participant group because caregiving sound environments and lullaby traditions play a central role in maternal–infant interaction and emotional regulation. Immersive sonic infrastructures were produced using spatial audio capture and design techniques including Dolby Atmos multichannel rendering (7.1.2) and binaural headphone reproduction. Sound sources were modified to activate three targeted neuro-cognitive nodes and spatially distributed across the listening environment to create immersive auditory scenes incorporating foreground vocal sources\, diffuse environmental textures\, and spatialized reverberant fields. Participants experienced these environments in brief episodic listening sessions accompanied by visual media.\nThe episodic presentation structure draws inspiration from media models such as those developed by Miguel Sabido\, in which repeated exposure to idealized sensory environments may influence perception and behavioral response over time. Physiological responses were monitored using measures associated with autonomic nervous system activity\, including heart rate and heart rate variability (HRV)\, alongside reports of perceived calm\, emotional response\, and autobiographical memory recall. Preliminary observations suggest that the sonic infrastructure of immersive lullaby environments evokes caregiving memories and perceived emotional grounding among participants\, indicating that spatially designed sonic environments may contribute to changes in listener perception and physiological regulation.
CATEGORIES:AUTHORING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:3f6b85f5eb5870913a2541ce597dec63
URL:http://aes2026avarigconference.sched.com/event/3f6b85f5eb5870913a2541ce597dec63
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) Comparing Immersive VR and Tablet-Based Music Experience: Subjective and Physiological Responses Across Presentation Formats
DESCRIPTION:Immersive virtual reality (VR) is increasingly used to simulate concert experiences\, yet it remains unclear whether its experiential advantages are accompanied by corresponding physiological changes when audiovisual content is held constant. The present study compared head-mounted immersive VR concert playback with tablet-based video in a fully counterbalanced within-subject design. Musical content and audio reproduction were identical across conditions\, isolating the effect of visual immersion. Results showed consistently higher subjective ratings in VR across all measures\, including music-induced affect\, chills\, perceived presence\, and performance liking. In contrast\, physiological differences were comparatively small: electrodermal activity showed only a modest increase in VR\, and heart rate variability did not reliably differentiate between conditions. These findings suggest that immersive VR substantially enhances subjective music experience\, particularly in terms of presence and affective engagement\, while corresponding changes in autonomic physiology are limited under controlled conditions. The results indicate that subjective and physiological responses were differentially sensitive to the presentation-format manipulation.
CATEGORIES:COGNITION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:5ca666414bb32faff51e3e14dca9dbfe
URL:http://aes2026avarigconference.sched.com/event/5ca666414bb32faff51e3e14dca9dbfe
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) Effects of Navigation and Perspective on Presence and Localization in Audio Augmented Reality
DESCRIPTION:Audio Augmented Reality (AAR) can be experienced through different navigation techniques that may influence presence and spatial perception. This paper investigates the effects of navigation type and listener perspective on exploration behavior\, presence\, and localization accuracy in AAR systems built with consumer hardware. A within-subjects study compared four conditions: virtual navigation\, virtual navigation with head tracking\, physical navigation\, and physical navigation with head tracking. Fifteen participants completed exploration and sound localization tasks in each condition. Results show that physical navigation increased presence and improved exploration behavior and localization performance compared to virtual navigation\, while head tracking paired with a non-individualized HRTF for binaural rendering did not produce significant effects.
CATEGORIES:COGNITION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:9c987f4dadfabcf66636f773c7550568
URL:http://aes2026avarigconference.sched.com/event/9c987f4dadfabcf66636f773c7550568
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) Effects of Spatial Audiovisual Incongruency on Mental Workload and Task Performance in Immersive VR
DESCRIPTION:Virtual Reality has emerged as a promising medium for high-stakes training\, yet its predominantly visual design places disproportionate demands on attentional resources\, limiting capacity for other task-relevant information. Spatial audio cues exploit the underutilized auditory channel to redistribute this load\, with demonstrated improvements in reaction time\, search efficiency\, and situational awareness. However\, when audio cues are spatially incongruent with visual targets\, task performance degrades. The cognitive and behavioral costs of such incongruency\, particularly under increasing visual complexity\, remain underexplored. This pilot study examines how audiovisual spatial incongruency affects mental workload and task performance through a within-subjects VR experiment in which 15 participants complete a search-and-respond task across congruent and incongruent audiovisual conditions at three levels of visual complexity. Reaction time\, target accuracy\, timeouts\, and subjective workload are measured across 10 trials per participant. Audiovisual incongruency is hypothesized to increase mental workload and impair performance\, with effects amplified under higher visual complexity. Findings will inform spatial audio design for immersive training systems and motivate further investigation into tolerance thresholds for audiovisual misalignment.
CATEGORIES:COGNITION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:1126fc2e2de17bafdf24d4e6a8d4e717
URL:http://aes2026avarigconference.sched.com/event/1126fc2e2de17bafdf24d4e6a8d4e717
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) MAV-C: A Framework for the Joint Objective Estimation of Audio-Visual Complexity in Immersive Virtual Environments
DESCRIPTION:This paper introduces MAV-C\, an offline\, signal-based framework for the joint objective estimation of Audio-Visual Complexity (AVC) in locally-rendered interactive games. MAV-C integrates entropy-based Acoustic Scene Complexity (ASC) features with multi-scale visual complexity metrics adapted to video via optical flow variance\, and fuses modality-specific scores via Minkowski pooling. Features are normalized to a common scale relative to analytical bounds\, ensuring cross-sequence comparability. We present the framework architecture\, report initial verification results on synthetic stimuli with known complexity properties\, and outline a parametric sensitivity analysis evaluating the effect of Entropy Weight Method (EWM) regularization\, motion scaling\, and pooling exponent on discriminability across gameplay sequences of varying complexity.
CATEGORIES:COGNITION
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:72e267b66345ac15cd5b5d9fb773d6a8
URL:http://aes2026avarigconference.sched.com/event/72e267b66345ac15cd5b5d9fb773d6a8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) Complex Ratio Mask Ambisonics-to-Binaural Rendering with Intensity Vector Features and Perceptual Multi-Objective Optimization
DESCRIPTION:This paper presents a neural network architecture for binaural rendering of first-order Ambisonics (FOA) signals\, enabling headphone listeners to perceive immersive spatial audio from Ambisonic content without requiring individualized Head-Related Transfer Function measurements at inference time. The model operates in the STFT domain using Complex Ratio Masks (CRM). Unlike magnitude-mask methods that process only the omnidirectional channel and discard phase\, the proposed model predicts a shared CRM pair (left and right ear) applied via complex-valued multiplication to all four FOA channels with directional weighting. The omnidirectional channel W contributes at unit weight while directional channels Y\, Z\, X are weighted at a reduced level\, preserving both magnitude and phase information from the full soundfield. The input representation extends standard spectral features with three intensity vector channels that encode sound arrival direction at each time-frequency bin\, providing the network with explicit spatial information alongside magnitude and phase cues. Training uses a multi-objective loss that combines waveform-level accuracy (SI-SDR)\, multi-resolution spectral reconstruction at three complementary time-frequency scales\, and interaural level and phase difference terms to jointly optimize signal fidelity and spatial cue preservation. The encoder-decoder backbone is a four-level UNet with residual convolutional blocks and channel-spatial attention at every level\, totaling approximately four million parameters. Evaluation against a prior magnitude-masking architecture with 28 million parameters shows that the CRM variant achieves comparable spatial cue preservation with a seven-fold parameter reduction while gaining access to phase information.\nProcessing the signal in a single STFT-domain forward pass avoids the sequential inference of autoregressive time-domain models\, yielding computational efficiency suitable for real-time virtual reality deployment.
CATEGORIES:HOA
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:bcf7b2b07bf70782018120003969ddd4
URL:http://aes2026avarigconference.sched.com/event/bcf7b2b07bf70782018120003969ddd4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) Perceptual Limits of Ambisonic Order for Beamforming in Complex Acoustic Scenes
DESCRIPTION:This study investigates the perceptually sufficient ambisonic order for beamforming in complex acoustic scenes\, defined as the minimum spatial resolution above which no audible improvement is perceived. Two beamforming methods were evaluated: hypercardioid and MVDR beamforming. In contrast to previous studies\, the case of an ideal microphone array was considered\, in order to evaluate the beamforming methods independently of ambisonic encoding error. Sound scenes were generated using room acoustic simulations and encoded into ambisonic signals. A perceptual evaluation was conducted using a three-interval/two-alternative forced choice (3I/2AFC) test design with an adaptive procedure. The experiment used a production-constrained reference (7th-order) and a high-order reference (19th-order). Results showed that the required order depends on the beamforming method and the characteristics of the sound scene. Diffuseness profiles can be used to analyze the influence of the ambisonic order on the sound field diffuseness and to evaluate whether the directional information available is sufficient to support effective adaptive beamforming.
CATEGORIES:HOA
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:82f9189aaaa536367e33858b1f0568dc
URL:http://aes2026avarigconference.sched.com/event/82f9189aaaa536367e33858b1f0568dc
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) The role of room acoustics in 6DoF interactive audio for XR
DESCRIPTION:Binaural rendering in extended reality (XR) often employs static acoustic profiles that may not correspond to the user’s visual environment\, potentially leading to cross-modal incongruence and the room divergence effect. However\, the influence of acoustic–visual mismatch on immersion and cognitive load in interactive six-degrees-of-freedom (6DoF) environments remains unclear. This study investigated the impact of acoustic–visual divergence on presence and subjective workload during real-time object interaction. An ITU-R BS.1116-3 compliant critical listening room was reconstructed at 1:1 scale in Unreal Engine 5. Ten critical listeners navigated the environment using a Meta Quest 3 headset while performing a 6DoF hand-tracking task. Spatial audio with virtual acoustics was rendered through OSC. Three acoustic conditions were evaluated: acoustically matched (RT60 = 0.21 s)\, anechoic (RT60 = 0 s)\, and highly reverberant (RT60 = 2.0 s). Presence and workload were assessed using the IPQ and NASA-TLX. Results showed a significant reduction in Spatial Presence only between the matched and highly reverberant conditions\, while workload remained unaffected. The findings suggest that excessive reverberation disrupts environmental plausibility\, whereas the absence of reflections can be partially compensated by visual and sensorimotor cues.
CATEGORIES:PLAUSIBILITY
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c46ded410295f7b1e51c73f06152fc54
URL:http://aes2026avarigconference.sched.com/event/c46ded410295f7b1e51c73f06152fc54
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:(P) AT_WaveSpace: a Wave Field Synthesis Engine for Research and Authoring\, Applied to Near-Field Distance Perception of Focused Sources
DESCRIPTION:Wave Field Synthesis (WFS) is a well-established spatialization technique that solves the sweet-spot limitation of conventional sound reinforcement and uniquely allows the synthesis of focused sources\, that is\, virtual sources positioned between the loudspeaker array and the listener. Despite its potential for extended reality (XR)\, WFS has remained confined to specialized environments such as live performance\, installation or post-production workflows\, with no accessible open-source tooling for research and creative authoring. We present AT_WaveSpace\, an open-source WFS engine built on JUCE\, distributed under the MIT licence\, and integrated into the Unity game engine\, designed to democratize WFS for researchers\, developers and creators. Building on the methodological framework of the SoundScape Renderer\, which combined WFS engine development with a perceptual research platform\, AT_WaveSpace serves simultaneously as a spatial audio delivery tool and as an experimental tool. A perceptual evaluation of near-field distance perception of focused WFS sources\, a dimension absent from prior literature\, was conducted using this framework. Using a Midpoint Comparison procedure\, participants were unable to rank sources at 40–100 cm consistently\, while they ranked sources at 120–150 cm in correct order. Spectral centroid analysis reveals a distance-dependent timbral variation in the proximal zone whose physical origin remains unclear. Low-frequency ILD remains the primary candidate cue for correct ranking at 120–150 cm. Perspectives for further studies are outlined.
CATEGORIES:SOUNDFIELD
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:925ebf541596bd5c91ff0bb2c5dbe357
URL:http://aes2026avarigconference.sched.com/event/925ebf541596bd5c91ff0bb2c5dbe357
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T080000Z
DTEND:20260701T103000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors.
CATEGORIES:SPONSOR DEMOS
LOCATION:Jussieu:Room 2\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:afb22feff4a5b4c2bc0f949255f607de
URL:http://aes2026avarigconference.sched.com/event/afb22feff4a5b4c2bc0f949255f607de
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T083000Z
DTEND:20260701T090000Z
SUMMARY:Sound as Cultural Memory: Participatory Immersive Audio Production for the Witness Blanket VR Environment
DESCRIPTION:This case study examines the Witness Blanket VR Experience to explore how Indigenous‑led immersive audio production can support the safeguarding of intangible cultural heritage in virtual environments. Grounded in Indigenous epistemologies of listening\, the study draws on participatory sound collection\, documentation of the audio production workflow\, and subjective evaluation through community‑engaged events. Results demonstrate how spatial audio and culturally grounded production protocols can enable relational storytelling\, ethical engagement\, and protocol‑informed VR design.
CATEGORIES:AUTHORING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:99251b3e4c0efc642d2edfa8b12300ce
URL:http://aes2026avarigconference.sched.com/event/99251b3e4c0efc642d2edfa8b12300ce
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T090000Z
DTEND:20260701T093000Z
SUMMARY:Mémoire Vive: Exploring the use of authentic rendering as a narrative process in an AR fiction
DESCRIPTION:This paper presents an exploratory mixed reality prototype investigating how low-cost spatial audio and XR technologies may already enable partially believable augmented reality experiences. Rather than pursuing maximal realism across all modalities\, the project relied on selective auditory realism\, intentionally degraded sounds and visuals\, and progressive perceptual trust building in order to create plausible auditory events within a mixed reality environment. Participants were introduced to a fictional experimental protocol progressively constructing increasingly dense auditory reconstruction around them. At the end of the experience\, all virtual reconstructions abruptly disappeared\, leaving participants alone in the now silent physical room\, revealing the extent to which virtual events had progressively contaminated their perception of reality. Qualitative observations suggest that coherent multimodal staging and expectation shaping play an important role in perceptual acceptance alongside rendering realism itself. Beyond the presented prototype\, the project highlights how current XR and spatial audio tools already enable new forms of immersive narrative experiences based on persistent ambiguity between reality and virtual reconstruction.
CATEGORIES:AUTHORING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:da8dbeb37c4d2c2eece441b15b88fb45
URL:http://aes2026avarigconference.sched.com/event/da8dbeb37c4d2c2eece441b15b88fb45
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T090000Z
DTEND:20260701T093000Z
SUMMARY:Assessing localisation and localisation uncertainty for off-centre listening in a stereo loudspeaker setup
DESCRIPTION:In loudspeaker-based reproduction\, the spatial quality deteriorates when the listeners move outside the sweet spot. While this seems well known in the spatial audio community\, perceptual data that allows quantifying this effect is not common\, which prevents suggesting solutions for off-centre listening. In this study\, we collected perceptual data to answer three main hypotheses: (H1) that localisation for stereo reproduction over loudspeakers and over headphones with binaural recordings of a dummy head would result in similar perceptual outcomes\, (H2) that translation of the listener is equivalent to adding the corresponding interchannel time and level differences in the sweet spot\, and (H3) that the spread of localisation responses is correlated to the localisation uncertainty perceived by the listener. Regarding H1\, the responses for binaural recordings and loudspeakers were equivalent within a 2° margin. Regarding H2\, localisation off-centre produced only a shift in the responses compared to interchannel time and level differences in the sweet spot. Regarding H3\, the spread in localisation responses strongly correlated with the perceived uncertainty ratings. Altogether\, the results suggest that a localisation test using binaural recordings in the sweet spot — including interaural time and level differences — may be sufficient to characterise off-centre localisation and localisation uncertainty for stereo reproduction.
CATEGORIES:STEREO
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:2465c35f3d0d889ad6abb67785a83ad5
URL:http://aes2026avarigconference.sched.com/event/2465c35f3d0d889ad6abb67785a83ad5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T093000Z
DTEND:20260701T103000Z
SUMMARY:The Binaural Rendering Toolbox (BRT)
DESCRIPTION:This workshop introduces the Binaural Rendering Toolbox (BRT)\, a set of open-source (GPLv3) software libraries\, applications\, and definitions intended as a virtual laboratory for spatial psychoacoustic experimentation. The BRT provides a flexible and modular framework for binaural spatialisation\, supporting multiple rendering models\, including convolution-based and geometric approaches\, as well as advanced features such as source directivity\, several room acoustics models\, individual HRTFs\, BRIRs\, near-field simulation\, and real-time control via OSC.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:4b277554ee7ca1b6393dcb769aeda03b
URL:http://aes2026avarigconference.sched.com/event/4b277554ee7ca1b6393dcb769aeda03b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T093000Z
DTEND:20260701T103000Z
SUMMARY:Crosstalk Cancellation in Loudspeaker Arrays: Effects of Directivity\, Array Size\, and Listener Position
DESCRIPTION:Crosstalk Cancellation (CTC) is a technology that enables binaural audio reproduction over loudspeakers. The performance of a CTC system depends on multiple factors\, including the geometry of the system\, the characteristics of the loudspeakers\, and the accuracy of the plant models used to design the CTC filters. While previous studies have examined some of these factors\, the combined influence of loudspeaker directivity\, array size and listener position has received limited attention. This study models loudspeakers with a spherical pole cap and uses interpolated Neumann KU 100 head-related transfer functions to generate accurate plant responses. CTC filters are computed using a Tikhonov-regularised pseudoinverse approach\, and numerical simulations are performed to evaluate the impact of directivity\, array geometry and listener orientation on CTC performance.
CATEGORIES:TRANSAURAL
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:4e3eb4efb887686fb746f93d337f519d
URL:http://aes2026avarigconference.sched.com/event/4e3eb4efb887686fb746f93d337f519d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T100000Z
DTEND:20260701T110000Z
SUMMARY:Lunch A
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:941413f203eba40e8133420cb4c5ef90
URL:http://aes2026avarigconference.sched.com/event/941413f203eba40e8133420cb4c5ef90
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T100000Z
DTEND:20260701T103000Z
SUMMARY:The illusion of elevated lateral sources using loudspeakers in the horizontal plane
DESCRIPTION:Spectral manipulation techniques offer a means of generating virtual sound-source elevation using horizontal loudspeakers. In comparison to cross-talk cancellation systems\, these techniques can be more flexible and operate with even a single loudspeaker. However\, the azimuthal stability of such approaches remains uncharacterised. This study evaluates the effectiveness of magnitude-based difference-spectrum filtering across lateral source positions\, including intermediate positions rendered via amplitude panning\, in loudspeaker-based reproduction. Direction-dependent filters derived from a mean HRTF magnitude response were applied over a horizontal-plane loudspeaker array\, with physically elevated loudspeakers at matched azimuths serving as perceptual references. Perceived virtual elevation was quantified using the illusion ratio\, a novel metric expressing virtual elevation shift as a proportion of the physical elevation shift at each azimuth. Virtual elevation reached approximately 50% of the physical elevation shift at central azimuths\, decreasing significantly with lateral displacement\, consistent with the reduced effectiveness of monaural spectral cues at lateral positions. A greater virtual elevation effect was observed for ipsilateral rather than contralateral source positions relative to the filter ear. Stimulus class did not significantly alter the azimuth-dependent structure of the effect. These results demonstrate that magnitude-based spectral elevation synthesis produces a measurable and robust elevation effect\, most pronounced for central sources.
CATEGORIES:TRANSAURAL
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:bf42d0a946fd08d78f238d32022aff2f
URL:http://aes2026avarigconference.sched.com/event/bf42d0a946fd08d78f238d32022aff2f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T103000Z
DTEND:20260701T110000Z
SUMMARY:Extending Audio Accessibility Toward Structured and Immersive Auditory Interaction in Gameplay
DESCRIPTION:This work investigates how audio accessibility in games can be reconceptualized as a structured and immersive auditory interaction paradigm\, rather than a collection of discrete assistive cues. While existing approaches have improved access to gameplay information for visually impaired players\, they remain largely event-driven and fragmented\, often presenting auditory signals as isolated notifications. Such approaches may limit perceptual continuity and fail to reflect the dynamic\, layered nature of interactive environments. The proposed system introduces an auditory information architecture that organizes gameplay information into four continuous layers: navigation\, interaction\, salience\, and environment. Each layer represents a distinct yet interrelated perceptual function. Navigation encodes spatial orientation and movement\, interaction conveys player actions and system responses\, environment reflects ambient and contextual information\, and salience integrates perceptually relevant events—including hazards\, state transitions\, and attention-driven signals—into a unified and context-sensitive layer. By structuring auditory output across these layers\, gameplay information is represented as continuously evolving auditory processes rather than discrete cues. The system is implemented using Unreal Engine and Audiokinetic Wwise\, with Max/MSP and RNBO used to extend real-time audio processing. Rather than prioritizing novelty through fully generative synthesis\, the approach focuses on transforming and reorganizing existing sound materials through continuous parameter mapping. This enables adaptive auditory behavior while maintaining perceptual clarity and consistency with the game’s sonic identity. Interaction design is extended through a multi-source input framework. 
A camera-based input layer combines webcam-based motion capture with analysis of screen-mediated interactions\, including controller inputs (e.g.\, mouse\, gamepad\, touch) and player-character movement within the game environment. These inputs are translated into perceptual features and mapped to auditory parameters\, forming a bidirectional interaction loop in which player behavior directly influences auditory output. A user study is planned to evaluate the effectiveness of the proposed system in non-visual navigation tasks. The study will compare the layered auditory architecture with conventional cue-based approaches. Evaluation metrics will include objective measures such as navigation accuracy and task completion time\, as well as subjective measures including perceived spatial awareness\, perceptual continuity\, and immersion. In addition\, the study will examine the impact of camera-based interaction on engagement and perceived agency. The evaluation is designed to investigate whether continuous auditory representation improves coherence between auditory feedback and gameplay experience. A central contribution of this work lies in the aesthetic integration of accessibility. Rather than functioning as an external assistive layer\, accessibility-oriented audio is embedded within the core sound design. Informational signals emerge through transformations of existing sound materials\, allowing perceptual clarity to be achieved without disrupting immersion. This reframes audio accessibility as an integral component of auditory interaction design. From a practical perspective\, the system is structured as a modular and parameter-driven framework\, allowing scalable implementation across different platforms. 
Potential constraints related to computational load—particularly in real-time processing and camera-based input—are considered\, with an emphasis on efficient parameter mapping and system optimization for resource-limited environments such as mobile and virtual reality.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:eb3ddf3598e94330b68c4083a48aa8e2
URL:http://aes2026avarigconference.sched.com/event/eb3ddf3598e94330b68c4083a48aa8e2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T103000Z
DTEND:20260701T110000Z
SUMMARY:Trading of interaural time and level differences for stimuli presented using a novel two-listener virtual imaging system
DESCRIPTION:Extensive research has investigated the relative influence of interaural level and time differences (ILDs and ITDs) on the perceived position of aural stimuli. Historically\, these cues have been compared using trading methods with stimuli presented over headphones. For the purpose of virtual audio applications using multichannel techniques\, it is important to establish whether interaural cues are exploited similarly in such listening conditions. In this work\, trading experiments were carried out both with stimuli presented over headphones and using a novel two-listener crosstalk cancellation array. Listener responses revealed similar trading behaviour in the crosstalk cancellation case when compared to the headphones case. At the lowest frequency tested\, the measured trading behaviour is considered less reliable due to inaccuracies in reproduction of the target stimuli. With this exception\, this work demonstrates that the general trends observed in historical ILD/ITD trading experiments also apply to stimuli presented using crosstalk cancellation\, namely increased sensitivity to ILD and decreased sensitivity to ITD with increasing frequency.
CATEGORIES:TRANSAURAL
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c5036fca4b617086f08ff973683f7926
URL:http://aes2026avarigconference.sched.com/event/c5036fca4b617086f08ff973683f7926
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) Listening from the Booth: Multi-Perspective Auralisation and Theatrical Sound Engineering
DESCRIPTION:Sound engineers doing live mixing for theatre must manage the balance between on-stage acoustic sources and electroacoustic sounds diffused in the IRCAM:Gallery\, triggering\, spatialising and mixing pre-recorded and live sounds while actors perform. Typically confined to the control booth of unfamiliar venues\, they need to adapt to a listening perspective that differs significantly from the audience's experience in the stalls or the balconies. This work engages with sound studies\, virtual acoustics\, and archival practices to investigate complementary questions. How does the acoustic dissociation between the control booth and the rest of the venue influence the technical and aesthetic decisions of sound engineers? How would contemporary engineers interpret archived theatrical soundtracks when guided by annotated scripts? To address these questions\, the research unfolds in four stages: capture multiple High Order Ambisonics Impulse Responses from emblematic theatres in São Paulo\, Brazil\, combining flexible sources setups and multiple listening positions\; use them to build a real-time convolution engine\, integrating the IRs with actors' voices and archival soundtracks from the collection of Brazilian theatrical sound designer Tunica Teixeira\; invite sound engineers to perform mixing tasks in a virtual acoustic environment\, guided by Tunica's annotated scripts\; use the task metrics and structured questionnaires to assess the impact of multi-perspective listening on their technical and aesthetic decisions.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:3892cfd70b60ed2047c18044ce6af1d1
URL:http://aes2026avarigconference.sched.com/event/3892cfd70b60ed2047c18044ce6af1d1
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) Now We're Getting Somewhere – A First Prototype for a Co-Designed\, Blind-accessible Auditory Navigation Toolkit for 3D Open-World Video Game Environments
DESCRIPTION:Navigation tasks are often used as a fun and engaging method of exploring and interacting with video game environments. 3D open-world games afford curiosity-driven navigation for players\, providing opportunities to follow their agency and interact with points of interest within an environment. However\, this is commonly a visually-motivated task that is seldom accessible to Blind and Low Vision (BLV) gamers. Given the impact of this barrier\, it is imperative to design navigation systems for games that are driven by auditory information to provide equal opportunities for BLV gamers to engage with open-world game environments. A noted lack of understanding among game developers evidences the need for dialogue between researchers\, BLV gamers and developers. In collaboration with both BLV gamers and developers\, we present the first co-designed prototypes for a customisable\, Blind-accessible auditory navigation toolkit in 3D open-world video game environments. We build on a series of dialogic discussions with Disabled gamers who have experienced barriers in their gameplay experiences and present three navigation tools. We document the design of these tools and present themes explored through analysis of the co-design phases. We present discussions on themes including player agency\, action precision\, gameplay fluidity and cognitive load\, categorisation and identification\, sound preference\, and tutorialisation and learnability. From these themes\, we derive design insights that highlight the barriers and considerations for auditory navigation in video games.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:343536325e843e2be33db31d2e287f3e
URL:http://aes2026avarigconference.sched.com/event/343536325e843e2be33db31d2e287f3e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) Pleyel.exe
DESCRIPTION:Pleyel.exe is an interactive documentary presented as a video game\, exploring the evolving landscape of the Carrefour Pleyel district in Saint-Denis. Through free navigation within immersive 3D scans generated from gaussian splatting\, visitors can wander through sites in transition. As they explore\, they encounter residents’ testimonies\, drawn from in-situ recorded and carefully edited interviews\, offering personal perspectives on the neighborhood and its ongoing transformations.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:7bcbb14f3af8d823d2e90876c12c89c0
URL:http://aes2026avarigconference.sched.com/event/7bcbb14f3af8d823d2e90876c12c89c0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) A framework for 6DoF 4pi reverberation generation using wave-based-derived virtual sound sources
DESCRIPTION:Realistic reproduction of spatial reverberation is essential for immersive audio applications\, including virtual reality and interactive gaming. While geometrical acoustics methods enable efficient rendering\, they do not fully capture wave phenomena such as low-frequency modal behavior and diffraction\, which are particularly significant in small spaces. Wave-based simulations provide higher physical accuracy but at substantial computational cost. This paper extends VSVerb\, a 4pi sampling reverberator based on virtual sound sources (VS) extracted via sound intensity analysis\, to use pressure and three-axis particle velocity computed by a discontinuous Galerkin finite element method (dG-FEM) simulation\, enabling the generation of reverberation that reflects the wave-based acoustic characteristics of virtual spaces. Experiments conducted in a university lecture room demonstrate that simulation-based VS distributions and their corresponding impulse responses closely match those derived from actual measurements. Comparison with measured impulse responses and geometrical acoustics ray tracing shows that the proposed method produces room acoustic parameters\, including clarity and definition\, closer to the measured reference across most metrics and frequency bands. A tendency to underestimate reverberation time was observed\, which may be addressed through improved simulation modeling or post-processing. Furthermore\, the VS distribution extracted from a single simulation can be adapted to different receiver positions by re-estimating the geometric contribution of each VS\, enabling 6DoF navigation support without additional simulation. These results indicate the potential of the proposed framework for wave-based interactive reverberation in virtual spaces.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:4e43666cd97861ddb6d196847b90268d
URL:http://aes2026avarigconference.sched.com/event/4e43666cd97861ddb6d196847b90268d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) An Immersive Sequencer in Virtual Reality with Natural Interaction and Hybrid Audio Reproduction
DESCRIPTION:We present Music of the Spheres (MOTS)\, an immersive virtual reality (VR) sequencer that integrates natural interaction with hybrid spatial audio reproduction for music composition and performance. MOTS enables users to create and manipulate sound objects arranged in a 3D step sequencer surrounding the user. Using hand gestures\, users can instantiate\, position\, and remove sounds\, simultaneously composing both temporal and spatial musical structures. The system combines binaural reproduction for private preview in the headset and Ambisonic loudspeaker reproduction for shared listening in audience-oriented experiences. In this paper\, we discuss the implementation of MOTS and highlight design considerations for intuitive musical interfaces that are uniquely crafted for VR. We also present the results of a survey of 27 participants at a public exhibition\, which indicate positive responses in terms of immersion and usability\, as well as a coherent spatial audio experience across the hybrid reproduction system. Finally\, we outline future directions\, including expanded controls\, collaborative functionality\, and improved spatial audio rendering.
CATEGORIES:AUTHORING
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:948dc3cc5dd21419fc5ec217bf15bbea
URL:http://aes2026avarigconference.sched.com/event/948dc3cc5dd21419fc5ec217bf15bbea
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) An Open-Source Toolchain for Atmos Transcoding and Immersive Playback on Irregular Loudspeaker Arrays
DESCRIPTION:Audio Definition Model (ADM) metadata is central to contemporary object-based audio production and sits at the core of Dolby Atmos workflows. Yet in open research\, rapid prototyping\, immersive media development\, and playback on irregular loudspeaker arrays\, Atmos-derived material remains difficult to inspect\, translate\, and deploy without relying on proprietary tooling. This creates a persistent gap between the spatial audio formats used in industry and the open systems available to researchers\, developers\, and immersive venues. This paper presents CULT DSP\, an open-source spatial audio toolchain designed to address that gap by separating transcoding\, scene exchange\, playback\, and authoring into distinct but interoperable roles. CULT ingests spatial audio exports\, extracts and normalizes scene metadata\, and exports it for later use\; in the authoring direction\, the same module packages LUSID scene data and mono stems back into ADM/BWF output. LUSID provides a stable scene and package structure shared across the stack. Spatial Root is a layout-agnostic playback engine for real-time and offline rendering on custom loudspeaker arrays. Its EngineSession API exposes the runtime as a C++ interface used by the GUI\, CLI\, and external host applications. Four implementation projects extend the toolchain: Spatial Seed uses CULT and LUSID for procedural authoring from stems\; LUSIDstreamer treats LUSID frames as lightweight scene-state packets\; immersive-allo-root embeds Spatial Root in an AlloLib audiovisual application\; and ue-root prototypes a game-engine-facing host path. Together\, they show how Atmos-derived metadata can be reused for playback\, authoring\, inspection\, and immersive media development rather than used only for final delivery.
CATEGORIES:AUTHORING
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:28e6792ab10f41436cdb3066c6ae2e48
URL:http://aes2026avarigconference.sched.com/event/28e6792ab10f41436cdb3066c6ae2e48
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T113000Z
SUMMARY:Effect of the template on the predicted sagittal-plane sound-localization performance
DESCRIPTION:In state-of-the-art models of sagittal-plane sound localization\, template head-related transfer functions (HRTFs) are used to reflect the listener's internal calibration of auditory space decoding\, and thus determine the prediction quality. The effect of the template HRTFs has not yet been investigated directly. Here\, a model was calibrated separately to two HRTF measurements of the same listeners and its predictions were compared to behavioral localization responses of these listeners obtained in three listening conditions: two acoustically measured HRTF sets (those used for model calibration)\, and an additional condition (unseen during the calibration) used to test the model's ability to generalize. We analyzed the quadrant error rates (QEs) and local polar errors (PEs) from eight listeners. The predicted errors were similar in both calibration conditions and increased in the unseen condition. The quality of the predictions\, however\, varied significantly with the template\, more for PE than for QE\, slightly preferring one template over the other when predicting the unseen condition. Our findings suggest that small differences in HRTFs used for the template may influence the prediction quality\, especially when applied to unseen listening conditions.
CATEGORIES:BINAURAL
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:849b757a3c7b2c027f859d38c632615c
URL:http://aes2026avarigconference.sched.com/event/849b757a3c7b2c027f859d38c632615c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) Benchmarking Spatial Audio Reproduction Systems in Smart Glasses and XR Headsets: Illustrative Results and Interpretation
DESCRIPTION:This paper presents an illustrative cross-device evaluation of spatial audio reproduction in smart glasses and XR headsets using binaural in-ear recordings and external sound-level measurements on four anonymized commercial devices. The evaluation is organized around baseline playback behavior\, cue fidelity\, sound leakage\, and robustness to wearing variability\, with metrics derived from broadband-noise and swept-sine measurements. The results reveal distinct device behaviors\, including differences in channel balance\, interchannel signal behavior\, preservation of HRTF-encoded binaural cues\, perturbation of real-world acoustic cues\, external sound radiation\, and sensitivity to reseating. Rather than establishing a product ranking\, this study demonstrates how the benchmark supports structured cross-device interpretation of wearable XR spatial audio systems.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:9761a84f647caad726e3eb9c094a8599
URL:http://aes2026avarigconference.sched.com/event/9761a84f647caad726e3eb9c094a8599
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T143000Z
SUMMARY:(P) Vikk64: A planar microphone array for audio-visual reproduction in virtual reality
DESCRIPTION:A virtual artificial head (VAH) can be used to imprint a listener’s head-related transfer functions (HRTFs) onto a recording using a filter-and-sum beamforming approach. The previous version of the so-called Vikk\, consisting of 24 microphones\, was able to recreate HRTFs with low interaural errors\, including temporal and spectral distortions up to 5 kHz. A simulation of a revised topology demonstrated an increased frequency range up to 8 kHz\, motivating us to examine whether the range could be extended further\, ideally beyond the audible range. We simulated two microphone topologies with different arrangement strategies based on either a Golomb ruler or Vogel’s spiral. In addition\, scaling and weighting were applied to create a denser microphone placement in the centre of the array. Vogel’s spiral achieved results comparable to the Golomb ruler with 24 microphones and is easier to rescale with a larger number of microphones and parametric weighting. For this reason\, we selected a weighted Vogel’s spiral to investigate how the number of microphones affects temporal and spectral distortions. Increasing the number of microphones to 32 reduced temporal and spectral distortions\, although spectral distortions on the contralateral ear remained above 10 kHz. Further increasing the number to 64 microphones reduced spectral distortions and extended the usable frequency range up to 16 kHz. These results demonstrate the suitability of the Vikk64 for high-quality reproduction of binaural auralisations in the horizontal plane. Additionally\, we outline how combining the Vikk64 with a VR180 camera enables the recording of audiovisual scenes that can be reproduced in virtual reality.
CATEGORIES:MICARRAY / HEADWORN
LOCATION:Jussieu:Room 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:bb6dbb85f37f7fcaf7fe95b5bdccdfc5
URL:http://aes2026avarigconference.sched.com/event/bb6dbb85f37f7fcaf7fe95b5bdccdfc5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T110000Z
DTEND:20260701T120000Z
SUMMARY:Lunch B
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:360faa5f742cd00b250052dfd182c10d
URL:http://aes2026avarigconference.sched.com/event/360faa5f742cd00b250052dfd182c10d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T113000Z
DTEND:20260701T120000Z
SUMMARY:Eliciting the Effectiveness of Binaural Renderer Enhancement on the Horizontal Plane
DESCRIPTION:Binaural rendering is central to spatial audio reproduction via headphones and wearable devices\, yet systematic evaluation of enhancement techniques remains methodologically inconsistent across the literature. This paper presents and applies a subjective evaluation methodology designed to consistently elicit five perceptual attributes of headphone-based spatial audio: externalization\, elevation fidelity\, front/back distinction\, room tone\, and spectral coloration. The methodology combines absolute judgment and relative comparison protocols\, taking advantage of their complementary capabilities to capture both the absolute perceptual quality of individual rendering conditions\, and the salience of perceived differences between them. It is applied in a controlled experiment comparing conventional HRTF-based binaural rendering against two enhancement variants that each superpose a masked spatially diffuse sound field component into the binaural output. Stimuli were spatialized across eleven azimuthal positions in the horizontal plane using a generic dummy-head dataset\, and presented over closed-back headphones to participants. The results validate the proposed methodology as a tool for revealing perceptually relevant differences among binaural rendering conditions. Relative comparison tests reveal additional performance difference details between rendering methods. Both enhancement methods significantly improve externalization and front/back distinction relative to unenhanced binaural rendering\, with the largest gains at lateral azimuths\, and without a statistically significant increase in perceived spectral coloration beyond the baseline effect of HRTF filtering. However\, the results do not indicate conclusively whether the binaural rendering methods examined here exhibit or mitigate a "spurious elevation" artifact associated with frontal sound presentation.
CATEGORIES:BINAURAL
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:e9fbb7abffcad5e4dfb9a98765dab6a5
URL:http://aes2026avarigconference.sched.com/event/e9fbb7abffcad5e4dfb9a98765dab6a5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T113000Z
DTEND:20260701T153000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors.
CATEGORIES:SPONSOR DEMOS
LOCATION:Jussieu:Room 2\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:f87c16d991955c4ca2fbc7313b97de72
URL:http://aes2026avarigconference.sched.com/event/f87c16d991955c4ca2fbc7313b97de72
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T120000Z
DTEND:20260701T123000Z
SUMMARY:Primary Source Dominance and Acoustic Scene Complexity in 6DoF VR Audio Evaluation
DESCRIPTION:In virtual reality (VR) experiences\, a primary source often refers to a sound object designated for a central role within a scene\, contrasted with contextual background sources. While such sources are typically assumed to guide perceptual attention\, it remains unclear whether a designated primary source maintains dominance in overall audio quality evaluation as acoustic scene complexity increases\, particularly in six-degrees-of-freedom (6DoF) scenarios. This study investigates how per-source rendering quality and scene complexity influence overall audio quality evaluation in 6DoF VR. Rendering quality was manipulated independently for a primary source and background sources\, and scene complexity was varied based on the number of sources. Rank-order elimination-by-aspects (EBA) was applied to test dominance patterns across conditions. Results indicate that under low scene complexity\, overall evaluation mainly depended on primary source rendering quality. However\, in high-complexity multisource scenes\, this dominance was no longer observed\, and evaluation dependence became distributed across sources.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:36791bf27cce7b28f19571d8496f7960
URL:http://aes2026avarigconference.sched.com/event/36791bf27cce7b28f19571d8496f7960
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T120000Z
DTEND:20260701T123000Z
SUMMARY:How representative are Dummy head HRTFs? A subjective comparison of mannequin and human datasets
DESCRIPTION:Head-related transfer functions (HRTFs) are central to convincing binaural rendering in virtual and augmented reality applications. While individual HRTFs offer the highest perceptual fidelity\, the practical difficulty of personal HRTF acquisition drives widespread use of dummy head (mannequin) measurements as a non-individualized substitute. Despite their ubiquity\, systematic perceptual benchmarking of dummy head HRTFs against human HRTFs remains limited\, particularly with respect to whether consistent trends emerge across listeners irrespective of individual HRTF preference. This study extends prior work on subjective HRTF evaluation methodologies and perceptual ranking by applying an established trajectory-based quality assessment paradigm to a mixed set of dummy head and human HRTFs. Participants were presented with predefined auditory trajectories rendered via binaural synthesis and asked to rate the perceptual quality of each rendering with respect to adherence to the prescribed trajectory. HRTFs were presented in randomised order across two sets of eight\, with repeated items serving as inter-set normalisation anchors. The HRTF pool encompassed human measurements alongside a range of dummy head types: simplified head-only geometries\, head-and-torso simulators (HATS)\, and models incorporating absorptive materials (hair\, clothing analogues). The primary research question is whether\, despite well-documented listener-dependent variability in HRTF suitability\, population-level trends differentiate dummy head HRTFs from human ones\, and further\, whether acoustic complexity of the mannequin (torso\, absorptive surfaces) correlates with perceptual performance. Results are discussed in terms of implications for HRTF database design and substitute HRTF selection strategies for immersive audio applications.
CATEGORIES:HRTFS
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:2ec5fd141f12675235c57def9216f9df
URL:http://aes2026avarigconference.sched.com/event/2ec5fd141f12675235c57def9216f9df
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T123000Z
DTEND:20260701T130000Z
SUMMARY:Investigating the Perceptual Relevance of Voice Directivity in Virtual Vocal Instruction Environments
DESCRIPTION:Given the limited research on the use of extended reality (XR) technologies in remote music instruction from the perspective of music tutors\, this work examines the perceptual importance of voice directivity within a virtual reality (VR) environment. In particular\, the perceptual ability to discriminate differences between a measured vocal directivity pattern and a slightly modified omnidirectional directivity pattern is investigated. Two listening tests were conducted to probe directivity perception under (i) static and (ii) dynamic listener conditions within a simulated music practice room\, integrating 3rd order Ambisonics Room Impulse Responses (RIRs) and head-tracked binaural reproduction within an interactive Unity-based interface. The static test used an ABX discrimination model to assess directivity detectability as a function of location and stimulus content. The dynamic test involved free navigation around a virtual singer and the evaluation of the perceived directional plausibility\, naturalness of the sound emission\, and the adequacy of the experience for the assessment of the singer’s vocal characteristics. The results suggest that while listeners can detect differences between vocal directivity patterns under controlled listening conditions\, such differences may become less perceptually salient during dynamic interaction within a virtual environment. Nevertheless\, the overall positive evaluations in the dynamic listening test indicate that the implemented spatial audio approach provides a plausible and effective auditory experience\, supporting its potential use in XR-based applications for remote music instruction and performance evaluation.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:50dc1a38bb965a66cd05f5472d324526
URL:http://aes2026avarigconference.sched.com/event/50dc1a38bb965a66cd05f5472d324526
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T123000Z
DTEND:20260701T133000Z
SUMMARY:Understanding Ambisonics Through Practical Decisions and Listening
DESCRIPTION:Ambisonics is a scene-based spatial audio format that has been around since the 1970s. In recent years its popularity has increased\, with inclusion in game engines (such as Unity and Unreal) and distribution standards (like ADM or IAMF). Despite this\, many practitioners view Ambisonics as being complex\, mathematical\, or academic. This tutorial explains Ambisonics through the lens of practical decision-making. Instead of equations\, it covers the choices audio professionals are required to make when working on a project\, with a particular emphasis on the audible consequences of those choices. The tutorial enables attendees to develop an intuitive understanding of Ambisonics through explanations of theory\, combined with listening examples and workflow demonstrations. The topics covered in this tutorial are: • Fundamentals: What Ambisonics is and how it differs from channel-based and object-based formats\, and why it is well suited to VR\, AR\, and game audio. • Encoding: How audio sources are converted to Ambisonics\, and how to choose the Ambisonic order based on perceptual and computational trade-offs\, as well as delivery constraints. • Conventions: Common channel ordering (ACN vs FuMa) and gain normalisation (SN3D vs N3D) conventions\, and what happens when things get mismatched. • Processing: What kinds of effects can be used on Ambisonic signals while preserving the spatial integrity. • Decoding and binaural rendering: How Ambisonic signals are converted to loudspeaker or binaural signals. The impact of head-tracking and HRTF selection on the binaural rendering. • Mixed-order projects: What the options are when working with mixed-order sources\, and the audible artefacts that can arise. The tutorial will provide brief practical demonstrations of setting up an Ambisonics project in Pro Tools and Reaper\, two widely used DAWs for immersive audio.\nBy the end of the tutorial attendees will have a practical understanding of the main concepts of Ambisonics\, as well as knowing how the practical choices they make will impact the final audio. They will also be familiar with the main workflow pitfalls and how to avoid them. The tutorial assumes familiarity with general audio production concepts (DAW use\, signal routing\, mixing). However\, no prior experience with Ambisonics or spatial audio formats is required. It is suitable for sound designers\, composers\, and audio engineers working in or interested in immersive media.
CATEGORIES:HOA
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:2cff7e26ebb6512ee9dd911fc3d2b474
URL:http://aes2026avarigconference.sched.com/event/2cff7e26ebb6512ee9dd911fc3d2b474
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T123000Z
DTEND:20260701T133000Z
SUMMARY:Coffee A
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:5ba9b43e40acb4f9935d8a1ec306e366
URL:http://aes2026avarigconference.sched.com/event/5ba9b43e40acb4f9935d8a1ec306e366
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T130000Z
DTEND:20260701T133000Z
SUMMARY:Personality Perception in Acoustic Virtual Reality
DESCRIPTION:This study investigates how perceived auditory distance in Virtual Reality (VR) influences social perception\, specifically personality attribution. Building on research linking social and physical distance\, the work explores whether speakers who sound closer or farther away are judged differently in terms of personality traits. Using the SONICOM 3D Speaker Personality Corpus\, the study analysed 360 spatialised speech samples from 120 speakers. Each sample was evaluated by 10 listeners\, who rated both perceived auditory distance and five personality traits (Openness\, Conscientiousness\, Extraversion\, Agreeableness\, and Neuroticism). Ratings were averaged to obtain a single score per recording. The analysis proceeded in two stages. First\, correlations between perceived distance and personality traits were examined. Results showed significant relationships for two traits: Extraversion and Agreeableness. Specifically\, speakers perceived as closer were judged as less extraverted but more agreeable\, while more distant speakers were perceived as more extraverted and less agreeable. Second\, machine learning models were developed to predict personality trait scores from the speech data. A feedforward neural network achieved above-chance classification performance across all traits. When the model was extended to jointly predict both personality traits and perceived distance\, performance improved significantly for all traits. This suggests that perceived distance and personality attribution are linked\, and that this relationship includes non-linear patterns not captured by simple correlation analysis. Overall\, this is the first study to demonstrate an interaction between perceived physical distance and speech-based personality judgments. The findings highlight the importance of spatial audio in shaping social perception in VR and Extended Reality (XR).\nThey suggest that manipulating the perceived distance of virtual speakers could influence how users interpret social cues\, potentially enhancing the design of virtual agents for roles such as teachers\, assistants\, or companions.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:6e8f631f63820e9c7baba49a44f40cd9
URL:http://aes2026avarigconference.sched.com/event/6e8f631f63820e9c7baba49a44f40cd9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T133000Z
DTEND:20260701T140000Z
SUMMARY:Rendering 6DoF Audio in Augmented Reality: Perceptual Evaluation of Game Engines\, Plugins\, and Middleware
DESCRIPTION:Contemporary game engines such as Unreal Engine and Unity are widely used for Extended Reality (XR)\, yet their native audio pipelines often rely on simplified spatialization with limited acoustic control. In Augmented Reality (AR)\, virtual sound sources must integrate coherently with the physical environment to maintain perceptual plausibility\, making accurate Six-Degrees-of-Freedom (6DoF) rendering critical. This study carried out a perceptual evaluation of multiple 6DoF audio rendering approaches\, including Audiokinetic Wwise (Reflect and RoomVerb)\, Steam Audio\, Meta XR Audio SDK\, a dense 6DoF Room Impulse Response (RIR) interpolation method\, and APLVirtuoso XR. A measured physical room was reconstructed in Unreal Engine 5\, and the rendering pipelines were calibrated by matching the reverberation time (RT60) and direct-to-reverberant ratio (DRR) to the measured room within their respective just noticeable difference (JND) thresholds. The results showed significant differences in perceived spatial and timbral fidelity as well as plausibility and overall listening experience across the tested renderers. Despite the high accuracy achieved by the 6DoF interpolation method\, an algorithmic renderer demonstrated comparable or superior performance. However\, some other algorithmic renderers exhibited tradeoffs in terms of computational overhead and acoustic modelling accuracy. Our findings indicate that an optimised system prioritising a plausible auditory representation\, rather than strict physical replication\, may be sufficient and\, in some cases\, can yield superior perceptual outcomes.
CATEGORIES:PLAUSIBILITY
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:cf0686c787194bda782750fe790d4de5
URL:http://aes2026avarigconference.sched.com/event/cf0686c787194bda782750fe790d4de5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T133000Z
DTEND:20260701T143000Z
SUMMARY:Coffee B
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:Cafe / Lunch\, Sorbonne University\, Jussieu campus & IRCAM/STMS\, Paris\, France
SEQUENCE:0
UID:9fce6b11fb75122cdd2664b7b3c70884
URL:http://aes2026avarigconference.sched.com/event/9fce6b11fb75122cdd2664b7b3c70884
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T140000Z
DTEND:20260701T150000Z
SUMMARY:Combining Synthetic Sound with Spatial Audio Rendering — Insights and Opportunities
DESCRIPTION:Procedural audio is an essential part of building interactive media. Game engines have long supported procedural techniques for designers and engineers. The demand for sounds that feel natural and realistic continues to grow\, and spatial and interactive sound design play a key role in creating plausible auralizations. Many factors go into creating realistic virtual collision sounds across multiple objects. Speed of motion\, size\, material\, and design intent from game mechanics all intertwine into a web of interactivity\, spatialization\, modeling\, storytelling\, and logic. The sound of footsteps changes with movement and with the materials beneath\, pressing buttons or fabrics might require tactility\, and explosives in a variety of rooms and different listening positions pose spatialization subtleties. What are the right tools to realize these intricacies? What does the current landscape look like for addressing remaining challenges? What developments are emerging in machine learning and AI? In this workshop\, organized by the AES TC Spatial Audio and AES TC Interactive Media and Gaming\, speakers will share workflows\, systems\, and their experience developing spatial\, interactive\, and generative sound design. The session is intended for those looking to expand their toolkit\, rethink existing approaches\, and better understand the practical and technical considerations behind procedural and spatial audio systems for virtual sounds.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:Jussieu:Room 3\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:eca860297ccadcaeecc122592cb8fd9a
URL:http://aes2026avarigconference.sched.com/event/eca860297ccadcaeecc122592cb8fd9a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T140000Z
DTEND:20260701T143000Z
SUMMARY:Audiovisual Coherence Thresholds for Direct-to-Reverberant Energy Ratio
DESCRIPTION:Holographic calling systems aim to create the perceptual illusion that a remote interlocutor is physically present by combining augmented reality (AR) visualizations with spatial audio rendering. A key challenge in such systems is achieving audiovisual coherence when room-acoustic properties must be inferred\, since this can lead to greater inaccuracies than when they are estimated from measurements. In this study\, we investigate perceptual tolerance to mismatches in the direct-to-reverberant energy ratio (DRR)\, a critical cue for auditory distance perception. We conducted a listening experiment in which participants judged whether the audio and the visual presentation of a stimulus were coherent using a yes/no task. Stimuli included both a volumetric capture of a speaking human avatar\, rendered at the correct direct sound level\, and a loudspeaker reproducing wideband noise. For the loudspeaker\, level roving was introduced to assess the influence of intensity cues on listener decisions. Results show that audiovisual coherence is maintained for half of the presentations within a DRR mismatch range of approximately 3 dB for too dry and 4.3 dB for too reverberant renderings. Within the limited number of participants included in the study\, no evidence for significant differences was found between speaking avatars and loudspeakers reproducing noise\, nor between different ranges of level roving in the loudspeaker condition. Nevertheless\, the findings help to understand the consequences of DRR estimation mismatches for holographic AR calling experiences.
CATEGORIES:PLAUSIBILITY
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:384857d83c05e7303a28ec4448a303e6
URL:http://aes2026avarigconference.sched.com/event/384857d83c05e7303a28ec4448a303e6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T143000Z
DTEND:20260701T150000Z
SUMMARY:The Effect of Authentic Spatial Sound on Verbal Working Memory in Online Virtual Reality Learning Environments
DESCRIPTION:This paper investigates the impact of authentic spatial audio on verbal working memory (WM) within a WebXR-based virtual reality learning environment (VRLE). While prior virtual reality (VR) research has predominantly focused on visual modalities\, the influence of auditory realism\, particularly authentic spatialised sound\, on cognitive performance remains underexplored. To address this gap\, a controlled within-subjects experiment was conducted using an adapted automated operation span (AOSPAN) task under two conditions: with and without authentic spatial sound. A total of 40 participants completed the study using a head-mounted display in a controlled laboratory setting. The VRLE was implemented using web-based technologies\, incorporating ambisonics audio capture and real-time spatial sound rendering. Statistical analysis revealed no significant differences in WM performance across conditions for all measured metrics\, including OSPAN score\, total correct recall\, and error rates. However\, results consistently showed a non-significant trend toward improved performance in the presence of authentic spatial sound. In contrast\, subjective measures indicated substantial enhancements in perceived presence\, immersion\, realism\, and user preference when spatial ambient audio was enabled. These findings suggest that while authentic spatial sound does not significantly influence verbal WM performance\, it enhances experiential quality without increasing cognitive load. The study highlights the importance of incorporating realistic auditory environments in VR design for education\, supporting user engagement while maintaining cognitive neutrality.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:bacc17516b84f054041d344c286516ff
URL:http://aes2026avarigconference.sched.com/event/bacc17516b84f054041d344c286516ff
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T143000Z
DTEND:20260701T150000Z
SUMMARY:Assessing the Plausibility of Measurement-Based Auralization of Sound Transmission through Walls
DESCRIPTION:In building acoustics\, modeling and auralizing sound transmission through walls is a relevant area of research related to residential well-being and workplace productivity. While most existing studies use auralization methods that focus on accurately reproducing spectrum and amplitude of the sound transmission in accordance with ISO standards\, almost no studies have explicitly tried to evaluate the perceptual quality of wall transmission auralization. To address this\, the present study applies the plausibility paradigm to evaluate a measurement-based auralization approach to validate it for further psychoacoustic experiments on residential well-being. Binaural Room Impulse Responses (BRIRs) were measured for three loudspeakers placed in rooms adjacent to a central listening room. In a subsequent listening test\, participants were asked to evaluate the overall plausibility of the auralization. Results demonstrate that due to the absence of visual cues\, the lower sound pressure level\, the reduced signal-to-noise ratio\, and the diffuse radiation characteristic of the source\, high plausibility scores close to the guessing rate were achieved for all adjacent rooms. These findings suggest that auralization of wall transmission using BRIRs can be used as a plausible method for psychoacoustic research on residential well-being\, without the need for complex physical simulations.
CATEGORIES:PLAUSIBILITY
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:265e474f079da4268340630a0427c22f
URL:http://aes2026avarigconference.sched.com/event/265e474f079da4268340630a0427c22f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T150000Z
DTEND:20260701T160000Z
SUMMARY:The quest for decorrelation continues -- from stereo to immersive
DESCRIPTION:Using pitch\, delay and modulation effects to perceptually spread a mono source across additional\, adjacent playback channels is a staple of music production\, born in stereo and extended to immersive. This tutorial begins with the historical origins of the effect in the mid 1970s\, showing the signal processing chains used\, measuring its impact on the signal\, discussing its psychoacoustic merit\, and demonstrating the resulting sound. The evolution from stereo through surround sound to immersive formats using contemporary production tools and techniques is demonstrated. The effect is still evolving as new tools are developed and creators explore what is possible across all immersive artforms – music\, AR/VR\, and games. It is hoped a deep dive into the first 50 years of this effect will inform and inspire immersive mixers for the future.
CATEGORIES:APPLICATION / GAMING
LOCATION:Jussieu:Conf 2 (Binaural)\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:cf3b447d1e4cd854fd21557da4b4e4ee
URL:http://aes2026avarigconference.sched.com/event/cf3b447d1e4cd854fd21557da4b4e4ee
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T150000Z
DTEND:20260701T153000Z
SUMMARY:Assessing the Impact of Spatial Audio on Cognitive Load and Memory Retention for Virtual Training Simulation in Virtual Reality
DESCRIPTION:This paper examines whether spatialised audio improves cognitive load and memory retention in Virtual Reality training. Using a commercial VR public speaking module developed by BODYSWAPS\, 1\,350 real-world users were randomly assigned to either a standard audio (control) or fully spatialised audio with virtual acoustics (study) condition. The study ran over a three-year period\, making this the largest study of its kind. Participants completed an exit survey rating five data points: comfort\, concentration\, realism\, retention\, and simulation. The spatialised audio group reported consistently higher scores overall\, with a statistically significant improvement in perceived comfort (p = 0.006\, d ≈ 0.44). Directional improvements were also observed in realism and retention\, though these did not reach statistical significance. Gaze-time analysis revealed that the spatialised audio group spent more time looking at the primary coaching figures\, suggesting that spatial audio may support sustained attentional focus on key instructional sources. The findings indicate that spatial audio design is a meaningful contributor to VR training quality\, particularly in comfort and perceived realism\, with promising trends for learning efficacy.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:c883dc21a67fa3337a81265f1978ba1b
URL:http://aes2026avarigconference.sched.com/event/c883dc21a67fa3337a81265f1978ba1b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T153000Z
DTEND:20260701T160000Z
SUMMARY:SCHuBERT: a real-time end-to-end model for piano music emotion recognition
DESCRIPTION:In this study\, we present SCHuBERT\, a real-time end-to-end Piano Music Emotion Recognition (PMER) system that operates directly on raw audio and fine-tunes DistilHuBERT for short-window classification on the Valence–Arousal (V–A) plane. Designed for low latency and high responsiveness\, the system is particularly well suited for immersive applications such as Virtual and Augmented Reality (VR/AR). Compared with both audio- and symbolic-domain baselines\, SCHuBERT achieves strong accuracy in four-quadrant (4Q) classification as well as in binary arousal and valence tasks\, while maintaining low computational overhead for real-time operation.
CATEGORIES:COGNITION
LOCATION:Jussieu:Conf 1\, 4\, place Jussieu Paris 5e
SEQUENCE:0
UID:39c1123c0003212fdec412bbdfd9ee87
URL:http://aes2026avarigconference.sched.com/event/39c1123c0003212fdec412bbdfd9ee87
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260701T163000Z
DTEND:20260701T200000Z
SUMMARY:Banquet @ La Coupole
DESCRIPTION:Address: 102\, boulevard du Montparnasse\, Paris 14e
CATEGORIES:SOCIAL EVENT
LOCATION:Off-site\, via Metro
SEQUENCE:0
UID:7c9b08d1d761d9d4420ec3d860b9b6b2
URL:http://aes2026avarigconference.sched.com/event/7c9b08d1d761d9d4420ec3d860b9b6b2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T073000Z
DTEND:20260702T080000Z
SUMMARY:Coffee
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:63d28f220c25d3daa54616d8ce51a749
URL:http://aes2026avarigconference.sched.com/event/63d28f220c25d3daa54616d8ce51a749
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T080000Z
DTEND:20260702T081500Z
SUMMARY:Re-opening ceremony / Orientation
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:d0aef575d12f532cf9418ae37ecce7b1
URL:http://aes2026avarigconference.sched.com/event/d0aef575d12f532cf9418ae37ecce7b1
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T081500Z
DTEND:20260702T090000Z
SUMMARY:XR Futures: Recording Our World for Posterity
DESCRIPTION:In the September 1975 issue of the JAES\, Michael Gerzon summarized the benefits of emerging Ambisonic audio technology and its potential for the spatial capture of concert hall impulse responses in his short paper Recording Concert Halls for Posterity. Gerzon was reflecting upon the challenge that Richard Heyser wrote about in 1974\, of defining and encoding into our recordings “a Rosetta Stone” signal that would allow us to unlock the data embedded with the audio and so undo “the spectral\, spatial and dynamics limitations of a recording of a great artist” (Heyser\, JAES\, May 1974). Angelo Farina\, in collaboration with Waves in 2003\, also presented a paper titled Recording Concert Halls for Posterity. This paper was named in honour of Gerzon and reported on the development of a high-quality library of spatial room impulse responses from famous concert halls and opera houses around the world\, just as Gerzon and Heyser had envisaged. Arguably this paper reintroduced Ambisonic technology and spherical harmonics as a compact\, efficient and flexible means of encoding and decoding an acoustic environment for use in modern music production using emerging real-time convolution audio effects plugins. However\, it was the arrival of the Oculus Rift Virtual Reality headset in 2013 that really started to bring the interactive benefits of Ambisonics to a much wider audience through new audio workflows and game engine development platforms. This interactive game engine technology has in turn unlocked challenges and brought new opportunities in how creatives envisage and build future immersive extended reality (XR) experiences\, and these foundational Ambisonic tools have become a fundamental part of our audio programming pipelines and sound design workflows.\nThis XR Futures presentation will reflect upon these and related developments in immersive audio for virtual/augmented reality and immersive games\, and how research at the University of York’s AudioLab has taken a parallel path into future extended reality (XR) experience design through the XR Stories and CoSTAR Live Lab projects. What role does our “Rosetta Stone for audio” now have in unlocking the potential of future extended reality experiences?
CATEGORIES:KEYNOTE
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:b2829e1e98500e30dbe5a14ccd563ef0
URL:http://aes2026avarigconference.sched.com/event/b2829e1e98500e30dbe5a14ccd563ef0
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T090000Z
DTEND:20260702T093000Z
SUMMARY:Residual Learning for Neural Ambisonics Encoders
DESCRIPTION:Emerging wearable devices such as smartglasses and extended reality headsets demand high-quality spatial audio capture from compact\, head-worn microphone arrays. Ambisonics provides a device-agnostic spatial audio representation by mapping array signals to spherical harmonic (SH) coefficients. In practice\, however\, accurate encoding remains challenging. While traditional linear encoders are signal-independent and robust\, they amplify low-frequency noise and suffer from high-frequency spatial aliasing. On the other hand\, neural network approaches can outperform linear encoders but they often assume idealized microphones and may perform inconsistently in real-world scenarios. To leverage their complementary strengths\, we introduce a residual-learning framework that refines a linear encoder with corrections from a neural network. Using measured array transfer functions from smartglasses\, we compare a UNet-based encoder from the literature with a new recurrent attention model. Our analysis reveals that both neural encoders only consistently outperform the linear baseline when integrated within the residual learning framework. In the residual configuration\, both neural models achieve consistent and significant improvements across all tested metrics for in-domain data and moderate gains for out-of-domain data. Yet\, coherence analysis indicates that all neural encoder configurations continue to struggle with directionally accurate high-frequency encoding.
CATEGORIES:HOA
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:161246bd154e819d292466443dab512f
URL:http://aes2026avarigconference.sched.com/event/161246bd154e819d292466443dab512f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T090000Z
DTEND:20260702T103000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors
CATEGORIES:SPONSOR DEMOS
LOCATION:IRCAM:Studio 5\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:3cb184628cfd784d33718d4a241a4182
URL:http://aes2026avarigconference.sched.com/event/3cb184628cfd784d33718d4a241a4182
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T093000Z
DTEND:20260702T100000Z
SUMMARY:Evaluation of a Higher Order Ambisonic Renderer with Reverberation Compensation via Crosstalk Inversion
DESCRIPTION:In this work\, the authors evaluate a higher-order Ambisonic (HOA) renderer that compensates for reverberant characteristics of the intended listening room\; this is accomplished by decoding an HOA signal to control points distributed around a boundary surrounding the listening area\, then convolving the control signal with a compensation filter derived via matrix inversion of room impulse responses (RIR) from loudspeakers to control points in the frequency domain. First\, a comparison is performed over renderers utilizing increasing control point density and evaluated using simulated RIRs. Then\, robustness of the renderer to simulation inaccuracy is evaluated experimentally in a listening room. Metrics of reconstructed soundfield directionality and reverberation are compared to those obtained from a conventional HOA decoder\, and results demonstrate an increase in source directivity and a reduction in reverberation time for both directional and diffuse stimuli.
CATEGORIES:HOA
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:78237175ce0242eb3789e67f30dc2a23
URL:http://aes2026avarigconference.sched.com/event/78237175ce0242eb3789e67f30dc2a23
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T093000Z
DTEND:20260702T103000Z
SUMMARY:Credibilitizing in Immersive Audio
DESCRIPTION:The tools for building social capital in any career are based on networking\, mentorship\, and role models. For immersive audio\, underrepresented groups are upskilling\, teaching others\, and innovating in order to pursue their ambitions. Dr. Leslie Gaston-Bird talks about her initiative "Immersive and Inclusive Audio"\, which has been running for over five years\, and how the Pro Tools | Dolby Atmos Certification plays a role in the efforts of women and minorities to "leak up\, not out" of the immersive audio career pipeline.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:a4bd71c65d8b0e4ce69be98e189c69b9
URL:http://aes2026avarigconference.sched.com/event/a4bd71c65d8b0e4ce69be98e189c69b9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T100000Z
DTEND:20260702T103000Z
SUMMARY:Investigating the “Ring of Silence” in Loudspeaker and Binaural Reproduction Using Advanced Ambisonic Decoding Strategies
DESCRIPTION:Higher-Order Ambisonics (HOA) reproduction with conventional mode-matching decoders can exhibit the so-called “ring of silence\,” characterised by sound level reduction in specific spatial or spectral regions. This effect arises in loudspeaker reproduction when the number of loudspeakers exceeds that required by the Ambisonic order\, and in binaural rendering when head-related transfer functions (HRTFs) are sampled at a higher spatial resolution than supported by the input signal. This paper investigates the extent to which advanced Ambisonic decoding strategies can mitigate this artefact. In particular\, decoders based on Lasso regularisation and magnitude least-squares (magLS) are evaluated through numerical simulations in both loudspeaker and binaural reproduction scenarios. The results show that both approaches significantly reduce the prominence of the ring of silence compared to conventional minimum-norm mode-matching decoders. In loudspeaker reproduction\, a more uniform spatial distribution of SPL is obtained\, while in binaural rendering\, spectral consistency is improved. An interpretation of these results is proposed\, linking the observed behaviour to the underlying optimisation criteria of the decoding process. The results indicate that the ring of silence is not an inherent limitation of Ambisonics\, but rather a consequence of the decoding strategy\, and can be effectively mitigated through appropriate decoder design.
CATEGORIES:HOA
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:afc6e72196175fd6f20446fc191aa07f
URL:http://aes2026avarigconference.sched.com/event/afc6e72196175fd6f20446fc191aa07f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T103000Z
DTEND:20260702T113000Z
SUMMARY:Lunch
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:fb09f8cb59ff8b76ab8dc4241322d446
URL:http://aes2026avarigconference.sched.com/event/fb09f8cb59ff8b76ab8dc4241322d446
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T123000Z
SUMMARY:Interoperable Spatial Audio Workflows for XR and Live Systems with Grapes 3D Audio Control
DESCRIPTION:Immersive audio production for XR\, virtual environments\, and live performance is increasingly defined by a diversity of spatial formats and rendering systems\, including Ambisonics and object-based approaches. While these enable complex spatial experiences\, they also result in fragmented workflows and limited interoperability across production and playback contexts. This workshop explores spatial audio as a flexible and transferable practice rather than a system-bound process. It introduces Grapes 3D Audio Control as a system-independent control approach that enables users to work with spatial audio across different environments without being tied to a specific rendering pipeline. Participants will engage in hands-on exercises to create\, control\, and adapt spatial audio scenes across multiple contexts\, including XR applications\, media installations\, and live setups. The focus lies on maintaining spatial intent while working across heterogeneous systems and technical conditions. The workshop combines practical exploration with short demonstrations and structured discussion. It explicitly creates space for exchange on different workflows and production strategies\, bringing together perspectives from sound design\, audio engineering\, live operation\, and XR development. By focusing on interoperability\, workflow design\, and real-world application\, the workshop aims to provide participants with practical strategies for working with spatial audio across systems\, while contributing to a broader discussion on how immersive audio production can become more flexible\, portable\, and sustainable.
CATEGORIES:APPLICATION / GAMING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:67cca5215e8f4b38e0b300cb92c2f92e
URL:http://aes2026avarigconference.sched.com/event/67cca5215e8f4b38e0b300cb92c2f92e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) A Compact Inverse Auditory Model for Binaural Signal Reconstruction
DESCRIPTION:Binaural signal synthesis is typically formulated as forward modelling using head-related transfer functions (HRTFs). We explore an inverse auditory modelling perspective in which binaural ear signals are estimated directly from a source signal and its azimuth. We present a lightweight complex-valued neural network that predicts frequency-domain binaural filters from the input source spectrum and azimuthal direction\, which are then applied to synthesize binaural signals. Controlled experiments evaluate how excitation bandwidth and angular sampling density affect reconstruction and generalization. Results show accurate spectral reconstruction and interpolation to unseen source directions even when training uses sparse angular grids\, while bandwidth strongly influences problem conditioning and error behaviour. This work focuses on characterizing compact signal-conditioned inverse models as efficient components for binaural signal generation.
CATEGORIES:BINAURAL
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:e8facc98ec34d193a0830f39f56c65f9
URL:http://aes2026avarigconference.sched.com/event/e8facc98ec34d193a0830f39f56c65f9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) Short-Term VR Sound-Localization Training under Simulated Single-Sided Deafness: Evaluation of an Enhanced HRTF
DESCRIPTION:Single-sided deafness (SSD) reduces access to binaural cues and can make spatial-audio localization difficult in virtual reality (VR). This study investigated short-term localization training under simulated SSD in a VR task using generic\, non-individualized head-related transfer function (HRTF) rendering with head-movement-contingent auditory updating\, and examined whether an enhanced HRTF could improve performance by emphasizing monaurally available spectral cues at the better-hearing ear. The rationale was that\, although directional judgment in normal binaural listening depends strongly on interaural differences\, monaural listening must rely more heavily on direction-dependent spectral characteristics that remain available at the better-hearing ear. Twenty normal-hearing participants performed a 13-source horizontal-plane localization task using a VR headset and headphones under simulated SSD. Participants were assigned to either normal-HRTF training or enhanced-HRTF training (n = 10 each). The experiment comprised pre-test\, three training sessions\, and post-test\, and all participants were tested with both normal and enhanced HRTFs\, yielding four train-test combinations. Performance was evaluated using accuracy (ACC)\, mean absolute error (MAE)\, and response time (RT). Localization performance improved with training under the present VR simulated-SSD condition. ACC increased and MAE decreased from pre-test to post-test\, whereas RT showed no clear change. No significant overall between-group difference in cumulative improvement was observed. However\, during training\, the enhanced-HRTF group showed a significant first-session advantage\, and matched train-test combinations showed descriptively larger gains than mismatched combinations.\nThese results suggest that short-term VR localization training can improve directional judgment under simulated SSD and that enhancing monaural spectral cues may provide an early benefit by making direction-specific patterns easier to associate with source direction. The findings are limited to localization performance in the present VR task under simulated SSD and should not be directly generalized to clinical SSD populations\, real-world auditory rehabilitation\, or broader everyday 3D spatial-audio experience.
CATEGORIES:BINAURAL
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:d83cb74f3641ec1bf142f35df3c7096d
URL:http://aes2026avarigconference.sched.com/event/d83cb74f3641ec1bf142f35df3c7096d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) An evaluation benchmark of artificial intelligence models for estimating head-related transfer functions (HRTFs) from ear shape representations
DESCRIPTION:Head-related transfer functions (HRTFs) are fundamental to spatial audio via binaural rendering. Personalized HRTFs have been shown to improve localization accuracy and reduce perceptual artifacts and directional ambiguities. However\, acquiring such HRTFs is time-consuming and requires costly measurement setups. To address this limitation\, this article investigates the use of deep learning models to estimate personalized HRTFs from ear shape representations. We propose and evaluate three different architectures with various types of input data and identify the minimum achievable spectral distance error when predicting true HRTF magnitude spectra. The best model we evaluated achieves a test Log Spectral Distortion (LSD) of 4.93 dB. We also establish a performance ranking based on input data types and architectural choices.
CATEGORIES:HRTFS
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:3715951db585ae32d3d3608f349ed243
URL:http://aes2026avarigconference.sched.com/event/3715951db585ae32d3d3608f349ed243
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) Investigating the Effect of Sample Rate Variation on the Accuracy of Sound Source Localisation Using a Neural Network
DESCRIPTION:This paper describes an experiment to investigate how the localisation performance of a neural network for Sound Source Localisation named ‘SampleDOA_SR’ would be affected by reducing the sample rate of the audio training data. Reducing the sample rate has several benefits\, most notably a reduction in training time. The goal is to determine an appropriate sample rate which balances both localisation accuracy and training time. This information will be used to inform the future training of a neural network for Sound Source Localisation which will be used in a stereo upmixing pipeline. The results of this experiment indicate that reducing the sample rate from 48 kHz to below 4 kHz results in a significant decrease in localisation accuracy. However\, above 4 kHz\, the decrease in localisation accuracy is minimal whilst training time is reduced significantly. This suggests that\, provided the particular application for the model does not require the highest level of accuracy\, a minimal reduction in localisation performance may be acceptable to obtain a large reduction in training time\, which would also reduce the environmental impact of the model training. A sample rate of 16 kHz is suggested as a suitable balance between accuracy and training time.
CATEGORIES:HRTFS
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:2c400f70df9df794f9144a3419c813ed
URL:http://aes2026avarigconference.sched.com/event/2c400f70df9df794f9144a3419c813ed
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) Optimising HRTFs to Improve Spatial Release from Masking
DESCRIPTION:Binaural hearing supports effective communication in complex acoustic environments by enabling listeners to segregate spatially separated sound sources\, a benefit referred to as spatial release from masking (SRM). The spatial cues that give rise to SRM are determined by the head-related transfer function (HRTF). Although individual HRTFs are generally considered optimal for accurate localisation\, prior work suggests they do not necessarily maximise performance across all aspects of spatial perception\, including SRM. This motivates the concept of application-specific HRTFs. Here\, we propose an application-specific HRTF augmentation method to improve speech intelligibility in cocktail-party scenarios\, focusing on front–back configurations where SRM is limited. HRTFs are parameterised using principal component analysis and optimised via a differentiable auditory-model-based objective to enhance spectral cues while constraining interaural level differences. The method yields model-predicted SRM gains of 4–9 dB without inducing substantial predicted lateralisation artefacts.
CATEGORIES:HRTFS
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:90353f7feb2d6cf44dce21e7d2a20908
URL:http://aes2026avarigconference.sched.com/event/90353f7feb2d6cf44dce21e7d2a20908
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T130000Z
SUMMARY:(P) Which Tracking Characteristics from an Audio Only VR Onboarding Predict the Best Performance HRTFs?
DESCRIPTION:This study is motivated by an ambition to determine the ‘best’-matching HRTFs during an onboarding task for an audio-only virtual reality (VR) experience using a ‘shooting down sound sources’ task. It is further motivated by the needs of blind and visually impaired gamers\, who may rely more crucially on accurate rendering of auditory spatial cues to succeed in the audio-only VR experience. We present an exploratory study applying an experimental VR test platform that renders ‘target’ sound sources in a virtual environment and logs tracking characteristics of head\, hand-held controller and body while participants localise and ‘shoot’ audible ‘targets’ that are visible (for task familiarisation) and invisible. Four game-relevant sound stimuli and three different HRTFs were tested across eight sessions on two separate days. In this study\, we show data collected from fifteen sighted participants\, which demonstrate an ability to localise the sound sources accurately. The tracking data suggest various search patterns (e.g. hemisphere swaps and direction reversals) associated with ‘weak’ localisation cues and possible ambiguities. The search patterns are likely all quantifiable via angular error\, response time\, path length\, search directions\, number of reversals\, and search speed as determined from the tracking characteristics.
CATEGORIES:HRTFS
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:9ceab297617bab3699d90c9c226d3285
URL:http://aes2026avarigconference.sched.com/event/9ceab297617bab3699d90c9c226d3285
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T123000Z
SUMMARY:The SONICOM Ecosystem
DESCRIPTION:The SONICOM Ecosystem is a repository dedicated to spatial hearing and binaural audio. It provides means to store data as databases and tools (including their metadata)\, to create relations between them\, and to enable specific data visualization tailored to the needs of the auditory community. It also enables persistent publications via digital object identifiers (DOIs) and supports the authors along their typical process of publishing scientific articles. In this workshop\, we will guide the participants through the key features of the SONICOM Ecosystem and show how the Ecosystem can support researchers during their publication workflow.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:8155c1b826a28e1a3bd8b1c064a37b1b
URL:http://aes2026avarigconference.sched.com/event/8155c1b826a28e1a3bd8b1c064a37b1b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T113000Z
DTEND:20260702T140000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors
CATEGORIES:SPONSOR DEMOS
LOCATION:IRCAM:Studio 5\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:31efe2fbd039c4b7da82f0ee2b5452ed
URL:http://aes2026avarigconference.sched.com/event/31efe2fbd039c4b7da82f0ee2b5452ed
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T123000Z
DTEND:20260702T133000Z
SUMMARY:The Role of Source Directivity in Spatial Audio Rendering for AR/VR/XR Environments
DESCRIPTION:Source directivity constitutes a fundamental acoustic property of musical instruments\, describing the variation of radiated sound pressure as a function of direction. This behavior is dependent on the geometry\, material properties\, and excitation mechanisms of the instrument\, and plays an important role in spatial sound perception. In the real world\, the directional characteristics of a source contribute significantly to how sound is localized\, how timbre is perceived across different listening positions\, how sound is captured with different microphone techniques and placements\, and how sound interacts with the surrounding environment. Yet\, despite its importance\, source directivity is often simplified or neglected in contemporary spatial audio rendering approaches\, particularly within AR/VR/XR applications where computational constraints and system complexity frequently dictate design choices. Directivity describes the angular dependence of radiated sound pressure and constitutes a defining acoustic signature of each instrument. Acoustic directivity measurements are based on demanding and carefully controlled procedures. Typically\, they are conducted in anechoic or low-reverberation environments using dense microphone arrays\, and rely on excitation mechanisms\, in order to improve measurement accuracy and repeatability. It should be acknowledged\, however\, that there exists a gap between acoustic research and its practical integration into immersive media technologies. Many current XR applications rely on simplified or generic source models\, prioritizing computational efficiency and ease of implementation over acoustic accuracy. While there is a clear benefit to the use of simplified directivity approaches\, such practices reduce the perceptual realism and fidelity of the reproduced sound field. This raises critical questions: To what extent does accurate directivity contribute to perceptual realism? Are approximations sufficient\, and under what conditions do they compromise the experience? This workshop addresses these questions by exploring both the scientific foundations and practical implications of incorporating source directivity into AR/VR/XR systems. It is structured in three parts\, offering theoretical information and practical perspectives on the role of sound source directivity in immersive audio applications. The first part discusses source directivity and its importance in sound emission\, perception\, and spatial realism. Emphasis will be placed on recent research involving the capture and analysis of directivity patterns of the human singing voice across different music genres and traditional Greek musical instruments. Two directivity databases dedicated to this research\, which are publicly available through the SONICOM Ecosystem repository (https://ecosystem.sonicom.eu/)\, will also be presented\, along with an overview of their structure\, content\, and potential applications. The second part focuses on the integration of directivity data into spatial audio rendering pipelines for AR/VR/XR environments. Participants will be introduced to the latest updates of the SOFA (Spatially Oriented Format for Acoustics) conventions specifically created for storing and exchanging directivity information. In addition\, the Binaural Rendering Toolbox (BRT)\, developed within the SONICOM project\, will be presented as a practical tool that facilitates the implementation of directivity-aware rendering workflows. The third part concerns a critical discussion on the practical implications of using accurate or approximated directivity data in immersive audio applications. Drawing on results from selected case studies\, the session will evaluate the perceptual and computational trade-offs involved\, offering guidance on when high-precision data is necessary and when simplified models may suffice in AR/VR/XR applications.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:1ddc370fbc55bdacf29d8dc1768c3e58
URL:http://aes2026avarigconference.sched.com/event/1ddc370fbc55bdacf29d8dc1768c3e58
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T123000Z
DTEND:20260702T130000Z
SUMMARY:Needlets for Spatial Audio: Objective Evaluations
DESCRIPTION:This study introduces needlets\, a specific class of spherical wavelets\, for spatial audio applications. Needlets are constructed in the spherical harmonic domain\, are mathematically well defined\, possess good localisation properties\, and facilitate multiresolution analysis. However\, because they form a tight frame\, they are redundant and therefore require sparsification for practical applications. We propose a comprehensive spatial audio framework based on needlets\, spanning encoding through to head-tracking-enabled binaural rendering. In this framework\, a sound scene is encoded into a redundant needlet dictionary\, which is subsequently sparsified using a novel algorithm. The resulting sparse representation is then decoded for headphone reproduction. Scene rotation is achieved by applying SO(3) rotation matrices to the sparse representation. The perceptual implications of the framework’s design parameters were evaluated using objective metrics and compared with those of Ambisonics. Initial results show that the proposed framework can achieve better tonal and spatial fidelity than third- and fourth-order Ambisonics Magnitude Least-Squares decoding while using a similar number of channels. Moreover\, the proposed framework has been shown to allow users to tune the reproduced sound scene while maintaining fidelity.
CATEGORIES:SOUNDFIELD
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:2d01ac934c7845e3aa9e52616f13accc
URL:http://aes2026avarigconference.sched.com/event/2d01ac934c7845e3aa9e52616f13accc
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T130000Z
DTEND:20260702T133000Z
SUMMARY:Coffee
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:c0361bdce3b4b3f6396951c389899f5c
URL:http://aes2026avarigconference.sched.com/event/c0361bdce3b4b3f6396951c389899f5c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T130000Z
DTEND:20260702T133000Z
SUMMARY:Neural Regularization for Personal Sound Zones
DESCRIPTION:Pressure-matching (PM) for personal sound zone (PSZ) can achieve high contrast at nominal control points\, but the performance may degrade when transfer functions are mismatched. We introduce a neural method that maps transfer functions to loudspeaker weights using a single-frequency input network with parameters shared across frequencies. We evaluate the robustness under position shifts\, additive transfer-function noise\, and added reflections\, and compare against PM with Tikhonov regularization. Results show improved robustness to structured perturbations such as listener displacement\, whereas regularized PM remains more resilient to unstructured random transfer-function noise and reverberation. We further explain these results using a singular value decomposition based perturbation projection. Finally\, we analyze different regularization mechanisms induced by the network and derive practical guidelines for neural PSZ filter optimization.
CATEGORIES:SOUNDFIELD
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:8e651cd085c39663a8f7bc977f32df9e
URL:http://aes2026avarigconference.sched.com/event/8e651cd085c39663a8f7bc977f32df9e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T133000Z
DTEND:20260702T140000Z
SUMMARY:Sub-band Neural Interpolation of Binaural Room Impulse Responses for Personal Sound Zone Control
DESCRIPTION:To enable dynamic control in transaural personal sound zone (PSZ) systems\, accurate binaural room impulse responses (BRIRs) at various listener positions are needed. Since it is impractical to measure BRIRs at all possible positions\, interpolation from a sparse set of measured positions can be used. Although numerous BRIR interpolation methods exist\, their effectiveness in sound field control applications remains unclear. In this paper\, we propose a sub-band interpolation method that combines linear interpolation for frequencies lower than 2000 Hz with sinusoidal representation networks for frequencies higher than 2000 Hz. The interpolated BRIRs are then applied in a PSZ control system. Simulation results demonstrate that this hybrid approach significantly improves system performance at a wider frequency range.
CATEGORIES:SOUNDFIELD
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:613173d6dfbdfebe9b0f10afdf65ff09
URL:http://aes2026avarigconference.sched.com/event/613173d6dfbdfebe9b0f10afdf65ff09
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T140000Z
DTEND:20260702T150000Z
SUMMARY:Self-Transit to Cité des Sciences: Planetarium
DESCRIPTION:Attendees should make their way to the planetarium. Late arrivals will not gain entrance.
CATEGORIES:DEMO
LOCATION:Off-site\, via Metro
SEQUENCE:0
UID:0a91ed19b8d1f440981e291c8e5fbc67
URL:http://aes2026avarigconference.sched.com/event/0a91ed19b8d1f440981e291c8e5fbc67
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260702T150000Z
DTEND:20260702T160000Z
SUMMARY:Vaulted Harmonies 360° @ Cité des Sciences et de l'Industrie
DESCRIPTION:Special premiere projection of Vaulted Harmonies 360° for AVARIG attendees at the planetarium\, Science and Industry Museum. \n Address : 30\, avenue Corentin-Cariou Paris 19e. \n AVARIG provided ticket required for entry.
CATEGORIES:DEMO
LOCATION:Off-site\, via Metro
SEQUENCE:0
UID:e8dd1418e520c4159c5db463e47a8aa5
URL:http://aes2026avarigconference.sched.com/event/e8dd1418e520c4159c5db463e47a8aa5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T073000Z
DTEND:20260703T080000Z
SUMMARY:Coffee
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:6110abf3246093f76afe1dc61f58f0f3
URL:http://aes2026avarigconference.sched.com/event/6110abf3246093f76afe1dc61f58f0f3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T080000Z
DTEND:20260703T090000Z
SUMMARY:Vaulted Harmonies : Archaeoconcert at Notre-Dame film projection
DESCRIPTION:The Vaulted Harmonies project reconstructs the acoustic and musical heritage of Notre-Dame de Paris through immersive audio-visual experiences spanning multiple centuries of the cathedral's architectural evolution\, developed as part of the Past Has Ears at Notre-Dame (PHEND) research project. Building upon the dome screening presented earlier at AVARIG 2026\, this presentation focuses on the spatial audio production workflow underpinning the project and the perceptual trade-offs involved in adapting it across dissemination formats. Starting from room impulse responses derived from geometrical acoustic simulations and convolved with anechoic multichannel musical recordings using RoomZ\, a dynamic room impulse response panner developed as part of the project\, higher-order ambisonic renderings were generated through continuous interpolation along cinematically designed camera trajectories. The 360° dome version\, premiered at the Planetarium of the Cité des Sciences et de l'Industrie\, decoded these renderings to a 5.1+1 loudspeaker layout. This presentation complements that screening by examining the underlying HOA production chain and presenting selected scenes in third-order ambisonic reproduction\, enabling direct perceptual comparison between the two output formats\, though in a conventional frontal video projection context. Topics discussed include the design of spatially coherent auralisation trajectories\, maintaining a coherent audio-visual narrative across successive historical reconstructions of the cathedral\, and the trade-offs related to output format and deployment context\, situated within a broader workflow designed to support wide public dissemination of immersive heritage experiences.
CATEGORIES:IMMERSIVE AUDIO
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:565b826474c822770faab286f44fe6dd
URL:http://aes2026avarigconference.sched.com/event/565b826474c822770faab286f44fe6dd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T080000Z
DTEND:20260703T103000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments of our sponsors
CATEGORIES:SPONSOR DEMOS
LOCATION:IRCAM:Studio 5\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:55029a721a1e14e1a17b0ae509efeb13
URL:http://aes2026avarigconference.sched.com/event/55029a721a1e14e1a17b0ae509efeb13
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T083000Z
DTEND:20260703T090000Z
SUMMARY:Perceptual Modeling of Binaural vs. Stereo Music Mixes: A Pairwise Differential Approach with Dimension-wise Attention
DESCRIPTION:Evaluating binaural rendering against stereo mixes is frequently confounded by "content bias\," where listeners' inherent musical preferences obscure spatial quality assessments. To address this\, we propose an interpretable predictive model utilizing a pairwise differential approach (Delta Strategy) and a dimension-wise attention neural network. The model achieves a competitive sign accuracy of 68.4%\, outperforming traditional baselines. Crucially\, the attention mechanism provides retrospective interpretability\, revealing fundamental acoustic trade-offs in spatial upmixing: aggressive decorrelation for image widening compromises localization precision and timbral fullness\, whereas successful externalization heavily depends on mid-side energy redistribution. This framework offers a robust evaluation tool for spatial algorithms and actionable psychoacoustic guidance for immersive audio production.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:83090d66b5f039cf5c58bf1dbf818916
URL:http://aes2026avarigconference.sched.com/event/83090d66b5f039cf5c58bf1dbf818916
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T090000Z
DTEND:20260703T103000Z
SUMMARY:(P) Convolving the Convoluted: Acoustigrammetry for Immersive Virtual Reality
DESCRIPTION:Numerous approaches have been taken to address the problem of generating navigable virtual models for multi-volume acoustic spaces. The general practice for creating empirically informed interactive models of multi-volume acoustic spaces\, as embodied by the Spatially Oriented Format for Acoustics\, is to discretely sample emitter-receiver pair positions. For a user to then navigate between these discrete positions involves cross-fading\, blending\, or otherwise perceptually interpolating between corresponding zones. This paper outlines a new approach which instead involves the continuous three-dimensional sampling of acoustic spaces\, much as is done with 3D visual spaces in photogrammetry. To achieve this result\, a first-of-its-kind consolidated ambisonic impulse response capturing apparatus has been designed and built. This apparatus combines a 3rd-order ambisonic microphone array with a 2nd-order ambisonic loudspeaker array and is designed to be moved through a space with maximal ease. AD/DA conversion\, playback\, and recording are all handled on a central compute platform. In parallel\, a software workflow has been developed which can be implemented in Unreal Engine\, as well as other game engines. To solve general issues of spatial audio in game engines\, a custom encoding and decoding framework has been implemented. Then\, to map the continuous ambisonic impulse response onto a virtual space\, a spline mirroring the sampling path is drawn through the space. On the DSP side\, an impulse response is extracted from any arbitrary point along the spline by way of the Common-slope Model for coupled spaces. Future work for better addressing early reflections and minimizing the theoretical intermediary of the Common-slope Model is discussed. Additionally\, a special use case for visualizing acoustic energy in architectural acoustics is explored.
CATEGORIES:AURALIZATION / 6DOF
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:5c90b90492652fd29d1fd7624579a958
URL:http://aes2026avarigconference.sched.com/event/5c90b90492652fd29d1fd7624579a958
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T090000Z
DTEND:20260703T103000Z
SUMMARY:(P) Immersive Drum Circle: A Tool For Performing and Composing Spatial Music
DESCRIPTION:Despite significant advances in the development and adoption of spatial audio\, many musicians do not embed the technology within their creative processes. Instead\, spatial audio technologies are more often used to create immersive adaptations of fundamentally frontal compositions or performances. This paper presents and evaluates a means of spatial music making\, referred to as the immersive drum circle. The system facilitates group performance and composition\, in which participants stand in a circle and perform on electronic percussion pads\, with sound spatialised so that the listener experiences the music as if positioned within the ensemble. The system’s design is presented alongside implementation details\, as well as feedback from musicians obtained as part of an educational workshop which aimed to inform how spatial audio can be used creatively in music. In addition to interacting with the system\, participants auditioned the resulting spatial music across three playback scenarios representing: gaming\, tracked and non-tracked headphone-based music consumption\, and a live concert environment. The results show that the immersive drum circle system is a viable tool for music creation and a practical means of inspiring future compositional techniques.
CATEGORIES:AUTHORING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:1edf8e67b12d553d464ed282510064fa
URL:http://aes2026avarigconference.sched.com/event/1edf8e67b12d553d464ed282510064fa
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T090000Z
DTEND:20260703T103000Z
SUMMARY:(P) Ambimix: a Scene-Based Approach to Interactive Mixing
DESCRIPTION:Channel-based mixing has long been the standard paradigm for audio professionals in both studio and live performance contexts\, owing to its intuitive\, signal-oriented workflow. While this approach excels in conventional stereo and multichannel formats\, it offers limited native support for advanced spatial applications. As immersive audio formats become increasingly popular for virtual and augmented reality\, new mechanisms are needed to allow engineers to work directly within spatial domains rather than adapting channel-based setups to fit immersive content. This paper presents an interactive software system for mixing First-Order Ambisonic (FOA) soundfields in real time. The system accepts A-format recordings from tetrahedral microphones\, including the Sennheiser Ambeo and the Core Sound TetraMic. It converts them to B-format using a directional beamforming approach\, in which the individual dimensions (left\, right\, front\, back\, top\, bottom) are independently crossfaded between sources. Each directional beam is extracted via a dot product with a corresponding spherical harmonic steering vector\, crossfaded between the two input soundfields\, and reencoded into a new B-format using an outer product reconstruction. The program architecture is designed to accommodate future extensions to additional Ambisonic microphone formats\, including the Røde SoundField\, the Zylia ZM-1\, and the Eigenmike. This positions the platform toward a microphone-agnostic encoding and mixing platform. By parameterizing the encoding stage through a gain-factored matrix\, the directional steering vectors can be reconfigured to reflect the capsule geometry of any tetrahedral target microphone array. Future iterations of the program can extend the beamforming framework beyond the first-order WXYZ dimensions\, enabling the integration of higher-order ambisonic encoding to improve spatial resolution and directional accuracy.\nThe primary contribution of this program is a streamlined interface for ambisonic processing\, designed to make scene-based mixing accessible to engineers and sound designers working in immersive audio production.
CATEGORIES:HOA
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:94ebf49dcb3c280a4bdfe7030e9baec5
URL:http://aes2026avarigconference.sched.com/event/94ebf49dcb3c280a4bdfe7030e9baec5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T090000Z
DTEND:20260703T103000Z
SUMMARY:(P) Perceptual Evaluation of Higher-Order Ambisonic Codecs on Both Synthetic Mixing and Native Recordings
DESCRIPTION:Spatial audio is spreading in applications such as virtual and augmented reality and immersive games. The higher-order ambisonic (HOA) format is particularly useful in this context. Transmitting spatial information requires multiple channels\, e.g.\, 16 channels for third-order ambisonics\, resulting in increased memory requirements for storage and higher bitrates for communication. Therefore\, efficient compression algorithms are necessary for those contents. The recently standardized IVAS codec allows the coding of HOA content for communication use-cases. Here\, we propose to evaluate it in comparison with a basic multi-mono approach across a variety of contents and spatialization methods. Results show that IVAS outperforms the multi-mono approach at the same bitrate. In particular\, this codec exploits inter-channel correlation to reduce the bitrate. We point out that it is therefore especially robust for signals with a high interchannel correlation\, such as those composed of a limited number of plane waves. Conversely\, the multi-mono approach is unable to exploit this correlation and performs poorly on this type of signal.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:5096ce4804633cfc43bc419b03dcbf8a
URL:http://aes2026avarigconference.sched.com/event/5096ce4804633cfc43bc419b03dcbf8a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T090000Z
DTEND:20260703T093000Z
SUMMARY:The Impact of User Expertise on Immersion and Usability in an Interactive VR Music Experience
DESCRIPTION:Designing interactive music systems in Virtual Reality (VR) requires balancing intuitive entry points with expressive depth\, yet it remains unclear how domain-specific knowledge (Music Expertise) and medium-specific experience (VR Familiarity) distinctly shape the user experience within these environments. This paper investigates how user expertise impacts engagement with an interactive VR music experience. We conducted a mixed-methods study with 32 participants\, categorized by these two factors\, to systematically evaluate their influence on perceived usability\, immersion\, and interaction behavior. Results indicate that Music Expertise significantly enhanced perceived usability\, whereas VR Familiarity had no significant effect. Perceived immersion was reported as universally high across all groups\, regardless of background. Behavioral data revealed distinct engagement patterns: Experts and VR-familiar users focused more on 6DoF spatial mixing controls\, while novices required significantly more time and physical exploration. These findings suggest that for creative VR tools\, domain knowledge is a stronger predictor of usability than technical fluency. We discuss the success of a ‘Low Floor\, High Ceiling\, and Wide Walls’ design and propose critical design implications for onboarding\, interaction metaphors\, and aligning user intent in embodied music systems.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:895969c71fd0a7794100ee2e95d45a7d
URL:http://aes2026avarigconference.sched.com/event/895969c71fd0a7794100ee2e95d45a7d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T093000Z
DTEND:20260703T100000Z
SUMMARY:The Influence of Listener's Background on Virtual Source Detection in a 6DoF Spatial Audio Task
DESCRIPTION:The perceptual evaluation of spatial and immersive audio systems commonly relies on listening tests\, where the role of listener-related factors is often treated as secondary. While previous studies have shown that listener expertise can influence performance in virtual audio tasks\, this has not been systematically investigated in more complex mixed real–virtual and dynamic listening scenarios. This study examines the role of listener background in a six-degrees-of-freedom (6DoF) spatial detection task involving virtual and real sound sources. Eighteen participants identified the presence of a virtual speech source among concurrent targets and distractors while freely navigating a loudspeaker-based scene. Listener background was characterised by years of musical training and self-reported experience with spatial audio technologies\, used to categorise participants as expert or naïve. Results show above-chance performance\, with reduced accuracy in spatially adjacent conditions. Listeners with greater musical training and spatial audio experience achieved higher percent-correct scores. These findings are consistent with prior work on listener-dependent localisation performance\, and extend them to a 6DoF mixed real–virtual context. The results highlight the importance of explicitly considering and reporting participant expertise in the design\, analysis\, and interpretation of spatial audio perception studies.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:e44a67640d3e07cd5a7a68a17ddc3e98
URL:http://aes2026avarigconference.sched.com/event/e44a67640d3e07cd5a7a68a17ddc3e98
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T100000Z
DTEND:20260703T103000Z
SUMMARY:Choir Performance in Virtual Versus Real Rooms: The Influence of Acoustic Modality on Singers’ Performance and Perception
DESCRIPTION:Several studies suggest that singers adapt their vocal production to room acoustics\, and virtual reality (VR) has increasingly been used to investigate such interactions under controlled conditions. However\, questions remain regarding the ecological validity of virtual acoustic environments for studying musicians’ behavior. While prior research has primarily focused on solo singers\, the present study explores the impact of acoustic modality (real vs. virtual) on choral performance. A professional four-singer ensemble performed five different choral pieces across five acoustic conditions. Recordings were conducted both in situ\, within different spaces of a church\, and under corresponding virtual acoustic simulations using auralization techniques. Acoustic and physiological data were collected using close microphones and electroglottography\, while subjective perceptions were assessed through questionnaires. Comparative analyses between real and virtual conditions aim to examine how acoustic modality (real or virtual) influences singers’ musical and physiological adaptations\, as well as their subjective perceptions.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:79d4b0c894216cd0f1bd76506262cb05
URL:http://aes2026avarigconference.sched.com/event/79d4b0c894216cd0f1bd76506262cb05
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T103000Z
DTEND:20260703T113000Z
SUMMARY:Lunch
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:064e7fde2ecb20f284bd44e159795f70
URL:http://aes2026avarigconference.sched.com/event/064e7fde2ecb20f284bd44e159795f70
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T123000Z
SUMMARY:Streaming for mixed reality venues
DESCRIPTION:A demonstration of a newly developed network spatial audio engine and its client software to show how an object-based audio performance can be presented simultaneously locally and in a virtual venue. I’ll play multiple tracks of audio and position data from a laptop\, as a surrogate for a local performance\, and stream this object-based audio into a 6 DoF virtual audio space. Then show how a remote audience can join the same audio space via web browsers\, listen to the music and explore the space. And finally I’ll show streaming back a resolved ambisonic mix of the original performance and the sounds of the remote audience into the auditorium and play it out. I'll walk it through and show how all audio and data transfer is done with simple data structures and standard\, non-proprietary streaming protocols.
CATEGORIES:APPLICATION / GAMING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:db7f8fd432711f9d98dc710ad37711b5
URL:http://aes2026avarigconference.sched.com/event/db7f8fd432711f9d98dc710ad37711b5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T120000Z
SUMMARY:On the influence of headphone cup acoustics on individual pinna cues
DESCRIPTION:In head-related transfer functions (HRTFs)\, spectral cues due to the individual pinna geometry are known to contribute to elevation perception and externalization. The pinna component of an HRTF is referred to as a pinna-related transfer function (PRTF). Some headphone concepts aim to excite individual PRTF cues by placing the headphone transducer away from the traditional position on the interaural axis\, e.g. tilted in front of the pinna. However\, it is not clear to what extent the individual PRTF is preserved when the pinna is placed inside a headphone cup enclosed by a baffle and a cushion. In this study\, multiple prototype setups successively approximating a headphone cup and allowing for variable transducer positions are analyzed using a set of silicone pinna replicas. PRTF perturbations are analyzed in near field measurements and the impact of headphone cup acoustics is discussed. Based on the observation that the perturbations are systematic\, an equalization scheme restoring the free field PRTF based on the median of measurements with several pinnae is proposed.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:35cafc8292166f46d5b40bfdfa66aa57
URL:http://aes2026avarigconference.sched.com/event/35cafc8292166f46d5b40bfdfa66aa57
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T140000Z
SUMMARY:(P) Acoustic and Perceptual Evaluation of Integrated Near-Ear Speakers vs. Over Head Headphones in VR Environments
DESCRIPTION:Virtual reality (VR) technologies have become increasingly widespread\, extending beyond their traditional military and professional training applications to areas such as education\, simulation\, gaming\, and entertainment. Most modern VR headsets are equipped with built-in near-ear speakers\, commonly called nearphones. Sitting between conventional headphones and loudspeakers\, nearphones offer a convenient and lightweight audio solution without physically enclosing the ear. However\, their impact on spatial audio perception and localization performance remains underexplored. This study explores how nearphones integrated into head-mounted displays (HMDs) perform relative to traditional headphones\, focusing on identifying the specific acoustic and perceptual factors that enhance or hinder immersive audio experiences in virtual reality. Using the Oculus Quest 3 as a test platform\, the research was divided into two parts. First\, the frequency response of both headphones and earphones was measured to assess differences in sound quality. Second\, a VR first-person shooter game was developed in Unreal Engine to evaluate sound localization. Participants identified targets based on audio cues alone\, and performance metrics such as target accuracy and reaction times were collected to compare localization effectiveness. Besides localization accuracy\, this research explored which audio devices users prefer. The results suggest that while traditional headphones generally offer more accurate spatial localization\, nearphones provide greater comfort and convenience\, highlighting a trade-off between acoustic precision and user ergonomics in VR applications.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:71b9bbe5d2c01a64c4242168576fc81a
URL:http://aes2026avarigconference.sched.com/event/71b9bbe5d2c01a64c4242168576fc81a
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T140000Z
SUMMARY:(P) Increasing Accessibility of Auditory Research: A 6-DoF Motion-Capture-Based Interface for Localisation Testing
DESCRIPTION:Perceptual evaluation of auditory localisation typically relies on graphical user interfaces\, pointing devices\, or touch screens to capture listener responses. These modalities implicitly require functional vision and/or manual dexterity\, excluding participation of\, for instance\, people with visual impairments. This paper presents a solution for absolute sound-source localisation testing that uses head rotation\, tracked by a six-degrees-of-freedom (6-DoF) optical motion-capture system as the response interface and relies solely on auditory cues for calibration and pointing. The paradigm builds on the natural coupling between auditory spatial attention and head orientation. Individual systematic bias is characterised via a mandatory training block in which stimuli are presented at discrete loudspeaker positions. A per-participant linear regression fitted to head-centred training responses provides a bias model that is applied to main-experiment trials\, enabling decomposition of localisation error (LE) into constant error (CE\, reflecting accuracy) and random error (RE\, reflecting precision)\, following the established accuracy–precision framework for spatial hearing assessment. The specific use case simulates off-sweet-spot listening positions to inform development of a renderer aimed at enhancing the experience of visually impaired audiences consuming audio-described broadcast content. Preliminary data from the control group consisting of sighted participants are presented. The interface design\, calibration procedure\, and analysis pipeline are offered as a contribution towards inclusive spatial audio evaluation practice.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:4b35230e32fa8bfbd7a25470c255fd35
URL:http://aes2026avarigconference.sched.com/event/4b35230e32fa8bfbd7a25470c255fd35
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T140000Z
SUMMARY:(P) The impact of audio spatialisation reproduction on the neurophysiological responses of music listeners
DESCRIPTION:Research into listeners’ emotional experience of different audio formats heavily relies on subjective\, self-report measures. However\, little is known about neural and physiological responses. As such\, this feasibility study utilised electroencephalography (EEG)\, Heart Rate (HR) and Galvanic Skin Response (GSR)\, to explore the objective neurophysiological impacts of mono\, stereo and spatial audio formats\, across different music genres. In a within-subjects design\, participants listened to 27 randomised stimuli\, each comprising a 30-second music excerpt across the three audio formats. Results were not significant but trends did arise in the data. While mono formats were shown to elevate cognitive load and arousal\, spatial audio elicited a decrease in physiological arousal\, promoting a more relaxed state. However\, the effects overall were very genre-dependent. Differences in physiological response between static and dynamic spatial reproduction of different music genres are discussed. While limited by the lack of subjective validation and sample size\, this study highlights interesting relationships between audio format and the physiological responses of music listeners.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:14a73580e9170977e52db439a311a3d8
URL:http://aes2026avarigconference.sched.com/event/14a73580e9170977e52db439a311a3d8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T140000Z
SUMMARY:(P) Virtualising SPHERE: active listening in 3D sound localisation
DESCRIPTION:Spatial hearing emerges from the integration of auditory\, multisensory\, and motor information\, and is enhanced in natural conditions through active listening\, where head and body movements provide dynamic cues that improve localisation accuracy and perceptual stability. This principle is central to immersive audio research in Virtual and Augmented Reality (VR/AR)\, where binaural rendering based on Head-Related Transfer Functions (HRTFs) and room acoustic cues enables the reproduction of interaural\, monaural\, and distance information. Beyond acoustics\, bodily engagement (i.e.\, reaching toward sound sources) further supports spatial adaptation. These technologies enable controlled experimental protocols for assessment and training in both normal-hearing and hearing-impaired populations. One such paradigm is SPHERE\, originally developed to study three-dimensional sound localisation in ecologically valid conditions and later applied to training and rehabilitation\, including for cochlear implant users. In its original implementation\, participants localise sounds presented via a physically moved loudspeaker and respond either through active exploration or under static listening constraints\, while head\, eye\, and hand movements are tracked to analyse localisation accuracy\, motor behaviour\, and search strategies. However\, reliance on a human operator limits reproducibility and scalability. This work introduces a fully virtualised SPHERE implementation using an immersive binaural rendering framework\, preserving the original spatial configuration while enabling real-time multimodal tracking. The system also evaluates the impact of HRTF individualisation by comparing generic and personalised filters. Performance is validated against the original loudspeaker-based paradigm to assess ecological validity. Preliminary results regarding the system’s effectiveness as a research and clinical tool will be presented at the conference.
CATEGORIES:PERCEPTION
LOCATION:IRCAM:Gallery\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:ecc625096a1eb94759ca04b761dd3712
URL:http://aes2026avarigconference.sched.com/event/ecc625096a1eb94759ca04b761dd3712
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T113000Z
DTEND:20260703T140000Z
SUMMARY:Sponsor demos
DESCRIPTION:Come see the newest developments from our sponsors.
CATEGORIES:SPONSOR DEMOS
LOCATION:IRCAM:Studio 5\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:6b43d776daf4fd9c205da3e4fcaeb380
URL:http://aes2026avarigconference.sched.com/event/6b43d776daf4fd9c205da3e4fcaeb380
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T120000Z
DTEND:20260703T123000Z
SUMMARY:Personalized Head-Related Transfer Function Modeling Using a Neural Operator
DESCRIPTION:Virtual\, augmented\, and mixed reality experiences are becoming more commonplace as consumer-grade devices proliferate. Head-Related Transfer Functions (HRTFs) are used to create realistic spatial audio in virtual and augmented environments. Mathematically\, HRTFs represent solutions to acoustic boundary-value scattering problems governed by the Helmholtz equation. Neural operators are neural networks designed to learn the solutions of partial differential equations (PDEs). The present work proposes an operator-learning framework based on the Deep Operator Network (DeepONet) for individualized HRTF prediction. By implementing a non-uniform sampling strategy for 3-D head meshes and data compression along the frequency axis\, the framework achieves high-fidelity predictions while reducing data dimensionality. Our method shows low log-spectral distortion\, generalizes to unseen spatial grids\, and infers an entire head’s HRTF field in ~0.3 seconds. Objective evaluations demonstrate the framework's effectiveness in personalization and spatial interpolation. Furthermore\, robust performance on unseen subjects and coordinates highlights the model's generalization capability\, offering a computationally efficient alternative for HRTF personalization.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:e59f6273ab95636dea241f5317cb8241
URL:http://aes2026avarigconference.sched.com/event/e59f6273ab95636dea241f5317cb8241
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T123000Z
DTEND:20260703T130000Z
SUMMARY:An investigation of AI integration in sound designer workflows and experiences.
DESCRIPTION:Artificial intelligence is increasingly being integrated into professional audio production workflows\, yet a gap persists between the tools developers produce and the requirements of practising sound designers. This paper investigates this gap through a mixed-methods study comprising a survey of 76 practitioners and follow-up semi-structured interviews with 20 industry professionals. Results were analysed using descriptive statistical analysis and thematic analysis to identify patterns across both datasets. Five themes emerged from our analysis: Context\, Workflow\, Potential\, Risks\, and Right Use. Our work indicates that current AI tools perform adequately in fast-consumption media contexts but lack the narrative sophistication required for high-end sound design (films\, immersive experiences\, etc.). Practitioners demonstrate a preference for assistive\, task-specific applications\, particularly in audio restoration and library management\, over end-to-end generative systems. This work contributes to the ongoing discussion on the use of AI and AI-enhanced tools in the creative industries. We report on the current status of the field from the point of view of sound designers and creative audio practitioners\, and offer a set of recommendations for sound technologists and developers based on our findings to guide the development of more informed AI tools for sound design.
CATEGORIES:AUTHORING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:4a2c56b26d7639baea8b5c0b3d828f9b
URL:http://aes2026avarigconference.sched.com/event/4a2c56b26d7639baea8b5c0b3d828f9b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T123000Z
DTEND:20260703T130000Z
SUMMARY:The Influence of Binauralizer and HRTF Preprocessing on Objective Loudness in Ambisonics
DESCRIPTION:Accurate loudness estimation is essential for audio production\, quality control\, and loudness compliance\, but no established recommendation exists for binaural playback over headphones. This paper investigates the influence of binauralizers and HRTF processing on objective loudness estimation for binauralized Ambisonics content. Two experiments were conducted using 163 Ambisonics clips binauralized with two open-source renderers and three HRTF sets under three HRTF preprocessing conditions. Objective loudness metrics were compared against ground-truth loudness data derived from 7.1+4 loudspeaker feeds according to ITU-R BS.1770. Results reveal small to moderate differences in Integrated Loudness and larger differences in the True Peak values between the evaluated binauralizers\, and that diffuse-field equalization can effectively eliminate loudness and True Peak differences across binauralizers and across sets of HRTFs. The findings can help to better predict and ensure loudness compliance in binauralized audio consumption in XR and gaming\, especially when importing third-party HRTFs is supported.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:9cea94c92452de174875cec99870b020
URL:http://aes2026avarigconference.sched.com/event/9cea94c92452de174875cec99870b020
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T130000Z
DTEND:20260703T133000Z
SUMMARY:Audio Formgiving: Sound Zones as Spatial Structures in Mixed Reality
DESCRIPTION:Spatial audio in extended reality (XR) has traditionally been framed as a localization tool\, guiding users toward discrete virtual objects or events. This paper reframes this object-centered paradigm by presenting audio formgiving\, an approach in which sound defines continuous zones demarcated by boundaries that users encounter through embodied movement. We present a mixed-reality study that investigates how participants perceive\, reconstruct\, and navigate such sound zones. We report our findings on reconstruction accuracy and boundary ambiguities across different sound zone shapes and sizes\, and how movement trajectories relate to zone recognition\, as well as participants’ strategies for navigating and identifying different types of sound zones.
CATEGORIES:AUTHORING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:bff34ef4f543c5daeff3e75f89b64ffe
URL:http://aes2026avarigconference.sched.com/event/bff34ef4f543c5daeff3e75f89b64ffe
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T130000Z
DTEND:20260703T133000Z
SUMMARY:Direction-Dependent Ear Canal Transmission at High Frequencies: A Multi-Subject Study using 3D-Printed Replicas
DESCRIPTION:Head-Related Transfer Functions (HRTFs) are commonly measured at the blocked ear canal entrance\, assuming that the ear canal transfer function is direction-independent. While this assumption holds well at low and mid frequencies\, its validity at high frequencies has been questioned. A recent pilot study on a single pair of 3D-printed ear replicas found evidence of directional effects above 9 kHz\, but was limited in scope. This study extends that work using 3D-printed ear replicas of ten subjects from the IHA database\, mounted on a dummy head. Ear canal transfer functions were measured across a full spherical grid of 1944 incidence angles. Results reveal significant directional variability above 6–7 kHz\, with standard deviations of 6–8 dB at resonant frequencies. High measurement repeatability confirms these are genuine directional effects rather than measurement artifacts. The directional behavior is consistently observed across all subjects and appears linked to the second and higher ear canal resonances. These findings suggest that current state-of-the-art blocked-canal HRTF measurements may omit spatially relevant spectral information above 7 kHz.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:5be3e17134e11cbec395c3d9896941c2
URL:http://aes2026avarigconference.sched.com/event/5be3e17134e11cbec395c3d9896941c2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T133000Z
DTEND:20260703T140000Z
SUMMARY:An Open-Source Auracast Platform for Selective Listening in Assistive Hearing Applications
DESCRIPTION:Modern auditory rehabilitation faces significant challenges in speech discrimination within complex\, noisy acoustic environments. The use of Augmented Reality interfaces based on "virtual sound objects" proposes the separation and selective enhancement of audio sources\, while the Auracast standard (Bluetooth LE Audio) emerges as the ideal mechanism to distribute these independent streams with low latency. However\, the advancement of such selective listening strategies is strictly limited by proprietary commercial ecosystems and a complete lack of open-source research platforms that adhere to the physical and power constraints of wearable devices. To bridge this gap\, this work develops an open-source Auracast application on the Tiresias open-hardware platform\, establishing an accessible "front-end" infrastructure for auditory interaction. The architecture was implemented on the Nordic nRF5340 SoC utilizing the Zephyr RTOS. Preliminary evaluations on a development kit successfully validated the protocol stack integration\, demonstrating stream stability. Ongoing work focuses on porting the firmware to the Tiresias board and integrating the ADAU1787 audio codec\, aiming to empirically quantify the end-to-end latency and energy efficiency of the embedded system.
CATEGORIES:APPLICATION / GAMING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:36115e513b14400a00119fba7c50616c
URL:http://aes2026avarigconference.sched.com/event/36115e513b14400a00119fba7c50616c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T133000Z
DTEND:20260703T140000Z
SUMMARY:A survey of HRTF dataset use in academia and industry reveals no de facto standard
DESCRIPTION:Head-related transfer functions (HRTFs) are crucial for plausible binaural audio playback for virtual\, augmented\, and mixed-reality applications. In such applications\, humans show higher sound-localisation accuracy\, higher perceived externalisation\, and experience less colouration when using their individual HRTFs compared to non-individual HRTFs. Because high-quality individual HRTFs require cumbersome measurements in specialised facilities\, applications often use non-individual or dummy-head HRTFs as a practical alternative. Humans are able to adapt to non-individual HRTFs\, which leads to a localisation performance comparable to that achieved with individual HRTFs. Therefore\, adaptation to non-individual HRTFs could be a practical alternative whenever individual HRTFs are unavailable\; however\, this would only be possible if the same non-individual standard HRTF were used across different applications. To find out if this is the case\, we conducted a survey on HRTF usage among 76 professionals working in the field of spatial audio. The findings suggest that there is currently no de facto standard HRTF. Surprisingly\, only half of those with access to individual HRTFs are actually using them\, and most would be willing to switch to a default HRTF set if one were established.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:fbd5180c3d509f493cfcef1903716f45
URL:http://aes2026avarigconference.sched.com/event/fbd5180c3d509f493cfcef1903716f45
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T140000Z
DTEND:20260703T143000Z
SUMMARY:VR Unseen: An Audio-Haptic Data Framework for Accessible Virtual Storytelling for visually impaired audiences
DESCRIPTION:While Virtual Reality offers transformative potential for immersive storytelling\, the heavy reliance on visual stimuli often excludes Blind and Visually Impaired audiences. Conventional accessibility methods\, such as linear Audio Description\, frequently struggle to keep pace with the non-linear\, explorative nature of virtual environments\, resulting in an "accessibility chasm" where traditional two-dimensional solutions fail to support non-visual navigation. This research addresses these limitations through a User-Centred Design approach\, centred on the thematic analysis of semi-structured focus groups involving twelve experienced Blind and Visually Impaired videogame players from the Royal National Institute of Blind People. The inquiry explored four themes: spatial sound navigation\, audio description integration\, haptic efficacy\, and the social dimensions of virtual interfaces. Findings indicate that non-visual spatial exploration requires a multifaceted auditory system utilizing 3D sound\, predictable sound effects\, and abstract sound signifiers\, paired with a hybrid audio description model balancing functional and affective narration. To mitigate the risk of cognitive overload\, participants identified haptic feedback as a critical tool for tactile confirmation and attentional guidance\, serving as a non-auditory anchor that complements the primary soundscape. These user-led insights and real-life examples from accessible video games inform the development of the ‘Description Spheres’: interactive virtual objects embedded within virtual environments that serve as multi-sensory hubs. By integrating spatialized audio\, localized haptics\, and experimental audio description\, the system enables a transition to a dynamic\, exploratory model that translates complex visual-spatial data into intuitive\, non-visual sensory ecosystems\, offering a scalable blueprint for inclusive design.
CATEGORIES:APPLICATION / GAMING
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:4405e35364006565422be74223924e91
URL:http://aes2026avarigconference.sched.com/event/4405e35364006565422be74223924e91
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T140000Z
DTEND:20260703T143000Z
SUMMARY:Evaluation of Head-Related Transfer Functions Across Five Levels of Individualisation in Virtual Reality
DESCRIPTION:Head-related transfer functions (HRTFs) underpin spatial hearing in virtual and augmented reality systems. Whilst individual HRTFs capture listener-specific morphology\, their practical limitations have led to widespread use of generic HRTFs and growing interest in synthetic approaches. Yet their relative perceptual impact has rarely been compared within a single study. In this study\, twenty listeners completed two virtual reality sound localisation experiments with complementary subsets of interleaved HRTF conditions\, enabling within-subject comparison of five conditions: individually measured\, KEMAR\, randomly selected non-individual measured\, high-resolution scan-based synthetic and photogrammetry-based synthetic HRTFs. Test–retest stability of the individually measured baseline across sessions supported pooling across experiments and attributing differences to perceptual rather than session effects. Across HRTF conditions\, lateral localisation metrics were largely insensitive to HRTF type\, whereas polar-domain metrics and confusion rates showed strong HRTF dependence. Random HRTFs outperformed KEMAR on several polar metrics. High-resolution synthetic HRTFs matched individual measured performance\, whilst photogrammetry-based synthetic HRTFs\, alongside KEMAR\, showed the greatest degradation. These findings clarify practical choices for non-individual baselines and highlight the importance of mesh resolution when using numerical synthesis for elevation-dependent localisation tasks.
CATEGORIES:HRTFS
LOCATION:IRCAM:Stravinsky\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:91f7c55228c9a64c4ef778adb92d03d9
URL:http://aes2026avarigconference.sched.com/event/91f7c55228c9a64c4ef778adb92d03d9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260616T195836Z
DTSTART:20260703T143000Z
DTEND:20260703T150000Z
SUMMARY:Closing ceremony
DESCRIPTION:
CATEGORIES:SOCIAL EVENT
LOCATION:IRCAM:ESPRO (HOA)\, 1\, place Igor Stravinsky Paris 4e
SEQUENCE:0
UID:c8d5c821f23f58206408df442495462e
URL:http://aes2026avarigconference.sched.com/event/c8d5c821f23f58206408df442495462e
END:VEVENT
END:VCALENDAR