AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

10:00am CEST

Designing Spatial Audio Environments for Autonomic Regulation: Toward a Framework of Social Sonic Design

Wednesday July 1, 2026 10:00am - 10:30am CEST

Research in spatial audio has traditionally focused on localization accuracy, spatial realism, and rendering algorithms. Comparatively little work has examined how intentionally designed spatial audio environments may influence listener physiological regulation and emotional perception. This paper introduces the concept of Social Sonic Design, a framework that examines how spatially organized sonic information within and external to XR contexts may affect autonomic nervous system response and autobiographical memory. Spatial audio cues such as proximity, elevation, and diffuse reverberation influence listener perception of environmental stability and safety. Building on these perceptual principles, the present study investigates whether object-based audio environments incorporating temporally structured and personally meaningful sound materials can influence listener physiological state. Spatial audio infrastructures were constructed from lullabies, caregiving vocalizations, and environmental sonic textures. Postpartum mothers were selected as an initial participant group because caregiving sound environments and lullaby traditions play a central role in maternal–infant interaction and emotional regulation. Immersive sonic infrastructures were produced using spatial audio capture and design techniques including Dolby Atmos multichannel rendering (7.1.2) and binaural headphone reproduction. Sound sources were modified to activate three targeted neuro-cognitive nodes and spatially distributed across the listening environment to create immersive auditory scenes incorporating foreground vocal sources, diffuse environmental textures, and spatialized reverberant fields. Participants experienced these environments in brief episodic listening sessions accompanied by visual media. 
The episodic presentation structure draws inspiration from media models such as those developed by Miguel Sabido, in which repeated exposure to idealized sensory environments may influence perception and behavioral response over time. Physiological responses were monitored using measures associated with autonomic nervous system activity, including heart rate and heart rate variability (HRV), alongside reports of perceived calm, emotional response, and autobiographical memory recall. Preliminary observations suggest that the sonic infrastructure of immersive lullaby environments evokes caregiving memories and perceived emotional grounding among participants, indicating that spatially designed sonic environments may contribute to changes in listener perception and physiological regulation.

Speakers

Carolyn Malachi

Wednesday July 1, 2026 10:00am - 10:30am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

10:30am CEST

Sound as Cultural Memory: Participatory Immersive Audio Production for the Witness Blanket VR Environment

Wednesday July 1, 2026 10:30am - 11:00am CEST

Jussieu:Conf 2 (Binaural)

This case study examines the Witness Blanket VR Experience to explore how Indigenous‑led immersive audio production can support the safeguarding of intangible cultural heritage in virtual environments. Grounded in Indigenous epistemologies of listening, the study draws on participatory sound collection, documentation of the audio production workflow, and subjective evaluation through community‑engaged events. Results demonstrate how spatial audio and culturally grounded production protocols can enable relational storytelling, ethical engagement, and protocol‑informed VR design.

Speakers

Kirk McNally

Carey Newman

Wednesday July 1, 2026 10:30am - 11:00am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

11:00am CEST

Mémoire Vive: Exploring the use of authentic rendering as a narrative process in an AR fiction

Wednesday July 1, 2026 11:00am - 11:30am CEST

Jussieu:Conf 2 (Binaural)

This paper presents an exploratory mixed reality prototype investigating how low-cost spatial audio and XR technologies may already enable partially believable augmented reality experiences. Rather than pursuing maximal realism across all modalities, the project relied on selective auditory realism, intentionally degraded sounds and visuals, and progressive perceptual trust building in order to create plausible auditory events within a mixed reality environment. Participants were introduced to a fictional experimental protocol that progressively constructed increasingly dense auditory reconstructions around them. At the end of the experience, all virtual reconstructions abruptly disappeared, leaving participants alone in the now silent physical room, revealing the extent to which virtual events had progressively contaminated their perception of reality. Qualitative observations suggest that coherent multimodal staging and expectation shaping play an important role in perceptual acceptance alongside rendering realism itself. Beyond the presented prototype, the project highlights how current XR and spatial audio tools already enable new forms of immersive narrative experiences based on persistent ambiguity between reality and virtual reconstruction.

Speakers

Raphaël Revault

David Poirier-Quinot

Wednesday July 1, 2026 11:00am - 11:30am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

11:30am CEST

The Binaural Rendering Toolbox (BRT)

Wednesday July 1, 2026 11:30am - 12:30pm CEST

Jussieu:Conf 2 (Binaural)

This workshop introduces the Binaural Rendering Toolbox (BRT), a set of open-source (GPLv3) software libraries, applications, and definitions intended as a virtual laboratory for spatial psychoacoustic experimentation. The BRT provides a flexible and modular framework for binaural spatialisation, supporting multiple rendering models, including convolution-based and geometric approaches, as well as advanced features such as source directivity, several room acoustics models, individual HRTFs, BRIRs, near-field simulation, and real-time control via OSC.

Speakers

Arcadio Reyes-Lecuona

Daniel Gonzalez-Toledo

Maria Cuevas-Rodriguez

Katarina Poole

Lorenzo Picinali

Wednesday July 1, 2026 11:30am - 12:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Workshop

12:30pm CEST

Extending Audio Accessibility Toward Structured and Immersive Auditory Interaction in Gameplay

Wednesday July 1, 2026 12:30pm - 1:00pm CEST

Jussieu:Conf 2 (Binaural)

This work investigates how audio accessibility in games can be reconceptualized as a structured and immersive auditory interaction paradigm, rather than a collection of discrete assistive cues. While existing approaches have improved access to gameplay information for visually impaired players, they remain largely event-driven and fragmented, often presenting auditory signals as isolated notifications. Such approaches may limit perceptual continuity and fail to reflect the dynamic, layered nature of interactive environments. The proposed system introduces an auditory information architecture that organizes gameplay information into four continuous layers: navigation, interaction, salience, and environment. Each layer represents a distinct yet interrelated perceptual function. Navigation encodes spatial orientation and movement, interaction conveys player actions and system responses, environment reflects ambient and contextual information, and salience integrates perceptually relevant events—including hazards, state transitions, and attention-driven signals—into a unified and context-sensitive layer. By structuring auditory output across these layers, gameplay information is represented as continuously evolving auditory processes rather than discrete cues. The system is implemented using Unreal Engine and Audiokinetic Wwise, with Max/MSP and RNBO used to extend real-time audio processing. Rather than prioritizing novelty through fully generative synthesis, the approach focuses on transforming and reorganizing existing sound materials through continuous parameter mapping. This enables adaptive auditory behavior while maintaining perceptual clarity and consistency with the game’s sonic identity. Interaction design is extended through a multi-source input framework. 
A camera-based input layer combines webcam-based motion capture with analysis of screen-mediated interactions, including controller inputs (e.g., mouse, gamepad, touch) and player-character movement within the game environment. These inputs are translated into perceptual features and mapped to auditory parameters, forming a bidirectional interaction loop in which player behavior directly influences auditory output. A user study is planned to evaluate the effectiveness of the proposed system in non-visual navigation tasks. The study will compare the layered auditory architecture with conventional cue-based approaches. Evaluation metrics will include objective measures such as navigation accuracy and task completion time, as well as subjective measures including perceived spatial awareness, perceptual continuity, and immersion. In addition, the study will examine the impact of camera-based interaction on engagement and perceived agency. The evaluation is designed to investigate whether continuous auditory representation improves coherence between auditory feedback and gameplay experience. A central contribution of this work lies in the aesthetic integration of accessibility. Rather than functioning as an external assistive layer, accessibility-oriented audio is embedded within the core sound design. Informational signals emerge through transformations of existing sound materials, allowing perceptual clarity to be achieved without disrupting immersion. This reframes audio accessibility as an integral component of auditory interaction design. From a practical perspective, the system is structured as a modular and parameter-driven framework, allowing scalable implementation across different platforms. Potential constraints related to computational load—particularly in real-time processing and camera-based input—are considered, with an emphasis on efficient parameter mapping and system optimization for resource-limited environments such as mobile and virtual reality.

Speakers

Jiwon Kwak

Wednesday July 1, 2026 12:30pm - 1:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Workshop

1:00pm CEST

Effect of the template on the predicted sagittal-plane sound-localization performance

Wednesday July 1, 2026 1:00pm - 1:30pm CEST

Jussieu:Conf 2 (Binaural)

In state-of-the-art models of sagittal-plane sound localization, template head-related transfer functions (HRTFs) are used to reflect the listener's internal calibration of auditory space decoding, and thus determine the prediction quality. The effect of the template HRTFs has not yet been investigated directly. Here, a model was calibrated separately to two HRTF measurements of the same listeners and its predictions were compared to behavioral localization responses of these listeners obtained in three listening conditions: two acoustically measured HRTF sets (those used for model calibration), and an additional condition (unseen during the calibration) used to test the model's ability to generalize. We analyzed the quadrant error rates (QE) and local polar errors (PE) from eight listeners. The predicted errors were similar in both calibration conditions and increased in the unseen condition. The quality of the predictions, however, varied significantly with the template, more for PE than for QE, slightly preferring one template over the other when predicting the unseen condition. Our findings suggest that small differences in HRTFs used for the template may influence the prediction quality, especially when applied to unseen listening conditions.

Speakers

Felix Perfler

Piotr Majdak

Wednesday July 1, 2026 1:00pm - 1:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Binaural, Lecture

1:30pm CEST

Eliciting the Effectiveness of Binaural Renderer Enhancement on the Horizontal Plane

Wednesday July 1, 2026 1:30pm - 2:00pm CEST

Jussieu:Conf 2 (Binaural)

Binaural rendering is central to spatial audio reproduction via headphones and wearable devices, yet systematic evaluation of enhancement techniques remains methodologically inconsistent across the literature. This paper presents and applies a subjective evaluation methodology designed to consistently elicit five perceptual attributes of headphone-based spatial audio: externalization, elevation fidelity, front/back distinction, room tone, and spectral coloration. The methodology combines absolute judgment and relative comparison protocols, taking advantage of their complementary capabilities to capture both the absolute perceptual quality of individual rendering conditions, and the salience of perceived differences between them. It is applied in a controlled experiment comparing conventional HRTF-based binaural rendering against two enhancement variants that each superpose a masked spatially diffuse sound field component into the binaural output. Stimuli were spatialized across eleven azimuthal positions in the horizontal plane using a generic dummy-head dataset, and presented over closed-back headphones to participants. The results validate the proposed methodology as a tool for revealing perceptually relevant differences among binaural rendering conditions. Relative comparison tests reveal additional performance difference details between rendering methods. Both enhancement methods significantly improve externalization and front/back distinction relative to unenhanced binaural rendering, with the largest gains at lateral azimuths, and without a statistically significant increase in perceived spectral coloration beyond the baseline effect of HRTF filtering. However, the results do not indicate conclusively whether the binaural rendering methods examined here exhibit or mitigate a "spurious elevation" artifact associated with frontal sound presentation.

Speakers

Garrett Treanor

Shaunak Ranade

Jean-Marc Jot

Daphna Harel

Agnieszka Roginska

Wednesday July 1, 2026 1:30pm - 2:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Binaural, Lecture

2:00pm CEST

How representative are Dummy head HRTFs? A subjective comparison of mannequin and human datasets

Wednesday July 1, 2026 2:00pm - 2:30pm CEST

Jussieu:Conf 2 (Binaural)

Head-related transfer functions (HRTFs) are central to convincing binaural rendering in virtual and augmented reality applications. While individual HRTFs offer the highest perceptual fidelity, the practical difficulty of personal HRTF acquisition drives widespread use of dummy head (mannequin) measurements as a non-individualized substitute. Despite their ubiquity, systematic perceptual benchmarking of dummy head HRTFs against human HRTFs remains limited, particularly with respect to whether consistent trends emerge across listeners irrespective of individual HRTF preference. This study extends prior work on subjective HRTF evaluation methodologies and perceptual ranking by applying an established trajectory-based quality assessment paradigm to a mixed set of dummy head and human HRTFs. Participants were presented with predefined auditory trajectories rendered via binaural synthesis and asked to rate the perceptual quality of each rendering with respect to adherence to the prescribed trajectory. HRTFs were presented in randomised order across two sets of eight, with repeated items serving as inter-set normalisation anchors. The HRTF pool encompassed human measurements alongside a range of dummy head types: simplified head-only geometries, head-and-torso simulators (HATS), and models incorporating absorptive materials (hair, clothing analogues). The primary research question is whether, despite well-documented listener-dependent variability in HRTF suitability, population-level trends differentiate dummy head HRTFs from human ones, and further, whether acoustic complexity of the mannequin (torso, absorptive surfaces) correlates with perceptual performance. Results are discussed in terms of implications for HRTF database design and substitute HRTF selection strategies for immersive audio applications.

Speakers

Brian F.G. Katz

Samuel D. Bellows

Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

HRTFs, Lecture

2:30pm CEST

Understanding Ambisonics Through Practical Decisions and Listening

Wednesday July 1, 2026 2:30pm - 3:30pm CEST

Jussieu:Conf 2 (Binaural)

Ambisonics is a scene-based spatial audio format that has been around since the 1970s. In recent years its popularity has increased, with inclusion in game engines (such as Unity and Unreal) and distribution standards (like ADM or IAMF). Despite this, many practitioners view Ambisonics as being complex, mathematical, or academic. This tutorial explains Ambisonics through the lens of practical decision-making. Instead of equations, it covers the choices audio professionals are required to make when working on a project, with a particular emphasis on the audible consequences of those choices. The tutorial enables attendees to develop an intuitive understanding of Ambisonics through explanations of theory, combined with listening examples and workflow demonstrations. The topics covered in this tutorial are:

• Fundamentals: What Ambisonics is and how it differs from channel-based and object-based formats, and why it is well suited to VR, AR, and game audio.
• Encoding: How audio sources are converted to Ambisonics, and how to choose the Ambisonic order based on perceptual and computational trade-offs, as well as delivery constraints.
• Conventions: Common channel ordering (ACN vs FuMa) and gain normalisation (SN3D vs N3D) conventions, and what happens when things get mismatched.
• Processing: What kinds of effects can be used on Ambisonic signals while preserving the spatial integrity.
• Decoding and binaural rendering: How Ambisonic signals are converted to loudspeaker or binaural signals. The impact of head-tracking and HRTF selection on the binaural rendering.
• Mixed-order projects: What the options are when working with mixed-order sources, and the audible artefacts that can arise.

The tutorial will provide brief practical demonstrations of setting up an Ambisonics project in Pro Tools and Reaper, two widely used DAWs for immersive audio.
By the end of the tutorial, attendees will have a practical understanding of the main concepts of Ambisonics and will know how the practical choices they make affect the final audio. They will also be familiar with the main workflow pitfalls and how to avoid them. The tutorial assumes familiarity with general audio production concepts (DAW use, signal routing, mixing). However, no prior experience with Ambisonics or spatial audio formats is required. It is suitable for sound designers, composers, and audio engineers working in or interested in immersive media.

Speakers

Peter Stitt

Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

HOA, Tutorial

3:30pm CEST

Rendering 6DoF Audio in Augmented Reality: Perceptual Evaluation of Game Engines, Plugins, and Middleware

Wednesday July 1, 2026 3:30pm - 4:00pm CEST

Jussieu:Conf 2 (Binaural)

Contemporary game engines such as Unreal Engine and Unity are widely used for Extended Reality (XR), yet their native audio pipelines often rely on simplified spatialization with limited acoustic control. In Augmented Reality (AR), virtual sound sources must integrate coherently with the physical environment to maintain perceptual plausibility, making accurate Six-Degrees-of-Freedom (6DoF) rendering critical. This study carried out a perceptual evaluation of multiple 6DoF audio rendering approaches, including Audiokinetic Wwise (Reflect and RoomVerb), Steam Audio, Meta XR Audio SDK, a dense 6DoF Room Impulse Response (RIR) interpolation method, and APLVirtuoso XR. A measured physical room was reconstructed in Unreal Engine 5, and the rendering pipelines were calibrated by matching the reverberation time (RT60) and direct-to-reverberant ratio (DRR) to the measured room within their respective just noticeable difference (JND) thresholds. The results showed significant differences in perceived spatial and timbral fidelity as well as plausibility and overall listening experience across the tested renderers. Despite the high accuracy achieved by the 6DoF interpolation method, an algorithmic renderer demonstrated comparable or superior performance. However, some other algorithmic renderers exhibited tradeoffs in terms of computational overhead and acoustic modelling accuracy. Our findings indicate that an optimised system prioritising a plausible auditory representation, rather than strict physical replication, may be sufficient and, in some cases, can yield superior perceptual outcomes.

Speakers

Kush Munjal

Hyunkook Lee

Dale Johnson

Wednesday July 1, 2026 3:30pm - 4:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

4:00pm CEST

Audiovisual Coherence Thresholds for Direct-to-Reverberant Energy Ratio

Wednesday July 1, 2026 4:00pm - 4:30pm CEST

Jussieu:Conf 2 (Binaural)

Holographic calling systems aim to create the perceptual illusion that a remote interlocutor is physically present by combining augmented reality (AR) visualizations with spatial audio rendering. A key challenge in such systems is achieving audiovisual coherence when room-acoustic properties must be inferred, since this can lead to greater inaccuracies than when they are estimated from measurements. In this study, we investigate perceptual tolerance to mismatches in the direct-to-reverberant energy ratio (DRR), a critical cue for auditory distance perception. We conducted a listening experiment in which participants judged whether the audio and the visual presentation of a stimulus were coherent using a yes/no task. Stimuli included both a volumetric capture of a speaking human avatar, rendered at the correct direct sound level, and a loudspeaker reproducing wideband noise. For the loudspeaker, level roving was introduced to assess the influence of intensity cues on listener decisions. Results show that audiovisual coherence is maintained for half of the presentations within a DRR mismatch range of approximately 3 dB for too dry and 4.3 dB for too reverberant renderings. Within the limited number of participants included in the study, no evidence for significant differences was found between speaking avatars and loudspeakers reproducing noise, nor between different ranges of level roving in the loudspeaker condition. Nevertheless, the findings help to understand the consequences of DRR estimation mismatches for holographic AR calling experiences.

Speakers

Peter Lindahl

Johannes M. Arend

Nils Meyer-Kahlen

Wednesday July 1, 2026 4:00pm - 4:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

4:30pm CEST

Assessing the Plausibility of Measurement-Based Auralization of Sound Transmission through Walls

Wednesday July 1, 2026 4:30pm - 5:00pm CEST

Jussieu:Conf 2 (Binaural)

In building acoustics, modeling and auralizing sound transmission through walls is a relevant area of research related to residential well-being and workplace productivity. While most existing studies use auralization methods that focus on accurately reproducing spectrum and amplitude of the sound transmission in accordance with ISO standards, almost no studies have explicitly tried to evaluate the perceptual quality of wall transmission auralization. To address this, the present study applies the plausibility paradigm to evaluate a measurement-based auralization approach to validate it for further psychoacoustic experiments on residential well-being. Binaural Room Impulse Responses (BRIRs) were measured for three loudspeakers placed in adjacent rooms to a central listening room. In a subsequent listening test, participants were asked to evaluate the overall plausibility of the auralization. Results demonstrate that due to the absence of visual cues, the lower sound pressure level, the reduced signal-to-noise ratio, and the diffuse radiation characteristic of the source, high plausibility scores close to the guessing rate were achieved for all adjacent rooms. These findings suggest that auralization of wall transmission using BRIRs can be used as a plausible method for psychoacoustic research on residential well-being, without the need for complex physical simulations.

Speakers

Hendrik Himmelein

Florian Köber

Christoph Pörschmann

Wednesday July 1, 2026 4:30pm - 5:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

5:00pm CEST

The quest for decorrelation continues -- from stereo to immersive

Wednesday July 1, 2026 5:00pm - 6:00pm CEST

Jussieu:Conf 2 (Binaural)

Using pitch, delay and modulation effects to perceptually spread a mono source across additional, adjacent playback channels is a staple of music production, born in stereo and extended to immersive. This tutorial begins with the historical origins of the effect in the mid-1970s, showing the signal processing chains used, measuring its impact on the signal, discussing its psychoacoustic merit, and demonstrating the resulting sound. The evolution from stereo through surround sound to immersive formats using contemporary production tools and techniques is demonstrated. The effect is still evolving as new tools are developed and creators explore what is possible across all immersive artforms – music, AR/VR, and games. It is hoped that a deep dive into the first 50 years of this effect will inform and inspire immersive mixers for the future.

Speakers

Alex Case

Wednesday July 1, 2026 5:00pm - 6:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Tutorial
