AES 2026 AVARIG Conference

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

12:00pm CEST

Eclipsa Audio: Bringing Immersive Audio to Everyone

Tuesday June 30, 2026 12:00pm - 1:00pm CEST

Jussieu:Conf 2 (Binaural)

Eclipsa Audio, based on the Immersive Audio Model and Format (IAMF) specification developed by members of the Alliance for Open Media, represents an open and royalty-free approach to immersive audio creation and delivery. Eclipsa Audio provides a growing ecosystem for producing and distributing spatial audio content, with hardware integration and streaming platform support, including YouTube, actively being rolled out. This panel brings together practitioners, researchers, and engineers directly involved in the development of IAMF and Eclipsa Audio to inform the audio engineering community about the current state of the format and its evolving toolkit for immersive audio production and delivery. The presenters will discuss how the Eclipsa Audio ecosystem can continue growing in the live and interactive realms, including 360 videos, streaming, gaming, and the combination of the latter two, e.g. in e-sports. Future directions for development will also include developers' perspectives on how Eclipsa Audio can be embraced by interactive environments.

Speakers

Katarzyna Sochaczewska

Gavin Kearney

Rémi Audfray

Jan Skoglund

Nils Peters

Tuesday June 30, 2026 12:00pm - 1:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Workshop

1:00pm CEST

Field-of-View Informed Binaural Signal Matching for Head-Worn Arrays

Tuesday June 30, 2026 1:00pm - 1:30pm CEST

Jussieu:Conf 2 (Binaural)

Capturing acoustic scenes with head-worn microphone arrays is challenging due to a limited number of sensors and constrained placement flexibility. Nevertheless, binaural reproduction based on these arrays has been recently proposed using binaural signal matching (BSM), showing high robustness and computational efficiency, but inferior performance compared to the more computationally complex signal-dependent methods, in particular at low reverberation conditions. To address this gap, this paper investigates the Field-of-View informed Binaural Signal Matching (FoVi-BSM) method for far-field sources. FoVi-BSM incorporates a diagonal spatial weight matrix directly into the error formulation, redefining the filter solution to prioritize spectral and spatial fidelity within a predefined FoV, showing performance comparable to signal-dependent methods but with the same computational complexity as BSM. The performance of the method is evaluated through objective single-source anechoic simulations, multi-source reverberant Monte Carlo simulations, and a subjective MUSHRA listening test. Results demonstrate that prioritizing the FoV improves rendering accuracy within the targeted region over the standard baseline BSM method, achieving perceptual quality comparable to signal-dependent parametric methods while maintaining baseline-level performance outside the FoV.
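
As a rough illustration of what placing a diagonal spatial weight matrix inside the error formulation can look like, the following sketch writes a weighted, regularized least-squares problem for one ear and one frequency bin. The notation is an assumption made for this note and is not taken from the paper: V is an M×Q matrix of array transfer functions for M microphones and Q directions, h is the Q×1 vector of target HRTFs, W is a non-negative diagonal Q×Q weight matrix that is larger inside the field of view, λ is a regularization constant, and c is the M×1 filter. Conjugation conventions differ between publications.

```latex
\begin{aligned}
J(\mathbf{c}) &= \left(\mathbf{V}^{H}\mathbf{c}-\mathbf{h}\right)^{H}\mathbf{W}\left(\mathbf{V}^{H}\mathbf{c}-\mathbf{h}\right) + \lambda\,\mathbf{c}^{H}\mathbf{c},\\
\mathbf{c}_{\mathrm{opt}} &= \left(\mathbf{V}\mathbf{W}\mathbf{V}^{H} + \lambda\mathbf{I}\right)^{-1}\mathbf{V}\mathbf{W}\,\mathbf{h}.
\end{aligned}
```

With W equal to the identity this reduces to an unweighted regularized least-squares fit. In both cases the matrix to invert is M×M, which is in line with the abstract's statement that the computational complexity matches that of BSM.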

Speakers

Or Berebi

Zamir Ben-Hur

David Lou Alon

Boaz Rafaely

Tuesday June 30, 2026 1:00pm - 1:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

MicArray / HeadWorn, Lecture

1:30pm CEST

Interpolation of Sparsely Sampled Array Transfer Functions for Head-Worn Microphone Arrays

Tuesday June 30, 2026 1:30pm - 2:00pm CEST

Jussieu:Conf 2 (Binaural)

Integrating microphone arrays into head-worn devices, such as augmented reality (AR) and virtual reality (VR) headsets, as well as hearing aids, has become increasingly popular for capturing and reproducing acoustic scenes. A common requirement in many such systems is a dense set of array transfer functions (ATFs). However, dense ATFs are cumbersome to measure, and practical setups commonly yield sparse grids rather than the uniform dense sampling often required. This motivates the use of interpolation to reconstruct dense ATF sets from sparse measurements. This paper evaluates spherical harmonics and natural-neighbor interpolation, each combined with onset-based time-alignment and post-interpolation magnitude correction, for a head-worn array across sampling densities. To examine how interpolation errors propagate to binaural rendering, the interpolated ATFs are substituted into two recent filter design methods: signal-independent binaural signal matching (BSM) and a signal-dependent method combining COMPASS parametric spatial coding with BSM (COM). Results show that BSM remains largely robust to interpolation errors, while COM substantially degrades under sparse sampling conditions with errors comparable to BSM at the lowest density, but achieves considerably lower errors than BSM as the sampling grid density increases. This is because BSM averages errors across all steering directions, while COM relies on individual steering vectors for source-directed beamforming.

Speakers

Or Berebi

Konstantin Fontaine

Boaz Rafaely

Johannes M. Arend

Tuesday June 30, 2026 1:30pm - 2:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

MicArray / HeadWorn, Lecture

2:00pm CEST

A Baffleless Equatorial Ambisonic Microphone Array of Arbitrary Order

Tuesday June 30, 2026 2:00pm - 2:30pm CEST

Jussieu:Conf 2 (Binaural)

We propose a baffleless circular array of radially outward facing cardioid microphones that produces standard ambisonic signals. The array produces an $N$th-order ambisonic signal from $2N+1$ microphones. It can be seen as a baffleless variant of the previously proposed equatorial microphone array, which uses a rigid spherical baffle. The simplicity of the microphone arrangement comes at the price of not being able to extract certain spatial information from the captured sound field. The ambisonic output signal represents a horizontal projection of the captured sound field. We demonstrate that interaural elevation cues are maintained when binaural rendering is performed despite the horizontal projection.
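
The count of $2N+1$ microphones can be motivated with a small sketch: a function on the circle that contains angular orders up to $N$ has $2N+1$ complex circular-harmonic coefficients, and a discrete Fourier transform over $2N+1$ equally spaced samples recovers them without aliasing. The code below shows only this sampling argument. It is not the authors' method, and it omits the cardioid directivity and any radial filtering; the function names and the test field are hypothetical.

```python
import cmath
import math


def circular_harmonic_coeffs(samples):
    """Estimate coefficients for orders -N..N from 2N+1 equiangular samples."""
    count = len(samples)
    if count % 2 == 0:
        raise ValueError("expected an odd number of samples (2N+1)")
    order = (count - 1) // 2
    coeffs = {}
    for m in range(-order, order + 1):
        acc = 0j
        for k, value in enumerate(samples):
            phi = 2.0 * math.pi * k / count  # angle of microphone k
            acc += value * cmath.exp(-1j * m * phi)
        coeffs[m] = acc / count
    return coeffs


def sample_field(count):
    """Hypothetical second-order test field: 1 + 0.5 cos(phi) + 0.25 sin(2 phi)."""
    angles = [2.0 * math.pi * k / count for k in range(count)]
    return [1.0 + 0.5 * math.cos(p) + 0.25 * math.sin(2.0 * p) for p in angles]


def demo(order=2):
    """Sample the test field with 2N+1 points and recover its coefficients."""
    return circular_harmonic_coeffs(sample_field(2 * order + 1))
```

Calling `demo(2)` uses five sample points, the minimum needed for a second-order field under this argument.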

Speakers

Jens Ahrens

Tuesday June 30, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

MicArray / HeadWorn, Lecture

2:30pm CEST

Time, Gender and Auditory Envelopment in Reproduced Sound: A Review

Tuesday June 30, 2026 2:30pm - 3:30pm CEST

Jussieu:Conf 2 (Binaural)

We summarise recent clinical studies on neurosensory differences between humans across age and gender, in particular related to Auditory Envelopment and reproduced sound. In audio engineering, listening quality is generally explained and tested considering just snapshot (frequency domain) metrics: frequency response, sound pressure level, distortion and direction of sound. However, the two elusive dimensions, time and change, play the most important roles, not only in hearing but in all five of our primary senses. Human sensory physiology is summarised, along with recent studies on a particular time-domain dimension, Auditory Envelopment (AEV). Naive and professional subjects of any age and across genders point to AEV as a coherent and universal inducer of emotion in sound reproduction. With the conference held in Paris, attendees are furthermore able to experience AEV at one of the most conducive places for it to emerge naturally: inside the newly renovated Notre Dame cathedral. Using the quadrant model from Nordic universities, we discuss whether audio engineering, like medicine and other “objective” sciences, may have been tending primarily to adult male needs and preferences. Our literature is abundant with investigation of frequency-domain attributes such as power, punch and mobilisation, but ignores the physiological and mental effects of the time-domain modulation that happens on basically any time-scale as we listen. It appears stonemasons centuries ago knew important things about humans and hearing that would have been lost by now, had it not been for their magnificent monuments. Because of those efforts, however, we still have guides to what sound might achieve on new platforms, when we remember to listen slowly.

Speakers

Thomas Lund

Tuesday June 30, 2026 2:30pm - 3:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Perception, Tutorial

3:30pm CEST

The M480: High-Resolution Spatial Audio Capture Using a 7th-Order Ambisonic Microphone

Tuesday June 30, 2026 3:30pm - 4:00pm CEST

Jussieu:Conf 2 (Binaural)

The Brittany-based consortium HDaudio3D (LabSTICC, Brest; Noise Makers, Rennes; and Feichter Audio, Lannion) has been working for five years on the development of a professional 3D audio recording solution: one offering sufficient spatial accuracy and signal-to-noise ratio, compatibility with headphones and speakers, and the ability to manipulate the audio scene in post-production. The solution is now operational and includes a 7th-order ambisonic sensor (with 480 MEMS on the surface of a 16 cm icosahedral tetrahedron), signal processing and a software suite. Pascal Rueff, a sound engineer specialising in binaural sound and François Salmon, research engineer at Noise Makers, members of the HDaudio3D consortium, will present the system and play excerpts from recordings made for various use cases.

Speakers

Pascal Rueff

François Salmon

Tuesday June 30, 2026 3:30pm - 4:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

MicArray / HeadWorn, Workshop

4:00pm CEST

Adapting to Manipulated Acoustic Distance Laws

Tuesday June 30, 2026 4:00pm - 4:30pm CEST

Jussieu:Conf 2 (Binaural)

This study investigates whether humans can adapt to manipulated auditory distance cues in virtual environments. While adaptation to remapped auditory localization cues is well established, it remains unclear whether similar processes apply to distance perception, particularly when natural acoustic cues are systematically modified. Virtual reality (VR) systems often employ non-ecological distance laws to improve the intelligibility of distant sound sources, which can introduce conflicts between auditory and visual information. To examine perceptual adaptation under such conditions, we modified a binaural real-time room acoustic simulation engine with rendering in six-degrees-of-freedom. The manipulation consisted of holding the direct sound level constant across distance while applying a distance-dependent low-pass filter. Over four consecutive days, participants completed a training protocol combining alternating testing phases with gamified training sessions. Results show that participants successfully adapted to the altered distance cues, with most learning occurring within the first two days. Initial exposure to the manipulation severely disrupted distance perception, rendering participants unable to make reliable judgments. However, following training, perceived distance functions approached veridical performance for far distances, exhibiting slopes close to unity. In contrast, judgments at close distances remained highly variable, suggesting that the available spectral cues were insufficient for accurate estimation in this range. These findings demonstrate that auditory distance perception can be recalibrated through short-term perceptual learning, even when initial perceptual mappings are strongly degraded. Adaptation generalizes across contexts within the virtual environment, although limitations persist for near-field perception.
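
The manipulation described in the abstract, a direct sound whose level does not change with distance combined with a distance-dependent low-pass filter, can be sketched as follows. This is a minimal illustration under stated assumptions and not the study's rendering engine: the cutoff mapping in `cutoff_hz`, the one-pole filter, and all parameter values are hypothetical choices made for this example.

```python
import math


def cutoff_hz(distance_m, near_cutoff_hz=16000.0, ref_distance_m=1.0):
    """Hypothetical mapping: cutoff falls inversely with distance beyond the reference."""
    return near_cutoff_hz * ref_distance_m / max(distance_m, ref_distance_m)


def one_pole_lowpass(signal, cutoff, sample_rate=48000.0):
    """First-order low-pass with unity gain at DC."""
    a = math.exp(-2.0 * math.pi * cutoff / sample_rate)
    y = 0.0
    out = []
    for x in signal:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def render_direct_sound(signal, distance_m, sample_rate=48000.0):
    """Apply no distance-dependent gain, only the distance-dependent low-pass."""
    return one_pole_lowpass(signal, cutoff_hz(distance_m), sample_rate)
```

In this sketch a source at 10 m keeps the same low-frequency level as a source at 1 m but loses more high-frequency energy, so spectral content is the only remaining distance cue in the direct sound.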

Speakers

Christian Scheer

Fabian Brinkmann

Lucas Schmitz

Markus von Berg

Stefan Weinzierl

Tuesday June 30, 2026 4:00pm - 4:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

4:30pm CEST

Real-Time Spatial Auralization for Collaborative Architectural Design in Virtual Reality

Tuesday June 30, 2026 4:30pm - 5:00pm CEST

Jussieu:Conf 2 (Binaural)

Acoustic performance remains insufficiently addressed in early-stage architectural design, where visual and spatial considerations typically guide decision-making. The present research examines the integration of real-time spatial auralization into multi-user virtual reality environments to facilitate collaborative evaluation of architectural acoustic performance during design exploration. A multi-user VR framework has been developed that embeds real-time binaural auralization within a collaborative design evaluation context. The framework supports shared inhabitation of virtual models, auditory exploration of spatialized sound sources, and systematic investigation of how spatial configurations and material properties affect perceived acoustic characteristics. Temporal soundscapes and acoustic implications of alternative materials and spatial arrangements can be evaluated through a gesture-driven interface. Spatialized voice communication further supports co-inhabitation of virtual environments and acoustically informed discussion. By embedding real-time auralization within collaborative virtual environments, the framework repositions acoustic performance as an experiential dimension of architectural design, enabling earlier incorporation of auditory considerations into the design process. An exploratory user study indicates that immersive, real-time auralization can integrate acoustic feedback into collaborative architectural design workflows, supporting multisensory evaluation and reducing dependence on late-stage corrective interventions.

Speakers

Achilleas Xydis

Giacomo Montiani

Chao Chia-Hsuan

Fabio Scotto

Tuesday June 30, 2026 4:30pm - 5:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

5:00pm CEST

Comparing Three Movement Simulation Algorithms from Discrete Impulse Responses

Tuesday June 30, 2026 5:00pm - 5:30pm CEST

Jussieu:Conf 2 (Binaural)

Reproducing the acoustic consequences of source or receiver motion from a set of discrete static room impulse responses (RIRs) is a fundamental challenge in spatial audio processing, with direct relevance to the generation of training and evaluation data for machine learning systems operating on reverberant speech and audio. This paper presents a comparative evaluation of three offline algorithms for acoustic motion simulation, two of which are ports of previously published methods and one of which is an independent implementation conceptually related to prior work. All three methods are evaluated against a common reference using two complementary validation approaches: an objective analysis based on interaural time difference (ITD) estimation from synthetic binaural signals, and a perceptual evaluation conducted under the MUSHRA protocol using stimuli drawn from a controlled moving-receiver database. Results indicate that frequency-domain interpolation between neighbouring impulse responses provides the most accurate binaural cue reproduction and the highest perceptual similarity to the reference under spectrally demanding stimuli, while nearest-neighbour switching produces the most pronounced artefacts under broadband excitation. Time-domain crossfading between fully convolved signals yields intermediate performance, achieving parity with frequency-domain interpolation for speech and noise but falling significantly behind for music. The combination of ITD-based objective analysis and MUSHRA perceptual evaluation proved informative in characterising method differences, and the two measures converged on a consistent performance ordering across methods.
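
One of the three approaches named in the abstract, time-domain crossfading between fully convolved signals, can be sketched for a single move between two measurement positions. This is a simplified illustration and not one of the evaluated implementations: it uses a single linear fade over the whole signal, direct-form convolution, and hypothetical function names.

```python
def convolve(signal, ir):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def crossfade_between_positions(signal, ir_a, ir_b):
    """Convolve with both impulse responses, then fade linearly from A to B."""
    wet_a = convolve(signal, ir_a)
    wet_b = convolve(signal, ir_b)
    length = max(len(wet_a), len(wet_b))
    wet_a = wet_a + [0.0] * (length - len(wet_a))  # zero-pad to a common length
    wet_b = wet_b + [0.0] * (length - len(wet_b))
    out = []
    for n in range(length):
        w = n / (length - 1) if length > 1 else 1.0  # fade weight, 0 at start, 1 at end
        out.append((1.0 - w) * wet_a[n] + w * wet_b[n])
    return out
```

Nearest-neighbour switching would correspond to replacing the fade weight with a hard step from 0 to 1, which is one way to see why it tends to produce more audible discontinuities.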

Speakers

Sarabeth S. Mullins

Hermes Sampedro Llopis

Tuesday June 30, 2026 5:00pm - 5:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

5:30pm CEST

Immersive recording with a virtual 3D microphone array using spatial information of virtual sound sources sampled in a target space.

Tuesday June 30, 2026 5:30pm - 6:00pm CEST

Jussieu:Conf 2 (Binaural)

We have proposed a method called V2MA (VSVerb Virtual Microphone Array). This method virtually generates spatial room impulse responses (SRIRs) captured by a conventional microphone array using only a set of four impulse responses (IRs) measured by an A-format microphone in the target space. V2MA is based on the concept of geometrical acoustics, which involves a virtual sound source, also known as a mirror source. After measuring the four IRs using an A-format microphone, we calculate the instantaneous sound intensities in the x, y, and z directions. The “source intensities,” which contain sound source information, are detected from these sound intensities. Then, we estimate the locations, strengths, and phase characteristics of the sound sources. The spatial properties of the obtained virtual sound sources can be considered the fingerprint of a target space's reverberant characteristics. Using the methods of geometrical acoustics, we can update the spatial properties of the virtual sound sources to match a neighboring receiver position and a desired directivity. Lastly, we can obtain the SRIRs at any receiver (microphone) position with any directivity in the target room by translating the spatial information of the updated virtual sound sources into time responses. For immersive recording, large microphone arrays are often used. Using V2MA, we can virtually make an immersive recording using such a virtual microphone array, provided that we measure four IRs using an A-format microphone in the target space. In our previous study, we developed the framework of V2MA. To verify the plausibility of V2MA, this paper compares the responses of virtual (V2MA) and real (conventional) microphone arrays using measurement results collected in a practice hall under various conditions. The results show similar overall characteristics, but also suggest the difficulty of a detailed evaluation. We also introduce practical examples of immersive recording using V2MA.

Speakers

Masataka Nakahara

Toru Kamekawa

Yasuhiko Nagatomo

Akira Omoto

Tuesday June 30, 2026 5:30pm - 6:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Auralization / 6DoF, Lecture

10:00am CEST

Designing Spatial Audio Environments for Autonomic Regulation: Toward a Framework of Social Sonic Design

Wednesday July 1, 2026 10:00am - 10:30am CEST

Jussieu:Conf 2 (Binaural)

Research in spatial audio has traditionally focused on localization accuracy, spatial realism, and rendering algorithms. Comparatively little work has examined how intentionally designed spatial audio environments may influence listener physiological regulation and emotional perception. This paper introduces the concept of Social Sonic Design, a framework that examines how spatially organized sonic information within and external to XR contexts may affect autonomic nervous system response and autobiographical memory. Spatial audio cues such as proximity, elevation, and diffuse reverberation influence listener perception of environmental stability and safety. Building on these perceptual principles, the present study investigates whether object-based audio environments incorporating temporally structured and personally meaningful sound materials can influence listener physiological state. Spatial audio infrastructures were constructed from lullabies, caregiving vocalizations, and environmental sonic textures. Postpartum mothers were selected as an initial participant group because caregiving sound environments and lullaby traditions play a central role in maternal–infant interaction and emotional regulation. Immersive sonic infrastructures were produced using spatial audio capture and design techniques including Dolby Atmos multichannel rendering (7.1.2) and binaural headphone reproduction. Sound sources were modified to activate three targeted neuro-cognitive nodes and spatially distributed across the listening environment to create immersive auditory scenes incorporating foreground vocal sources, diffuse environmental textures, and spatialized reverberant fields. Participants experienced these environments in brief episodic listening sessions accompanied by visual media. 
The episodic presentation structure draws inspiration from media models such as those developed by Miguel Sabido, in which repeated exposure to idealized sensory environments may influence perception and behavioral response over time. Physiological responses were monitored using measures associated with autonomic nervous system activity, including heart rate and heart rate variability (HRV), alongside reports of perceived calm, emotional response, and autobiographical memory recall. Preliminary observations suggest that the sonic infrastructure of immersive lullaby environments evokes caregiving memories and perceived emotional grounding among participants, indicating that spatially designed sonic environments may contribute to changes in listener perception and physiological regulation.

Speakers

Carolyn Malachi

Wednesday July 1, 2026 10:00am - 10:30am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

10:30am CEST

Sound as Cultural Memory: Participatory Immersive Audio Production for the Witness Blanket VR Environment

Wednesday July 1, 2026 10:30am - 11:00am CEST

Jussieu:Conf 2 (Binaural)

This case study examines the Witness Blanket VR Experience to explore how Indigenous‑led immersive audio production can support the safeguarding of intangible cultural heritage in virtual environments. Grounded in Indigenous epistemologies of listening, the study draws on participatory sound collection, documentation of the audio production workflow, and subjective evaluation through community‑engaged events. Results demonstrate how spatial audio and culturally grounded production protocols can enable relational storytelling, ethical engagement, and protocol‑informed VR design.

Speakers

Kirk McNally

Carey Newman

Wednesday July 1, 2026 10:30am - 11:00am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

11:00am CEST

Mémoire Vive: Exploring the use of authentic rendering as a narrative process in an AR fiction

Wednesday July 1, 2026 11:00am - 11:30am CEST

Jussieu:Conf 2 (Binaural)

This paper presents an exploratory mixed reality prototype investigating how low-cost spatial audio and XR technologies may already enable partially believable augmented reality experiences. Rather than pursuing maximal realism across all modalities, the project relied on selective auditory realism, intentionally degraded sounds and visuals, and progressive perceptual trust building in order to create plausible auditory events within a mixed reality environment. Participants were introduced to a fictional experimental protocol progressively constructing increasingly dense auditory reconstruction around them. At the end of the experience, all virtual reconstructions abruptly disappeared, leaving participants alone in the now silent physical room, revealing the extent to which virtual events had progressively contaminated their perception of reality. Qualitative observations suggest that coherent multimodal staging and expectation shaping play an important role in perceptual acceptance alongside rendering realism itself. Beyond the presented prototype, the project highlights how current XR and spatial audio tools already enable new forms of immersive narrative experiences based on persistent ambiguity between reality and virtual reconstruction.

Speakers

Raphaël Revault

David Poirier-Quinot

Wednesday July 1, 2026 11:00am - 11:30am CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Authoring, Lecture

11:30am CEST

The Binaural Rendering Toolbox (BRT)

Wednesday July 1, 2026 11:30am - 12:30pm CEST

Jussieu:Conf 2 (Binaural)

This workshop introduces the Binaural Rendering Toolbox (BRT), a set of open-source (GPLv3) software libraries, applications, and definitions intended as a virtual laboratory for spatial psychoacoustic experimentation. The BRT provides a flexible and modular framework for binaural spatialisation, supporting multiple rendering models, including convolution-based and geometric approaches, as well as advanced features such as source directivity, several room acoustics models, individual HRTFs, BRIRs, near-field simulation, and real-time control via OSC.

Speakers

Arcadio Reyes-Lecuona

Daniel Gonzalez-Toledo

Maria Cuevas-Rodriguez

Katarina Poole

Lorenzo Picinali

Wednesday July 1, 2026 11:30am - 12:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Workshop

12:30pm CEST

Extending Audio Accessibility Toward Structured and Immersive Auditory Interaction in Gameplay

Wednesday July 1, 2026 12:30pm - 1:00pm CEST

Jussieu:Conf 2 (Binaural)

This work investigates how audio accessibility in games can be reconceptualized as a structured and immersive auditory interaction paradigm, rather than a collection of discrete assistive cues. While existing approaches have improved access to gameplay information for visually impaired players, they remain largely event-driven and fragmented, often presenting auditory signals as isolated notifications. Such approaches may limit perceptual continuity and fail to reflect the dynamic, layered nature of interactive environments. The proposed system introduces an auditory information architecture that organizes gameplay information into four continuous layers: navigation, interaction, salience, and environment. Each layer represents a distinct yet interrelated perceptual function. Navigation encodes spatial orientation and movement, interaction conveys player actions and system responses, environment reflects ambient and contextual information, and salience integrates perceptually relevant events—including hazards, state transitions, and attention-driven signals—into a unified and context-sensitive layer. By structuring auditory output across these layers, gameplay information is represented as continuously evolving auditory processes rather than discrete cues. The system is implemented using Unreal Engine and Audiokinetic Wwise, with Max/MSP and RNBO used to extend real-time audio processing. Rather than prioritizing novelty through fully generative synthesis, the approach focuses on transforming and reorganizing existing sound materials through continuous parameter mapping. This enables adaptive auditory behavior while maintaining perceptual clarity and consistency with the game’s sonic identity. Interaction design is extended through a multi-source input framework. 
A camera-based input layer combines webcam-based motion capture with analysis of screen-mediated interactions, including controller inputs (e.g., mouse, gamepad, touch) and player-character movement within the game environment. These inputs are translated into perceptual features and mapped to auditory parameters, forming a bidirectional interaction loop in which player behavior directly influences auditory output. A user study is planned to evaluate the effectiveness of the proposed system in non-visual navigation tasks. The study will compare the layered auditory architecture with conventional cue-based approaches. Evaluation metrics will include objective measures such as navigation accuracy and task completion time, as well as subjective measures including perceived spatial awareness, perceptual continuity, and immersion. In addition, the study will examine the impact of camera-based interaction on engagement and perceived agency. The evaluation is designed to investigate whether continuous auditory representation improves coherence between auditory feedback and gameplay experience. A central contribution of this work lies in the aesthetic integration of accessibility. Rather than functioning as an external assistive layer, accessibility-oriented audio is embedded within the core sound design. Informational signals emerge through transformations of existing sound materials, allowing perceptual clarity to be achieved without disrupting immersion. This reframes audio accessibility as an integral component of auditory interaction design. From a practical perspective, the system is structured as a modular and parameter-driven framework, allowing scalable implementation across different platforms. Potential constraints related to computational load—particularly in real-time processing and camera-based input—are considered, with an emphasis on efficient parameter mapping and system optimization for resource-limited environments such as mobile and virtual reality.

Speakers

Jiwon Kwak

Wednesday July 1, 2026 12:30pm - 1:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Workshop

1:00pm CEST

Effect of the template on the predicted sagittal-plane sound-localization performance

Wednesday July 1, 2026 1:00pm - 1:30pm CEST

Jussieu:Conf 2 (Binaural)

In state-of-the-art models of sagittal-plane sound localization, template head-related transfer functions (HRTFs) are used to reflect the listener's internal calibration of auditory space decoding, and thus determine the prediction quality. The effect of the template HRTFs has not yet been investigated directly. Here, a model was calibrated separately to two HRTF measurements of the same listeners and its predictions were compared to behavioral localization responses of these listeners obtained in three listening conditions: two acoustically measured HRTF sets (those used for model calibration), and an additional condition (unseen during the calibration) used to test the model's ability to generalize. We analyzed the quadrant error rates (QEs) and local polar errors (PEs) from eight listeners. The predicted errors were similar in both calibration conditions and increased in the unseen condition. The quality of the predictions, however, varied significantly with the template, more for PE than for QE, slightly preferring one template over the other when predicting the unseen condition. Our findings suggest that small differences in HRTFs used for the template may influence the prediction quality, especially when applied to unseen listening conditions.

Speakers

Felix Perfler

Piotr Majdak

Wednesday July 1, 2026 1:00pm - 1:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Binaural, Lecture

1:30pm CEST

Eliciting the Effectiveness of Binaural Renderer Enhancement on the Horizontal Plane

Wednesday July 1, 2026 1:30pm - 2:00pm CEST

Jussieu:Conf 2 (Binaural)

Binaural rendering is central to spatial audio reproduction via headphones and wearable devices, yet systematic evaluation of enhancement techniques remains methodologically inconsistent across the literature. This paper presents and applies a subjective evaluation methodology designed to consistently elicit five perceptual attributes of headphone-based spatial audio: externalization, elevation fidelity, front/back distinction, room tone, and spectral coloration. The methodology combines absolute judgment and relative comparison protocols, taking advantage of their complementary capabilities to capture both the absolute perceptual quality of individual rendering conditions, and the salience of perceived differences between them. It is applied in a controlled experiment comparing conventional HRTF-based binaural rendering against two enhancement variants that each superpose a masked spatially diffuse sound field component into the binaural output. Stimuli were spatialized across eleven azimuthal positions in the horizontal plane using a generic dummy-head dataset, and presented over closed-back headphones to participants. The results validate the proposed methodology as a tool for revealing perceptually relevant differences among binaural rendering conditions. Relative comparison tests reveal additional performance difference details between rendering methods. Both enhancement methods significantly improve externalization and front/back distinction relative to unenhanced binaural rendering, with the largest gains at lateral azimuths, and without a statistically significant increase in perceived spectral coloration beyond the baseline effect of HRTF filtering. However, the results do not indicate conclusively whether the binaural rendering methods examined here exhibit or mitigate a "spurious elevation" artifact associated with frontal sound presentation.

Speakers

Garrett Treanor

Shaunak Ranade

Jean-Marc Jot

Daphna Harel

Agnieszka Roginska

Wednesday July 1, 2026 1:30pm - 2:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Binaural, Lecture

2:00pm CEST

How representative are Dummy head HRTFs? A subjective comparison of mannequin and human datasets

Wednesday July 1, 2026 2:00pm - 2:30pm CEST

Jussieu:Conf 2 (Binaural)

Head-related transfer functions (HRTFs) are central to convincing binaural rendering in virtual and augmented reality applications. While individual HRTFs offer the highest perceptual fidelity, the practical difficulty of personal HRTF acquisition drives widespread use of dummy head (mannequin) measurements as a non-individualized substitute. Despite their ubiquity, systematic perceptual benchmarking of dummy head HRTFs against human HRTFs remains limited, particularly with respect to whether consistent trends emerge across listeners irrespective of individual HRTF preference. This study extends prior work on subjective HRTF evaluation methodologies and perceptual ranking by applying an established trajectory-based quality assessment paradigm to a mixed set of dummy head and human HRTFs. Participants were presented with predefined auditory trajectories rendered via binaural synthesis and asked to rate the perceptual quality of each rendering with respect to adherence to the prescribed trajectory. HRTFs were presented in randomised order across two sets of eight, with repeated items serving as inter-set normalisation anchors. The HRTF pool encompassed human measurements alongside a range of dummy head types: simplified head-only geometries, head-and-torso simulators (HATS), and models incorporating absorptive materials (hair, clothing analogues). The primary research question is whether, despite well-documented listener-dependent variability in HRTF suitability, population-level trends differentiate dummy head HRTFs from human ones, and further, whether acoustic complexity of the mannequin (torso, absorptive surfaces) correlates with perceptual performance. Results are discussed in terms of implications for HRTF database design and substitute HRTF selection strategies for immersive audio applications.

Speakers

Brian F.G. Katz

Samuel D. Bellows

Wednesday July 1, 2026 2:00pm - 2:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

HRTFs, Lecture

2:30pm CEST

Understanding Ambisonics Through Practical Decisions and Listening

Wednesday July 1, 2026 2:30pm - 3:30pm CEST

Jussieu:Conf 2 (Binaural)

Ambisonics is a scene-based spatial audio format that has been around since the 1970s. In recent years its popularity has increased, with inclusion in game engines (such as Unity and Unreal) and distribution standards (like ADM or IAMF). Despite this, many practitioners view Ambisonics as being complex, mathematical, or academic. This tutorial explains Ambisonics through the lens of practical decision-making. Instead of equations, it covers the choices audio professionals are required to make when working on a project, with a particular emphasis on the audible consequences of those choices. The tutorial enables attendees to develop an intuitive understanding of Ambisonics through explanations of theory, combined with listening examples and workflow demonstrations. The topics covered in this tutorial are:
• Fundamentals: What Ambisonics is and how it differs from channel-based and object-based formats, and why it is well suited to VR, AR, and game audio.
• Encoding: How audio sources are converted to Ambisonics, and how to choose the Ambisonic order based on perceptual and computational trade-offs, as well as delivery constraints.
• Conventions: Common channel ordering (ACN vs FuMa) and gain normalisation (SN3D vs N3D) conventions, and what happens when things get mismatched.
• Processing: What kinds of effects can be used on Ambisonic signals while preserving the spatial integrity.
• Decoding and binaural rendering: How Ambisonic signals are converted to loudspeaker or binaural signals. The impact of head-tracking and HRTF selection on the binaural rendering.
• Mixed-order projects: What the options are when working with mixed-order sources, and the audible artefacts that can arise.
The tutorial will provide brief practical demonstrations of setting up an Ambisonics project in Pro Tools and Reaper, two widely used DAWs for immersive audio.
By the end of the tutorial attendees will have a practical understanding of the main concepts of Ambisonics, as well as knowing how the practical choices they make will impact the final audio. They will also be familiar with the main workflow pitfalls and how to avoid them. The tutorial assumes familiarity with general audio production concepts (DAW use, signal routing, mixing). However, no prior experience with Ambisonics or spatial audio formats is required. It is suitable for sound designers, composers, and audio engineers working in or interested in immersive media.
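
As a rough illustration of the Encoding and Conventions topics listed above, the sketch below encodes a mono sample to first-order Ambisonics and converts between conventions. It is not taken from the tutorial material: the function names are hypothetical, and it assumes the usual first-order definitions (azimuth counter-clockwise from the front, ACN order W, Y, Z, X, and FuMa order W, X, Y, Z with W attenuated by 1/sqrt(2)).

```python
import math


def encode_foa_acn_sn3d(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample to first-order Ambisonics (ACN order, SN3D gains).

    Assumed convention: azimuth is counter-clockwise from the front,
    elevation is upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    x = sample * math.cos(az) * math.cos(el)
    return [w, y, z, x]  # ACN channels 0..3


def sn3d_to_n3d_first_order(acn):
    """Rescale SN3D to N3D; first-order channels are scaled by sqrt(3)."""
    w, y, z, x = acn
    k = math.sqrt(3.0)
    return [w, k * y, k * z, k * x]


def acn_sn3d_to_fuma_first_order(acn):
    """Reorder ACN/SN3D (W, Y, Z, X) to FuMa (W, X, Y, Z), with W scaled by 1/sqrt(2)."""
    w, y, z, x = acn
    return [w / math.sqrt(2.0), x, y, z]


# A source hard left on the horizontal plane.
left_acn = encode_foa_acn_sn3d(1.0, 90.0, 0.0)
left_fuma = acn_sn3d_to_fuma_first_order(left_acn)
```

If the FuMa stream in `left_fuma` were read as if it were ACN/SN3D, the energy intended for Y would land in the Z slot, so the source would appear overhead rather than to the left, with W also about 3 dB low. This is one concrete form of the mismatch the Conventions topic refers to.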

Speakers

Peter Stitt

Wednesday July 1, 2026 2:30pm - 3:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

HOA, Tutorial

3:30pm CEST

Rendering 6DoF Audio in Augmented Reality: Perceptual Evaluation of Game Engines, Plugins, and Middleware

Wednesday July 1, 2026 3:30pm - 4:00pm CEST

Jussieu:Conf 2 (Binaural)

Contemporary game engines such as Unreal Engine and Unity are widely used for Extended Reality (XR), yet their native audio pipelines often rely on simplified spatialization with limited acoustic control. In Augmented Reality (AR), virtual sound sources must integrate coherently with the physical environment to maintain perceptual plausibility, making accurate Six-Degrees-of-Freedom (6DoF) rendering critical. This study carried out a perceptual evaluation of multiple 6DoF audio rendering approaches, including Audiokinetic Wwise (Reflect and RoomVerb), Steam Audio, Meta XR Audio SDK, a dense 6DoF Room Impulse Response (RIR) interpolation method, and APLVirtuoso XR. A measured physical room was reconstructed in Unreal Engine 5, and the rendering pipelines were calibrated by matching the reverberation time (RT60) and direct-to-reverberant ratio (DRR) to the measured room within their respective just noticeable difference (JND) thresholds. The results showed significant differences in perceived spatial and timbral fidelity as well as plausibility and overall listening experience across the tested renderers. Despite the high accuracy achieved by the 6DoF interpolation method, an algorithmic renderer demonstrated comparable or superior performance. However, some other algorithmic renderers exhibited tradeoffs in terms of computational overhead and acoustic modelling accuracy. Our findings indicate that an optimised system prioritising a plausible auditory representation, rather than strict physical replication, may be sufficient and, in some cases, can yield superior perceptual outcomes.
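
The calibration step described above compares reverberation time between the measured room and each renderer. The abstract does not say how RT60 was estimated, so the following is only a minimal sketch of one common approach (Schroeder backward integration with a T30 line fit). The function names are hypothetical, and the 5 % tolerance in `within_jnd` is a placeholder assumption, not the threshold used in the study.

```python
import math


def schroeder_edc_db(ir):
    """Energy decay curve in dB via backward integration of the squared response."""
    energy = 0.0
    edc = [0.0] * len(ir)
    for n in range(len(ir) - 1, -1, -1):
        energy += ir[n] * ir[n]
        edc[n] = energy
    total = edc[0]  # assumes the response is not all zeros
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def rt60_from_t30(ir, fs):
    """Fit a line to the decay between -5 dB and -35 dB and extrapolate to 60 dB."""
    edc = schroeder_edc_db(ir)
    pts = [(n / fs, d) for n, d in enumerate(edc) if -35.0 <= d <= -5.0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def within_jnd(reference, rendered, jnd_fraction=0.05):
    """True if the rendered value is within a relative tolerance of the reference.

    jnd_fraction is a placeholder; substitute the JND appropriate to the metric.
    """
    return abs(rendered - reference) <= jnd_fraction * reference


# Synthetic check: an ideal exponential decay with a known RT60 of 0.5 s.
fs = 8000
true_rt60 = 0.5
ir = [math.exp(-math.log(1000.0) * n / (fs * true_rt60)) for n in range(fs)]
estimated_rt60 = rt60_from_t30(ir, fs)
```

Measured responses contain a noise floor, so a practical implementation would also truncate the response before integrating. That step is omitted here.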

Speakers

Kush Munjal

Hyunkook Lee

Dale Johnson

Wednesday July 1, 2026 3:30pm - 4:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

4:00pm CEST

Audiovisual Coherence Thresholds for Direct-to-Reverberant Energy Ratio

Wednesday July 1, 2026 4:00pm - 4:30pm CEST

Jussieu:Conf 2 (Binaural)

Holographic calling systems aim to create the perceptual illusion that a remote interlocutor is physically present by combining augmented reality (AR) visualizations with spatial audio rendering. A key challenge in such systems is achieving audiovisual coherence when room-acoustic properties must be inferred, since this can lead to greater inaccuracies than when they are estimated from measurements. In this study, we investigate perceptual tolerance to mismatches in the direct-to-reverberant energy ratio (DRR), a critical cue for auditory distance perception. We conducted a listening experiment in which participants judged whether the audio and the visual presentation of a stimulus were coherent using a yes/no task. Stimuli included both a volumetric capture of a speaking human avatar, rendered at the correct direct sound level, and a loudspeaker reproducing wideband noise. For the loudspeaker, level roving was introduced to assess the influence of intensity cues on listener decisions. Results show that audiovisual coherence is maintained for half of the presentations within a DRR mismatch range of approximately 3 dB for too dry and 4.3 dB for too reverberant renderings. Within the limited number of participants included in the study, no evidence for significant differences was found between speaking avatars and loudspeakers reproducing noise, nor between different ranges of level roving in the loudspeaker condition. Nevertheless, the findings help to understand the consequences of DRR estimation mismatches for holographic AR calling experiences.
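
For readers unfamiliar with the quantity, the sketch below shows one way DRR could be computed from an impulse response and compared against the mismatch range reported above. It is an illustration only: the function names and the 2.5 ms direct-sound window are assumptions, and treating the reported 50 % points (about 3 dB too dry, 4.3 dB too reverberant) as hard limits is a simplification of a psychometric result.

```python
import math


def drr_db(ir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant energy ratio in dB.

    The direct sound is taken as a window around the largest peak;
    the window half-width is an assumed parameter. Assumes the response
    has some energy outside that window.
    """
    peak = max(range(len(ir)), key=lambda n: abs(ir[n]))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start = max(0, peak - half)
    end = min(len(ir), peak + half + 1)
    direct = sum(v * v for v in ir[start:end])
    reverberant = sum(v * v for v in ir[:start]) + sum(v * v for v in ir[end:])
    return 10.0 * math.log10(direct / reverberant)


def drr_mismatch_tolerated(reference_db, rendered_db,
                           too_dry_db=3.0, too_reverberant_db=4.3):
    """Check a DRR error against the approximate range quoted in the abstract.

    A rendering with a higher DRR than the reference is treated as too dry,
    one with a lower DRR as too reverberant.
    """
    diff = rendered_db - reference_db
    if diff >= 0.0:
        return diff <= too_dry_db
    return -diff <= too_reverberant_db


# Toy response: a unit direct impulse followed by two later reflections.
fs = 48000
ir = [0.0] * 1000
ir[200] = 1.0
ir[600] = 0.5
ir[700] = 0.5
toy_drr = drr_db(ir, fs)  # direct energy 1.0, reverberant energy 0.5
```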

Speakers

Peter Lindahl

Johannes M. Arend

Nils Meyer-Kahlen

Wednesday July 1, 2026 4:00pm - 4:30pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

4:30pm CEST

Assessing the Plausibility of Measurement-Based Auralization of Sound Transmission through Walls

Wednesday July 1, 2026 4:30pm - 5:00pm CEST

Jussieu:Conf 2 (Binaural)

In building acoustics, modeling and auralizing sound transmission through walls is a relevant area of research related to residential well-being and workplace productivity. While most existing studies use auralization methods that focus on accurately reproducing spectrum and amplitude of the sound transmission in accordance with ISO standards, almost no studies have explicitly tried to evaluate the perceptual quality of wall transmission auralization. To address this, the present study applies the plausibility paradigm to evaluate a measurement-based auralization approach and validate it for further psychoacoustic experiments on residential well-being. Binaural Room Impulse Responses (BRIRs) were measured for three loudspeakers placed in rooms adjacent to a central listening room. In a subsequent listening test, participants were asked to evaluate the overall plausibility of the auralization. Results demonstrate that due to the absence of visual cues, the lower sound pressure level, the reduced signal-to-noise ratio, and the diffuse radiation characteristic of the source, high plausibility scores close to the guessing rate were achieved for all adjacent rooms. These findings suggest that, under these conditions, auralization of wall transmission using BRIRs can be used as a plausible method for psychoacoustic research on residential well-being, without the need for complex physical simulations.

Speakers

Hendrik Himmelein

Florian Köber

Christoph Pörschmann

Wednesday July 1, 2026 4:30pm - 5:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Plausibility, Lecture

5:00pm CEST

The quest for decorrelation continues -- from stereo to immersive

Wednesday July 1, 2026 5:00pm - 6:00pm CEST

Jussieu:Conf 2 (Binaural)

Using pitch, delay and modulation effects to perceptually spread a mono source across additional, adjacent playback channels is a staple of music production, born in stereo and extended to immersive. This tutorial begins with the historical origins of the effect in the mid-1970s, showing the signal processing chains used, measuring its impact on the signal, discussing its psychoacoustic merit, and demonstrating the resulting sound. The evolution from stereo through surround sound to immersive formats using contemporary production tools and techniques is demonstrated. The effect is still evolving as new tools are developed and creators explore what is possible across all immersive artforms – music, AR/VR, and games. It is hoped that a deep dive into the first 50 years of this effect will inform and inspire immersive mixers for the future.
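
To make the idea of "measuring its impact on the signal" concrete, here is a minimal sketch of the simplest variant, a delay-only spread, together with a zero-lag inter-channel correlation measure. It is not the tutorial's processing chain: the function names, the noise test signal, and the 480-sample delay are illustrative assumptions.

```python
import math
import random


def delay(signal, n):
    """Delay a signal by n samples, keeping the original length."""
    if n <= 0:
        return list(signal)
    return [0.0] * n + list(signal[:len(signal) - n])


def correlation(a, b):
    """Zero-lag normalised correlation coefficient between two channels."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den


# Seeded noise stands in for a mono source; the right channel is a delayed copy.
rng = random.Random(0)
mono = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
left = mono
right = delay(mono, 480)

identical = correlation(left, delay(mono, 0))  # undelayed copy: fully correlated
spread = correlation(left, right)              # delayed copy: near zero at lag 0
```

A pure delay lowers only the zero-lag correlation; the two channels remain fully correlated at a lag equal to the delay. Pitch shifting and modulation, the other effects named above, change the signal itself rather than just its timing.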

Speakers

Alex Case

Wednesday July 1, 2026 5:00pm - 6:00pm CEST
Jussieu:Conf 2 (Binaural) 4, place Jussieu Paris 5e

Application / Gaming, Tutorial