Eclipsa Audio, based on the Immersive Audio Model and Format (IAMF) specification developed by members of the Alliance for Open Media, represents an open and royalty-free approach to immersive audio creation and delivery. Eclipsa Audio provides a growing ecosystem for producing and distributing spatial audio content, with hardware integration and streaming platform support, including YouTube, actively being rolled out. This panel brings together practitioners, researchers, and engineers directly involved in the development of IAMF and Eclipsa Audio to inform the audio engineering community about the current state of the format and its evolving toolkit for immersive audio production and delivery. The presenters will discuss how the Eclipsa Audio ecosystem can continue growing in the live and interactive realms, including 360 videos, streaming, gaming, and the combination of both, e.g., in e-sports. The discussion of future directions will also include developers' perspectives on how Eclipsa Audio can be embraced by interactive environments.
Capturing acoustic scenes with head-worn microphone arrays is challenging due to a limited number of sensors and constrained placement flexibility. Nevertheless, binaural reproduction based on these arrays has recently been proposed using binaural signal matching (BSM), showing high robustness and computational efficiency, but inferior performance compared to the more computationally complex signal-dependent methods, in particular under low-reverberation conditions. To address this gap, this paper investigates the Field-of-View informed Binaural Signal Matching (FoVi-BSM) method for far-field sources. FoVi-BSM incorporates a diagonal spatial weight matrix directly into the error formulation, redefining the filter solution to prioritize spectral and spatial fidelity within a predefined FoV, and showing performance comparable to signal-dependent methods with the same computational complexity as BSM. The performance of the method is evaluated through objective single-source anechoic simulations, multi-source reverberant Monte Carlo simulations, and a subjective MUSHRA listening test. Results demonstrate that prioritizing the FoV improves rendering accuracy within the targeted region over the standard baseline BSM method, achieving perceptual quality comparable to signal-dependent parametric methods while maintaining baseline-level performance outside the FoV.
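One way to read the weighted error formulation described above is as a regularized weighted least-squares problem. The sketch below is an assumption about the general form, not the paper's stated equations. All symbols are introduced here for illustration: $\mathbf{V}$ is an $M \times Q$ matrix of array steering vectors for $Q$ directions, $\mathbf{h}$ is the target HRTF vector for one ear over the same directions, $\mathbf{W}$ is a diagonal matrix of non-negative FoV weights, $\lambda$ is a regularization term (for instance a noise-to-signal ratio), and $\mathbf{c}$ is the length-$M$ filter at one frequency.

```latex
\begin{align}
\mathbf{c}_{\mathrm{opt}} &= \arg\min_{\mathbf{c}} \;
  (\mathbf{V}^{H}\mathbf{c} - \mathbf{h}^{*})^{H}\,\mathbf{W}\,
  (\mathbf{V}^{H}\mathbf{c} - \mathbf{h}^{*})
  + \lambda\,\|\mathbf{c}\|^{2}, \\
\mathbf{c}_{\mathrm{opt}} &=
  \left(\mathbf{V}\mathbf{W}\mathbf{V}^{H} + \lambda\mathbf{I}\right)^{-1}
  \mathbf{V}\mathbf{W}\,\mathbf{h}^{*}.
\end{align}
```

Under these assumptions, setting $\mathbf{W} = \mathbf{I}$ gives the unweighted regularized least-squares solution, and the matrix to invert stays $M \times M$ in both cases, which is consistent with the unchanged computational complexity reported in the abstract.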
Integrating microphone arrays into head-worn devices, such as augmented reality (AR) and virtual reality (VR) headsets, as well as hearing aids, has become increasingly popular for capturing and reproducing acoustic scenes. A common requirement in many such systems is a dense set of array transfer functions (ATFs). However, dense ATFs are cumbersome to measure, and practical setups commonly yield sparse grids rather than the uniform dense sampling often required. This motivates the use of interpolation to reconstruct dense ATF sets from sparse measurements. This paper evaluates spherical harmonics and natural-neighbor interpolation, each combined with onset-based time-alignment and post-interpolation magnitude correction, for a head-worn array across sampling densities. To examine how interpolation errors propagate to binaural rendering, the interpolated ATFs are substituted into two recent filter design methods: signal-independent binaural signal matching (BSM) and a signal-dependent method combining COMPASS parametric spatial coding with BSM (COM). Results show that BSM remains largely robust to interpolation errors, while COM degrades substantially under sparse sampling, with errors comparable to BSM at the lowest density, but achieves considerably lower errors than BSM as the sampling grid density increases. This is because BSM averages errors across all steering directions, while COM relies on individual steering vectors for source-directed beamforming.
We propose a baffleless circular array of radially outward facing cardioid microphones that produces standard ambisonic signals. The array produces an $N$th-order ambisonic signal from $2N+1$ microphones. It can be seen as a baffleless variant of the previously proposed equatorial microphone array, which uses a rigid spherical baffle. The simplicity of the microphone arrangement comes at the price of not being able to extract certain spatial information from the captured sound field. The ambisonic output signal represents a horizontal projection of the captured sound field. We demonstrate that interaural elevation cues are maintained when binaural rendering is performed despite the horizontal projection.
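The microphone count quoted above matches a simple counting argument: a horizontal sound-field description up to order $N$ has $2N+1$ circular-harmonic components, so $2N+1$ equally spaced samples around the circle are enough to separate them. The sketch below illustrates only this sampling step with a discrete Fourier transform over the microphone index. It is not the authors' encoder, which would also need frequency-dependent radial filtering. The function name `circular_harmonics` and the test pattern are hypothetical.

```python
import cmath
import math


def circular_harmonics(samples, order):
    """Estimate circular-harmonic coefficients a_m for m = -order..order.

    `samples` holds one value per microphone, with microphones assumed to be
    equally spaced on a circle at angles 2*pi*i/K (an assumption of this sketch).
    """
    k = len(samples)
    if k < 2 * order + 1:
        raise ValueError("need at least 2*order + 1 equally spaced samples")
    coeffs = {}
    for m in range(-order, order + 1):
        acc = 0j
        for i, s in enumerate(samples):
            phi = 2.0 * math.pi * i / k
            acc += s * cmath.exp(-1j * m * phi)
        coeffs[m] = acc / k
    return coeffs


# Usage: a second-order pattern sampled at 2*2 + 1 = 5 angles.
phi0 = 0.3
pattern = [1.0 + 0.5 * math.cos(2.0 * (2.0 * math.pi * i / 5 - phi0)) for i in range(5)]
coeffs = circular_harmonics(pattern, order=2)
```

For this test pattern, only the components with $m = 0$ and $m = \pm 2$ are non-zero, and the phase of the $m = 2$ coefficient encodes the rotation angle `phi0`.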
We summarise recent clinical studies on neurosensory differences between humans across age and gender, in particular related to Auditory Envelopment and reproduced sound. In audio engineering, listening quality is generally explained and tested considering just snapshot (frequency-domain) metrics: frequency response, sound pressure level, distortion and direction of sound. However, the two elusive dimensions, time and change, play the most important roles, not only in hearing but in all five of our primary senses. Human sensory physiology is summarised, along with recent studies on a particular time-domain dimension, Auditory Envelopment (AEV). Naive and professional subjects of any age and across genders point to AEV as a coherent and universal inducer of emotion in sound reproduction. With the conference held in Paris, attendees are furthermore able to experience AEV at one of the most conducive places for it to emerge naturally: inside the newly renovated Notre Dame cathedral. Using the quadrant model from Nordic universities, we discuss whether audio engineering, like medicine and other “objective” sciences, may have been tending primarily to adult male needs and preferences. Our literature is abundant with investigation of frequency-domain attributes such as power, punch and mobilisation, but ignores the physiological and mental effects of the time-domain modulation that happens on basically any time-scale as we listen. It appears stonemasons centuries ago knew important things about humans and hearing that would have been lost by now, had it not been for their magnificent monuments. Because of those efforts, however, we still have guides to what sound might achieve on new platforms, when we remember to listen slowly.
The Brittany-based consortium HDaudio3D (LabSTICC, Brest; Noise Makers, Rennes; and Feichter Audio, Lannion) has been working for five years on the development of a professional 3D audio recording solution: one offering sufficient spatial accuracy and signal-to-noise ratio, compatibility with headphones and speakers, and the ability to manipulate the audio scene in post-production. The solution is now operational and includes a 7th-order ambisonic sensor (with 480 MEMS on the surface of a 16 cm icosahedral tetrahedron), signal processing and a software suite. Pascal Rueff, a sound engineer specialising in binaural sound, and François Salmon, research engineer at Noise Makers, both members of the HDaudio3D consortium, will present the system and play excerpts from recordings made for various use cases.
This study investigates whether humans can adapt to manipulated auditory distance cues in virtual environments. While adaptation to remapped auditory localization cues is well established, it remains unclear whether similar processes apply to distance perception, particularly when natural acoustic cues are systematically modified. Virtual reality (VR) systems often employ non-ecological distance laws to improve the intelligibility of distant sound sources, which can introduce conflicts between auditory and visual information. To examine perceptual adaptation under such conditions, we modified a binaural real-time room acoustic simulation engine with rendering in six degrees of freedom. The manipulation consisted of holding the direct sound level constant across distance while applying a distance-dependent low-pass filter. Over four consecutive days, participants completed a training protocol that alternated testing phases with gamified training sessions. Results show that participants successfully adapted to the altered distance cues, with most learning occurring within the first two days. Initial exposure to the manipulation severely disrupted distance perception, rendering participants unable to make reliable judgments. However, following training, perceived distance functions approached veridical performance for far distances, exhibiting slopes close to unity. In contrast, judgments at close distances remained highly variable, suggesting that the available spectral cues were insufficient for accurate estimation in this range. These findings demonstrate that auditory distance perception can be recalibrated through short-term perceptual learning, even when initial perceptual mappings are strongly degraded. Adaptation generalizes across contexts within the virtual environment, although limitations persist for near-field perception.
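The manipulation described above (no level change with distance, only a distance-dependent low-pass) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's rendering engine. The cutoff mapping in `cutoff_hz`, its parameter values, and the choice of a one-pole filter are all hypothetical.

```python
import math


def cutoff_hz(distance_m, f_near=16000.0, f_far=1000.0, d_max=20.0):
    """Hypothetical mapping: the cutoff falls log-linearly with distance,
    from f_near at 1 m to f_far at d_max (values are assumptions)."""
    d = min(max(distance_m, 1.0), d_max)
    t = math.log(d) / math.log(d_max)
    return f_near * (f_far / f_near) ** t


def render_direct(signal, distance_m, fs=48000.0):
    """Filter the direct sound with a one-pole low-pass whose cutoff depends
    on distance. No 1/r gain is applied, and the filter has unity DC gain,
    so only the spectral cue changes with distance."""
    fc = cutoff_hz(distance_m)
    a = math.exp(-2.0 * math.pi * fc / fs)
    out = []
    y = 0.0
    for x in signal:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


# Usage: the same input rendered at two distances.
step = [1.0] * 5000
near = render_direct(step, 2.0)
far = render_direct(step, 15.0)
```

With this kind of mapping the spectral difference between nearby positions is small when the cutoff is already high. That is one possible reading of why close-range judgments stayed variable, but the abstract does not give the actual filter design.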
Acoustic performance remains insufficiently addressed in early-stage architectural design, where visual and spatial considerations typically guide decision-making. The present research examines the integration of real-time spatial auralization into multi-user virtual reality environments to facilitate collaborative evaluation of architectural acoustic performance during design exploration. A multi-user VR framework has been developed that embeds real-time binaural auralization within a collaborative design evaluation context. The framework supports shared inhabitation of virtual models, auditory exploration of spatialized sound sources, and systematic investigation of how spatial configurations and material properties affect perceived acoustic characteristics. Temporal soundscapes and acoustic implications of alternative materials and spatial arrangements can be evaluated through a gesture-driven interface. Spatialized voice communication further supports co-inhabitation of virtual environments and acoustically informed discussion. By embedding real-time auralization within collaborative virtual environments, the framework repositions acoustic performance as an experiential dimension of architectural design, enabling earlier incorporation of auditory considerations into the design process. An exploratory user study indicates that immersive, real-time auralization can integrate acoustic feedback into collaborative architectural design workflows, supporting multisensory evaluation and reducing dependence on late-stage corrective interventions.
Reproducing the acoustic consequences of source or receiver motion from a set of discrete static room impulse responses (RIRs) is a fundamental challenge in spatial audio processing, with direct relevance to the generation of training and evaluation data for machine learning systems operating on reverberant speech and audio. This paper presents a comparative evaluation of three offline algorithms for acoustic motion simulation, two of which are ports of previously published methods and one of which is an independent implementation conceptually related to prior work. All three methods are evaluated against a common reference using two complementary validation approaches: an objective analysis based on interaural time difference (ITD) estimation from synthetic binaural signals, and a perceptual evaluation conducted under the MUSHRA protocol using stimuli drawn from a controlled moving-receiver database. Results indicate that frequency-domain interpolation between neighbouring impulse responses provides the most accurate binaural cue reproduction and the highest perceptual similarity to the reference under spectrally demanding stimuli, while nearest-neighbour switching produces the most pronounced artefacts under broadband excitation. Time-domain crossfading between fully convolved signals yields intermediate performance, achieving parity with frequency-domain interpolation for speech and noise but falling significantly behind for music. The combination of ITD-based objective analysis and MUSHRA perceptual evaluation proved informative in characterising method differences, and the two measures converged on a consistent performance ordering across methods.
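Two of the strategies named above can be contrasted in a few lines. The sketch below is a simplified illustration for a single pair of RIRs and a single channel, not a port of the evaluated implementations. The function names, the linear fade law, and the mid-point switching rule are assumptions made here.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def crossfade_motion(x, rir_a, rir_b):
    """Time-domain crossfade: convolve the whole signal with both RIRs,
    then fade linearly from the position-A output to the position-B output."""
    ya, yb = convolve(x, rir_a), convolve(x, rir_b)
    n = min(len(ya), len(yb))
    out = []
    for i in range(n):
        g = i / (n - 1) if n > 1 else 0.0
        out.append((1.0 - g) * ya[i] + g * yb[i])
    return out


def nearest_switch_motion(x, rir_a, rir_b):
    """Nearest-neighbour switching (simplified): use the position-A output
    for the first half of the trajectory and the position-B output after."""
    ya, yb = convolve(x, rir_a), convolve(x, rir_b)
    n = min(len(ya), len(yb))
    return [ya[i] if i < n // 2 else yb[i] for i in range(n)]


# Usage with toy single-tap "RIRs" that differ only in gain.
x = [1.0, 1.0, 1.0, 1.0]
faded = crossfade_motion(x, [1.0], [0.5])
switched = nearest_switch_motion(x, [1.0], [0.5])
```

The switched output jumps abruptly at the mid-point while the crossfaded output changes gradually, which illustrates why switching is more likely to produce audible discontinuities.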
We have proposed a method called V2MA (VSVerb Virtual Microphone Array). This method virtually generates the spatial room impulse responses (SRIRs) that would be captured by a conventional microphone array, using only a set of four impulse responses (IRs) measured by an A-format microphone in the target space. V2MA is based on the concept of geometrical acoustics, which involves a virtual sound source, also known as a mirror source. After measuring the four IRs using an A-format microphone, we calculate the instantaneous sound intensities in the x, y, and z directions. The “source intensities,” which contain sound source information, are detected from these sound intensities. Then, we estimate the locations, strengths, and phase characteristics of the sound sources. The spatial properties of the obtained virtual sound sources can be considered the fingerprint of a target space's reverberant characteristics. Following the principles of geometrical acoustics, we can update the spatial properties of the virtual sound sources to match a neighboring receiver position and a desired directivity. Lastly, we can obtain the SRIRs at any receiver (microphone) position with any directivity in the target room by translating the spatial information of the updated virtual sound sources into time responses. For immersive recording, large microphone arrays are often used. Using V2MA, we can virtually make an immersive recording using such a virtual microphone array, provided that we measure four IRs using an A-format microphone in the target space. In our previous study, we developed the framework of V2MA. To verify the plausibility of V2MA, this paper compares the responses of virtual (V2MA) and real (conventional) microphone arrays using measurement results collected in a practice hall under various conditions. The results show similar overall characteristics, but also suggest the difficulty of a detailed evaluation. We also introduce practical examples of immersive recording using V2MA.
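The intensity step described above can be illustrated with a common first-order estimate: convert the four A-format capsule responses to B-format, then multiply the omnidirectional component by each figure-of-eight component sample by sample. The abstract does not state how V2MA computes its intensities, so the sketch below is an assumption. The capsule naming (FLU, FRD, BLD, BRU), the unnormalized conversion matrix, the sign convention, and the omission of capsule-spacing correction filters are all simplifications made here.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from A-format (tetrahedral capsules) to B-format.

    Capsule names are an assumed convention: front-left-up, front-right-down,
    back-left-down, back-right-up. No gain normalization is applied.
    """
    w = flu + frd + bld + bru
    x = flu + frd - bld - bru
    y = flu - frd + bld - bru
    z = flu - frd - bld + bru
    return w, x, y, z


def instantaneous_intensity(flu_ir, frd_ir, bld_ir, bru_ir):
    """Return per-sample (ix, iy, iz) as the product of the pressure-like
    signal w and the velocity-like signals x, y, z (up to a scale factor)."""
    result = []
    for flu, frd, bld, bru in zip(flu_ir, frd_ir, bld_ir, bru_ir):
        w, x, y, z = a_to_b_format(flu, frd, bld, bru)
        result.append((w * x, w * y, w * z))
    return result


# Usage: a toy arrival that excites only the two front capsules.
vectors = instantaneous_intensity([1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

In this toy case the first intensity vector lies along the positive x axis, and the silent second sample gives a zero vector. Peaks in such a vector sequence are candidates for the "source intensities" mentioned in the abstract, although the actual detection criterion is not given there.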