Including diffraction modelling in an acoustic simulation is known to improve the plausibility of rendered room acoustics in Virtual Reality (VR). In VR, acoustic rendering only needs to satisfy the expectations raised by the visual room impression. In Augmented Reality (AR), however, the user’s natural acoustic environment provides an additional reference, which typically increases the perceptual demands. This study assesses a selection of diffraction modelling approaches in an AR setting in an L-shaped corridor. The participants rated the plausibility and similarity using a paired-comparison paradigm. Comparisons were included between acoustic simulations and between simulations and a real sound source. This is, to the best of the authors’ knowledge, the first experiment investigating diffraction perception in an AR context. The results indicated that room auralisation including diffraction was rated as more plausible than auralisation without, similar to VR experiments. However, the real sound source was rated as more plausible than all of the simulations. These observations suggest that the relative performance of room acoustic modelling is perceived similarly in VR and AR experiments, but needs further improvement to be suitable for occlusion scenarios in AR, where diffraction modelling might not be the main limitation. In general, perceptually accurate acoustic modelling of a complex real environment remains a challenge in AR.
Auralization enables multisensory evaluation of architectural designs in Virtual Reality (VR), yet physically accurate acoustic simulations remain computationally prohibitive for interactive workflows. This study investigates efficient artificial reverberation methods as lightweight proxies for different stages of VR-based architectural design. After assessing the predictive capabilities of geometrically informed models, a hierarchical 3-Alternative-Forced-Choice listening experiment with a transferring task paradigm was conducted in VR using binaural audio. In this experiment, the measured room impulse responses of a physical space in untreated and acoustically treated conditions were compared with those from five auralization techniques. These techniques ranged from industry-standard simulations to artificial reverberators, all calibrated to the measured Energy Decay Curves. Statistical analysis revealed that the pure Image-Source Method was easily detected, likely because the late reverberation's temporal density was insufficient. Conversely, when incorporating a dense late reverberant tail, computationally efficient methods achieved perceptual comparability with high-fidelity simulations. Participants prioritized the timbral quality of late reverberation over geometric early reflections. This suggests that computationally efficient models can serve as convincing, scalable rendering tools for interactive design and presents this audiovisual VR paradigm as an ecologically valid platform for multisensory architectural assessment.
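The Energy Decay Curve used above as the calibration target is commonly obtained by Schroeder backward integration of the squared impulse response. The following is a minimal sketch of that computation only, not the calibration procedure of the study; the function name and the toy exponential response (sample rate, decay time) are assumptions chosen for illustration.

```python
import math

def energy_decay_curve_db(rir):
    """Schroeder backward integration, returned in dB relative to total energy."""
    energy = [x * x for x in rir]
    edc = [0.0] * len(energy)
    running = 0.0
    # Accumulate squared samples from the end of the response towards the start.
    for i in range(len(energy) - 1, -1, -1):
        running += energy[i]
        edc[i] = running
    total = edc[0]
    if total == 0.0:
        raise ValueError("impulse response contains no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf") for e in edc]

# Toy response (hypothetical): amplitude falls by 60 dB after rt60 seconds.
fs = 1000      # sample rate in Hz, assumed
rt60 = 0.5     # decay time in seconds, assumed
rir = [math.exp(-math.log(1000.0) * n / (fs * rt60)) for n in range(fs)]
edc_db = energy_decay_curve_db(rir)
```

A reverberator could then be tuned until its own curve matches the measured one within a chosen tolerance; how that matching was done in the study is not stated here.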
This study investigates whether humans can adapt to manipulated auditory distance cues in virtual environments. While adaptation to remapped auditory localization cues is well established, it remains unclear whether similar processes apply to distance perception, particularly when natural acoustic cues are systematically modified. Virtual reality (VR) systems often employ non-ecological distance laws to improve the intelligibility of distant sound sources, which can introduce conflicts between auditory and visual information. To examine perceptual adaptation under such conditions, we modified a binaural real-time room acoustic simulation engine with rendering in six-degrees-of-freedom. The manipulation consisted of holding the direct sound level constant across distance while applying a distance-dependent low-pass filter. Over four consecutive days, participants completed a training protocol combining alternating testing phases with gamified training sessions. Results show that participants successfully adapted to the altered distance cues, with most learning occurring within the first two days. Initial exposure to the manipulation severely disrupted distance perception, rendering participants unable to make reliable judgments. However, following training, perceived distance functions approached veridical performance for far distances, exhibiting slopes close to unity. In contrast, judgments at close distances remained highly variable, suggesting that the available spectral cues were insufficient for accurate estimation in this range. These findings demonstrate that auditory distance perception can be recalibrated through short-term perceptual learning, even when initial perceptual mappings are strongly degraded. Adaptation generalizes across contexts within the virtual environment, although limitations persist for near-field perception.
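The manipulation described above (no level change with distance, but increasing low-pass filtering) can be sketched as follows. The abstract does not give the filter type or the mapping from distance to cutoff frequency, so the one-pole filter, the inverse-distance cutoff law, and the parameters `fc_near` and `fc_min` are all hypothetical stand-ins.

```python
import math

def cutoff_for_distance(distance_m, fc_near=16000.0, fc_min=500.0):
    """Hypothetical mapping: cutoff falls inversely with distance, floored at fc_min."""
    return max(fc_min, fc_near / max(distance_m, 1.0))

def render_direct_sound(signal, distance_m, fs=48000):
    """Low-pass the direct sound according to distance; no 1/r gain is applied."""
    fc = cutoff_for_distance(distance_m)
    a = math.exp(-2.0 * math.pi * fc / fs)  # pole of a one-pole low-pass
    out, state = [], 0.0
    for x in signal:
        # Unity gain at DC, so low-frequency level is unchanged by distance.
        state = (1.0 - a) * x + a * state
        out.append(state)
    return out
```

With such a law, distance is conveyed by spectral content alone, which is consistent with the reported difficulty at close range where the filter barely alters the signal.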
Acoustic performance remains insufficiently addressed in early-stage architectural design, where visual and spatial considerations typically guide decision-making. The present research examines the integration of real-time spatial auralization into multi-user virtual reality environments to facilitate collaborative evaluation of architectural acoustic performance during design exploration. A multi-user VR framework has been developed that embeds real-time binaural auralization within a collaborative design evaluation context. The framework supports shared inhabitation of virtual models, auditory exploration of spatialized sound sources, and systematic investigation of how spatial configurations and material properties affect perceived acoustic characteristics. Temporal soundscapes and acoustic implications of alternative materials and spatial arrangements can be evaluated through a gesture-driven interface. Spatialized voice communication further supports co-inhabitation of virtual environments and acoustically informed discussion. By embedding real-time auralization within collaborative virtual environments, the framework repositions acoustic performance as an experiential dimension of architectural design, enabling earlier incorporation of auditory considerations into the design process. An exploratory user study indicates that immersive, real-time auralization can integrate acoustic feedback into collaborative architectural design workflows, supporting multisensory evaluation and reducing dependence on late-stage corrective interventions.
Reproducing the acoustic consequences of source or receiver motion from a set of discrete static room impulse responses (RIRs) is a fundamental challenge in spatial audio processing, with direct relevance to the generation of training and evaluation data for machine learning systems operating on reverberant speech and audio. This paper presents a comparative evaluation of three offline algorithms for acoustic motion simulation, two of which are ports of previously published methods and one of which is an independent implementation conceptually related to prior work. All three methods are evaluated against a common reference using two complementary validation approaches: an objective analysis based on interaural time difference (ITD) estimation from synthetic binaural signals, and a perceptual evaluation conducted under the MUSHRA protocol using stimuli drawn from a controlled moving-receiver database. Results indicate that frequency-domain interpolation between neighbouring impulse responses provides the most accurate binaural cue reproduction and the highest perceptual similarity to the reference under spectrally demanding stimuli, while nearest-neighbour switching produces the most pronounced artefacts under broadband excitation. Time-domain crossfading between fully convolved signals yields intermediate performance, achieving parity with frequency-domain interpolation for speech and noise but falling significantly behind for music. The combination of ITD-based objective analysis and MUSHRA perceptual evaluation proved informative in characterising method differences, and the two measures converged on a consistent performance ordering across methods.
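As an illustration of the time-domain crossfading idea, the sketch below convolves the dry signal with two neighbouring RIRs in full and then blends the two outputs sample by sample. It is a simplified two-RIR case with a linear fade, not the evaluated implementation; the function names and the `position` trajectory (0 at RIR A, 1 at RIR B) are assumptions.

```python
def convolve(x, h):
    """Direct-form linear convolution (adequate for short illustrative signals)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def crossfade_render(x, h_a, h_b, position):
    """Blend two fully convolved signals with a per-sample linear weight."""
    y_a = convolve(x, h_a)
    y_b = convolve(x, h_b)
    n_out = min(len(y_a), len(y_b))
    out = []
    for n in range(n_out):
        # Hold the last trajectory value once the trajectory runs out.
        w = position[min(n, len(position) - 1)]
        out.append((1.0 - w) * y_a[n] + w * y_b[n])
    return out

# Usage with toy data: receiver halfway between the two measurement points.
x = [1.0, 0.0, 0.0, 0.0]
halfway = crossfade_render(x, [1.0, 0.5], [0.0, 1.0], [0.5] * 5)
```

Rounding `position` to 0 or 1 before blending would turn the same routine into nearest-neighbour switching, which makes the source of its switching artefacts easy to see.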
We have proposed a method called V2MA (VSVerb Virtual Microphone Array). This method virtually generates spatial room impulse responses (SRIRs) captured by a conventional microphone array using only a set of four impulse responses (IRs) measured by an A-format microphone in the target space. V2MA is based on the concept of geometrical acoustics, which involves a virtual sound source, also known as a mirror source. After measuring the four IRs using an A-format microphone, we calculate the instantaneous sound intensities in the x, y, and z directions. The “source intensities,” which contain sound source information, are detected from these sound intensities. Then, we estimate the locations, strengths, and phase characteristics of the sound sources. The spatial properties of the obtained virtual sound sources can be considered the fingerprint of a target space's reverberant characteristics. Following the principles of geometrical acoustics, we can update the spatial properties of the virtual sound sources to match a neighboring receiver position and a desired directivity. Lastly, we can obtain the SRIRs at any receiver (microphone) position with any directivity in the target room by translating the spatial information of the updated virtual sound sources into time responses. For immersive recording, large microphone arrays are often used. Using V2MA, we can virtually make an immersive recording using such a virtual microphone array, provided that we measure four IRs using an A-format microphone in the target space. In our previous study, we developed the framework of V2MA. To verify the plausibility of V2MA, this paper compares the responses of virtual (V2MA) and real (conventional) microphone arrays using measurement results collected in a practice hall under various conditions. The results show similar overall characteristics, but also suggest the difficulty of a detailed evaluation. We also introduce practical examples of immersive recording using V2MA.
Realistic reproduction of spatial reverberation is essential for immersive audio applications, including virtual reality and interactive gaming. While geometrical acoustics methods enable efficient rendering, they do not fully capture wave phenomena such as low-frequency modal behavior and diffraction, which are particularly significant in small spaces. Wave-based simulations provide higher physical accuracy but at substantial computational cost. This paper extends VSVerb, a 4pi sampling reverberator based on virtual sound sources (VS) extracted via sound intensity analysis, to use pressure and three-axis particle velocity computed by a discontinuous Galerkin finite element method (dG-FEM) simulation, enabling reverberation that reflects the wave-based acoustic characteristics of virtual spaces to be generated. Experiments conducted in a university lecture room demonstrate that simulation-based VS distributions and their corresponding impulse responses closely match those derived from actual measurements. Comparison with measured impulse responses and geometrical acoustics ray tracing shows that the proposed method produces room acoustic parameters, including clarity and definition, closer to the measured reference across most metrics and frequency bands. A tendency to underestimate reverberation time was observed, which may be addressed through improved simulation modeling or post-processing. Furthermore, the VS distribution extracted from a single simulation can be adapted to different receiver positions by re-estimating the geometric contribution of each VS, enabling 6DoF navigation support without additional simulation. These results indicate the potential of the proposed framework for wave-based interactive reverberation in virtual spaces.
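One plausible reading of "re-estimating the geometric contribution of each VS" is the standard geometrical-acoustics update: for a new receiver position, recompute the distance to each virtual source and derive its arrival delay, spherical-spreading gain, and direction of arrival. The sketch below assumes exactly that; the function name, the tuple layout, and the fixed speed of sound are illustrative and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def update_virtual_source(vs_position, strength, receiver):
    """Return (delay in s, gain, unit direction from receiver to the VS)."""
    offset = [p - r for p, r in zip(vs_position, receiver)]
    dist = math.sqrt(sum(d * d for d in offset))
    if dist == 0.0:
        raise ValueError("receiver coincides with the virtual source")
    delay = dist / SPEED_OF_SOUND        # propagation time
    gain = strength / dist               # 1/r spherical spreading
    direction = tuple(d / dist for d in offset)
    return delay, gain, direction

# Usage: a virtual source 5 m from a receiver at the origin.
delay, gain, direction = update_virtual_source((3.0, 4.0, 0.0), 1.0, (0.0, 0.0, 0.0))
```

Repeating this update for every VS and summing the delayed, scaled contributions would give an impulse response at the new position without rerunning the simulation.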
Numerous approaches have been taken to address the problem of generating navigable virtual models for multi-volume acoustic spaces. The general practice for creating empirically informed interactive models of multi-volume acoustic spaces, as embodied by the Spatially Oriented Format for Acoustics, is to discretely sample emitter-receiver pair positions. For a user to then navigate between these discrete positions involves cross-fading, blending, or otherwise perceptually interpolating between corresponding zones. This paper outlines a new approach which instead involves the continuous three-dimensional sampling of acoustic spaces, much as is done with 3D visual spaces in photogrammetry. To achieve this result, a first-of-its-kind consolidated ambisonic impulse response capturing apparatus has been designed and built. This apparatus combines a 3rd-order ambisonic microphone array with a 2nd-order ambisonic loudspeaker array and is designed to be moved through a space with maximal ease. AD/DA conversion, playback, and recording are all handled on a central compute platform. In parallel, a software workflow has been developed which can be implemented in Unreal Engine, as well as other game engines. To solve general issues of spatial audio in game engines, a custom encoding and decoding framework has been implemented. Then, to map the continuous ambisonic impulse response onto a virtual space, a spline mirroring the sampling path is drawn through the space. On the DSP side, an impulse response is extracted from any arbitrary point along the spline by way of the Common-slope Model for coupled spaces. Future work for better addressing early reflections and minimizing the theoretical intermediary of the Common-slope Model is discussed. Additionally, a special use case for visualizing acoustic energy in architectural acoustics is explored.
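In its usual form, the Common-slope Model describes the energy decay at every position as a sum of exponentials whose decay times are shared across the space, while only the amplitudes vary with position. Under that assumption, a decay at an arbitrary point along the spline can be sketched by interpolating amplitudes between two sampled points. The linear interpolation, the function names, and the numerical values below are hypothetical, and the paper's actual extraction step may differ.

```python
import math

LN_60DB = math.log(1.0e6)  # energy ratio corresponding to a 60 dB decay

def interpolate_amplitudes(amps_a, amps_b, u):
    """Linear blend of per-slope amplitudes; u in [0, 1] along the spline segment."""
    return [(1.0 - u) * a + u * b for a, b in zip(amps_a, amps_b)]

def common_slope_edc(amplitudes, decay_times_s, t):
    """Energy decay at time t: sum over slopes of A_k * exp(-ln(1e6) * t / T_k)."""
    return sum(a * math.exp(-LN_60DB * t / T)
               for a, T in zip(amplitudes, decay_times_s))

# Usage: two coupled volumes with shared decay times of 0.4 s and 1.2 s (assumed).
decay_times = [0.4, 1.2]
amps = interpolate_amplitudes([1.0, 0.0], [0.0, 1.0], 0.25)
energy_at_100ms = common_slope_edc(amps, decay_times, 0.1)
```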