Wave Field Synthesis (WFS) is a well-established spatialization technique that overcomes the sweet-spot limitation of conventional sound reinforcement and uniquely allows the synthesis of focused sources, i.e. virtual sources positioned between the loudspeaker array and the listener. Despite its potential for extended reality (XR), WFS has remained confined to specialized environments such as live performance, installation or post-production workflows, with no accessible open-source tooling for research and creative authoring. We present AT_WaveSpace, an open-source WFS engine built on JUCE, distributed under the MIT licence, and integrated into the Unity game engine, designed to democratize WFS for researchers, developers and creators. Building on the methodological framework of the SoundScape Renderer, which combined WFS engine development with a perceptual research platform, AT_WaveSpace serves simultaneously as a spatial audio delivery tool and as an experimental tool. Using this framework, we conducted a perceptual evaluation of near-field distance perception for focused WFS sources, a dimension absent from prior literature. In a Midpoint ComParison procedure, participants were unable to rank sources at 40–100 cm consistently, whereas they ranked sources at 120–150 cm in the correct order. Spectral centroid analysis reveals a distance-dependent timbral variation in the proximal zone whose physical origin remains unclear. Low-frequency ILD remains the primary candidate cue for correct ranking at 120–150 cm. Perspectives for further studies are outlined.
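For readers unfamiliar with the spectral centroid mentioned above, the following is a minimal sketch of one common definition: the magnitude-weighted mean frequency of a signal's spectrum. The abstract does not state the exact variant used (magnitude or power weighting, windowing, frame length), so the function name `spectral_centroid` and all of its choices are assumptions made for illustration.

```python
import numpy as np

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency of a real signal, in Hz.

    x  : (N,) real-valued signal
    fs : sampling rate in Hz
    Assumption: a single unwindowed FFT frame with magnitude weighting.
    """
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(freqs * mag) / np.sum(mag)

# Usage: a pure 1 kHz tone has a centroid close to 1 kHz.
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
centroid_hz = spectral_centroid(tone, fs)
```

A shift of this value across source distances would indicate a change in spectral balance of the kind the abstract describes as timbral variation.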
This study introduces needlets, a specific class of spherical wavelets, for spatial audio applications. Needlets are constructed in the spherical harmonic domain, are mathematically well defined, possess good localisation properties, and facilitate multiresolution analysis. However, because they form a tight frame, they are redundant and therefore require sparsification for practical applications. We propose a comprehensive spatial audio framework based on needlets, spanning encoding through to head-tracking-enabled binaural rendering. In this framework, a sound scene is encoded into a redundant needlet dictionary, which is subsequently sparsified using a novel algorithm. The resulting sparse representation is then decoded for headphone reproduction. Scene rotation is achieved by applying SO(3) rotation matrices to the sparse representation. The perceptual implications of the framework’s design parameters were evaluated using objective metrics and compared with those of Ambisonics. Initial results show that the proposed framework can achieve better tonal and spatial fidelity than third- and fourth-order Ambisonics Magnitude Least-Squares decoding while using a similar number of channels. Moreover, the proposed framework has been shown to allow users to tune the reproduced sound scene while maintaining fidelity.
Pressure matching (PM) for personal sound zones (PSZs) can achieve high contrast at nominal control points, but performance may degrade when the transfer functions are mismatched. We introduce a neural method that maps transfer functions to loudspeaker weights using a single-frequency input network with parameters shared across frequencies. We evaluate robustness under position shifts, additive transfer-function noise, and added reflections, and compare against PM with Tikhonov regularization. Results show improved robustness to structured perturbations such as listener displacement, whereas regularized PM remains more resilient to unstructured random transfer-function noise and reverberation. We further explain these results using a perturbation projection based on the singular value decomposition. Finally, we analyze different regularization mechanisms induced by the network and derive practical guidelines for neural PSZ filter optimization.
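As a point of reference for the baseline named above, here is a minimal sketch of pressure matching with Tikhonov regularization at a single frequency, in its textbook closed form w = (HᴴH + λI)⁻¹Hᴴd. The abstract gives no details of the authors' setup, so the array sizes, the random transfer functions, the target vector, and the names `pressure_matching_weights` and `acoustic_contrast_db` are all hypothetical.

```python
import numpy as np

def pressure_matching_weights(H, d, reg=1e-3):
    """Tikhonov-regularized least-squares loudspeaker weights at one frequency.

    H   : (M, L) complex transfer functions, L loudspeakers to M control points
    d   : (M,)   complex target pressures at the control points
    reg : regularization parameter lambda (illustrative default, an assumption)
    """
    n_spk = H.shape[1]
    A = H.conj().T @ H + reg * np.eye(n_spk)
    return np.linalg.solve(A, H.conj().T @ d)

def acoustic_contrast_db(H_bright, H_dark, w):
    """Ratio of mean squared pressure in the bright and dark zones, in dB."""
    e_bright = np.mean(np.abs(H_bright @ w) ** 2)
    e_dark = np.mean(np.abs(H_dark @ w) ** 2)
    return 10.0 * np.log10(e_bright / e_dark)

# Toy example with random (not physically modelled) transfer functions.
rng = np.random.default_rng(0)
n_spk, n_bright, n_dark = 8, 3, 3
H_bright = rng.standard_normal((n_bright, n_spk)) + 1j * rng.standard_normal((n_bright, n_spk))
H_dark = rng.standard_normal((n_dark, n_spk)) + 1j * rng.standard_normal((n_dark, n_spk))
H = np.vstack([H_bright, H_dark])
d = np.concatenate([np.ones(n_bright), np.zeros(n_dark)])  # unit pressure in bright zone, silence in dark zone

w = pressure_matching_weights(H, d, reg=1e-3)
contrast_db = acoustic_contrast_db(H_bright, H_dark, w)
```

A larger `reg` shrinks the norm of the weights, which trades nominal contrast for lower sensitivity to errors in `H`. This is the usual motivation for regularization when transfer functions are mismatched.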
To enable dynamic control in transaural personal sound zone (PSZ) systems, accurate binaural room impulse responses (BRIRs) at various listener positions are needed. Since it is impractical to measure BRIRs at all possible positions, interpolation from a sparse set of measured positions can be used. Although numerous BRIR interpolation methods exist, their effectiveness in sound field control applications remains unclear. In this paper, we propose a sub-band interpolation method that combines linear interpolation for frequencies below 2000 Hz with sinusoidal representation networks for frequencies above 2000 Hz. The interpolated BRIRs are then applied in a PSZ control system. Simulation results demonstrate that this hybrid approach significantly improves system performance over a wider frequency range.
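The sub-band idea described above can be sketched as follows, under several stated assumptions. The low band is taken from a linear interpolation between two neighbouring measured responses, the high band from a separate estimate, and the two are joined with a brick-wall split at the crossover frequency. The abstract does not specify the crossover filter, the interpolation domain, or the network interface, so the function `hybrid_interpolate` and its parameters are hypothetical, and `h_high` is only a placeholder for the output of a trained sinusoidal representation network.

```python
import numpy as np

def hybrid_interpolate(h_a, h_b, alpha, h_high, fs, f_cross=2000.0):
    """Join a linearly interpolated low band with a separately estimated high band.

    h_a, h_b : (N,) impulse responses measured at two neighbouring positions
    alpha    : interpolation weight in [0, 1] (0 gives h_a, 1 gives h_b)
    h_high   : (N,) high-band estimate (placeholder for a network output)
    fs       : sampling rate in Hz
    f_cross  : crossover frequency in Hz (brick-wall split, an assumption)
    """
    n = len(h_a)
    h_lin = (1.0 - alpha) * h_a + alpha * h_b
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low_band = freqs < f_cross
    spectrum = np.where(low_band, np.fft.rfft(h_lin), np.fft.rfft(h_high))
    return np.fft.irfft(spectrum, n=n)

# Toy usage with random stand-ins for measured and network-estimated responses.
rng = np.random.default_rng(0)
fs, n = 48000, 1024
h_a = rng.standard_normal(n)
h_b = rng.standard_normal(n)
h_net = rng.standard_normal(n)  # stand-in for the high-band estimate
h_mid = hybrid_interpolate(h_a, h_b, alpha=0.5, h_high=h_net, fs=fs)
```

A practical system would likely use a smooth crossover rather than a brick-wall split to avoid ringing in the time domain. The hard split is kept here only to make the band assignment explicit.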