This study introduces needlets, a specific class of spherical wavelets, for spatial audio applications. Needlets are constructed in the spherical harmonic domain, are mathematically well defined, possess good localisation properties, and facilitate multiresolution analysis. However, because they form a tight frame, they are redundant and therefore require sparsification for practical applications. We propose a comprehensive spatial audio framework based on needlets, spanning encoding through to head-tracking-enabled binaural rendering. In this framework, a sound scene is encoded into a redundant needlet dictionary, which is subsequently sparsified using a novel algorithm. The resulting sparse representation is then decoded for headphone reproduction. Scene rotation is achieved by applying SO(3) rotation matrices to the sparse representation. The perceptual implications of the framework’s design parameters are evaluated using objective metrics and compared with those of Ambisonics. Initial results show that the proposed framework can achieve better tonal and spatial fidelity than third- and fourth-order Ambisonics with Magnitude Least-Squares decoding while using a similar number of channels. Moreover, the proposed framework is shown to allow users to tune the reproduced sound scene while maintaining fidelity.
Pressure matching (PM) for personal sound zones (PSZs) can achieve high contrast at nominal control points, but its performance may degrade when the transfer functions are mismatched. We introduce a neural method that maps transfer functions to loudspeaker weights using a single-frequency input network whose parameters are shared across frequencies. We evaluate robustness under position shifts, additive transfer-function noise, and added reflections, and compare against PM with Tikhonov regularization. Results show improved robustness to structured perturbations such as listener displacement, whereas regularized PM remains more resilient to unstructured random transfer-function noise and reverberation. We further explain these results using a perturbation projection based on the singular value decomposition. Finally, we analyze the different regularization mechanisms induced by the network and derive practical guidelines for neural PSZ filter optimization.
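For orientation, the baseline named above, PM with Tikhonov regularization, can be sketched at a single frequency as the textbook regularized least-squares solution w = (HᴴH + λI)⁻¹Hᴴd. This is a minimal sketch, not the authors' implementation: the function names, the array shapes, and the regularization weight `lam` are assumptions introduced here for illustration.

```python
import numpy as np


def pressure_matching_weights(H, d, lam):
    """Tikhonov-regularized pressure matching at one frequency (illustrative).

    H   : (M, L) complex transfer functions, L loudspeakers to M control points
    d   : (M,) complex target pressures at the control points
    lam : non-negative regularization weight (assumed, hand-tuned value)
    """
    n_speakers = H.shape[1]
    # Solve (H^H H + lam I) w = H^H d instead of forming an explicit inverse.
    A = H.conj().T @ H + lam * np.eye(n_speakers)
    return np.linalg.solve(A, H.conj().T @ d)


def acoustic_contrast_db(H_bright, H_dark, w):
    """Mean squared pressure in the bright zone over that in the dark zone, in dB."""
    e_bright = np.mean(np.abs(H_bright @ w) ** 2)
    e_dark = np.mean(np.abs(H_dark @ w) ** 2)
    return 10.0 * np.log10(e_bright / e_dark)
```

Larger values of `lam` shrink the norm of the loudspeaker weights, which is the usual way to trade nominal contrast for robustness to transfer-function mismatch.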
To enable dynamic control in transaural personal sound zone (PSZ) systems, accurate binaural room impulse responses (BRIRs) at various listener positions are needed. Since it is impractical to measure BRIRs at all possible positions, they can instead be interpolated from a sparse set of measured positions. Although numerous BRIR interpolation methods exist, their effectiveness in sound field control applications remains unclear. In this paper, we propose a sub-band interpolation method that combines linear interpolation for frequencies below 2000 Hz with sinusoidal representation networks for frequencies above 2000 Hz. The interpolated BRIRs are then applied in a PSZ control system. Simulation results demonstrate that this hybrid approach significantly improves system performance over a wider frequency range.
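The sub-band idea above can be illustrated with a small sketch: a low band taken from a linear interpolation between two measured BRIRs, and a high band taken from a separate estimate. The abstract does not specify the crossover design or the network, so everything here is an assumption: the hard frequency-domain split, the assignment of the 2000 Hz bin itself to the high band, the function names, and the `h_high_src` array, which stands in for whatever the sinusoidal representation network would produce.

```python
import numpy as np


def linear_interp_brir(h_a, h_b, alpha):
    """Time-domain linear interpolation between two measured BRIRs.

    alpha = 0 returns h_a, alpha = 1 returns h_b (0 <= alpha <= 1 assumed).
    """
    return (1.0 - alpha) * h_a + alpha * h_b


def combine_subbands(h_low_src, h_high_src, fs, f_split=2000.0):
    """Take bins below f_split from h_low_src and all other bins from h_high_src.

    Both inputs are real arrays with time along the last axis, e.g. shape
    (2, n) for left and right ears. The hard split is an illustrative choice.
    """
    n = h_low_src.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec_low = np.fft.rfft(h_low_src, axis=-1)
    spec_high = np.fft.rfft(h_high_src, axis=-1)
    spec = np.where(freqs < f_split, spec_low, spec_high)
    return np.fft.irfft(spec, n=n, axis=-1)
```

In use, `h_low_src` would be the output of `linear_interp_brir` for the target position and `h_high_src` the high-band estimate for the same position; a smooth crossover could replace the hard split if the band edge causes audible artifacts.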