The rapid mainstreaming of spatial audio has created a need for design frameworks that translate research into production-ready practice. This roundtable brings together contributing authors from Immersive Sound Volume II: The Design and Practice of Binaural and Multi-Channel Experiences for an open, cross-disciplinary conversation about where the field stands and where it is headed. Participants will draw on the text's core themes: perceptual grounding, system design, and the creative practice of building immersive sound experiences. The session is structured to encourage dialogue between contributors and attendees, surfacing points of debate, unresolved questions, and divergent perspectives across binaural and multi-channel workflows. Through case studies and cross-disciplinary perspectives, the session offers a practical roadmap for audio engineers, sound designers, and researchers navigating the evolving immersive audio industry. Attendees will leave with concrete frameworks applicable to both studio production and real-time XR deployment.
Multi-channel compact linear loudspeaker arrays combined with Crosstalk Cancellation (CTC) can deliver binaural audio to a user by directing sound precisely at the listener’s ears. This enables the reproduction of binaural audio, and thus of any spatial audio format rendered to binaural, without the need for headphones. The technique traditionally suffers from a small sweet spot, which can be overcome by combining real-time head tracking with position-adaptive beamforming. This workshop introduces the concept of CTC and beamforming combined with user head tracking, covering both its theoretical foundations and the practical considerations of incorporating head tracking. A live demonstration will showcase head-tracked binaural audio delivered through a position-adaptive CTC soundbar.
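The sweet-spot problem and its head-tracked remedy can be illustrated with a minimal numerical sketch. It is not the workshop's soundbar implementation: it assumes a free-field point-source model, a single frequency, a regularized least-squares inverse as the CTC filter design, and hypothetical geometry (8 loudspeakers at 5 cm spacing, ears 16 cm apart, listener 1 m away). All function names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def plant_matrix(speakers_xy, ears_xy, freq_hz):
    """Free-field transfer matrix (2 ears x N speakers) at one frequency."""
    k = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    # Distance from each ear (rows) to each loudspeaker (columns).
    dist = np.linalg.norm(ears_xy[:, None, :] - speakers_xy[None, :, :], axis=-1)
    return np.exp(-1j * k * dist) / (4.0 * np.pi * dist)


def ctc_filters(plant, beta=1e-6):
    """Regularized inverse H = G^H (G G^H + beta I)^-1, shape N x 2."""
    plant_h = plant.conj().T
    gram = plant @ plant_h
    return plant_h @ np.linalg.inv(gram + beta * np.eye(gram.shape[0]))


def crosstalk_db(response):
    """Worst-case leakage relative to the weakest direct path, in dB."""
    direct = min(abs(response[0, 0]), abs(response[1, 1]))
    leak = max(abs(response[0, 1]), abs(response[1, 0]))
    return 20.0 * np.log10(leak / direct)


def ear_positions(head_x, head_y=1.0, half_width=0.08):
    """Left and right ear coordinates for a head centred at (head_x, head_y)."""
    return np.array([[head_x - half_width, head_y],
                     [head_x + half_width, head_y]])


# Hypothetical array: 8 loudspeakers on the x-axis, 5 cm apart.
speakers = np.stack([np.linspace(-0.175, 0.175, 8), np.zeros(8)], axis=1)
freq = 2000.0

plant_centre = plant_matrix(speakers, ear_positions(0.0), freq)
filters_centre = ctc_filters(plant_centre)
plant_moved = plant_matrix(speakers, ear_positions(0.10), freq)

# Filters designed for the tracked position cancel crosstalk well.
tracked_db = crosstalk_db(plant_centre @ filters_centre)
# The same filters after the head moves 10 cm sideways: cancellation degrades.
stale_db = crosstalk_db(plant_moved @ filters_centre)
# Recomputing the filters for the new ear positions restores cancellation.
readapted_db = crosstalk_db(plant_moved @ ctc_filters(plant_moved))
```

Comparing `tracked_db`, `stale_db`, and `readapted_db` shows why the filters need to follow the head: in this toy model the inverse is only valid at the ear positions it was designed for. A practical system would repeat the design across frequency and use measured or modelled head-related transfer paths rather than free-field propagation.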
In state-of-the-art models of sagittal-plane sound localization, template head-related transfer functions (HRTFs) are used to reflect the listener's internal calibration of auditory space decoding, and thus determine the prediction quality. The effect of the template HRTFs has not yet been investigated directly. Here, a model was calibrated separately to two HRTF measurements of the same listeners and its predictions were compared to behavioral localization responses of these listeners obtained in three listening conditions: two acoustically measured HRTF sets (those used for model calibration), and an additional condition (unseen during the calibration) used to test the model's ability to generalize. We analyzed the quadrant error rates (QE) and local polar errors (PE) from eight listeners. The predicted errors were similar in both calibration conditions and increased in the unseen condition. The quality of the predictions, however, varied significantly with the template, more for PE than for QE, slightly preferring one template over the other when predicting the unseen condition. Our findings suggest that small differences in HRTFs used for the template may influence the prediction quality, especially when applied to unseen listening conditions.
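For readers unfamiliar with the two error metrics, the following sketch shows one common way to compute them from target and response polar angles. It assumes the widely used convention in which a response counts as a quadrant error when its absolute polar error exceeds 90 degrees, and the local polar error is the root-mean-square error of the remaining responses; the abstract does not state the exact definitions used, so the threshold and the function names are assumptions.

```python
import numpy as np


def wrap_deg(angle_deg):
    """Wrap angles in degrees to the interval [-180, 180)."""
    return (np.asarray(angle_deg, dtype=float) + 180.0) % 360.0 - 180.0


def sagittal_errors(target_polar_deg, response_polar_deg):
    """Return (quadrant error rate in %, local RMS polar error in degrees)."""
    err = wrap_deg(np.asarray(response_polar_deg, dtype=float)
                   - np.asarray(target_polar_deg, dtype=float))
    # Assumed convention: responses within 90 degrees of the target are "local".
    local = np.abs(err) <= 90.0
    qe = 100.0 * float(np.mean(~local))
    pe = float(np.sqrt(np.mean(err[local] ** 2))) if local.any() else float("nan")
    return qe, pe


# Illustrative data only: the third response is a front-back confusion.
qe, pe = sagittal_errors([0.0, 0.0, 30.0, 180.0], [10.0, -10.0, 170.0, 175.0])
```

Separating the two metrics in this way keeps large front-back or up-down confusions from dominating the measure of local precision.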
Binaural rendering is central to spatial audio reproduction via headphones and wearable devices, yet systematic evaluation of enhancement techniques remains methodologically inconsistent across the literature. This paper presents and applies a subjective evaluation methodology designed to consistently elicit five perceptual attributes of headphone-based spatial audio: externalization, elevation fidelity, front/back distinction, room tone, and spectral coloration. The methodology combines absolute judgment and relative comparison protocols, taking advantage of their complementary capabilities to capture both the absolute perceptual quality of individual rendering conditions and the salience of perceived differences between them. It is applied in a controlled experiment comparing conventional HRTF-based binaural rendering against two enhancement variants that each superpose a masked spatially diffuse sound field component into the binaural output. Stimuli were spatialized across eleven azimuthal positions in the horizontal plane using a generic dummy-head dataset, and presented to participants over closed-back headphones. The results validate the proposed methodology as a tool for revealing perceptually relevant differences among binaural rendering conditions. Relative comparison tests reveal further detail on performance differences between rendering methods. Both enhancement methods significantly improve externalization and front/back distinction relative to unenhanced binaural rendering, with the largest gains at lateral azimuths, and without a statistically significant increase in perceived spectral coloration beyond the baseline effect of HRTF filtering. However, the results do not indicate conclusively whether the binaural rendering methods examined here exhibit or mitigate a "spurious elevation" artifact associated with frontal sound presentation.
Binaural signal synthesis is typically formulated as forward modelling using head-related transfer functions (HRTFs). We explore an inverse auditory modelling perspective in which binaural ear signals are estimated directly from a source signal and its azimuth. We present a lightweight complex-valued neural network that predicts frequency-domain binaural filters from the input source spectrum and azimuthal direction, which are then applied to synthesize binaural signals. Controlled experiments evaluate how excitation bandwidth and angular sampling density affect reconstruction and generalization. Results show accurate spectral reconstruction and interpolation to unseen source directions even when training uses sparse angular grids, while bandwidth strongly influences problem conditioning and error behaviour. This work focuses on characterizing compact signal-conditioned inverse models as efficient components for binaural signal generation.
Single-sided deafness (SSD) reduces access to binaural cues and can make spatial-audio localization difficult in virtual reality (VR). This study investigated short-term localization training under simulated SSD in a VR task using generic, non-individualized head-related transfer function (HRTF) rendering with head-movement-contingent auditory updating, and examined whether an enhanced HRTF could improve performance by emphasizing monaurally available spectral cues at the better-hearing ear. The rationale was that, although directional judgment in normal binaural listening depends strongly on interaural differences, monaural listening must rely more heavily on direction-dependent spectral characteristics that remain available at the better-hearing ear. Twenty normal-hearing participants performed a 13-source horizontal-plane localization task using a VR headset and headphones under simulated SSD. Participants were assigned to either normal-HRTF training or enhanced-HRTF training (n = 10 each). The experiment comprised pre-test, three training sessions, and post-test, and all participants were tested with both normal and enhanced HRTFs, yielding four train-test combinations. Performance was evaluated using accuracy (ACC), mean absolute error (MAE), and response time (RT). Localization performance improved with training under the present VR simulated-SSD condition. ACC increased and MAE decreased from pre-test to post-test, whereas RT showed no clear change. No significant overall between-group difference in cumulative improvement was observed. However, during training, the enhanced-HRTF group showed a significant first-session advantage, and matched train-test combinations showed descriptively larger gains than mismatched combinations. 
These results suggest that short-term VR localization training can improve directional judgment under simulated SSD and that enhancing monaural spectral cues may provide an early benefit by making direction-specific patterns easier to associate with source direction. The findings are limited to localization performance in the present VR task under simulated SSD and should not be directly generalized to clinical SSD populations, real-world auditory rehabilitation, or broader everyday 3D spatial-audio experience.
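As a rough illustration of the two accuracy-related outcome measures, ACC and MAE can be computed from per-trial target and response indices as sketched below. The source layout is an assumption: the abstract specifies 13 horizontal-plane sources but not their spacing, so the 15-degree grid from -90 to 90 degrees and the function name are hypothetical.

```python
import numpy as np

# Hypothetical layout: 13 sources spaced 15 degrees apart (not given in the abstract).
SOURCE_AZIMUTHS_DEG = np.linspace(-90.0, 90.0, 13)


def localization_scores(target_idx, response_idx):
    """Return (ACC as a fraction of exact hits, MAE in degrees)."""
    target_idx = np.asarray(target_idx)
    response_idx = np.asarray(response_idx)
    acc = float(np.mean(target_idx == response_idx))
    abs_err = np.abs(SOURCE_AZIMUTHS_DEG[response_idx]
                     - SOURCE_AZIMUTHS_DEG[target_idx])
    return acc, float(np.mean(abs_err))


# Illustrative trials only: two exact hits, one miss by one source, one by two.
acc, mae = localization_scores([0, 6, 12, 3], [0, 7, 12, 1])
```

ACC rewards only exact identifications, while MAE also credits near misses, which is why the two measures can move by different amounts over training.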