Immersive virtual reality (VR) is increasingly used to simulate concert experiences, yet it remains unclear whether its experiential advantages are accompanied by corresponding physiological changes when audiovisual content is held constant. The present study compared head-mounted immersive VR concert playback with tablet-based video in a fully counterbalanced within-subject design. Musical content and audio reproduction were identical across conditions, isolating the effect of visual immersion. Results showed consistently higher subjective ratings in VR across all measures, including music-induced affect, chills, perceived presence, and performance liking. In contrast, physiological differences were comparatively small: electrodermal activity showed only a modest increase in VR, and heart rate variability did not reliably differentiate between conditions. These findings suggest that immersive VR substantially enhances subjective music experience, particularly in terms of presence and affective engagement, while corresponding changes in autonomic physiology are limited under controlled conditions. The results indicate that subjective and physiological responses were differentially sensitive to the presentation-format manipulation.
Audio Augmented Reality (AAR) can be experienced through different navigation techniques that may influence presence and spatial perception. This paper investigates the effects of navigation type and listener perspective on exploration behavior, presence, and localization accuracy in AAR systems built with consumer hardware. A within-subjects study compared four conditions: virtual navigation, virtual navigation with head tracking, physical navigation, and physical navigation with head tracking. Fifteen participants completed exploration and sound localization tasks in each condition. Results show that physical navigation increased presence and improved exploration behavior and localization performance compared to virtual navigation, while head tracking paired with a non-individualized HRTF for binaural rendering did not produce significant effects.
Virtual reality (VR) has emerged as a promising medium for high-stakes training, yet its predominantly visual design places disproportionate demands on attentional resources, limiting capacity for other task-relevant information. Spatial audio cues exploit the underutilized auditory channel to redistribute this load, with demonstrated improvements in reaction time, search efficiency, and situational awareness. However, when audio cues are spatially incongruent with visual targets, task performance degrades. The cognitive and behavioral costs of such incongruency, particularly under increasing visual complexity, remain underexplored. This pilot study examines how audiovisual spatial incongruency affects mental workload and task performance through a within-subjects VR experiment in which 15 participants complete a search-and-respond task across congruent and incongruent audiovisual conditions at three levels of visual complexity. Reaction time, target accuracy, timeouts, and subjective workload are measured across 10 trials per participant. Audiovisual incongruency is hypothesized to increase mental workload and impair performance, with effects amplified under higher visual complexity. Findings will inform spatial audio design for immersive training systems and motivate further investigation into tolerance thresholds for audiovisual misalignment.
This paper introduces MAV-C, an offline, signal-based framework for the joint objective estimation of Audio-Visual Complexity (AVC) in locally rendered interactive games. MAV-C integrates entropy-based Acoustic Scene Complexity (ASC) features with multi-scale visual complexity metrics adapted to video via optical flow variance, and fuses modality-specific scores via Minkowski pooling. Features are normalized to a common scale relative to analytical bounds, ensuring cross-sequence comparability. We present the framework architecture, report initial verification results on synthetic stimuli with known complexity properties, and outline a parametric sensitivity analysis evaluating the effect of Entropy Weight Method (EWM) regularization, motion scaling, and pooling exponent on discriminability across gameplay sequences of varying complexity.
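To make the normalization and fusion steps concrete, the following is a minimal Python sketch, not the MAV-C implementation. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the function names `normalized_entropy` and `minkowski_pool` are hypothetical, the entropy feature is normalized by its analytical upper bound log2(K) for a K-bin histogram, and the fusion uses a standard weighted Minkowski mean with exponent `p`.

```python
import math
from typing import Sequence


def normalized_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy of a histogram, scaled to [0, 1].

    Assumption: the analytical bound is log2(K) for K bins, the maximum
    entropy reached by a uniform histogram.
    """
    total = sum(counts)
    k = len(counts)
    if total <= 0 or k < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(k)


def minkowski_pool(scores: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted Minkowski pooling: (sum_i w_i * s_i**p) ** (1 / p).

    Weights are rescaled to sum to 1, so p = 1 gives the weighted mean
    and larger p pulls the result toward the largest score.
    """
    if p <= 0:
        raise ValueError("pooling exponent p must be positive")
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    pooled = sum((w / total_weight) * (s ** p) for s, w in zip(scores, weights))
    return pooled ** (1.0 / p)


# Hypothetical inputs: an audio score from a 4-bin histogram and a
# visual score already normalized to [0, 1].
audio_score = normalized_entropy([10, 10, 10, 10])
visual_score = 0.4
avc = minkowski_pool([audio_score, visual_score], weights=[1.0, 1.0], p=2.0)
```

Under these assumptions, varying `p` is one simple way to see how a pooling exponent shifts the fused score between the mean of the two modalities and the more complex modality, which is the kind of effect a sensitivity analysis over the exponent would examine.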