Given the limited research on the use of extended reality (XR) technologies in remote music instruction from the perspective of music tutors, this work examines the perceptual importance of voice directivity within a virtual reality (VR) environment. In particular, it investigates listeners' ability to discriminate between a measured vocal directivity pattern and a slightly modified omnidirectional directivity pattern. Two listening tests were conducted to probe directivity perception under (i) static and (ii) dynamic listener conditions within a simulated music practice room, integrating third-order Ambisonic room impulse responses (RIRs) and head-tracked binaural reproduction within an interactive Unity-based interface. The static test used an ABX discrimination paradigm to assess the detectability of directivity differences as a function of location and stimulus content. The dynamic test involved free navigation around a virtual singer, with participants rating the perceived directional plausibility, the naturalness of the sound emission, and the adequacy of the experience for assessing the singer’s vocal characteristics. The results suggest that while listeners can detect differences between vocal directivity patterns under controlled listening conditions, such differences may become less perceptually salient during dynamic interaction within a virtual environment. Nevertheless, the overall positive evaluations in the dynamic listening test indicate that the implemented spatial audio approach provides a plausible and effective auditory experience, supporting its potential use in XR-based applications for remote music instruction and performance evaluation.