This paper investigates the impact of authentic spatial audio on verbal working memory (WM) within a WebXR-based virtual reality learning environment (VRLE). Prior virtual reality (VR) research has focused predominantly on visual modalities, and the influence of auditory realism, particularly authentic spatialised sound, on cognitive performance remains underexplored. To address this gap, a controlled within-subjects experiment was conducted with an adapted automated operation span (AOSPAN) task under two conditions: with and without authentic spatial sound. A total of 40 participants completed the study using a head-mounted display in a controlled laboratory setting. The VRLE was implemented using web-based technologies, incorporating ambisonic audio capture and real-time spatial sound rendering. Statistical analysis revealed no significant differences in WM performance between conditions on any measured metric, including OSPAN score, total correct recall, and error rates, although the results consistently showed a non-significant trend toward improved performance in the presence of authentic spatial sound. In contrast, subjective measures indicated substantial enhancements in perceived presence, immersion, and realism, together with a clear user preference for the condition in which spatial ambient audio was enabled. These findings suggest that, although authentic spatial sound does not significantly influence verbal WM performance, it enhances experiential quality without increasing cognitive load. The study highlights the importance of incorporating realistic auditory environments in VR design for education, where they can support user engagement without impairing verbal WM performance.