This study aims to determine the ‘best’-matching HRTFs during an onboarding task for an audio-only virtual reality (VR) experience, using a ‘shooting down sound sources’ task. It is motivated by the needs of blind and visually impaired gamers, who may rely more heavily on accurate rendering of auditory spatial cues to succeed in an audio-only VR experience. We present an exploratory study using an experimental VR test platform that renders ‘target’ sound sources in a virtual environment and logs tracking characteristics of the head, hand-held controller and body while participants localise and ‘shoot’ audible ‘targets’ that are either visible (for task familiarisation) or invisible. Four game-relevant sound stimuli and three different HRTFs were tested across eight sessions on two separate days. We report data collected from fifteen sighted participants, which demonstrate an ability to localise the sound sources accurately. The tracking data suggest various search patterns (e.g. hemisphere swaps and direction reversals) associated with ‘weak’ localisation cues and possible ambiguities. These search patterns are likely all quantifiable via angular error, response time, path length, search directions, number of reversals, and search speed, as determined from the tracking characteristics.
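
To make the proposed measures concrete, the following is a minimal sketch of how angular error, response time, path length, number of reversals and search speed might be derived from a single trial. It is an illustration under stated assumptions and not the analysis used in the study: it assumes the log reduces to time-stamped yaw angles in degrees, and the names `wrap_deg`, `search_metrics` and the jitter threshold `min_step` are hypothetical.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def search_metrics(times, yaws, target_yaw, min_step=0.5):
    """Summarise one trial from time-stamped yaw samples (degrees).

    Assumptions (not from the study): yaw-only tracking, the last sample
    is the moment of the 'shot', and steps below min_step are jitter.
    """
    # Signed angular step between consecutive samples, via the shortest arc.
    steps = [wrap_deg(b - a) for a, b in zip(yaws, yaws[1:])]
    path_length = sum(abs(s) for s in steps)

    # A reversal is a change in the sign of the step, ignoring jitter.
    reversals = 0
    last_sign = 0
    for s in steps:
        if abs(s) < min_step:
            continue
        sign = 1 if s > 0 else -1
        if last_sign and sign != last_sign:
            reversals += 1
        last_sign = sign

    response_time = times[-1] - times[0]
    return {
        "angular_error": abs(wrap_deg(yaws[-1] - target_yaw)),
        "response_time": response_time,
        "path_length": path_length,
        "reversals": reversals,
        "mean_speed": path_length / response_time if response_time > 0 else 0.0,
    }


# Hypothetical trial: overshoot past the target, then two corrections.
trial = search_metrics(
    times=[0.0, 0.5, 1.0, 1.5, 2.0],
    yaws=[0.0, 20.0, 40.0, 30.0, 35.0],
    target_yaw=38.0,
)
```

Wrapping each step to the shortest arc keeps the path length and reversal count from being inflated when the heading crosses the ±180° boundary. A full analysis would also need elevation and a definition of hemisphere swaps, which this sketch omits.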