This paper presents an application-driven objective measurement framework for benchmarking spatial audio reproduction in smart glasses and extended reality (XR) headsets. Wearable XR devices render virtual spatial audio while users simultaneously perceive the physical acoustic environment, creating evaluation challenges distinct from conventional headphone-based playback. Existing approaches are often inconsistent, focusing on limited device classes or metrics, and do not support unified cross-device benchmarking. The proposed framework derives benchmark attributes from two application dimensions: the acoustic role of the device and the usage context. Measurements are organized into four groups: baseline playback checks, cue fidelity, sound leakage, and robustness to wearing variability. The framework adopts a system-level methodology that characterizes observable device behavior without requiring access to proprietary internal parameters, enabling reproducible cross-device comparison. An illustrative application of the framework is presented in a companion paper.