In this study, we present SCHuBERT, a real-time end-to-end Piano Music Emotion Recognition (PMER) system that operates directly on raw audio and fine-tunes DistilHuBERT for short-window classification on the Valence–Arousal (V–A) plane. Designed for low latency and high responsiveness, the system is particularly well suited for immersive applications such as Virtual and Augmented Reality (VR/AR). Compared with both audio- and symbolic-domain baselines, SCHuBERT achieves strong accuracy in four-quadrant (4Q) classification as well as in binary arousal and valence tasks, while maintaining low computational overhead for real-time operation.