Head-related transfer functions (HRTFs) are fundamental to binaural rendering of spatial audio. Personalized HRTFs have been shown to improve localization accuracy and to reduce perceptual artifacts and directional ambiguities. However, acquiring such HRTFs is time-consuming and requires costly measurement setups. To address this limitation, this article investigates the use of deep learning models to estimate personalized HRTFs from ear shape representations. We propose and evaluate three architectures with various types of input data and identify the lowest spectral distance error they achieve when predicting true HRTF magnitude spectra. The best model we evaluated achieves a test log-spectral distortion (LSD) of 4.93 dB. We also establish a performance ranking based on input data types and architectural choices.
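For readers unfamiliar with the metric, the following is a minimal sketch of how an LSD value in dB can be computed between a true and a predicted magnitude spectrum. It assumes the commonly used definition (the root-mean-square, over frequency bins, of the magnitude ratio expressed in dB, then averaged over spectra); the abstract does not state the exact formulation, frequency range, or averaging used in this article, so the function names `log_spectral_distortion` and `mean_lsd` and the `eps` guard are illustrative assumptions only.

```python
import math


def log_spectral_distortion(true_mag, pred_mag, eps=1e-12):
    """RMS difference in dB between two magnitude spectra.

    Assumed definition: sqrt(mean_k (20*log10(|H_k| / |H_hat_k|))^2).
    `eps` is a hypothetical guard against log(0), not taken from the article.
    """
    if len(true_mag) != len(pred_mag) or len(true_mag) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_sum = 0.0
    for h, h_hat in zip(true_mag, pred_mag):
        diff_db = 20.0 * math.log10((abs(h) + eps) / (abs(h_hat) + eps))
        squared_sum += diff_db ** 2
    return math.sqrt(squared_sum / len(true_mag))


def mean_lsd(true_set, pred_set):
    """Average the per-spectrum LSD over a collection of spectra
    (for example, all directions and ears in a test set)."""
    if len(true_set) != len(pred_set) or len(true_set) == 0:
        raise ValueError("sets must be non-empty and of equal length")
    values = [log_spectral_distortion(t, p) for t, p in zip(true_set, pred_set)]
    return sum(values) / len(values)
```

Under this definition, a prediction that is off by a constant 5 dB in every frequency bin yields an LSD of 5 dB, which gives a rough sense of scale for the 4.93 dB reported above.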