AES 2026 AVARIG Conference: Full Schedule

Schedule as of May 2026 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

11:00am CEST

Residual Learning for Neural Ambisonics Encoders

Thursday July 2, 2026 11:00am - 11:30am CEST

Emerging wearable devices such as smartglasses and extended reality headsets demand high-quality spatial audio capture from compact, head-worn microphone arrays. Ambisonics provides a device-agnostic spatial audio representation by mapping array signals to spherical harmonic (SH) coefficients. In practice, however, accurate encoding remains challenging. While traditional linear encoders are signal-independent and robust, they amplify low-frequency noise and suffer from high-frequency spatial aliasing. On the other hand, neural network approaches can outperform linear encoders but they often assume idealized microphones and may perform inconsistently in real-world scenarios. To leverage their complementary strengths, we introduce a residual-learning framework that refines a linear encoder with corrections from a neural network. Using measured array transfer functions from smartglasses, we compare a UNet-based encoder from the literature with a new recurrent attention model. Our analysis reveals that both neural encoders only consistently outperform the linear baseline when integrated within the residual learning framework. In the residual configuration, both neural models achieve consistent and significant improvements across all tested metrics for in-domain data and moderate gains for out-of-domain data. Yet, coherence analysis indicates that all neural encoder configurations continue to struggle with directionally accurate high-frequency encoding.
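The residual idea in the abstract, a linear encoder whose output is refined by an additive learned correction, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the Tikhonov-regularized least-squares encoder, the function names, the random stand-ins for measured array transfer functions and SH matrices, and the zero-valued placeholder that takes the place of a trained network.

```python
import numpy as np

def linear_encoder(A, Y, reg=1e-3):
    """Regularized least-squares encoder E (n_sh x n_mics) with E @ A ~= Y.

    A: array transfer functions, shape (n_mics, n_dirs), one frequency bin.
    Y: target SH values for the same directions, shape (n_sh, n_dirs).
    """
    gram = A @ A.conj().T + reg * np.eye(A.shape[0])
    rhs = Y @ A.conj().T
    # E @ gram = rhs and gram is Hermitian, so solve gram @ E^H = rhs^H.
    return np.linalg.solve(gram, rhs.conj().T).conj().T

def residual_encode(x, E, correction):
    """Linear estimate plus an additive correction (the residual connection)."""
    return E @ x + correction(x)

def zero_correction(x, n_sh=4):
    # Hypothetical stand-in for a trained network: predicts no correction.
    return np.zeros((n_sh,) + x.shape[1:], dtype=complex)

# Random stand-ins (assumption): 5 microphones, 40 directions, 4 SH channels.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 40)) + 1j * rng.standard_normal((5, 40))
Y = rng.standard_normal((4, 40)) + 1j * rng.standard_normal((4, 40))
E = linear_encoder(A, Y)

x = rng.standard_normal((5, 10)) + 1j * rng.standard_normal((5, 10))
sh = residual_encode(x, E, zero_correction)  # equals E @ x for a zero correction
```

With this structure, a network that outputs zeros reproduces the linear encoder exactly, so the learned part only has to model the deviation from the linear estimate.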

Speakers

Thomas Deppisch

Yang Gao

Manan Mittal

Benjamin Stahl

Chris Hold

David Alon

Zamir Ben-Hur

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

HOA, Lecture

11:30am CEST

Evaluation of a Higher Order Ambisonic Renderer with Reverberation Compensation via Crosstalk Inversion

Thursday July 2, 2026 11:30am - 12:00pm CEST

IRCAM:ESPRO (HOA)

In this work, the authors evaluate a higher-order Ambisonic (HOA) renderer that compensates for reverberant characteristics of the intended listening room. This is accomplished by decoding an HOA signal to control points distributed around a boundary surrounding the listening area, then convolving the control signal with a compensation filter derived via frequency-domain matrix inversion of room impulse responses (RIRs) from loudspeakers to control points. First, a comparison is performed over renderers utilizing increasing control point density and evaluated using simulated RIRs. Then, robustness of the renderer to simulation inaccuracy is evaluated experimentally in a listening room. Metrics of reconstructed soundfield directionality and reverberation are compared to those obtained from a conventional HOA decoder, and results demonstrate an increase in source directivity and a reduction in reverberation time for both directional and diffuse stimuli.
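As a rough illustration of a compensation filter obtained by inverting loudspeaker-to-control-point RIRs in the frequency domain, the sketch below computes a per-bin regularized left inverse. The abstract only states "matrix inversion"; the Tikhonov regularization term `beta`, the function name, the array shapes, and the random RIRs are assumptions made for this example.

```python
import numpy as np

def compensation_filters(rirs, n_fft, beta=1e-6):
    """Per-bin regularized inverse of the loudspeaker-to-control-point responses.

    rirs: impulse responses, shape (n_ctrl, n_spk, n_taps).
    Returns W with shape (n_bins, n_spk, n_ctrl), so that for each bin
    loudspeaker_signals = W[bin] @ control_signals.
    """
    H = np.fft.rfft(rirs, n=n_fft, axis=-1)   # (n_ctrl, n_spk, n_bins)
    H = np.moveaxis(H, -1, 0)                 # (n_bins, n_ctrl, n_spk)
    Hh = np.conj(np.swapaxes(H, -1, -2))      # Hermitian transpose per bin
    gram = Hh @ H + beta * np.eye(H.shape[-1])
    # Solves gram @ W = Hh for every frequency bin at once.
    return np.linalg.solve(gram, Hh)

# Random stand-in RIRs (assumption): 6 control points, 4 loudspeakers, 16 taps.
rng = np.random.default_rng(1)
rirs = rng.standard_normal((6, 4, 16))
W = compensation_filters(rirs, n_fft=32, beta=1e-12)
```

With at least as many control points as loudspeakers and a very small `beta`, `W` acts as a left inverse of the response matrix in each bin. A larger `beta` trades inversion accuracy for lower filter gain.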

Speakers

Alex Tung

Mark Rau

IRCAM:ESPRO (HOA) 1, place Igor Stravinsky Paris 4e

HOA, Lecture

12:00pm CEST

Investigating the “Ring of Silence” in Loudspeaker and Binaural Reproduction Using Advanced Ambisonic Decoding Strategies

Thursday July 2, 2026 12:00pm - 12:30pm CEST

IRCAM:ESPRO (HOA)

Higher-Order Ambisonics (HOA) reproduction with conventional mode-matching decoders can exhibit the so-called “ring of silence,” characterised by sound level reduction in specific spatial or spectral regions. This effect arises in loudspeaker reproduction when the number of loudspeakers exceeds that required by the Ambisonic order, and in binaural rendering when head-related transfer functions (HRTFs) are sampled at a higher spatial resolution than supported by the input signal. This paper investigates the extent to which advanced Ambisonic decoding strategies can mitigate this artefact. In particular, decoders based on Lasso regularisation and magnitude least-squares (magLS) are evaluated through numerical simulations in both loudspeaker and binaural reproduction scenarios. The results show that both approaches significantly reduce the prominence of the ring of silence compared to conventional minimum-norm mode-matching decoders. In loudspeaker reproduction, a more uniform spatial distribution of SPL is obtained, while in binaural rendering, spectral consistency is improved. An interpretation of these results is proposed, linking the observed behaviour to the underlying optimisation criteria of the decoding process. The results indicate that the ring of silence is not an inherent limitation of Ambisonics, but rather a consequence of the decoding strategy, and can be effectively mitigated through appropriate decoder design.
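For orientation, a conventional minimum-norm mode-matching decoder can be sketched as the pseudoinverse of the harmonic matrix sampled at the loudspeaker directions. The example below is a simplified 2D analogue using circular harmonics instead of spherical harmonics. The layout, order, and function names are assumptions, and the sketch does not reproduce the paper's simulations or its Lasso and magLS decoders.

```python
import numpy as np

def circular_harmonics(order, angles):
    """Real 2D circular harmonics, shape (2 * order + 1, len(angles))."""
    rows = [np.ones_like(angles)]
    for m in range(1, order + 1):
        rows.append(np.sqrt(2) * np.cos(m * angles))
        rows.append(np.sqrt(2) * np.sin(m * angles))
    return np.stack(rows)

def min_norm_decoder(order, speaker_angles):
    """Decoder D (n_spk x n_harmonics): minimum-norm gains with Y @ D = I."""
    Y = circular_harmonics(order, speaker_angles)
    return np.linalg.pinv(Y)

# Assumed layout: 8 equally spaced loudspeakers for a first-order signal,
# i.e. more loudspeakers than the order strictly requires.
speaker_angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
Y = circular_harmonics(1, speaker_angles)
D = min_norm_decoder(1, speaker_angles)

b = np.array([1.0, 0.5, -0.25])  # example first-order input coefficients
gains = D @ b                    # loudspeaker gains with the smallest L2 norm
```

Because the system is underdetermined, many gain vectors reproduce the same harmonic coefficients. The pseudoinverse picks the one with the smallest L2 norm, which is the optimisation criterion that the alternative decoders replace.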

Speakers

Filippo Maria Fazi

Jacob Hollebon

Yueheng Li
