Source Localization

Introduction extracted from [1]

The human ability to tell whether a sounding object is near or far is fully developed when we are just a few months old. In fact, the localization mechanisms used by the human auditory system develop before the age of one. Localizing sound sources is possible because the brain analyzes the signals arriving at both ears, using subtle differences in intensity together with other spectral and timing cues to recognize the direction of one or even several sound sources.

While localizing sound sources requires no special effort from a human listener, for machines sound source localization in a room is a complicated process: sound sources do not all share the same spectral properties, they occur at different time instants and at different spatial positions, and the process is strongly affected by reflections. Acoustic reflections dominate the perception of sound in a room, modifying the spatial characteristics of the perceived sources.

Over the last decades, the scientific community has devoted considerable effort to localizing sound sources in space by means of microphone array systems, and achieving high localization performance is still a challenge today. Microphone arrays have been used in many applications, such as speech recognition, teleconferencing systems, hands-free speech acquisition, digital hearing aids, video gaming and autonomous robots.

Algorithms for sound source localization (SSL) can be broadly divided into indirect and direct approaches.

  • Indirect approaches usually follow a two-step procedure: they first estimate the Time Difference Of Arrival (TDOA) between microphone pairs and, afterwards, they estimate the source position based on the geometry of the array and the estimated delays.
  • Direct approaches perform TDOA estimation and source localization in one single step by scanning a set of candidate source locations and selecting the most likely position as an estimate of the source location. In addition, information-theoretic approaches have also been shown to be very powerful in source localization tasks.
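The first step of an indirect approach, estimating the TDOA for one microphone pair, can be sketched with GCC-PHAT (the estimator listed in the lecture contents). This is a minimal illustration under stated assumptions, not the implementation from [1]: the function name `gcc_phat`, the sampling rate and the synthetic test signal are all hypothetical.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate how much x2 lags x1, in seconds, using GCC-PHAT."""
    n = len(x1) + len(x2)                  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)          # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so index 0 is lag -max_shift and the last index is lag +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs

# Synthetic check (hypothetical data): x2 is x1 delayed by 5 samples
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau = gcc_phat(x1, x2, fs, max_tau=1e-3)
print("estimated delay in samples:", round(tau * fs))
```

In a full indirect method, the delays estimated for several microphone pairs would then be combined with the array geometry in a second step to obtain the source position.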

The SRP-PHAT algorithm is a direct approach that has been shown to be very robust under difficult acoustic conditions. The algorithm is commonly interpreted as a beamforming-based approach that searches for the candidate source position that maximizes the output of a steered delay-and-sum beamformer.
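The grid search behind SRP-PHAT can be sketched as follows: for every candidate position, the PHAT-weighted cross-correlation of each microphone pair is read at the delay that position would produce, and the values are summed. This is a simplified sketch rather than the formulation in [1]; the function name `srp_phat`, the nearest-sample rounding of delays and the default speed of sound are assumptions.

```python
import itertools
import numpy as np

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Return the grid point with the highest steered response power, and the power map.

    signals: (M, N) array, one row per microphone.
    mic_pos: (M, D) microphone coordinates in meters.
    grid:    (G, D) candidate source positions in meters.
    """
    M, N = signals.shape
    n = 2 * N                                        # zero-pad to avoid circular wrap-around
    spectra = np.fft.rfft(signals, n=n, axis=1)
    # Distance from every candidate position to every microphone: shape (G, M)
    dists = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(grid))
    for i, j in itertools.combinations(range(M), 2):
        cross = spectra[j] * np.conj(spectra[i])
        cross /= np.abs(cross) + 1e-12               # PHAT weighting
        cc = np.fft.irfft(cross, n=n)                # peaks at the lag of mic j relative to mic i
        # Expected lag (in samples) of mic j relative to mic i for each candidate
        lags = np.rint((dists[:, j] - dists[:, i]) * fs / c).astype(int)
        power += cc[lags % n]                        # negative lags wrap to the end of cc
    return grid[np.argmax(power)], power
```

The cost grows with the number of grid points times the number of microphone pairs, which is why the resolution of the candidate grid is the main design choice in a sketch like this.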

Lecture Contents

This lecture covered the general techniques used to perform source localization (SL).

  1. SL using TDOA hyperbolic positioning – more info [1]
  2. Beamforming basics – more info [2,3,4]
    1. Introduction
    2. Delay-and-sum beamforming
    3. Weighted-sum beamforming
    4. Near-field beamforming
  3. Time delay estimation (TDE) via Generalized Cross Correlation (GCC) – more info [1]
  4. SRP-PHAT method – more info [1]
  5. Conclusions
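
As an illustration of the delay-and-sum and weighted-sum beamformers listed above, the following sketch steers a linear array toward a given angle by undoing the modeled arrival delays in the frequency domain and summing the channels. It is a sketch under assumptions, not material from [2]: the far-field plane-wave model, the sign convention for the delays and the function name `delay_and_sum` are all hypothetical choices.

```python
import numpy as np

def delay_and_sum(signals, mic_x, theta, fs, weights=None, c=343.0):
    """Steer a linear array toward angle theta (radians from broadside).

    signals: (M, N) array, one row per microphone.
    mic_x:   (M,) microphone positions along the array axis, in meters.
    Assumed far-field model: microphone m receives s(t - mic_x[m] * sin(theta) / c).
    """
    M, N = signals.shape
    if weights is None:
        weights = np.full(M, 1.0 / M)      # uniform weights: plain delay-and-sum
    delays = mic_x * np.sin(theta) / c     # modeled arrival delay per microphone
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Undo each delay with a linear phase shift (this allows fractional-sample delays)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Weighted sum across microphones, then back to the time domain
    return np.fft.irfft(weights @ aligned, n=N)
```

Passing non-uniform `weights` (for example a window across the array) gives the weighted-sum variant; signals arriving from the steered direction add coherently, while signals from other directions are attenuated.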


[1] A. Martí Guerola, “Multichannel Audio Processing for Speaker Localization, Separation and Enhancement”, PhD thesis, Universitat Politècnica de València, Valencia, Spain.

[2] Iain McCowan, “Microphone Arrays: A Tutorial”


[4] Microphone Beamforming Simulation Tool

Wave Field Synthesis

Wave Field Synthesis (WFS) is a spatial sound rendering technique that generates a true sound field using loudspeaker arrays [Berkhout et al., 1993]. Wave fields are synthesized from virtual sound sources placed at some position in the sound stage behind the loudspeakers or even inside the listening area. As a result, unlike traditional spatialization techniques such as stereo or surround sound, the localization of virtual sources in WFS does not depend on or change with the listener’s position.

With WFS-based sound reproduction, sound fields can be generated in a spatially and temporally correct way. Listeners therefore perceive the sound as actually originating at the position of the virtual sources. Furthermore, the synthesized wave field is correct over an extended listening area, with much larger dimensions than the “sweet spot” of current surround systems such as commercial 5.1-channel surround.

The major drawback is that the number of loudspeakers needed for an acceptable sound field representation is very high (usually on the order of hundreds). Moreover, the WFS algorithm requires a considerable amount of computational power. As a consequence, three-dimensional WFS systems are still not practical, although mathematical formulations are already available.
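
To make the rendering idea concrete, the sketch below computes, for a virtual point source behind a linear loudspeaker array, the delay and relative gain with which each loudspeaker would replay the source signal. It is a deliberately simplified delay-and-gain model, not a complete WFS driving function: the spectral pre-filter, array tapering and spatial aliasing are ignored, and the amplitude weighting, the array layout on the line y = 0 and the function name `wfs_delays_gains` are assumptions.

```python
import numpy as np

def wfs_delays_gains(speaker_pos, source_pos, c=343.0):
    """Per-loudspeaker delay (s) and relative gain for a virtual point source.

    speaker_pos: (L, 2) loudspeaker positions on the line y = 0 (assumed layout).
    source_pos:  (2,) virtual source position behind the array (y < 0).
    """
    diff = speaker_pos - source_pos
    r = np.linalg.norm(diff, axis=1)       # source-to-loudspeaker distances
    delays = r / c                         # travel time of the virtual wavefront
    cos_phi = diff[:, 1] / r               # obliquity relative to the array normal
    gains = cos_phi / np.sqrt(r)           # simplified amplitude weighting (assumption)
    # Report delays relative to the earliest loudspeaker and gains relative to the loudest
    return delays - delays.min(), gains / gains.max()

# Hypothetical setup: 21 loudspeakers spaced 10 cm apart, source 1 m behind the center
xs = np.linspace(-1.0, 1.0, 21)
speakers = np.column_stack((xs, np.zeros_like(xs)))
delays, gains = wfs_delays_gains(speakers, np.array([0.0, -1.0]))
```

Because one delayed and scaled copy of the signal is needed per loudspeaker and per virtual source, the processing load grows with both counts, which is consistent with the computational cost noted above.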

See more at:


Jens Ahrens – Analytic Methods of Sound Field Synthesis [electronic resource]

Basilio Pueo Ortega – Analysis and enhancement of multiactuator panels for wave field synthesis reproduction (Appendix A)

Sergio Bleda Pérez – Contribuciones a la implementación de sistemas de Wavefield Synthesis – Section 2.5 (in Spanish)

Sascha Spors – The Theory of Wave Field Synthesis Revisited

Software: an applet that simulates wave field synthesis; an open-source, cross-platform application for performing wave field synthesis with large speaker arrays.