Source Localization

Introduction extracted from [1]

The human ability to tell whether a sounding object is near or far is fully developed when we are just a few months old; in fact, the localization mechanisms used by the human auditory system develop before we are one year old. Sound source localization is possible because the human brain analyzes the signals arriving at our ears, using subtle differences in intensity and other spectral and timing cues to recognize the direction of one or even several sound sources.

While localizing sound sources requires no special effort from a human listener, it is a complicated task for machines, especially inside a room: sound objects have different spectral properties, they occur at different time instants and at different spatial positions, and the whole process is strongly affected by reflections. Acoustic reflections dominate the perception of sound in a room, modifying the spatial characteristics of the perceived sources.

Over the last decades, the scientific community has devoted considerable effort to localizing sound sources in space with microphone array systems, yet achieving high localization performance is still a challenge today. Microphone arrays have been used in many applications, such as speech recognition, teleconferencing systems, hands-free speech acquisition, digital hearing aids, video games and autonomous robots.

Algorithms for sound source localization (SSL) can be broadly divided into indirect and direct approaches.

  • Indirect approaches usually follow a two-step procedure: they first estimate the Time Difference Of Arrival (TDOA) between microphone pairs and, afterwards, they estimate the source position based on the geometry of the array and the estimated delays.
  • Direct approaches perform TDOA estimation and source localization in a single step by scanning a set of candidate source locations and selecting the most likely position as the estimate of the source location. Information-theoretic approaches have also proven powerful in source localization tasks.
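
The two-step indirect procedure can be illustrated with a minimal Python/NumPy sketch. It is not the method of [1]; it rests on stated assumptions: a simulated, reverberation-free 2-D scene with four microphones on the corners of a 4 m x 4 m room, a speed of sound of 343 m/s, and propagation delays rounded to whole samples. Step one estimates the TDOA of each microphone relative to microphone 0 with GCC-PHAT; step two picks the grid point whose predicted range differences best fit those TDOAs in the least-squares sense. The brute-force grid search stands in for a closed-form hyperbolic solver, and the helper names `gcc_phat` and `locate_from_tdoas` are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air (m/s)


def gcc_phat(ref, sig, fs, max_tau):
    """Step 1 (hypothetical helper): delay in seconds of `sig` relative to `ref` via GCC-PHAT."""
    n = len(ref) + len(sig)                      # zero-pad so the correlation is linear, not circular
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-12                       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = max(1, min(int(fs * max_tau), n // 2))
    # reorder circular lags so that index i corresponds to lag (i - max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def locate_from_tdoas(mics, tdoas, xs, ys):
    """Step 2 (hypothetical helper): grid point whose TDOAs w.r.t. mic 0 best fit `tdoas`."""
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    cand = np.stack([gx.ravel(), gy.ravel()], axis=1)                   # (K, 2)
    dist = np.linalg.norm(cand[:, None, :] - mics[None, :, :], axis=2)   # (K, M)
    model = (dist[:, 1:] - dist[:, :1]) / C                              # predicted TDOAs (K, M-1)
    err = np.sum((model - tdoas) ** 2, axis=1)                           # least-squares misfit
    return cand[np.argmin(err)]


# --- assumed toy scenario: 4 mics on the corners of a 4 m x 4 m room ---
fs = 16000
mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
true_src = np.array([1.0, 2.5])

rng = np.random.default_rng(0)
n_samp, pad = 4096, 1000
s = rng.standard_normal(n_samp + pad)                                  # white-noise source
delays = np.rint(np.linalg.norm(mics - true_src, axis=1) / C * fs).astype(int)
signals = np.stack([s[pad - d: pad - d + n_samp] for d in delays])     # x_i[n] = s[n - d_i]

# Step 1: TDOA of every microphone relative to microphone 0
tdoas = np.array([
    gcc_phat(signals[0], signals[i], fs, np.linalg.norm(mics[i] - mics[0]) / C)
    for i in range(1, len(mics))
])
# Step 2: source position from the TDOAs and the array geometry
grid = np.arange(0.0, 4.0 + 1e-9, 0.05)
est_src = locate_from_tdoas(mics, tdoas, grid, grid)
print(est_src)
```

Because each delay is decided independently in step one, a single wrong GCC peak (for instance, one produced by a strong reflection) feeds straight into the position estimate, which is one reason direct methods tend to be preferred in reverberant rooms.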

The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a direct approach that has proven very robust under difficult acoustic conditions. It is commonly interpreted as a beamforming-based method that searches for the candidate source position maximizing the output power of a steered delay-and-sum beamformer.
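
The SRP-PHAT search can be sketched as follows; this is an illustration under assumptions, not the implementation described in [1]. For every candidate position, the PHAT-weighted cross-correlation of each microphone pair is read at the lag that candidate would produce, the pairwise values are summed, and the candidate with the largest sum is taken as the estimate. Assumptions: NumPy, a speed of sound of 343 m/s, a 2-D grid with 5 cm spacing, a simulated reverberation-free scene with light sensor noise, and per-microphone steering delays rounded to whole samples. `srp_phat` is a hypothetical helper name.

```python
import itertools

import numpy as np

C = 343.0  # assumed speed of sound in air (m/s)


def srp_phat(signals, mics, candidates, fs):
    """Hypothetical helper: (best candidate, SRP-PHAT map) for signals of shape (M, N)."""
    n_mics, n_samp = signals.shape
    n = 2 * n_samp                                            # zero-padding for linear correlation
    spectra = np.fft.rfft(signals, n=n, axis=1)
    # per-mic propagation delay, in whole samples, for every candidate: shape (K, M)
    dist = np.linalg.norm(candidates[:, None, :] - mics[None, :, :], axis=2)
    steer = np.rint(dist / C * fs).astype(int)
    power = np.zeros(len(candidates))
    for k, l in itertools.combinations(range(n_mics), 2):
        cross = spectra[l] * np.conj(spectra[k])
        cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)  # GCC-PHAT, circular lag layout
        lag = steer[:, l] - steer[:, k]                       # expected delay of mic l w.r.t. mic k
        power += cc[lag % n]                                  # negative lags wrap to the end
    return candidates[np.argmax(power)], power


# --- assumed toy scenario: 4 mics on the corners of a 4 m x 4 m room ---
fs = 16000
mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
true_src = np.array([1.0, 2.5])

rng = np.random.default_rng(1)
n_samp, pad = 4096, 1000
s = rng.standard_normal(n_samp + pad)                                  # white-noise source
delays = np.rint(np.linalg.norm(mics - true_src, axis=1) / C * fs).astype(int)
signals = np.stack([s[pad - d: pad - d + n_samp] for d in delays])     # x_i[n] = s[n - d_i]
signals += 0.1 * rng.standard_normal(signals.shape)                    # light sensor noise

grid = np.arange(0.0, 4.0 + 1e-9, 0.05)
gx, gy = np.meshgrid(grid, grid, indexing="ij")
candidates = np.stack([gx.ravel(), gy.ravel()], axis=1)
est_src, srp_map = srp_phat(signals, mics, candidates, fs)
print(est_src)
```

Summing the pairwise GCC-PHAT values in this way equals, up to a constant offset and scale, the output power of a PHAT-whitened delay-and-sum beamformer steered to the candidate, which is the beamforming interpretation mentioned above. No delays are estimated explicitly, so a spurious peak in one pair only lowers the contrast of the map rather than displacing the estimate outright.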

Lecture Contents

In this lecture, we covered the general techniques used to perform source localization (SL).

  1. SL using TDOA hyperbolic positioning – more info [1]
  2. Beamforming Basics – more info [2,3,4]
    1. Introduction
    2. Delay and Sum beamforming
    3. Weighted Sum beamforming
    4. Near-field beamforming
  3. Time Delay Estimation (TDE) via Generalized Cross Correlation (GCC) – more info [1]
  4. SRP-PHAT Method – more info [1]
  5. Conclusions


[1] A. Martí Guerola, “Multichannel Audio Processing for Speaker Localization, Separation and Enhancement,” PhD thesis, Universitat Politècnica de València, Valencia, Spain.

[2] I. McCowan, “Microphone Arrays: A Tutorial.”


[4] Microphone Beamforming Simulation Tool