Source Localization

Introduction extracted from [1]

The human ability to tell whether a sounding object is near or far is fully developed when we are just a few months old. In fact, the localization mechanisms used by the human auditory system develop before we are one year old. Sound sources can be localized because the human brain analyzes all the signals arriving at our ears, using subtle differences in intensity together with other spectral and timing cues to recognize the direction of one or even several sound sources.

While localizing sound sources requires no special effort from a human listener, sound source localization in a room is a complicated process for machines: sound objects differ in their spectral properties, occur at different time instants and spatial positions, and the process is strongly affected by reflections. Acoustic reflections dominate the perception of sound in a room, modifying the spatial characteristics of the perceived sources.

Over the last decades, the scientific community has devoted considerable effort to localizing sound sources in space by means of microphone array systems, and achieving high localization performance is still a challenge today. Microphone arrays have been used in many applications, such as speech recognition, teleconferencing systems, hands-free speech acquisition, digital hearing aids, video gaming and autonomous robots.

Algorithms for sound source localization (SSL) can be broadly divided into indirect and direct approaches.

  • Indirect approaches usually follow a two-step procedure: they first estimate the Time Difference Of Arrival (TDOA) between microphone pairs and, afterwards, they estimate the source position based on the geometry of the array and the estimated delays.
  • Direct approaches perform TDOA estimation and source localization in one single step by scanning a set of candidate source locations and selecting the most likely position as an estimate of the source location. In addition, information-theoretic approaches have also been shown to be very effective in source localization tasks.
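The first step of the indirect approach, TDOA estimation for one microphone pair, can be sketched with the Generalized Cross-Correlation with PHAT weighting that appears in the lecture contents. This is only a minimal illustration: the function name `gcc_phat`, its parameters and the small regularization constant are assumptions made for this example, not code from [1].

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate how much x2 lags x1 (in seconds) with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means
    the sound reached microphone 2 later than microphone 1.
    """
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:               # optionally limit the search range
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that the lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(cc) - max_shift) / fs
```

In the second step, the delays estimated for several pairs would be combined with the array geometry (for instance by hyperbolic positioning) to obtain the source position.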

The SRP-PHAT algorithm is a direct approach that has been shown to be very robust under difficult acoustic conditions. The algorithm is commonly interpreted as a beamforming-based approach that searches for the candidate source position that maximizes the output of a steered delay-and-sum beamformer.
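The grid search behind SRP-PHAT can be sketched as follows: for every candidate position, the PHAT-weighted cross-correlation of each microphone pair is read at the delay that the candidate would produce, and the values are summed. This is a simplified sketch under stated assumptions (free-field propagation, a speed of sound of 343 m/s, delays rounded to the nearest sample); the names `srp_phat` and `_gcc_phat_full` are hypothetical and it is not the implementation from [1].

```python
import itertools
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def _gcc_phat_full(x1, x2, n):
    """PHAT-weighted cross-correlation; index k holds lag k, negative lags wrap."""
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12
    return np.fft.irfft(cross, n=n)

def srp_phat(signals, mic_pos, candidates, fs):
    """Return the candidate with the largest steered response power.

    signals:    (n_mics, n_samples) array of microphone signals
    mic_pos:    (n_mics, dim) microphone coordinates in metres
    candidates: (n_candidates, dim) grid of positions to scan
    """
    n_mics, n_samples = signals.shape
    n = 2 * n_samples
    power = np.zeros(len(candidates))
    for i, j in itertools.combinations(range(n_mics), 2):
        cc = _gcc_phat_full(signals[i], signals[j], n)
        # Delay of mic j relative to mic i that each candidate would produce
        di = np.linalg.norm(candidates - mic_pos[i], axis=1)
        dj = np.linalg.norm(candidates - mic_pos[j], axis=1)
        lags = np.rint((dj - di) / C * fs).astype(int)
        power += cc[lags % n]             # negative lags index from the end
    return candidates[np.argmax(power)], power
```

The cost grows with the number of candidates times the number of microphone pairs, which is why coarse-to-fine grids are a common design choice in practice.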

Lecture Contents

In this lecture, we covered the general techniques to perform Source Localization.

  1. Source localization using TDOA hyperbolic positioning – more info [1]
  2. Beamforming basics – more info [2,3,4]
    1. Introduction
    2. Delay and Sum beamforming
    3. Weighting Sum beamforming
    4. Near-field beamforming
  3. Time delay estimation (TDE) via Generalized Cross-Correlation (GCC) – more info [1]
  4. SRP-PHAT method – more info [1]
  5. Conclusions


[1] A. Martí Guerola, “Multichannel Audio Processing for Speaker Localization, Separation and Enhancement”. PhD Thesis. Universitat Politécnica de Valencia, Valencia, Spain.

[2] Iain McCowan, “Microphone Arrays: A Tutorial”


[4] Microphone Beamforming Simulation Tool


Wave Field Synthesis

WFS is a spatial sound rendering technique that generates a true sound field using loudspeaker arrays [Berkhout et al., 1993]. Wave fields are synthesized based on virtual sound sources at some position in the sound stage behind the loudspeakers or even inside the listening area. In other words, contrary to traditional spatialization techniques such as stereo or surround sound, the localization of virtual sources in WFS does not depend on or change with the listener’s position.

When using sound reproduction based on WFS, sound fields can be generated in a spatially and temporally correct way. Therefore, listeners experience the feeling that the origin of the sound is actually in the position of the virtual sources. Furthermore, the synthesized wave field is correct for an extended listening area, with much larger dimensions than the “sweet spot” of the current surround systems, such as the commercial 5.1 channel surround.

The major drawback is that the number of speakers needed for an acceptable sound field representation is very high (usually on the order of hundreds). Moreover, the WFS algorithm requires a considerable amount of computational power. As a consequence, three-dimensional WFS systems are still not practical, although mathematical formulations are already available.
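To give a feeling for how a virtual source behind the array is synthesized, the sketch below computes one delay and one relative gain per loudspeaker of a linear array. It is a strongly simplified illustration: it keeps only the propagation delay and the geometric amplitude term of a point-source driving function, and leaves out the frequency-dependent pre-filter and the tapering window used in real WFS systems. The function name, the array layout along the x-axis and the speed of sound are assumptions made for this example.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def wfs_delays_gains(speaker_x, source_pos, speaker_y=0.0):
    """Delay (s) and relative gain per loudspeaker for a virtual point source.

    The array lies along the x-axis at y = speaker_y and the virtual
    source is assumed to sit behind it (source y < speaker_y).
    """
    sx, sy = source_pos
    dx = np.asarray(speaker_x, dtype=float) - sx
    dy = speaker_y - sy                   # positive when the source is behind the array
    r = np.hypot(dx, dy)                  # source-to-loudspeaker distance
    delays = r / C                        # farther loudspeakers fire later
    gains = dy / r**1.5                   # cos(angle) / sqrt(r), up to a constant
    return delays, gains / gains.max()

# Example: 9 loudspeakers spaced 0.5 m apart, virtual source 1 m behind the centre
delays, gains = wfs_delays_gains(np.linspace(-2.0, 2.0, 9), (0.0, -1.0))
```

Because every loudspeaker needs its own delayed and weighted copy of every source signal, the processing load scales with the number of loudspeakers times the number of virtual sources, which is in line with the computational cost mentioned above.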

See more at:


Jens Ahrens – Analytic Methods of Sound Field Synthesis [Electronic resource]

Basilio Pueo Ortega – Analysis and enhancement of multiactuator panels for wave field synthesis reproduction (Annex A).

Sergio Bleda Pérez – Contribuciones a la implementación de sistemas de Wavefield Synthesis  – Section 2.5 (Spanish)

Sascha Spors – The Theory of Wave Field Synthesis Revisited

Software:  –  Applet that simulates wave field synthesis – Open-source, cross-platform application for performing wave field synthesis with large speaker arrays

BinauralSIM released!

We are pleased to present the beta release of BinauralSIM.


BinauralSIM lets you simulate binaural recordings using headphones.

Try out your own HRIRs or the KEMAR HRIRs and test your own signals. The software allows you to compare stereo vs binaural performance in real time.




BinauralSIM has been developed by Nadine Kroher and funded by PLA D’AJUTS DE SUPORT A LA QUALITAT I LA INNOVACIÓ DOCENTS PLAQUID, Universitat Pompeu Fabra, 2015.

Stereo and multi-loudspeaker reproduction

Some notions before starting:


Monophonic Sound

Monophonic sound is sound created by a single channel or speaker and is also known as monaural sound. For music reproduction, it was largely replaced by stereo (stereophonic) sound in the 1960s.

Stereophonic Sound

Stereo or Stereophonic sound is created by two independent audio channels or speakers and provides a sense of directionality because sounds can be heard from different directions.

The term stereophonic is derived from the Greek words stereos, which means solid, and phone, which means sound. Stereo sound can reproduce sounds and music from various directions or positions the way we hear things naturally, hence the term solid sound. Stereo sound is a common form of sound reproduction.
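One common way to create this sense of direction with two loudspeakers is intensity (amplitude) panning, where the same signal is sent to both speakers with different gains. The sketch below uses the tangent panning law with constant-power normalization; the function name, the sign convention (positive angles towards the left speaker) and the default ±30° loudspeaker placement are assumptions for this example.

```python
import numpy as np

def tangent_law_gains(angle_deg, base_angle_deg=30.0):
    """Left/right gains for a phantom source at angle_deg.

    Tangent law: tan(angle) / tan(base) = (gL - gR) / (gL + gR),
    normalized so that gL**2 + gR**2 = 1 (constant power).
    """
    ratio = np.tan(np.radians(angle_deg)) / np.tan(np.radians(base_angle_deg))
    ratio = np.clip(ratio, -1.0, 1.0)     # keep the source between the speakers
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = np.hypot(g_left, g_right)
    return g_left / norm, g_right / norm

# A centred source gets equal gains; a source at +30 degrees is left-only
centre = tangent_law_gains(0.0)
hard_left = tangent_law_gains(30.0)
```

Time difference stereo, also listed in the overview, would instead delay one channel with respect to the other.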

Multichannel Surround Sound

Multichannel sound, also known as surround sound, is created by at least four and typically up to seven independent audio channels or speakers placed in front of and behind the listener, so that the listener is surrounded by sound. Multichannel sound can be enjoyed on DVD music discs, DVD movies and some CDs.

In this lecture we describe the principles of two-channel stereo, analyze the most common configurations for multichannel reproduction and briefly describe the most widely used stereo recording techniques.

A detailed overview is given below:

  1. Introduction
  2. Two loudspeaker Stereo – More info in [1,2,3]
    1. Two channel (2-0) stereo
      1. Basic principles of loudspeaker stereo: ‘Blumlein Stereo’
      2. Cross-Talk
      3. Basic principles of loudspeaker stereo
      4. Intensity Stereo
      5. Time Difference Stereo
    2. Basic two-channel signal formats
    3. Limitations of two-channel loudspeaker stereo
  3. Multichannel stereo and surround systems – More info in [1]
    1. Three channel stereo (3-0)
    2. Four-channel surround (3-1 stereo)
    3. 5.1-channel surround (3-2 stereo)
    4. Other multichannel configurations
      1. 7.1-channel surround
      2. 10.2-channel surround
  4. Surround Sound Systems – More info in [1]
  5. Matrixed surround sound systems – More info in [1]
    1. Dolby Stereo, Surround and Prologic
    2. Circle Surround
    3. Lexicon Logic 7
    4. Dolby EX
  6. Digital surround sound formats – More info in [1]
    1. Dolby Digital
    2. MPEG
  7. Stereo Recording Techniques – More info in [3, 4]
    1. X-Y technique
    2. A-B technique
    3. ORTF technique (Mix technique)


[1] F. Rumsey and T. McCormick – Sound and recording (Chapter 3 and 4)

[2] V. Pulkki, “Compensating displacement of amplitude-panned virtual sources”. Audio Engineering Society 22nd Int. Conf. on Virtual, Synthetic and Entertainment Audio, pp. 186-195. 2002, Espoo, Finland.

[3] Bennett et al. – A new approach to the assessment of stereophonic

[4] Bruce Bartlett, Jenny Bartlett – On Location Recording Techniques


Spatial Audio Psychoacoustics

From [2]

Most research into the mechanisms underlying directional sound perception concludes that there are two primary mechanisms at work, the importance of each depending on the nature of the sound signal and the conflicting environmental cues that may accompany discrete sources. These broad mechanisms involve the detection of timing or phase differences between the ears, and of amplitude or spectral differences between the ears. The majority of spatial perception is dependent on the listener having two ears, although certain monaural cues have been shown to exist – in other words it is mainly the differences in signals received by the two ears that matter.
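The timing cue can be made concrete with a rough estimate of the interaural time difference (ITD). The sketch below uses Woodworth's spherical-head approximation, which is an assumption brought in for illustration and is not taken from [2]; the head radius of 8.75 cm and the speed of sound of 343 m/s are also assumed values.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate ITD in seconds for a distant source.

    Spherical-head model: ITD = (a / c) * (theta + sin(theta)),
    valid for azimuths between 0 (straight ahead) and 90 degrees (to the side).
    """
    theta = np.radians(azimuth_deg)
    return head_radius / c * (theta + np.sin(theta))

# ITD grows from zero in front of the listener to its maximum at the side
itd_front = woodworth_itd(0.0)
itd_side = woodworth_itd(90.0)
```

With these assumed values the maximum ITD comes out at roughly 0.66 ms, which shows how small the time differences used by the auditory system are.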

In this lecture we cover issues related to the perception and cognition of spatial sound as it relates to sound recordings and reproduction. The overview of the class is as follows:

  1. 3D Sound and Spatial Audio
  2. Important terms
  3. Geometric convention
  4. Introduction to sound localization
  5. The minimum audible angle (MAA)
  6. Acoustic cues used in localization
  7. Measurements
  8. Subjective Attributes of Spatial Sound (please read **, pages 35-39)
  9. Conclusions

More info at:

[1] A. Gelfand – Hearing: an introduction to psychological and physiological acoustics (Chapter 13)

[2] F. Rumsey and T. McCormick – Sound and recording (Chapter 2) **

Even more:

[3] G. Kendall – A 3-D Sound Primer: Directional Hearing and Stereo Reproduction

[4] W. Yost – Fundamentals of hearing : an introduction

[5] J. Blauert – Spatial hearing : the psychophysics of human sound localization 

[6] B. Moore – An introduction to the psychology of hearing

Binaural Reproduction using Loudspeakers

Can we listen to binaural recordings through normal speakers?

Of course. Binaural recordings made for headphones still sound fantastic on speakers; you just won’t be able to get the 3D effect. That’s because, as the sound travels through the room, the left and right channels mix (cross-talk): each ear hears both loudspeakers, and your brain can’t make sense of the directional cues. Various techniques have been, and are still being, developed to cancel this crosstalk and thus allow proper 3D listening on speakers.

In this lecture, we review the principles of transaural recordings and learn the basics of the crosstalk cancellation proposed by Schroeder. The contents of the lecture and the related references are listed below:

1.- Introduction

2.- Crosstalk Introduction

3.- CrossTalk Cancellation

4.- Conclusions
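The idea behind crosstalk cancellation can be sketched in the frequency domain: if the 2x2 matrix of transfer functions from the two loudspeakers to the two ears is known, filtering the binaural signals with its inverse delivers each channel to the intended ear only. The code below is a generic regularized matrix inversion written for illustration, not Schroeder's original formulation; the function name, the array layout and the regularization parameter `beta` are assumptions.

```python
import numpy as np

def crosstalk_canceller(H, beta=1e-3):
    """Regularized inverse of the loudspeaker-to-ear transfer matrices.

    H has shape (n_bins, 2, 2) with H[k, ear, speaker] at frequency bin k.
    Returns filters F of the same shape such that H @ F is close to identity.
    """
    Hh = np.conj(np.swapaxes(H, -1, -2))  # Hermitian transpose per bin
    # F = (H^H H + beta I)^-1 H^H; beta limits the gain where H is ill-conditioned
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)

# Toy symmetric setup: direct paths with gain 1, crosstalk paths attenuated and delayed
freqs = np.linspace(100.0, 8000.0, 64)
cross = 0.5 * np.exp(-2j * np.pi * freqs * 2.5e-4)
H = np.empty((len(freqs), 2, 2), dtype=complex)
H[:, 0, 0] = H[:, 1, 1] = 1.0
H[:, 0, 1] = H[:, 1, 0] = cross
F = crosstalk_canceller(H)
```

The toy transfer functions above are made up for the example; in a real system they would come from measured or modelled HRTFs for the listener's position, which is why crosstalk cancellation is sensitive to head movements.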

More info at:

Binaural Reproduction using Headphones

Binaural recordings are reproductions of sound “the way human ears hear it”. Actually, the word “binaural” literally just means “using both ears.” When you listen to a binaural recording through headphones, you perceive distinct and genuine 360° sound.

It’s the purest, most natural way to record and listen to music.

In this lecture, we cover the following topics about Binaural recording using Headphones. Further information can be found in the related references.

1.- Introduction to Binaural Audio

2.- The Stereo Reproduction of 3D Sound

3.- Headphone Reproduction: Quick Review

4.- Binaural Principles

    4.1.- Problems of Binaural Systems

5.- Binaural Recording

    5.1.- Measurement of Binaural IR

    5.2.- Excitation Signal

    5.3.- Using DSP for HRTFs

    5.4.- Collecting HRTF Measurements

    5.5.- Equalization

    5.6.- Data Reduction of HRTFs

    5.7.- Head Tracking

     5.7.- Head Tracking
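The basic binaural principle covered in this lecture, using DSP with HRTFs, can be sketched as a pair of convolutions: a mono signal is filtered with the left-ear and right-ear head-related impulse responses (HRIRs) measured for the desired direction. This is a minimal sketch; the function name is hypothetical and the toy impulse responses in the example are made up, not measured data.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a pair of equal-length HRIRs.

    Returns an array of shape (n_samples, 2) with the left and right
    headphone signals in columns 0 and 1.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack((left, right), axis=1)

# Toy example: the right ear receives the sound later and quieter,
# as it would for a source located to the listener's left
mono = np.array([1.0, 0.0, 0.0, 0.0])
toy_hrir_left = np.array([1.0, 0.0, 0.0])
toy_hrir_right = np.array([0.0, 0.0, 0.5])
stereo = binaural_render(mono, toy_hrir_left, toy_hrir_right)
```

With head tracking, the HRIR pair would be switched or interpolated as the listener's head orientation changes.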