WFS Visualizer

WFS Visualizer by Matt Montag (GitHub).

Reproduced from his web page (all credits belong to Matt Montag).

This is a Processing sketch/Java applet that simulates wave field synthesis. A virtual source follows the position of the mouse cursor. It’s useful for visualizing the behavior and limitations of WFS.

Latest Version: March 29, 2011

Key Commands:

p Toggle primary wave
1/2 Increase/decrease resolution
q/w Adjust tapering profile (cosⁿ)
Left arrow/Right arrow Decrease/increase number of loudspeakers
Up arrow/Down arrow Increase/decrease array spacing
[/] Decrease/increase signal wavelength
s Change signal waveform (sine, noise, and saw)
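The core computation behind such a visualizer is a superposition of spherical waves: each loudspeaker in the array is driven with a delay matching its distance to the virtual source, and the fields add up on a grid of listening points. The following is a minimal sketch of that idea, not Montag's actual code; all names and parameter values are illustrative.

```python
import numpy as np

def wfs_field(speaker_xs, src, grid_x, grid_y, wavelength=0.5):
    """Sum spherical waves from a line array of loudspeakers.

    Each loudspeaker is delayed by its distance to the virtual source,
    so the superposed wavefronts approximate the field that the source
    itself would radiate (the basic idea of wave field synthesis).
    """
    k = 2 * np.pi / wavelength                     # wavenumber
    gx, gy = np.meshgrid(grid_x, grid_y)           # listening-point grid
    field = np.zeros(gx.shape, dtype=complex)
    for x in speaker_xs:
        d_src = np.hypot(x - src[0], 0.0 - src[1])  # virtual source -> speaker
        d = np.maximum(np.hypot(gx - x, gy), 1e-6)  # speaker -> grid point
        # spherical wave with 1/d amplitude decay, delayed by d_src
        field += np.exp(1j * k * (d + d_src)) / d
    return field.real

# 16 loudspeakers spaced along the x-axis, virtual source 1 m behind the array
speakers = np.linspace(-1.5, 1.5, 16)
f = wfs_field(speakers, src=(0.0, -1.0),
              grid_x=np.linspace(-2, 2, 80), grid_y=np.linspace(0.1, 3, 60))
print(f.shape)   # (60, 80)
```

Plotting `f` as an image (e.g. with matplotlib) reproduces the familiar curved wavefronts and, with few or widely spaced speakers, the spatial aliasing artifacts the applet is meant to illustrate.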



Ambisonics lecture notes

Ambisonics is a method of encoding a sound field that takes into account its directional properties. Instead of each channel carrying the signal that a particular loudspeaker should emit, as in stereo or 5.1 surround, each Ambisonics channel carries information about certain physical properties of the acoustic field, such as the pressure or the acoustic velocity.

Since it is relatively difficult to find good introductory material about Ambisonics, you can find some preliminary lecture notes at:

Lecture notes on Ambisonics (version 0.3)

This material has been prepared by Daniel Arteaga and published under a Creative Commons Attribution-ShareAlike 4.0 International License.

Many things are missing or could be improved: a better list of references, historical remarks, a better exposition of concepts, a revision of the English, a typographical revision, etc. It will probably also contain errors. Use it at your own risk!

Stereo and surround recording techniques

In this entry we are going to recall the main techniques for stereo recording:

  • Coincident microphones or intensity stereophony. It uses level differences between the two microphones, with no time differences.
    • XY pair. Consists of two directional microphones at the same position, with opening angles that typically range from 90° to 150°.
    • Blumlein pair. Consists of two figure-of-eight microphones angled at 90°.
    • Mid/side pair, or MS pair. Employs a bidirectional (figure-of-eight) microphone pointing sideways (side, or S) plus either an omnidirectional microphone or a variant of a cardioid microphone pointing forward (mid, or M). The L and R signals are formed by linear combinations of S and M.
  • Spaced microphones, time-of-arrival stereophony, or the AB technique. This technique consists of two omnidirectional microphones separated by a distance ranging from a few tens of centimeters to a few meters. Signal levels are mostly the same, at least for small separations, and the stereo effect is due to time-of-arrival differences only.
  • Near-coincident pair techniques, the most common of which is the ORTF system: two cardioid microphones angled at 110° and spaced 17 cm apart.
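The MS decoding mentioned above is just a sum-and-difference matrix: L = M + S and R = M − S, possibly with a gain on the side signal to control the stereo width. A minimal sketch (function names are illustrative):

```python
import numpy as np

def ms_to_lr(mid, side, width=1.0):
    """Decode an MS pair into left/right: L = M + g*S, R = M - g*S.

    The side gain g controls the apparent stereo width:
    g = 0 collapses to mono, larger g widens the image.
    """
    left = mid + width * side
    right = mid - width * side
    return left, right

def lr_to_ms(left, right):
    """Inverse matrix (with a 1/2 normalization)."""
    return 0.5 * (left + right), 0.5 * (left - right)

m = np.array([1.0, 0.5])
s = np.array([0.2, -0.1])
l, r = ms_to_lr(m, s)
m2, s2 = lr_to_ms(l, r)
print(np.allclose(m, m2), np.allclose(s, s2))   # True True
```

Because the matrix is invertible, an MS recording can be re-matrixed after the fact, which is one of the main practical advantages of the technique.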

By checking some references on stereo microphony, try to answer the following questions:

  • What are the advantages and drawbacks of each technique?
  • How is the stereo image perceived in each case?
  • What kinds of directivity patterns can one obtain with the MS technique? How does one go from MS to LR signals?
  • Is it always optimal to place two microphones near the stage, in the approximate positions the playback loudspeakers would occupy? Why?

There are also microphone configurations directly suitable for recording in 5.1 surround. See for instance:

Surround sound recording techniques are based on the same principles (summing localization: time and/or intensity differences). They try to capture the main soundstage in the L, C, and R channels and leave ambience and reverberation for Ls and Rs. In general, the center channel reduces the stereo separation, and surround recording techniques try to counteract this effect.

Physical acoustics overview

In order to study the spatial properties of audio we need to know what sound is and how it behaves in space: how it propagates, what energy it carries, and how it interacts with other sound sources.

The first lecture of the course is therefore a two-hour introduction to acoustics, summarizing what you already know from previous acoustics courses.

The topics we will treat are:

  1. Waves on the air
    • The wave equation
    • Plane waves
    • Spherical waves
    • Energy and intensity
    • Decibel scale
  2. Features of the acoustic field
    • Frequencies and energies
    • Coherence and incoherence. Interference
    • Reflection and reverberation
  3. Waves in a room
    • Room modes and geometric acoustics
    • The impulse response: direct sound, early reflections, reverberation
    • Reverberation time
    • Diffuse sound field
  4. Green function method
    • Wave equation in presence of sources
    • Diffraction and the Huygens principle
    • Green function in the free field

As references, any good acoustics book will do, like Kinsler's book, or even basic physics textbooks such as the one by Tipler. You can also check your lecture notes from previous courses.

Longitudinal wave traveling through a material medium. Extracted from Dan Russell's Acoustics and Vibration Animations

You can also find good acoustics simulations on Dan Russell's Acoustics and Vibration Animations web page. See in particular the following animations:

New course 2015-16: Introduction to Spatial and 3D audio

Welcome, new students, to the blog of the subject "3D audio". We hope you will find this resource useful.

By spatial audio we refer to the investigation of techniques for sound

  • recording and encoding
  • transmission
  • manipulation (postproduction)
  • exhibition

taking into account the spatial properties and the spatial nature of sound. Spatial audio is a term in widespread academic usage.

By 3D audio we mostly refer to spatial audio techniques that go beyond traditional stereo or 5.1/7.1 surround and that normally include height. It is more of a commercial term.


Source Localization

Introduction extracted from [1]

The human ability to distinguish whether a sounding object is close to or far from us is completely developed when we are just a few months old. In fact, the development of the localization mechanisms used by the human auditory system takes place before the age of one. The localization of sound sources is possible because the human brain analyzes all the signals arriving through our ears, using subtle differences in intensity and other spectral and timing cues to recognize the direction of one or even several sound sources.

While localizing sound sources requires no special effort for a human subject, for machines, sound source localization in a room is a complicated process: not all sound objects have the same spectral properties, they occur at different time instants and at different spatial positions, and the process is strongly affected by reflections. Acoustic reflections dominate the perception of sound in a room, modifying the spatial characteristics of the perceived sources.

Over the last decades, the scientific community has devoted considerable effort to localizing sound sources in space by means of microphone array systems, and achieving high localization performance is still a challenge today. Microphone arrays have been used in many applications, such as speech recognition, teleconferencing systems, hands-free speech acquisition, digital hearing aids, video gaming, and autonomous robots.

Algorithms for sound source localization (SSL) can be broadly divided into indirect and direct approaches.

  • Indirect approaches usually follow a two-step procedure: they first estimate the Time Difference Of Arrival (TDOA) between microphone pairs and, afterwards, they estimate the source position based on the geometry of the array and the estimated delays.
  • Direct approaches perform TDOA estimation and source localization in one single step, by scanning a set of candidate source locations and selecting the most likely position as an estimate of the source location. In addition, information-theoretic approaches have also been shown to be significantly powerful in source localization tasks.
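The TDOA estimation step of the indirect approaches is typically done with GCC-PHAT (see the lecture contents below). A minimal sketch, assuming two synchronized microphone signals; the function name and parameters are illustrative:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the TDOA between two microphone signals with GCC-PHAT.

    The PHAT weighting whitens the cross-spectrum, keeping only phase
    information, which makes the correlation peak robust to
    reverberation and to the source's spectral content.
    """
    n = sig.size + ref.size                    # zero-pad for linear correlation
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.maximum(np.abs(R), 1e-12)          # PHAT: unit-magnitude spectrum
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # reorder so index 0 of the window corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                          # delay in seconds

fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delay = 25                                     # samples
y = np.concatenate((np.zeros(delay), x))[:x.size]
print(gcc_phat(y, x, fs) * fs)                 # ≈ 25 samples
```

Given the TDOAs of several microphone pairs, the indirect methods then intersect the corresponding hyperbolas (hyperbolic positioning) to estimate the source position.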

The SRP-PHAT algorithm is a direct approach that has been shown to be very robust under difficult acoustic conditions. The algorithm is commonly interpreted as a beamforming-based approach that searches for the candidate source position that maximizes the output of a steered delay-and-sum beamformer.
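The steering logic of that scan can be sketched in the time domain. This is a simplified delay-and-sum power scan, not the full SRP-PHAT algorithm (which works with PHAT-weighted cross-spectra); the function name, geometry, and signals are illustrative:

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def srp_scan(signals, mic_xy, grid, fs):
    """Score candidate source positions with a delay-and-sum power map.

    For each candidate point, align every microphone signal by its
    free-field propagation delay and measure the energy of the sum:
    the true source position adds coherently, hence gives a maximum.
    """
    scores = []
    for px, py in grid:
        delays = [np.hypot(px - mx, py - my) / C for mx, my in mic_xy]
        d0 = min(delays)
        summed = np.zeros_like(signals[0])
        for sig, d in zip(signals, delays):
            shift = int(round((d - d0) * fs))
            summed[:sig.size - shift] += sig[shift:]  # advance by its delay
        scores.append(np.sum(summed ** 2))
    return np.array(scores)

# Synthetic scene: 3 mics on the x-axis, noise source at (1, 1)
fs = 48000
rng = np.random.default_rng(1)
mics = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0)]
src = (1.0, 1.0)
x = rng.standard_normal(2048)
dists = [np.hypot(src[0] - mx, src[1] - my) for mx, my in mics]
d0 = min(dists)
sigs = [np.concatenate((np.zeros(int(round((d - d0) / C * fs))),
                        x, np.zeros(64)))[:x.size + 64] for d in dists]

grid = [(1.0, 1.0), (0.0, 1.0), (2.0, 0.5)]
scores = srp_scan(sigs, mics, grid, fs)
print(int(np.argmax(scores)))   # 0 -> the true source position wins
```

In practice the candidate grid covers the whole room, and the PHAT weighting replaces the raw signal energies to cope with reverberation.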

Lecture Contents

In this lecture, we covered the general techniques to perform Source Localization.

  1. SSL using TDOA hyperbolic positioning – more info [1]
  2. Beamforming basics – more info [2,3,4]
    1. Introduction
    2. Delay-and-sum beamforming
    3. Weighted-sum beamforming
    4. Near-field beamforming
  3. Time delay estimation via Generalized Cross-Correlation (GCC) – more info [1]
  4. SRP-PHAT method – more info [1]
  5. Conclusions


[1] A. Martí Guerola, "Multichannel Audio Processing for Speaker Localization, Separation and Enhancement". PhD Thesis. Universitat Politècnica de València, Valencia, Spain.

[2] Iain McCowan, "Microphone Arrays: A Tutorial"


[4] Microphone Beamforming Simulation Tool