Auditory Scene Analysis

The animation above (can't be viewed on mobile) attempts to illustrate what we do automatically - we create the perception of an auditory scene out of the complex mixture of pressure waves hitting our eardrums. We group together signals as coming from the same auditory object - this is not a simple task, as many signals overlap in location, frequency, amplitude, etc. Achieving it takes a lot of auditory processing power.

There's a phrase used in linguistics, "poverty of the stimulus", whose underlying idea applies equally to auditory perception - there isn't enough data in the signals arriving at the ear to reach a definitive conclusion about the auditory objects in the scene, so we have to use lots of efficient shortcuts to solve this from moment to moment.
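To make the "not enough data" point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption of mine (the sample rate, the tone frequencies & amplitudes, and the helper names `tone` & `mix`), not something taken from ASA research. It only shows that two different sets of sound sources can add up to exactly the same waveform at the ear, so the waveform alone can't tell us which scene produced it.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative value)
N = 80              # 10 ms of signal at that rate


def tone(freq_hz, amplitude):
    """Return N samples of a sine tone (hypothetical helper)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(N)]


def mix(a, b):
    """Pressure waves add sample by sample where they meet."""
    return [x + y for x, y in zip(a, b)]


# Scene 1: one source plays a 440 Hz tone, the other a 660 Hz tone.
scene1 = (tone(440, 0.5), tone(660, 0.3))

# Scene 2: the first source contains both tones, the second a quiet 440 Hz tone.
scene2 = (mix(tone(440, 0.2), tone(660, 0.3)), tone(440, 0.3))

# What arrives at the ear in each case is just the sum of the sources.
ear1 = mix(*scene1)
ear2 = mix(*scene2)

# The two mixtures agree to within floating-point rounding,
# even though the sources behind them are different.
max_diff = max(abs(x - y) for x, y in zip(ear1, ear2))
same_mixture = max_diff < 1e-12
sources_differ = scene1[0] != scene2[0]

print("same mixture at the ear:", same_mixture)
print("sources differ:", sources_differ)
```

Any rule for splitting `ear1` back into sources has to bring in extra assumptions, which is roughly the job the grouping cues do for us in hearing.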

How this is achieved is the puzzle that auditory scene analysis attempts to tease out.

Auditory Scene Analysis (ASA) is a research area in auditory perception which attempts to answer how we make sense of the mixture of pressure waves that hit our ears from the likes of an orchestra, as the animation above tries to represent: how we analyse this mixture of intertwined signals, create auditory objects from it & ultimately build an auditory scene of objects moving within it. We do the same with visual perception, where the red-, green- & blue-sensitive cone cells in the fovea fire & their signals are carried along the optic nerve to be analysed by parts of the brain into visual objects & a dynamic visual scene.

My interest in this research stems from its direct bearing on our audio playback systems & what they are ultimately trying to achieve - an illusion of realism, whether that is a sense of being transported to the venue or of the performance being located in your room.

ASA research is relatively new, the field having been largely defined by Al Bregman's 1990 book Auditory Scene Analysis, & much is still to be discovered. My belief is that this understanding represents the best way forward for explaining the finer nuances of audio reproduction - the subtleties that result in our impression of more realism & a more believable reproduced sound.

What ASA sheds light on is a new understanding/explanation of what we hear in audio playback systems & how we evaluate such systems. We now commonly read listening reports mentioning more solidity to the sound stage, more defined separation of instruments/voices (more air around sound elements), more accurate timbre, more dynamics, more low-level detail, more body - essentially more realism to the sound. These perceptions are not yet readily explained by the common measurements used in audio testing. My best hypothesis for why all these attributes crop up again & again is that we are hearing a better portrayal of the sound cues that ASA uses to construct its internal auditory objects & auditory scenes.

The complexity of the cues & rules used by ASA is being teased out in the research, but we are far from being able to identify & measure just what changes in the sound field between a flat, non-dynamic, uninteresting sound & one that sounds realistic. It's not just frequency & amplitude that are responsible for what we perceive. Measuring these cues in a playback system requires a comprehensive grasp of the complex relationships between elements within the dynamic signal & the development of a test methodology to probe these dynamic relationships. At the moment, we are a long way from achieving this.