Recurrent timing nets for auditory scene analysis

Author

  • Peter Cariani
Abstract

We have recently proposed neural timing networks that operate on the temporal fine structure of their inputs to build up and separate periodic signals with different fundamental periods (Neural Networks, 14: 737-753, 2001). Simple recurrent nets consist of arrays of coincidence detectors fed by common input lines and by conduction delay loops of different recurrence times. Short-term facilitation amplifies correlations between input and loop signals, amplifying periodic patterns and segregating those with different periods, thereby allowing constituent waveforms to be recovered. Timing nets constitute a new, general strategy for scene analysis that builds up correlational invariances rather than relying on feature-based labeling, segregation, and binding of channels.

I. PITCH AND AUDITORY SCENE ANALYSIS

Perhaps the most basic function of a perceptual system is to coherently organize the incoming flux of sensory information into separate, stable objects [1-4]. In hearing, sound components are fused into unified objects, streams, and voices that exhibit perceptual attributes such as pitch, timbre, loudness, and location. Common periodicity, temporal proximity (onset, duration, offset), frequency, amplitude dynamics, phase coherence, and location in auditory space are some of the factors that contribute to the fusion and separation of sounds. For concurrent sounds, common harmonic structure plays perhaps the strongest role in forming unified objects and separating them [4, 5]. Harmonic complexes with different fundamentals produce strong pitches at their fundamentals, which can be heard even when the fundamental is "missing" from the power spectrum. As a rule of thumb, voices and musical instruments having the same fundamental fuse, while those with fundamentals differing by more than a semitone (6%) can usually be separated. The mechanisms underlying pitch perception and auditory object formation therefore appear to be intimately linked.
One possibility is that the auditory system labels and segregates frequency channels according to pitch-related features (e.g. [6, 39]) and then binds them together at some later stage. Another possibility is that auditory objects are formed from the fine time structure of the acoustic stimulus and of phase-locked neural responses. Objects would be formed from temporal pattern invariances (phase coherences) whose period would then determine the perceived pitch of the object. In this view, object formation precedes analysis of object properties (pitch, timbre, loudness, location) (see [7, 8]). Recurrent neural timing nets are demonstrations of how the latter strategy for scene analysis might be implemented neurally. We cannot presently determine which strategy is used by real auditory systems, because the central auditory mechanisms by which different voices with different fundamentals are heard out are not yet well understood. In part this is due to the absence of a compelling theory of how pitch is represented and processed above the level of the auditory midbrain. Such a theory would need to account for the pitches produced by both pure and complex tones, and would need to explain how very fine pitch distinctions (< 1% in frequency) over very large dynamic ranges (> 80 dB) can be realized using neural elements whose responses are relatively coarsely tuned and highly level-dependent.

Below the level of the midbrain, a large body of neurophysiological evidence [9, 10] and neurocomputational demonstrations [11-16] strongly suggests that the auditory system uses interspike interval information for pitch perception. For periodicities below roughly 5 kHz, the acoustic stimulus impresses its temporal structure on the temporal discharge patterns of auditory nerve fibers.

(This work was supported by NSF-EIA-BITS-013807.)
The timings of individual spikes faithfully represent the phase structure of the stimulus, and the interspike intervals formed as a consequence of such phase-locked spike timings constitute an autocorrelation-like representation of the stimulus. Given that the pitches of low-frequency pure tones and of complex tones appear to be based on fine timing information, it is entirely conceivable that stable auditory images are formed from the fine temporal structure of neural discharges (e.g. [17]). Separation of different periodic temporal patterns would be carried out on the basis of the coherence of temporal patterns, by amplifying those patterns that recur in the stimulus (and consequently in neural activity patterns). A subsequent autocorrelation-like analysis based on interspike interval information would then subserve perception of the pitches and timbres of the separated auditory objects.

A central unsolved problem in auditory neurophysiology concerns how the central auditory system makes use of this superabundant peripheral fine timing information for sound separation and analysis. The fields of computational neuroscience, neural networks, and signal processing can make useful contributions to the solution of this problem by formulating neural architectures and computational operations that can use fine timing (phase) information to carry out auditory scene analysis and pitch perception. Neurocomputational models then provide guides for finding neural populations in the auditory pathway that have the response characteristics needed to realize these functionalities. The current prevailing view among auditory neurophysiologists is that a time-to-place transformation is effected in the auditory pathway somewhere between the auditory nerve and the auditory cortex.
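The autocorrelation-like character of interval representations described above can be illustrated with a small sketch. The phase-locking model below is a deliberate toy (spikes at rising threshold crossings), not the paper's model; it merely shows that the all-order interspike-interval histogram of a phase-locked spike train peaks at the fundamental period even for a "missing fundamental" complex.

```python
import numpy as np

def phase_locked_spikes(waveform, threshold=0.5):
    """Toy phase-locking model: emit a spike at each rising crossing of
    the threshold, so spike times lock to a fixed phase of the waveform."""
    above = waveform > threshold
    return np.where(above[1:] & ~above[:-1])[0] + 1

def all_order_interval_histogram(spike_times, max_lag):
    """All-order (every-pairwise) interspike intervals; their histogram
    approximates the autocorrelation of the spike train."""
    hist = np.zeros(max_lag + 1, dtype=int)
    for i, s in enumerate(spike_times):
        for u in spike_times[i + 1:]:
            if u - s <= max_lag:
                hist[u - s] += 1
    return hist

# A "missing fundamental" complex: harmonics 3, 4, 5 of a 100 Hz F0
fs = 10000                          # sample rate (Hz); F0 period = 100 samples
t = np.arange(2000) / fs
x = sum(np.sin(2 * np.pi * h * 100.0 * t) for h in (3, 4, 5)) / 3.0

spikes = phase_locked_spikes(x)
hist = all_order_interval_histogram(spikes, max_lag=300)
# The largest interval peak falls at the fundamental period (100 samples =
# 10 ms), even though the power spectrum has no energy at 100 Hz.
```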
In neural network terms, time-delay neural networks are thought to convert temporal input patterns into spatialized output patterns that are then analyzed more centrally by connectionist (Hopfield) networks operating on average firing rates. J. C. R. Licklider's original temporal autocorrelation network was an early time-delay neural network (TDNN) architecture [11, 12] that converted peripheral timing information into a spatial pattern of activity representing the autocorrelation function of the stimulus (so as to account for the autocorrelation-like character of pitch perception). More recent proposals have involved tuned periodicity detectors [18], but unfortunately these modulation-tuned elements do not carry out the right kinds of operations for either pitch analysis [10] or object separation. Thus far, no neural temporal autocorrelators, true pitch detectors, spectral pattern analyzers, or, for that matter, any strong neural correlates of the pitches of complex tones have been found above the midbrain.

For these reasons, we have strived to develop new heuristics for how auditory images might be formed and separated. One route is to explore alternative kinds of signal transformations [19] that would utilize temporal pattern information in novel, unforeseen ways. Another possible general solution is to keep the information in the time domain and to use mechanisms for temporal processing to form and then analyze auditory objects. This is the strategy outlined here. As an alternative to time-delay transformations, simple networks that operate on temporally structured inputs to produce temporally structured outputs were conceived and called "neural timing nets." Both feedforward and recurrent networks were considered, and their basic computational properties were explored [20, 21].
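As a point of contrast with the time-domain strategy, the time-to-place (TDNN) operation attributed to Licklider's model above can be sketched as follows. This is a minimal discrete-time illustration, not Licklider's neural implementation: each delay tap feeds a coincidence detector, and integrating the coincidences yields a spatial (one-value-per-tap) activity pattern.

```python
import numpy as np

def tdnn_autocorrelation(x, max_lag):
    """Time-to-place sketch: tap tau pairs the input with a copy of
    itself delayed by tau samples; averaging the products gives a
    spatial pattern approximating the autocorrelation of the input."""
    acts = np.zeros(max_lag + 1)
    for tau in range(max_lag + 1):
        delayed = np.concatenate([np.zeros(tau), x[:len(x) - tau]])
        acts[tau] = np.mean(x * delayed)   # integrated coincidence activity
    return acts

# A periodic spike-probability-like signal with a 50-sample period
n = np.arange(1000)
x = (np.sin(2 * np.pi * n / 50) > 0.95).astype(float)
acts = tdnn_autocorrelation(x, max_lag=120)
# After lag 0, the most active "place" is the tap at the signal period, tau = 50.
```

The place pattern carries the periodicity, but the output is a spatial profile of rates, not a time-structured signal, which is precisely the property the timing nets below avoid.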
II. RECURRENT TIMING NETS

Recurrent timing networks were inspired in different ways by models of stabilized auditory images [17], neural loop models [22], adaptive timing nets [23], adaptive resonance circuits [24], the precision of echoic memory, and the psychology of temporal expectation [25, 26]. Although much is known about the time courses of temporal integration related to auditory percepts (pitch, timbre, loudness, location, object separation, and various masking effects), few neurocomputational models exist for how incoming information in the auditory periphery is integrated over time by the central auditory system to form stabilized auditory percepts. If the incoming information is indeed encoded in temporal patterns of spikes, then it is not unreasonable to consider neural architectures that store temporal patterns in (centrally disinhibited) reverberating circuits. One envisions the signals themselves circulating in closed transmission loops or being regenerated via cellular recovery mechanisms. Reverberating temporal memory traces would be compared with incoming patterns via coincidence detectors that compute temporal correlations. Neural representations would thus build up over time, dynamically creating sets of perceptual expectations that could be either confirmed or violated. Periodic signals, such as isochronous rhythms, would create the strongest temporal expectancies [27, 28].

The simplest recurrent timing networks imaginable in these terms consist of a 1-D array of coincidence detectors having common direct inputs (Figure 1). The output of each coincidence element is fed into a recurrent delay line such that the output of the element at time t circulates through the line and arrives tau milliseconds later (the signal that arrives back at time t is the one that was emitted at t - tau). A processing rule governs the interaction of the direct and circulating inputs. In their development, the networks have evolved from simple to more complex forms.
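The array just described can be sketched in a few lines of Python. This is a schematic under assumed discrete time steps, with the processing rule left as a pluggable function; all names are illustrative, not taken from the paper's implementation.

```python
from collections import deque

class TimingChannel:
    """One element of the 1-D array: a coincidence detector fed by the
    common direct input and by its own delay loop of recurrence time
    tau (in time steps)."""
    def __init__(self, tau, rule):
        self.loop = deque([0.0] * tau, maxlen=tau)  # conduction delay loop
        self.rule = rule                            # direct/circulating interaction

    def step(self, x):
        circulating = self.loop[0]     # signal emitted tau steps ago arrives now
        y = self.rule(x, circulating)  # processing rule combines the two inputs
        self.loop.append(y)            # the result is fed back into the loop
        return y

# An array of channels with tau = 1..20, all sharing one input line.
# Here an illustrative rule multiplicatively boosts input by the loop signal.
rule = lambda x, c: x * (1.0 + c)
channels = [TimingChannel(tau, rule) for tau in range(1, 21)]
for x in [1.0, 0.0, 1.0, 0.0, 1.0]:    # toy common input with period 2
    outputs = [ch.step(x) for ch in channels]
# The tau = 2 channel's output grows across repetitions, since its loop
# delay matches the input period; the other channels' outputs do not.
```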
In the first simulations [20], binary pulse trains (resembling spike trains) with repeated, randomly selected pulse patterns (e.g. the pattern 100101011 repeated: 100101011100101011...) were passed through the network. At each time step, incoming binary pulses were multiplied by variable-amplitude pulses arriving through the delay loop. In the absence of a coincidence with a circulating pulse, the input pulse was fed into the delay loop without facilitation. When coincidences between incoming and circulating pulses occurred, the amplitude of the circulating pulse was increased by 5% and the pulse was fed back into the loop. It was quickly realized that ...

[Figure: input waveform; all time delays present]
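The pulse-train rule just described (input pulses recycled into the loop, coincidences facilitated by 5%) can be reproduced in a small simulation; the choice of loop range is illustrative. Only the loop whose recurrence time equals the pattern period receives facilitation on every pass, so its circulating amplitude builds up while the others' stay near the input level.

```python
import numpy as np

# A repeating binary pattern (period 9), presented 40 times
pattern = np.array([1, 0, 0, 1, 0, 1, 0, 1, 1], dtype=float)
signal = np.tile(pattern, 40)

taus = range(2, 16)                           # candidate recurrence times (steps)
loops = {tau: np.zeros(tau) for tau in taus}  # circulating pulse amplitudes
ptr = {tau: 0 for tau in taus}                # current position in each loop

for x in signal:
    for tau in taus:
        circ = loops[tau][ptr[tau]]           # pulse emitted tau steps ago
        if x > 0 and circ > 0:
            y = circ * 1.05                   # coincidence: facilitate by 5%
        else:
            y = x                             # otherwise recycle the input pulse
        loops[tau][ptr[tau]] = y
        ptr[tau] = (ptr[tau] + 1) % tau

peaks = {tau: loops[tau].max() for tau in taus}
best = max(peaks, key=peaks.get)
# best == 9: the loop matching the pattern period accumulates the
# largest circulating amplitude; mismatched loops are repeatedly reset.
```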


Similar articles

  • Neural timing nets
  • Recurrent timing nets for F0-based speaker separation
  • Representation of Pitch in Population-wide Interspike Interval Statistics
  • Recurrent Timing Neural Networks for Joint F0-Localisation Estimation
  • Temporal codes, timing nets, and music perception


Publication date: 2003