Learning midlevel auditory codes from natural sound statistics
From MaRDI portal
Publication:5157142
Abstract: Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. Although models exist in the visual domain to explain how mid-level features such as junctions and curves might be derived from oriented filters in early visual cortex, little is known about analogous grouping principles for mid-level auditory representations. We propose a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer, the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first-layer coefficients. Because second-layer features are sensitive to combinations of spectrotemporal features, the representation they support encodes more complex acoustic patterns than the first layer. When trained on corpora of speech and environmental sounds, some second-layer units learned to group spectrotemporal features that occur together in natural sounds. Others instantiated opponency between dissimilar sets of spectrotemporal features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for mid-level neuronal computation.
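The two-layer encoding described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation, and several choices are assumptions made for brevity: first-layer kernels span all frequency channels and slide only along time; both layers are inferred with ISTA (iterative soft thresholding), which may differ from the solver used in the paper; the second-layer input is the mean-centered log of temporally smoothed first-layer coefficient magnitudes; the second layer codes each time frame independently; and both dictionaries are random and fixed, so dictionary learning is omitted. All function and variable names (layer1_encode, layer2_features, layer2_encode, D, B, and so on) are hypothetical.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Elementwise shrinkage (proximal operator of the L1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def reconstruct(D, a):
    """Spectrogram estimate: each kernel convolved in time with its coefficients.

    D: kernels, shape (K, F, L), assumed to span all F frequency channels.
    a: first-layer coefficients, shape (K, T - L + 1). Returns shape (F, T).
    """
    K, F, L = D.shape
    S_hat = np.zeros((F, a.shape[1] + L - 1))
    for k in range(K):
        for f in range(F):
            S_hat[f] += np.convolve(a[k], D[k, f], mode="full")
    return S_hat

def layer1_encode(S, D, lam=0.1, n_iter=200):
    """Convolutional sparse code of spectrogram S (ISTA is an assumed solver)."""
    K, F, L = D.shape
    a = np.zeros((K, S.shape[1] - L + 1))
    # Upper bound on ||A||^2 (Young + Cauchy-Schwarz): sum_{k,f} ||D[k,f]||_1^2
    step = 1.0 / np.sum(np.abs(D).sum(axis=2) ** 2)
    for _ in range(n_iter):
        R = S - reconstruct(D, a)  # residual
        grad = np.zeros_like(a)
        for k in range(K):
            for f in range(F):
                # Adjoint of the time convolution is a valid-mode correlation
                grad[k] -= np.correlate(R[f], D[k, f], mode="valid")
        a = soft_threshold(a - step * grad, step * lam)
    return a

def layer2_features(a, width=5, eps=1e-3):
    """Assumed second-layer input: centered log of smoothed |coefficients|."""
    box = np.ones(width) / width
    mag = np.array([np.convolve(np.abs(row), box, mode="same") for row in a])
    m = np.log(mag + eps)
    return m - m.mean(axis=1, keepdims=True)

def layer2_encode(m, B, lam=0.05, n_iter=200):
    """Frame-wise sparse code m[:, t] ~ B @ s[:, t] via ISTA (assumed)."""
    s = np.zeros((B.shape[1], m.shape[1]))
    step = 1.0 / np.linalg.norm(B, 2) ** 2  # 1 / largest squared singular value
    for _ in range(n_iter):
        grad = B.T @ (B @ s - m)
        s = soft_threshold(s - step * grad, step * lam)
    return s

# Usage on toy data (random stand-ins, not natural sounds)
rng = np.random.default_rng(0)
K, F, L, T = 4, 8, 5, 60
D = rng.standard_normal((K, F, L))
D /= np.linalg.norm(D.reshape(K, -1), axis=1)[:, None, None]  # unit-norm kernels
S = np.abs(rng.standard_normal((F, T)))  # placeholder for a spectrogram
a = layer1_encode(S, D)
m = layer2_features(a)
B = rng.standard_normal((K, 3))  # hypothetical fixed second-layer dictionary
s = layer2_encode(m, B)
print(a.shape, m.shape, s.shape)  # (4, 56) (4, 56) (3, 56)
```

In this sketch, taking the log of smoothed magnitudes discards the sign and fine timing of individual first-layer activations, which is one simple way to let second-layer units generalize across specific kernel activation patterns. A learned column of B with same-sign entries would then group first-layer features, while a column with mixed signs would express opponency between them, in the sense the abstract uses.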
Cites work
- A Hierarchical Bayesian Model for Learning Nonlinear Statistical Regularities in Nonstationary Natural Signals
- Imposing sparsity on the mixing matrix in independent component analysis
- Natural image statistics. A probabilistic approach to early computational vision.
- Sparse spectrotemporal coding of sounds
- The spectro-temporal receptive field. A functional characteristic of auditory neurons
Cited in (9)
- Learning the higher-order structure of a natural sound
- Decomposition and integration of monosyllabic information for auditory perceptual process
- Basic maps in the auditory midbrain
- Sparse spectrotemporal coding of sounds
- Sound retrieval and ranking using sparse auditory representations
- Symbols as self-emergent entities in an optimization process of feature extraction and predictions
- Memory stacking in hierarchical networks
- A Bayesian Mallows approach to nontransitive pair comparison data: how human are sounds?
- A computational account of the role of cochlear nucleus and inferior colliculus in stabilizing auditory nerve firing for auditory category learning