Untangling object recognition: How does the visual system achieve “invariant” object representation?

James DiCarlo

DiCarlo Lab, McGovern Institute for Brain Research, MIT

Although object recognition is fundamental to our behavior and seemingly effortless, it is a
remarkably challenging computational problem because the visual system must somehow
tolerate the tremendous image variation produced by different views of each object
(the “invariance” problem). To understand how the primate brain accomplishes this feat,
we must understand how sensory input is transformed from an initial neuronal population
representation (a photograph on the retina) to a new, remarkably powerful form of neuronal
population representation at the highest level of the primate ventral visual stream
(inferior temporal cortex, IT).

In this talk, I will review our results on the ability of the IT population representation to support
position-, scale-, and clutter-tolerant recognition. I will present a geometric perspective for
thinking about how the ventral visual stream constructs this representation (“untangling”
object manifolds). Finally, I will show our recent neurophysiological and psychophysical
results suggesting that this untangling is driven by the spatiotemporal statistics of
unsupervised natural visual experience. Our long-term goal is to use this understanding of
biological computation to inspire artificial vision systems, to aid the development of visual
prosthetics, to guide molecular approaches to repairing lost brain function,
and to gain deep insight into how the brain represents sensory information in a way that is
highly suited for cognition and action.