A Texture-Representation Model of Visual Crowding ... And Visual Search?

 

Ruth Rosenholtz,

Massachusetts Institute of Technology

 

Crowding refers to visual phenomena in which identification of a target stimulus is
significantly impaired by the presence of nearby stimuli, or flankers. This is particularly
noticeable in peripheral vision, where even fairly distant items cause crowding. Observers
have described the percept as a "jumble," with a mixing of features between items in the
display.

We propose that the visual system locally represents stimuli by the joint statistics of
responses of cells tuned to different positions, phases, orientations, and scales. This statistical,
or "texture" representation predicts the subjective "jumble" of features often associated with
crowding. We will show early results suggesting that the difficulty of performing an
identification task with this representation of the stimuli is correlated with performance under
conditions of crowding. Our results provide evidence for a unified neuronal representation for
perception across a wide range of conditions including crowded perception, ordinary pattern
recognition, and texture perception. Perhaps the odd phenomenon of crowding is merely the
natural outcome of the tradeoff the visual system must strike between sensitivity to
differences and tolerance to irrelevant changes, such as small shifts in position.
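To make the idea of a joint-statistics representation concrete, here is a minimal sketch of the general flavor of such a model: responses from a small bank of oriented filters (two scales by four orientations) are pooled into summary statistics, here just per-channel mean magnitudes and the cross-channel correlation matrix. The filter bank, image size, and choice of statistics are illustrative assumptions for this sketch only; the actual model pools a far richer set of statistics than shown here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor(size, theta):
    """Odd-symmetric oriented filter: a crude stand-in for a V1-like cell
    tuned to one orientation and one scale (illustrative, not the model's bank)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * (size / 4.0) ** 2))
    return envelope * np.sin(2.0 * np.pi * xr / size)

def texture_statistics(image, sizes=(9, 17), n_orient=4):
    """Pool joint (second-order) statistics of filter responses over a region:
    mean magnitude per channel plus the correlation matrix across channels."""
    crop_to = None
    maps = []
    for size in sizes:                       # two scales
        for i in range(n_orient):            # four orientations
            k = gabor(size, theta=i * np.pi / n_orient)
            win = sliding_window_view(image, k.shape)
            resp = np.einsum('ijkl,kl->ij', win, k)   # 'valid' filtering
            # Crop every map to the smallest map's extent so channels align.
            c = (max(sizes) - size) // 2
            resp = resp[c:resp.shape[0] - c, c:resp.shape[1] - c]
            maps.append(resp)
    vecs = np.stack([m.ravel() for m in maps])        # (channels, positions)
    mean_mags = np.abs(vecs).mean(axis=1)             # first-order summary
    corr = np.corrcoef(vecs)                          # joint (pairwise) summary
    return mean_mags, corr

rng = np.random.default_rng(0)
means, corr = texture_statistics(rng.standard_normal((48, 48)))
# corr is an 8 x 8 symmetric matrix of correlations across filter channels;
# a stimulus is represented only by such pooled statistics, discarding the
# exact positions of features, which is what yields the "jumbled" percept.
```

The key design point is the information loss: two displays with different feature arrangements but matching pooled statistics become indistinguishable under this representation, which is the proposed account of why flanked targets are hard to identify.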

Is there a connection between the "jumble" of features perceived under crowding, and the
lack of "binding" in the absence of "focal attention" presumed to underlie difficult visual
search tasks? I will discuss recent work examining the feasibility of a model of visual search
based upon the same texture representation we propose underlies visual crowding.

(This is work done in collaboration with Ben Balas, Lisa Nakano, and Stephanie Chan.
Thanks also to Ronald van den Berg, Krista Ehinger, Ted Adelson, and Alvin Raj.)