Map-CNN: A Convolutional Neural Network with Map-like Organizations


November 15, 2017 - 1:00pm
Northwest Building, Room 243
About the Speaker
Chen-Ping Yu
Speaker Title: 
Recent Post-doctoral Fellow, Konkle Lab
Founder and CEO of Phiar Technologies

Deep convolutional neural networks (CNNs) are currently the best computational models of visual processing. A core operation of these models is convolution: each artificial neuron of a CNN performs a sweep through the entire input image to produce a response profile. In contrast, neurons in the visual cortex have receptive fields, which are tuned to particular features at particular locations, though a common assumption is that a small set of features is replicated in hypercolumns uniformly across all positions in the retinotopic map. Here we examined this assumption using a computational model with map-like early layers.
We constructed a map-CNN in which the artificial neurons in the map layer have a spatial organization and receptive field scaling similar to human V1. First, retinotopy was implemented with local convolutions of unshared weights, with neurons organized in a grid-like layout. Second, a retina-like transformation was applied to the input image, such that images are increasingly compressed with distance from the center. The combination of these designs naturally captures both cortical magnification of the fovea and the scaling of receptive field size with eccentricity. Finally, the network was trained on 1000-way object classification using the ImageNet dataset.
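The difference between ordinary convolution and "local convolutions of unshared weights" can be illustrated with a minimal NumPy sketch. This is not the speaker's implementation: the function names `conv2d_shared` and `conv2d_unshared`, the single-channel input, the 3x3 kernel size, and the 8x8 image are all assumptions chosen for illustration, and the retina-like input transformation is not modeled here.

```python
import numpy as np


def conv2d_shared(image, kernel):
    """Valid cross-correlation: one square kernel reused at every position."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out


def conv2d_unshared(image, kernels):
    """Same sliding window, but grid position (i, j) has its own kernel.

    kernels has shape (out_h, out_w, k, k), so each output unit can learn
    a different feature for its own patch of the input.
    """
    out_h, out_w, k, _ = kernels.shape
    assert image.shape == (out_h + k - 1, out_w + k - 1)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernels[i, j])
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))   # hypothetical single-channel input
kernel = rng.standard_normal((3, 3))  # hypothetical shared filter

# Tiling one kernel over the whole grid recovers ordinary convolution.
tiled = np.broadcast_to(kernel, (6, 6, 3, 3))
same = np.allclose(conv2d_shared(image, kernel),
                   conv2d_unshared(image, tiled))

# Independent kernels per position generally produce a different map.
free = rng.standard_normal((6, 6, 3, 3))
differs = not np.allclose(conv2d_shared(image, kernel),
                          conv2d_unshared(image, free))
print(same, differs)
```

In this sketch the unshared layer holds one kernel per output position, so its parameter count grows with the size of the map rather than staying fixed. That extra freedom is what lets the features learned at different positions of the visual field diverge during training.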
We found that the features learned at each position of the visual field were not uniform, violating the convolutional assumption that the same features are represented across the visual field. Explorations of these tunings showed that foveal map units (< 5°) had more Gaussian-blob tuning than peripheral map units, and that while edge filters were learned uniformly across the visual field, the orientations of those edge features exhibited substantial positional biases.
These results demonstrate that features learned from natural image statistics in order to perform successful object recognition are naturally heterogeneous across the visual field, and make testable predictions for the spatial distribution of feature tuning in retinotopic areas.