Forrest Iandola

Running AlexNet in Caffe with 128×128 instead of 256×256 images, we observed a 5.4x speedup and a drop of less than 2 percentage points in ImageNet accuracy.

Reducing the input data size reduces the amount of work that every convolutional layer needs to perform.
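To make the scaling concrete, here is a rough sketch that counts multiply-accumulate operations (MACs) in the convolutional layers for a given crop size. The layer list is an assumption based on the commonly published AlexNet layout (filter counts, kernel sizes, strides, padding, and groups), not something stated in this post, so check it against the prototxt you actually run. It also ignores the fully connected layers and any memory or kernel-launch overhead, so the ratio it prints is not expected to match the measured speedup exactly.

```python
import math

# ASSUMPTION: layer parameters follow the commonly published AlexNet layout.
# conv entries: (kind, out_channels, kernel, stride, pad, groups)
# pool entries: (kind, kernel, stride)
ALEXNET_LAYERS = [
    ("conv", 96, 11, 4, 0, 1),
    ("pool", 3, 2),
    ("conv", 256, 5, 1, 2, 2),
    ("pool", 3, 2),
    ("conv", 384, 3, 1, 1, 1),
    ("conv", 384, 3, 1, 1, 2),
    ("conv", 256, 3, 1, 1, 2),
]


def conv_macs(crop, in_channels=3):
    """Return (total conv MACs, output side length of each conv layer)."""
    size, channels, total, sizes = crop, in_channels, 0, []
    for layer in ALEXNET_LAYERS:
        if layer[0] == "conv":
            _, out_ch, k, s, p, g = layer
            # Standard convolution output size (floor division).
            size = (size + 2 * p - k) // s + 1
            # One MAC per output pixel, per filter, per kernel weight.
            total += size * size * out_ch * k * k * (channels // g)
            channels = out_ch
            sizes.append(size)
        else:
            _, k, s = layer
            # Pooling output size, rounded up.
            size = math.ceil((size - k) / s) + 1
    return total, sizes


if __name__ == "__main__":
    macs_227, sizes_227 = conv_macs(227)
    macs_99, sizes_99 = conv_macs(99)
    print("conv output sizes at 227:", sizes_227)
    print("conv output sizes at 99: ", sizes_99)
    print("conv MAC ratio: %.2f" % (macs_227 / macs_99))
```

Because every feature map shrinks along with the input, the saving applies at each convolutional layer and not only the first one.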

Details:

In Caffe’s default AlexNet configuration, we train and test with 256×256 images, with randomized 227×227 crops for training and central 227×227 crops for testing.
In our 128×128 experiment, we train and test with 99×99 crops, which keeps the same 29-pixel margin between image and crop (256-227=29, and 128-99=29). 99×99 crops contain about 5.26x fewer pixels than 227×227 crops (227²/99² ≈ 5.26). Other than this dimension change, our 128×128 experiments are identical to the default Caffe AlexNet configuration.
Speed tests were performed on an NVIDIA K40 with CUDA 6.5, and Caffe compiled with cuDNN version 1.
For 256×256 images, Krizhevsky et al. reported slightly higher accuracy (~82% top-5) than we achieve in Caffe. This may be related to data augmentation settings in training and/or testing.
Trained on ILSVRC2012-train, tested on ILSVRC2012-val.
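The crop arithmetic in the details above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names (`crop_size`, `central_offset`, `random_offset`) are hypothetical and are not Caffe functions. The sketch assumes the fixed 29-pixel margin described above and square images.

```python
import random

MARGIN = 29  # image side minus crop side, as in the default configuration


def crop_size(image_side, margin=MARGIN):
    """Crop side length that keeps a fixed margin inside a square image."""
    return image_side - margin


def central_offset(image_side, crop_side):
    """Top-left offset of a central crop (used at test time)."""
    return (image_side - crop_side) // 2


def random_offset(image_side, crop_side, rng=random):
    """Top-left offset of a randomized crop (used at training time)."""
    return rng.randint(0, image_side - crop_side)


full_crop = crop_size(256)   # crop side for 256x256 images
small_crop = crop_size(128)  # crop side for 128x128 images

# Ratio of pixels per crop between the two settings.
pixel_ratio = full_crop ** 2 / small_crop ** 2

if __name__ == "__main__":
    print("crop sides:", full_crop, small_crop)
    print("pixel ratio: %.2f" % pixel_ratio)
    print("central offset at 128:", central_offset(128, small_crop))
    print("random offset at 128:", random_offset(128, small_crop))
```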

Accelerating AlexNet by Reducing Image Resolution