| Input size | Crop size | Top-1 accuracy | Top-5 accuracy | Frame rate at test time |
|---|---|---|---|---|
| 128×128 | 99×99 | 55.9% (-1.2) | 78.7% (-1.5) | 3368 fps (5.4x speedup) |
Reducing the input size reduces the amount of work that every convolutional layer needs to perform, because each layer's output feature map shrinks along with its input.
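To make the effect concrete, here is a minimal sketch that computes the spatial size of the first convolutional layer's output for both crop sizes. It assumes the conv1 settings from Caffe's reference AlexNet definition (11×11 kernel, stride 4, no padding); the helper name `conv_output_size` is ours, not part of Caffe.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, pad: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * pad - kernel) // stride + 1


# Assumed conv1 hyperparameters (Caffe reference AlexNet): 11x11 kernel, stride 4, no padding.
CONV1_KERNEL, CONV1_STRIDE = 11, 4

conv1_side = {
    crop: conv_output_size(crop, CONV1_KERNEL, CONV1_STRIDE)
    for crop in (227, 99)
}

# Number of output positions per filter; conv1 work scales with this count.
conv1_positions = {crop: side * side for crop, side in conv1_side.items()}
conv1_work_ratio = conv1_positions[227] / conv1_positions[99]

for crop in (227, 99):
    print(f"{crop}x{crop} crop -> conv1 output {conv1_side[crop]}x{conv1_side[crop]} "
          f"({conv1_positions[crop]} positions per filter)")
print(f"conv1 work ratio: {conv1_work_ratio:.2f}x")
```

Under these assumptions, conv1 produces a 55×55 map for a 227×227 crop and a 23×23 map for a 99×99 crop. Later layers shrink in a similar way, although the exact ratio differs from layer to layer.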
- In Caffe’s default AlexNet configuration, we train and test with 256×256 images, with randomized 227×227 crops for training and central 227×227 crops for testing.
- In our 128×128 experiment, we train and test with 99×99 crops, which keeps the same 29-pixel margin between image size and crop size (256-227=29, and 128-99=29). A 99×99 crop contains roughly 5.26x fewer pixels than a 227×227 crop. Other than this dimension change, our 128×128 experiments are identical to the default Caffe AlexNet configuration.
- Speed tests were run on an NVIDIA K40 with CUDA 6.5, using Caffe compiled with cuDNN version 1.
- For 256×256 images, Krizhevsky et al. reported slightly higher accuracy (~82% top-5) than we achieve in Caffe. This may be due to differences in data augmentation settings during training and/or testing.
- All models were trained on ILSVRC2012-train and tested on ILSVRC2012-val.
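
The crop arithmetic in the notes above can be checked with a short sketch. The variable names are illustrative only; the sizes come straight from the notes.

```python
# Image and crop sizes from the default Caffe AlexNet configuration.
FULL_IMAGE, FULL_CROP = 256, 227
# Image size used in the reduced-resolution experiment.
SMALL_IMAGE = 128

# Keep the same margin between image size and crop size.
margin = FULL_IMAGE - FULL_CROP    # 256 - 227
small_crop = SMALL_IMAGE - margin  # 128 - margin

# Ratio of pixels per crop: 227^2 / 99^2 = 51529 / 9801.
pixel_ratio = FULL_CROP ** 2 / small_crop ** 2

print(f"margin: {margin} px, small crop: {small_crop}x{small_crop}")
print(f"pixels per crop: {FULL_CROP ** 2} vs {small_crop ** 2} "
      f"({pixel_ratio:.2f}x fewer)")
```

Note that the measured speedup in the table (5.4x) is close to this pixel ratio, which is consistent with convolutional work scaling roughly with the input area.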