Running AlexNet in Caffe on 128×128 images instead of 256×256 images, we observed a 5.4x test-time speedup and a drop of less than 2 percentage points in ImageNet accuracy:
| Input | Crop | Top-1 accuracy | Top-5 accuracy | Test-time frame rate |
|---|---|---|---|---|
| 256×256 | 227×227 | 57.1% | 80.2% | 624 fps |
| 128×128 | 99×99 | 55.9% (-1.2) | 78.7% (-1.5) | 3368 fps (5.4x speedup) |
Reducing the input image size shrinks every feature map in the network, which reduces the amount of work that each convolutional layer needs to perform.
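To make the per-layer effect concrete, here is a rough sketch that counts multiply-accumulates (MACs) in the convolutional layers for a given crop size. The layer hyperparameters in `LAYERS` are an assumption (the commonly cited AlexNet settings, with floor rounding for convolution and ceil rounding for pooling); they are not taken from this post, so check them against the prototxt you actually run. The helper name `conv_macs` is made up for this example.

```python
import math

# ASSUMED AlexNet hyperparameters, not taken from the post:
# (name, kind, kernel, stride, pad, out_channels, groups)
LAYERS = [
    ("conv1", "conv", 11, 4, 0, 96, 1),
    ("pool1", "pool", 3, 2, 0, None, 1),
    ("conv2", "conv", 5, 1, 2, 256, 2),
    ("pool2", "pool", 3, 2, 0, None, 1),
    ("conv3", "conv", 3, 1, 1, 384, 1),
    ("conv4", "conv", 3, 1, 1, 384, 2),
    ("conv5", "conv", 3, 1, 1, 256, 2),
]


def conv_macs(crop, in_channels=3):
    """Return {conv layer name: multiply-accumulates} for a square crop."""
    size, channels, macs = crop, in_channels, {}
    for name, kind, k, s, p, out_ch, groups in LAYERS:
        span = size + 2 * p - k
        if kind == "conv":
            size = span // s + 1  # convolution output size rounds down
            # Each output value needs k*k*(input channels per group) MACs.
            macs[name] = size * size * out_ch * k * k * (channels // groups)
            channels = out_ch
        else:
            size = math.ceil(span / s) + 1  # pooling output size rounds up
    return macs


big, small = conv_macs(227), conv_macs(99)
for layer in big:
    print(f"{layer}: {big[layer] / small[layer]:.2f}x fewer MACs at 99x99")
print(f"all conv layers: {sum(big.values()) / sum(small.values()):.2f}x")
```

Under these assumptions the conv1 output shrinks from 55×55 to 23×23, and every later layer shrinks with it. The count covers convolutional layers only and ignores fully connected layers, memory traffic, and kernel launch overhead, so it should not be expected to match the measured 5.4x speedup exactly.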
Details:
- In Caffe’s default AlexNet configuration, we train and test with 256×256 images, with randomized 227×227 crops for training and central 227×227 crops for testing.
- In our 128×128 experiment, we train and test with 99×99 crops, which keeps the crop margin the same as in the default configuration (256-227=29 and 128-99=29). 99×99 crops contain about 5.26x fewer pixels than 227×227 crops. Other than this dimension change, our 128×128 experiments are identical to the default Caffe AlexNet configuration.
- Speed tests were performed on an NVIDIA K40 with CUDA 6.5, and Caffe compiled with cuDNN version 1.
- For 256×256 images, Krizhevsky et al. reported slightly higher accuracy (~82% top-5) than we achieve in Caffe. This may be related to data augmentation settings in training and/or testing.
- All models were trained on ILSVRC2012-train and tested on ILSVRC2012-val.
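The arithmetic behind these numbers can be re-derived in a few lines. This is only a sanity check on the figures reported in the table and the details; the function name `summarize` is made up for this sketch.

```python
def summarize():
    """Recompute the derived figures from the reported measurements."""
    full_image, full_crop = 256, 227
    small_image, small_crop = 128, 99
    return {
        # Both configurations leave the same margin around the crop.
        "margin_full": full_image - full_crop,
        "margin_small": small_image - small_crop,
        # Ratio of pixels per crop between the two configurations.
        "pixel_ratio": full_crop**2 / small_crop**2,
        # Ratio of the reported test-time frame rates (fps).
        "speedup": 3368 / 624,
        # Accuracy drops in percentage points.
        "top1_drop": 57.1 - 55.9,
        "top5_drop": 80.2 - 78.7,
    }


for name, value in summarize().items():
    print(f"{name}: {value:.2f}")
```

The measured speedup comes out slightly larger than the pixel-count ratio, while the accuracy cost stays under 2 percentage points for both top-1 and top-5.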