CIFAR-10, CIFAR-100 training with Convolutional Neural Network


[Update 2017.06.11] Added Chainer v2 code.

Writing your CNN model

This is an example of a small Convolutional Neural Network definition, CNNSmall.


I also made a slightly bigger CNN, called CNNMedium.


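It is useful to know the computational cost of a convolution layer. Counting multiply-accumulate operations, and assuming the output has the same height and width as the input (stride 1 with padding that preserves the size), the cost is approximately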

$$ H_I \times W_I \times CH_I \times CH_O \times k ^ 2 $$

  • \( CH_I \): number of input channels
  • \( CH_O \): number of output channels
  • \( H_I \): input image height
  • \( W_I \): input image width
  • \( k \): kernel size (assuming the same size for width and height)
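The formula is easy to evaluate by hand, but a tiny helper makes it convenient to compare layers. Below is a minimal plain-Python sketch; the function name conv_cost and the example layer sizes are hypothetical and are not taken from CNNSmall or CNNMedium. Like the formula, it assumes the output has the same height and width as the input.

```python
def conv_cost(h, w, ch_in, ch_out, k):
    """Approximate multiply-accumulate count of one convolution layer: H * W * CH_I * CH_O * k^2."""
    return h * w * ch_in * ch_out * k ** 2


if __name__ == "__main__":
    # Hypothetical layers on a 32x32 image with a 3x3 kernel.
    print(conv_cost(32, 32, 3, 32, 3))   # first layer, RGB -> 32 channels: 884736
    print(conv_cost(32, 32, 32, 64, 3))  # deeper layer, 32 -> 64 channels: 18874368
```

Even at the same image size, the second layer costs roughly 21 times more than the first, because the cost grows with the product \( CH_I \times CH_O \).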


In the CNN definitions above, deeper layers have more channels. The reason becomes clear when we calculate the computational cost of each layer.

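When L.Convolution2D with stride=2 is used, the height and width of the image are roughly halved. This means \( H_I \) and \( W_I \) become smaller for the following layers, so \( CH_I \) and \( CH_O \) can take larger values: halving both \( H_I \) and \( W_I \) reduces the cost by a factor of 4, which exactly offsets doubling both \( CH_I \) and \( CH_O \).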
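To see the effect of the stride concretely, here is a small plain-Python sketch. It uses the standard convolution output-size formula \( \lfloor (H + 2p - k) / s \rfloor + 1 \) with padding \( p \) and stride \( s \); the function name and the layer sizes are illustrative assumptions, not values from the models above.

```python
def conv_output_size(size, k, stride=1, pad=0):
    """Output height (or width) of a convolution: floor((size + 2*pad - k) / stride) + 1."""
    return (size + 2 * pad - k) // stride + 1


if __name__ == "__main__":
    # 3x3 kernel with pad=1: stride 1 keeps 32x32, stride 2 gives 16x16.
    print(conv_output_size(32, k=3, stride=1, pad=1))  # 32
    print(conv_output_size(32, k=3, stride=2, pad=1))  # 16

    # Cost H * W * CH_I * CH_O * k^2 before and after halving H and W
    # while doubling both channel counts (32 -> 64): the two are equal.
    before = 32 * 32 * 32 * 32 * 3 ** 2
    after = 16 * 16 * 64 * 64 * 3 ** 2
    print(before, after, before == after)  # 9437184 9437184 True
```

With an odd size such as 15, the same layer gives 8, which is why the image becomes only almost half.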

[TODO: add computational cost table for CNN Medium example]

Training CIFAR-10

Once you have written the CNN, it is easy to train it. The training code is quite similar to the MNIST training code.

The only differences are the dataset preparation for CIFAR-10 and the model setup.
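For dataset preparation, Chainer provides chainer.datasets.get_cifar10(), which returns training and test datasets of (image, label) pairs with images of shape (3, 32, 32). For model setup, the usual Chainer pattern wraps the network in L.Classifier, which by default computes the loss with F.softmax_cross_entropy and the accuracy with F.accuracy. To make those two quantities concrete without requiring Chainer, here is a minimal NumPy sketch; the function names and the example batch are hypothetical, and this is not Chainer's implementation.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true labels under softmax(logits)."""
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return float((logits.argmax(axis=1) == labels).mean())


if __name__ == "__main__":
    # Hypothetical batch: 2 images, 10 CIFAR-10 classes.
    logits = np.zeros((2, 10), dtype=np.float32)
    logits[0, 3] = 5.0  # confident and correct for label 3
    logits[1, 1] = 5.0  # confident but wrong for label 7
    labels = np.array([3, 7], dtype=np.int32)
    print(softmax_cross_entropy(logits, labels))
    print(accuracy(logits, labels))  # 0.5
```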


The whole source code is the following,

See how clean the code is! Chainer abstracts the training process, so the same code can be reused to train other deep learning models.


[hands on] Try running the training code.

Below is an example from my environment.

  • CNNSmall model

The Chainer extension PlotReport automatically creates graphs of the loss and accuracy for each epoch.


We can achieve around 65% validation accuracy with such a simple CNN.

  • CNNMedium


As expected, CNNMedium takes a little longer to compute, but it achieves higher accuracy on the training data.
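PlotReport draws its graphs from values collected during training. If the trainer also uses the LogReport extension, the same values are written as a JSON list to a file named log in the output directory, which makes it easy to compare training and validation accuracy numerically. The sketch below assumes the usual key names produced by L.Classifier together with an Evaluator extension (main/accuracy and validation/main/accuracy); the function names and the result/log path are hypothetical.

```python
import json


def load_log(path):
    """Load the JSON list of per-trigger entries written by LogReport."""
    with open(path) as f:
        return json.load(f)


def summarize_log(entries):
    """Return (epoch, train_acc, val_acc, gap) for the epoch with the best validation accuracy."""
    # Keep only entries that contain both training and validation accuracy.
    rows = [e for e in entries
            if "main/accuracy" in e and "validation/main/accuracy" in e]
    best = max(rows, key=lambda e: e["validation/main/accuracy"])
    train_acc = best["main/accuracy"]
    val_acc = best["validation/main/accuracy"]
    # A large positive gap (train much higher than validation) hints at overfitting.
    return best["epoch"], train_acc, val_acc, train_acc - val_acc
```

For example, summarize_log(load_log("result/log")) run on the logs of CNNSmall and CNNMedium lets you compare their train/validation gaps side by side.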

※ Notice, however, that the validation accuracy is almost the same for CNNSmall and CNNMedium, which suggests that CNNMedium may be overfitting to the training data. To reduce overfitting, data augmentation techniques (flipping, rotating, cropping, resizing, or adding Gaussian noise to the input images to increase the effective dataset size) are often used in practice.
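As an illustration of data augmentation, here is a minimal NumPy sketch of two common transforms for CIFAR images stored in CHW layout (channels, height, width): a random crop after zero-padding, and a random horizontal flip. The function name augment and the 4-pixel padding are my own assumptions, not part of the original training code.

```python
import numpy as np


def augment(image, rng, pad=4):
    """Randomly crop (after zero-padding) and flip one CHW image, e.g. shape (3, 32, 32)."""
    _, h, w = image.shape
    # Zero-pad height and width, then take a random crop of the original size.
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[:, top:top + h, left:left + w]
    # Horizontal flip with probability 0.5: reverse the width axis.
    if rng.random() < 0.5:
        out = out[:, :, ::-1]
    return np.ascontiguousarray(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.random.default_rng(1).random((3, 32, 32), dtype=np.float32)
    print(augment(image, rng).shape)  # (3, 32, 32)
```

In Chainer, a function like this could be applied on the fly by wrapping the training set with chainer.datasets.TransformDataset, whose transform receives each (image, label) pair. Only the training set should be augmented; the validation set is left unchanged.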

Training CIFAR-100

Again, training on CIFAR-100 is quite similar to training on CIFAR-10.

The only differences are the model setup, which sets the number of output classes (the model definition itself is unchanged and can be reused!), and the dataset preparation.
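Because only the number of output classes changes, it helps to know what "untrained" looks like in the training log. Chance-level accuracy is 1 / (number of classes), and a network that predicts a uniform distribution has a softmax cross-entropy loss of ln(number of classes). A freshly initialized network is only roughly uniform, so treat these as approximate starting points. The function name below is hypothetical.

```python
import math


def baseline_metrics(n_classes):
    """Chance-level accuracy and the loss of a uniform prediction over n_classes."""
    return 1.0 / n_classes, math.log(n_classes)


if __name__ == "__main__":
    for name, n_classes in [("CIFAR-10", 10), ("CIFAR-100", 100)]:
        acc, loss = baseline_metrics(n_classes)
        print(f"{name}: chance accuracy {acc:.2f}, uniform-prediction loss {loss:.3f}")
```

This prints about 2.303 for CIFAR-10 and 4.605 for CIFAR-100, so a higher initial loss and a lower initial accuracy on CIFAR-100 are expected rather than a bug.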


[hands on] Try running the training code.




We have learned how to train a CNN with Chainer. CNNs are widely used in many image processing tasks, not only image classification. For example,

  • Bounding Box detection
    • SSD, YOLO v2
  • Semantic segmentation
    • FCN
  • Colorization
    • PaintsChainer
  • Image generation
    • GAN
  • Style transfer
    • chainer-gogh
  • Super resolution
    • SeRanet

and so on. Now you are ready to dive into these advanced image processing applications of deep learning!

[hands on]

Try modifying the CNN model, or create your own, and train it to check its computational speed and performance (accuracy). You may try changing the following:

  • model depth
  • channel size of each layer
  • layer type (e.g., use F.max_pooling_2d instead of L.Convolution2D with stride 2)
  • activation function (change F.relu to F.leaky_relu, F.sigmoid, F.tanh, etc.)
  • additional layers (e.g., insert L.BatchNormalization or F.dropout)
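For the first suggestion, note that max pooling with a 2x2 window and stride 2 halves the height and width just like a stride-2 convolution, but it has no learnable parameters and keeps the number of channels unchanged. Here is a minimal NumPy sketch of 2x2 max pooling on a CHW array; it only illustrates the operation and is not how F.max_pooling_2d is implemented.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a CHW array (height and width must be even)."""
    c, h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take the max of each block.
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


if __name__ == "__main__":
    x = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
    print(max_pool_2x2(x).ravel().tolist())  # [5.0, 7.0, 13.0, 15.0]
    print(max_pool_2x2(np.zeros((3, 32, 32))).shape)  # (3, 16, 16)
```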

You can refer to the Chainer example code for more network definition examples.

Also, try tuning the hyperparameters to see how they affect performance:

  • Change optimizer
  • Change learning rate of optimizer
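To get a feel for why the learning rate and optimizer matter before running full CIFAR experiments, here is a toy sketch (not CIFAR training) that minimizes f(w) = w² with plain SGD and with classic momentum SGD, using the update v ← μv − η∇f, w ← w + v. The function names, step count, and learning rates are arbitrary choices for illustration.

```python
def sgd(lr, steps=50, w0=1.0):
    """Plain SGD on f(w) = w**2, whose gradient is 2w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w


def momentum_sgd(lr, momentum=0.9, steps=50, w0=1.0):
    """Classic momentum SGD on the same function: v <- momentum*v - lr*grad; w <- w + v."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * 2.0 * w
        w += v
    return w


if __name__ == "__main__":
    # Too small barely moves, moderate converges, too large diverges.
    for lr in (0.001, 0.1, 1.1):
        print(f"SGD lr={lr}: w after 50 steps = {sgd(lr):.3g}")
    # With the same small learning rate, momentum gets closer to the minimum at 0.
    print(f"SGD lr=0.01: {sgd(0.01):.3g}, momentum SGD lr=0.01: {momentum_sgd(0.01):.3g}")
```

Real networks are far from quadratic, but the same pattern (too small is slow, too large diverges) is what you will typically see in the loss curves when changing the learning rate.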


Next: CIFAR-10, CIFAR-100 inference code
