## Write predict code using concat_examples

This tutorial corresponds to the 03_custom_dataset_mlp folder in the source code.

We have trained the model with our own dataset, MyDataset, in the previous post; now let's write the predict code.

Source code:

## Prepare test data

It is not difficult for the model to fit the training data, so we will check how well the model fits the test data.

    # Load the custom dataset
dataset = MyDataset('data/my_data.csv')
train_ratio = 0.7
train_size = int(len(dataset) * train_ratio)
train, test = chainer.datasets.split_dataset_random(dataset, train_size, seed=13)

I used the same seed (=13) to extract the train and test data used in the training phase.

    # Load trained model
model = MyMLP(args.unit)  # type: MyMLP
if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
model.to_gpu()  # Copy the model to the GPU
xp = np if args.gpu < 0 else cuda.cupy

serializers.load_npz(args.modelpath, model)

The procedure to load the trained model is

1. Instantiate the model (which is a subclass of Chain: here, it is MyMLP)
2. Send the parameters to GPU if necessary.
3. Load the trained parameters using serializers.load_npz function.

## Predict with minibatch

### Prepare minibatch from dataset with concat_examples

We need to feed a minibatch, instead of the dataset itself, into the model. In the training phase, the minibatch was constructed by the Iterator. In the predict phase it might be overkill to prepare an Iterator, so how do we construct a minibatch?

There is a convenient function, concat_examples, to prepare a minibatch from a dataset. It works as shown in the figure.

• chainer.dataset.concat_examples(batch, device=None, padding=None)

Usually, when we access a dataset by slice indexing, for example dataset[i:j], it returns a list of sequential examples (here, a list of (x, t) tuples). concat_examples separates each element of the examples and concatenates them to generate the minibatch arrays.
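Below is a minimal sketch of what concat_examples does with a list of (x, t) tuples (a toy batch, not the tutorial's data):

import numpy as np
from chainer.dataset import concat_examples

batch = [(np.array([0.0], dtype=np.float32), np.array([1.0], dtype=np.float32)),
         (np.array([0.5], dtype=np.float32), np.array([2.0], dtype=np.float32))]
x, t = concat_examples(batch)
print(x.shape, t.shape)  # (2, 1) (2, 1): the x's and t's are each stacked into one array
print(x)                 # [[0. ], [0.5]]
print(t)                 # [[1. ], [2. ]]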

You can use it as follows:

from chainer.dataset import concat_examples

x, t = concat_examples(test[i:i + batchsize])
y = model.predict(x)
...

※ You can see more detailed usage examples of concat_examples in dataset_introduction.ipynb; also refer to the official documentation for details.

### Predict code configuration

The predict phase has some differences compared to the training phase:

1. Function behavior
– The expected behavior of some functions differs between the training phase and the validation/predict phase. For example, F.dropout is expected to drop some units in the training phase, while it is better not to drop out in the validation/predict phase. This kind of function behavior is handled by the chainer.config.train configuration.
2. Back propagation is not necessary
– When back propagation is enabled, the model needs to construct a computational graph, which requires additional memory. However, back propagation is not necessary in the validation/predict phase, and we can omit constructing the computational graph to reduce memory usage.

This can be controlled by chainer.config.enable_backprop, and the chainer.no_backprop_mode() function is a convenient way to do so.

Considering the above, we can write the predict code in the MyMLP model as follows:

class MyMLP(chainer.Chain):

    ...

    def predict(self, *args):
        with chainer.using_config('train', False):
            with chainer.no_backprop_mode():
                return self.forward(*args)

Finally, predict code can be written as follows,

    # Predict
x_list = []
y_list = []
t_list = []
for i in range(0, len(test), batchsize):
    x, t = concat_examples(test[i:i + batchsize])
    y = model.predict(x)
    y_list.append(y.data)
    x_list.append(x)
    t_list.append(t)

x_test = np.concatenate(x_list)[:, 0]
y_test = np.concatenate(y_list)[:, 0]
t_test = np.concatenate(t_list)[:, 0]
print('x', x_test)
print('y', y_test)
print('t', t_test)

## Plot the result

This is a regression task, so let's see the difference between the actual points and the model's predicted points.

    plt.figure()
plt.plot(x_test, t_test, 'o', label='test actual')
plt.plot(x_test, y_test, 'o', label='test predict')
plt.legend()
plt.savefig('predict.png')

which outputs this figure,

## Appendix: Refactoring predict code

Move the predict function into the model class: if you want to simplify the main predict code in predict_custom_dataset1.py, you may move the predict for-loop into the model.

In the MyMLP class, define a predict2 method as follows:

    def predict2(self, *args, batchsize=32):
        data = args[0]
        x_list = []
        y_list = []
        t_list = []
        for i in range(0, len(data), batchsize):
            x, t = concat_examples(data[i:i + batchsize])
            y = self.predict(x)
            y_list.append(y.data)
            x_list.append(x)
            t_list.append(t)

        x_array = np.concatenate(x_list)[:, 0]
        y_array = np.concatenate(y_list)[:, 0]
        t_array = np.concatenate(t_list)[:, 0]
        return x_array, y_array, t_array

Then we can write the main predict code very simply:

"""Inference/predict code for MNIST
model must be trained before inference, train_mnist_4_trainer.py must be executed beforehand.
"""
from __future__ import print_function
import argparse
import time

import numpy as np
import six
import matplotlib.pyplot as plt

import chainer
import chainer.functions as F
from chainer import Chain, Variable, optimizers, serializers
from chainer import datasets, training, cuda, computational_graph
from chainer.dataset import concat_examples

from my_mlp import MyMLP
from my_dataset import MyDataset

def main():
    parser = argparse.ArgumentParser(description='Chainer example: MyDataset predict')
    # Note: argument defaults below are illustrative.
    parser.add_argument('--gpu', '-g', type=int, default=-1,
                        help='GPU ID (negative value indicates CPU)')
    parser.add_argument('--unit', '-u', type=int, default=50,
                        help='Number of units')
    parser.add_argument('--batchsize', '-b', type=int, default=10,
                        help='Number of images in each mini-batch')
    parser.add_argument('--modelpath', '-m', default='result/mymlp.model',
                        help='Path to the trained model')
    args = parser.parse_args()

    batchsize = args.batchsize
    dataset = MyDataset('data/my_data.csv')
    train_ratio = 0.7
    train_size = int(len(dataset) * train_ratio)
    train, test = chainer.datasets.split_dataset_random(dataset, train_size, seed=13)

    # Load trained model
    model = MyMLP(args.unit)  # type: MyMLP
    if args.gpu >= 0:
        chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
        model.to_gpu()  # Copy the model to the GPU
    xp = np if args.gpu < 0 else cuda.cupy

    serializers.load_npz(args.modelpath, model)

    # Predict
    x_test, y_test, t_test = model.predict2(test)
    print('x', x_test)
    print('y', y_test)
    print('t', t_test)

    plt.figure()
    plt.plot(x_test, t_test, 'o', label='test actual')
    plt.plot(x_test, y_test, 'o', label='test predict')
    plt.legend()
    plt.savefig('predict2.png')


if __name__ == '__main__':
    main()

The model prediction is now written in one line of code,

x_test, y_test, t_test = model.predict2(test)

## Training code for MyDataset

This tutorial corresponds to the 03_custom_dataset_mlp folder in the source code.

We have prepared our own dataset, MyDataset, in the previous post. The training procedure for this dataset is now almost the same as the MNIST training.

The difference from the MNIST dataset is:

• Training data and validation/test data are not split beforehand in our custom dataset

## Model definition for Regression task training

Our task is to estimate the real value “t” given the real value “x“, which is categorized as a regression task.

We often use the mean squared error as the loss function, namely,

$$L = \frac{1}{D}\sum_{i=1}^{D} (t_i - y_i)^2$$

where $$i$$ denotes the i-th data, $$D$$ is the number of data, and $$y_i$$ is the model's output for the input $$x_i$$.
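As a quick sanity check of the formula, F.mean_squared_error gives the same value as computing it by hand (toy values assumed here):

import numpy as np
import chainer.functions as F

t = np.array([[1.0], [2.0], [3.0]], dtype=np.float32)
y = np.array([[1.5], [2.0], [2.0]], dtype=np.float32)
print(F.mean_squared_error(y, t).data)  # 0.41666...
print(((t - y) ** 2).sum() / len(t))    # same value: (0.25 + 0 + 1) / 3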

The implementation for MLP can be written as my_mlp.py,

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import reporter


class MyMLP(chainer.Chain):

    def __init__(self, n_units):
        super(MyMLP, self).__init__()
        with self.init_scope():
            # the size of the inputs to each layer will be inferred
            self.l1 = L.Linear(n_units)  # n_in -> n_units
            self.l2 = L.Linear(n_units)  # n_units -> n_units
            self.l3 = L.Linear(n_units)  # n_units -> n_units
            self.l4 = L.Linear(1)        # n_units -> n_out

    def __call__(self, *args):
        # Calculate loss
        h = self.forward(*args)
        t = args[1]
        self.loss = F.mean_squared_error(h, t)
        reporter.report({'loss': self.loss}, self)
        return self.loss

    def forward(self, *args):
        # Common code for both loss (__call__) and predict
        x = args[0]
        h = F.sigmoid(self.l1(x))
        h = F.sigmoid(self.l2(h))
        h = F.sigmoid(self.l3(h))
        h = self.l4(h)
        return h

In this case, the MyMLP model calculates y (the target to predict) in the forward computation, and the loss is calculated in the __call__ method of the model.

## Data separation for validation/test

When you download a publicly available machine learning dataset, it is often separated into training data and test data (and sometimes validation data) from the beginning.

However, our custom dataset is not separated yet. We can split the existing dataset easily with chainer's functions, which include the following:

• chainer.datasets.split_dataset(dataset, split_at, order=None)
• chainer.datasets.split_dataset_random(dataset, first_size, seed=None)
• chainer.datasets.get_cross_validation_datasets(dataset, n_fold, order=None)
• chainer.datasets.get_cross_validation_datasets_random(dataset, n_fold, seed=None)

Refer to SubDataset for details.

These are useful to separate training data and test data; example usage is as follows:

    # Load the dataset and separate to train data and test data
dataset = MyDataset('data/my_data.csv')
train_ratio = 0.7
train_size = int(len(dataset) * train_ratio)
train, test = chainer.datasets.split_dataset_random(dataset, train_size, seed=13)

Here, we load our data as dataset (which is a subclass of DatasetMixin), and split this dataset into train and test using the chainer.datasets.split_dataset_random function. In the above code, I split the data randomly into 70% train data and 30% test data.

We can also specify the seed argument to fix the random permutation order, which is useful for reproducing an experiment or for running predict code with the same train/test split.
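Here is a minimal sketch of that point (using a toy numpy array instead of MyDataset): the same seed always yields the same split, while omitting it would give a different random split on each run.

import numpy as np
import chainer

dataset = np.arange(10)  # stand-in for MyDataset
train_a, test_a = chainer.datasets.split_dataset_random(dataset, 7, seed=13)
train_b, test_b = chainer.datasets.split_dataset_random(dataset, 7, seed=13)
print([int(test_a[i]) for i in range(len(test_a))])
print([int(test_b[i]) for i in range(len(test_b))])  # identical to the line above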

## Training code

The whole code, train_custom_dataset.py, looks like the following.

from __future__ import print_function
import argparse

import chainer
import chainer.functions as F
from chainer import training
from chainer.training import extensions
from chainer import serializers

from my_mlp import MyMLP
from my_dataset import MyDataset

def main():
parser = argparse.ArgumentParser(description='Train custom dataset')
# Note: argument defaults below are illustrative.
parser.add_argument('--batchsize', '-b', type=int, default=10,
                    help='Number of images in each mini-batch')
parser.add_argument('--epoch', '-e', type=int, default=20,
                    help='Number of sweeps over the dataset to train')
parser.add_argument('--gpu', '-g', type=int, default=-1,
                    help='GPU ID (negative value indicates CPU)')
parser.add_argument('--out', '-o', default='result',
                    help='Directory to output the result')
parser.add_argument('--resume', '-r', default='',
                    help='Resume the training from snapshot')
parser.add_argument('--unit', '-u', type=int, default=50,
                    help='Number of units')
args = parser.parse_args()

print('GPU: {}'.format(args.gpu))
print('# unit: {}'.format(args.unit))
print('# Minibatch-size: {}'.format(args.batchsize))
print('# epoch: {}'.format(args.epoch))
print('')

# Set up a neural network to train
# MyMLP reports the mean squared error loss at every iteration,
# which will be used by the PrintReport extension below.
model = MyMLP(args.unit)

if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
model.to_gpu()  # Copy the model to the GPU

# Setup an optimizer
optimizer = chainer.optimizers.MomentumSGD()
optimizer.setup(model)

# Load the dataset and separate to train data and test data
dataset = MyDataset('data/my_data.csv')
train_ratio = 0.7
train_size = int(len(dataset) * train_ratio)
train, test = chainer.datasets.split_dataset_random(dataset, train_size, seed=13)

train_iter = chainer.iterators.SerialIterator(train, args.batchsize)
test_iter = chainer.iterators.SerialIterator(test, args.batchsize, repeat=False, shuffle=False)

# Set up a trainer
updater = training.StandardUpdater(train_iter, optimizer, device=args.gpu)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)

# Evaluate the model with the test dataset for each epoch
trainer.extend(extensions.Evaluator(test_iter, model, device=args.gpu))

# Dump a computational graph from 'loss' variable at the first iteration
# The "main" refers to the target link of the "main" optimizer.
trainer.extend(extensions.dump_graph('main/loss'))

# Take a snapshot at each epoch
#trainer.extend(extensions.snapshot(), trigger=(args.epoch, 'epoch'))
trainer.extend(extensions.snapshot(), trigger=(1, 'epoch'))

# Write a log of evaluation statistics for each epoch
trainer.extend(extensions.LogReport())

# Print selected entries of the log to stdout
# Here "main" refers to the target link of the "main" optimizer again, and
# "validation" refers to the default name of the Evaluator extension.
# Entries other than 'epoch' are reported by the model, called by
# either the updater or the evaluator.
trainer.extend(extensions.PrintReport(
['epoch', 'main/loss', 'validation/main/loss', 'elapsed_time']))

# Plot graph for loss for each epoch
if extensions.PlotReport.available():
trainer.extend(extensions.PlotReport(
['main/loss', 'validation/main/loss'],
x_key='epoch', file_name='loss.png'))
else:
print('Warning: PlotReport is not available in your environment')
# Print a progress bar to stdout
trainer.extend(extensions.ProgressBar())

if args.resume:
    # Resume from a snapshot
    serializers.load_npz(args.resume, trainer)

# Run the training
trainer.run()
serializers.save_npz('{}/mymlp.model'.format(args.out), model)

if __name__ == '__main__':
main()

[hands on]

Execute train_custom_dataset.py to train the model. The trained model parameters will be saved to result/mymlp.model.

## Create dataset class from your own data with DatasetMixin

This tutorial corresponds to the 03_custom_dataset_mlp folder in the source code.

In the previous chapter we learned how to train a deep neural network using the MNIST handwritten digits dataset. However, the MNIST dataset is prepared by a chainer utility library, and you might now wonder how to prepare a dataset when you want to use your own data for a regression/classification task.

Chainer provides DatasetMixin class to let you define your own dataset class.

## Prepare Data

In this task, we will try a very simple regression task. Our own dataset can be generated by create_my_dataset.py:

import os
import numpy as np
import pandas as pd

DATA_DIR = 'data'


def black_box_fn(x_data):
    return np.sin(x_data) + np.random.normal(0, 0.1, x_data.shape)


if __name__ == '__main__':
    if not os.path.exists(DATA_DIR):
        os.mkdir(DATA_DIR)

    x = np.arange(-5, 5, 0.01)
    t = black_box_fn(x)
    df = pd.DataFrame({'x': x, 't': t}, columns=['x', 't'])
    df.to_csv(os.path.join(DATA_DIR, 'my_data.csv'), index=False)

This script creates a very simple CSV file named “data/my_data.csv“ with the columns “x” and “t”. “x” is the input value and “t” is the target value to predict.

I adopted a simple sin function with a little bit of Gaussian noise to generate “t” from “x”. (You may try modifying the black_box_fn function to change the function to estimate.)

Our task is to get a regression model of this black_box_fn.

## Define MyDataset as a subclass of DatasetMixin

Now that you have your own data, let's define the dataset class by inheriting the DatasetMixin class provided by chainer.

### Implementation

We usually implement these 3 methods:

• __init__(self, *args)
Write the initialization code here.
• __len__(self)
The trainer modules (Iterator) access this to calculate the training progress in epochs.
• get_example(self, i)
Return the i-th data here.

In our case, we can implement my_dataset.py as

import numpy as np
import pandas as pd

import chainer


class MyDataset(chainer.dataset.DatasetMixin):

    def __init__(self, filepath, debug=False):
        self.debug = debug
        # Load the data in initialization
        df = pd.read_csv(filepath)
        self.data = df.values.astype(np.float32)
        if self.debug:
            print('[DEBUG] data: \n{}'.format(self.data))

    def __len__(self):
        """return length of this dataset"""
        return len(self.data)

    def get_example(self, i):
        """Return i-th data"""
        x, t = self.data[i]
        return [x], [t]

The most important part is the overridden method get_example(self, i), which should be implemented to return only the i-th data.

※ We don’t need to care about minibatch concatenation; the Iterator will handle that. You only need to prepare a dataset that returns the i-th data :).

The above code works as follows:

1. We load the prepared data ‘data/my_data.csv‘ (passed as filepath) in the __init__ initialization code, and set the expanded array (a pandas.DataFrame converted to a numpy array) into self.data.

2. We return the i-th data xi and ti as vectors of size 1 in get_example(self, i).

### How does it work

The idea is simple. You can instantiate the dataset with MyDataset() and then access the i-th data by dataset[i].

It is also possible to access it by a slice or a one-dimensional index array; dataset[i:j] returns [dataset[i], dataset[i+1], …, dataset[j-1]].

if __name__ == '__main__':
    # Test code
    dataset = MyDataset('data/my_data.csv', debug=True)

    print('Access by index dataset[1] = ', dataset[1])
    print('Access by slice dataset[:3] = ', dataset[:3])
    print('Access by list dataset[[3, 5]] = ', dataset[[3, 5]])
    index = np.arange(3)
    print('Access by numpy array dataset[[0, 1, 2]] = ', dataset[index])
    # Randomly take 3 data
    index = np.random.permutation(len(dataset))[:3]
    print('dataset[{}] = {}'.format(index, dataset[index]))

[DEBUG] data:
[[-5. 0.79404432]
[-4.98999977 1.03740847]
[-4.98000002 0.88521522]
...,
[ 4.96999979 -0.85200465]
[ 4.98000002 -1.10389316]
[ 4.98999977 -0.88174647]]
Access by index dataset[1] = ([-4.9899998], [1.0374085])
Access by slice dataset[:3] = [([-5.0], [0.79404432]), ([-4.9899998], [1.0374085]), ([-4.98], [0.88521522])]
Access by list dataset[[3, 5]] = [([-4.9699998], [1.0449667]), ([-4.9499998], [0.82551986])]
Access by numpy array dataset[[0, 1, 2]] = [([-5.0], [0.79404432]), ([-4.9899998], [1.0374085]), ([-4.98], [0.88521522])]
dataset[[602 377 525]] = [([1.02], [0.71344751]), ([-1.23], [-0.92034239]), ([0.25], [0.31516379])]

### Flexibility of DatasetMixin – dynamic loading from storage, preprocessing, data augmentation

(This may be an advanced topic for now. You may skip it and come back later.)

The nice part of the DatasetMixin class is its flexibility. Basically, you can implement anything in the get_example function, and get_example is called every time the data is accessed with data[i].

1. Data augmentation

This means we can write dynamic preprocessing. For example, data augmentation is a well-known, important technique to avoid overfitting and get a higher validation score, especially in image processing.

See the chainer official imagenet example for reference.

2. Dynamic loading from storage

If you are dealing with very big data that cannot all be expanded in memory at once, the best practice is to load the data each time it is necessary (when creating the minibatch).

We can achieve this easily with the DatasetMixin class: simply write the loading code in the get_example function to load the i-th data from storage, and that's all!
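Below is a minimal sketch of that idea (the file names and the .npy format are hypothetical, not part of this tutorial): only the file paths are kept in memory, and each example is read from storage when get_example is called.

import numpy as np
import chainer


class LazyImageDataset(chainer.dataset.DatasetMixin):
    """Loads the i-th example from disk only when it is requested."""

    def __init__(self, filepaths, labels):
        self.filepaths = filepaths  # list of paths, e.g. 'data/image_000.npy' (hypothetical)
        self.labels = labels

    def __len__(self):
        return len(self.filepaths)

    def get_example(self, i):
        image = np.load(self.filepaths[i]).astype(np.float32)
        return image, self.labels[i]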

Refer to dataset_introduction.ipynb if you want to know more about the dataset class.

## Predict code for simple sequence dataset

The predict code is easy; it is implemented in predict_simple_sequence.py.

First, construct the model and load the trained model parameters,

    # Model Setup
model = archs[args.arch](n_vocab=N_VOCABULARY, n_units=args.unit)
if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
model.to_gpu()                           # Copy the model to the GPU
xp = np if args.gpu < 0 else cuda.cupy

serializers.load_npz(args.modelpath, model)

Then we only specify the first index (which corresponds to a word ID) via primeindex, and generate the next index. We can generate the next index repeatedly based on the previously generated index.

    # Predict
prev_index = args.primeindex  # the first word ID to start from
predicted_sequence = [prev_index]
for i in range(args.length):
    prev = chainer.Variable(xp.array([prev_index], dtype=xp.int32))
    current = model(prev)
    current_index = np.argmax(cuda.to_cpu(current.data))
    predicted_sequence.append(current_index)
    prev_index = current_index

print('Predicted sequence: ', predicted_sequence)

The result is the following; the model successfully generates the sequence.

Predicted sequence:  [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5]

This is a simple example to check whether the RNN has an ability to remember the past sequence, so I didn't prepare validation data. I just wanted to check whether the RNN model can “memorize” the training data sequence or not.

Note that the situation is a little bit different between the training phase and the inference/predict phase. In the training phase, the model is trained to generate $$x_t$$ based on the correct sequence $$[x_0, x_1, \dots, x_{t-1}]$$.

However, in the predict phase we only specify the first index $$x_0$$, and the model generates $$x'_1$$ (here ' indicates an output from the model). After that, the model needs to generate $$x'_2$$ based on $$x'_1$$. Therefore, the model generates $$x'_t$$ based on the predicted sequence $$[x_0, x'_1, \dots, x'_{t-1}]$$.

## Chainer v2 released: difference from v1

Chainer version 2 was released on June 1, 2017:

#Chainer v2.0.0 has been released! Memory reduction (33% in ResNet), API clean up, and CuPy as a separate package. https://t.co/xRrmZAlJWT

— Chainer (@ChainerOfficial) June 1, 2017

This post is a summary of what you need to change in your code for chainer development. Detailed changes are written in the official document.

## Installation change

The CuPy module has become an independent package. The reason is that CuPy is a GPU version of numpy; it can be used for many types of numerical calculation and is not specific to chainer.

To set up Chainer:

1. If you are using only the CPU, this is enough, as before:
pip install chainer
2. If you want to get the benefit of a GPU, you need to set up CUDA and install CuPy separately:
pip install chainer
pip install cupy

[NOTE] Also, if you have multiple GPUs, you can install NCCL before installing chainer and cupy to use MultiProcessParallelUpdater.

It is important to note that NO source code change is necessary for your Chainer development: Chainer will import CuPy only when it is installed in your environment.
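For example, device-agnostic NumPy/CuPy code often uses a pattern like the following sketch (a common convention, not something Chainer itself requires): fall back to NumPy when CuPy is not installed.

import numpy as np

try:
    import cupy as xp  # available only if `pip install cupy` was run
except ImportError:
    xp = np

a = xp.arange(3)  # works with either backend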

## Global configuration is introduced

The global config chainer.global_config and the thread-local config chainer.config are introduced to control chainer's behavior.

The config includes these flags:

• chainer.config.cudnn_deterministic
• chainer.config.debug
• chainer.config.enable_backprop
• chainer.config.keep_graph_on_report
• chainer.config.train
• chainer.config.type_check
• chainer.config.use_cudnn

See the official document for details.

I think the train flag and the enable_backprop flag are the important ones to remember.

### chainer.config.train

Function behavior can now be controlled by the chainer.config.train flag instead of passing it as a function argument. I will just cite the official doc for the example.

Example

Consider the following model definition and the code to call it in test mode written for Chainer v1.

# Chainer v1
import chainer.functions as F
...

def __call__(self, x, train=True):
    return f(F.dropout(x, train=train))

m = MyModel(...)
y = m(x, train=False)

In Chainer v2, it should be updated into the following code:

# Chainer v2
import chainer.functions as F
...

def __call__(self, x):
    return f(F.dropout(x))

m = MyModel(...)
with chainer.using_config('train', False):
    y = m(x)

### chainer.config.enable_backprop

The volatile flag of the Variable class, used in Chainer v1, is removed in v2.

Instead, you can use the chainer.config.enable_backprop flag to turn backpropagation ON/OFF.

To disable backprop, the utility function chainer.no_backprop_mode() is available:

x = chainer.Variable(x)
with chainer.no_backprop_mode():
    y = model(x)

## Input size of the Link can be omitted

Let me just show the example,

In Chainer v1

conv1=L.Convolution2D(None, 16, 3)

In Chainer v2 it can also be written as follows (writing it in the Chainer v1 notation is still possible):

conv1=L.Convolution2D(16, 3)

This is available for many links; refer to the official document for the list.

## init_scope context manager can be used for Link and Chain initialization

When you define your own Link or Chain class, init_scope() can be used to initialize Parameters or Links.

This writing style is recommended because IDEs (PyCharm etc.) can then index the attributes and show type hints. But you can still use the conventional (chainer v1) initialization as well.

Below is an example of defining a Chain class, from the official doc.

Example

For example, the following chain initialization code

# Chainer v1
class MyMLP(chainer.Chain):
    def __init__(self):
        super(MyMLP, self).__init__(
            layer1=L.Linear(None, 20),
            layer2=L.Linear(None, 30),
        )
    ...

is recommended to be updated as follows.

# Chainer v2
class MyMLP(chainer.Chain):
    def __init__(self):
        super(MyMLP, self).__init__()
        with self.init_scope():
            self.layer1 = L.Linear(20)
            self.layer2 = L.Linear(30)

## Function spec change (GRU, LSTM etc)

This change affects those who are working in the NLP (Natural Language Processing) field.

GRU and LSTM function behavior has changed.

## Optimizer spec change

Some of the deprecated optimizer functions, e.g. zero_grads(), are removed.

## Internal Change for better performance

This does not require any change to your code, but it is good to know:

### Memory efficiency enhancement

When creating a computational graph for back propagation, Function does not keep the Variable's array data itself but keeps only a reference to it.

### Speed enhancement

Lazy type checking is introduced to speed up the type check.

## Summary

• The CuPy module becomes an independent package: it needs to be installed separately when using a GPU.
• The global config chainer.config is introduced.
• Function/Chain call behavior is switched by the chainer.config.train flag.
with chainer.using_config('train', False):
    ...
• The volatile flag of Variable is removed; use the chainer.config.enable_backprop flag instead.
with chainer.no_backprop_mode():
    ...
• Your custom Link and Chain classes can be initialized inside a with self.init_scope(): statement.
with self.init_scope():
    self.l1 = L.Linear(100)
    ...

I noticed that with statements are used a lot in Chainer v2 code.

## Training RNN with simple sequence dataset

We learned in the previous post that an RNN is expected to have an ability to remember sequence information. Let's do an easy experiment to check it before trying an actual NLP application.

## Simple sequence dataset

I prepared a simple script to generate a simple integer sequence as follows:

import numpy as np

N_VOCABULARY = 10


def get_simple_sequence(n_vocab, repeat=100):
    data = []
    for i in range(repeat):
        for j in range(n_vocab):
            for k in range(j):
                data.append(j)
    return np.asarray(data, dtype=np.int32)


if __name__ == '__main__':
    data = get_simple_sequence(N_VOCABULARY)
    print(data)

Its output is,

[1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 1 2 2 ..., 9 9 9]

So the number i is repeated i times. In order to generate the correct sequence, the RNN needs to “count” how many times the current number has already appeared.

For example, to output the correct sequence of 9 9 9 … followed by 1, the RNN needs to count whether 9 has already appeared 9 times before outputting 1.

## Training code for RNN

The training procedure of an RNN is a little bit more complicated than that of an MLP or CNN, due to the existence of the recurrent loop; we need to deal with back propagation over sequential data properly.

To achieve this, we implement a custom iterator and updater.

※ The following implementation is based on the Chainer official example code.

### Iterator – feed data sequentially

When training an RNN, we need to input the data sequentially, so we should not take a random permutation. We need to be careful when creating the minibatches so that each minibatch is fed in sequence.

You can implement a custom Iterator class to achieve this functionality. The parent class, Iterator, is implemented here: Iterator code.

So what we need to implement in Iterator is

• __init__(self, ...) :
Initialization code.
• __next__(self) :
This is the core part of iterator. For each iteration, this function is automatically called to get next minibatch.
• epoch_detail(self) :
This property is used by trainer module to show the progress of training.
• serialize(self) :
Implement if you want to support resume functionality of trainer.

We will implement ParallelSequentialIterator, which works as follows (please also see the figure above).

1. It gets the dataset in the __init__ code and splits the dataset equally into batch_size sequences.
2. At every iteration of the training loop, __next__() is called.
This iterator prepares the current word (input data) and the next word (answer data).
The RNN model is trained to predict the next word from the current word (and its recurrent unit, which encodes the past sequence information).
3. Additionally, in order for the trainer extensions to work nicely, epoch_detail and serialize are implemented. (These are not mandatory for a minimum implementation.)

The final code looks like the following.

"""
This code is copied from official chainer examples
- https://github.com/chainer/chainer/blob/e2fe6f8023e635f8c1fc9c89e85d075ebd50c529/examples/ptb/train_ptb.py
"""
import chainer

# Dataset iterator to create a batch of sequences at different positions.
# This iterator returns a pair of current words and the next words. Each
# example is a part of sequences starting from the different offsets
# equally spaced within the whole sequence.
class ParallelSequentialIterator(chainer.dataset.Iterator):

    def __init__(self, dataset, batch_size, repeat=True):
        self.dataset = dataset
        self.batch_size = batch_size  # batch size
        # Number of completed sweeps over the dataset. In this case, it is
        # incremented if every word is visited at least once after the last
        # increment.
        self.epoch = 0
        # True if the epoch is incremented at the last iteration.
        self.is_new_epoch = False
        self.repeat = repeat
        length = len(dataset)
        # Offsets maintain the position of each sequence in the mini-batch.
        self.offsets = [i * length // batch_size for i in range(batch_size)]
        # NOTE: this is not a count of parameter updates. It is just a count of
        # calls of __next__.
        self.iteration = 0

    def __next__(self):
        # This iterator returns a list representing a mini-batch. Each item
        # indicates a different position in the original sequence. Each item is
        # represented by a pair of two word IDs. The first word is at the
        # "current" position, while the second word at the next position.
        # At each iteration, the iteration count is incremented, which pushes
        # forward the "current" position.
        length = len(self.dataset)
        if not self.repeat and self.iteration * self.batch_size >= length:
            # If not self.repeat, this iterator stops at the end of the first
            # epoch (i.e., when all words are visited once).
            raise StopIteration
        cur_words = self.get_words()
        self.iteration += 1
        next_words = self.get_words()

        epoch = self.iteration * self.batch_size // length
        self.is_new_epoch = self.epoch < epoch
        if self.is_new_epoch:
            self.epoch = epoch

        return list(zip(cur_words, next_words))

    @property
    def epoch_detail(self):
        # Floating point version of epoch.
        return self.iteration * self.batch_size / len(self.dataset)

    def get_words(self):
        # It returns a list of current words.
        return [self.dataset[(offset + self.iteration) % len(self.dataset)]
                for offset in self.offsets]

    def serialize(self, serializer):
        # It is important to serialize the state to be recovered on resume.
        self.iteration = serializer('iteration', self.iteration)
        self.epoch = serializer('epoch', self.epoch)
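As a quick sanity check, here is a sketch of how the iterator behaves on a toy sequence (assuming the class above is importable); each call to __next__ returns batch_size pairs of (current word, next word) taken from equally spaced offsets:

import numpy as np

data = np.arange(12, dtype=np.int32)  # toy "word ID" sequence
it = ParallelSequentialIterator(data, batch_size=3)
print(it.offsets)  # [0, 4, 8]: three parallel read positions
print(next(it))    # pairs (0, 1), (4, 5), (8, 9)
print(next(it))    # pairs (1, 2), (5, 6), (9, 10)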

### Updater – Truncated back propagation through time (BPTT)

Back propagation through time: The training procedure for an RNN model is different from that of an MLP or CNN, because each forward computation of the RNN depends on the previous forward computation due to the existence of the recurrent unit. Therefore, we need to execute the forward computation several times before executing the backward computation, to allow the recurrent loop $$W_{hh}$$ to learn the sequential information. We set the value bprop_len (back propagation length) in the Updater implementation below. Forward computation is executed this number of times consecutively, followed by one backward computation.

Truncate the computational graph: Also, as you can see from the above figure, the RNN graph grows every time the forward computation is executed, and the computer cannot handle it if the graph grows infinitely long. To deal with this issue, we cut (truncate) the graph after each backward computation. This can be achieved by calling the unchain_backward function in chainer.

This optimization method can be implemented by creating a custom Updater class, BPTTUpdater, as a subclass of StandardUpdater.

It just overrides the update_core function, which is where the parameter update (optimization) process is written.

Source code: bptt_updater.py

"""
This code is copied from official chainer examples
- https://github.com/chainer/chainer/blob/e2fe6f8023e635f8c1fc9c89e85d075ebd50c529/examples/ptb/train_ptb.py
"""
import chainer
from chainer import training

# Custom updater for truncated BackProp Through Time (BPTT)

class BPTTUpdater(training.StandardUpdater):

    def __init__(self, train_iter, optimizer, bprop_len, device):
        super(BPTTUpdater, self).__init__(
            train_iter, optimizer, device=device)
        self.bprop_len = bprop_len

    # The core part of the update routine can be customized by overriding.
    def update_core(self):
        loss = 0
        # When we pass one iterator and optimizer to StandardUpdater.__init__,
        # they are automatically named 'main'.
        train_iter = self.get_iterator('main')
        optimizer = self.get_optimizer('main')

        # Progress the dataset iterator for bprop_len words at each iteration.
        for i in range(self.bprop_len):
            # Get the next batch (a list of tuples of two word IDs)
            batch = train_iter.__next__()

            # Concatenate the word IDs to matrices and send them to the device
            # self.converter does this job
            # (it is chainer.dataset.concat_examples by default)
            x, t = self.converter(batch, self.device)

            # Compute the loss at this time step and accumulate it
            loss += optimizer.target(chainer.Variable(x), chainer.Variable(t))

        loss.backward()  # Backprop
        loss.unchain_backward()  # Truncate the graph
        optimizer.update()  # Update the parameters

As you can see, the forward computation is executed bprop_len times consecutively in the for loop to accumulate the loss, followed by one backward call to execute the back propagation of this accumulated loss. After that, the parameters are updated by the optimizer using the update function.

Note that unchain_backward is called every time at the end of the update_core function to truncate/cut the computational graph.

### Main training code

Once the iterator and the updater are prepared, the training code is almost the same as the previous training for the MLP-MNIST task or CNN-CIFAR10/CIFAR100.

"""
RNN Training code with simple sequence dataset
"""
from __future__ import print_function

import os
import sys
import argparse

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training, iterators, serializers, optimizers
from chainer.training import extensions

sys.path.append(os.pardir)
from RNN import RNN
from RNN2 import RNN2
from RNN3 import RNN3
from RNNForLM import RNNForLM
from simple_sequence.simple_sequence_dataset import N_VOCABULARY, get_simple_sequence
from parallel_sequential_iterator import ParallelSequentialIterator
from bptt_updater import BPTTUpdater

def main():
archs = {
'rnn': RNN,
'rnn2': RNN2,
'rnn3': RNN3,
'lstm': RNNForLM
}

parser = argparse.ArgumentParser(description='RNN example')
# Note: argument defaults below are illustrative.
parser.add_argument('--arch', '-a', choices=archs.keys(),
                    default='rnn', help='Net architecture')
parser.add_argument('--unit', '-u', type=int, default=100,
                    help='Number of RNN units in each layer')
parser.add_argument('--bproplen', '-l', type=int, default=20,
                    help='Number of words in each mini-batch '
                    '(= length of truncated BPTT)')
parser.add_argument('--batchsize', '-b', type=int, default=10,
                    help='Number of images in each mini-batch')
parser.add_argument('--epoch', '-e', type=int, default=10,
                    help='Number of sweeps over the dataset to train')
parser.add_argument('--gpu', '-g', type=int, default=-1,
                    help='GPU ID (negative value indicates CPU)')
parser.add_argument('--out', '-o', default='result',
                    help='Directory to output the result')
parser.add_argument('--resume', '-r', default='',
                    help='Resume the training from snapshot')
args = parser.parse_args()

print('GPU: {}'.format(args.gpu))
print('# Architecture: {}'.format(args.arch))
print('# Minibatch-size: {}'.format(args.batchsize))
print('# epoch: {}'.format(args.epoch))
print('')

# 1. Setup model
#model = archs[args.arch](n_vocab=N_VOCABRARY, n_units=args.unit)  # activation=F.leaky_relu
model = archs[args.arch](n_vocab=N_VOCABULARY,
n_units=args.unit)  # , activation=F.tanh
classifier_model = L.Classifier(model)

if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
classifier_model.to_gpu()  # Copy the model to the GPU

eval_classifier_model = classifier_model.copy()  # Model with shared params and distinct states
eval_model = classifier_model.predictor

# 2. Setup an optimizer
#optimizer = optimizers.MomentumSGD()
optimizer = optimizers.Adam()  # the choice of optimizer here is illustrative
optimizer.setup(classifier_model)

train = get_simple_sequence(N_VOCABULARY)
test = get_simple_sequence(N_VOCABULARY)

# 4. Setup an Iterator
train_iter = ParallelSequentialIterator(train, args.batchsize)
test_iter = ParallelSequentialIterator(test, args.batchsize, repeat=False)

# 5. Setup an Updater
updater = BPTTUpdater(train_iter, optimizer, args.bproplen, args.gpu)
# 6. Setup a trainer (and extensions)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)

# Evaluate the model with the test dataset for each epoch
trainer.extend(extensions.Evaluator(test_iter, eval_classifier_model,
device=args.gpu,
# Reset the RNN state at the beginning of each evaluation
eval_hook=lambda _: eval_model.reset_state())
)

trainer.extend(extensions.dump_graph('main/loss'))
trainer.extend(extensions.snapshot(), trigger=(1, 'epoch'))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
['epoch', 'main/loss', 'validation/main/loss',
'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.extend(extensions.PlotReport(
['main/loss', 'validation/main/loss'],
x_key='epoch', file_name='loss.png'))
trainer.extend(extensions.PlotReport(
['main/accuracy', 'validation/main/accuracy'],
x_key='epoch',
file_name='accuracy.png'))

# trainer.extend(extensions.ProgressBar())

# Resume from a snapshot
if args.resume:
    serializers.load_npz(args.resume, trainer)

# Run the training
trainer.run()
serializers.save_npz('{}/{}_simple_sequence.model'
.format(args.out, args.arch), model)

if __name__ == '__main__':
main()

## Run the code

You can execute the code like,

python train_simple_sequence.py

You can also train a different model using the -a option:

python train_simple_sequence.py -a rnn2

Below is the result in my environment with RNN architecture,

GPU: -1
# Architecture: rnn
# Minibatch-size: 10
# epoch: 10

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           2.15793     1.04886               0.434783       0.862222                  1.11497
2           1.09747     0.569532              0.681818       0.866667                  2.63032
3           0.77518     0.4109                0.652174       0.866667                  4.14638
4           0.621658    0.335307              0.727273       0.888889                  5.66036
5           0.497747    0.278632              0.782609       0.911111                  7.15996
6           0.429227    0.233576              0.818182       0.955556                  8.61034
7           0.360052    0.194116              0.913043       0.955556                  10.0369
8           0.312902    0.162933              0.863636       0.977778                  11.4006
9           0.317397    0.141921              0.913043       0.977778                  12.8574
10          0.281399    0.120881              0.909091       1                         14.215        

I set N_VOCABULARY=10 in simple_sequence_dataset.py, and even the simple RNN achieved an accuracy close to 1. It seems this RNN model has the ability to remember about the past 10 steps of the sequence.

## Recurrent Neural Network (RNN) introduction

[Update 2017.06.11] Add chainer v2 code

How can we deal with sequential data in a deep neural network?

This formulation is especially important in the natural language processing (NLP) field. For example, text is made of a sequence of words. If we want to predict the next word from a given sentence, the probability of the next word depends on the whole past sequence of words.

So, the neural network needs an ability to “remember” the past sequence to predict the next word.

In this chapter, Recurrent Neural Network (RNN) and Long Short Term Memory (LSTM) are introduced to deal with sequential data.

## Recurrent Neural Network (RNN)

A Recurrent Neural Network is similar to the Multi Layer Perceptron introduced before, but a loop is added in its hidden layer (shown in the above figure with $$W_{hh}$$).
Here the subscript $$t$$ represents the time step (sequence index). Due to this loop, the hidden layer unit $$h_{t-1}$$ is fed back to construct the hidden unit $$h_{t}$$ of the next step. Therefore, information about the past sequence can be “stored” (memorized) in the hidden layer and passed to the next step.

You might wonder how the loop works in a neural network; the figure below is the expanded version, which explicitly explains how the loop works.

In this figure, data flow is from bottom ($$x$$) to top ($$y$$) and horizontal axis represents time step from left (time step=1) to right (time step=$$t$$).

Each forward computation depends on the previous hidden unit $$h_{t-1}$$. So the RNN needs to keep this hidden unit as a state; see the implementation below.

Also, we need to be careful when executing back propagation, because it depends on the history of consecutive forward computations. The details will be explained later.

## RNN implementation in Chainer

The code below shows the simplest RNN implementation with one hidden (recurrent) layer, as drawn in the above figure.

import chainer
import chainer.functions as F
import chainer.links as L


class RNN(chainer.Chain):
    """Simple Recurrent Neural Network implementation"""
    def __init__(self, n_vocab, n_units):
        super(RNN, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.Linear(n_units, n_units)
            self.r1 = L.Linear(n_units, n_units)
            self.l2 = L.Linear(n_units, n_vocab)
        self.recurrent_h = None

    def reset_state(self):
        self.recurrent_h = None

    def __call__(self, x):
        h = self.embed(x)
        if self.recurrent_h is None:
            self.recurrent_h = F.tanh(self.l1(h))
        else:
            self.recurrent_h = F.tanh(self.l1(h) + self.r1(self.recurrent_h))
        y = self.l2(self.recurrent_h)
        return y

L.EmbedID is used in the above RNN implementation. This is a convenient link when the input data can be represented as IDs.

In NLP text processing, each word is represented as an integer ID. The EmbedID layer converts this ID into a vector, which can be considered a vector representation of the word.

More precisely, the EmbedID layer works as a combination of 2 operations:

1. Convert integer ID into in_size dimensional one-hot vector.
2. Apply Linear layer (with bias $$b = 0$$) to this one-hot vector to output out_size units.

See the official document for details.
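To make the 2-operation description above concrete, here is a small sketch showing that EmbedID gives the same result as a bias-free Linear layer applied to a one-hot vector, when the weights are shared (toy sizes assumed):

import numpy as np
import chainer.links as L

n_vocab, n_units = 5, 3
embed = L.EmbedID(n_vocab, n_units)
linear = L.Linear(n_vocab, n_units, nobias=True)
linear.W.data[...] = embed.W.data.T  # copy the same weights

x = np.array([2], dtype=np.int32)              # a word ID
onehot = np.eye(n_vocab, dtype=np.float32)[x]  # its one-hot representation
print(embed(x).data)
print(linear(onehot).data)  # same values as the line above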

### Creating RecurrentBlock as sub-module

If you want to create a deeper RNN, you can make the recurrent block a sub-module layer like below.

import chainer
import chainer.functions as F
import chainer.links as L


class RecurrentBlock(chainer.Chain):
    """Subblock for RNN"""
    def __init__(self, n_in, n_out, activation=F.tanh):
        super(RecurrentBlock, self).__init__()
        with self.init_scope():
            self.l = L.Linear(n_in, n_out)
            self.r = L.Linear(n_in, n_out)
        self.rh = None
        self.activation = activation

    def reset_state(self):
        self.rh = None

    def __call__(self, h):
        if self.rh is None:
            self.rh = self.activation(self.l(h))
        else:
            self.rh = self.activation(self.l(h) + self.r(self.rh))
        return self.rh


class RNN2(chainer.Chain):
    """RNN implementation using RecurrentBlock"""
    def __init__(self, n_vocab, n_units, activation=F.tanh):
        super(RNN2, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.r1 = RecurrentBlock(n_units, n_units, activation=activation)
            self.r2 = RecurrentBlock(n_units, n_units, activation=activation)
            self.r3 = RecurrentBlock(n_units, n_units, activation=activation)
            #self.r4 = RecurrentBlock(n_units, n_units, activation=activation)
            self.l5 = L.Linear(n_units, n_vocab)

    def reset_state(self):
        self.r1.reset_state()
        self.r2.reset_state()
        self.r3.reset_state()
        #self.r4.reset_state()

    def __call__(self, x):
        h = self.embed(x)
        h = self.r1(h)
        h = self.r2(h)
        h = self.r3(h)
        #h = self.r4(h)
        y = self.l5(h)
        return y

## CIFAR-10, CIFAR-100 inference code

The code structure of the inference/predict stage is quite similar to the MNIST inference code; please read that for a precise explanation.

Here, I will simply put the code and its results.

## CIFAR-10 inference code

Code is uploaded on github as predict_cifar10.py.

"""Inference/predict code for CIFAR-10

model must be trained before inference,
train_cifar10.py must be executed beforehand.
"""
from __future__ import print_function
import os
import argparse

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training, iterators, serializers, optimizers, Variable, cuda
from chainer.training import extensions

from CNNSmall import CNNSmall
from CNNMedium import CNNMedium

CIFAR10_LABELS_LIST = [
'airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck'
]

def main():
archs = {
'cnnsmall': CNNSmall,
'cnnmedium': CNNMedium,
}

parser = argparse.ArgumentParser(description='Cifar-10 CNN predict code')
parser.add_argument('--arch', '-a', choices=archs.keys(),
                    default='cnnsmall', help='Convnet architecture')
#parser.add_argument('--batchsize', '-b', type=int, default=64,
#                    help='Number of images in each mini-batch')
parser.add_argument('--gpu', '-g', type=int, default=-1,
                    help='GPU ID (negative value indicates CPU)')
args = parser.parse_args()

print('GPU: {}'.format(args.gpu))
#print('# Minibatch-size: {}'.format(args.batchsize))
print('')

# 1. Setup model
class_num = 10
model = archs[args.arch](n_out=class_num)
classifier_model = L.Classifier(model)
if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
classifier_model.to_gpu()  # Copy the model to the GPU
xp = np if args.gpu < 0 else cuda.cupy

# 2. Load the CIFAR-10 dataset
train, test = chainer.datasets.get_cifar10()

basedir = 'images'
plot_predict_cifar(os.path.join(basedir, 'cifar10_predict.png'), model,
train, 4, 5, scale=5., label_list=CIFAR10_LABELS_LIST)

def plot_predict_cifar(filepath, model, data, row, col,
                       scale=3., label_list=None):
    fig_width = data[0][0].shape[1] / 80 * row * scale
    fig_height = data[0][0].shape[2] / 80 * col * scale
    fig, axes = plt.subplots(row,
                             col,
                             figsize=(fig_height, fig_width))
    for i in range(row * col):
        # train[i][0] is i-th image data with size 32x32
        image, label_index = data[i]
        xp = model.xp  # numpy or cupy, depending on where the model resides
        x = Variable(xp.asarray(image.reshape(1, 3, 32, 32)))    # test data
        #t = Variable(xp.asarray([test[i][1]]))  # labels
        y = model(x)                              # Inference result
        prediction = y.data.argmax(axis=1)
        image = image.transpose(1, 2, 0)
        print('Predicted {}-th image, prediction={}, actual={}'
              .format(i, prediction[0], label_index))
        r, c = divmod(i, col)
        axes[r][c].imshow(image)  # cmap='gray' is for black and white picture.
        if label_list is None:
            axes[r][c].set_title('label={}, predict={}'
                                 .format(label_index, prediction[0]))
        else:
            pred = int(prediction[0])
            axes[r][c].set_title('{} {}, predict {} {}'
                                 .format(label_index, label_list[label_index],
                                         pred, label_list[pred]))
        axes[r][c].axis('off')  # do not show axis value
    plt.savefig(filepath)
    print('Result saved to {}'.format(filepath))

if __name__ == '__main__':
main()

This outputs the result as,

You can see that even this small CNN successfully classifies most of the images. Of course, this is just a simple example, and you can improve the model accuracy by tuning the deep neural network!

## CIFAR-100 inference code

In the same way, code is uploaded on github as predict_cifar100.py.

CIFAR-100 is more difficult than CIFAR-10 in general, because there are more classes to classify and fewer training images per class.

Again, the accuracy can be improved by tuning the deep neural network model, try it!

That's all for understanding CNNs; next is to understand RNN and LSTM, used in Natural Language Processing.

## CIFAR-10, CIFAR-100 training with Convolutional Neural Network

[Update 2017.06.11] Add chainer v2 code

This is an example of a small Convolutional Neural Network definition, CNNSmall:

import chainer
import chainer.functions as F
import chainer.links as L


class CNNSmall(chainer.Chain):
    def __init__(self, n_out):
        super(CNNSmall, self).__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 16, 3, 2)
            self.conv2 = L.Convolution2D(16, 32, 3, 2)
            self.conv3 = L.Convolution2D(32, 32, 3, 2)
            self.fc4 = L.Linear(None, 100)
            self.fc5 = L.Linear(100, n_out)

    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc4(h))
        h = self.fc5(h)
        return h

I also made a slightly bigger CNN, called CNNMedium,

import chainer
import chainer.functions as F
import chainer.links as L


class CNNMedium(chainer.Chain):
    def __init__(self, n_out):
        super(CNNMedium, self).__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 16, 3, 1)
            self.conv2 = L.Convolution2D(16, 32, 3, 2)
            self.conv3 = L.Convolution2D(32, 32, 3, 1)
            self.conv4 = L.Convolution2D(32, 64, 3, 2)
            self.conv5 = L.Convolution2D(64, 64, 3, 1)
            self.conv6 = L.Convolution2D(64, 128, 3, 2)
            self.fc7 = L.Linear(None, 100)
            self.fc8 = L.Linear(100, n_out)

    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.relu(self.conv5(h))
        h = F.relu(self.conv6(h))
        h = F.relu(self.fc7(h))
        h = self.fc8(h)
        return h

It is nice to know the computational cost of a Convolution layer, which is approximated as

$$H_I \times W_I \times CH_I \times CH_O \times k ^ 2$$
• $$CH_I$$ : Input image channel
• $$CH_O$$ : Output image channel
• $$H_I$$ : Input image height
• $$W_I$$ : Input image width
• $$k$$ : kernel size (assuming same for width & height)

In the above CNN definitions, the channel size is bigger for deeper layers. This can be understood by calculating the computational cost for each layer.

When L.Convolution2D with stride=2 is used, the image size becomes roughly half. This means $$H_I$$ and $$W_I$$ become small, so $$CH_I$$ and $$CH_O$$ can take larger values at a similar cost.
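As a rough back-of-the-envelope illustration of the formula (a sketch; padding is ignored and a 32x32 CIFAR-10 input is assumed), the per-layer cost of the first CNNMedium layers can be estimated like this:

def conv_cost(h_in, w_in, ch_in, ch_out, k):
    return h_in * w_in * ch_in * ch_out * k ** 2

# CNNMedium, first three layers
print(conv_cost(32, 32, 3, 16, 3))   # conv1 (stride 1):   442,368
print(conv_cost(30, 30, 16, 32, 3))  # conv2 (stride 2): 4,147,200
print(conv_cost(14, 14, 32, 32, 3))  # conv3 (stride 1): 1,806,336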

[TODO: add computational cost table for CNN Medium example]

## Training CIFAR-10

Once you have written the CNN, it is easy to train the model. The code, train_cifar10.py, is quite similar to the MNIST training code.

The only small differences are the dataset preparation for CIFAR-10,

    # 3. Load the CIFAR-10 dataset
train, test = chainer.datasets.get_cifar10()

and model setup

from CNNSmall import CNNSmall
from CNNMedium import CNNMedium

archs = {
'cnnsmall': CNNSmall,
'cnnmedium': CNNMedium,
}

...

class_num = 10
model = archs[args.arch](n_out=class_num)

The whole source code is the following,

from __future__ import print_function
import argparse

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training, iterators, serializers, optimizers
from chainer.training import extensions

from CNNSmall import CNNSmall
from CNNMedium import CNNMedium

def main():
archs = {
'cnnsmall': CNNSmall,
'cnnmedium': CNNMedium,
}

parser = argparse.ArgumentParser(description='Cifar-10 CNN example')
# Note: argument defaults below are illustrative.
parser.add_argument('--arch', '-a', choices=archs.keys(),
                    default='cnnsmall', help='Convnet architecture')
parser.add_argument('--batchsize', '-b', type=int, default=64,
                    help='Number of images in each mini-batch')
parser.add_argument('--epoch', '-e', type=int, default=20,
                    help='Number of sweeps over the dataset to train')
parser.add_argument('--gpu', '-g', type=int, default=-1,
                    help='GPU ID (negative value indicates CPU)')
parser.add_argument('--out', '-o', default='result',
                    help='Directory to output the result')
parser.add_argument('--resume', '-r', default='',
                    help='Resume the training from snapshot')
args = parser.parse_args()

print('GPU: {}'.format(args.gpu))
print('# Minibatch-size: {}'.format(args.batchsize))
print('# epoch: {}'.format(args.epoch))
print('')

# 1. Setup model
class_num = 10
model = archs[args.arch](n_out=class_num)
classifier_model = L.Classifier(model)
if args.gpu >= 0:
chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
classifier_model.to_gpu()  # Copy the model to the GPU

# 2. Setup an optimizer
optimizer = optimizers.Adam()  # the choice of optimizer here is illustrative
optimizer.setup(classifier_model)

# 3. Load the CIFAR-10 dataset
train, test = chainer.datasets.get_cifar10()

# 4. Setup an Iterator
train_iter = iterators.SerialIterator(train, args.batchsize)
test_iter = iterators.SerialIterator(test, args.batchsize,
repeat=False, shuffle=False)

# 5. Setup an Updater
updater = training.StandardUpdater(train_iter, optimizer, device=args.gpu)
# 6. Setup a trainer (and extensions)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)

# Evaluate the model with the test dataset for each epoch
trainer.extend(extensions.Evaluator(test_iter, classifier_model, device=args.gpu))

trainer.extend(extensions.dump_graph('main/loss'))
trainer.extend(extensions.snapshot(), trigger=(1, 'epoch'))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
['epoch', 'main/loss', 'validation/main/loss',
'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.extend(extensions.PlotReport(
['main/loss', 'validation/main/loss'],
x_key='epoch', file_name='loss.png'))
trainer.extend(extensions.PlotReport(
['main/accuracy', 'validation/main/accuracy'],
x_key='epoch',
file_name='accuracy.png'))

trainer.extend(extensions.ProgressBar())

# Resume from a snapshot
if args.resume:
    serializers.load_npz(args.resume, trainer)

# Run the training
trainer.run()
serializers.save_npz('{}/{}-cifar10.model'
.format(args.out, args.arch), model)

if __name__ == '__main__':
main()

See how clean the code is! Chainer abstracts the training process, and thus the code can be reused for other deep learning training.

[hands on] Try running train code.

Below is an example in my environment.

• CNNSmall model

$ python train_cifar10.py -g 0 -o result-cifar10-cnnsmall -a cnnsmall
GPU: 0
# Minibatch-size: 64
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           1.66603     1.44016               0.397638       0.477807                  6.22123
2           1.36101     1.31731               0.511324       0.527568                  12.0878
3           1.23553     1.20439               0.559119       0.568073                  17.9239
4           1.14553     1.13121               0.589609       0.595541                  23.7497
5           1.08058     1.09946               0.617747       0.606588                  29.5948
6           1.02242     1.1259                0.638784       0.605295                  35.4604
7           0.97847     1.0797                0.65533        0.615048                  41.3058
8           0.938967    1.0584                0.669494       0.621815                  47.184
9           0.902363    1.00883               0.681985       0.646099                  53.0965
10          0.872796    1.00743               0.692782       0.644904                  58.982
11          0.838787    0.993791              0.705226       0.651971                  64.9511
12          0.813549    0.987916              0.714609       0.655454                  70.3869
13          0.785552    0.987968              0.723825       0.659236                  75.8247
14          0.766127    1.0092                0.730074       0.656748                  81.4311
15          0.743967    1.04623               0.738496       0.650876                  86.9175
16          0.723779    0.991238              0.744518       0.665008                  92.6226
17          0.704939    1.02468               0.752058       0.655354                  98.1399
18          0.68687     0.999966              0.756962       0.660629                  103.657
19          0.668204    1.00803               0.763564       0.660928                  109.226
20          0.650081    1.01554               0.769906       0.667197                  114.705

The Chainer extension PlotReport will automatically create the graphs of loss and accuracy for each epoch. We can achieve around 65% validation accuracy with such an easy CNN construction.

• CNNMedium model

$ python train_cifar10.py -g 0 -o result-cifar10-cnnmedium -a cnnmedium
GPU: 0
# Minibatch-size: 64
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           1.62656     1.3921                0.402494       0.493133                  7.61706
2           1.31508     1.2771                0.526448       0.54588                   15.209
3           1.14961     1.12021               0.589749       0.603603                  22.7185
4           1.04442     1.05119               0.631182       0.629877                  30.1564
5           0.947944    1.00655               0.66624        0.648288                  37.8547
6           0.876341    1.0247                0.690021       0.644705                  46.9253
7           0.819997    0.983303              0.711968       0.662719                  54.9994
8           0.757557    0.933339              0.733795       0.677846                  62.4761
9           0.699673    0.948701              0.751539       0.682126                  69.8784
10          0.652811    0.965453              0.769006       0.680533                  77.2829
11          0.606698    0.990516              0.785551       0.671278                  84.6915
12          0.559568    0.999138              0.799996       0.682822                  92.068
13          0.521884    1.07451               0.814158       0.678742                  99.4703
14          0.477247    1.08184               0.829445       0.673865                  107.249
15          0.443625    1.08582               0.840109       0.680832                  114.609
16          0.406318    1.26192               0.853573       0.660529                  122.218
17          0.378328    1.2075                0.86507        0.670183                  129.655
18          0.349719    1.27795               0.87548        0.673467                  137.098
19          0.329299    1.32094               0.881702       0.664709                  144.553
20          0.297305    1.39914               0.894426       0.666202                  151.959

As expected, CNNMedium takes a little bit longer to compute, but it achieves higher accuracy on the training data.

※ It is also important to notice that the validation accuracy is almost the same between CNNSmall and CNNMedium, which means CNNMedium may be overfitting to the training data. To avoid overfitting, the data augmentation technique (flip, rotate, crop, resize, add Gaussian noise etc. to the input image to increase the effective data size) is often used in practice.
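As an illustration, below is a minimal sketch of one such augmentation, a random horizontal flip; the helper is my own example (not part of the tutorial code) and can be applied inside get_example of a custom dataset, as discussed in the DatasetMixin section.

import numpy as np

def random_horizontal_flip(image):
    # image: (channel, height, width) array, as returned by get_cifar10()
    if np.random.rand() < 0.5:
        image = image[:, :, ::-1]  # flip along the width axis
    return image

# Example usage inside a DatasetMixin subclass:
#     def get_example(self, i):
#         x, t = self.base[i]
#         return random_horizontal_flip(x), t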

## Training CIFAR-100

Again, training on CIFAR-100 is quite similar to training on CIFAR-10.

See train_cifar100.py. The only differences are the model setup, which sets the number of output classes (the model definition itself is unchanged and can be reused!),

    # 1. Setup model
class_num = 100
model = archs[args.arch](n_out=class_num)

and the dataset preparation:

    # 3. Load the CIFAR-100 dataset
train, test = chainer.datasets.get_cifar100()

[hands on] Try running the training code.

## Summary

We have learned how to train a CNN with Chainer. CNNs are widely used in many image processing tasks, not only image classification. For example,

• Bounding box detection
– SSD, YOLO v2
• Semantic segmentation
– FCN
• Colorization
– PaintsChainer
• Image generation
– GAN
• Style transfer
– chainer-gogh
• Super resolution
– SeRanet

etc. Now you are ready to move on to these advanced image processing tasks with deep learning!

[hands on]

Try modifying the CNN model, or create your own CNN model, and train it to see the computational speed and performance (accuracy). You may try changing the following:

• model depth
• channel size of each layer
• layer types (e.g. use F.max_pooling_2d instead of L.Convolution2D with stride 2; see the sketch after this list)
• activation function (F.relu to F.leaky_relu, F.sigmoid, F.tanh, etc.)
• inserting another layer, e.g. L.BatchNormalization or F.dropout
etc.
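
For instance, a hypothetical variant that downsamples with F.max_pooling_2d instead of a stride-2 convolution and uses F.leaky_relu could look like the following sketch (this model is not one of the architectures in train_cifar10.py; layer sizes are arbitrary):

import chainer
import chainer.functions as F
import chainer.links as L


class CNNPoolingVariant(chainer.Chain):
    """Hypothetical example: downsample with max pooling instead of stride-2
    convolutions, and use leaky_relu as the activation."""

    def __init__(self, n_out):
        super(CNNPoolingVariant, self).__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 16, ksize=3, stride=1, pad=1)
            self.conv2 = L.Convolution2D(None, 32, ksize=3, stride=1, pad=1)
            self.fc = L.Linear(None, n_out)

    def __call__(self, x):
        h = F.max_pooling_2d(F.leaky_relu(self.conv1(x)), ksize=2)  # 32x32 -> 16x16
        h = F.max_pooling_2d(F.leaky_relu(self.conv2(h)), ksize=2)  # 16x16 -> 8x8
        return self.fc(h)


# model = CNNPoolingVariant(n_out=10)  # could be registered in the archs dict of train_cifar10.py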

You can refer to the Chainer example code to see more network definition examples.

Also, try tuning hyperparameters to see how the performance changes (a minimal sketch follows this list):

• Change optimizer
• Change learning rate of optimizer
etc.
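
For example, swapping the optimizer or its learning rate usually only touches the optimizer setup lines. A minimal sketch (the commented variable names are illustrative, not the exact code of train_cifar10.py):

from chainer import optimizers

# model = L.Classifier(archs[args.arch](n_out=class_num))  # assumed to be set up as earlier

# One choice: Adam with its default learning rate (alpha=0.001)
optimizer = optimizers.Adam(alpha=0.001)

# Another choice: momentum SGD with an explicit learning rate
# optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)

# optimizer.setup(model)  # attach the optimizer to the model before building the updater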

## CIFAR-10, CIFAR-100 dataset introduction

The source code is uploaded on GitHub.

CIFAR-10 and CIFAR-100 are small image datasets with classification labels. They are widely used as easy image classification benchmarks in the research community.

In Chainer, the CIFAR-10 and CIFAR-100 datasets can be obtained with built-in functions.

Setup code:

from __future__ import print_function
import os
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np

import chainer

basedir = './src/cnn/images'

## CIFAR-10

The chainer.datasets.get_cifar10 method is provided in Chainer to get the CIFAR-10 dataset. The dataset is automatically downloaded from https://www.cs.toronto.edu only the first time, and its cache is used from the second time onward.

CIFAR10_LABELS_LIST = [
'airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck'
]

train, test = chainer.datasets.get_cifar10()

The dataset structure is quite the same as the MNIST dataset; it is a TupleDataset.
train[i] represents the i-th data; there are 50000 training data.
The test data structure is the same, with 10000 test data.

print('len(train), type ', len(train), type(train))
print('len(test), type ', len(test), type(test))

len(train), type 50000 <class 'chainer.datasets.tuple_dataset.TupleDataset'>
len(test), type 10000 <class 'chainer.datasets.tuple_dataset.TupleDataset'>

train[i] represents the i-th data, of type tuple $$(x_i, y_i)$$, where $$x_i$$ is the image data and $$y_i$$ is the label data.

train[i][0] represents $$x_i$$, the CIFAR-10 image data; this is a 3-dimensional array of shape (3, 32, 32), which represents the RGB channels, width 32 px, and height 32 px, respectively.

train[i][1] represents $$y_i$$, the label of the CIFAR-10 image data; this is a scalar value whose actual label name can be obtained via CIFAR10_LABELS_LIST.

Let’s see the 0-th data, train[0], in detail.

print('train[0]', type(train[0]), len(train[0]))

x0, y0 = train[0]
print('train[0][0]', x0.shape, x0)
print('train[0][1]', y0.shape, y0, '->', CIFAR10_LABELS_LIST[y0])

train[0] <class 'tuple'> 2
train[0][0] (3, 32, 32) [[[ 0.23137257  0.16862746  0.19607845 ...,  0.61960787  0.59607846  0.58039218]
  [ 0.0627451   0.          0.07058824 ...,  0.48235297  0.4666667   0.4784314 ]
  [ 0.09803922  0.0627451   0.19215688 ...,  0.46274513  0.47058827  0.42745101]
  ...,
  [ 0.81568635  0.78823537  0.77647066 ...,  0.627451    0.21960786  0.20784315]
  [ 0.70588237  0.67843139  0.72941178 ...,  0.72156864  0.38039219  0.32549021]
  [ 0.69411767  0.65882355  0.7019608  ...,  0.84705889  0.59215689  0.48235297]]

 [[ 0.24313727  0.18039216  0.18823531 ...,  0.51764709  0.49019611  0.48627454]
  [ 0.07843138  0.          0.03137255 ...,  0.34509805  0.32549021  0.34117648]
  [ 0.09411766  0.02745098  0.10588236 ...,  0.32941177  0.32941177  0.28627452]
  ...,
  [ 0.66666669  0.60000002  0.63137257 ...,  0.52156866  0.12156864  0.13333334]
  [ 0.54509807  0.48235297  0.56470591 ...,  0.58039218  0.24313727  0.20784315]
  [ 0.56470591  0.50588238  0.55686277 ...,  0.72156864  0.46274513  0.36078432]]

 [[ 0.24705884  0.17647059  0.16862746 ...,  0.42352945  0.40000004  0.4039216 ]
  [ 0.07843138  0.          0.         ...,  0.21568629  0.19607845  0.22352943]
  [ 0.08235294  0.          0.03137255 ...,  0.19607845  0.19607845  0.16470589]
  ...,
  [ 0.37647063  0.13333334  0.10196079 ...,  0.27450982  0.02745098  0.07843138]
  [ 0.37647063  0.16470589  0.11764707 ...,  0.36862746  0.13333334  0.13333334]
  [ 0.45490199  0.36862746  0.34117648 ...,  0.54901963  0.32941177  0.28235295]]]
train[0][1] () 6 -> frog

def plot_cifar(filepath, data, row, col, scale=3., label_list=None):
    fig_width = data[0][0].shape[1] / 80 * row * scale
    fig_height = data[0][0].shape[2] / 80 * col * scale
    fig, axes = plt.subplots(row,
                             col,
                             figsize=(fig_height, fig_width))
    for i in range(row * col):
        # train[i][0] is i-th image data with size 32x32
        image, label_index = data[i]
        image = image.transpose(1, 2, 0)
        r, c = divmod(i, col)
        axes[r][c].imshow(image)  # cmap='gray' is for black and white picture.
        if label_list is None:
            axes[r][c].set_title('label {}'.format(label_index))
        else:
            axes[r][c].set_title('{}: {}'.format(label_index, label_list[label_index]))
        axes[r][c].axis('off')  # do not show axis value
    plt.tight_layout()   # automatic padding between subplots
    plt.savefig(filepath)

plot_cifar(os.path.join(basedir, 'cifar10_plot.png'), train, 4, 5,
           scale=4., label_list=CIFAR10_LABELS_LIST)
plot_cifar(os.path.join(basedir, 'cifar10_plot_more.png'), train, 10, 10,
           scale=4., label_list=CIFAR10_LABELS_LIST)

## CIFAR-100

CIFAR-100 is really similar to CIFAR-10. The difference is that the number of class labels is 100. The chainer.datasets.get_cifar100 method is provided in Chainer to get the CIFAR-100 dataset.

CIFAR100_LABELS_LIST = [
'apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle',
'bicycle', 'bottle', 'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel',
'can', 'castle', 'caterpillar', 'cattle', 'chair', 'chimpanzee', 'clock',
'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur',
'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster',
'house', 'kangaroo', 'keyboard', 'lamp', 'lawn_mower', 'leopard', 'lion',
'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain', 'mouse',
'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree', 'pear',
'pickup_truck', 'pine_tree', 'plain', 'plate', 'poppy', 'porcupine',
'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 'rose',
'sea', 'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake',
'spider', 'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table',
'tank', 'telephone', 'television', 'tiger', 'tractor', 'train', 'trout',
'tulip', 'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman',
'worm'
]

train_cifar100, test_cifar100 = chainer.datasets.get_cifar100()

The dataset structure is quite the same as the MNIST dataset; it is a TupleDataset.

train_cifar100[i] represents the i-th data; there are 50000 training data. The total amount of training data is the same as CIFAR-10 while the number of class labels has increased, so there are fewer training examples per class label than in the CIFAR-10 dataset.

The test data structure is the same, with 10000 test data.

print('len(train_cifar100), type ', len(train_cifar100), type(train_cifar100))
print('len(test_cifar100), type ', len(test_cifar100), type(test_cifar100))

print('train_cifar100[0]', type(train_cifar100[0]), len(train_cifar100[0]))

x0, y0 = train_cifar100[0]
print('train_cifar100[0][0]', x0.shape)  # , x0
print('train_cifar100[0][1]', y0.shape, y0)

len(train_cifar100), type 50000 <class 'chainer.datasets.tuple_dataset.TupleDataset'>
len(test_cifar100), type 10000 <class 'chainer.datasets.tuple_dataset.TupleDataset'>
train_cifar100[0] <class 'tuple'> 2
train_cifar100[0][0] (3, 32, 32)
train_cifar100[0][1] () 19
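
To see the "fewer examples per class" point mentioned above concretely, the following quick check counts label occurrences; it is a sketch (not part of the notebook above) and should report 5000 images per class for CIFAR-10 versus 500 per class for CIFAR-100.

import numpy as np
import chainer

train_cifar10, _ = chainer.datasets.get_cifar10()
train_cifar100, _ = chainer.datasets.get_cifar100()

# np.bincount over the labels gives the number of training images for each class.
counts10 = np.bincount([label for _, label in train_cifar10])
counts100 = np.bincount([label for _, label in train_cifar100])
print('CIFAR-10  images per class :', counts10)    # expected: 5000 for each of the 10 classes
print('CIFAR-100 images per class :', counts100)   # expected: 500 for each of the 100 classes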

plot_cifar(os.path.join(basedir, 'cifar100_plot_more.png'), train_cifar100,
           10, 10, scale=4., label_list=CIFAR100_LABELS_LIST)