Chainer basic module introduction 2

This post is just a copy of chainer_module2.ipynb on github, you can execute interactively using jupyter notebook.

In previous tutorial, we learned

• Variable
• Function
• Chain

Let’s try training the model (Chain) in this tutorial.
In this section, we will learn

• Optimizer – Optimizes/tunes the internal parameter to fit to the target function
• Serializer – Handle save/load the model (Chain)

For other chainer modules are explained in later tutorial.

Training

What we want to do here is regression analysis (Wikipedia).
Given set of input x and its output y,
we would like to construct a model (function) which estimates y as close as possible from given input x.

This is done by tuning an internal parameters of model (this is represented by Chain class in Chainer).
And the procedure to tune this internal parameters of model to get a desired model is often denoted as “training”.

Initial setup

Below is typecal import statement of chainer modules.

Our task is to make regression

Linear regression using sklearn

You can skip this section if you are only interested in Chainer or deep learning. At first, let’s see linear regression approach. using sklearn library,

Optimizer

Chainer optimizer manages the optimization process of model fit.

Concretely, current deep learning works based on the technic of Stocastic Gradient Descent (SGD) based method. Chainer provides several optimizers in chainer.optimizers module, which includes following

• SGD
• MomentumSGD

Around my community, MomentumSGD and Adam are more used recently.

Construct model – implement your own Chain

Chain is to construct neural networks.

Let’s see example,

Here L.Linear is defined with None in first argument, input size. When None is used, Linear Link will determine its input size at the first time when it gets the input Variable. In other words, Link’s input size can be dynamically defined and you don’t need to fix the size at the declaration timing. This flexibility comes from the Chainer’s concept “define by run”.

Notes for data shape: x_data and y_data are reshaped when Variable is made. Linear function input and output is of the form (batch_index, feature_index). In this example, x_data and y_data have 1 dimensional feature with the batch_size = sample_num (20).

At first, optimizer is set up as following code. We can choose which kind of optimizing method is used during training (in this case, MomentumSGD is used).

Once optimizer is setup, training proceeds with iterating following code.

By the update, optimizer tries to tune internal parameters of model by decreasing the loss defined by lossfun. In this example, squared error is used as loss

Serializer

Serializer supports save/load of Chainer’s class.

After training finished, we want to save the model so that we can load it in inference stage. Another usecase is that we want to save the optimizer together with the model so that we can abort and restart the training.

The code below is almost same with the training code above. Only the difference is that serializers.load_npz() (or serializers.load_hdf5()) and serializers.save_npz() (or serializers.save_hdf5() are implemented. So now it supports resuming training, by implemeting save/load.

I also set iteration times of update as smaller value 20, to emulate training abort & resume.

Note that model and optimizer need to be instantiated to appropriate class before load.

Please execute above by setting resume = False at the first time, and then please execute the same code several times by setting resyne = True.

You can see “the dynamics” of how the model fits to the data by training proceeds.

Save format

Chainer supports two format, NPZ and HDF5.

• NPZ : Supported in numpy. So it does not require additional environment setup.
• HDF5 : Supported in h5py library. It is usually faster than npz format, but you need to install the library.

In my environment, it took

• NPZ : load 2.5ms, save 22ms
• HDF5: load 2.0ms, save 15ms

In one words, I recommend to use HDF5 format version, serializers.save_hdf5() and serializers.load_hdf5(). Just run pip install h5py` if you haven’t install the library.

Predict

Once the model is trained, you can apply this model to new data.

Compared to “training”, this is often called “predict” or “inference”.

Compare with the black dot and blue line.

It is preferable if the black dot is as close as possible to the blue line. If you train the model with enough iteration, black dot should be shown almost on the blue line in this easy example.

Summary

You learned Optimizers and Serializers module, and how these are used in training code. Optimizers update the model (Chain instance) to fit to the data. Serializers provides save/load functionality to chainer module, especially model and optimizer.

Now you understand the very basic modules of Chainer. So let’s proceed to MNIST example, this is considered as “hello world” program in machine learning community.