Chainer basic module introduction

[Update 2017.06.11] Add Chainer v2 code.

 

This post is a copy of chainer_module1.ipynb on GitHub, which you can run interactively in Jupyter Notebook.

Advanced remarks are written as “Note”. You can skip them on a first reading.

This tutorial introduces and explains the following basic Chainer modules:

  • Variable
  • Link
  • Function
  • Chain

Other Chainer modules are explained in later tutorials.

Initial setup

Below are typical import statements for Chainer modules.

 

Variable

The Variable class holds several properties, some of which behave the same way as those of a NumPy array.

A Chainer variable can be created with the Variable constructor, which returns a chainer.Variable object.

When I write Variable, I mean Chainer’s Variable class. Please do not confuse it with the ordinary noun “variable”.

Note: Chainer needs its own Variable and Function classes for calculation, instead of just using NumPy, because back propagation is necessary during deep learning training. A Variable holds its “calculation history”, and a Function has a backward method, which computes its derivative, in order to carry out back propagation. See below for more details.

In the above code, the NumPy data type is explicitly set with dtype=np.float32. If we don’t set the data type, np.float64 is used by default. However, such precision is usually “too much” and not necessary in machine learning. Lower precision is preferable for computational speed and memory usage.
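The memory difference can be checked with NumPy alone. This is a minimal sketch with arbitrary values:

```python
import numpy as np

# Without an explicit dtype, Python floats become 64-bit floats.
a64 = np.asarray([1., 2., 3.])
# With dtype=np.float32, each element takes half the memory.
a32 = np.asarray([1., 2., 3.], dtype=np.float32)

print(a64.dtype, a64.nbytes, 'bytes')
print(a32.dtype, a32.nbytes, 'bytes')
```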

 

attribute

A Chainer Variable has the following attributes:

  • data
  • dtype
  • shape
  • ndim
  • size
  • grad

They behave much like the attributes of numpy.ndarray and can be accessed directly on a Variable.

 

 

 

One exception is the data attribute: a Chainer Variable’s data refers to the underlying numpy.ndarray.

 

Function

A Function has two key properties:

  1. Its input and output are Variables.
  2. chainer.functions provides many implementations of Function.

 

We want to perform calculations on Variables. A Variable can be computed on using

  • arithmetic operations (e.g. +, -, *, /)
  • functions backed by subclasses of chainer.Function (e.g. F.sigmoid, F.relu)

 

Only basic calculations can be done with arithmetic operations.

Chainer provides a set of widely used functions via chainer.functions, for example the sigmoid function and the ReLU (Rectified Linear Unit) function, which are popular activation functions in deep learning.

 

Note: You may also come across capitalized names such as F.Sigmoid or F.ReLU. These are the actual class implementations of Function, while the lowercase functions are wrappers that create and apply an instance of the corresponding class.

It is recommended to use the lowercase function when you use F.xxx.

As a side note, sigmoid and ReLU are non-linear functions with the following forms.

 

 

sigmoid

The sigmoid function takes values between 0 and 1. It saturates to 0 as x goes to negative infinity and to 1 as x goes to positive infinity.

relu

The ReLU (rectified linear unit) function always returns 0 when x is negative and behaves like the identity map when x is positive.
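The two shapes described above can be reproduced with plain NumPy. This sketch uses the standard textbook definitions; it is not Chainer’s own implementation, and the helper names sigmoid and relu are defined here only for illustration.

```python
import numpy as np


def sigmoid(x):
    # 1 / (1 + exp(-x)): squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    # max(x, 0): zero for negative inputs, identity for positive inputs.
    return np.maximum(x, 0.0)


x = np.array([-10., -1., 0., 1., 10.])
print('sigmoid:', sigmoid(x))
print('relu   :', relu(x))
```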

 

Link

A Link has two key properties:

  1. It acts like a Function, but it contains internal parameters that tune its behavior.
  2. chainer.links provides many implementations of Link.

 

A Link is similar to a Function, but it owns internal parameters. These internal parameters are tuned during machine learning training.

A Link is a similar notion to a Layer in Caffe. Chainer provides the layers introduced in popular papers via chainer.links, for example the Linear layer and the Convolutional layer.

Let’s look at an example (the explanation below is almost the same as the official tutorial).

L.Linear is one example of a Link:

  1. L.Linear holds the internal parameters self.W and self.b.
  2. L.Linear computes the function F.linear. Its output depends on the internal parameters W and b.

 

 

Note that the internal parameter W is initialized with random values, so every time you execute the above code, the result will be different (try it and check!).

This Linear layer takes 3-dimensional vectors [x0, x1, x2…] (Variable class) as input and outputs 2-dimensional vectors [y0, y1, y2…] (Variable class).

In equation form, $$ y_i = W x_i + b $$ where i = 0, 1, 2, ... indexes each sample in the input/output “minibatch”.
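The equation can be checked with plain NumPy. In this sketch the values of W, b and x are made up for illustration; W has shape (2, 3) so that it maps 3-dimensional inputs to 2-dimensional outputs.

```python
import numpy as np

W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]], dtype=np.float32)     # shape (2, 3)
b = np.array([0.1, -0.1], dtype=np.float32)           # shape (2,)
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)  # minibatch of 2 samples

# Apply y_i = W x_i + b to each sample separately ...
y_loop = np.stack([W.dot(x_i) + b for x_i in x])

# ... or to the whole minibatch at once with a single matrix product.
y_batch = x.dot(W.T) + b

print(y_loop)
print(y_batch)
```

Both forms give the same result; the batched form is what makes minibatch processing efficient.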

[Note] If you look at the source code of the Linear class, you can easily see that it just calls F.linear with its internal parameters W and b.

 

 

Let me emphasize the difference between Link and Function. A Function’s input-output relationship is fixed. A Link, on the other hand, has internal parameters, and its behavior can be changed by modifying (tuning) these parameters.

 

The value of the output y is different from the result of the earlier code, even though we input the same value of x.

These internal parameters are “tuned” during machine learning training. Usually, we do not need to set the internal parameters W or b manually; Chainer updates them automatically during training through back propagation.

 

Chain

A Chain is used to construct neural networks. It usually consists of a combination of several Link and Function modules.

Let’s look at an example.

 

Memo:
The init_scope() method above was introduced in Chainer v2,
and Link class instances are initialized inside this scope.

In Chainer v1, Chain was initialized differently.
Concretely, Link class instances were initialized in the arguments of the super method call.
For backward compatibility, you may use this type of initialization in Chainer v2 as well.

 

According to the official documentation, the Chain class provides the following functionality

  • parameter management
  • CPU/GPU migration support
  • save/load features

to make your neural network code conveniently reusable.

I will use the word “model” to denote this neural network, implemented as a Chain class in Chainer.

Proceed to Chainer basic module introduction 2 to learn how to write “training” code for the model, using Optimizer and Serializer.

 

 
