[Update 2017.06.11] Add Chainer v2 code.
This post is a copy of chainer_module1.ipynb on GitHub; you can execute it interactively using Jupyter Notebook.
Advanced memos are written as “Note”. You can skip them on a first reading.
In this tutorial, the basic Chainer modules are introduced and explained:
- Variable
- Link
- Function
- Chain
Other Chainer modules are explained in later tutorials.
Initial setup
Below is a typical import statement for Chainer modules.
# Initial setup following http://docs.chainer.org/en/stable/tutorial/basic.html
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
Variable
A Chainer variable can be created with the Variable constructor, which creates a chainer.Variable class object.
When I write Variable, it means Chainer’s Variable class. Please do not confuse it with the usual noun “variable”.
Note: the reason why Chainer needs its own Variable and Function classes for calculation, instead of just using numpy, is that back propagation is necessary during deep learning training. Variable holds its “calculation history” information, and Function has a backward method, which is the differential function used to process back propagation. See below for more details
- Chainer Tutorial Forward/Backward Computation
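As a quick illustration of this machinery, here is a minimal sketch: compute y = x ** 2 and call backward() to obtain the gradient dy/dx in x.grad.

# A minimal sketch of forward/backward computation with `Variable`
x = Variable(np.asarray([3.], dtype=np.float32))
y = x ** 2     # forward computation, the "calculation history" is recorded
y.backward()   # back propagation
print(x.grad)  # dy/dx = 2x, so this prints [ 6.]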
from chainer import Variable

# creating numpy array
# this is `numpy.ndarray` class
a = np.asarray([1., 2., 3.], dtype=np.float32)

# chainer variable is created from `numpy.ndarray` or `cuda.ndarray` (explained later)
x = Variable(a)

print('a: ', a, ', type: ', type(a))
print('x: ', x, ', type: ', type(x))
a: [ 1. 2. 3.] , type: <class 'numpy.ndarray'>
x: <var@1b45519df60> , type: <class 'chainer.variable.Variable'>
In the above code, the numpy data type is explicitly set as dtype=np.float32. If we don’t set the data type, np.float64 may be used as the default type in a 64-bit environment. However, such precision is usually “too much” and not necessary in machine learning. It is better to use lower precision for computational speed and memory usage.
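For example, you can check the default dtype yourself; a small sketch:

# without an explicit dtype, numpy may default to float64 on a 64-bit environment
b = np.asarray([1., 2., 3.])
print('default dtype: ', b.dtype)                      # float64
print('after astype : ', b.astype(np.float32).dtype)  # cast to float32 for speed & memory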
Attributes
A Chainer Variable has the following attributes:
- data
- dtype
- shape
- ndim
- size
- grad
They are very similar to those of numpy.ndarray. You can access these attributes as follows.
# These attributes return the same values
print('attributes', 'numpy.ndarray a', 'chainer.Variable x')
print('dtype', a.dtype, x.dtype)
print('shape', a.shape, x.shape)
print('ndim', a.ndim, x.ndim)
print('size', a.size, x.size)
attributes numpy.ndarray a chainer.Variable x
dtype float32 float32
shape (3,) (3,)
ndim 1 1
size 3 3
# Variable class has a debug_print function, to show this Variable's properties.
x.debug_print()
"<variable at 0x1b45519df60>\n- device: CPU\n- volatile: OFF\n- backend: <class 'numpy.ndarray'>\n- shape: (3,)\n- dtype: float32\n- statistics: mean=2.00000000, std=0.81649661\n- grad: None"
One exception is the data attribute: a Chainer Variable’s data refers to the numpy.ndarray it was created from.
# x = Variable(a)

# `a` can be accessed via `data` attribute from chainer `Variable`
print('x.data is a : ', x.data is a)  # True -> means the reference of x.data and a are same.
print('x.data: ', x.data)
x.data is a : True
x.data: [ 1. 2. 3.]
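Since they share the same reference, modifying one is visible through the other; a small sketch:

# `x.data` and `a` point to the same array, so an in-place change to one affects the other
x.data[0] = 10.
print('a: ', a)  # the first element of `a` is also changed to 10.
x.data[0] = 1.   # restore the original value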
Function
We want to perform some calculation on a Variable. A Variable can be calculated using
- arithmetic operations (e.g. +, -, *, /)
- methods which are subclasses of chainer.Function (e.g. F.sigmoid, F.relu)
# Arithmetic operation example
x = Variable(np.array([1, 2, 3], dtype=np.float32))
y = Variable(np.array([5, 6, 7], dtype=np.float32))

# usual arithmetic operators (in this case `*`) can be used for calculation of `Variable`
z = x * y
print('z: ', z.data, ', type: ', type(z))
z: [ 5. 12. 21.] , type: <class 'chainer.variable.Variable'>
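Other operators and Python scalar constants also work; a small sketch:

# more arithmetic examples on `Variable`
w = x + y      # element-wise addition
v = x * 2 + 1  # python scalars are broadcast as constants
print('w: ', w.data)  # [ 6. 8. 10.]
print('v: ', v.data)  # [ 3. 5. 7.]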
Only basic calculations can be done with arithmetic operations.
Chainer provides a set of widely used functions via chainer.functions, for example the sigmoid function or the ReLU (Rectified Linear Unit) function, which are popularly used as activation functions in deep learning.
# Function operation example
import chainer.functions as F

x = Variable(np.array([-1.5, -0.5, 0, 1, 2], dtype=np.float32))
sigmoid_x = F.sigmoid(x)  # sigmoid function. F.sigmoid is subclass of `Function`
relu_x = F.relu(x)        # ReLU function. F.relu is subclass of `Function`

print('x: ', x.data, ', type: ', type(x))
print('sigmoid_x: ', sigmoid_x.data, ', type: ', type(sigmoid_x))
print('relu_x: ', relu_x.data, ', type: ', type(relu_x))
x: [-1.5 -0.5 0. 1. 2. ] , type: <class 'chainer.variable.Variable'>
sigmoid_x: [ 0.18242553 0.37754068 0.5 0.7310586 0.88079709] , type: <class 'chainer.variable.Variable'>
relu_x: [ 0. 0. 0. 1. 2.] , type: <class 'chainer.variable.Variable'>
Note: You may find capitalized versions of Function, like F.Sigmoid or F.ReLU. Basically, these capitalized names are the actual class implementations of Function, while the lowercase methods are getter methods that return instances of these capitalized classes. It is recommended to use the lowercase method when you use F.xxx.
Just as a side note, the sigmoid and ReLU functions are non-linear functions whose shapes are plotted below.
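For reference, their definitions are sigmoid(x) = 1 / (1 + exp(-x)) and relu(x) = max(0, x).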
%matplotlib inline
import matplotlib.pyplot as plt

def plot_chainer_function(f, xmin=-5, xmax=5, title=None):
    """draw graph of chainer `Function` `f`

    :param f: function to be plotted
    :type f: chainer.Function
    :param xmin: int or float, minimum value of x axis
    :param xmax: int or float, maximum value of x axis
    :return:
    """
    a = np.arange(xmin, xmax, step=0.1)
    x = Variable(a)
    y = f(x)
    plt.clf()
    plt.figure()
    # x and y are `Variable`, their value can be accessed via `data` attribute
    plt.plot(x.data, y.data, 'r')
    if title is not None:
        plt.title(title)
    plt.show()

plot_chainer_function(F.sigmoid, title='Sigmoid')
plot_chainer_function(F.relu, title='ReLU')
Link
Link is similar to Function, but it owns internal parameters. These internal parameters are tuned during machine learning training.
Link is a similar notion to Layer in Caffe. Chainer provides layers introduced in popular papers via chainer.links, for example the Linear layer and the Convolution layer.
Let’s see an example (the explanation below is almost the same as the official tutorial).
import chainer.links as L

in_size = 3   # input vector's dimension
out_size = 2  # output vector's dimension

linear_layer = L.Linear(in_size, out_size)  # L.Linear is subclass of `Link`

"""linear_layer has 2 internal parameters `W` and `b`, which are `Variable`"""
print('W: ', linear_layer.W.data, ', shape: ', linear_layer.W.shape)
print('b: ', linear_layer.b.data, ', shape: ', linear_layer.b.shape)
W: [[-0.11581965 0.44105673 -0.15657252]
 [-0.16952933 -0.57037091 -0.57939512]] , shape: (2, 3)
b: [ 0. 0.] , shape: (2,)
Note that the internal parameter W is initialized with random values, so every time you execute the above code the result will be different (try it and check!).
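If you need deterministic results, one way is to pass an explicit initial value via the initialW argument; a small sketch:

# `initialW` accepts an ndarray of shape (out_size, in_size)
fixed_W = np.asarray([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
fixed_layer = L.Linear(3, 2, initialW=fixed_W)
print(fixed_layer.W.data)  # always the same values, no randomness involved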
This Linear layer takes 3-dimensional vectors [x0, x1, x2, ...] (Variable class) as input and outputs 2-dimensional vectors [y0, y1, y2, ...] (Variable class).
In equation form, y_i = W x_i + b, where i = 0, 1, 2, ... denotes each “minibatch” of the input/output.
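To convince yourself of this equation, you can compute it manually with numpy and compare with the layer output (a small sketch, using the linear_layer defined above):

# manual computation of y0 = W x0 + b, compared against the layer output
W = linear_layer.W.data
b = linear_layer.b.data
x0 = np.array([1, 0, 0], dtype=np.float32)
print('manual: ', W.dot(x0) + b)
print('layer : ', linear_layer(x0[None]).data[0])  # same values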
[Note] If you look at the source code of the Linear class, you can easily see that it is just calling F.linear via
return linear.linear(x, self.W, self.b)
x0 = np.array([1, 0, 0], dtype=np.float32)
x1 = np.array([1, 1, 1], dtype=np.float32)

x = Variable(np.array([x0, x1], dtype=np.float32))
y = linear_layer(x)

print('W: ', linear_layer.W.data)
print('b: ', linear_layer.b.data)
print('x: ', x.data)  # input is x0 & x1
print('y: ', y.data)  # output is y0 & y1
W: [[-0.11581965 0.44105673 -0.15657252]
 [-0.16952933 -0.57037091 -0.57939512]]
b: [ 0. 0.]
x: [[ 1. 0. 0.]
 [ 1. 1. 1.]]
y: [[-0.11581965 -0.16952933]
 [ 0.16866457 -1.31929541]]
Let me emphasize the difference between Link and Function. A Function’s input-output relationship is fixed. On the other hand, a Link module has internal parameters, and its behavior can be changed by modifying (tuning) these internal parameters.
# Force update (set) internal parameters
linear_layer.W.data = np.array([[1, 2, 3], [0, 0, 0]], dtype=np.float32)
linear_layer.b.data = np.array([3, 5], dtype=np.float32)

# below is same code with above cell, but output data y will be different
x0 = np.array([1, 0, 0], dtype=np.float32)
x1 = np.array([1, 1, 1], dtype=np.float32)

x = Variable(np.array([x0, x1], dtype=np.float32))
y = linear_layer(x)

print('W: ', linear_layer.W.data)
print('b: ', linear_layer.b.data)
print('x: ', x.data)  # input is x0 & x1
print('y: ', y.data)  # output is y0 & y1
W: [[ 1. 2. 3.]
 [ 0. 0. 0.]]
b: [ 3. 5.]
x: [[ 1. 0. 0.]
 [ 1. 1. 1.]]
y: [[ 4. 5.]
 [ 9. 5.]]
The value of the output y is different from the above code, even though we input the same value of x.
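Concretely, with the forced parameters, y0 = W x0 + b = [1*1 + 2*0 + 3*0 + 3, 0 + 5] = [4, 5] and y1 = W x1 + b = [1 + 2 + 3 + 3, 0 + 5] = [9, 5], which matches the printed output.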
These internal parameters are “tuned” during training in machine learning. Usually we do not need to set the internal parameters W or b manually; Chainer automatically updates them during training through back propagation.
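To see how such gradients are obtained, here is a minimal sketch with a dummy scalar loss (the loss function is chosen only for illustration):

# gradients of internal parameters are computed by back propagation
linear_layer.cleargrads()   # clear the gradients before backward
y = linear_layer(x)
loss = F.sum(y * y)         # a dummy scalar loss, just for illustration
loss.backward()
print(linear_layer.W.grad)  # gradient of the loss with respect to W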
Chain
Chain is used to construct neural networks. It usually consists of several combinations of Link and Function modules.
Let’s see an example.
from chainer import Chain, Variable

# Defining your own neural networks using `Chain` class
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(2, 2)
            self.l2 = L.Linear(2, 1)

    def __call__(self, x):
        h = self.l1(x)
        return self.l2(h)

x = Variable(np.array([[1, 2], [3, 4]], dtype=np.float32))

model = MyChain()
y = model(x)

print('x: ', x.data)  # input is x0 & x1
print('y: ', y.data)  # output is y0 & y1
x: [[ 1. 2.]
 [ 3. 4.]]
y: [[-0.54882193]
 [-1.15839911]]
Memo: The init_scope() method above was introduced in Chainer v2, and Link class instances are initialized inside this scope.
In Chainer v1, Chain was initialized as follows; concretely, Link class instances were initialized in the arguments of the super method. For backward compatibility, you may use this type of initialization in Chainer v2 as well.
from chainer import Chain, Variable

# Defining your own neural networks using `Chain` class
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(2, 2),
            l2=L.Linear(2, 1)
        )

    def __call__(self, x):
        h = self.l1(x)
        return self.l2(h)

x = Variable(np.array([[1, 2], [3, 4]], dtype=np.float32))

model = MyChain()
y = model(x)

print('x: ', x.data)  # input is x0 & x1
print('y: ', y.data)  # output is y0 & y1
x: [[ 1. 2.]
 [ 3. 4.]]
y: [[-0.54882193]
 [-1.15839911]]
Based on the official docs, the Chain class provides the following functionality
- parameter management
- CPU/GPU migration support
- save/load features
to provide convenient reusability of your neural network code.
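For instance, parameter management and the save/load features can be used as follows (a small sketch; the file name is just an example):

# parameter management: iterate over all parameters registered in the Chain
for name, param in model.namedparams():
    print(name, param.shape)

# save/load: persist and restore the model parameters in npz format
serializers.save_npz('my_chain.npz', model)
serializers.load_npz('my_chain.npz', model)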
I will use the word “model” to denote a neural network, implemented as a Chain class in Chainer.
Proceed to Chainer basic module introduction 2 to learn how to write “training” code for the model, using Optimizer and Serializer.