Refactoring MNIST training

In the previous section, we learned a minimal implementation of the training code for MNIST. Now, let's refactor that code.



argparse is used to make the script configurable: the user can pass options when executing the code from the command line.

The following code is added to the training script:

import argparse

def main():
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--initmodel', '-m', default='',
                        help='Initialize the model from given file')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--epoch', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--gpu', '-g', type=int, default=-1,
                        help='GPU ID (negative value indicates CPU)')
    parser.add_argument('--out', '-o', default='result/2',
                        help='Directory to output the result')
    parser.add_argument('--resume', '-r', default='',
                        help='Resume the training from snapshot')
    parser.add_argument('--unit', '-u', type=int, default=50,
                        help='Number of units')
    args = parser.parse_args()


Then, these variables are configurable when executing the code from the console, and they can be accessed in the script as attributes of args (e.g. args.batchsize, args.epoch, etc.).

For example, to set the GPU device number to 0,

$ python -g 0

or

$ python --gpu 0

or, adding "=", which works the same:

$ python --gpu=0
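To see how the parsed values behave, here is a minimal, self-contained sketch (the helper parse_settings and the explicit argument lists are illustrative, not part of the training script; only a subset of the flags is reproduced):

```python
import argparse

def parse_settings(argv=None):
    # Same option definitions as in the training script (subset).
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100)
    parser.add_argument('--epoch', '-e', type=int, default=20)
    parser.add_argument('--gpu', '-g', type=int, default=-1)
    parser.add_argument('--unit', '-u', type=int, default=50)
    return parser.parse_args(argv)

# "-g 0" and "--gpu=0" produce the same parsed value.
args = parse_settings(['-g', '0', '--unit=1000'])
print('GPU: {}'.format(args.gpu))                 # GPU: 0
print('# unit: {}'.format(args.unit))             # # unit: 1000
print('# Minibatch-size: {}'.format(args.batchsize))
print('# epoch: {}'.format(args.epoch))
```

Unspecified options simply keep their defaults, which is why the script still runs with no arguments at all.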

You can also see which options are available using the --help option, or simply -h.

xxx:~/workspace/pycharm/chainer-hands-on-tutorial/src/mnist$ python -h
usage: [-h] [--initmodel INITMODEL] [--batchsize BATCHSIZE]
       [--epoch EPOCH] [--gpu GPU] [--out OUT] [--resume RESUME]
       [--unit UNIT]

Chainer example: MNIST

optional arguments:
  -h, --help            show this help message and exit
  --initmodel INITMODEL, -m INITMODEL
                        Initialize the model from given file
  --batchsize BATCHSIZE, -b BATCHSIZE
                        Number of images in each mini-batch
  --epoch EPOCH, -e EPOCH
                        Number of sweeps over the dataset to train
  --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
  --out OUT, -o OUT     Directory to output the result
  --resume RESUME, -r RESUME
                        Resume the training from snapshot
  --unit UNIT, -u UNIT  Number of units


[hands on] Try running the training for 10 epochs.

It can be done by

$ python -e 10

and you don't need to modify the Python source code, thanks to argparse.

[hands on] If you have GPU, use GPU for training with model unit size = 1000.

$ python -g 0 -u 1000

save/resume training

Saving and loading the model or optimizer can be done using serializers. The code below saves the training result; the directory for the result can be configured with the -o option.

    parser.add_argument('--out', '-o', default='result/2',
                        help='Directory to output the result')


    # Save the model and the optimizer
    print('save the model')
    serializers.save_npz('{}/classifier.model'.format(args.out), classifier_model)
    serializers.save_npz('{}/mlp.model'.format(args.out), model)
    print('save the optimizer')
    serializers.save_npz('{}/mlp.state'.format(args.out), optimizer)

If you want to resume training from a previous training result, load the model and optimizer before starting the training.

The optimizer also owns internal parameters, which therefore need to be loaded to resume training. For example, Adam holds the "first moment" m and "second moment" v of the gradients.
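Why this state matters can be seen in a minimal sketch of a single Adam update for one scalar weight (illustrative only; Chainer's Adam keeps the same m and v per parameter, which is exactly what gets saved and restored):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: m and v are the running moment estimates that
    # must survive a save/load cycle for training to resume correctly.
    m = b1 * m + (1 - b1) * grad          # first moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second moment estimate
    m_hat = m / (1 - b1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)             # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = adam_step(w=1.0, grad=1.0, m=0.0, v=0.0, t=1)
# After one step with grad=1, the update is close to -lr (-0.001).
```

If m and v were silently reset to zero instead of loaded, the first resumed steps would behave like a fresh optimizer rather than a continuation.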

    parser.add_argument('--initmodel', '-m', default='',
                        help='Initialize the model from given file')
    parser.add_argument('--resume', '-r', default='',
                        help='Resume the training from snapshot')

    # Init/Resume
    if args.initmodel:
        print('Load model from', args.initmodel)
        serializers.load_npz(args.initmodel, classifier_model)
    if args.resume:
        print('Load optimizer state from', args.resume)
        serializers.load_npz(args.resume, optimizer)
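As a side note, save_npz stores the parameters in NumPy's .npz format. The round-trip idea can be sketched with plain NumPy (illustrative only; the real serializers also walk the Link and optimizer hierarchy, and the parameter names here are made up):

```python
import os
import tempfile
import numpy as np

# Toy "model": a dict of parameter arrays, standing in for a Link's
# parameters (the keys mimic Chainer's "layer/param" naming).
params = {'l1/W': np.arange(6, dtype=np.float32).reshape(2, 3),
          'l1/b': np.zeros(2, dtype=np.float32)}

path = os.path.join(tempfile.mkdtemp(), 'mlp.npz')
np.savez(path, **params)      # roughly what serializers.save_npz does

loaded = np.load(path)        # roughly what serializers.load_npz reads
assert np.array_equal(loaded['l1/W'], params['l1/W'])
assert np.array_equal(loaded['l1/b'], params['l1/b'])
print('round-trip ok')
```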

[hands on] Check that the training resumes after the first training run, by running

xxx:~/workspace/pycharm/chainer-hands-on-tutorial/src/mnist$ python -m result/2/classifier.model -r result/2/mlp.state
GPU: -1
# unit: 50
# Minibatch-size: 100
# epoch: 20

Load model from result/2/classifier.model
Load optimizer state from result/2/mlp.state
epoch 1
graph generated
train mean loss=0.037441188701771655, accuracy=0.9890166732668877, throughput=57888.5195400998 images/sec
test  mean loss=0.1542429528321469, accuracy=0.974500007033348

You can check that the pre-trained model is used: the accuracy is already high (about 97-98%) from the first epoch.

Note that this code is not executed if no option is specified; in that case the model and optimizer are not loaded and the default initial values are used.


The built-in Link L.Classifier is used instead of the custom class SoftmaxClassifier:

    classifier_model = L.Classifier(model)

I implemented SoftmaxClassifier to let you understand the loss calculation (using softmax for a classification task). However, most classification tasks use this same function, and it is already supported as the built-in Link L.Classifier.

You can consider using L.Classifier when coding a classification task.
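Conceptually, L.Classifier wraps a predictor and reports softmax cross-entropy loss and accuracy, just as the custom SoftmaxClassifier did. A NumPy sketch of that forward computation (illustrative only; the real Link operates on Chainer Variables and supports backprop):

```python
import numpy as np

def softmax(x):
    # Numerically stabilized softmax over the class axis.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def classifier_loss(logits, t):
    # Softmax cross-entropy loss and accuracy, forward pass only.
    p = softmax(logits)
    n = logits.shape[0]
    loss = -np.log(p[np.arange(n), t]).mean()
    acc = (logits.argmax(axis=1) == t).mean()
    return loss, acc

logits = np.array([[2.0, 0.1, 0.1], [0.2, 3.0, 0.1]])
loss, acc = classifier_loss(logits, np.array([0, 1]))
print(acc)  # 1.0
```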

[hands on]

Read the source code of L.Classifier, and compare it with SoftmaxClassifier from the previous section.

Next: Design patterns for defining model
