[Update 2017.06.11] Add Chainer v2 code
Machine learning consists of a training phase and a predict/inference phase, and what the model needs to calculate differs between them:
- Training phase: calculate the loss (between the output and the target)
- Predict/Inference phase: calculate the output
I often see the following 2 patterns for managing this.
Predictor – Classifier framework
See train_mnist_2_predictor_classifier.py (train_mnist_1_minimum.py and train_mnist_4_trainer.py are also implemented in the Predictor – Classifier framework).
Two Chain classes, “Predictor” and “Classifier”, are used in this framework.
- Training phase: the Predictor’s output is fed into the Classifier to calculate the loss.
- Predict/Inference phase: only the Predictor’s output is used.
- Predictor
The Predictor simply calculates the output from the input.
```python
import chainer
import chainer.functions as F
import chainer.links as L


# Network definition Chainer v2
# 1. `init_scope()` is used to initialize links for IDE friendly design.
# 2. input size of Linear layer can be omitted
class MLP(chainer.Chain):

    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            # input size of each layer will be inferred when omitted
            self.l1 = L.Linear(n_units)  # n_in -> n_units
            self.l2 = L.Linear(n_units)  # n_units -> n_units
            self.l3 = L.Linear(n_out)    # n_units -> n_out

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
```

In the training script, the predictor is instantiated as:

```python
model = mlp.MLP(args.unit, 10)
```
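The comment about omitting the input size refers to lazy parameter initialization: the weight matrix is only created when the first input arrives and its size becomes known. Below is a minimal NumPy sketch of that idea; `LazyLinear` is a hypothetical class for illustration, and Chainer's actual `L.Linear` differs in detail (for instance, it stores its weight as `(out_size, in_size)` and uses its own initializers).

```python
import numpy as np


class LazyLinear:
    """Hypothetical linear layer whose input size is inferred on the first call."""

    def __init__(self, out_size, seed=0):
        self.out_size = out_size
        self.rng = np.random.default_rng(seed)
        self.W = None  # weight is not allocated until the input size is known
        self.b = np.zeros(out_size)

    def __call__(self, x):
        if self.W is None:
            in_size = x.shape[1]  # infer n_in from the first mini-batch
            scale = np.sqrt(1.0 / in_size)
            self.W = self.rng.standard_normal((in_size, self.out_size)) * scale
        return x @ self.W + self.b


layer = LazyLinear(100)            # like L.Linear(n_units): n_in omitted
out = layer(np.ones((8, 784)))     # W is created with shape (784, 100) here
print(layer.W.shape, out.shape)    # (784, 100) (8, 100)
```

Later calls reuse the same `W`, so only the first mini-batch determines the input size.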
- Classifier
Classifier “wraps” the predictor’s output `y` to calculate the loss between `y` and the actual target `t`.
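```python
classifier_model = L.Classifier(model)
optimizer.update(classifier_model, x, t)
```
With the optimizer set up on `classifier_model` (`optimizer.setup(classifier_model)`), `optimizer.update` internally invokes `classifier_model(x, t)`, calculates the loss, and updates the internal parameters by back propagation.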
Refer to the source code of Classifier for the details.
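To make the division of labor concrete, here is a framework-free NumPy sketch of the same pattern. `LinearPredictor`, `SimpleClassifier`, and the helper functions are hypothetical names for illustration, not Chainer's API; the real `L.Classifier` works on Chainer variables (so `backward()` can be called on the loss) and also reports the loss and accuracy for the training loop.

```python
import numpy as np


def softmax_cross_entropy(y, t):
    """Mean softmax cross entropy between logits y (N, C) and int labels t (N,)."""
    y = y - y.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_p = y - np.log(np.exp(y).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(t)), t].mean()


def accuracy(y, t):
    """Fraction of samples whose argmax prediction equals the label."""
    return (y.argmax(axis=1) == t).mean()


class LinearPredictor:
    """Hypothetical predictor: maps input x to an output (logits) only."""

    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b


class SimpleClassifier:
    """Hypothetical wrapper: computes loss (and accuracy) from predictor output."""

    def __init__(self, predictor, lossfun=softmax_cross_entropy, accfun=accuracy):
        self.predictor = predictor
        self.lossfun = lossfun
        self.accfun = accfun

    def __call__(self, x, t):
        self.y = self.predictor(x)              # output of the wrapped predictor
        self.loss = self.lossfun(self.y, t)     # loss between output and target
        self.accuracy = self.accfun(self.y, t)  # extra metric, kept as attribute
        return self.loss


predictor = LinearPredictor(n_in=4, n_out=3)
classifier = SimpleClassifier(predictor)

x = np.random.default_rng(1).standard_normal((5, 4))
t = np.array([0, 1, 2, 1, 0])

loss = classifier(x, t)  # training phase: the wrapper returns the loss
y = predictor(x)         # predict/inference phase: the predictor alone
print(loss, classifier.accuracy, y.shape)
```

Because `SimpleClassifier` only touches `predictor(x)`, any predictor with the same call signature can be wrapped without changes, which is the reusability this framework offers.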
Train flag framework
[Update] In Chainer v2, the global flag `chainer.config.train` was introduced, so this framework may no longer be the recommended way.
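In Chainer v2, functions such as `F.dropout` read `chainer.config.train`, and `chainer.using_config('train', False)` switches it for a block of code, so a model no longer needs its own flag. The sketch below mimics that mechanism in plain Python/NumPy; `config`, `using_config`, and `dropout` here are hypothetical stand-ins, not Chainer's implementation (the real configuration object is thread-local and holds more settings).

```python
import contextlib
import types

import numpy as np

# Hypothetical stand-in for chainer.config; the real one is thread-local.
config = types.SimpleNamespace(train=True)
_rng = np.random.default_rng(0)


@contextlib.contextmanager
def using_config(name, value):
    """Temporarily set config.<name> to value, restoring the old value on exit."""
    old = getattr(config, name)
    setattr(config, name, value)
    try:
        yield
    finally:
        setattr(config, name, old)


def dropout(x, ratio=0.5):
    """Toy dropout that reads the global flag instead of a per-model self.train."""
    if not config.train:
        return x  # inference: identity
    mask = _rng.random(x.shape) >= ratio  # keep each unit with prob. 1 - ratio
    return x * mask / (1.0 - ratio)       # training: drop and rescale


x = np.ones((2, 4))
with using_config("train", False):
    y_eval = dropout(x)  # code inside the block sees config.train == False
print(np.array_equal(y_eval, x), config.train)  # True True
```

The `try`/`finally` restores the flag even if an exception is raised inside the block, which is easy to forget when toggling a per-model `train` attribute by hand.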
See train_mnist_3_train_flag.py.
Both the loss calculation for the training phase and the prediction code for the inference phase are implemented within one model, and the behavior is switched by a “train flag” (or “test flag”/“predict flag”).
```python
import chainer
import chainer.functions as F
import chainer.links as L


# Network definition
class MLP(chainer.Chain):

    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)  # n_in -> n_units
            self.l2 = L.Linear(None, n_units)  # n_units -> n_units
            self.l3 = L.Linear(None, n_out)    # n_units -> n_out
        # Define train flag
        self.train = True

    def __call__(self, x, t=None):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        y = self.l3(h2)
        if self.train:
            # return loss in training phase
            self.loss = F.softmax_cross_entropy(y, t)
            self.accuracy = F.accuracy(y, t)
            return self.loss
        else:
            # return y in predict/inference phase
            return y
```
By default, `self.train = True`, and this model calculates the loss so that the optimizer can update its internal parameters.
To predict values, set the `train` flag to `False`:
```python
model.train = False
y = model(x)
# model.train = True  # if necessary
```
Comparison
The Predictor – Classifier framework has the advantage that the Classifier module is independent of the predictor and therefore reusable. However, if the loss calculation is complicated, it is difficult to apply this framework.
In the train flag framework, the training loss calculation and the prediction calculation can be independent. You can implement any loss calculation, even if it is very different from the prediction calculation.
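As an illustration, here is a NumPy sketch (the `RegularizedModel` class is hypothetical, not from the original scripts) whose training loss adds an L2 penalty on its own weights to the cross entropy. Because the penalty depends on the parameters rather than only on the output `y` and target `t`, it does not fit naturally into a wrapper with a `lossfun(y, t)` interface, while the train flag handles it directly. (In practice Chainer also offers weight decay as an optimizer hook, `chainer.optimizer.WeightDecay`; the penalty here just stands in for any loss that needs more than `y` and `t`.)

```python
import numpy as np


def softmax_cross_entropy(y, t):
    """Mean softmax cross entropy between logits y (N, C) and int labels t (N,)."""
    y = y - y.max(axis=1, keepdims=True)
    log_p = y - np.log(np.exp(y).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(t)), t].mean()


class RegularizedModel:
    """Hypothetical train-flag model: its training loss also uses the weights."""

    def __init__(self, n_in, n_out, l2=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)
        self.l2 = l2
        self.train = True

    def __call__(self, x, t=None):
        y = x @ self.W + self.b
        if not self.train:
            return y  # predict/inference phase: output only
        # Training phase: data term plus a penalty on the parameters themselves,
        # which a wrapper that only sees (y, t) cannot compute on its own.
        data_loss = softmax_cross_entropy(y, t)
        penalty = 0.5 * self.l2 * np.sum(self.W ** 2)
        self.loss = data_loss + penalty
        return self.loss


model = RegularizedModel(4, 3)
x = np.random.default_rng(1).standard_normal((6, 4))
t = np.array([0, 1, 2, 0, 1, 2])

train_loss = model(x, t)  # training phase: cross entropy + L2 penalty
model.train = False
y = model(x)              # inference phase: plain output, no target needed
model.train = True        # switch back before further training
```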
Basically, use the Predictor – Classifier framework if the loss function is typical, and the train flag framework otherwise.
Next: Writing organized, reusable, clean training code using Trainer module