Predict code for simple sequence dataset

Predict code is easy, implemented in predict_simple_sequence.py.

First, construct the model and load the trained model parameters,

    # Model Setup
    model = archs[args.arch](n_vocab=N_VOCABULARY, n_units=args.unit)
    if args.gpu >= 0:
        chainer.cuda.get_device(args.gpu).use()  # Make a specified GPU current
        model.to_gpu()                           # Copy the model to the GPU
    xp = np if args.gpu < 0 else cuda.cupy

    serializers.load_npz(args.modelpath, model)

Then we only specify the first index (corresponds to word id), primeindex, and generate next index. We can generate next index repeatedly based on the generated index.

    # Predict
    predicted_sequence = [prev_index]
    for i in range(args.length):
        prev = chainer.Variable(xp.array([prev_index], dtype=xp.int32))
        current = model(prev)
        current_index = np.argmax(cuda.to_cpu(current.data))
        predicted_sequence.append(current_index)
        prev_index = current_index

    print('Predicted sequence: ', predicted_sequence)

The result is the following, successfully generate the sequence.

Predicted sequence: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5]

This is a simple example to check if RNN has an ability to remember past sequence, so I didn’t prepare validation data. I just wanted to check if the RNN model can “memorize” the training data sequence or not.

Note that the situation is little bit different during training phase and inference/predict phase. In training phase, the model is trained to generate \(xt\) based on the correct sequence \([x_0, x_1, \dots, x_{t-1}]\).

However in predict phase, we only specify the first index \(x_0\), and the model will generate \(x’_1\) (here ‘ indicates output from the model). After that, the model need to generate \(x’_2\) based on \(x’_1\). Therefore, the model will generate \(x’_t\) based on the predicted sequence \([x_0, x’_1 \dots, x’_{t-1}]\).

Leave a Comment Cancel reply