Long Short-Term Memory (LSTM) introduction

Long Short-Term Memory

Diagram of Long Short-Term Memory (peephole LSTM). Cited from https://en.wikipedia.org/wiki/File:Peephole_Long_Short-Term_Memory.svg, originally created by Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton.

Long short-term memory is an advanced version of the RNN that has a “cell” c to keep long-term information.
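As a rough sketch of the idea (the NumPy code below is only for illustration and uses made-up variable names, not Chainer’s API), one step of a standard LSTM gates the cell state c so that information can be carried over many time steps:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x, h_prev] to the four gate pre-activations.
    a = np.concatenate([x, h_prev]) @ W + b
    i, f, o, g = np.split(a, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell value
    c = f * c_prev + i * g                        # the cell c keeps long-term information
    h = o * np.tanh(c)                            # hidden state passed to the next layer
    return h, c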

LSTM network Implementation with Chainer

The LSTM function and link are provided by Chainer, so we can simply use them to construct a neural network with LSTM.

A sample implementation follows (adapted from the official example code):

import numpy as np

import chainer
import chainer.functions as F
import chainer.links as L


# Copied from chainer examples code
class RNNForLM(chainer.Chain):
    """Definition of a recurrent net for language modeling"""

    def __init__(self, n_vocab, n_units):
        super(RNNForLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            self.l3 = L.Linear(n_units, n_vocab)

        for param in self.params():
            param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def __call__(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0))
        h2 = self.l2(F.dropout(h1))
        y = self.l3(F.dropout(h2))
        return y
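
For reference, a minimal usage sketch (the vocabulary size, unit count, and word IDs below are made up for illustration): each call advances the LSTM state by one time step, and reset_state() clears it before feeding a new sequence.

model = RNNForLM(n_vocab=1000, n_units=650)

model.reset_state()                        # clear the LSTM states before a new sequence
x = np.array([1, 2, 3], dtype=np.int32)    # one word ID per sequence in the mini-batch
y = model(x)                               # logits over the vocabulary, shape (3, 1000)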

Update: [Note]

self.params() returns all the “learnable” parameters in this Chain class (for example, W and b in the Linear link, which computes x * W + b).

Thus, the code below replaces all the initial parameters with values drawn uniformly between -0.1 and 0.1.

for param in self.params():
    param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)
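
For example, a small check like the following (with made-up sizes) lists the registered parameters that self.params() iterates over, together with their shapes:

model = RNNForLM(n_vocab=10, n_units=5)
for name, param in model.namedparams():
    print(name, param.data.shape)   # e.g. /embed/W, /l1/upward/W, /l3/b, ...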

Appendix: chainer v1 code

The code was written as follows until Chainer v1. From Chainer v2, the train flag in functions (e.g. the dropout function) has been removed and the Chainer global config is used instead (see the sketch after the code below).

import numpy as np

import chainer
import chainer.functions as F
import chainer.links as L


# Copied from chainer examples code
class RNNForLM(chainer.Chain):
    """Definition of a recurrent net for language modeling"""

    def __init__(self, n_vocab, n_units, train=True):
        super(RNNForLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            self.l3 = L.Linear(n_units, n_vocab)

        for param in self.params():
            param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)
        self.train = train

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def __call__(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0, train=self.train))
        h2 = self.l2(F.dropout(h1, train=self.train))
        y = self.l3(F.dropout(h2, train=self.train))
        return y
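
With Chainer v2, the same switch is done through the global config instead of a train argument. A minimal sketch (where model and x refer to the Chainer v2 version shown earlier):

# Training mode: dropout is enabled because chainer.config.train is True by default.
y_train = model(x)

# Evaluation mode: temporarily disable the train flag (and backprop) for inference.
with chainer.using_config('train', False), chainer.no_backprop_mode():
    y_eval = model(x)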
