Long Short-Term Memory
Long short-term memory (LSTM) is an advanced version of the RNN, which has a "cell" c to keep long-term information.
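For example, Chainer's L.LSTM link keeps its hidden state h and cell state c internally between calls, so feeding a sequence one step at a time lets the cell carry information forward. Below is a minimal sketch (the sizes are made up for illustration):

import numpy as np
import chainer.links as L

lstm = L.LSTM(in_size=3, out_size=2)    # stateful LSTM link
x = np.ones((1, 3), dtype=np.float32)   # one time step of a batch-1 input

h1 = lstm(x)          # first step: h and c are created internally
h2 = lstm(x)          # second step: reuses the h and c stored from the first step
print(lstm.c.shape)   # (1, 2) -- the long-term "cell" state
lstm.reset_state()    # clear h and c before feeding a new sequence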
LSTM network implementation with Chainer
The LSTM function and link are provided by Chainer, so we can just use them to construct a neural network with LSTM.
A sample implementation follows (adapted from the official example code):
import numpy as np

import chainer
import chainer.functions as F
import chainer.links as L


# Copied from chainer examples code
class RNNForLM(chainer.Chain):
    """Definition of a recurrent net for language modeling"""

    def __init__(self, n_vocab, n_units):
        super(RNNForLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            self.l3 = L.Linear(n_units, n_vocab)

        for param in self.params():
            param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def __call__(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0))
        h2 = self.l2(F.dropout(h1))
        y = self.l3(F.dropout(h2))
        return y
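To check that this model runs, we can feed it a batch of word IDs; the vocabulary size and unit count below are toy values chosen for illustration:

import numpy as np
import chainer

model = RNNForLM(n_vocab=10, n_units=5)  # toy sizes, not from the example
model.reset_state()

x = np.array([1, 3], dtype=np.int32)     # a batch of 2 word IDs at one time step
with chainer.using_config('train', False):  # disable dropout for this check
    y = model(x)
print(y.shape)  # (2, 10): a score for each vocabulary word, per batch element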
Update: [Note] self.params() returns all the "learnable" parameters in this Chain class (for example, W and b in a Linear link, which computes x * W + b).
Thus, the code below replaces all the initial parameters with values drawn uniformly from the range -0.1 to 0.1.
for param in self.params():
    param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)
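To see what self.params() actually iterates over, we can list the named parameters of a single Linear link; this small sketch is not part of the original example:

import chainer.links as L

linear = L.Linear(3, 2)  # computes x * W + b
for name, param in linear.namedparams():
    print(name, param.shape)
# prints: /W (2, 3) and /b (2,)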
Appendix: Chainer v1 code
It was written as follows until Chainer v1. From Chainer v2, the train flag in functions (e.g. the dropout function) has been removed and the Chainer global config is used instead.
import numpy as np

import chainer
import chainer.functions as F
import chainer.links as L


# Copied from chainer examples code
class RNNForLM(chainer.Chain):
    """Definition of a recurrent net for language modeling"""

    def __init__(self, n_vocab, n_units, train=True):
        super(RNNForLM, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l1 = L.LSTM(n_units, n_units)
            self.l2 = L.LSTM(n_units, n_units)
            self.l3 = L.Linear(n_units, n_vocab)

        for param in self.params():
            param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)
        self.train = train

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def __call__(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0, train=self.train))
        h2 = self.l2(F.dropout(h1, train=self.train))
        y = self.l3(F.dropout(h2, train=self.train))
        return y
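In Chainer v2, the same train/test switching is done through the global config instead of a train argument. A minimal sketch of the v2 style:

import numpy as np
import chainer
import chainer.functions as F

x = np.ones((1, 4), dtype=np.float32)

with chainer.using_config('train', True):
    print(F.dropout(x).data)  # training mode: some units are zeroed at random

with chainer.using_config('train', False):
    print(F.dropout(x).data)  # test mode: dropout is skipped (identity)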