Recurrent Neural Network (RNN) introduction

[Update 2017.06.11] Added Chainer v2 code

 

How can we deal with sequential data in a deep neural network?

This formulation is especially important in the natural language processing (NLP) field. For example, text is made of a sequence of words. If we want to predict the next word from a given sentence, the probability of the next word depends on the whole past sequence of words.
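One way to state this formally: a language model factorizes the probability of a sentence as \( P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1}) \), so each prediction is conditioned on all preceding words.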

So, the neural network needs the ability to “remember” the past sequence in order to predict the next word.

[Figure: predicting the next word from a text sequence]

In this chapter, the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) are introduced to deal with sequential data.

Recurrent Neural Network (RNN)

[Figure: Recurrent Neural Network]

A Recurrent Neural Network is similar to the Multi Layer Perceptron introduced before, but a loop is added in its hidden layer (shown in the figure above with \( W_{hh} \)).
Here the subscript \( t \) represents the time step (sequence index). Due to this loop, the hidden layer unit \( h_{t-1} \) is fed back in to construct the hidden unit \( h_{t} \) of the next step. Therefore, information about the past sequence can be “stored” (memorized) in the hidden layer and passed on to the next step.
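Concretely, a common formulation of this recurrence (assuming a \( \tanh \) activation, which the implementation below also uses) is \( h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \), with the output computed as \( y_t = W_{hy} h_t + b_y \).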

You might wonder how the loop in the above figure works in a neural network. The figure below is the expanded (unrolled) version, which explicitly shows how the loop works over time.

 

[Figure: expanded (unrolled) Recurrent Neural Network]

In this figure, data flows from bottom (\( x \)) to top (\( y \)), and the horizontal axis represents the time step, from left (time step = 1) to right (time step = \( t \)).

Every forward computation depends on the previous hidden unit \( h_{t-1} \), so the RNN needs to keep this hidden unit as a state; see the implementation below.

Also, we need to be careful when executing back propagation, because it depends on the history of consecutive forward computations. The details will be explained later.


RNN implementation in Chainer

The code below shows the simplest RNN implementation with one hidden (recurrent) layer, as drawn in the figure above.
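Here is a minimal sketch of such a network in Chainer v2 style; the class name SimpleRNN and the sizes n_vocab and n_units are illustrative assumptions rather than the exact code from the accompanying repository.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class SimpleRNN(chainer.Chain):
    """One-hidden-layer RNN sketch (Chainer v2 style)."""

    def __init__(self, n_vocab, n_units):
        super(SimpleRNN, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)  # word ID -> embedding vector
            self.xh = L.Linear(n_units, n_units)      # input -> hidden  (W_xh)
            self.hh = L.Linear(n_units, n_units)      # hidden -> hidden (W_hh)
            self.hy = L.Linear(n_units, n_vocab)      # hidden -> output (W_hy)
        self.h = None  # hidden state, kept between time steps

    def reset_state(self):
        self.h = None

    def __call__(self, x):
        # x: mini-batch of word IDs for the current time step
        e = self.embed(x)
        if self.h is None:
            h = F.tanh(self.xh(e))
        else:
            h = F.tanh(self.xh(e) + self.hh(self.h))
        self.h = h                 # store the state for the next step
        return self.hy(h)
```

Note that the hidden state self.h persists across calls, so reset_state() should be called at the start of each new sequence.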


EmbedID link

L.EmbedID is used in the above RNN implementation. This is a convenient link when the input data can be represented as an integer ID.

[Figure: EmbedID link]

EmbedID takes an integer ID as input, and outputs a 1-d vector of size out_size.

In NLP text processing, each word is represented as an integer ID. The EmbedID layer converts this ID into a vector, which can be considered a vector representation of the word.

More precisely, the EmbedID layer works as a combination of 2 operations (see the sketch after this list):

  1. Convert the integer ID into an in_size-dimensional one-hot vector.
  2. Apply a Linear layer (with bias \( b = 0 \)) to this one-hot vector to output out_size units.
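
To make this concrete, the following sketch checks that an EmbedID lookup matches a one-hot vector multiplied by a bias-free Linear layer sharing the same weights; the sizes in_size = 5 and out_size = 3 are arbitrary illustration values.

```python
import numpy as np
import chainer.links as L

in_size, out_size = 5, 3
embed = L.EmbedID(in_size, out_size)

x = np.array([2], dtype=np.int32)   # a single word ID
y1 = embed(x)                       # direct EmbedID lookup

# Equivalent computation: one-hot vector followed by a bias-free Linear layer
onehot = np.eye(in_size, dtype=np.float32)[x]   # shape (1, in_size)
linear = L.Linear(in_size, out_size, nobias=True)
linear.W.data[...] = embed.W.data.T             # share the same weights
y2 = linear(onehot)

print(np.allclose(y1.data, y2.data))            # True
```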

See the official documentation for details.

Creating RecurrentBlock as sub-module

If you want to create a deeper RNN, you can make the recurrent block a sub-module layer, as shown below.
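
Below is one possible sketch of such a sub-module; the names RecurrentBlock and DeepRNN, and the choice of stacking two recurrent layers, are illustrative assumptions.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class RecurrentBlock(chainer.Chain):
    """One recurrent layer packaged as a reusable sub-module."""

    def __init__(self, n_in, n_units):
        super(RecurrentBlock, self).__init__()
        with self.init_scope():
            self.xh = L.Linear(n_in, n_units)      # input -> hidden
            self.hh = L.Linear(n_units, n_units)   # hidden -> hidden
        self.h = None  # this block's own hidden state

    def reset_state(self):
        self.h = None

    def __call__(self, x):
        if self.h is None:
            self.h = F.tanh(self.xh(x))
        else:
            self.h = F.tanh(self.xh(x) + self.hh(self.h))
        return self.h


class DeepRNN(chainer.Chain):
    """Stack two RecurrentBlock layers to build a deeper RNN."""

    def __init__(self, n_vocab, n_units):
        super(DeepRNN, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.r1 = RecurrentBlock(n_units, n_units)
            self.r2 = RecurrentBlock(n_units, n_units)
            self.out = L.Linear(n_units, n_vocab)

    def reset_state(self):
        self.r1.reset_state()
        self.r2.reset_state()

    def __call__(self, x):
        h = self.embed(x)
        h = self.r1(h)
        h = self.r2(h)
        return self.out(h)
```

Each block keeps its own hidden state, so the deeper network only needs to forward the input through the stacked blocks and reset all of them between sequences.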

 

Next: Training RNN with simple sequence dataset

