Predict code for Penn Bank Tree (ptb) dataset


Predict code is pretty much the same with Predict code for simple sequence dataset, so I won’t explain in detail.



The code is on the github,



Given the first text by the index, args.primeindex, model will predict the following sequence as word id.

The last three line converts the word id sequence into readable word sentence using ptb_id_word_dict.




When I run, (the model is RNN model)

I got the text

Predicted text: executive vice president and chief operating officer of <unk> <unk> & <unk> a <unk> mass. newsletter <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk> from the <unk> <eos> the <unk> <unk> <unk> <unk> <unk> <unk>


It seems the model can predict a first shot sentence but once it has reached to <unk> or <eos>, it will keep returning the same symbol. Also “the” will appear quite often than other words.

I think the model is not trained well enough yet, and you may try training the model more to get more good result!



Sponsored Links

Leave a Reply

Your email address will not be published.