LSTM Adventures

Following this as a guide: http://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/

1: Started by training with the single-layer LSTM

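# Assumed imports: the post uses `keras` and `kl` without showing where they come from
import keras
import keras.layers as kl

# One LSTM layer (256 units), dropout, then a softmax over the character vocabulary
model = keras.models.Sequential()
model.add(kl.LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(kl.Dropout(0.2))
model.add(kl.Dense(y.shape[1], activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")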
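
The model code uses X and y without showing how they were built. Assuming the data prep follows the linked guide (sliding windows of characters mapped to integers, scaled to [0, 1), with a one-hot next character as the target), a minimal sketch might look like this. The function name make_dataset, the short example string, and the window length of 10 are made up for illustration; the guide itself uses longer windows over the full book.

```python
import numpy as np

def make_dataset(text, seq_length=100):
    """Build (X, y) for next-character prediction from a string (illustrative helper)."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    windows, targets = [], []
    # Each window of seq_length characters predicts the character that follows it
    for i in range(len(text) - seq_length):
        windows.append([char_to_int[c] for c in text[i:i + seq_length]])
        targets.append(char_to_int[text[i + seq_length]])
    # Shape (samples, timesteps, features) with a single feature per timestep
    X = np.array(windows, dtype=np.float32).reshape(len(windows), seq_length, 1)
    X /= len(chars)  # scale the integer codes into [0, 1)
    # One-hot encode the target characters
    y = np.eye(len(chars), dtype=np.float32)[targets]
    return X, y, chars

text = "alice was beginning to get very tired of sitting by her sister on the bank"
X, y, chars = make_dataset(text, seq_length=10)
print(X.shape, y.shape)
```

With arrays of this shape, X.shape[1] is the window length, X.shape[2] is 1, and y.shape[1] is the number of distinct characters, which is what the model definition reads.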

results:

  • final loss after 20 epochs: 1.9054
  • time to train (approx): 1hr
  • mostly gibberish words, but word lengths and spacing look right!
  • can handle opening and closing quotation marks
  • sometimes gets ” ‘?’, said the . “

the mort of the sorts!’ she katter wept on, ‘and toene io the doer wo thin iire.’

‘io she mo tee toete of ther ’ou ’ould ’ou ’ould toe tealet ’our majesty,’ the match hare seid to tee jury, the was aoling to an the sooeo.

‘he d crust bi iele at all,’ said the mock turtle.

‘ie doersse toer miter ’hur paae,’ she mick turtle replied, ‘in was a little soiee an in whnl she firl th the kook an in oare

the rieet hor ane the was so tea korte of the sable, bnt the hodrt was no kently and shint oo the gan on the goor, and whnn hes lene the rueen sas so aeain, and whnn she gad been to fen ana tuiee oo thin shaee th the crrr asd then the was no toeeen at the winte tabbit wat she wiite rabbit wat she mitt of the gareen, and she woole tee whst her al iere at she could,

*note: I tried this again with a 512-unit single-layer LSTM; the results were along these lines:

‘i con’t sein mo,’ said alice, ‘ho would ba au all aare an ierse then and what io the bane af in eine th the bare aadi in the cinrsn, and she was soit ion hor horo aerir the was oo the cane and the was oo the tanl oa teing the was soie in her hand

and saed to herself, ‘oh tou dane the boomer ’fth the semeen at the soiee tf the bareee and see seat in was note go the pine afd roe banl th the grore of the grureon,

Training time was slightly longer, and the results are not meaningfully different. One observation here is that training for 20 epochs might be too short…

2: Added another LSTM layer

model = keras.models.Sequential()
model.add(kl.LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(kl.Dropout(0.2))
model.add(kl.LSTM(256))
model.add(kl.Dropout(0.2))
model.add(kl.Dense(y.shape[1], activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")
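
The first LSTM layer is given return_sequences=True because the second LSTM needs a whole sequence as input, not just the final hidden state. Here is a toy illustration of the shapes involved, using a plain tanh recurrence instead of a real LSTM. The function simple_rnn and the batch size of 4 are made up for this sketch; 100 timesteps with 1 feature is an assumption about the input shape, not something stated in the post.

```python
import numpy as np

def simple_rnn(x, units, return_sequences=False, seed=0):
    """Toy tanh RNN (not an LSTM), used only to show output shapes."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_rec = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        # Update the hidden state from the current input and the previous state
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h  # (batch, units): last hidden state only

x = np.zeros((4, 100, 1))
seq = simple_rnn(x, 256, return_sequences=True)  # feeds the next recurrent layer
last = simple_rnn(seq, 256)                      # final layer returns one vector per sample
print(seq.shape, last.shape)
```

The last recurrent layer leaves return_sequences at its default so that the Dense softmax layer receives one vector per sample.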

results:

  • final loss after 20 epochs: 1.5144
  • time to train (approx): 2hrs
  • seems to fall into loops
  • closer to ‘english’ words

and mowte it as all, and i dan see the that was a little boom as the soog of the sable it as all. and the pueen was a little boowle and the thing was she was a little boowle and the that was a little boowle and the that was a little boowse and the thing was the white rabbit say off and the thing was she was a little boowle and the that was a little boowle and the that was a little boowse and the thing was the white rabbit say off and the thing was she was a little boowle and the that was a little boowle and the that was a little boowse and the thing was the white rabbit say off and the thing was she was a little boowle and the that was a little boowle

3: Trying again with 3 LSTM layers

model = keras.models.Sequential()
model.add(kl.LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(kl.Dropout(0.2))
model.add(kl.LSTM(256, return_sequences=True))
model.add(kl.Dropout(0.2))
model.add(kl.LSTM(256))
model.add(kl.Dropout(0.2))
model.add(kl.Dense(y.shape[1], activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")

results:

  • final loss after 20 epochs (took much longer!):
  • time to train (approx): 2.5 hrs
  • real words, better sentence structure… but it’s also falling into pretty small loops.
  • Different seed text makes the first line quite different, but the output then quickly devolves into the same loop of the hatter and the mock turtle going back and forth.

at the way that the was to tee the court.

‘i should like the dormouse say “ said the king as he spoke. 
‘i don’t know what it moog and she beginning of the sea,’ the hatter went on, ‘i’ve sat a little sable. 
‘i don’t know what you con’t think it,’ said the mock turtle.

‘i don’t know what it moog and she beginning of the sea,’ the hatter went on, ‘i’ve sat a little sable. 
‘i don’t know what you con’t think it,’ said the mock turtle.

‘i don’t know what it moog and she beginning of the sea,’ the hatter went on, ‘i’ve sat a little sable. 
‘i don’t know what you con’t think it,’ said the mock turtle.
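
One possible cause of these loops is greedy decoding: if generation always picks the single most likely next character (as an argmax-based loop does, assuming the guide's generation code was used unchanged), the same context always produces the same continuation. A common alternative is to sample from the predicted distribution with a temperature. This is a sketch of that idea, not something tried in these experiments; sample_next and the three-element probs vector are hypothetical stand-ins for a model's softmax output.

```python
import numpy as np

def sample_next(probs, temperature=1.0, rng=None):
    """Sample an index from probs after rescaling by temperature (illustrative helper)."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    # Lower temperature sharpens the distribution, higher temperature flattens it
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

probs = [0.5, 0.3, 0.2]          # stand-in for model.predict(...) on one window
greedy = int(np.argmax(probs))   # always the same character for the same context
rng = np.random.default_rng(0)
draws = [sample_next(probs, temperature=0.8, rng=rng) for _ in range(200)]
print(greedy, sorted(set(draws)))
```

Because sampled output varies from step to step, the context window is less likely to return to exactly the same state, which is what keeps a greedy loop going.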

4: Single Layer GRU (for comparison with the single layer LSTM)

model = keras.models.Sequential()
model.add(kl.GRU(256, input_shape=(X.shape[1], X.shape[2])))
model.add(kl.Dropout(0.2))
model.add(kl.Dense(y.shape[1], activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam")

results:

  • final loss (20 epochs): 1.87897 (approximately the same as the single-layer LSTM’s 1.9054)
  • time to train: 40 minutes (faster!)
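
The shorter training time fits the fact that a GRU has three gate/candidate weight blocks where an LSTM has four. Here is a rough parameter count using the textbook formulas, assuming 256 units and a single input feature per timestep (the input size is an assumption about the data shape). Some implementations add an extra bias vector to the GRU, so the exact count a framework reports can differ slightly.

```python
def lstm_params(units, input_dim):
    # 4 blocks (input, forget, output gates + cell candidate),
    # each with input weights, recurrent weights, and a bias
    return 4 * (units * (units + input_dim) + units)

def gru_params(units, input_dim):
    # 3 blocks (update, reset gates + candidate state), textbook formulation
    return 3 * (units * (units + input_dim) + units)

lstm = lstm_params(256, 1)
gru = gru_params(256, 1)
print(lstm, gru, gru / lstm)
```

By this count the GRU layer has three quarters of the LSTM layer's recurrent parameters, which is in the same ballpark as the 40 minutes versus roughly 1 hour observed here.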

tone of the hoore of the house, and the whrt huon a little crrree the sabbit say an in oish of the was soiereing to be a gooa tu and gerd an all, and she whrt huow foonnge to teyerke to herself, ‘it was the tait wuinten thi woils oottle goos,’

‘hht ser what i cen’t tamling,’ said alice, ‘ih’ le yhit herseree to teye then.’

‘i shanl sht me wiitg,’ said alice,

‘what ie y sheu wasy loteln, and teed the porers oaser droneuse toe thet, and the tored harden wo cnd too fuor of the word the was soteig of the sintle the white rabbit, and the tored hard a little shrenge the fab foott of the was sotengi th theee th the was soe tabdit and ger fee toe pame oo her loot ou teie the was soee i rher to thi kittle gorr, and the tar so tae it the lad soe th the waited to seterker

Still gibberish, though with more English words. Training for more than 20 epochs might be the key!
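
To put the final losses in perspective, cross-entropy can be converted to per-character perplexity with exp(loss). This assumes the reported loss is the mean categorical cross-entropy in nats, which is how Keras computes it. The dictionary below only restates the losses reported above; the 3-layer run is left out because its final loss was not recorded.

```python
import math

# Final training losses after 20 epochs, as reported in the sections above
final_losses = {
    "1-layer LSTM": 1.9054,
    "2-layer LSTM": 1.5144,
    "1-layer GRU": 1.87897,
}

# Perplexity: the effective number of equally likely next characters
perplexity = {name: math.exp(loss) for name, loss in final_losses.items()}

for name, value in perplexity.items():
    print(f"{name}: perplexity {value:.2f}")
```

A perplexity of about 6.7 for the single-layer LSTM means the model is, on average, about as uncertain as if it were choosing among six or seven equally likely characters.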

Here’s a graph of the loss dropoff over time:

[Image: alice_loss.png]

 
