Blog: Writing Fairy Tales Using AI
Something that not too many people know is that quite a few of Disney’s stories, along with fairy tales like “Rapunzel” and “Hansel and Gretel”, were written down by the Grimm Brothers.
Unfortunately, they wrote these stories a couple of centuries ago, so there’s no way for them to make more stories 😞.
Now imagine there was a way to create more stories from famous writers. I’m not just talking about the Grimm brothers, but also other adored writers like Shakespeare and Charles Dickens.
Unfortunately, it’s practically impossible for any person to read all of a writer’s work and completely understand their style — especially when you have someone like Shakespeare with roughly 200 pieces of work.
But now, we have the power of artificial intelligence. So I wanted to get my computer to write good stories. Because honestly … the last time I tried writing fiction was in grade 7, and I got a C- on the assignment ¯\_(ツ)_/¯.
Computers are wayyyyy faster than people so they can read all of a writer’s work in seconds!
So now we can just train a neural network to learn the styles of writers like the Grimm brothers and have it write endless stories.
Using LSTM Networks
There are so many patterns in writing that an AI can pick up on. Algorithms have found patterns in emotional beat, uses of nouns and punctuation etc. in various popular books. There are also some more obvious patterns like Shakespeare’s use of iambic pentameter.
So now the question is: what type of neural networks should be used to make a story writer?
In case you didn’t know, there are several types of neural networks.
For the challenge of writing stories, I chose to use a type of Recurrent Neural Network (RNN) called Long Short-Term Memory (LSTM).
This type of network is perfect for writing stories because it’s a master at understanding context — and context is super important in writing stories.
Knowing the context of a sentence helps choose which parts of speech to use (verbs, nouns, adjectives, etc.).
For example, we wouldn’t want to have 3 verbs in a row. That doesn’t fit proper English syntax.
Using an RNN structure means that every word the AI generates is fed back into the network as input for writing the next word. This helps the AI learn syntax rules, like which parts of speech can follow one another.
Regular RNNs gradually forget information as it gets older. So they’ll remember what they wrote a couple of words ago, but they’ll barely remember what they wrote at the beginning of the story.
This is a big problem for a story writer because it prevents RNNs from creating a plot. How can it be expected to build on a plot when it doesn’t even remember it?
This is where the “long” part of LSTMs comes in. By using memory cells, LSTMs can understand the context of a sentence as well as the whole story.
These cells are special because they choose what to remember and what to forget. They don’t gradually forget things over time like regular RNNs do.
Each LSTM layer is made up of multiple mini neural networks that are trained to optimally use memory to make accurate predictions.
The ignore gate just ignores any irrelevant information the AI gets so it doesn’t mess up predictions.
The memory cell collects the possible outputs the network can come up with and stores the relevant ones for later use.
The forget gate decides which long-term memories are irrelevant to the decision-making process and gets rid of them. For example, the AI doesn’t need to remember that it used a comma in the first sentence, only important plot points.
The selection gate uses the memories to select a final output from all the possibilities that the network comes up with. Let’s pretend that the previous words in the sentence are “he went to the”. The memory cells will have used context to determine that it needs to output the name of a place to complete the sentence. The selection gate uses its experience learning from the Grimm book to determine which word is the best fit.
Because of LSTMs, my AI program called Grimm Writer can remember every part of the story that it writes so that it can create a plot accordingly.
Teaching Grimm Writer to Write
To teach Grimm Writer how to write stories, I first found a book filled with stories by the Grimm Brothers.
I picked up a text file of the book online and manually deleted all of the publisher information so that only the actual stories were left.
Next, I had a Python script iterate through the book, separate out each word and punctuation mark, and put them in an array. So if there were 200 words and 56 punctuation marks in the book, the array would contain 256 values.
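A minimal sketch of that tokenizing step might look like this (the regex and function name are my own illustration, not the post’s actual script):

```python
import re

def tokenize(text):
    # Split the text into word tokens and punctuation tokens,
    # keeping both so punctuation counts toward the array length.
    return re.findall(r"[A-Za-z']+|[.,!?;:]", text)

tokens = tokenize("The quick brown fox jumped, over the lazy dogs.")
# -> ['The', 'quick', 'brown', 'fox', 'jumped', ',', 'over', 'the', 'lazy', 'dogs', '.']
```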
Since the AI will be trained using supervised learning, it needs to be trained on inputs and corresponding outputs.
The input it will be trained on is a sequence of words from the book, and the corresponding/desired output will be the next word that comes in the sequence.
This technique introduces sequence length (the number of words given as input to predict the next word) as a new hyper-parameter that can be tweaked to affect the performance.
In the early prototypes of Grimm Writer, I tried a sequence length of 5, but after training it for a long time and consulting with various sources I realized that 5 words didn’t seem to be enough context for Grimm Writer.
The latest version of Grimm Writer uses a sequence length of 25. This means that it uses the last 25 words in the story as input to generate the next one.
Since Grimm Writer outputs word choices, the number of output nodes it has will be equal to the number of words in its vocabulary. This makes it really computationally expensive to train.
To speed up training, I removed all words that appeared three or fewer times in the book. This meant that the AI wouldn’t have to waste time learning a word that rarely appears and could easily be substituted with another.
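That filtering step could look something like this (a sketch with a toy threshold; the helper name is mine):

```python
from collections import Counter

def frequent_words(tokens, min_count=4):
    # Keep only words that appear min_count or more times,
    # dropping anything seen three or fewer times as described above.
    counts = Counter(tokens)
    return {t for t in tokens if counts[t] >= min_count}

tokens = ["the"] * 5 + ["king"] * 4 + ["zymurgy"] * 2
frequent_words(tokens)
# -> {'the', 'king'}
```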
The entire book was split into sequences as input (X) and the next word (Y). Any sequences in X that used words that were deemed infrequent were removed from the training data.
I then used a function to one-hot encode all the information.
If you don’t know anything about AI, all that probably confused you, so here’s an example to clear everything up.
Let’s say we have the following sentence:
The quick brown fox jumped over the lazy dogs.
The program turns that into this:
Input: “The”, “quick”, “brown”, “fox”, “jumped” Output: “over”
Input: “quick”, “brown”, “fox”, “jumped”, “over” Output: “the”
Input: “brown”, “fox”, “jumped”, “over”, “the” Output: “lazy”
Input: “fox”, “jumped”, “over”, “the”, “lazy” Output: “dogs”
Input: “jumped”, “over”, “the”, “lazy”, “dogs” Output: “.”
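That sliding window can be sketched in a few lines of Python (a hypothetical helper, not the post’s actual code):

```python
def make_sequences(tokens, seq_len=5):
    # Slide a window of seq_len words across the text; the word
    # right after each window is the target the network should predict.
    X, Y = [], []
    for i in range(len(tokens) - seq_len):
        X.append(tokens[i:i + seq_len])
        Y.append(tokens[i + seq_len])
    return X, Y

tokens = ["The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dogs", "."]
X, Y = make_sequences(tokens)
# X[0] -> ['The', 'quick', 'brown', 'fox', 'jumped'], Y[0] -> 'over'
```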
And here’s an example of what one-hot encoding does to the words:
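Roughly, one-hot encoding turns each word into a vector of zeros with a single 1 at that word’s position in the vocabulary (a toy version, with a made-up nine-word vocabulary):

```python
def one_hot(word, vocab):
    # vocab is a sorted list of every word the network knows;
    # the encoding is all zeros except a 1 at the word's index.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

vocab = [".", "brown", "dogs", "fox", "jumped", "lazy", "over", "quick", "the"]
one_hot("fox", vocab)
# -> [0, 0, 0, 1, 0, 0, 0, 0, 0]
```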
Now all the data is ready to train the Grimm Writer.
For the architecture of the neural network I used a bi-directional LSTM. This means the network reads each sequence in both directions, forwards and in reverse.
It can use future context as well as past context to make predictions: it learns what comes before a certain word in addition to what comes after one.
This gives Grimm Writer an even better understanding of context.
After running the data through the bi-directional LSTM, the AI feeds the result through a softmax function to give a probability for each word.
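For anyone curious, softmax just squashes the network’s raw scores into probabilities that add up to 1. A plain-Python sketch:

```python
import math

def softmax(scores):
    # Subtract the max score first for numerical stability,
    # then exponentiate and normalize so the values sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# the word with the highest score gets the highest probability
```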
It then adds the chosen word to the story and uses it to predict the next word.
Here’s a sample story that was written by Grimm Writer:
In old times, when wishing was having, there lived a king whose daughters were all beautiful, but the youngest was so beautiful that they did not one of which the time was out of the time , but it was more with them to go on the forest by the forest. And one must be up in the old woman and could one with the old woman.
He took her in his great mother of him, and the mother took her with his; and when the king was a golden this time to him his beautiful from the maiden. his wife was more with a great golden asked, but “you with a” asked he which you had more has a beautiful get home.
As then as one from the golden more. then they thought, we “shall be more from asked, now “shall I are the evening, when we are do up, give you the maiden to the golden other again, and take a other.
We shall have been a golden house. I will get no more must be away. the mother has golden.
If I did not come. I am great home, but they were all; and down home.
When the king’s daughter was great man asked, he “will you however!
Do we for me: Do it with you, however, who has good has here! here has me!
Me have I must be back again, golden has no give me get shall of this more, but this has a wife, and has great wife where from the away with which. they has thought, there, let me let me have I give”
We will be back and well, and he do up!
You have them! but at last is time down with the! we then more”
The king asked the back, who “do you to be! but the wife
Not give her it more has come back, and has golden one would take her with great now she great the golden woman, however, could take get a beautiful. but then one more from her by the forest, and queen, there is evening the golden house.
However, she was no one of the other, he and thought to last, this “shall I get away, and has come back to my come, away.
You will come to well, other must be; and we must take them home then you. If I will do you back we must be! asked him came with you.
But you have come back, and has golden other on be with a great house, and the maiden daughter to the mother we down, the golden son then we.
The story doesn’t make complete sense, but this acts as a proof of concept.
When analyzed by Microsoft Word, there are only 6 grammar errors. This means that Grimm Writer knows how to write sentences — it just needs to learn to write better plots.
Grimm Writer was trained for 200 epochs. Adjusting the sequence length and training the network for more epochs could help the model. Using batch training would also speed things up and might help the model converge faster.
While there are other amazing story writers out there like Shelley, they require human collaboration and don’t focus on replicating an author’s style.
Once Grimm Writer’s algorithm is perfected, we will be able to give immortality to the talent of famous authors!