The quote above goes a long way toward explaining the power of artificial intelligence in the coming years, but trust me, AI only seems like magic until you dig into the research in this field. Math is at the heart of everything in artificial intelligence, data science and machine learning. In this article I am going to throw some light on the sequence to sequence model behind neural machine translation, speech recognition, text summarization, image captioning, etc.


This article assumes knowledge of recurrent neural networks and some basic familiarity with LSTM and GRU, because these terms are used frequently in the explanations.

Uses of the sequence to sequence model

The popularity of the seq2seq model is growing rapidly. Well-known companies like Google and Amazon use it in their voice interface applications and in machine translation, and chatbots are frequently backed by it as well.

The major use cases of sequence models are:

Speech recognition: for example, when you talk to your watch or to a voice assistant like Google's.

Machine translation: for example, Google Translate.

Music generation: seq2seq is very useful for modeling music generation.

Sentiment classification: reviewing movies can also be automated using a seq2seq model.

Chatbot applications: deep NLP is usually based on a seq2seq model in order to make responses dynamic.

Text summarization: given a story or an essay, the model has to summarize everything while retaining the useful information.

Image captioning: we can automate image captioning using a seq2seq model.


In sequence to sequence learning, an RNN is trained to map an input sequence to an output sequence which is not necessarily of the same length.

In most cases the output sequence and the input sequence have different lengths, and the whole input sequence is needed in order to predict the target output. This calls for a more advanced setup, which is the approach the seq2seq model uses:

  • An RNN layer (or stack thereof) acts as “encoder”: it processes the input sequence and returns its own internal state.
  • Another RNN layer acts as “decoder”: it is trained to predict the next characters of the target sequence, given previous characters of the target sequence.
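To make the encoder and decoder roles concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the helper names (`rnn_step`, `encode`, `decode`), the tiny vocabulary of integer token ids, and the random, untrained weights. It only shows how the encoder's final state is handed to the decoder and how the output length is decoupled from the input length; it is not a trained translation model.

```python
import math
import random

def rnn_step(x, h, W_x, W_h):
    """One tanh RNN step: h_new[i] = tanh(W_x[i] . x + W_h[i] . h)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(W_x[i], x))
                  + sum(w * hi for w, hi in zip(W_h[i], h)))
        for i in range(len(h))
    ]

def one_hot(index, size):
    """Represent a token id as a one-hot vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(tokens, vocab_size, hidden_size, W_x, W_h):
    """Encoder: read the whole input and return only the final internal state."""
    h = [0.0] * hidden_size
    for token in tokens:
        h = rnn_step(one_hot(token, vocab_size), h, W_x, W_h)
    return h

def decode(state, vocab_size, W_x, W_h, W_out, start_token, end_token, max_len):
    """Decoder: start from the encoder state and predict one token at a time,
    feeding each prediction back in as the next input."""
    h, token, output = state, start_token, []
    for _ in range(max_len):
        h = rnn_step(one_hot(token, vocab_size), h, W_x, W_h)
        scores = [sum(w * hi for w, hi in zip(row, h)) for row in W_out]
        token = max(range(vocab_size), key=lambda i: scores[i])  # greedy pick
        if token == end_token:
            break
        output.append(token)
    return output

# Hypothetical sizes and token ids, chosen only for this sketch.
VOCAB, HIDDEN, START, END, MAX_LEN = 8, 4, 0, 1, 10
rng = random.Random(0)
enc_W_x, enc_W_h = random_matrix(HIDDEN, VOCAB, rng), random_matrix(HIDDEN, HIDDEN, rng)
dec_W_x, dec_W_h = random_matrix(HIDDEN, VOCAB, rng), random_matrix(HIDDEN, HIDDEN, rng)
W_out = random_matrix(VOCAB, HIDDEN, rng)

state = encode([3, 5, 2, 7, 4], VOCAB, HIDDEN, enc_W_x, enc_W_h)
translated = decode(state, VOCAB, dec_W_x, dec_W_h, W_out, START, END, MAX_LEN)
```

Because the weights are untrained, the tokens in `translated` are meaningless. The point is the data flow: the input has five tokens, while the decoder stops whenever it emits the end token or reaches `MAX_LEN`, so the two lengths are independent.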

Why the sequence to sequence model, and how it differs from standard networks

For more clarity, let's take an example.

Problems with standard networks:

  • The major problem in the example above is that the inputs and outputs can have different lengths, which is difficult for a standard network to handle.
  • Also, a standard neural network doesn't share features learned across different positions of the text.

The features of the recurrent neural network

Compared with the basic standard network, the key features that set the recurrent neural network apart are:

  • The recurrent neural network scans through the data from left to right and uses the previous time steps to predict the next time step (that is, sequencing).
  • Notation in the above figure: a<0> is the initial activation (time step 0), x<1> is the input at the first time step, and y<1> is the output at the first time step. The figure shows a many-to-many seq2seq model, as used in machine translation.
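The left-to-right scan can be sketched in a few lines of Python. This is a scalar toy version under my own assumptions: the function name `rnn_forward` and the weight names `w_aa`, `w_ax` and `w_ya` are hypothetical, and a real network would use weight matrices and learned values. It shows two things: the same weights are reused at every time step, and each activation depends on the one before it.

```python
import math

def rnn_forward(xs, w_aa, w_ax, w_ya, a0=0.0):
    """Scan the inputs left to right, producing one output per time step."""
    a = a0                      # a<0>: the initial activation
    activations, outputs = [], []
    for x in xs:                # x<1>, x<2>, ... in order
        a = math.tanh(w_aa * a + w_ax * x)         # a<t> depends on a<t-1>
        y = 1.0 / (1.0 + math.exp(-w_ya * a))      # y<t>, squashed into (0, 1)
        activations.append(a)
        outputs.append(y)
    return activations, outputs

# Same weights, two inputs that differ only at the first time step.
acts_a, outs_a = rnn_forward([1.0, 0.0], w_aa=0.5, w_ax=1.0, w_ya=2.0)
acts_b, outs_b = rnn_forward([0.0, 0.0], w_aa=0.5, w_ax=1.0, w_ya=2.0)
```

Although both sequences have the same second input, the second activation differs between them, because the first time step is carried forward through `a`.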

LSTMs are nowadays a frequently used building block of seq2seq models. They are also used for deep NLP and show very good results for dynamic responses, because an additional gate, known as the forget gate, is part of the model.

In the above figure we can see the three gates used in an LSTM: the forget gate, the update gate and the output gate. The extra gate can improve quality compared with the GRU (gated recurrent unit), which has only 2 gates.
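To see what the three gates do, here is a scalar sketch of a single LSTM step in Python. It follows the commonly used LSTM equations, but the function name `lstm_step`, the parameter dictionary and all the numbers are assumptions made for illustration; real implementations use vectors and learned weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, a_prev, c_prev, p):
    """One LSTM step on scalars. Returns the new activation and cell state."""
    forget = sigmoid(p["wf_a"] * a_prev + p["wf_x"] * x + p["bf"])   # keep old memory?
    update = sigmoid(p["wu_a"] * a_prev + p["wu_x"] * x + p["bu"])   # write new memory?
    output = sigmoid(p["wo_a"] * a_prev + p["wo_x"] * x + p["bo"])   # expose memory?
    candidate = math.tanh(p["wc_a"] * a_prev + p["wc_x"] * x + p["bc"])
    c = forget * c_prev + update * candidate
    a = output * math.tanh(c)
    return a, c

# Hypothetical parameters; only the gate biases differ between the two cases.
base = {"wf_a": 0.0, "wf_x": 0.0, "wu_a": 0.0, "wu_x": 0.0,
        "wo_a": 0.0, "wo_x": 0.0, "bo": 0.0,
        "wc_a": 0.5, "wc_x": 1.0, "bc": 0.0}
keep_params = dict(base, bf=20.0, bu=-20.0)       # forget gate open, update gate closed
overwrite_params = dict(base, bf=-20.0, bu=20.0)  # forget gate closed, update gate open

_, c_kept = lstm_step(x=1.0, a_prev=0.0, c_prev=0.8, p=keep_params)
_, c_new = lstm_step(x=1.0, a_prev=0.0, c_prev=0.8, p=overwrite_params)
```

With the forget gate open and the update gate closed, the cell state stays almost exactly at 0.8. With the gates flipped, the old value is dropped and replaced by the candidate computed from the current input.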

Why RNN and LSTM are taking over, and why LSTM is ahead of RNN

Remember that RNN, LSTM and their derivatives mainly use sequential processing over time. See the horizontal arrow in the diagram below:

Here the arrow means that long-term information has to travel sequentially through all the cells before getting to the present processing cell, which takes a lot of time. It also means the information can easily be corrupted by being multiplied many times by small numbers whose magnitude is less than one. This is the basic cause of vanishing gradients, and this problem may also cause underfitting.
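The arithmetic behind this is easy to check. The sketch below uses a hypothetical helper, `repeated_product`, and made-up factors of 0.9 and 1.1. It multiplies a signal by the same factor once per time step, which is a simplification of what happens to a gradient as it travels back through a long sequence.

```python
def repeated_product(factor, steps):
    """Multiply 1.0 by `factor` once per time step."""
    result = 1.0
    for _ in range(steps):
        result *= factor
    return result

shrinking = repeated_product(0.9, 100)  # magnitude below one: ends up around 1e-5
growing = repeated_product(1.1, 100)    # magnitude above one: ends up above 1e4
```

After 100 steps, a factor of 0.9 has shrunk the signal to almost nothing, so the early time steps barely influence learning. A factor slightly above one causes the opposite problem, usually called exploding gradients.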

To help with this problem, here comes the LSTM model, which can be seen as a set of switch gates. A bit like ResNet, it can bypass units and thus remember over longer time steps. LSTMs thus have a way to remove some of the vanishing gradient problems.

The other big issue with RNNs is that they are not hardware friendly. Explanation: training these networks fast takes a lot of resources that we often do not have. It also takes many resources to run these models in the cloud, and given that the demand for speech-to-text is growing rapidly, the cloud does not scale well for them. These costs apply to LSTMs as well, but because LSTMs cope better with long sequences, they are taking over from plain RNNs.
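As a rough illustration of this bypass, the following Python sketch compares how long a value survives in a plain tanh RNN versus along an LSTM-style cell state. It is a deliberately simplified toy: the function names, the fixed gate biases and the zero input are all my assumptions, and in a real LSTM the gates are computed from the input at every step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def carry_rnn_state(a0, steps, w_aa):
    """Plain RNN with zero input: the state is squashed and rescaled every step."""
    a = a0
    for _ in range(steps):
        a = math.tanh(w_aa * a)
    return a

def carry_cell_state(c0, steps, forget_bias, update_bias):
    """LSTM-style cell state with zero input: c = forget * c + update * candidate."""
    forget, update = sigmoid(forget_bias), sigmoid(update_bias)
    c = c0
    for _ in range(steps):
        candidate = math.tanh(0.0)  # zero input, so there is nothing new to write
        c = forget * c + update * candidate
    return c

forgotten = carry_rnn_state(1.0, steps=100, w_aa=0.5)
remembered = carry_cell_state(1.0, steps=100, forget_bias=6.0, update_bias=-6.0)
```

After 100 steps the plain RNN state has decayed to practically zero, while the cell state, with its forget gate held close to one, still retains most of the original value.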


Follow this link for an overview of how to run a machine translation model:

Neural machine translation from scratch


In this article we went through a brief discussion of the sequence to sequence model and threw some light on machine translation as commonly used in Google Translate.

The use of seq2seq models is increasing day by day, which also helps to increase the popularity of ANI (artificial narrow intelligence).

The article explained the intuition behind the seq2seq model, which we can apply to our own purposes in the future.

Source: Artificial Intelligence on Medium