Sequence-to-Sequence Models, Part 1

Original author: Mark Daoust
  • Translation
Good day everyone!

We have opened another enrollment for the updated “Data Scientist” course: a new excellent instructor and a program slightly reworked based on the latest updates. As usual, there will be interesting open lessons and collections of useful materials. Today we begin our look at seq2seq models in TensorFlow.

Let's go.

As discussed in the RNN tutorial (we recommend reading it before this article), recurrent neural networks can be taught to model language. This raises an interesting question: can we train a network on some input data to generate a meaningful response? For example, can we teach a neural network to translate from English to French? It turns out that we can.

This guide will show you how to build and train such an end-to-end system. Clone the main TensorFlow repository and the TensorFlow models repository from GitHub. Then you can start by running the translation program:

cd models/tutorials/rnn/translate
python translate.py --data_dir [your_data_directory]



It will download English-to-French translation data from the WMT'15 website, prepare it for training, and train the model. This requires about 20 GB of disk space and quite a lot of time for downloading and preprocessing, so you can start the process now and keep reading this tutorial in the meantime.

This tutorial refers to the following files:

File | What is in it?
tensorflow/tensorflow/python/ops/seq2seq.py | Library for building sequence-to-sequence models
models/tutorials/rnn/translate/seq2seq_model.py | Sequence-to-sequence neural translation model
models/tutorials/rnn/translate/data_utils.py | Helper functions for preparing translation data
models/tutorials/rnn/translate/translate.py | A binary that trains and runs the translation model

Basics of sequence-to-sequence models

The basic sequence-to-sequence model, as presented by Cho et al., 2014 (pdf), consists of two recurrent neural networks (RNNs): an encoder, which processes the input, and a decoder, which generates the output. The basic architecture is shown below:



Each rectangle in the picture above is an RNN cell, usually a GRU cell (gated recurrent unit) or an LSTM cell (long short-term memory); read the RNN tutorial for more details. The encoder and decoder can share weights or, more commonly, use separate sets of parameters. Multi-layer cells have also been used successfully in sequence-to-sequence models, for example for translation in Sutskever et al., 2014 (pdf).

In the basic model described above, the entire input must be encoded into a single fixed-size state vector, since that is the only thing passed to the decoder. To give the decoder more direct access to the input, an attention mechanism was introduced in Bahdanau et al., 2014 (pdf). We will not go into the details of the attention mechanism (for that, see the linked paper); suffice it to say that it allows the decoder to peek into the input at every decoding step. A multi-layer sequence-to-sequence network with LSTM cells and an attention mechanism in the decoder looks like this:



The TensorFlow seq2seq library

As you can see above, there are different sequence-to-sequence models. Each of them can use a different RNN cell, but all of them accept encoder inputs and decoder inputs. This motivates the interface of the TensorFlow seq2seq library (tensorflow/tensorflow/python/ops/seq2seq.py). The basic RNN encoder-decoder sequence-to-sequence model works as follows.

outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)

In the call above, encoder_inputs is a list of tensors representing the encoder inputs, corresponding to the letters A, B, C in the first picture above. Similarly, decoder_inputs is a list of tensors representing the decoder inputs: GO, W, X, Y, Z from the first picture.

The argument cell is an instance of the class tf.contrib.rnn.RNNCell, which determines which cell will be used in the model. You can use existing cells, such as GRUCell or LSTMCell, or write your own. In addition, tf.contrib.rnn provides wrappers for building multi-layer cells, applying dropout to cell inputs and outputs, and other transformations. Read the RNN tutorial for examples.
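For example, here is a minimal sketch of how such a cell might be assembled with the TF 1.x contrib wrappers; the hidden size, number of layers, and keep probability are arbitrary values chosen for illustration:

import tensorflow as tf

size = 256        # hypothetical hidden size
num_layers = 2    # hypothetical number of layers

def make_cell():
    # A single GRU cell with dropout applied to its outputs.
    cell = tf.contrib.rnn.GRUCell(size)
    return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.8)

# Stack several such cells into one multi-layer cell.
cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])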

The call to basic_rnn_seq2seq returns two values: outputs and states. Both are lists of tensors of the same length as decoder_inputs. outputs corresponds to the decoder outputs at each time step; in the first picture these are W, X, Y, Z, EOS. states represents the internal state of the decoder at each time step.
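Putting these pieces together, a minimal end-to-end sketch of the call might look like this (assuming TF 1.x, where these functions live under tf.contrib.legacy_seq2seq; the batch size, hidden size, and sequence lengths are arbitrary values chosen for illustration):

import tensorflow as tf

batch_size = 32
size = 256
encoder_length, decoder_length = 3, 5   # e.g. A, B, C and GO, W, X, Y, Z

# The inputs are Python lists of per-time-step tensors of shape [batch, size].
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, size])
                  for _ in range(encoder_length)]
decoder_inputs = [tf.placeholder(tf.float32, [batch_size, size])
                  for _ in range(decoder_length)]

cell = tf.contrib.rnn.GRUCell(size)
outputs, states = tf.contrib.legacy_seq2seq.basic_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell)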

In many applications of sequence-to-sequence models, the decoder output at time t is fed back as the decoder input at time t+1. At test time, when decoding a sequence, this is how a new sequence is constructed. During training, on the other hand, it is common to feed the decoder the correct input at every time step, even if the decoder made a mistake earlier. Functions in seq2seq.py support both modes through the feed_previous argument. As an example, let's look at the following use of the embedding RNN model.

outputs, states = embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols, num_decoder_symbols,
    embedding_size, output_projection=None,
    feed_previous=False)

In the embedding_rnn_seq2seq model, all inputs (both encoder_inputs and decoder_inputs) are integer tensors representing discrete values. They will be embedded into a dense representation (for details on embeddings, see the Vector Representations tutorial), but to create these embeddings you need to specify the maximum number of discrete symbols: num_encoder_symbols on the encoder side and num_decoder_symbols on the decoder side.

In the call above, we set feed_previous to False. This means that the decoder will use the decoder_inputs tensors as they are provided. If we set feed_previous to True, the decoder will use only the first element of decoder_inputs. All other tensors in the list will be ignored, and the previous decoder output will be used instead. This is how translations are decoded in our translation model, but it can also be used during training to make the model more robust to its own mistakes, roughly as in Bengio et al., 2015 (pdf).
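For decode time, the same call is simply made with the flag flipped (a sketch, reusing the names from the training-time call above):

# Decode-time variant: only the first decoder input (GO) is used; subsequent
# inputs are the decoder's own previous outputs.
outputs, states = embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols, num_decoder_symbols,
    embedding_size, output_projection=None,
    feed_previous=True)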

Another important argument used above is output_projection. If it is not specified, the outputs of the embedding model will be tensors of shape batch-size by num_decoder_symbols, since they represent the logits for each generated symbol. When training models with large output vocabularies, i.e. with a large num_decoder_symbols, storing these large tensors becomes impractical. Instead, it is better to return smaller output tensors, which are later projected onto the large output tensor using output_projection. This lets us use our seq2seq models with a sampled softmax loss, as described in Jean et al., 2014 (pdf).
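A minimal sketch of what such a projection and sampled loss might look like (assuming TF 1.x; the sizes and variable names are illustrative, not the exact code from seq2seq_model.py):

import tensorflow as tf

size = 256                   # hypothetical decoder output size
num_decoder_symbols = 40000  # hypothetical target vocabulary size
num_samples = 512            # hypothetical number of sampled classes

# Projection from the small decoder output up to the full vocabulary.
w = tf.get_variable("proj_w", [size, num_decoder_symbols])
b = tf.get_variable("proj_b", [num_decoder_symbols])
output_projection = (w, b)

def sampled_loss(labels, inputs):
    # `inputs` are the size-dimensional decoder outputs, not full logits.
    labels = tf.reshape(labels, [-1, 1])
    return tf.nn.sampled_softmax_loss(
        weights=tf.transpose(w), biases=b, labels=labels, inputs=inputs,
        num_sampled=num_samples, num_classes=num_decoder_symbols)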

In addition to basic_rnn_seq2seq and embedding_rnn_seq2seq, there are several more sequence-to-sequence models in seq2seq.py; take a look at them. They all have a similar interface, so we will not go into their details. For our translation model below we use embedding_attention_seq2seq.
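Its interface mirrors embedding_rnn_seq2seq, so a call (as a sketch, reusing the names from the example above) looks almost the same:

# Same arguments as embedding_rnn_seq2seq; the difference is that the
# decoder attends over the encoder outputs at every step.
outputs, states = embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols, num_decoder_symbols,
    embedding_size, output_projection=None,
    feed_previous=False)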

To be continued.
