
Getting command parameters from a human phrase
Although I had managed to deal with intent classification, the harder task remained: extracting additional parameters from the phrase. I know this is done with tags. I had already applied sequence_tagging successfully once, but I was not thrilled about having to keep a dictionary of word vector representations weighing more than 6 gigabytes.
Attempt zero
I found an example of a tagger implementation in Keras and, in the best tradition of my experiments, started mindlessly copying pieces of code from it. In that example the neural network processes the input string as a sequence of characters, without splitting it into words. But further down the text there is an example that uses an Embedding layer, and since I had learned to use hashing_trick, I felt a strong urge to apply that skill.
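Roughly what I mean by that (a minimal sketch; the vocabulary size and phrase length here are illustrative, not taken from my scripts): hashing_trick turns each word into an integer, and the padded integer sequences can then feed an Embedding layer.

from keras.preprocessing.text import hashing_trick
from keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 256   # size of the hash space (illustrative)
MAX_WORDS = 10     # maximum phrase length in words (illustrative)

phrases = ["please remind me tomorrow to buy stuff"]
encoded = [hashing_trick(p, VOCAB_SIZE, hash_function='md5') for p in phrases]
x = pad_sequences(encoded, maxlen=MAX_WORDS, padding='post')
# x is now an integer matrix suitable as input to an Embedding(VOCAB_SIZE, ...) layer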
What I got trained much more slowly than the classifier. I turned on debug output in Keras and, thoughtfully watching the slowly appearing lines, paid attention to the loss value. It was not decreasing much and seemed rather large to me, and accuracy was low. I was too lazy to sit and wait for the result, so I remembered one of Andrew Ng's recommendations: try the network on a smaller training set. From how the loss depends on the number of examples, you can judge whether good results should be expected.
So I stopped training, generated a new training set 10 times smaller than the previous one, and started training again. Almost immediately I got the same loss and the same accuracy. It turned out that increasing the number of training examples would not improve anything.
Still, I waited for training to finish (about an hour, even though the classifier had trained in a few seconds) and decided to try it out. And I realized I should have copied more, because in the seq2seq case different models are needed for training and for actual use. I dug a little deeper into the code and then decided to stop and think about what to do next.
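To illustrate that last point, here is the standard Keras encoder-decoder pattern (a sketch of the common approach, not the code from the example I was copying; the layer sizes are made up): training runs the decoder over the whole target sequence with teacher forcing, while inference needs separate models so the decoder can be run one step at a time.

from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim, num_tokens = 128, 256   # illustrative sizes

# training model: encoder produces states, decoder consumes the shifted target sequence
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_tokens, activation='softmax')
training_model = Model([encoder_inputs, decoder_inputs], decoder_dense(decoder_outputs))

# inference models reuse the same trained layers, but the decoder is fed its own output step by step
encoder_model = Model(encoder_inputs, encoder_states)
state_h_in, state_c_in = Input(shape=(latent_dim,)), Input(shape=(latent_dim,))
dec_out, h, c = decoder_lstm(decoder_inputs, initial_state=[state_h_in, state_c_in])
decoder_model = Model([decoder_inputs, state_h_in, state_c_in],
                      [decoder_dense(dec_out), h, c])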
I was facing a choice: take a ready-made example again, but this time without improvisation; take a ready-made Seq2Seq; or go back to the tool that had already worked for me, the sequence tagger built on NERModel. Only without GloVe this time.
I decided to try all three in the reverse order.
NERModel from sequence_tagging
The desire to edit the existing code vanished as soon as I looked inside. So I went at it from the other end: pull the various classes and methods out of sequence_tagging, take gensim.models.Word2Vec, and feed everything into them. After an hour of trying I managed to build the training data sets, but I could not replace the dictionary. I looked at the error coming from somewhere deep inside numpy and abandoned the venture.
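The idea was roughly this (a sketch of my intent rather than working code; training_phrases and the vector size are assumptions): train a small Word2Vec model on the generated phrases and use it for lookups instead of the 6 GB GloVe dictionary.

import numpy as np
from gensim.models import Word2Vec   # gensim 3.x-style parameters below

# training_phrases is assumed to be the list of generated example phrases
sentences = [phrase.lower().split() for phrase in training_phrases]
w2v = Word2Vec(sentences, size=100, window=5, min_count=1, workers=4)

def word_vector(word):
    # out-of-vocabulary words fall back to a zero vector
    return w2v.wv[word] if word in w2v.wv else np.zeros(100)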
I made a commit just in case, so that it would not get lost.
Seq2seq
The Seq2Seq documentation only describes how to build it, not how to use it. I had to find an example and once again try to adapt it to my needs. Another couple of hours of experiments, and the result: accuracy during training is stable at 0.83, regardless of the size of the training data. So once again I had mixed something up somewhere.
What I did not like about that example was that, first, the training data is manually broken into pieces and, second, the embedding is done by hand. So I screwed Embedding and then Seq2Seq into a single Keras model and prepared the data as one big piece.
It turned out beautifully:
from keras.models import Sequential
from keras.layers import Embedding
from seq2seq.models import SimpleSeq2Seq

model = Sequential()
# turn the hashed word indices into dense vectors
model.add(Embedding(256, TOKEN_REPRESENTATION_SIZE,
                    input_length=INPUT_SEQUENCE_LENGTH))
# sequence-to-sequence block that maps the input phrase to a tag sequence
model.add(SimpleSeq2Seq(input_dim=TOKEN_REPRESENTATION_SIZE,
                        input_length=INPUT_SEQUENCE_LENGTH,
                        hidden_dim=HIDDEN_LAYER_DIMENSION,
                        output_dim=output_dim,
                        output_length=ANSWER_MAX_TOKEN_LENGTH,
                        depth=1))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
But beauty did not save the day: the network's behavior did not change.
Another commit, and on to the third option.
Seq2seq manually
At first I honestly copied everything and tried to run it as is. The input is simply the sequence of characters of the original phrase; the output should be a sequence of characters that can be split on spaces to obtain the list of tags. Accuracy seemed good, because the network quickly learned that once it started spelling out a tag it would write it to the end without mistakes. But the tags themselves did not match the desired result at all.
We make a small change: the result should not be a sequence of characters but a sequence of tags drawn from a fixed list. Accuracy immediately dropped, because now it became honestly visible that the network was not coping.
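For reference, encoding the target this way looks roughly like the sketch below (the tag list and padding length are illustrative, not my full set): each position of the output becomes a one-hot vector over the fixed list of tags.

import numpy as np

TAGS = ['O', 'B-what', 'B-count', 'B-time', 'B-when']   # illustrative tag list
tag_index = {t: i for i, t in enumerate(TAGS)}
MAX_LEN = 10                                            # illustrative padding length

def encode_tags(tag_seq):
    # turn e.g. ['O', 'O', 'B-when'] into a (MAX_LEN, len(TAGS)) one-hot matrix,
    # padding the tail with 'O'
    y = np.zeros((MAX_LEN, len(TAGS)), dtype='float32')
    for i, tag in enumerate(tag_seq[:MAX_LEN]):
        y[i, tag_index[tag]] = 1.0
    for i in range(len(tag_seq), MAX_LEN):
        y[i, tag_index['O']] = 1.0
    return y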
Nevertheless, I trained the network to the end and looked at what exactly it produces, because a stable result of 20% probably means something. As it turned out, the network had found a way not to strain itself too much:
please, remind me tomorrow to buy stuff
O
That is, it pretends the phrase contains only one word, and that the word carries no data (in the sense of data the classifier has not already consumed). We look at the training data... and indeed, about 20% of the phrases are exactly like that: yes, no, some ping (all sorts of hellos) and some acknowledge (all sorts of thanks).
We start putting sticks in the network's wheels. I cut the number of yes/no examples by a factor of 4 and ping/acknowledge by a factor of 2, and add more one-word "garbage" that does contain data. At this stage I decided the tags did not need an explicit binding to the class, so, for example, B-makiuchi-count turned into plain B-count. And the new "garbage" was simply numbers with the class B-count, "times" like "4:30" with the expected tag B-time, and date indications such as "now", "today" and "tomorrow" with the tag B-when.
It still does not work. The network no longer gives the flat answer "O and nothing else", but accuracy stays around 18% and the answers are completely inadequate:
not yet
expected ['O', 'O']
actual ['O', 'O', 'B-what']
what is the weather outside?
expected ['O', 'O', 'O', 'O', 'O']
actual ['O', 'O', 'B-what']
So far, a dead end.
Interlude - Understanding
A lack of result is also a result. I gained a superficial but real understanding of what exactly happens when I design models in Keras. I learned how to save them, load them, and even continue training them when needed. But I did not achieve what I wanted: translating "human" speech into "bot language". I had no more leads.
And then I started writing an article - the previous one. In its original version everything ended here: I have a classifier, but no tagger. After some thought I abandoned that plan and wrote only about the more or less successful classifier, mentioning the problems with the tagger.
The calculation paid off: I got a link to Rasa NLU. At first glance it looked like something very suitable.
Rasa NLU
For several days I did not return to my experiments. Then I sat down and, in a bit over an hour, bolted Rasa NLU onto my experimental scripts. I cannot say it was very difficult.
The code revolves around the make_sample function, which generates a phrase together with the entity annotations for it:

tag_var_re = re.compile(r'data-([a-z-]+)\((.*?)\)|(\S+)')
def make_sample(rs, cls, *args, **kwargs):
    # ask the phrase generator (rs) to produce a phrase for the given class and slot values
    tokens = [cls] + list(args)
    for k, v in kwargs.items():
        tokens.append(k)
        tokens.append(v)
    result = rs.reply('', ' '.join(map(str, tokens))).strip()
    if result == '[ERR: No Reply Matched]':
        raise Exception("failed to generate string for {}".format(tokens))
    cmd, en, rasa_entities = cls, [], []
    # the generated phrase mixes plain words with data-<class>-<tag>(value) markers
    for tag, value, just_word in tag_var_re.findall(result):
        if just_word:
            en.append(just_word)
        else:
            _, tag = tag.split('-', maxsplit=1)
            words = value.split()
            start = len(' '.join(en))
            if en:
                start += 1  # account for the separating space
            en.extend(words)
            end = len(' '.join(en))
            rasa_entities.append({"start": start, "end": end,
                                  "value": value, "entity": tag})
            assert ' '.join(en)[start:end] == value
    return cmd, en, rasa_entities
After this, saving the training data is not difficult at all:

rasa_examples = []
for e, p, r in zip(en, pa, rasa):
    sample = {"text": ' '.join(e), "intent": p}
    if r:
        sample["entities"] = r
    rasa_examples.append(sample)
with open(os.path.join(data_dir, "rasa_train.js"), "w") as rf:
    json.dump({"rasa_nlu_data": {"common_examples": rasa_examples,
                                 "regex_features": [],
                                 "entity_synonyms": []}},
              rf)
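One entry of the resulting rasa_train.js looks roughly like this (the phrase and offsets are made up for illustration; start and end are character positions of the value within the text, end exclusive):

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "remind me to have a breakfast now",
        "intent": "remind",
        "entities": [
          {"start": 13, "end": 29, "value": "have a breakfast", "entity": "action"},
          {"start": 30, "end": 33, "value": "now", "entity": "when"}
        ]
      }
    ],
    "regex_features": [],
    "entity_synonyms": []
  }
}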
The most difficult part of creating a model is getting the config right:

from rasa_nlu.converters import load_data          # rasa_nlu 0.x-era imports (assumed)
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer
from rasa_nlu import registry

training_data = load_data(os.path.join(data_dir, "rasa_train.js"))
config = RasaNLUConfig()
config.pipeline = registry.registered_pipeline_templates["spacy_sklearn"]
config.max_training_processes = 4
trainer = Trainer(config)
trainer.train(training_data)
model_dir = trainer.persist(os.path.join(data_dir, "rasa"))
And the hardest part of using it is finding where the model was saved:

from rasa_nlu.model import Interpreter              # assumed import, same rasa_nlu 0.x API

config = RasaNLUConfig()
config.pipeline = registry.registered_pipeline_templates["spacy_sklearn"]
config.max_training_processes = 4
model_dir = glob.glob(data_dir + "/rasa/default/model_*")[0]
interpreter = Interpreter.load(model_dir, config)

parsed = interpreter.parse(line)
result = [parsed['intent_ranking'][0]['name']]
for entity in parsed['entities']:
    result.append(entity['entity'] + ':')
    result.append('"' + entity['value'] + '"')
print(' '.join(result))
please, find me some pictures of japanese warriors
find what: "japanese warriors"
remind me to have a breakfast now, sweetie
remind action: "have a breakfast" when: "now" what: "sweetie"
... although there is still work to do.
Among the shortcomings: the training process is completely silent. Surely logging can be turned on somewhere. Then again, the whole training took about three minutes. Also, spaCy still requires a model for the source language, but it weighs far less than GloVe: for English it is under 300 megabytes. However, there is no Russian model yet, and the end goal of my experiments has to work with Russian specifically. I will have to look at the other pipelines available in Rasa.
All the code is available on GitHub.