Analysis of restaurant visitor reviews with Meanotek NeuText API

    The analysis of restaurant user reviews was one of the tasks in SentiRuEval-2015, an evaluation held as part of the Dialogue 2015 conference. In this article we will discuss what such analyzers actually do, why they are useful in practice, and how to build one yourself with the Meanotek NeuText API.

    Aspect-based analysis of reviews is usually divided into several stages. Consider the sentence "The Japanese dishes were tasty, but the waiter was slow." At the first stage, we extract the words and phrases that matter to us: here, "Japanese dishes", "tasty", "waiter", and "slow". This tells us what the sentence is about. Next, we may want to group the terms, for example assigning "dishes" and "tasty" to the food category and "waiter" to service. Such a grouping makes it possible to generate aggregated statistics. Finally, we may want to assess the sentiment of each term: whether something positive or negative is said about it.
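    The three stages above can be sketched with a toy, dictionary-based analyzer. This is purely illustrative: a real system such as NeuText learns these mappings from training data, and all the lookup tables below are invented for the example.

```python
# Toy illustration of the three stages: term extraction, grouping, sentiment.
# A real system learns all of this from data; here we use hand-written lookups.
ASPECT_TERMS = {
    "japanese dishes": "explicit",
    "tasty": "implicit",
    "waiter": "explicit",
    "slow": "implicit",
}
CATEGORIES = {"japanese dishes": "food", "tasty": "food",
              "waiter": "service", "slow": "service"}
SENTIMENT = {"tasty": "positive", "slow": "negative"}

def analyze(sentence: str):
    """Return (term, category, sentiment) triples found in the sentence."""
    found = [t for t in ASPECT_TERMS if t in sentence.lower()]
    return [(t, CATEGORIES[t], SENTIMENT.get(t, "neutral")) for t in found]

result = analyze("The Japanese dishes were tasty, but the waiter was slow")
```

    The dictionary lookup stands in for the first stage (term extraction), `CATEGORIES` for the grouping stage, and `SENTIMENT` for the sentiment stage.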


    Why: This question is not so easy to answer today. According to the evaluation organizers, the ultimate task is to assess the sentiment of a review as a whole, i.e., to conclude that the author considers the service good and the interior poor. But this problem is already solved by other means: on most review sites, a user leaving a review also fills in per-aspect ratings manually. The availability of such information sharply reduces the value of this kind of automatic analysis. You can, of course, extract additional aspects or analyze forum posts, but all of that is of secondary importance to the user.

    But suppose we need to solve the inverse problem. Say you own a restaurant and want to know why the ratings in the "interior" section are low. On sites with manual ratings, the connection between the rating and the source text is mostly lost, so you have to read every negative review in full to find the relevant information. And these reviews are, frankly, quite long and full of filler, like "yesterday was my friend's birthday. We got together and spent a long time deciding where to go. Usually we..." and so on. Having extracted the important aspect terms, you can highlight them in the text, show only the sentences that contain them, or even count them and display a summary like this:

    Loud music - 34
    Smoky - 8
    Air conditioning - 4
    Dirty restroom - 2

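    Producing such a summary from extracted terms is a simple counting step. Here is a minimal sketch using Python's `Counter`; the input list is hypothetical and stands in for the aspect terms an extractor would return across many reviews.

```python
from collections import Counter

# Hypothetical aspect terms extracted from a batch of negative reviews.
extracted = ["loud music", "loud music", "smoky",
             "loud music", "dirty restroom", "smoky"]

# Count occurrences and print a summary sorted by frequency.
summary = Counter(extracted)
for term, count in summary.most_common():
    print(f"{term} - {count}")
```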
    By reducing the labor cost of analysis, we increase the efficiency of the business; moreover, the restaurant can respond to specific problems and requests from visitors. Of course, not every restaurant owner monitors reviews this carefully, but that is a separate matter.

    Implementation: To implement this with the Meanotek NeuText API, you will need a free API key; if you do not already have one, you can get it here. As last time, we need a training sample. In the training data created by the SentiRuEval-2015 organizers, explicit terms (dishes, food, waiter, restaurant, table, etc.) are distinguished from implicit terms (tasty, loud, salty). You can use the ready-made markup or devise your own notation.

    The original SentiRuEval-2015 samples are publicly available as XML files. This data has no preprocessing (no splitting into sentences, words, etc.), so we prepared it in the format our API expects (download). Our sample contains only explicit aspect terms and is split into two files: rest_expl_train.txt with the training data and rest_expl_test.txt with the data for checking the results (both files are derived from the original SentiRuEval-2015 training sample).

    japanese	explicit
    dishes	explicit
    were
    tasty	implicit
    ,
    waiter	explicit
    ...
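    Judging by the sample above, the file holds one token per line, with aspect terms followed by a tab-separated label. Under that assumption (the exact format may differ in details), parsing it is straightforward:

```python
# Parse the token/label format sketched above: one token per line,
# optionally followed by a tab and a label ("explicit" or "implicit").
sample = """japanese\texplicit
dishes\texplicit
were
tasty\timplicit
,
waiter\texplicit"""

def parse(text):
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        token = parts[0]
        # Use "O" (outside) for tokens that are not part of any term.
        label = parts[1] if len(parts) > 1 else "O"
        rows.append((token, label))
    return rows

tokens = parse(sample)
```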


    There is one subtlety here: if a phrase like "the waiter brought fish dishes" occurs, then "waiter fish dishes" will be highlighted as a single term, although there are actually two: "waiter" and "fish dishes". For this reason, separate labels are often used for the first and subsequent words of a term (as in BIO tagging), so that adjacent terms can later be separated. But this is not always justified: without enough examples of such merged terms, the model may still fail to learn where terms begin and end, and the increased number of classes requires a larger training sample to reach adequate model quality. So it makes sense to try both options and compare the quality of the results; if there is no time, the first option works well enough.
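    To illustrate why begin/inside labels help, here is a small sketch of decoding BIO-style labels back into separate terms. The labels and tokens are invented for the example; the point is that "B-" marks let adjacent same-type terms be split apart.

```python
# Group BIO-labeled tokens into separate terms: "B-" starts a new
# term, "I-" continues the current one, "O" ends it.
def bio_to_terms(rows):
    terms, current = [], []
    for token, tag in rows:
        if tag.startswith("B-"):
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms

# Adjacent terms of the same type stay separable thanks to the "B-" mark.
rows = [("waiter", "B-explicit"), ("fish", "B-explicit"),
        ("dishes", "I-explicit")]
terms = bio_to_terms(rows)
```

    With a single flat label, the same three tokens would collapse into one term, "waiter fish dishes", which is exactly the merging problem described above.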

    A model is created in the same way as in the previous example on extracting product names. To simplify working with the API, you can use the library for the .NET Framework:
    Model MyModel = new Model("Your API key", "RestExplModel");
    MyModel.CreateModel();
    Console.WriteLine("Uploading training data");
    MyModel.UploadTrainData("rest_expl_train.txt");
    Console.WriteLine("Uploading test data");
    MyModel.UploadTestData("rest_expl_test.txt");
    Console.WriteLine("Training the model");
    MyModel.TrainModel();
    


    Also included is an example executable with which you can upload arbitrary files and check the results without writing any code.

    After training the model, we request statistics on the test sample, as well as an analysis of a new example:
    Console.WriteLine(MyModel.GetValidationResults());
    // Input: "The restaurant turned out to have a pleasant atmosphere,
    // the waiter was good, and they served tasty fish dishes"
    string p = MyModel.GetPredictionsJson("В ресторане оказалась приятная обстановка, был хороший официант, подавали вкусные блюда из рыбы");
    Console.WriteLine(p);
    


    Here is the result:


    Especially for this article, I also made an online demo in PHP: a form where you can enter text and get the highlighted terms.

    You can read more about the API for extracting information from text in a previous post, and the technical details of the work are in our article published in the proceedings "Computational Linguistics and Intellectual Technologies" (text in English).
