RubikS August 31, 2016 at 18:14

iKnow Review Analyzer (iKRA)

Intro

Using InterSystems iKnow technology, we built a review rating system called iKnow Reviews Analyzer (iKRA). You can read about the project's prototype here. iKRA analyzes the text of user reviews and automatically assigns a numerical rating to the subject being reviewed. This can be useful, for example, on online shopping sites, topical forums, or media catalogs: in other words, wherever a community discusses something.

What does the solution consist of?

iKnow Reviews Analyzer can analyze any subject area, whether it is online sales of household appliances or hotel booking in warm countries. To get results, you go through the following key steps:

collect reviews in the subject area of interest;
create dictionaries - a database of words for calculation;
create an area for loading and analyzing data;
run the model for calculation;
drink coffee / wait;
see the results.

Usage example

Now let's see how this looks in practice. As an example, we will analyze smartphone reviews. We select five manufacturers:

Apple
HTC
LG
Samsung
Sony

Suppose we are interested in two smartphone models from each manufacturer. For each selected model we load 50 reviews, 500 in total. We take the reviews from Yandex.Market. We place each review in a separate file and, for convenience, use the following file layout (Figure 1):

Figure 1. File hierarchy

The number in parentheses is the rating the user gave the smartphone when writing the review. It is stored in the metadata and used later to tune the calculation algorithm. The original reviews are here.
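As a rough illustration of how such a rating could be picked up when loading files, here is a minimal Python sketch. The naming scheme (the rating in trailing parentheses of the file name, as in "review_01 (4).txt") and the function name are assumptions made for the example; the actual layout is the one shown in Figure 1.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical naming scheme: the user's rating sits in parentheses at the end
# of the file name, e.g. "review_01 (4).txt". The real layout is in Figure 1.
RATING_IN_PARENS = re.compile(r"\((\d+(?:\.\d+)?)\)\s*$")


def rating_from_name(path: str) -> Optional[float]:
    """Return the rating found in trailing parentheses, or None if absent."""
    stem = PurePosixPath(path).stem  # file name without its extension
    match = RATING_IN_PARENS.search(stem)
    return float(match.group(1)) if match else None
```

A value extracted this way could then be stored as a metadata field next to the review text when the domain is filled.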

For the analysis, you need to create an iKnow domain - a repository of unstructured data. We will not go into detail, since this is described here.

Once the domain is created and filled with reviews, we start analyzing its contents. When choosing a smartphone, the following parameters are critical for me:

performance;
call quality;
comfort / convenience.

To simplify the discussion, I will introduce the following terms:

category - a parameter to be evaluated;
functional (f) marker - a term that characterizes the evaluated parameter / category;
functional dictionary - a set of f-Markers;
emotional (e) marker - a word that reflects the author's attitude to the subject described;
emotional dictionary - a set of e-Markers.

Based on the selected parameters, we compile a functional dictionary: for each category we select f-Markers, the words that define it. For example, for the "performance" category these can be "speed", "processor", "memory", "core" and so on. All f-Markers are stored in a dedicated file. Figure 2 shows an example for the "Performance" category:

Figure 2. f-Markers

Next, we compile an emotional dictionary and fill it with e-Markers. The full list is too long to give here, but for clarity here are some of them: "good", "convenient", "liked", "problematic", "lack". e-Markers give a sentence in the text a positive or negative tone. Each e-Marker has a numerical value. For simplicity, we use +1 for positive and -1 for negative. All e-Markers are also stored in a dedicated file. Figure 3 shows an example of e-Markers:

Figure 3. e-Markers
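To make the interplay of the two dictionaries more concrete, here is a deliberately naive Python sketch. It is not the iKRA algorithm, which is not described in this article: iKnow works with concepts and relations, while this sketch substitutes plain word matching. It assumes that a sentence contributes the sum of its e-Marker values to every category whose f-Marker it mentions. The "convenience" f-Markers and the function name are invented for the example.

```python
import re
from collections import defaultdict

# Illustrative dictionaries only; the real ones live in the project's files.
F_MARKERS = {
    "performance": {"speed", "processor", "memory", "core"},
    "convenience": {"screen", "size"},  # hypothetical f-Markers
}
E_MARKERS = {"good": 1, "convenient": 1, "liked": 1, "problematic": -1, "lack": -1}


def score_review(text):
    """Sum e-Marker values per category over sentences that mention an f-Marker."""
    scores = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        # Net tone of the sentence: +1 per positive e-Marker, -1 per negative.
        sentiment = sum(value for word, value in E_MARKERS.items() if word in words)
        if sentiment == 0:
            continue
        for category, markers in F_MARKERS.items():
            if words & markers:  # the sentence talks about this category
                scores[category] += sentiment
    return dict(scores)
```

For instance, a review saying that the processor is good and the screen is problematic would add one point to "performance" and subtract one from "convenience" under these assumptions.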

Once the dictionaries are ready, you can calculate the ratings. To do this, select the desired domain on the "Domains" tab and click "Perform calculation" (Figure 4):

Figure 4. Calculating ratings

To see the result, open the table of the ikra.Dictionary.MarksUnit class, which contains the ratings for each smartphone model, or the table of the ikra.Dictionary.MarksReview class, which contains the ratings for each individual review. The data can be viewed in the Management Portal: open the SQL section and browse the table of interest. Figure 5 shows the table of the ikra.Dictionary.MarksUnit class.

Figure 5. Viewing the ikra.Dictionary.MarksUnit table

Let's look at the results using DeepSee. We created a cube based on the calculated ratings by category and built a chart for each device under study (Figure 6):

Figure 6. Chart of ratings by category

Among the analyzed models, the leaders by category were as follows:

performance - HTC ONE;
call quality - HTC ONE;
comfort / convenience - Samsung Galaxy S5 SM-G900F.

But what if you need to add another category?

Previously, to evaluate each category, the corresponding class property had to be declared manually. This was inconvenient: when analyzing a new subject area, the categories and their number change, so the code had to be edited with every such change, which is tedious. We considered two ways out of this situation:

reserving a large number of class properties;
using a database.

The first option lets you forget about the changing number of categories without touching the database structure. But storing that many properties is inconvenient, and nothing guarantees that the number of evaluated parameters will not exceed the reserved amount. We abandoned this approach.

The second option solves the problem with an indefinite number of categories and does not require a fixed amount of memory for storing each instance of the class. Based on the database, the system easily adapts to the calculation of any subject area with any number of categories.

Given the advantages of the second option, it is the one implemented in iKRA.
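The difference between the two options can be sketched as follows. This is only an illustration in Python with the standard sqlite3 module, not the iKRA storage code: the table name, column names and helper functions are assumptions. Each mark is stored as a row keyed by unit and category, so a new category is just new rows and no class or schema change is needed.

```python
import sqlite3


def make_store():
    """Create an in-memory store with one row per (unit, category) mark."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE marks ("
        " unit TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " mark REAL NOT NULL,"
        " PRIMARY KEY (unit, category))"
    )
    return db


def set_mark(db, unit, category, mark):
    # INSERT OR REPLACE keeps a single mark per unit and category.
    db.execute("INSERT OR REPLACE INTO marks VALUES (?, ?, ?)", (unit, category, mark))


def leaders(db):
    """Return {category: unit with the highest mark}."""
    rows = db.execute(
        "SELECT category, unit, mark FROM marks ORDER BY mark"
    ).fetchall()
    # Rows arrive in ascending mark order, so the last unit written wins.
    return {category: unit for category, unit, _ in rows}
```

With a fixed set of class properties, adding a category such as "camera" would mean editing the class; here it is a call like set_mark(db, unit, "camera", value), and leaders(db) picks it up automatically.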

Adding a new category

“And then I realized that I needed to evaluate one more smartphone parameter - the camera! (After all, if you are going to catch Pokemon, you might as well enjoy it.)”

Adding a new category is not difficult. To do this, we change the functional dictionary and enter a new category name - Camera (Figure 7).

Figure 7. Adding the Camera category

We define the category by adding f-Markers on the corresponding tab (Figure 2).
On the "Domains" tab, we select the desired domain and start the calculation (Figure 4).
We wait for it to finish and then view the results (Figure 8):

Figure 8. Updated chart of ratings by category

Hurray! We easily introduced a new category and rated it. Now the picture is as follows:

camera - iPhone 4S;
performance - HTC ONE;
call quality - HTC ONE;
comfort / convenience - Samsung Galaxy S5 SM-G900F.

To be continued

Now we can get ratings for any product category from its reviews quickly and without rewriting code: all it takes is to configure the dictionaries and run the calculation. Loading reviews into the database is still a challenge, but we will discuss how to solve it in the next article.

» GitHub
