kalinin84 November 25, 2014 at 14:07

A few thoughts on analyzing other people's opinions

Ideally, every site in the search results (SERP) answers a specific need of its target audience, and that need is what drives the search in the first place. The Internet has made it quick and easy to find information and to analyze the opinions of different people who blog, write books (electronic versions), comment on social networks or forums, and vote in polls.

Today a huge number of people use search engines to solve all kinds of problems, including personal ones. This is quite logical: taking into account other people's experience and the knowledge of professionals helps not only to avoid common mistakes but also to significantly increase the likelihood of choosing the right tactics and strategy. Unfortunately, this process is difficult to automate fully, but that does not mean we cannot partially automate the collection and processing of data. This note is aimed at beginners. It will help anyone who wants to analyze information more efficiently to partially automate the process.

Assessment of people's opinions

First of all, let me give an example. Imagine a young man tormenting a search engine with key phrases such as “How to get to know a girl?”, “How to become confident?”, “What to talk about with a girl?” and so on. In fact, he lacks the knowledge of applied psychology needed to evaluate his own beliefs critically. Instead of understanding the cause of his problem and gaining the necessary experience (the lack of which causes fear and insecurity), he looks for ways to hide the symptoms. Confidence rests on achieving results in areas that matter and on other people recognizing those results. A figurative example: a novice driver does not need confidence-building training; he needs to learn to drive (theory and practice), and confidence behind the wheel will follow. So what prevents this composite character from building a normal relationship with a girl? Incorrect beliefs (he thinks he has correctly identified the cause of his problem) and the inability to evaluate them critically. Perhaps he is searching for confirmation of his opinion without seeing the real problems.

It is important to treat an opinion as a set of several beliefs. Most interestingly, a person can criticize the view of well-known scientists (verified by many experiments and described in detail in books) yet readily accept the most naive personal beliefs (the ones his grandmother told him in childhood). The problem is that a person evaluates information by comparing it with his own standards, which were often accepted uncritically. Therefore, to evaluate your own opinion you need to study and consciously consider the opinions of authoritative sources. If an opinion is formed consciously, i.e. if a person has a higher education and good work experience, reads a large number of quality books and communicates with other professionals, he maximizes the likelihood that his beliefs on the relevant topic are correct. But you can’t be a specialist in all areas,

Where to begin? This note is not about phrasing search queries correctly but about analyzing the results. If the question matters to you, be sure to write down the most important thoughts. To do this, we will create an electronic summary with notes in microblog format, stored on the local computer since it is intended exclusively for personal use. Naturally, such a helper tool can be implemented in many ways, and we will consider only a few of them. We can try the Laravel framework, which lets us solve the problem simply and quickly.

The model will be like this:

class Report extends Eloquent {
	protected $table = 'report';
}

And the controller is no more complicated:

class ReportController extends BaseController {
	public function index() {
		return View::make('report', ['report' => Report::paginate(5)]);
	}
}

Yes, Laravel makes pagination so easy. And here is the view:

@extends('main')
@section('content')
@forelse($report as $v)
	{{ $v->author }}
	{{ $v->opinion }}
	{{ $v->url }}
@empty
	No data
@endforelse
{{ $report->links() }}
{{-- Debug only: dumps the query log and halts rendering; remove when done --}}
{{ dd(DB::getQueryLog()) }}
@stop

Do you think the table structure proposed above (columns: id, author name, the author's opinion, link to the source, date) is convenient for statistical analysis? It seems to me that this format is very inconvenient for statistics, where a table of numbers is preferable. Let's try to design a table suited to the task. Each opinion under study can have a degree of reliability (the authority of the source), a rating (how strongly the author supports or refutes the opinion, for example on a scale from 0 to 10) and a source type (forum message, blog post, social network comment, official document). It is also useful to store a link to the source and a short comment (as brief as possible). You could store a copy of the text as well, but that is at your discretion. Since opinions about different objects can be recorded in one table, I will add the identifier of the object the opinion concerns. The result is a table from which a variety of reports are much easier to build.

The structure of our table may be as follows:

CREATE TABLE IF NOT EXISTS `opinions` (
	`id` int(11) NOT NULL AUTO_INCREMENT,
	`url` varchar(255) NOT NULL,
	`description` varchar(255) NOT NULL,
	`type` int(11) NOT NULL,
	`credibility` int(11) NOT NULL, 
	`object` int(11) NOT NULL,
	`rate` double NOT NULL,
	`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

Suppose we want summary estimates for a fact about which we have collected opinions (since the rate column is a double, any gradation is possible). We do not want the ratings of everyone, only of sources with a certain level of authority (an approximate assessment of the source's reliability). This table structure lets us run the following query:

SELECT
	COUNT(`rate`) AS 'count',
	MIN(`rate`) AS 'min', 
	MAX(`rate`) AS 'max',
	SUM(`rate`) AS 'sum', 
	AVG(`rate`) AS 'avg',
	STDDEV_POP(`rate`) AS 'stddev_pop',
	VARIANCE(`rate`) AS 'variance'
FROM `opinions` 
WHERE 
	`object` = 1 AND `credibility` IN (2,3);
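
To make the logic of this query easier to see, here is a minimal sketch of the same filter-and-aggregate step in plain Java, without a database. The Opinion class, the summarize method and the sample rows are hypothetical and exist only for illustration; only the columns used by the query are mirrored.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

public class OpinionFilterSketch {

    /** Hypothetical in-memory mirror of one `opinions` row (only the columns the query uses). */
    static final class Opinion {
        final int object;
        final int credibility;
        final double rate;

        Opinion(int object, int credibility, double rate) {
            this.object = object;
            this.credibility = credibility;
            this.rate = rate;
        }
    }

    /** Mirrors: WHERE object = ? AND credibility IN (...), then COUNT/MIN/MAX/SUM/AVG over rate. */
    static DoubleSummaryStatistics summarize(List<Opinion> rows, int object, List<Integer> credibilityLevels) {
        return rows.stream()
                .filter(o -> o.object == object)
                .filter(o -> credibilityLevels.contains(o.credibility))
                .mapToDouble(o -> o.rate)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Opinion> rows = Arrays.asList(
                new Opinion(1, 2, 7.0),
                new Opinion(1, 3, 9.0),
                new Opinion(1, 1, 2.0),   // excluded: credibility is not 2 or 3
                new Opinion(2, 3, 5.0));  // excluded: different object
        DoubleSummaryStatistics s = summarize(rows, 1, Arrays.asList(2, 3));
        System.out.println("count=" + s.getCount() + " min=" + s.getMin() + " max=" + s.getMax()
                + " sum=" + s.getSum() + " avg=" + s.getAverage());
    }
}
```

DoubleSummaryStatistics covers count, minimum, maximum, sum and average but not variance or standard deviation, so those two columns of the query have no counterpart in this sketch.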

I decided to leave the source code for processing statistics and implementing CRUD out of this note, otherwise it would turn into a story about Laravel. Friends, understand that the particular implementation is not important: you could do it no worse with other frameworks, for example Yii 2.0, which ships with automatic model and CRUD generation out of the box. It also comes with Bootstrap, so you can style your reports quickly with ready-made styles. If we add ChartJS or jqPlot to the template, we can easily create attractive charts. If you are curious to see my take on the code for working with statistics, I have posted examples in PHP and Java on GitHub (this is my own take and not the best one).

Descriptive Statistics

In the previous query (against the MySQL database) we used descriptive statistics (maximum, minimum, average, sum, variance, standard deviation). This approach is very common in analytical systems. For example, Yandex.Metrica, Google Analytics and Piwik track general indicators (traffic, bounce rates, average time on a site, average number of page views per session, percentage of repeat visits, etc.) and site-specific indicators (configured goals and events). Some enterprise systems use the APIs of these services to load data from them. Very often, not all data is needed, but only pre-processed results, for example,

The same applies in all areas of life. Try to express everything significant to you as precise indicators of progress toward your goals. It is extremely important to put knowledge into practice and to receive feedback not only from specialists but from circumstances as a whole. Trying to assess precisely (with specific numbers) the percentage of achievement of each goal helps you understand the situation much better and evaluate it critically. This is commonplace and everyone knows it, but try to start practicing. Just try, but consciously.

For these purposes you can use Excel or query the database through Hibernate (sometimes even Solr through SolrJ), but we will create a small Java training project that uses the Apache Commons Mathematics Library. I generated the DescriptiveStatisticsInfo POJO class automatically (using the IDE) from the following properties:

private Double variance;
private Double standardDeviation;
private Double summ;
private Double max;
private Double min;
private Double mean;
private Long count;

I created an interface with a single method:

package kalinin.example.report;
import java.util.List;
import kalinin.example.help.statistics.DescriptiveStatisticsInfo;
public interface IReport {
	DescriptiveStatisticsInfo statInfo(List<Double> in);
}

For the factory I need an enumeration; for now it looks like this:

package kalinin.example.report;
public enum EReport {
	DESCRIPTIVE
}

The factory itself turned out like this:

package kalinin.example.report;
public class ReportFactory {
	public IReport getReport(EReport reportType) {
		switch (reportType) {
			case DESCRIPTIVE:
				return new DescriptiveStatisticsReport();
			default:
				return null;
		}
	}
}

I use the free and widely used Apache Commons Mathematics Library:

package kalinin.example.report;
import java.util.List;
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import kalinin.example.help.statistics.DescriptiveStatisticsInfo;
public class DescriptiveStatisticsReport implements IReport {
	private DescriptiveStatistics stats = new DescriptiveStatistics();
	public DescriptiveStatisticsInfo statInfo(List<Double> in) {
		// Reset the accumulator so repeated calls do not mix data sets
		this.stats.clear();
		for (Double iStatistics : in) {
			this.stats.addValue(iStatistics);
		}
		DescriptiveStatisticsInfo stat = new DescriptiveStatisticsInfo(
				// getVariance() and getStandardDeviation() are the sample (n - 1) versions
				this.stats.getVariance(),
				this.stats.getStandardDeviation(),
				this.stats.getSum(),
				this.stats.getMax(),
				this.stats.getMin(),
				this.stats.getMean(),
				this.stats.getN()
				);
		return stat;
	}
}
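
One detail is worth keeping in mind when comparing these numbers with the MySQL query above: VARIANCE() and STDDEV_POP() in MySQL are population statistics (division by n), while getVariance() and getStandardDeviation() in DescriptiveStatistics are bias-corrected sample statistics (division by n - 1). The following is a minimal plain-Java sketch of the two formulas, not the library's implementation; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class VarianceSketch {

    /** Sum of squared deviations from the arithmetic mean. */
    static double sumSquaredDeviations(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.size();
        double ssd = 0.0;
        for (double x : xs) {
            ssd += (x - mean) * (x - mean);
        }
        return ssd;
    }

    /** Population variance: divide by n. */
    static double populationVariance(List<Double> xs) {
        return sumSquaredDeviations(xs) / xs.size();
    }

    /** Sample (bias-corrected) variance: divide by n - 1; needs at least two values. */
    static double sampleVariance(List<Double> xs) {
        return sumSquaredDeviations(xs) / (xs.size() - 1);
    }

    public static void main(String[] args) {
        // Five sample values, for illustration only.
        List<Double> data = Arrays.asList(12.0, 8.4, 6.1, 3.5, 9.154);
        System.out.println("population variance: " + populationVariance(data));
        System.out.println("sample variance:     " + sampleVariance(data));
    }
}
```

For the same data the sample variance is always larger by a factor of n / (n - 1), so on small sets of opinions the two tools will report noticeably different values. If matching numbers are needed, DescriptiveStatistics also offers getPopulationVariance().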

In effect, we selected only the data most relevant for decision making. You will agree that if you export all the statistical data from a large number of sites, analysts simply will not have enough time to read it all. Descriptive statistics help us build a summary table on which decisions will be based. To make a decision we do not need the whole gigantic ArrayList of numbers (Java stores references to objects there, not the objects themselves), so instead of recording all of that data (and there can be a great deal of it) we pass the system only the necessary descriptions of this finite set. Is this just about sites? To check that everything works, I simply wrote the result to a file using a simple class:

package kalinin.example.run;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.apache.log4j.Logger;
import kalinin.example.help.statistics.DescriptiveStatisticsInfo;
import kalinin.example.report.EReport;
import kalinin.example.report.ReportFactory;
public class Run {
	final static Logger logger = Logger.getLogger(Run.class);
	public static void main(String[] args) {
		List<Double> data = new ArrayList<>();
		data.add(12.0);
		data.add(8.4);
		data.add(6.1);
		data.add(3.5);
		data.add(9.154);
		DescriptiveStatisticsInfo result = new ReportFactory().getReport(EReport.DESCRIPTIVE).statInfo(data);
		try {
			FileUtils.writeStringToFile(new File(Config.getConf("Config.fileName")), result.toString(), "UTF-8");
		} catch (IOException e) {
			logger.error(e);
		} 
	}
}

There is also a heuristic algorithm for opinion analysis. At first glance everything is simple: there is an attribute and its degree of reliability, both determined empirically. The hardest part is identifying these attributes and their weights. After collecting the data, we sum the “weight of confidence” for each matching attribute. For this we need an empirically derived scale against which reliability is assessed. As you understand, one object can have any number (a positive integer) of attributes. As a result we get a finite set (array) in which each element is the weight of an attribute that matched during the check. If the sum of all elements of this set is greater than an empirically determined number, the heuristic algorithm gives a positive response. However, heuristics are very difficult to apply,
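
The weighted-sum check described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the attribute names, their weights and the threshold are made up, whereas in practice they would have to be determined empirically.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HeuristicSketch {

    /** Sums the weights of the attributes that matched; attributes without a known weight are ignored. */
    static double score(Map<String, Double> weights, Set<String> matched) {
        double total = 0.0;
        for (String attribute : matched) {
            Double weight = weights.get(attribute);
            if (weight != null) {
                total += weight;
            }
        }
        return total;
    }

    /** Positive response only if the summed weight is strictly greater than the threshold. */
    static boolean isPositive(Map<String, Double> weights, Set<String> matched, double threshold) {
        return score(weights, matched) > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical attributes and weights, for illustration only.
        Map<String, Double> weights = new HashMap<>();
        weights.put("citesSources", 3.0);
        weights.put("authorIsSpecialist", 2.0);
        weights.put("describesOwnExperience", 1.5);

        // Attributes that matched for one particular source.
        Set<String> matched = new HashSet<>(Arrays.asList("citesSources", "authorIsSpecialist"));

        double threshold = 4.0; // hypothetical, would be chosen empirically
        System.out.println("score=" + score(weights, matched)
                + " positive=" + isPositive(weights, matched, threshold));
    }
}
```

Keeping the weights in a map separate from the scoring code makes it easy to adjust the scale as more data is collected, which is where most of the real effort goes.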

Another commonplace and very simple fact: the human brain needs not just information but also an attempt to systematize it, as a summary or a table. Next, for each opinion in this table that matters to you, look for both confirmation and refutation. Understand that a person may overlook very banal things, and a conscious analysis with specific numbers will make him notice many significant ones. Of course, a person keeps developing throughout life, and tomorrow he may hold a completely different opinion. Even this note will be perceived differently at each stage of development. It is very dangerous to be confident in everything and to stop looking for quality information on important issues in all areas of life. The main thing is to start practicing in small steps (just as training begins with light weights), without hoping to become a great bodybuilder in a week. And if you have at least basic programming and database skills, the analysis of information described above will be not only useful but also fascinating.

Tags:

data analysis
