Identifying sockpuppets on Wikipedia

    Wikipedia is a free, open, multilingual, general-purpose online encyclopedia built by the efforts of many users. Today it contains 25 million articles in 285 languages, and almost half a billion people visit it every month. In completeness and depth of coverage, Wikipedia is comparable to the famous Encyclopaedia Britannica. Thousands of volunteer editors around the world constantly add new articles, and it is thanks to their selfless work that this gigantic repository of knowledge keeps growing.

    Wikipedia has become the world's most popular source of educational, historical, and scientific knowledge and is among the ten most visited sites on the Internet. It attracts not only people who seek knowledge or want to share it for free, but also marketers and PR managers who try to use the site as an advertising platform for paid, commissioned articles. A company called Wiki-PR was set up specifically to write and post promotional articles and edits on Wikipedia. Placing one such article cost from 500 to 1,000 dollars. A separate monthly fee of about 50-70 dollars was charged to make sure the article or edit was not deleted or, conversely, that material the customer disliked was removed and no longer appeared on Wikipedia. This last point deserves special attention.

    Wikipedia is an open community; the first phrase that greets visitors is: “Welcome to Wikipedia, the free encyclopedia that anyone can edit.” Anyone can therefore add an article to Wikipedia or change one. But edits of an advertising or biased nature are sure to be noticed and removed by other editors. To keep them from being deleted, hundreds of additional accounts were created: sockpuppets (from the English “sock puppet”, a puppet made from a stocking or sock that is worn on the hand and can carry on a dialogue in its own voice, even with the puppeteer). These accounts joined the discussions of the edits and created the appearance of broad support and approval.

    A short digression is needed here. Wikipedia does not prohibit one user from creating additional accounts. It recognizes that there may be legitimate reasons for them, for example editing articles on different subjects or discussing controversial topics. What Wikipedia does prohibit is taking part in the discussion of the same topic from several accounts at once.

    After the Daily Dot published an article reporting that placing commissioned material on Wikipedia was no longer an isolated practice but had become a business service, the project carried out mass checks. As a result, 250 additional user accounts were blocked; they had been used to post laudatory articles about products or companies and to actively lobby for those interests.

    In her blog, Sue Gardner, Executive Director of the Wikimedia Foundation, said that the actions of the editors whose accounts were blocked violate the basic principles that make Wikipedia so highly regarded by many people. “Our readers know that Wikipedia is not perfect, but they also know that it serves their interests exclusively and never tries to sell them anything or recommend any product in one form or another,” she wrote.
    Gardner stressed that the investigation into the use of sockpuppets to edit articles was not yet complete and that the Foundation intends to keep checking the impartiality and independence of Wikipedia editors in the future.

    One difficulty in detecting sockpuppets is that only some site administrators are entitled to use technical methods, which consist of comparing users' IP addresses, and they do so only when there is good reason. The main way to identify sockpuppets is therefore behavioral: comparing edits and comments for signs that they come from the same person. This requires experience and takes a lot of time, and even then it may fail.

    To help Wikipedia, researchers at the University of Alabama at Birmingham, Ragib Hasan and Thamar Solorio, created a program that can help identify sockpuppets, that is, multiple accounts owned by one person. The program analyzes text fragments added from different accounts and, based on them, estimates the likelihood that the accounts belong to the same person. The comparison uses grammatical, punctuation, syntactic, and certain lexical features of the text.
    In experiments, the program identified additional accounts belonging to the same person with an accuracy of 70-75%, and further work on it is expected to improve this figure.
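    The idea of comparing accounts by writing style can be illustrated with a small Python sketch. It is not the UAB team's method (their actual features and classifier are documented on the project page); it is a simplified, hypothetical stand-in that covers only punctuation and lexical features. Each account's edits are reduced to a vector of relative frequencies (a short, invented list of function words, punctuation marks, and the share of uppercase letters), and two accounts are compared by cosine similarity. The names `style_vector` and `same_author_score` and both feature lists are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical feature lists, chosen only for illustration; real stylometry
# tools use much larger, carefully selected feature sets.
FUNCTION_WORDS = ["the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
                  "that", "is", "it", "for", "with", "as", "not", "this"]
PUNCTUATION = list(".,;:!?-'\"()")


def style_vector(text: str) -> list[float]:
    """Map a text to a fixed-length vector of simple stylistic features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = max(len(words), 1)   # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    word_counts = Counter(words)
    char_counts = Counter(text)

    # Lexical features: relative frequency of each function word.
    features = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    # Punctuation features: frequency of each mark per character.
    features += [char_counts[p] / n_chars for p in PUNCTUATION]
    # One orthographic feature: share of uppercase characters.
    features.append(sum(c.isupper() for c in text) / n_chars)
    return features


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_author_score(edits_a: list[str], edits_b: list[str]) -> float:
    """Style similarity of two accounts' pooled edits.

    All features are non-negative, so the score lies in [0, 1];
    higher means more similar style, not proof of common authorship.
    """
    return cosine_similarity(style_vector(" ".join(edits_a)),
                             style_vector(" ".join(edits_b)))


if __name__ == "__main__":
    account_a = ["I think this section is biased; it should be rewritten."]
    account_b = ["I think the sources are weak; it should be removed."]
    account_c = ["WOW!!! Best product ever!!! Buy it now!!!"]
    print(same_author_score(account_a, account_b))
    print(same_author_score(account_a, account_c))
```

    A score on its own decides nothing: a practical tool would learn a decision threshold, and richer features such as the syntactic ones mentioned above, from labelled examples of known sockpuppets, and would still make errors, as the reported 70-75% accuracy suggests.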

    The program itself, as well as the tools that were used to create and test it, can be found on the project page: docsig.cis.uab.edu/?page_id=68

    Compared with JStylo, a similar program presented at the 29C3 conference in Berlin, this project has the advantage of being able to analyze short text fragments: JStylo needs 6,500 words of text from each “suspect”, and the text whose authorship is being established must be at least 500 words long.

    A program that can determine the authorship of short texts could be used not only to help Wikipedia identify sockpuppets, but also to detect additional user accounts on forums, in news comment sections, on Twitter, and in other kinds of online interaction where people post short comments and texts.
