Parsing websites: what does one of the most useful IT tools in the world (and in Russia) look like from a legal point of view?


    Let's try to look at one of the best ways to collect information on the Internet, parsing, from a legal point of view. Attention! This publication addresses some general legal questions related to parsing, but it does not constitute legal advice. The article continues the publication "10 tools to parse information from websites, including competitor prices + legal assessment for Russia".

    Parsing is the automated extraction of data from someone else's website. Is it really one of the most useful IT tools for collecting data, or a trap that leads to inevitable problems with the law? Parsing could well be one of the most effective ways to extract content across the web, but there is a caveat: this tool is very difficult to deal with from the legal side. Parsing is the process by which an automated piece of software retrieves website data by combing through multiple pages. Search engines such as Google and Bing do something similar when they index web pages, but parsing mechanisms go further and convert the information into a format that lets you actually use the data, loading it into databases or spreadsheets.
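
    To make the mechanics concrete, here is a minimal sketch of what a parser does: fetch a page, pull out structured fields, and write them to a spreadsheet-friendly format. It is written in Python; the URL and CSS class names are made-up assumptions for illustration, not a real site's markup.

        import csv

        import requests
        from bs4 import BeautifulSoup

        def scrape_listings(url: str) -> list[dict]:
            """Fetch one page and turn its listings into structured rows."""
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "html.parser")

            rows = []
            # ".product", ".title" and ".price" are assumed class names, not a real schema.
            for card in soup.select(".product"):
                rows.append({
                    "title": card.select_one(".title").get_text(strip=True),
                    "price": card.select_one(".price").get_text(strip=True),
                })
            return rows

        if __name__ == "__main__":
            data = scrape_listings("https://example.com/catalog")  # hypothetical URL
            # The "spreadsheet" step: dump the structured rows to CSV.
            with open("listings.csv", "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=["title", "price"])
                writer.writeheader()
                writer.writerows(data)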

    Parsing is not the same as using an API. A company may open an API to let other systems interact with its data; however, the quality and quantity of data available through an API is usually lower than what can be obtained by parsing. In addition, parsing often yields more up-to-date information than an API and is easier to set up structurally.
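
    As a rough illustration of the difference (assuming purely hypothetical endpoints that do not exist), an API hands back only the fields the provider chose to expose, while a parser takes whatever is rendered on the page:

        import requests
        from bs4 import BeautifulSoup

        # Via an API: a fixed, structured subset of data chosen by the provider.
        api_response = requests.get("https://example.com/api/v1/products", timeout=10)
        products_from_api = api_response.json()  # e.g. [{"id": 1, "name": "..."}]

        # Via parsing: anything visible on the page, including fields the API
        # may not expose at all (discounts, stock badges, review counts).
        page = requests.get("https://example.com/catalog", timeout=10)
        soup = BeautifulSoup(page.text, "html.parser")
        products_from_page = [tag.get_text(strip=True) for tag in soup.select(".product")]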

    The fields of application for parsed information are numerous. A sports journalist can use parsing to pull together baseball statistics for an article. In e-commerce, you can extract product names and prices from various sources for subsequent analysis (one example in Russia is the open service for parsing and monitoring competitors' prices, xmldatafeed.com).

    But although parsing is undoubtedly a powerful tool, legal difficulties can arise when it is used. Because parsing takes content that already exists on various sources and puts it in the hands of whoever runs the tool, ethical and legal questions follow.

    To date there is no clearly defined legal framework around parsing; the field is in constant motion, but we can roughly outline the areas of greatest risk. Below are brief overviews of the most notable US cases that have become precedents.

    2000-2009: eBay


    For quite some time after parsing appeared, it caused no legal problems. But in 2000 the use of this tool provoked a real battle: eBay went up against Bidder's Edge, a company that collected auction data. eBay accused Bidder's Edge of unlawful data harvesting, relying on the trespass-to-chattels doctrine. The judge sided with the plaintiff, finding that the heavy activity of the bots could undermine eBay's operation.

    Then, in the 2003 case Intel v. Hamidi, the California Supreme Court rejected the rationale eBay had used against Bidder's Edge, holding that the trespass-to-chattels doctrine could not be extended to computer systems unless real damage to personal property was inflicted.

    The earliest cases against parsing were all based on the trespass-to-chattels doctrine and ended in success for the plaintiffs. That approach no longer works.

    2009: Facebook


    In 2009, Facebook sued Power.com, a site that bundled various social networks into one centralized resource, after the latter included Facebook in its service. Because Power.com parsed Facebook content instead of adhering to the giant's established standards, Facebook sued for copyright infringement, accusing Power.com of copying the Facebook website while extracting user information. Facebook claimed this was both a direct and an indirect violation of copyright. The court decided in Facebook's favor, and from that point on, rulings on the legality of parsing began to favor the authors of site content.

    Even if a parser discards copyrighted content while searching for publicly available information, its actions may still be characterized as copyright infringement, because technically the protected content is still “copied”.

    2011-2014: Auernheimer


    In 2010, hacker Andrew Auernheimer found a security hole on the AT&T website and harvested the email addresses of users who had accessed the site from their iPads. By exploiting the security flaw and using parsing, Auernheimer was able to obtain thousands of email addresses from the AT&T website. He was found guilty of unauthorized access to the AT&T server and misappropriation of other people's data.

    Using parsing to extract confidential personal information can lead to charges, even if that information was nominally publicly available. You can try to convince the court that no passwords or codes were cracked to gain access to the information, but this is dangerous territory.

    2013: Meltwater


    Meltwater is a software company whose Global Media Monitoring product uses parsing to collect news. The Associated Press sued Meltwater for parsing articles, some of which were copyrighted, and for misappropriation of news. Facts themselves cannot be copyrighted, but the court decided that the articles and the authors' expression of those facts could not legally be copied. In addition, Meltwater's use of the articles did not meet fair-use standards. Authored content cannot always be parsed!

    2014: QVC


    In 2014, QVC (the well-known TV retailer) sued Resultly (a shopping app) over what QVC called “excessive parsing.” QVC's accusation was that Resultly disguised its crawlers to hide their source IP addresses, so QVC could not block the parsers it did not want. Because the bots hit QVC's servers quite aggressively, the servers were overloaded and went down, causing damage of $2 million. The court sided with Resultly, ruling that there had been no intent to cause harm.

    And what about Russia?


    Let's start with the simplest and most common question: photographing price tags in stores. Although this is not directly related to site parsing, the issues are similar (indeed, there seems to be no real difference between photographing price tags in a store and parsing prices from a competitor's website).

    So, the question: can a store establish a rule for shoppers that prohibits unauthorized photo and video shooting? Without delving into a detailed interpretation of the law, let's look at the most important article on information:

    In accordance with Article 5 of the Law “On Information, Information Technologies and the Protection of Information”:

    1. Information may be the subject of public, civil and other legal relations. Information can be freely used by any person and transferred by one person to another person, unless federal laws establish restrictions on access to information or other requirements for the procedure for its provision or distribution.

    2. Depending on the category of access, information is divided into publicly available information and information whose access is restricted by federal laws (restricted-access information).

    3. Information, depending on the procedure for its provision or distribution, is divided into:

    1) information freely distributed;
    2) information provided by agreement of persons participating in relevant relations;
    3) information which, in accordance with federal laws, is subject to provision or distribution;
    4) information, the distribution of which in the Russian Federation is limited or prohibited.

    4. The legislation of the Russian Federation may establish types of information depending on its content or holder.

    Thus, information about prices in stores is publicly available, since no legislation restricts access to such information. Accordingly, it is not prohibited to write down or photograph prices in a store.

    Indeed, there are no violations of the law. Moreover, article 29 of the Constitution of the Russian Federation enshrines the right of every citizen to “freely seek, receive, transmit, produce and disseminate information in any legal way”.

    Now for parsing websites. The question we put to a law firm (Frese and Partners): “Is an organization entitled to carry out automated collection of information (parsing) posted on publicly accessible Internet sites?”

    Under the legislation in force in the Russian Federation, everything that is not prohibited by law is permitted. Parsing websites is legal as long as no statutory prohibitions are violated in the process. Thus, automated collection of information must comply with applicable law. Russian legislation establishes the following restrictions related to the Internet:

    • Violation of copyright and related rights is not allowed.
    • Unauthorized access to legally protected computer information is not allowed.
    • Collecting information that constitutes a trade secret by illegal means is not allowed.
    • Clearly unfair exercise of civil rights (abuse of right) is not allowed.
    • Using civil rights to restrict competition is not allowed.

    It follows from the above prohibitions that an organization may carry out automated collection of data (site parsing) placed in the public domain on a website if the following conditions are met:

    • The information is publicly accessible and is not protected by copyright and related rights.
    • The automated collection is carried out by lawful means.
    • The automated collection does not disrupt the operation of the websites concerned.
    • The automated collection does not restrict competition.

    There are also practical recommendations to follow when parsing:

    • The content retrieved must not be copyrighted.
    • The parsing process must not interfere with the operation of the site being parsed (see the pacing sketch after this list).
    • Parsing must not violate the site's terms of use.
    • The parser must not extract users' personal data.
    • The parsed content must meet fair-use standards.
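
    As a minimal sketch of the "do not interfere" recommendation above, a polite crawler identifies itself and paces its requests so the target site is not overloaded. The User-Agent string, delay value and URLs here are arbitrary assumptions, not requirements taken from any law or standard.

        import time

        import requests

        HEADERS = {"User-Agent": "price-monitor-bot/1.0 (contact: admin@example.com)"}
        DELAY_SECONDS = 2.0  # pause between requests; tune to the site's capacity

        def fetch_politely(urls: list[str]) -> list[str]:
            """Download pages one by one, identifying the bot and throttling requests."""
            pages = []
            with requests.Session() as session:
                session.headers.update(HEADERS)
                for url in urls:
                    response = session.get(url, timeout=10)
                    response.raise_for_status()
                    pages.append(response.text)
                    time.sleep(DELAY_SECONDS)  # throttle so we do not hammer the server
            return pages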

    P.S. The weakest point is the possibility of a claim that “parsing interferes with the operation of our site and we incur losses”. In response to such a claim, one can point out that search engines such as Google and Yandex parse (index) entire sites and collect all available information, and do so quite regularly. Accordingly, it sounds logical that a parser that visits a company's website to collect price information performs the same technical action. It would be hard to prove that this action interferes with the site's operation while the search engines' crawling does not. In any case, a good parser should follow the rules in robots.txt; a sketch of such a check is shown below.
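
    Checking robots.txt before crawling can be done with the Python standard library alone. A sketch, assuming a hypothetical site and bot name:

        from urllib.robotparser import RobotFileParser

        ROBOTS_URL = "https://example.com/robots.txt"  # hypothetical site
        USER_AGENT = "price-monitor-bot"               # hypothetical bot name

        parser = RobotFileParser()
        parser.set_url(ROBOTS_URL)
        parser.read()  # downloads and parses robots.txt

        url = "https://example.com/catalog/item-1"
        if parser.can_fetch(USER_AGENT, url):
            print(f"Allowed to fetch {url}")
        else:
            print(f"robots.txt disallows {url} for {USER_AGENT}")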
