Robots exclusion profile

    Very often, a page that is worth indexing contains information that is not intended for indexing.


    This is the fourth result for the query "here and there" on the hub

    And do not think this applies only to navigation repeated on every page. Almost no one wants news feeds from other sites, ads, or highly dynamic content (“now on the site ...”) to be indexed. Some would disable indexing of comments; others would like to hide the body of their posts from search engines and leave only the headings.

    In principle, the semantic web will not have this problem, but any of us may not live to see those bright times.

    It turns out that a solution has long existed: the Robots Exclusion Profile microformat.

    Here's how it should look:

    There once was a man from Nantucket…

    This page is not about pornography.
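
    To make the idea concrete, here is a minimal sketch of how an indexer could honor such a marker. It assumes the draft marks excluded fragments with the class value `robots-noindex` (check the current draft before relying on that name), and it assumes reasonably well-formed markup. The names `IndexableTextExtractor` and `indexable_text` are hypothetical, and nothing here reflects how any real search engine works.

```python
from html.parser import HTMLParser

# Assumption: the draft uses this class value to mark non-indexable fragments.
NOINDEX_CLASS = "robots-noindex"

# Void elements never get an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IndexableTextExtractor(HTMLParser):
    """Collects page text, skipping elements marked with NOINDEX_CLASS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside an excluded element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside an excluded element, every nested element deepens the skip.
        if self.skip_depth or NOINDEX_CLASS in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def indexable_text(html):
    """Return the text of `html` with excluded fragments removed."""
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<div><p>Article text.</p>'
    '<ul class="robots-noindex"><li>Nav <b>link</b><br></li></ul>'
    '<p>More text.</p></div>'
)
print(indexable_text(page))
```

    The depth counter is what lets the exclusion cover everything nested inside the marked element. Markup with unclosed tags inside an excluded fragment would throw the counter off, so a real indexer would need a proper DOM parser.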


    Only one thing spoils this tale: as far as I know, the microformat has not yet been finalized and is not supported by search engines.

    If you happen to be at Google Developer Day or a Yandex Subbotnik, ask the developers whether they would be willing to support at least the draft in their search engine's algorithms. :)

    P.S. If it is already possible to exclude part of a page from the index, please tell us how.

    UPD: I know about <noindex>. But it violates the standard and Google does not recognize it.

