Page Rank in the Web 2.0 era - Part 1
Elections are held in order to find out whose election forecast turned out to be more accurate. (c) Robert Orben
To appreciate Google's contribution to search engine development, you need to go back about 20 years. In those troubled times, the amount of information on the Internet was hundreds of times smaller than it is today, yet finding what you needed was much harder. A user could spend a long time on a search engine site, rephrasing the query in different ways, and still not get the desired result. There were even agencies that offered to search the Internet for you for a fee. At the dawn of search engines, the importance of a page was determined by a multitude of subjective factors, such as HTML markup, the number of terms, headings, and the boldness of the font on the page. Quite often, a specially created page, or a copy of the original page stuffed with the right headings and terms, ended up at the top of the results.
In 1997, two students at Stanford University proposed the famous Page Rank algorithm. This is one of those rare cases when engineers climbed out of a perennial swamp and found a simple, elegant solution that closed a whole pile of problems in one step and predetermined the outcome of the battle between SEO specialists and search engines for many years to come. The essence of Page Rank is "democracy" on the Web. Each page that contains a link to another site “votes” for that site. In this way, the most frequently cited, authoritative primary sources rise to the top. Page Rank brings up the most popular sites, which, like air bubbles in water, float to the surface on the “opinions” of a large number of less popular sites.
This scheme worked well in the ecosystem of the early 2000s, which was dominated by small websites whose content was maintained by webmasters and content managers. With the advent of Web 2.0, Internet users themselves became the main source of information, and that changed the Internet. First, the huge flow of information from users led to the emergence of giant sites with millions, and sometimes tens or hundreds of millions, of pages. Second, sites began to contain a large amount of unstructured information that was not adapted for search engines, along with a large number of local memes and syntax errors. A topic created under one heading, say on a forum or a blog, can easily drift into a completely different area of discussion.
When searching on such sites, the main problem is no longer to determine the authority of the site, but to correctly rank the pages within the site itself, because hundreds or thousands of its pages can now match a single search query. Page Rank does not help in such cases, and many search engines fall back on techniques from the "pre-Google" era, such as analysis of headers, tags, and so on.
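To make the “voting” idea described above more concrete, here is a minimal sketch in Python. It is not Google's implementation, only a toy version under several assumptions of mine: the function name `page_rank`, the toy graph `toy_web`, and the page names are hypothetical, the damping factor of 0.85 is the commonly cited value rather than something stated in this article, and pages without outgoing links are assumed to spread their vote evenly over all pages.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Toy Page Rank by repeated vote passing.

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict mapping every page to its score (scores sum to 1).
    """
    # Collect every page that appears either as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Small base score that every page gets regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links votes for everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: three small sites cite one primary source.
toy_web = {
    "blog_a": ["primary_source"],
    "blog_b": ["primary_source"],
    "forum": ["primary_source", "blog_a"],
    "primary_source": [],
}

ranks = page_rank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the most cited page, `primary_source`, ends up with the highest score, and `blog_a` outranks `blog_b` only because the forum links to it. Note that every score here depends on links between pages, which is exactly the signal that becomes weak when thousands of pages of one giant site compete for the same query.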
In the next part, I will discuss whether machine learning can get around this problem, and how to make a machine rank pages within a site given that site's unique terminology, using search on this site as the example.