rssbot September 19, 2008 at 10:58

Introduction to Google Search Quality


Posted by Udi Manber, Vice President, Engineering, Search

Search Quality is the name of the department responsible for ranking Google search results. Our job is simple and clear: people send queries to Google several hundred million times a day, and within a fraction of a second Google must decide which of the billions of pages on the web to show them, and in what order. Lately we have been doing other things as well, but more on that later.

Surprisingly little is known about how Google ranks results, given how many people use it and how often. That is entirely our doing, and it is deliberate. Frankly, we do not talk much about what we do. There are two reasons: competition and abuse. Competition is straightforward: no company shares its secret recipes with its competitors. As for abuse: if we make our ranking formulas too accessible, we make it easier for people to game the system. Security through obscurity is, of course, never the strongest defense, and we do not rely on it exclusively, but it does prevent a great deal of abuse.

Our ranking algorithms are among Google's most valuable assets. We are very proud of them and guard them closely. By some estimates, adding up all the years that engineers and researchers have put into developing these algorithms gives more than 1,000 person-years, and the pace of innovation has not slowed.

Nevertheless, complete secrecy is not ideal, and with this blog post we are disclosing a little more than we have before. We will try to publish posts like this periodically: discussing new features, explaining existing ones, sharing tips and news, and starting a dialogue. I would like to begin with an overview of our department; more posts on this topic are planned.

Now let me introduce myself. My name is Udi Manber. I am a vice president of engineering at Google and head of search quality. I have been with Google for more than two years and have worked on search technology for almost 20 years.

The heart of the department is the team that works on core ranking. Ranking is hard, much harder than most people realize. One reason is that languages differ and documents follow no rules: there are no standards for how information should be presented. We therefore have to "understand" every web page that anyone might write, for any purpose. And that is only half the problem. We also have to understand the queries people type, which average fewer than three words, and match them against our understanding of all those documents. On top of that, different people are looking for different things. And we have to do all of this in a few milliseconds.

The best-known part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, the founders of Google. PageRank is still in use today, but it is now one part of a much larger system. Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (which concern not just the language itself but how people use it), time models (for some queries the best answer is a page that is 30 minutes old, for others a page that has stood the test of time), and personalized models (because people are different).
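To give a feel for the idea behind PageRank, here is a minimal sketch of the publicly described textbook formulation, computed by power iteration. It is an illustration only, not Google's production ranking: the `pagerank` function, the `damping` value of 0.85, the iteration count, and the toy link graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph, page "c" ends up with the highest rank because every other page links to it, while "d", which nothing links to, keeps only the baseline share.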

Another team in our department is responsible for evaluating how well we are doing. This is done in many ways, but the goal is always the same: improving the user experience. This is not our main goal; it is our only goal. Automated checks run every minute (to make sure everything works as it should), overall quality is assessed periodically, and, most importantly, every proposed algorithm improvement is evaluated. When an engineer has a new idea and develops a new algorithm, we test it carefully. We have a statistics team that looks at all the data and determines the value of the new idea. Every week (sometimes twice a week) we hold meetings at which we discuss new ideas and approve launches. In 2007 we launched more than 450 improvements, an average of about 9 per week. Some of them were simple and obvious. For example, we fixed a problem in the handling of queries containing Hebrew abbreviations (in Hebrew, an abbreviation is marked with a (") before the last letter, so that IBM would be written IB"M). Others were very complex. For example, in January we made significant changes to the PageRank algorithm. Most of the time we were looking for ways to improve relevance, but we also worked on projects whose sole purpose was to simplify the algorithms. The simpler, the better.
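The Hebrew abbreviation case mentioned above shows why such a bug can arise: a tokenizer that treats every double quote as punctuation would split IB"M into two fragments. The sketch below is a hypothetical illustration of one way to keep such tokens whole. It is not the fix Google shipped; the `tokenize` function and its regular expression are assumptions made for this example.

```python
import re

# Marks that can signal an abbreviation: the ASCII double quote and the
# Hebrew gershayim character (U+05F4).
ABBREVIATION_MARKS = '"\u05f4'

# A token is a run of letters, optionally followed by one abbreviation mark
# and exactly one final letter (the pattern described in the text).
# [^\W\d_] matches any Unicode letter; the lookahead rejects a second
# trailing letter so that ordinary quoted words are not glued together.
TOKEN = re.compile(
    r'[^\W\d_]+(?:[' + ABBREVIATION_MARKS + r'][^\W\d_](?![^\W\d_]))?'
)


def tokenize(query):
    """Split a query into word tokens, keeping abbreviations like IB"M whole."""
    return TOKEN.findall(query)


latin_example = tokenize('IB"M search')
quoted_example = tokenize('say "hello" now')
# The Hebrew abbreviation tsadi-he-"-lamed, written with Unicode escapes.
hebrew_example = tokenize('\u05e6\u05d4"\u05dc')
```

Here `latin_example` keeps IB"M as a single token, while the ordinary quotation marks in `quoted_example` are still dropped as punctuation. A real system would also need to handle acronyms inside quoted phrases, which this sketch does not attempt.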

Over the past two years, one of our main areas of focus has been international search. This means that we work on all languages, not just the most widely spoken ones. For example, last year we made significant improvements to search in Azerbaijani, a language spoken by about 8 million people. Over the past few months, we have launched spell checkers for Estonian, Catalan, Serbian, Serbo-Croatian, Ukrainian, Bosnian, Latvian, Tagalog, Slovenian and Farsi. We have built a worldwide network of people who give us feedback. In addition, many volunteers at Google who speak different languages help us improve search.

Another team works on new features and new user interfaces. A great car needs a good engine, but the engine alone is not enough: the car should also be comfortable and easy to drive. The Google search user interface is quite simple. Few of our users ever read the help pages, and they manage fine without them (although the pages are easy to read and we keep improving them). When we add new features, we try to make them intuitive and convenient for everyone. One of the most visible changes we have introduced is universal search. Others include Google Notebook, Custom Search Engines and, of course, many improvements to iGoogle. The user interface team is assisted by usability experts who run user studies and evaluate new features. They travel all over the world. Sometimes they even visit people's homes to watch how they use search in a natural setting. (Don't worry, they will not show up uninvited or unannounced!)

We have a team dedicated entirely to fighting spam and other kinds of abuse. This team works on a wide range of problems: hidden text, off-topic pages stuffed with keywords, and other schemes used to get a higher ranking on search results pages. The team spots new spam trends and works to counter them with solutions that scale. Like every other team, it works internationally, covering different languages and countries. The web spam team works closely with the Google Webmaster Central team to share ideas with users and to hear from site owners.

There are also other teams working on a variety of projects. In general, the organizational structure is quite informal. People move from one team to another, and new projects start all the time.

One of the most important things about search is that user expectations grow very quickly. Tomorrow's queries will be much harder to answer than today's. Just as Moore's law predicts that computing speed doubles every 18 months, there is an unwritten law that the hardest queries become twice as hard within a short time. It cannot be expressed in exact numbers, but we all feel it. We know that we cannot rest on our laurels; we need to work hard to keep up. As I said earlier, we will continue to keep you informed about what is new in search, so stay tuned.
