Make your site more searchable.

John Mueller, Webmaster Trends Analyst (Google Zürich)

Purpose - Accessibility and Usefulness of Information

Google’s mission is to organize the world’s information and make it universally accessible and useful. The key to this mission is continuously crawling the Internet for fresh content and adding it to our index. We regularly crawl billions of pages, and we know about the existence of even more documents: we index web pages, forums, images, news, videos, books and much more. But sometimes users want to find more. Often this is information that is published online but, for one reason or another, is not available to our crawlers. If the crawlers cannot access documents, it is difficult for a search engine to index them fully and show them to users.

Are your web pages indexed?

Checking how much of your site is indexed is easy: search for your domain with the "site:" operator. For example, to check how much of Google Groups is indexed by our search engine, run the query [site:groups.google.com]. (In the text we usually enclose search queries in square brackets, but do not type the brackets into the search box; also note that there is no space after the site: operator in the query.)



This example shows that a large number of pages are indexed, and the first result is the Google Groups homepage. This is good: a lot of information is available, and many messages are already indexed and can be found by users.
If your site is poorly indexed, the search results will contain few links to your content, or none at all. This problem is illustrated in the following example. In this case, the example.com domain is not crawled by our search robot. If you run the query [site:example.com], you’ll see in the search results that, unlike Google Groups, the example.com pages are not indexed:



My site looks like the screenshot above! What to do to correct the situation?

If your site is indexed in the same way as shown in the previous example, or not indexed at all, you should not panic. On the Internet, nothing is permanent. In most cases, you can find and fix this problem quickly enough. Here are a few things to check:
  • Is your site new?
The process of crawling and indexing a site may take some time. If your site is new, we may not have found it yet. Wait a while and check again. In the meantime, you should check whether your site is Google-friendly.
  • Is your site open for crawling by search robots?
Search robots generally follow the instructions in each site’s robots.txt file (if such a file exists). This text file is usually written by hand and is located in the root of the site. It tells search engines which parts of your site are open or closed for crawling. Sometimes, when creating this file, webmasters accidentally block access for all search robots with incorrect Allow or Disallow settings. This can also happen when the robots.txt file has not been edited or corrected since the site was first under development.

In some cases, webmasters block access to all search robots in order to avoid excessive load on the web server, which can occur during intensive crawling of the site. In such a situation, instead of prohibiting crawling of the entire site, it will be useful to identify the individual pages that are causing the problem and block only them. It’s also worth setting the crawl frequency in the settings for Google’s webmaster tools, if you think this will help reduce server load.
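
As a rough sketch of blocking only the problem pages, the Python snippet below uses the standard library's urllib.robotparser to evaluate a robots.txt that closes a single section of a site. The /search/ path, the example.com URLs and the helper name is_allowed are assumptions made for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only the (assumed) expensive /search/ section is blocked.
SELECTIVE_RULES = """\
User-agent: *
Disallow: /search/
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/search/results?q=shoes",  # inside the blocked section
        "https://example.com/about.html",              # everything else stays open
    ):
        print(url, "->", "allowed" if is_allowed(SELECTIVE_RULES, url) else "blocked")
```

With rules like these, the rest of the site stays open to crawlers while the pages that cause the load are excluded.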
The contents of the robots.txt file for any site (including your own) can be viewed in any browser. For example, you can see the contents of the robots.txt file on YouTube.com.

The Google Webmaster Tools console has a tool for analyzing the robots.txt file. You can also create a robots.txt file for your site there if you don’t have one (although having a robots.txt file on the site is optional).

The following lines in the robots.txt file prevent all search robots from accessing the entire contents of the site ("/" indicates the root level of the site’s file tree):

    User-agent: *
    Disallow: /
The following lines in robots.txt allow all robots to access the contents of the site:
    User-agent: *
    Disallow:
Note that nothing is written after the “Disallow:” directive. The absence of a robots.txt file on your site has the same effect.
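
To see the difference between the two rule sets above, here is a minimal Python sketch that feeds each of them to the standard library's urllib.robotparser. The helper name crawl_permitted and the example.com URL are assumptions for illustration, and an empty string is used to approximate a site that has no robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets shown above, as plain text.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
NO_RULES = ""  # stands in for a missing robots.txt file


def crawl_permitted(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/page.html"  # hypothetical URL
    for label, rules in (("block all", BLOCK_ALL),
                         ("allow all", ALLOW_ALL),
                         ("no rules", NO_RULES)):
        print(label, "->", crawl_permitted(rules, page))
```

Running a quick check like this against a copy of your own robots.txt is one way to catch an accidental "Disallow: /" before it reaches the live site.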

Comments in robots.txt files can be added using the # symbol at the beginning of the line, for example like this:
    # this is a comment
Information about crawl errors (for example, URLs that are blocked by robots.txt) can be found in the Google webmaster tools console. To access this information, make sure your site has been added and verified.
  • Does your site prohibit content indexing?
Some sites allow their content to be crawled but at the same time prohibit its indexing. Indexing is usually prohibited with the “noindex” value in the “robots” meta tag of a page. You can check whether your site has such a tag by looking at the source code of your home page (remember that this tag can be placed on each page individually, not just on the home page).
Often the reason is that a setting that is turned on by default in the site software was never turned off. Sometimes the names of such settings are unclear or only loosely related to this meta tag. For example, the setting may be called “site visibility” or “allow search bots to search on your site”.
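
Looking through page source by hand gets tedious on larger sites, so here is a minimal Python sketch that scans a page's HTML for a “robots” meta tag containing “noindex”, using only the standard library's html.parser. The class name RobotsMetaParser, the function has_noindex and the sample markup are assumptions for illustration; the sketch only inspects HTML you pass in and does not download anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            content = attr.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(sample))
```

Because the tag can appear on any page, a check like this would need to be run on each page you care about, not only on the home page.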
  • Are you sure there are no other technical issues blocking the search engines?
Sometimes there are technical problems that prevent us from crawling your site. If you suspect that such problems are the cause, you may want to ask a question in our webmaster help forum or ask your hosting provider for help.
  • Does your site comply with Google’s quality guidelines?
We can remove a site from search results if it does not meet the quality guidelines. If you think that your site may violate these guidelines, bring it into line with them and then submit a reconsideration request from your account in the webmaster tools console. If the meaning of some of our guidelines is not entirely clear, you can ask other webmasters or Google employees in the webmaster help forum.

If you have checked your site using the methods above and it seems to you that it should have been crawled and indexed long ago, it may be useful to search the forum archive to see whether other webmasters have faced a similar problem. You can also ask your question in the forum. Once you have taken the necessary measures, crawling and indexing of your site is in most cases a matter of time.

Thank you for taking the time and patience to check your site. We hope this helps make your site search engine friendly and improves its visibility for your users!
