
Effective site search. Stating the problem and searching for solutions.
This is a reprint of an article by Ivan Nikitin that was published on our website Nomagic.ru in September. It contains only the statement of the problem and a discussion of possible solutions. Links to articles describing a solution based on the LiveSearch API in ASP and PHP can be found at the end of the article.
Any modern site with more than 5-10 pages of content should have a search engine. No matter how well we plan the navigation bar or the catalog of products and site sections, any attempt to organize the content intuitively will ultimately be incomprehensible to the 101st user of the site.
Want to make sure that this is so? Here are some simple tasks; try spending a few minutes on them (all examples were chosen entirely at random from a list of sites I personally know and visit, and they are in no way meant to disparage the quality of these resources):
- Find on the site http://www.specialist.ru/ (without using the search!) 2 (two) courses on Microsoft SharePoint 2007. Record how much time you spent on this.
- Find confirmation on the site http://www.sipnet.ru that the VoIP D-Link DVG-2001S gateway works with the Sipnet service, as well as a brief description of it. Write down how much time you spent on it.
- On the site www.megafon.ru find the annual report on the results of work for 2006 in Microsoft Word format (without using the search). Did you succeed?
Should I continue? I think you already agree with me. Site developers reason the same way when they face the task of building a search engine. Unfortunately, most developers underestimate the complexity of the task and believe that search can be reduced to a single SQL query:
    SELECT * FROM products WHERE
    title LIKE '%something%'
    OR description LIKE '%something%'
It can, but the value of such a search will be zero. You can, of course, make it more elaborate and add search by individual words and their combinations (I am always touched by the phrase you sometimes find on websites: "You can use AND, OR, NOT." Right! Go explain Boolean algebra to the user). But the problem with such a search is that the developer assumes the user will enter product names or news headlines exactly as they appear on the site. In reality the user types whatever he needs right now, in a completely arbitrary form, and as a rule enters short queries of one or two words. A user looking for courses on SharePoint 2007 will write "SharePoint 2007", not "Windows SharePoint Services v3". As a result we get a completely unusable search engine: it either dumps hundreds of links, among which nothing can be found, or returns nothing at all. Want to make sure? Take two major sites with large development budgets and test their search:
- On the site www.mts.ru, use the search to find the credit form of payment for calls, that is, how to sign up for it and how to pay for calls. What query do you enter? "Credit form of payment." The result will be something like this:
- On the site www.alfabank.ru, find information on mortgage lending. What query do you enter? "Mortgage." Here is the result:
It is easy to see that both times you got a negative result. In the first case you received nothing; in the second, completely unnecessary information (how did you like the link to the banner about the mortgage?). Note that in both cases an unsuccessful search can drive the client away forever: I will not switch to MTS, since it has no credit form of payment for calls (in fact, it does!), and I will not go to Alfa Bank, since I could not find its mortgage terms (again, these are just examples! Nothing personal!).
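The failure mode of the LIKE-based query discussed earlier can be reproduced in a few lines. This is only a sketch: the `products` table, its rows, and the `naive_search` helper are invented for illustration, and SQLite stands in for whatever database a real site would use.

```python
import sqlite3


def naive_search(conn, text):
    """Substring search over title and description, as in the LIKE query."""
    pattern = f"%{text}%"
    rows = conn.execute(
        "SELECT title FROM products WHERE title LIKE ? OR description LIKE ?",
        (pattern, pattern),
    ).fetchall()
    return [title for (title,) in rows]


# Hypothetical catalog, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Windows SharePoint Services v3", "Course on deploying team sites"),
        ("Microsoft Office SharePoint Server", "Administration course"),
        ("Course catalog", "All courses offered this year"),
    ],
)

# The user's wording is not a literal substring of any row: nothing is found.
print(naive_search(conn, "SharePoint 2007"))

# A generic word matches every row: the user is buried in results.
print(naive_search(conn, "course"))
```

Both outcomes described in the text appear here: an empty result for a perfectly reasonable query, and an undifferentiated dump for a broad one, with no ranking to tell the rows apart.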
How do we solve this problem?
Effective Search Implementation
First, you need to realize that a good search is far from a trivial task. One can even say that implementing a good search is far more complex than implementing the functionality of the entire rest of the site. So think a hundred times before setting yourself such a task. Are you ready to write a system for morphological analysis, for assessing document relevance, and for ranking results? And most importantly, how many man-hours and thousands of lines of code are you willing to spend on this?
But we can nevertheless solve this problem! We have at least three ways to solve it:
- Using search engine forms
- Using Available Web Services
- Using third-party solutions
These methods differ from one another in effort, cost, and the result obtained, but all three give results an order of magnitude better than the examples above.
Using search engine forms
This is the cheapest method and the easiest to implement. Instead of writing your own half-baked, low-quality search code, you simply embed a form on the pages of your site that sends the query to a search engine. We will demonstrate this with Google, although you can use any other engine; for example, here are the forms offered by Yandex: http://company.yandex.ru/forms/. I prefer Google because, in my opinion, its search quality is much higher than that of other search engines.
So, let's draw a search form. Please note that indicating that the search is provided by the Google search engine is mandatory! That's all! Thanks to hidden fields, we ask Google to search only the specified site. The quality of the search will be noticeably higher than in the examples above. Let's make sure:

The first link points to the page on signing up for the credit form of payment on the MTS website.
Example with Alfa Bank:

The first result is all the information about Alfa Bank's mortgage!
Of course, for all the simplicity of this method, its drawback is immediately obvious: the user leaves your site for the search engine. In itself this is not so scary, because all the links in the results lead back to you and only to you. The contextual advertising, however, is another matter. I do not think Alfa Bank would agree to such a scheme. :-)
Nevertheless, this method can be highly recommended to low-budget or non-commercial sites, since the quality of the search far outweighs the negative aspects in the form of contextual advertising.
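The idea of this section, sending the visitor's query to the search engine with a restriction to your own site, can be sketched as follows. Note the swap: the form in the article relies on hidden form fields, while this sketch uses Google's `site:` query operator to get the same restriction. The function name `site_search_url` is hypothetical.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_URL = "https://www.google.com/search"


def site_search_url(site: str, query: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    restricted_query = f"site:{site} {query.strip()}"
    return f"{GOOGLE_SEARCH_URL}?{urlencode({'q': restricted_query})}"


# The MTS example from the article, expressed as a URL the form would lead to.
print(site_search_url("www.mts.ru", "credit form of payment"))
```

A plain HTML form with `method="get"` pointing at the same address produces an equivalent request; the Python version only makes the resulting URL explicit.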
Using Available Web Services
With this method we try to avoid displaying other people's advertisements in the search results. Many search engines provide services for automated searching: Yandex.XML (http://xml.yandex.ru/), Google's services, and others. The general idea is that we provide our own search form, which sends the user's query to our server, which in turn passes it on to the search engine. Having received the results, our server displays them on our website in whatever design and form we like. The user does not even realize that the search was carried out by an external system, since he sees the results in the design of our site. True, Yandex.XML has a rather incomprehensible licensing scheme (it requires Yandex.Direct ads to be displayed alongside the results), and Google quietly closed a similar service about a year ago and now provides such a search only together with AdSense ads, that is, again with contextual advertising.
But there is a way out. Microsoft has an API for working with Live.com search (http://dev.live.com/livesearch/) that allows you to implement such a system. True, this API limits the number of requests to about 1000-3000 per day, but that is enough for medium-sized sites.
Implementing such a search is not difficult at all, especially since the Live Search API is exposed through SOAP calls to XML Web services, which means the calls can be made from any platform and any site development tool: PHP, ASP.NET, and so on.
Some time ago we implemented such a search when we needed to build a search for the site Specialist.ru. You can see it in action at http://search.specialist.ru
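The flow described in this section (our form, our server, an external search service, our own rendering) might be sketched like this. Nothing here is the Live Search API: `SearchResult`, `SearchBackend`, `render_results`, and `fake_backend` are all hypothetical names, and the backend is a stub standing in for the real web service call.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


# Hypothetical interface: takes (query, site) and returns results from
# whatever external search service the server is configured to call.
SearchBackend = Callable[[str, str], List[SearchResult]]


def render_results(query: str, site: str, backend: SearchBackend) -> str:
    """Pass the query to the backend and render the answer in our own markup."""
    results = backend(query, site)
    if not results:
        return f"<p>Nothing found for <b>{escape(query)}</b>.</p>"
    items = "\n".join(
        f'<li><a href="{escape(r.url, quote=True)}">{escape(r.title)}</a>'
        f"<br>{escape(r.snippet)}</li>"
        for r in results
    )
    return f'<ol class="search-results">\n{items}\n</ol>'


def fake_backend(query: str, site: str) -> List[SearchResult]:
    """Stub backend returning canned data instead of calling a web service."""
    if query == "mortgage":
        return [SearchResult("Mortgage lending", f"http://{site}/mortgage/", "Terms & rates")]
    return []


print(render_results("mortgage", "www.alfabank.ru", fake_backend))
```

Keeping the backend behind a single callable means the page template does not change if the external service is replaced, which matters given how readily such services change their terms.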
If this topic interests you, please leave your feedback and suggestions in the comments to the publication, and in my next article I will give detailed example code for implementing a search engine based on the Live Search API. Believe me, it is all much simpler than it seems at first glance. :-)
Using third-party solutions
However, the method based on available Web services, such as the Live Search API, has two notable disadvantages:
- No way to quickly control reindexing of the site
- No indexing of (and therefore no search in) the closed sections of the site
The first drawback arises because search engine robots themselves decide how often to refresh your site in the index, and if, for example, your site does not return a correct HTTP Last-Modified response header (a disease of 90% of sites on the Internet!), the delay can be significant. That is, after new material appears on your site, it may take days or even weeks before it shows up in the search results.
The second flaw is simply fatal. A search engine robot cannot access private sections of your site (for example, a private forum that requires authorization), so information from private sections will never appear in the search results. You can, of course, work around this by publishing anonymized information from closed sections (for example, showing messages from a closed forum without information about the users), but this is not always possible. For example, what about searching your corporate mail?
Here third-party search engines can help, for example Yandex.Server (http://company.yandex.ru/technology/products/yandex-server.xml) or the corporate Microsoft Office SharePoint Server (http://office.microsoft.com/ru-ru/sharepointserver/FX100492001049.aspx). I know the second much better than the server from Yandex; it has a fairly powerful search engine that you can also use to search your own site.
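To make the Last-Modified point concrete, here is a minimal sketch of how a page handler could report its modification time and answer a robot's conditional request. The function name `conditional_response` is hypothetical, and where the modification time comes from (file timestamp, database column) is left to the site.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(
    last_modified: datetime, if_modified_since: Optional[str] = None
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware datetime. HTTP dates have
    one-second resolution, so microseconds are dropped before comparing.
    """
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # Not Modified: the robot's copy is current

    return 200, headers


changed = datetime(2007, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers = conditional_response(changed)
print(status, headers["Last-Modified"])
print(conditional_response(changed, headers["Last-Modified"])[0])
```

With an accurate header the robot can cheaply check which pages changed, which is what lets it pick up new material sooner instead of re-fetching everything on its own schedule.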
Perhaps in one of the following articles we will also consider the integration of Microsoft Office SharePoint Server 2007 with your website to build an effective search engine.
Related Links
- An article about implementing site searches using the LiveSearch API on ASP.NET
- An article about implementing site searches using the LiveSearch API in PHP5