druzhkov March 15, 2010 at 09:13

Personal Book Search Service

Good afternoon, friends.

Let me introduce a personalized book search service. Unlike classic search, the system here takes the user's queries once and then runs them again and again. Each time a new match is found, the system sends the user a notification. This repeats until the user has found all the books they need and deletes their search queries.
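To make the "search again and again" idea concrete, here is a minimal sketch. It is not the site's actual code: the `SavedQuery` structure, the `run_saved_queries` function and the plain substring match are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A search query left by a user (hypothetical structure)."""
    user: str
    text: str
    # Ids of books this user has already been notified about.
    notified_ids: set = field(default_factory=set)


def run_saved_queries(queries, books):
    """Re-run every saved query and return only the new matches.

    `books` is a list of (book_id, description) pairs. A book is reported
    to a given user once; later runs skip it.
    """
    notifications = []
    for query in queries:
        needle = query.text.lower()
        for book_id, description in books:
            if book_id in query.notified_ids:
                continue  # already reported on an earlier run
            if needle in description.lower():
                query.notified_ids.add(book_id)
                notifications.append((query.user, book_id))
    return notifications
```

Calling `run_saved_queries` after each batch of new books returns only the matches that have not been reported yet, so the user is not notified twice about the same book.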

Personal Search Idea

I don't think I am the first to come up with the idea of personal search. Still, I will dwell on it briefly. Suppose we land on some interesting site (for example, one about selling and exchanging books, like mine). We see that users regularly add new books. All well and good, but constantly coming back to the site in the hope that something we need is about to appear is rather inconvenient... Besides, we are busy people, and we may simply forget...

So the idea suggests itself: what if we shift all the "dirty work" onto the shoulders of a search robot? Sounds tempting! Let it look for the books we tell it about, and bother (notify) us only when such books actually appear.

If you think about it, there are plenty of cases where the same approach applies. For example, a notice that a suitable job vacancy has opened in your city. That the medicine you need has arrived at the pharmacy (think of a person with a chronic illness whose supply is running out). That an interesting gadget / video card / hard drive has gone on sale... These seem like simple things, but they cost you time, and regularly at that. And what if the information is scattered across a dozen sites? All in all, inconvenient.

Personal Book Search

But back to the books. Books are convenient because it is easy to tell whether a book matches a search query or not. For example, “Lukyanenko” always means Lukyanenko’s books, and “Bury me behind the baseboard” means exactly that book and no other. So about 95 percent of the work can be done right away by the search algorithm itself, while the remaining 5 percent is left to the editor. There is no way around it: some search queries are rather ambiguous and produce a large stream of irrelevant matches. Those have to be weeded out by hand.

Even with such a simple model the numbers are pretty good: out of about 3 thousand incoming books and about 200 search queries, about 300 matching books were found. In other words, roughly every tenth book already has a potential buyer at the moment it is added (and sometimes several at once).

Finally, I’ll reveal one small technical secret: if a search query contains an author, the system searches not only for the name as typed but also for its synonyms (for example, “Lukyanenko” = “Sergey Lukyanenko” = “Lukyanenko, Sergey” = “Lukyanenko S.”). Synonyms are stored in the database and added to as time and opportunity allow, and as funding from advertising on the site permits :-)
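The synonym lookup could look roughly like this. It is only a sketch: on the site the synonyms live in the database, whereas here the in-memory `AUTHOR_SYNONYMS` table and the `expand_author` and `author_matches` helpers are hypothetical names invented for the example.

```python
# Hypothetical synonym table: each set holds spellings of one author.
# On the real site this data is stored in the database.
AUTHOR_SYNONYMS = [
    {"lukyanenko", "sergey lukyanenko", "lukyanenko, sergey", "lukyanenko s."},
]


def expand_author(query):
    """Return every known spelling for the author named in `query`.

    An author that is not in the table is returned as a single spelling.
    """
    key = query.strip().lower()
    for variants in AUTHOR_SYNONYMS:
        if key in variants:
            return set(variants)
    return {key}


def author_matches(query, book_author):
    """True if the book's author field is any known spelling of `query`."""
    return book_author.strip().lower() in expand_author(query)
```

With this table, a query for “Lukyanenko” also matches a book whose author field reads “Lukyanenko, Sergey”, while an author missing from the table is compared only in the form it was typed.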

Service extension

Initially, personal search was available only to registered users of the site. After about three months of trial operation, it was decided to open it to ordinary guests as well. Now anyone can leave search queries for books on our site.

But that is not the best part yet. Just the other day, the search scope was extended beyond the books on our site to LiveJournal communities. About 30 RSS feeds (from the book communities that are active) were connected to the system. The script downloads their contents and looks for the search words inside the posts. To keep results relevant, only posts no older than 20 days are analyzed. The first run immediately found about 50 books matching users' search queries; again, a very good result.
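A rough sketch of the feed check described above, using only the Python standard library. The function name `recent_matches`, its parameters and the assumption that the feeds are plain RSS 2.0 with `pubDate` fields are mine; downloading the feed is left out, so the function takes the already fetched XML text.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_matches(rss_xml, search_words, now, max_age_days=20):
    """Return (word, link) pairs for recent feed items containing a word.

    `now` must be a timezone-aware datetime. Items older than
    `max_age_days`, or without a publication date, are skipped.
    """
    cutoff = now - timedelta(days=max_age_days)
    words = [w.lower() for w in search_words]
    found = []
    for item in ET.fromstring(rss_xml).iter("item"):
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Assumption: treat dates without a zone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published < cutoff:
            continue  # too old to be relevant
        title = item.findtext("title") or ""
        body = item.findtext("description") or ""
        text = (title + " " + body).lower()
        for word in words:
            if word in text:
                found.append((word, item.findtext("link") or ""))
    return found
```

The 20-day limit is passed as `max_age_days` so it can be tuned per feed; the matching itself is a plain substring test, which is the simplest thing that fits the description and is likely cruder than what the site does.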

For now the algorithms are still "raw", but the plan is to make it possible to connect any other book projects that publish their books through RSS feeds. In addition, a “personal search area” is planned for later: “in my city”, “in my city and nearby cities”, “in all cities”. After all, one person is looking for a rare book and is ready to order it even from abroad, while another is looking for bestsellers in their own city.

Performance

Performance is a separate issue. The algorithm is quite resource-hungry, so it has to be run on a local copy of the database (which is convenient: a backup is made and the search is immediately run against it).

Typical figures: the book table holds about 11 thousand records in total, and the search is started after every 200-300 added records. There are about 200 search queries. The script takes about a minute on my machine. That is not too annoying for now, but as the service grows I will need to think about optimization (right now it is apparently slow because of the large number of relations between the tables). For comparison: running the same 200 queries against the table of posts downloaded from LJ took only about 7 seconds. But that is a single table with about 70 entries. In short, the experiments continue.
