Sphinx - Not Just For Search!


    I think that about almost all of Sphinx’s great search engines have heard about it. Surely many are already using it to search the site, to select similar articles, news, products, etc. It does a great job with search tasks even on very large numbers of entries. But, in my article I want to talk about using Sphinx not for search.

    It so happened that I continued to develop the news aggregator site. When I started to him, he “kept” 40–60 thousand news in the database and terribly slowed down. Rewriting everything from scratch, translating part of the site to statics and introducing caching, I was able to increase the number of news the site managed with up to about 500 thousand. Of course, I used Sphinx to search the news, as I heard a lot of enthusiastic reviews about it from colleagues. I must say, and he did not disappoint me.

    And so, on one not-so-wonderful day, the management set the following task: to show one latest news from each source. Speaking the language of SQL - it is sorted by date and grouped by source. An attempt to implement the appropriate selection from the database led to the fact that the site began to cope with the load.

    It turned out that MySQL first performs grouping (while taking the first record in the table) and only then sorts the selected records. The “sort before group” problem can be solved either by using a subquery or by using the aggregation function MAX (). And this is using a temporary table and very slow queries ...

    An alternative solution was to use Sphinx's ability to group results. After reading the documentation, I decided to try replacing MySQL in the task of selecting news to display on the site in Sphinx.

    To do this, it was necessary to set sorting by date, grouping by the numeric field of the Source ID, and as a request - the name of the news category. The request is as follows: The result exceeded all my expectations! I can’t give exact times for query execution using MySQL and using Sphinx, because I forgot to write them down for joy. But I will say that now we have more than two million news in our database and everything works quite fast.
    $sphinx = new SphinxClient();
    $sphinx->SetServer('localhost', 3312);

    $sphinx->SetMatchMode(SPH_MATCH_EXTENDED2);
    $sphinx->SetSortMode(SPH_SORT_ATTR_DESC, 'date');
    $sphinx->SetLimits($from,$count);
    $sphinx->SetGroupBy('site_id', SPH_GROUPBY_ATTR, 'date desc');
    $category = "@category_path {$cat['category_path']}";
    $result = $sphinx->Query($category, 'news news_delta');




    To keep the index up to date, a delta index is used, updated every 10 minutes, and a basic index, updated daily. MySQL unloading helped to avoid table locks during long queries; MySQL is used only to retrieve data by ID, which happens very quickly.

    Also popular now: