Zend_Search_Lucene + PHPMorphy: it's simple

    I once read through the Zend_Search_Lucene documentation. Everything is well done and clear: take it and embed it in your site. But there is not a word about how to attach a stemmer or a morphological analyzer to it. As it turns out, making it work with, for example, PHPMorphy is very simple.
    How to do it is described below.
    This note will mainly be useful to developers who have not yet dealt with full-text search on a site.
    You will not find a guide to configuring Lucene or PHPMorphy here; there is already plenty of that information on the Internet.


    So let's get started.
    Before text is added to the index, it is split into tokens. The Zend_Search_Lucene_Analysis_Analyzer_* classes are responsible for this. The analyzer takes text as input and produces a list of tokens. A token is a word that is written directly to the index, plus its position in the document. At least, that is how I understand it. Besides the analyzer there are filters, which transform words (for example, convert them to lower case) or drop words shorter than three letters.
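    To make the pipeline concrete, here is a small model of it in plain PHP. It does not use any Zend classes: the function names (tokenize, lowerCaseFilter, shortWordsFilter, analyze) and the array layout of a token are my own, chosen only for illustration, and the real analyzer is more involved. The sketch assumes the mbstring extension is available.

```php
<?php
// Plain-PHP model of the analyzer pipeline; no Zend classes are used here.

// Split UTF-8 text into tokens: each token is a word plus its position.
function tokenize($text)
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    $tokens = array();
    foreach ($matches[0] as $position => $word) {
        $tokens[] = array('text' => $word, 'position' => $position);
    }
    return $tokens;
}

// Filter: convert the token text to lower case.
function lowerCaseFilter(array $token)
{
    $token['text'] = mb_strtolower($token['text'], 'UTF-8');
    return $token;
}

// Filter: drop tokens shorter than three letters by returning null.
function shortWordsFilter(array $token)
{
    return mb_strlen($token['text'], 'UTF-8') < 3 ? null : $token;
}

// Run every token through the filters in order; a null result drops the token.
function analyze($text, array $filters)
{
    $result = array();
    foreach (tokenize($text) as $token) {
        foreach ($filters as $filter) {
            $token = call_user_func($filter, $token);
            if ($token === null) {
                continue 2; // token was filtered out, move on to the next one
            }
        }
        $result[] = $token;
    }
    return $result;
}

$tokens = analyze('An Index of Words', array('lowerCaseFilter', 'shortWordsFilter'));
// $tokens now holds "index" (position 1) and "words" (position 3).
```

    The short words "An" and "of" are dropped, while the remaining tokens keep their original positions in the text.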
    All we have to do is write a filter that converts each word to its base form. That form is what gets stored in the index. One thing I forgot to mention: all queries to the index go through the same tokenization and filtering. So the search is performed on the base forms of words, which is exactly what we need. Here is the code:

    class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
    {
        public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
        {
            // See Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8
            // and do exactly the same: build and return a new token,
            // but with the word replaced by its base form from PHPMorphy.
        }
    }

    // Attach the filter to a UTF-8 analyzer and make that analyzer the default.
    $analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
    $analyzer->addFilter(new My_PHPMorphy_TokenFilter());
    Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
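    For those who want the body of normalize() spelled out, below is a rough sketch of a fuller filter. It follows the pattern of Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8 and needs both Zend Framework 1 and phpMorphy to be loaded, so it will not run on its own. Several things in it are my assumptions rather than anything stated above: the constructor that receives a ready phpMorphy object, the idea that the dictionary expects upper-case words, and the expectation that getBaseForm() returns either an array of candidates or false. Check these against your phpMorphy version before relying on them.

```php
<?php
// Sketch only: Zend Framework 1 and phpMorphy must already be loaded.
class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
{
    // A configured phpMorphy instance; creating it is not shown here.
    private $_morphy;

    // Assumption: the constructor is added for this sketch only.
    public function __construct($morphy)
    {
        $this->_morphy = $morphy;
    }

    public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
    {
        // Assumption: the dictionary expects upper-case input.
        $word = mb_strtoupper($srcToken->getTermText(), 'UTF-8');

        // Assumption: getBaseForm() returns an array of candidates or false.
        $forms = $this->_morphy->getBaseForm($word);

        // Take the first candidate; keep the word itself if nothing was found.
        $base = (is_array($forms) && count($forms) > 0) ? reset($forms) : $word;

        // Same steps as in LowerCaseUtf8: copy offsets and position increment.
        $newToken = new Zend_Search_Lucene_Analysis_Token(
            $base,
            $srcToken->getStartOffset(),
            $srcToken->getEndOffset()
        );
        $newToken->setPositionIncrement($srcToken->getPositionIncrement());

        return $newToken;
    }
}
```

    With this version the filter would be registered as $analyzer->addFilter(new My_PHPMorphy_TokenFilter($morphy)), where $morphy is a configured phpMorphy instance. When a word has several candidate base forms, the sketch simply takes the first one. Since queries pass through the same filter, the upper-case base forms in the index and in the query should still match.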
    


    That's all. We index what we need and show the search results to the user, as the Zend Framework manual teaches.
