kronos July 21, 2008 at 23:43

Rails + Sphinx =? Part I

Shall we talk about search in Ruby on Rails?

I decided to split the story into two parts. The first covers the boring project setup and a simple search over one field of one model. The second goes into the finer points, and I will try to cover everything the plugin can do. By the way, the source code (link in the text) has already been changed slightly for the second part, but this will not cause problems.

Installation

Install Rails 2.0.2 or later.
Download Sphinx 0.9.8 from www.sphinxsearch.com/downloads.html and build it yourself, or use ports / portage / <insert as needed>:

$ sudo port install sphinx

Sphinx supports two DBMSs, MySQL and PostgreSQL, but it is quite easy to add support for any other database.
Check after installation (the paths to searchd and indexer should be in the PATH environment variable):

Macintosh:sphinx-0.9.8 kronos$ searchd -h

Sphinx 0.9.8-release (r1371)

Copyright (c) 2001-2008, Andrew Aksyonoff

...

Sphinx consists of several utilities, among them:
searchd - the search daemon
search - a console counterpart of searchd for debugging and test searches
indexer - the indexer

Create a project:

$ rails sphinxtest -d mysql

$ cd sphinxtest/

Don't forget to edit config/database.yml.

For convenience, we will use a plugin for working with Sphinx.
I believe there are two adequate plugins for Rails: ultrasphinx and Thinking Sphinx (by the way, a RailsCast about it came out while I was writing this article). Since the latter conflicts with another plugin, "redhill on rails", because of internal naming, I use the former. But perhaps the latter is better; choose for yourself. :)
Installing the plugin:

$ script/plugin install git://github.com/fauna/ultrasphinx.git

Configuration

$ mkdir config/ultrasphinx

$ cp vendor/plugins/ultrasphinx/examples/default.base config/ultrasphinx/

default.base is a template for the Sphinx configuration file. In the first part, we only configure the paths to the logs, pid file, and indexes:

# ...

searchd

{

# ...

 log = /opt/local/var/db/sphinx/log/searchd.log

 query_log = /opt/local/var/db/sphinx/log/query.log

 pid_file = /opt/local/var/db/sphinx/log/searchd.pid

# ...

}

# ...

index

{

 # path where the indexes will be stored

 path = /opt/local/var/db/sphinx/

 # ...

}

 # ...

Writing the code

For simplicity, we will make one controller with a form that uses Ajax to search for, say, artists by name:

$ script/generate controller home index search

$ script/generate model artist

The artist model will consist of a single field, title. Migration code (db/migrate/..._create_artists.rb):

class CreateArtists < ActiveRecord::Migration
  def self.up
    create_table :artists do |t|
      t.string :title, :null => false
      t.timestamps
    end
  end

  def self.down
    drop_table :artists
  end
end

Now we tell Sphinx that we will search on one field (app/models/artist.rb):

class Artist < ActiveRecord::Base

 is_indexed :fields => ['title']

end

The entry "is_indexed :fields => ['title']" means that indexing will be done on a single field. Next, we create the database and run the migrations:

$ rake db:create

$ rake db:migrate

It is also worth setting up the routes in config/routes.rb:

map.root :controller => 'home'

map.search 'search', :conditions => {:method => :get}, :controller => 'home', :action => 'search'

Controller code (app/controllers/home_controller.rb):

class HomeController < ApplicationController
  def index
  end

  def search
    # Split the query into words and quoted phrases, drop empty pieces, quote each term
    query = params[:query].split(/'([^']+)'|"([^"]+)"|\s+|\+/).reject { |x| x.empty? }.map { |x| x.inspect } * ' && '
    @artists = Ultrasphinx::Search.new(:query => query,
                                       :sort_mode => 'relevance',
                                       :class_names => ["Artist"])
    @artists.run
    respond_to do |format|
      format.js # search.js.erb
    end
  end
end

The regular expression on the first line parses the search query: it splits the query into words on whitespace, then we discard empty words (for example, ",,") and wrap every word in quotes. The && operation merely lists the words. For example, the query
"Bleed it out" => 'Bleed' && 'it' && 'out' will also match the record "Sell it out" (two words out of three matched); that is, && does not dictate a list of required words, it only enumerates them (if you need all the words to be present, use AND, but more on that in the second part).
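To see what the query-building line in the controller produces, here is a minimal standalone sketch that can be run outside Rails. The helper name build_query and the constant QUERY_SPLITTER are made up for illustration and are not part of Ultrasphinx; the " && " separator follows the example above. Note that inspect wraps terms in double quotes.

```ruby
# Hypothetical helper (not part of Ultrasphinx): mirrors the controller's
# query-building expression so it can be tried in plain Ruby.
QUERY_SPLITTER = /'([^']+)'|"([^"]+)"|\s+|\+/

def build_query(raw)
  raw.to_s                                   # tolerate a missing (nil) parameter
     .split(QUERY_SPLITTER)                  # words, plus the contents of quoted phrases
     .reject { |term| term.empty? }          # drop empty pieces left by the split
     .map { |term| term.inspect }            # wrap each term in double quotes
     .join(' && ')                           # same result as `* ' && '`
end

puts build_query('Bleed it out')             # prints: "Bleed" && "it" && "out"
puts build_query(%q{"Linkin Park" live})     # prints: "Linkin Park" && "live"
puts build_query('  Tiesto+ATB ')            # prints: "Tiesto" && "ATB"
```

A quoted phrase survives as a single term because split also returns the text captured by the groups in the pattern.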
Let's briefly go over the parameters:
:query - the search query
:sort_mode - how the results are sorted
:class_names - an array of names of the model classes that will be instantiated from the search results. Internally, Sphinx stores each document as a set of fields and their values. Working with such a representation in Rails is inconvenient; a ready-made model object is much handier. Ultrasphinx determines which model a found document belongs to and creates an instance of it, so the search itself is no different from Artist.find(...) or Artist.paginate (yes, the search results are compatible with will_paginate).
The @artists.run call executes the query. Queries are very fast: on a database of seven million records they take thousandths of a second.
The views (templates) can be found in the finished project.

Now you can add something to the database:

$ script/console

>> Artist.create(:title => 'Tiesto')

>> Artist.create(:title => 'Armin')

>> Artist.create(:title => 'ATB')

>> exit

Now we make the preparations the plugin needs in order to work (this must be done every time you change something in the is_indexed declaration in the models):

$ rake ultrasphinx:configure

$ rake ultrasphinx:index

$ rake ultrasphinx:daemon:start (or restart if it is already running)

Start the server and test:

$ mongrel_rails start -p 3001 -d

One small caveat

The indexes and the data in the database live separately. When we delete, modify, or add records in the database, the indexes do not change. For the database changes to show up in the indexes, the whole database has to be reindexed:

$ rake ultrasphinx:index

Of course, this is not great. But I can assure you that a solution exists, and it is called delta indexing. More about it in the next part.
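The gap between the database and the index can be illustrated with a toy example in plain Ruby. This is only a sketch of the idea: ToyIndex is a made-up in-memory class, not how Sphinx or Ultrasphinx are implemented, and a plain hash stands in for the artists table.

```ruby
# Toy illustration only: a tiny in-memory inverted index, NOT Sphinx internals.
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Rebuild the index from scratch from the current records (a "full reindex").
  def rebuild(records)
    @postings.clear
    records.each do |id, title|
      title.downcase.split(/\s+/).each { |word| @postings[word] << id }
    end
    self
  end

  # Ids of the records whose title contained the word at indexing time.
  def search(word)
    @postings.fetch(word.downcase, []).uniq
  end
end

artists = { 1 => 'Tiesto', 2 => 'Armin', 3 => 'ATB' }   # stand-in for the table
index = ToyIndex.new.rebuild(artists)

artists[4] = 'Paul van Dyk'                    # the "database" changes...
stale = index.search('dyk')                    # ...but the index was built earlier
fresh = index.rebuild(artists).search('dyk')   # a full reindex picks up the change

puts "before reindex: #{stale.inspect}"        # prints: before reindex: []
puts "after reindex:  #{fresh.inspect}"        # prints: after reindex:  [4]
```

The new record becomes searchable only after the rebuild, which is the role rake ultrasphinx:index plays for the real index.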

Summary

Sphinx is a very cool thing :). It is open source, free, and searches and indexes quickly. A must-have!

Alternatives

As an alternative I can mention acts_as_ferret. It is ideal for small projects (for example, we used it at the Rambler Hackfest), but on large amounts of data it performs, to put it mildly, poorly: indexing takes a very long time.
There also seems to be a decent plugin for PostgreSQL's tsearch2, Acts as tsearch; I have not used it in production, so I cannot say. There is also acts_as_solr.
