Building the simplest product-property filter with ElasticSearch in Symfony2

  • Tutorial
I was inspired to write this article by the lack of a ready-made, step-by-step guide on the Internet to "how to implement a product filter with ElasticSearch", even though the task itself was clear and non-negotiable for me. I could find fragments of reference information, but no cookbook for solving even the most trivial problems.

I focus on Symfony2 because I use FOSElasticaBundle, which lets you describe the mapping of ElasticSearch indexes in convenient YAML configs and bind Doctrine ORM entities or Doctrine ODM documents to them. The indexes are then populated from the bound Doctrine entities with a single console command. The bundle also ships with a third-party (vendor) library for building search and filter queries. Search results are returned as arrays of the Doctrine ORM entities or ODM documents bound to the search index. More about FOSElasticaBundle, as usual, on GitHub: github.com/FriendsOfSymfony/FOSElasticaBundle

Using the bundle frees you completely from manipulating raw JSON: no encoding and decoding with json_encode and json_decode, no sending requests with curl. A pure OOP approach!

A bit about the data schema in SQL

Since my products are stored in a relational DBMS, I had to implement an EAV (entity-attribute-value) model for their properties and values (more: en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model).

As a result, I ended up with the following data schema:
[Figure: database schema]


Database dump: drive.google.com/file/d/0B30Ybwiexqx6S1hCanpISHVvcjQ/edit?usp=sharing
From this schema we will create Doctrine entities and map them to ElasticSearch.
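To make the target shape concrete, here is a small plain-PHP sketch of how EAV rows collapse into one document per product with an embedded list of option/value pairs, which is roughly what the indexed JSON in the gist further below looks like. FOSElasticaBundle performs this transformation itself from the Doctrine entities; the function eavRowsToDocuments() and the column aliases product_id, product_name, price, option_name and value are hypothetical, assuming the rows come from a JOIN over the product, option and option-value tables.

```php
<?php
// Hypothetical sketch: turn EAV rows (one row per product/option/value triple)
// into product documents with embedded productsOptionValues.
// Column aliases are assumptions, not the actual dump's column names.
function eavRowsToDocuments(array $rows): array
{
    $documents = [];
    foreach ($rows as $row) {
        $id = $row['product_id'];
        if (!isset($documents[$id])) {
            // First row for this product: copy the scalar product fields once
            $documents[$id] = [
                'name' => $row['product_name'],
                'price' => $row['price'],
                'productsOptionValues' => [],
            ];
        }
        // Every row contributes one embedded option/value pair
        $documents[$id]['productsOptionValues'][] = [
            'productOption' => $row['option_name'],
            'value' => $row['value'],
        ];
    }
    return $documents;
}

// Illustrative input rows (made-up sample data)
$rows = [
    ['product_id' => 1, 'product_name' => 'Laptop A', 'price' => 999, 'option_name' => 'resolution', 'value' => '1600x900'],
    ['product_id' => 1, 'product_name' => 'Laptop A', 'price' => 999, 'option_name' => 'weight', 'value' => '2.7 kg'],
    ['product_id' => 2, 'product_name' => 'Laptop B', 'price' => 1200, 'option_name' => 'resolution', 'value' => '1980x1020'],
];

echo json_encode(eavRowsToDocuments($rows), JSON_PRETTY_PRINT), "\n";
```

The product fields are copied only from the first row of each product, since in a JOIN they repeat on every row.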

Mapping the EAV model to ElasticSearch

First, install FOSElasticaBundle. Add the following to the require section of composer.json:

"friendsofsymfony/elastica-bundle": "dev-master"


Update the dependencies and register the installed bundle in AppKernel.php:

new FOS\ElasticaBundle\FOSElasticaBundle()


Now add the following settings to config.yml:

fos_elastica:
    clients:
        default: { host: localhost, port: 9200 }
    indexes:
        test:
            types:
                product:
                    mappings:
                        name: ~
                        price: ~
                        category: ~
                        productsOptionValues:
                            type: "object"
                            properties:
                                productOption:
                                    index: not_analyzed
                                value:
                                    type: string
                                    index: not_analyzed
                    persistence:
                        driver: orm
                        model: Vendor\TestBundle\Entity\Product
                        provider: ~
                        listener:
                            immediate: ~
                        finder: ~


To fill the index defined above with data, run the console command php app/console fos:elastica:populate. FOSElasticaBundle will then populate the index with data from the database.

Note: the characteristics and their values are embedded inside each product as inner objects. For the filter below to work, specify type: "object" rather than type: "nested" for the productsOptionValues collection: nested objects are indexed as separate hidden documents and can only be matched with a nested filter, so the plain term filters used below would find nothing. Keep in mind the trade-off described here: www.elasticsearch.org/guide/en/elasticsearch/guide/current/complex-core-fields.html#_arrays_of_inner_objects. With type "object", an array of inner objects is flattened into arrays of field values, so the pairing between a particular option and its value is not preserved in the index. Also, the filtered fields must not be analyzed, which is what the line index: not_analyzed ensures. Otherwise the analyzer splits values into lowercased tokens, and filtering on strings that contain spaces will not work.
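The Elasticsearch guide page linked in the note explains that an array of inner objects mapped as "object" is flattened: each inner field becomes one flat array of values. The plain-PHP sketch below models that behaviour to show what it means for two term filters combined with must. It is an illustration only; flattenInnerObjects() and termsMatch() are hypothetical names, and in_array() stands in for "the term occurs somewhere in the field".

```php
<?php
// Model of "object" flattening: every inner field becomes one flat array of values.
function flattenInnerObjects(array $innerObjects): array
{
    $flat = [];
    foreach ($innerObjects as $object) {
        foreach ($object as $field => $value) {
            $flat[$field][] = $value;
        }
    }
    return $flat;
}

// Hypothetical check mirroring two term filters under "must": each term only asks
// whether its value occurs anywhere in the corresponding flat array.
function termsMatch(array $flat, string $option, string $value): bool
{
    return in_array($option, $flat['productOption'] ?? [], true)
        && in_array($value, $flat['value'] ?? [], true);
}

$flat = flattenInnerObjects([
    ['productOption' => 'resolution', 'value' => '1600x900'],
    ['productOption' => 'weight', 'value' => '2.7 kg'],
]);

// Matches even though no single inner object pairs "resolution" with "2.7 kg"
var_dump(termsMatch($flat, 'resolution', '2.7 kg')); // bool(true)
```

In a product filter this rarely matters, because a value such as "2.7 kg" usually belongs to only one option. If values can collide across options, mapping the collection as "nested" and querying it with a nested filter keeps each option/value pair together.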

Now you can see the list of products with their embedded characteristics at http://localhost:9200/test/product/_search?pretty. In my case, the server response looks like this:
gist.github.com/ArFeRR/3976778079d64d5a72cd

Rendering the filter form


The form itself looks like this:


In the controller, we fetch all properties and all products, read the filter array from the query string (it is empty when nothing is selected), and pass everything to the Twig template:

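// Inside a controller action
$entityManager = $this->getDoctrine()->getManager();
$options = $entityManager->getRepository("ParfumsTestBundle:ProductOption")->findAll();
// Full product list, shown when nothing is selected in the filter
$products = $entityManager->getRepository("ParfumsTestBundle:Product")->findAll();
// Selected values arrive in the query string as filter[<option name>][]=<value>
$request = $this->get('request');
$filter = $request->query->get('filter');
// The ElasticSearch filtering from the next section runs here and may replace $products
return $this->render('ParfumsTestBundle:Default:filter.html.twig', array(
    'options' => $options,
    'products' => $products,
    'filter' => $filter,
));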


Ideally, options should be grouped by name so they are not duplicated on the form, but I skip this to save space. Write the corresponding DQL query in your entity or document repository yourself. The findAll() query on products is needed to display the full product list when nothing is selected in the filter.
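The article leaves the grouping to a DQL query in the repository. As an in-memory alternative, the hypothetical helper groupValuesByOption() below (not part of the article's code) collapses (option name, value) pairs into one entry per option with unique values, which is the structure the form needs to avoid duplicates. It assumes the pairs have already been read from the entities.

```php
<?php
// Hypothetical helper: one entry per option name, each value listed once,
// keeping the order in which options and values first appear.
function groupValuesByOption(array $pairs): array
{
    $grouped = [];
    foreach ($pairs as [$option, $value]) {
        // Strict comparison so "2.7 kg" and similar strings are compared as-is
        if (!in_array($value, $grouped[$option] ?? [], true)) {
            $grouped[$option][] = $value;
        }
    }
    return $grouped;
}

$pairs = [
    ['resolution', '1600x900'],
    ['resolution', '1980x1020'],
    ['resolution', '1600x900'], // duplicate coming from a second product
    ['weight', '2.7 kg'],
];
print_r(groupValuesByOption($pairs));
```

Doing the same in the database (for example with DISTINCT in the DQL query) avoids loading every value row into PHP, which matters once the catalogue grows.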

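And here is the Twig template itself:
{% extends "TwigBundle::layout.html.twig" %}
{% block body %}
    {# Checkbox names assumed: filter[<option name>][] so the controller receives an array #}
    <form method="get" action="">
        <h2>Filter</h2>
        <ul>
        {% for option in options %}
            <li>{{ option.name }}
                <ul>
                {% for value in option.productsOptionValues %}
                    <li>
                        <label>
                            <input type="checkbox" name="filter[{{ option.name }}][]" value="{{ value.value }}">
                            {{ value.value }}
                        </label>
                    </li>
                {% endfor %}
                </ul>
            </li>
        {% endfor %}
        </ul>
        <input type="submit" value="Filter">
    </form>

    <h2>Products</h2>
    <ul>
    {% for product in products %}
        <li>
            {{ product.name }} {{ product.price }}
            {% for option_value in product.productsOptionValues %}
                {{ option_value.productOption }}: {{ option_value.value }}
            {% endfor %}
        </li>
    {% endfor %}
    </ul>
{% endblock %}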


Processing the filter form

Let's get down to the fun part.
Now we need to construct a search query (more precisely, a JSON filter) that will be passed to ElasticSearch for processing. This is done with the Elastica library that FOSElasticaBundle is built on (more: elastica.io).
So, in your controller action, we process the filter array received from the form:

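if (!empty($filter)) {
    // Finder service ids follow fos_elastica.finder.<index>.<type>;
    // with the "test" index from config.yml this is fos_elastica.finder.test.product
    $finder = $this->container->get('fos_elastica.finder.test.product');
    // Outer AND: every selected option must be satisfied
    $andOuter = new \Elastica\Filter\Bool();
    foreach ($filter as $optionKey => $arrValues) {
        // OR: at least one selected value of this option must match
        $orOuter = new \Elastica\Filter\Bool();
        foreach ((array) $arrValues as $value) {
            // Inner AND: option name and value must both match
            $andInner = new \Elastica\Filter\Bool();
            $optionKeyTerm = new \Elastica\Filter\Term();
            // Field names must match the mapping in config.yml (productsOptionValues)
            $optionKeyTerm->setTerm('productsOptionValues.productOption', $optionKey);
            $valueTerm = new \Elastica\Filter\Term();
            $valueTerm->setTerm('productsOptionValues.value', $value);
            $andInner->addMust($optionKeyTerm);
            $andInner->addMust($valueTerm);
            $orOuter->addShould($andInner);
        }
        $andOuter->addMust($orOuter);
    }
    $filtered = new \Elastica\Query\Filtered();
    $filtered->setFilter($andOuter);
    // Returns Product entities loaded through Doctrine
    $products = $finder->find($filtered);
}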


Here I take the array passed in the query string and iterate over the filter values selected by the user, building a tree of objects from which the Elastica library generates the JSON that ElasticSearch uses to filter our data set:
gist.github.com/ArFeRR/97671e54515dfd7be012

This JSON approximately corresponds to the following condition in a relational database:
WHERE ((option = resolution AND value = 1980x1020) OR (option = resolution AND value = 1600x900)) AND (option = weight AND value = 2.7 kg)
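To see the same tree without Elastica, the sketch below builds it as plain PHP arrays and encodes it. It assumes the ES 1.x filter syntax (bool filters with must/should, term filters, a filtered query without an explicit query part); buildFilterBody() is a hypothetical helper, and the exact JSON Elastica emits may differ in details from this shape.

```php
<?php
// Hypothetical helper: the filter tree as plain arrays
// (AND across options, OR across the selected values of one option).
function buildFilterBody(array $filter): array
{
    $must = [];
    foreach ($filter as $option => $values) {
        $should = [];
        foreach ((array) $values as $value) {
            // One option/value pair: both terms must match
            $should[] = ['bool' => ['must' => [
                ['term' => ['productsOptionValues.productOption' => $option]],
                ['term' => ['productsOptionValues.value' => $value]],
            ]]];
        }
        // At least one selected value of this option must match
        $must[] = ['bool' => ['should' => $should]];
    }
    return ['query' => ['filtered' => ['filter' => ['bool' => ['must' => $must]]]]];
}

// The selection from the WHERE example: two resolutions, one weight
$filter = ['resolution' => ['1980x1020', '1600x900'], 'weight' => ['2.7 kg']];
echo json_encode(buildFilterBody($filter), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

Each level of the array corresponds to one level of Elastica objects in the controller: the outer must list is $andOuter, each should list is an $orOuter, and each inner must pair is an $andInner.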

As a result, we should get the products whose weight matches the selected value and whose screen resolution matches at least one of the two resolutions selected by the user. In my data set, this is only one product.
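The intended per-product semantics (AND across options, OR across the selected values of one option) can be checked in memory with the sketch below. productMatches() is a hypothetical helper and the two products are made-up sample data; note that it compares option and value within the same pair, whereas Elasticsearch with an "object" mapping checks the two terms independently, so the results coincide as long as values do not repeat across options.

```php
<?php
// Hypothetical in-memory model: a product matches when, for every selected
// option, it has at least one of that option's selected values.
function productMatches(array $optionValues, array $filter): bool
{
    foreach ($filter as $option => $selectedValues) {
        $found = false;
        foreach ($optionValues as $pair) {
            if ($pair['productOption'] === $option
                && in_array($pair['value'], (array) $selectedValues, true)) {
                $found = true; // OR across the selected values of one option
                break;
            }
        }
        if (!$found) {
            return false; // AND across options
        }
    }
    return true;
}

// Made-up sample products
$products = [
    'A' => [
        ['productOption' => 'resolution', 'value' => '1600x900'],
        ['productOption' => 'weight', 'value' => '2.7 kg'],
    ],
    'B' => [
        ['productOption' => 'resolution', 'value' => '1980x1020'],
        ['productOption' => 'weight', 'value' => '2.2 kg'],
    ],
];
$filter = ['resolution' => ['1980x1020', '1600x900'], 'weight' => ['2.7 kg']];

$matching = array_keys(array_filter($products, function ($optionValues) use ($filter) {
    return productMatches($optionValues, $filter);
}));
print_r($matching);
```

An empty filter matches every product, which mirrors the controller showing the full findAll() list when nothing is selected.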



It seems that everything is working correctly.

The filtering example above can be developed further. The next steps would be sorting the results by relevance, paginating them, and setting up aggregations (the ElasticSearch replacement for facets). I will write about this later if the Habr community is interested.

upd0:
At readers' request, the filter form handler was rewritten to use the safer Symfony\Component\HttpFoundation\Request object. It can be injected into the action (passed as a parameter) or obtained from the container inside the action via $request = $this->get('request');
