Smart bookmarks based on Elasticsearch

    From time to time I noticed that I could not find an article I had read before.
    It seems simple: the details I remember should be enough to find it again. But no. A Google search often turns up nothing, because I remember only fragments of the content and the results are full of noise.

    The same happens at work. We used to store and share useful links to GitHub projects, articles, and services in Skype, and have since switched to Yammer. Both have drawbacks. With Skype, the main problem is that searching the chat history is difficult. With Yammer, the problem is that it indexes only a snippet rather than the full article text. Neither can categorize links automatically.

    In my free time, I wrote an application specifically tailored for finding articles. Its features:
    • add an article with one button from the browser
    • automatic categorization
    • Russian and English morphology
    • view article text
    • search query operators

    Three feeds are available to registered users: all articles (all), a personal selection (selected), and added articles (stars). A link for editing the personal feed appears in the menu after registration. The same drop-down list, to the right of the search bar, lets you filter by category.

    The main technologies used for development: Ruby on Rails, Sidekiq, Elasticsearch, PostgreSQL.

    To get good search quality, I used the morphology plugin and the readability gem, which extracts the main content from the source page.

    Categories are defined as follows. Articles in the “web development” category contain terms such as html, html5, css, css3, javascript, and js. So to find articles on web development, you run a query made up of these keywords. Elasticsearch has two suitable query types: query string and simple query string. I chose the latter because it never throws an exception; it simply discards the invalid parts of the query.
    Example query for the web development category:
        javascript* jQuery coffeescript
        bootstrap foundation
        backbone* angularjs
        less sass scss
        adaptive responsive
        html* haml DOM
        frontend "front-end"
        "image placeholder"
        mozilla firefox chrome opera
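
    As a rough sketch of how such a keyword list could become a search request, the Python snippet below joins the lines into one string and wraps it in a `simple_query_string` body. The field names `title` and `text` and the helper `category_query` are assumptions made for illustration; the application itself is written in Ruby and its actual field names are not shown here.

```python
import json

# Keyword lines copied from the example above, one group per line.
WEB_DEVELOPMENT_TERMS = [
    'javascript* jQuery coffeescript',
    'bootstrap foundation',
    'backbone* angularjs',
    'less sass scss',
    'adaptive responsive',
    'html* haml DOM',
    'frontend "front-end"',
    '"image placeholder"',
    'mozilla firefox chrome opera',
]


def category_query(term_lines, fields=("title", "text")):
    """Build a search body matching articles that contain any of the terms.

    `fields` is a hypothetical choice; use whatever fields hold the article text.
    """
    # Line breaks carry no meaning for the query, so collapse them to spaces.
    query_text = " ".join(line.strip() for line in term_lines if line.strip())
    return {
        "query": {
            "simple_query_string": {
                "query": query_text,
                "fields": list(fields),
                # "or": a single matching term is enough to match the article.
                "default_operator": "or",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(category_query(WEB_DEVELOPMENT_TERMS), indent=2))
```

    The resulting dictionary can be sent as the JSON body of a search request. Quoted phrases and the `*` prefix operator are passed through unchanged, since `simple_query_string` interprets them itself.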

    This finds the documents that fall into a category. The opposite question is how to find the categories a given document belongs to. Elasticsearch lets you swap the roles of documents and queries: a category is a saved query, and you can ask which saved queries match a given article. This is exactly what the percolator does, and when you add a new article or category, the change takes effect immediately.
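
    Matching a document against stored queries is what Elasticsearch calls percolation. The sketch below shows the three request bodies involved, using the `percolator` field type and the `percolate` query with typeless mappings as in Elasticsearch 7.x and later. Older versions exposed a different percolator API, and which version the application runs on is not stated, so treat this as an illustration only. The field names `name`, `query`, `title`, and `text` and both helper functions are assumptions.

```python
import json

# Mapping for a hypothetical index of categories. The "query" field stores the
# category query; "title" and "text" are mapped so stored queries can refer to them.
CATEGORY_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "query": {"type": "percolator"},
            "title": {"type": "text"},
            "text": {"type": "text"},
        }
    }
}


def category_document(name, keywords):
    """A category is just a document whose "query" field holds a saved query."""
    return {
        "name": name,
        "query": {
            "simple_query_string": {
                "query": keywords,
                "fields": ["title", "text"],
                "default_operator": "or",
            }
        },
    }


def percolate_request(title, text):
    """Search body asking: which saved category queries match this article?"""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"title": title, "text": text},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(CATEGORY_INDEX_MAPPING, indent=2))
    print(json.dumps(category_document("web development", "html* css* javascript*"), indent=2))
    print(json.dumps(percolate_request("Responsive layouts", "A sass and html5 tutorial"), indent=2))
```

    The hits returned for the percolate request are the category documents whose saved queries match the article, which is how an article gets its categories.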

    I thought for a long time about how to make adding new categories simple and convenient. I wanted a convenient query editor, the ability to moderate changes, and a way to see each user's contribution. After considering many options, I settled on a GitHub repository. GitHub lets you fork the repository and edit categories online. An RSpec test checks that the category file is valid; it runs automatically on Travis CI when a pull request is sent.
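
    Neither the format of the category file nor the contents of the RSpec test are shown here, so the following is only a guess at the kind of check such a test might perform. It is written in Python rather than RSpec, and it assumes a category file is plain text with one group of terms per line, as in the web development example. The function `category_file_errors` is hypothetical.

```python
def category_file_errors(text):
    """Return a list of problems found in a category file (empty list means OK).

    Assumed format: plain text, one group of search terms per line.
    """
    errors = []
    lines = [line.strip() for line in text.splitlines()]

    # A category with no terms would never match any article.
    if not any(lines):
        errors.append("file contains no terms")

    for number, line in enumerate(lines, start=1):
        # Phrases are written in double quotes, so quotes must come in pairs.
        if line.count('"') % 2 != 0:
            errors.append(f"line {number}: unbalanced double quote")

    return errors


if __name__ == "__main__":
    sample = 'frontend "front-end"\n"image placeholder\n'
    for problem in category_file_errors(sample):
        print(problem)
```

    Because simple query string silently drops invalid parts of a query, a check like this is useful mainly for catching typos that would otherwise go unnoticed.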
