The Flask Mega-Tutorial, Part 10: Full-Text Search

Original author: Miguel Grinberg
This is the tenth article in a series where I describe my experience writing a Python web application using the Flask microframework.

The goal of this tutorial series is to develop a fairly functional microblogging application that, for a complete lack of originality, I decided to call microblog.



Recap


In the previous article, we enhanced our database queries so that they return posts one page at a time.

Today we will continue to work with our database, but for a different purpose. All applications that store content should be searchable.

For many types of websites it is enough to let Google, Bing, etc. index everything and provide the search results. This works well for sites that are mostly based on static pages, such as a forum. In our small application, the basic unit of content is a short user post, not a whole page, and we want a more dynamic kind of search result. For example, if we search for the word "dog" we want to see all user posts that include this word. Obviously, a search results page does not exist until someone performs the search, so search engines have no way to index it.

Introduction to Full Text Search Systems


Unfortunately, support for full-text search in relational databases is not standardized. Each database implements full-text search in its own way, and SQLAlchemy does not have a suitable abstraction for this case.

We are now using SQLite for our database, so we could just create a full-text index using the capabilities provided by SQLite, bypassing SQLAlchemy. But this is a bad idea, because if one day we decide to switch to another database, we will have to rewrite our full-text search for another database.
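
For illustration only, here is a minimal sketch of what such a SQLite-specific index could look like, using Python's built-in sqlite3 module and SQLite's FTS5 extension. It assumes that the SQLite library bundled with your Python was compiled with FTS5. The table name post_fts and the helper search() are hypothetical and are not part of microblog. Note that both CREATE VIRTUAL TABLE ... USING fts5 and the MATCH operator are SQLite-specific syntax, which is exactly the portability problem described above.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# FTS5 virtual table: SQLite-only syntax (assumes FTS5 is compiled in).
conn.execute("CREATE VIRTUAL TABLE post_fts USING fts5(body)")
conn.executemany(
    "INSERT INTO post_fts (rowid, body) VALUES (?, ?)",
    [(1, "my first post"), (2, "my second post"), (3, "my third and last post")],
)


def search(query):
    """Return the ids of the posts whose body matches the FTS5 query."""
    rows = conn.execute(
        "SELECT rowid FROM post_fts WHERE post_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [rowid for (rowid,) in rows]


print(search("second OR last"))
```

Moving to another database would mean rewriting both the table definition and the query.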

Instead, we are going to leave our database to handle the regular data, and create a specialized database dedicated to search.
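
To make the idea of a separate search store more concrete, here is a toy sketch of an inverted index kept alongside the regular data. This is not how any particular engine is implemented; the names posts, tokenize, build_index and search are made up for this illustration, and a plain dictionary stands in for the SQL database.

```python
import re
from collections import defaultdict

# The "regular" data store: post id -> post body (stands in for the SQL database).
posts = {
    1: "my first post",
    2: "my second post",
    3: "my third and last post",
}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> set of ids of the posts containing it."""
    index = defaultdict(set)
    for post_id, body in documents.items():
        for word in tokenize(body):
            index[word].add(post_id)
    return index


def search(index, word):
    """Return the sorted ids of the posts that contain the given word."""
    return sorted(index.get(word.lower(), set()))


index = build_index(posts)
print(search(index, "post"))
print(search(index, "second"))
```

The search store only holds words and post ids. The posts themselves still live in the regular database and are looked up by id once the matches are known.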

There are several open source full-text search engines. The only one that, to my knowledge, has a Flask extension is Whoosh, an engine that is also written in Python. The advantage of a pure Python engine is that it installs and runs anywhere Python is available. The disadvantage is search performance, which cannot match that of engines written in C or C++. In my opinion, the ideal solution would be a Flask extension that can connect to several engines and abstract away the details, in the same way Flask-SQLAlchemy frees us from the nuances of the various databases, but nothing like that exists yet for full-text search. Django developers do have a very nice extension that supports various full-text search engines, called django-haystack.

For now, we will implement our search with Whoosh. The extension we are going to use is Flask-WhooshAlchemy, which integrates a Whoosh database with Flask-SQLAlchemy models.

If you don't have Flask-WhooshAlchemy in your virtual environment yet, then it's time to install it. Windows users should do this:

flask\Scripts\pip install Flask-WhooshAlchemy


All others can do this:

flask/bin/pip install Flask-WhooshAlchemy


Configuration


The configuration of Flask-WhooshAlchemy is very simple. We just have to tell the extension the name of our full-text search database (file config.py):

WHOOSH_BASE = os.path.join(basedir, 'search.db')


Model changes


Since Flask-WhooshAlchemy integrates with Flask-SQLAlchemy, we need to indicate which data should be indexed in which models (file app/models.py):

from app import app
import flask.ext.whooshalchemy as whooshalchemy

class Post(db.Model):
    __searchable__ = ['body']

    id = db.Column(db.Integer, primary_key = True)
    body = db.Column(db.String(140))
    timestamp = db.Column(db.DateTime)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

    def __repr__(self):
        return '<Post %r>' % (self.body)

whooshalchemy.whoosh_index(app, Post)


The model now has a new attribute, __searchable__, which is a list of all the database fields that should be included in the search index. In our case, we only need to index the body field of our posts.

We also initialize the full-text index for this model by calling the whoosh_index function.
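
To illustrate how a declaration like __searchable__ can be consumed, here is a small standalone sketch. It is not the actual Flask-WhooshAlchemy code. The plain Post class below only mimics the model, and searchable_fields is a hypothetical helper that reads the class attribute and collects the values an indexer would need.

```python
class Post(object):
    # Same declaration style as in the model: the fields to be indexed.
    __searchable__ = ['body']

    def __init__(self, id, body, user_id):
        self.id = id
        self.body = body
        self.user_id = user_id


def searchable_fields(obj):
    """Collect the values of the fields listed in __searchable__ (hypothetical helper)."""
    names = getattr(type(obj), '__searchable__', [])
    return {name: getattr(obj, name) for name in names}


post = Post(id=1, body='my first post', user_id=1)
print(searchable_fields(post))
```

Only the body ends up in the result. Fields such as user_id are ignored because they are not listed in __searchable__.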

Since we did not change the format of our database, we do not need to do a new migration.

Unfortunately, all the posts that were in the database before the addition of the full-text search engine will not be indexed. To make sure that the database and the search engine are synchronized, we must delete all posts from the database and start over. First, run the Python interpreter. For Windows users:

flask\Scripts\python

For everyone else:

flask/bin/python

At the Python prompt, the following statements delete all posts:

>>> from app.models import Post
>>> from app import db
>>> for post in Post.query.all():
...    db.session.delete(post)
...
>>> db.session.commit()


Search


Now we are ready to search. Let's first add some posts to the database. We have two ways to do this. We can run the application and add posts via a web browser as a regular user, or we can do this through an interpreter.

Through the interpreter, we can do this as follows:

>>> from app.models import User, Post
>>> from app import db
>>> import datetime
>>> u = User.query.get(1)
>>> p = Post(body='my first post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my second post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my third and last post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> db.session.commit()

The Flask-WhooshAlchemy extension is very convenient because it connects to Flask-SQLAlchemy automatically. We do not need to maintain the full-text search index ourselves; everything is done transparently for us.

Now we have several posts indexed for full-text search and can try searching:

>>> Post.query.whoosh_search('post').all()
[<Post u'my second post'>, <Post u'my first post'>, <Post u'my third and last post'>]
>>> Post.query.whoosh_search('second').all()
[<Post u'my second post'>]
>>> Post.query.whoosh_search('second OR last').all()
[<Post u'my second post'>, <Post u'my third and last post'>]

As you can see in the examples, queries need not be limited to single words. In fact, Whoosh supports a rich search query language.

Integration of full-text search in our application


To make the search accessible to users of our application, we need to make a few small changes.

Configuration


In the configuration, we must indicate how many search results need to be returned (file config.py):

MAX_SEARCH_RESULTS = 50


Search form


We are going to add a search form to the navigation bar at the top of the page. The location at the top is very good, as the search will be available from all pages.

First we need to add a search form class (file app/forms.py):

class SearchForm(Form):
    search = TextField('search', validators = [Required()])


Then we need to create a search form object and make it available to all templates, since the form goes in the navigation bar, which is common to all pages. The easiest way to achieve this is to create the form in the before_request handler and store it in the global variable g (file app/views.py):

from forms import SearchForm

@app.before_request
def before_request():
    g.user = current_user
    if g.user.is_authenticated():
        g.user.last_seen = datetime.utcnow()
        db.session.add(g.user)
        db.session.commit()
        g.search_form = SearchForm()


Then we will add the form to our template (file app/templates/base.html):

<div>Microblog:
    <a href="{{ url_for('index') }}">Home</a>
    {% if g.user.is_authenticated() %}
    | <a href="{{ url_for('user', nickname = g.user.nickname) }}">Your Profile</a>
    | <form style="display: inline;" action="{{url_for('search')}}" method="post" name="search">{{g.search_form.hidden_tag()}}{{g.search_form.search(size=20)}}<input type="submit" value="Search"></form>
    | <a href="{{ url_for('logout') }}">Logout</a>
    {% endif %}
</div>


Note that we only display the search form when the user is logged in. In the same way, the before_request handler creates the form only when the user is logged in, because our application does not show any content to guests who are not authenticated.

Search view function


The action field of our form was set above to send all search requests to the search view function. This is where we will issue our full-text queries (file app/views.py):

@app.route('/search', methods = ['POST'])
@login_required
def search():
    if not g.search_form.validate_on_submit():
        return redirect(url_for('index'))
    return redirect(url_for('search_results', query = g.search_form.search.data))


This function does not do much. It just collects the search query from the form and redirects to another page, passing the query as an argument. We do not perform the search directly in this function because then the user's browser would warn about re-submitting the form if the user tried to refresh the page. Responding to a POST request with a redirect avoids this situation: when the page is refreshed, the browser reloads the page it was redirected to, not the form submission itself.
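
The redirect-after-POST behaviour can be observed outside of Flask. The following is a minimal sketch that uses only the Python 3 standard library (http.server and urllib) instead of our application, so it is not the microblog code. SearchHandler and post_search are hypothetical names, and the handler answers with status 302, which is the status Flask's redirect() uses by default.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class SearchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the submitted form and extract the "search" field.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        query = form.get("search", [""])[0]
        # Answer the POST with a redirect instead of a page.
        self.send_response(302)
        self.send_header("Location", "/search_results/" + urllib.parse.quote(query))
        self.end_headers()

    def do_GET(self):
        # The results page is served by a plain GET, so it is safe to refresh.
        query = urllib.parse.unquote(self.path.rsplit("/", 1)[-1])
        body = ('Search results for "%s"' % query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def post_search(query):
    """Submit the search form and return (final URL path, page text)."""
    server = HTTPServer(("127.0.0.1", 0), SearchHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/search" % server.server_address[1]
        data = urllib.parse.urlencode({"search": query}).encode("ascii")
        # Ignore proxy settings from the environment; the server is local.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        # The opener follows the 302 and fetches the target with GET.
        with opener.open(url, data=data, timeout=5) as response:
            path = urllib.parse.urlparse(response.geturl()).path
            return path, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(post_search("dog"))
```

The client ends up on the results URL, which it obtained with a GET request. Reloading that URL repeats the GET and never re-sends the form.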

Results Page


After the query string is submitted by the form, the POST handler passes it through a redirect to the search_results handler (file app/views.py):

from config import MAX_SEARCH_RESULTS

@app.route('/search_results/<query>')
@login_required
def search_results(query):
    results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
    return render_template('search_results.html',
        query = query,
        results = results)


The search_results function sends the query to Whoosh, along with a limit on the number of results, to protect against a potentially large result set.
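
The effect of such a limit is easy to see in isolation. The sketch below does not use Whoosh at all; limited is a hypothetical helper, and a generator of made-up strings stands in for a large set of matches. It only shows that capping the results means the remaining matches are never touched.

```python
from itertools import islice

MAX_SEARCH_RESULTS = 50  # same value as in config.py


def limited(matches, limit=MAX_SEARCH_RESULTS):
    """Return at most `limit` items, without consuming the rest of the iterable."""
    return list(islice(matches, limit))


# A stand-in for a very large number of matching posts.
all_matches = ("post %d" % n for n in range(1000))
hits = limited(all_matches)
print(len(hits))
```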

The search results are then rendered by the search_results.html template (file app/templates/search_results.html):

<!-- extend base layout -->
{% extends "base.html" %}

{% block content %}
<h1>Search results for "{{query}}":</h1>
{% for post in results %}
    {% include 'post.html' %}
{% endfor %}
{% endblock %}


And here, once again, we can reuse our post.html template.

Final words


We have just completed another very important, albeit often overlooked, feature that a decent web application should have.

Below I post the updated version of the microblog application with all the changes made in this article.

Download microblog-0.10.zip.

As always, there is no database, you must create it yourself. If you follow this series of articles, you know how to do it. If not, then return to the article on the database to find out.

I hope you enjoyed this tutorial.

Miguel
