The Flask Mega-Tutorial, Part 10: Full-Text Search
This is the tenth article in a series where I describe my experience writing a Python web application using the Flask microframework.
The purpose of this guide is to develop a fairly functional microblogging application, which, for a complete lack of originality, I decided to name microblog.
Table of contents
Part 1: Hello, World!
Part 2: Templates
Part 3: Forms
Part 4: Database
Part 5: User Logins
Part 6: Profile Page and Avatars
Part 7: Unit Testing
Part 8: Followers, Contacts and Friends
Part 9: Pagination
Part 10: Full-Text Search (this article)
Part 11: Email Support
Part 12: Facelift
Part 13: Dates and Times
Part 14: I18n and L10n
Part 15: Ajax
Part 16: Debugging, Testing and Profiling
Part 17: Deployment on Linux (even on the Raspberry Pi!)
Part 18: Deployment on the Heroku Cloud
Recap
In the previous article, we improved our database queries so that they return posts one page at a time.
Today we will continue to work with our database, but for a different purpose. All applications that store content should be searchable.
For many types of websites, you can simply let Google, Bing, etc. index everything and provide the search results. This works well for sites that are based on static pages, such as a forum. In our small application, the basic unit of content is a short user post, not a whole page. We want a more dynamic kind of search result. For example, if we search for the word "dog" we want to see all user posts that include this word. Obviously, the search results page does not exist until someone runs the search, so the search engines will not be able to index it.
Introduction to Full Text Search Systems
Unfortunately, support for full-text search in relational databases is not standardized. Each database implements full-text search in its own way, and SQLAlchemy does not have a suitable abstraction for this case.
We are now using SQLite for our database, so we could just create a full-text index using the capabilities provided by SQLite, bypassing SQLAlchemy. But this is a bad idea, because if one day we decide to switch to another database, we will have to rewrite our full-text search for another database.
Instead, we are going to leave our database to handle regular data, and create a specialized database for search.
There are several open source full-text search engines. Only one of them, as far as I know, has a Flask extension: Whoosh, an engine that is also written in Python. The advantage of a pure Python engine is that it can be installed and run wherever Python is available. The disadvantage is search performance, which cannot compete with engines written in C or C++. In my opinion, the ideal solution would be a Flask extension that can connect to different engines and abstract away the details, the way Flask-SQLAlchemy frees us from the nuances of various databases, but in the field of full-text search there is nothing like it yet. Django developers have a very nice extension that supports various full-text search engines, called django-haystack.
For now, we will implement our search with Whoosh. The extension we are going to use is Flask-WhooshAlchemy, which integrates a Whoosh database with Flask-SQLAlchemy models.
If you don't have Flask-WhooshAlchemy in your virtual environment yet, then it's time to install it. Windows users should do this:
flask\Scripts\pip install Flask-WhooshAlchemy
All others can do this:
flask/bin/pip install Flask-WhooshAlchemy
Configuration
The configuration of Flask-WhooshAlchemy is very simple. We just have to tell the extension the name of our full-text search database (file config.py):
WHOOSH_BASE = os.path.join(basedir, 'search.db')
Model changes
Since Flask-WhooshAlchemy integrates with Flask-SQLAlchemy, we need to specify which data should be indexed in which models (file app/models.py):
from app import app
import flask.ext.whooshalchemy as whooshalchemy

class Post(db.Model):
    # fields of this model that go into the full-text index
    __searchable__ = ['body']

    id = db.Column(db.Integer, primary_key = True)
    body = db.Column(db.String(140))
    timestamp = db.Column(db.DateTime)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

    def __repr__(self):
        return '<Post %r>' % (self.body)

# create the full-text index for the Post model
whooshalchemy.whoosh_index(app, Post)
The model now has a new field, __searchable__, which is a list of all the database fields that should be included in the index. In our case, we only need to index the body field of our posts. We also initialize the full-text index for this model by calling the whoosh_index function.
Since we did not change the format of our database, we do not need to generate a new migration.
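To get a feel for what a full-text index stores, here is a toy inverted index in plain Python. It is only a sketch of the general idea, not how Whoosh or Flask-WhooshAlchemy work internally. ToyPost, tokenize and build_index are hypothetical names invented for this illustration, and the searchable attribute merely plays the role of __searchable__.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToyPost:
    """Hypothetical stand-in for the Post model (not the SQLAlchemy class)."""
    id: int
    body: str
    # Class attribute that plays the role of __searchable__.
    searchable = ['body']


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(posts):
    """Map every word to the set of ids of the posts that contain it."""
    index = defaultdict(set)
    for post in posts:
        for field in post.searchable:
            for word in tokenize(getattr(post, field)):
                index[word].add(post.id)
    return index


posts = [
    ToyPost(1, 'my first post'),
    ToyPost(2, 'my second post'),
    ToyPost(3, 'my third and last post'),
]
index = build_index(posts)

print(sorted(index['post']))    # [1, 2, 3]
print(sorted(index['second']))  # [2]
```

Looking up a word is a single dictionary access, no matter how many posts there are. That is the main reason to keep a separate index instead of scanning every post body on each search.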
Unfortunately, all the posts that were in the database before the addition of the full-text search engine will not be indexed. To make sure that the database and the search engine are synchronized, we must delete all posts from the database and start over. First, run the Python interpreter. For Windows users:
flask\Scripts\python
For everyone else:
flask/bin/python
With this loop we delete all posts:
>>> from app.models import Post
>>> from app import db
>>> for post in Post.query.all():
...     db.session.delete(post)
...
>>> db.session.commit()
Search
Now we are ready to search. Let's first add some posts to the database. We have two ways to do this. We can run the application and add posts via a web browser as a regular user, or we can do this through an interpreter.
Through the interpreter, we can do this as follows:
>>> from app.models import User, Post
>>> from app import db
>>> import datetime
>>> u = User.query.get(1)
>>> p = Post(body='my first post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my second post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> p = Post(body='my third and last post', timestamp=datetime.datetime.utcnow(), author=u)
>>> db.session.add(p)
>>> db.session.commit()
The Flask-WhooshAlchemy extension is very nice because it hooks into Flask-SQLAlchemy automatically. We do not need to maintain the full-text search index ourselves; everything is done transparently for us.
Now we have several posts indexed for full-text search and can try searching:
>>> Post.query.whoosh_search('post').all()
[<Post u'my second post'>, <Post u'my first post'>, <Post u'my third and last post'>]
>>> Post.query.whoosh_search('second').all()
[<Post u'my second post'>]
>>> Post.query.whoosh_search('second OR last').all()
[<Post u'my second post'>, <Post u'my third and last post'>]
As you can see in the examples, queries need not be limited to single words. In fact, Whoosh supports an excellent search query language.
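The queries above can be imitated with a few lines of plain Python. This is a rough sketch of the matching rules only, under the assumption that words without an operator must all be present while OR separates alternatives. toy_search is a hypothetical helper, it covers a tiny fraction of the real query language, and it returns posts in their original order rather than ranked by relevance as Whoosh does.

```python
import re

POSTS = ['my first post', 'my second post', 'my third and last post']


def words(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"\w+", text.lower()))


def toy_search(query, posts):
    """Return the posts that match the query.

    Parts separated by the keyword OR are alternatives. Within one
    alternative, every word must appear in the post (implicit AND).
    """
    alternatives = [words(part) for part in re.split(r"\s+OR\s+", query)]
    return [
        body for body in posts
        if any(terms and terms <= words(body) for terms in alternatives)
    ]


print(toy_search('post', POSTS))            # all three posts
print(toy_search('second', POSTS))          # ['my second post']
print(toy_search('second OR last', POSTS))  # the second and third posts
print(toy_search('first last', POSTS))      # [] since no post has both words
```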
Integration of full-text search in our application
To make the search accessible to users of our application, we need to make a few small changes.
Configuration
In the configuration, we must indicate the maximum number of search results to return (file config.py):
MAX_SEARCH_RESULTS = 50
Search form
We are going to add a search form to the navigation bar at the top of the page. The location at the top is very good, as the search will be available from all pages.
First we need to add a search form class (file app/forms.py):
class SearchForm(Form):
    search = TextField('search', validators = [Required()])
Then we need to create a search form object and make it available to all templates, since the form goes in the navigation bar, which is common to all pages. An easy way to achieve this is to create the form in the before_request handler and store it in the global variable g (file app/views.py):
from forms import SearchForm

@app.before_request
def before_request():
    g.user = current_user
    if g.user.is_authenticated():
        g.user.last_seen = datetime.utcnow()
        db.session.add(g.user)
        db.session.commit()
        # the search form is only needed for logged-in users
        g.search_form = SearchForm()
Then we will add the form to our template (file app/templates/base.html):
<div>Microblog:
    <a href="{{ url_for('index') }}">Home</a>
    {% if g.user.is_authenticated() %}
    | <a href="{{ url_for('user', nickname = g.user.nickname) }}">Your Profile</a>
    | <form style="display: inline;" action="{{url_for('search')}}" method="post" name="search">{{g.search_form.hidden_tag()}}{{g.search_form.search(size=20)}}<input type="submit" value="Search"></form>
    | <a href="{{ url_for('logout') }}">Logout</a>
    {% endif %}
</div>
Please note that we only display the search form when the user is logged in. In the same way, the before_request handler creates the form only when the user is logged in, because our application does not show any content to unauthenticated guests.
Search view function
The action field of our form was set above to send all search requests to the search view function. This is where we will handle the submitted search query (file app/views.py):
@app.route('/search', methods = ['POST'])
@login_required
def search():
    if not g.search_form.validate_on_submit():
        return redirect(url_for('index'))
    return redirect(url_for('search_results', query = g.search_form.search.data))
This function does not do much: it just collects the search query from the form and redirects to another page, passing the query as an argument. We do not run the search directly in this function so that the user's browser does not warn about re-submitting the form if the user tries to refresh the page. This situation is avoided by responding to a POST request with a redirect: when the page is refreshed, the browser reloads the page it was redirected to, not the original POST request.
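The redirect-after-POST idea is independent of Flask. Below is a framework-free sketch of it as a bare WSGI application, using only the standard library. The app and call names are hypothetical, and call merely simulates what a WSGI server and a browser would do. The sketch answers with 303 See Other, whereas Flask's redirect uses 302 by default, which browsers handle the same way in this situation.

```python
import io
from urllib.parse import parse_qs, quote, unquote


def app(environ, start_response):
    """POST /search only redirects; GET /search_results/<query> renders."""
    method = environ['REQUEST_METHOD']
    path = environ.get('PATH_INFO', '')

    if method == 'POST' and path == '/search':
        size = int(environ.get('CONTENT_LENGTH') or 0)
        form = parse_qs(environ['wsgi.input'].read(size).decode('utf-8'))
        query = form.get('search', [''])[0].strip()
        # An empty query goes back to the index page, like a failed validation.
        target = '/search_results/' + quote(query, safe='') if query else '/'
        start_response('303 See Other', [('Location', target)])
        return [b'']

    if method == 'GET' and path.startswith('/search_results/'):
        # WSGI servers pass PATH_INFO already percent-decoded.
        query = path[len('/search_results/'):]
        body = ('Search results for "%s"' % query).encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
        return [body]

    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


def call(method, path, body=b''):
    """Invoke the app as a server would; return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {
        'REQUEST_METHOD': method,
        'PATH_INFO': unquote(path),  # what a real server does before calling the app
        'CONTENT_LENGTH': str(len(body)),
        'wsgi.input': io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured['status'], captured['headers'], b''.join(chunks)


# The browser submits the form, then follows the redirect with a GET.
status, headers, _ = call('POST', '/search', b'search=second+OR+last')
print(status, headers['Location'])
status, _, page = call('GET', headers['Location'])
print(status, page.decode('utf-8'))
```

After the redirect, the page the browser is showing came from a plain GET, so refreshing it repeats the GET and never re-sends the form.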
Results Page
After the form submits the query string, the POST handler passes it through a redirect to the search_results handler (file app/views.py):
from config import MAX_SEARCH_RESULTS

@app.route('/search_results/<query>')
@login_required
def search_results(query):
    results = Post.query.whoosh_search(query, MAX_SEARCH_RESULTS).all()
    return render_template('search_results.html',
        query = query,
        results = results)
The search_results function sends the query to Whoosh, passing along a limit on the number of results to protect against a potentially large result set. The results are then rendered by the search_results template (file app/templates/search_results.html):
<!-- extend base layout -->
{% extends "base.html" %}

{% block content %}
<h1>Search results for "{{query}}":</h1>
{% for post in results %}
    {% include 'post.html' %}
{% endfor %}
{% endblock %}
And here, once again, we can reuse our post.html sub-template.
Final words
We have just completed another very important, albeit often overlooked, feature that a decent web application should have.
Below I post the updated version of the microblog application with all the changes made in this article.
Download microblog-0.10.zip.
As always, the database is not included; you must create it yourself. If you have been following this series of articles, you know how to do it. If not, go back to the article on the database to find out.
I hope you enjoyed this tutorial.
Miguel