
Modern Tornado Part 2: Blocking Operations

Application configuration
The Application constructor accepts configuration parameters as keyword arguments. We have already seen this when we passed debug=True as the second argument to the Application constructor. However, such settings should not be hardcoded: otherwise, how would you run the script in production, where this parameter should obviously be False? The standard technique in Django and other Python frameworks is to keep the general configuration in a settings.py file and, at the end of it, import settings_local.py, which overrides the settings specific to the current environment. You can certainly use this trick, but Tornado can also change specific settings through command-line options. Let's see how this is implemented:
from tornado.options import define, options
define('port', default=8000, help='run on the given port', type=int)
define('db_uri', default='localhost', help='mongodb uri')
define('db_name', default='habr_tornado', help='name of database')
define('debug', default=True, help='debug mode', type=bool)
options.parse_command_line()
db = motor.MotorClient(options.db_uri)[options.db_name]
With define, we declare parameters in an optparse-like syntax, and then read them wherever we need them through options. Calling options.parse_command_line() overwrites the default parameter values with the values from the command line. So in production it is now enough to run the application with --debug=False. Launching it with --help shows all the available parameters:
$ python3 app.py --help
Usage: app.py [OPTIONS]
Options:
--db_name name of database (default habr_tornado)
--db_uri mongodb uri (default localhost)
--debug debug mode (default True)
--help show this help information
--port run on the given port (default 8000)
/home/imbolc/.pyenv/versions/3.4.0/lib/python3.4/site-packages/tornado/log.py options:
--log_file_max_size max size of log files before rollover
(default 100000000)
--log_file_num_backups number of log files to keep (default 10)
--log_file_prefix=PATH Path prefix for log files. Note that if you
are running multiple tornado processes,
log_file_prefix must be different for each
of them (e.g. include the port number)
--log_to_stderr Send log output to stderr (colorized if
possible). By default use stderr if
--log_file_prefix is not set and no other
logging is configured.
--logging=debug|info|warning|error|none
Set the Python log level. If 'none', tornado
won't touch the logging configuration.
(default info)
As you can see, Tornado automatically added its logging options.
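To try the option mechanism without touching a real command line, the sketch below uses a private OptionParser instance instead of the global options object and passes the arguments explicitly. The argument list is made up for illustration; the first element only stands in for the script name.

```python
from tornado.options import OptionParser

# A private parser keeps this sketch independent of the global `options`.
parser = OptionParser()
parser.define('port', default=8000, help='run on the given port', type=int)
parser.define('debug', default=True, help='debug mode', type=bool)

# Hypothetical command line; the first element is skipped as the script name.
parser.parse_command_line(['app.py', '--port=9000', '--debug=False'])

# Values are converted to the declared types (int and bool here).
print(parser.port, parser.debug)
```

Options that are not given on the command line keep the defaults passed to define.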
CSRF
Now add xsrf_cookies=True to the application settings. If you try to upload a new image, you will see the error: HTTP 403: Forbidden ('_xsrf' argument missing from POST). This is the CSRF protection at work. To restore the application, it is enough to add {% module xsrf_form_html() %} to the upload form. In the HTML of the rendered page it turns into a hidden input field named _xsrf that carries the token.
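The idea behind that 403 can be sketched in a few lines: the server hands the browser a random token in a cookie, and a POST is accepted only if the form repeats the same value. This is a simplified illustration of the approach, not Tornado's actual code, and the function names are hypothetical.

```python
import hmac
import secrets


def issue_token():
    """Create a random token; the server would set it as a cookie."""
    return secrets.token_hex(16)


def is_request_allowed(cookie_token, form_token):
    """Accept a POST only if the form field repeats the cookie value."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(cookie_token, form_token)


token = issue_token()
print(is_request_allowed(token, token))  # form rendered by our own page
print(is_request_allowed(token, None))   # form without the hidden field
```

A page on another site can make the browser send the cookie, but it cannot read the cookie's value, so it cannot fill in the matching form field.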
Image Thumbnails
When displaying thumbnails in the list of recent images, we used the full images for simplicity. It is time to fix that. We will need Pillow (a modern fork of PIL, the well-known image processing library):
pip3 install pillow
However, Tornado is single-threaded, and a resource-intensive operation such as image processing would block the event loop and cancel out all our efforts with asynchrony. The simplest solution is to move this task to a separate thread:
import os
import io
from concurrent.futures import ThreadPoolExecutor

from PIL import Image
from tornado.concurrent import run_on_executor


class UploadHandler(web.RequestHandler):
    # One worker thread per CPU core; extra uploads wait in the queue.
    executor = ThreadPoolExecutor(max_workers=os.cpu_count())

    @gen.coroutine
    def post(self):
        file = self.request.files['file'][0]
        try:
            # Runs in a worker thread, so the IOLoop is not blocked.
            thumbnail = yield self.make_thumbnail(file.body)
        except OSError:
            raise web.HTTPError(400, 'Cannot identify image file')
        # Store the original and the thumbnail in parallel.
        orig_id, thumb_id = yield [
            gridfs.put(file.body, content_type=file.content_type),
            gridfs.put(thumbnail, content_type='image/png')]
        yield db.imgs.save({'orig': orig_id, 'thumb': thumb_id})
        self.redirect('')

    @run_on_executor
    def make_thumbnail(self, content):
        im = Image.open(io.BytesIO(content))
        # convert() returns a new image rather than modifying in place.
        im = im.convert('RGB')
        # Image.LANCZOS is the current name of the ANTIALIAS filter.
        im.thumbnail((128, 128), Image.LANCZOS)
        with io.BytesIO() as output:
            im.save(output, 'PNG')
            return output.getvalue()
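Before walking through the handler, it may help to see the thread-pool pattern in isolation. The sketch below is a stand-in, not the handler's own mechanism: it uses the standard library's loop.run_in_executor instead of Tornado's run_on_executor decorator, and time.sleep plays the role of the blocking image work. The heartbeat coroutine is a hypothetical helper that shows the event loop staying responsive in the meantime.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def slow_task(n):
    """Stand-in for blocking work such as resizing an image."""
    time.sleep(0.2)
    return n * n


async def heartbeat(ticks):
    """Record a tick every time the event loop gets a chance to run."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.ensure_future(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    # The blocking call runs in a worker thread; the loop keeps ticking.
    result = await loop.run_in_executor(executor, slow_task, 7)
    beat.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
executor.shutdown()
print(result, tick_count)
```

If slow_task were called directly inside main, the heartbeat would record no ticks until the call returned, which is the blocking behaviour the thread pool is meant to avoid.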
In UploadHandler, we first create a thread pool whose size is limited to the number of CPU cores (a reasonable choice for processor-intensive tasks such as image processing). If more images are uploaded at the same time, the rest wait in the queue. Then we create the thumbnail asynchronously by calling our make_thumbnail method, wrapped in the run_on_executor decorator, which makes the task run in one of the executor's threads. Notice how neatly we catch the OSError exception that Pillow raises when it cannot recognize the image format. We do not need to pass the error explicitly in the result, as is done with callback-style asynchrony (for example, in Node.js). We simply work with exceptions in a synchronous style.
Next, we save the original image and the thumbnail in GridFS. Note that instead of making the calls in sequence:
orig_id = yield gridfs.put(file.body, content_type=file.content_type)
thumb_id = yield gridfs.put(thumbnail, content_type='image/png')
we use the parallel form:
orig_id, thumb_id = yield [ ... ]
That is, both files are saved at the same time. Such a parallel call of coroutines makes sense for any operations that do not depend on each other. For example, we could combine creating the thumbnail with saving the original, but we cannot combine creating and saving the thumbnail, since the second operation depends on the result of the first. Finally, we save the image information to the imgs collection. This collection links the thumbnail to the original image. In the future it can also store any other information about the image: the author, access rights, and so on. With this collection in place, the methods that display the list and an individual image change accordingly:
class UploadHandler(web.RequestHandler):
    ...

    @gen.coroutine
    def get(self):
        imgs = yield db.imgs.find().sort('_id', -1).to_list(20)
        self.render('upload.html', imgs=imgs)


class ShowImageHandler(web.RequestHandler):
    @gen.coroutine
    def get(self, img_id, size):
        try:
            img_id = bson.objectid.ObjectId(img_id)
        except bson.errors.InvalidId:
            raise web.HTTPError(404, 'Bad ObjectId')
        img = yield db.imgs.find_one(img_id)
        if not img:
            raise web.HTTPError(404, 'Image not found')
        gridout = yield gridfs.get(img[size])
        self.set_header('Content-Type', gridout.content_type)
        self.set_header('Content-Length', gridout.length)
        yield gridout.stream_to_handler(self)
ShowImageHandler.get now receives an additional parameter, size, which specifies whether we want the thumbnail or the original image. The URL pattern has changed accordingly:
web.url(r'/imgs/([\w\d]+)/(orig|thumb)', ShowImageHandler,
        name='show_image'),
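To see what this pattern accepts and how its two capture groups become the img_id and size arguments, it can be tried with the standard re module. This is only a sketch: the trailing '$' is added on the assumption that the whole path has to match, parse_image_path is a hypothetical helper, and the id in the sample paths is made up.

```python
import re

# The route pattern from the article, anchored at the end (an assumption).
ROUTE = re.compile(r'/imgs/([\w\d]+)/(orig|thumb)$')


def parse_image_path(path):
    """Return (img_id, size) for a matching path, or None."""
    match = ROUTE.match(path)
    return match.groups() if match else None


print(parse_image_path('/imgs/53a1f0c2e4b0/thumb'))
print(parse_image_path('/imgs/53a1f0c2e4b0/large'))
```

Because the second group only allows orig or thumb, any other size never reaches the handler. Note also that \w already covers digits, so [\w\d] behaves the same as \w.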
And these URLs are reconstructed in the template:
{% for img in imgs %}
{% end %}
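The parallel yield used in UploadHandler.post can also be tried on its own. In the sketch below, fake_put is a hypothetical stand-in for gridfs.put that just waits and returns an id, and the delay is arbitrary. The elapsed time is measured to show that the two waits overlap instead of adding up.

```python
import asyncio
import time

from tornado import gen


@gen.coroutine
def fake_put(name, delay=0.2):
    """Hypothetical stand-in for gridfs.put: waits, then returns an id."""
    yield gen.sleep(delay)
    return name + '-id'


@gen.coroutine
def save_both():
    started = time.monotonic()
    # Both coroutines start when called; yielding the list waits for all.
    orig_id, thumb_id = yield [fake_put('orig'), fake_put('thumb')]
    return orig_id, thumb_id, time.monotonic() - started


async def main():
    return await save_both()


orig_id, thumb_id, elapsed = asyncio.run(main())
print(orig_id, thumb_id, round(elapsed, 1))
```

With two sequential yields, the same function would take roughly twice as long, since the second wait would start only after the first one finished.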
Conclusion
That's it for today. The code for this and the previous part is available on GitHub.