Imbolc July 27, 2014 at 15:43

Modern Tornado Part 2: Blocking Operations

We continue improving our distributed image hosting. In this part, we will configure the application and enable CSRF protection. Then, using image thumbnail creation as an example, we will learn how to work with blocking tasks, run coroutines in parallel, and handle the exceptions raised in them.

Application configuration

The Application constructor accepts configuration parameters as keyword arguments. We have already seen this when we passed debug=True as the second parameter to the Application constructor. However, you should not hardcode such settings: how would you then run the script in production, where this parameter should obviously be False? The standard technique in Django and other Python frameworks is to keep the general configuration in a settings.py file that ends by importing settings_local.py, which overrides the settings specific to the current environment. You can certainly use this trick here too, but Tornado also lets you change individual settings with command-line options. Let's see how this is done:

import motor
from tornado.options import define, options

# Declare the options together with their defaults.
define('port', default=8000, help='run on the given port', type=int)
define('db_uri', default='localhost', help='mongodb uri')
define('db_name', default='habr_tornado', help='name of database')
define('debug', default=True, help='debug mode', type=bool)

# Override the defaults with values given on the command line.
options.parse_command_line()

db = motor.MotorClient(options.db_uri)[options.db_name]

With define, we declare parameters using optparse-like syntax, and then read them wherever they are needed through options. Calling options.parse_command_line() overwrites the default values with data from the command line. So in production it is now enough to run the application with --debug=False. Running it with --help shows all available parameters:

$ python3 app.py --help
Usage: app.py [OPTIONS] 
Options: 
  --db_name                        name of database (default habr_tornado) 
  --db_uri                         mongodb uri (default localhost) 
  --debug                          debug mode (default True) 
  --help                           show this help information 
  --port                           run on the given port (default 8000) 
/home/imbolc/.pyenv/versions/3.4.0/lib/python3.4/site-packages/tornado/log.py options: 
  --log_file_max_size              max size of log files before rollover 
                                   (default 100000000) 
  --log_file_num_backups           number of log files to keep (default 10) 
  --log_file_prefix=PATH           Path prefix for log files. Note that if you 
                                   are running multiple tornado processes, 
                                   log_file_prefix must be different for each 
                                   of them (e.g. include the port number) 
  --log_to_stderr                  Send log output to stderr (colorized if 
                                   possible). By default use stderr if 
                                   --log_file_prefix is not set and no other 
                                   logging is configured. 
  --logging=debug|info|warning|error|none 
                                   Set the Python log level. If 'none', tornado 
                                   won't touch the logging configuration. 
                                   (default info)

As you can see, Tornado automatically added logging options.

CSRF

Now add xsrf_cookies=True to the application settings. If we try to upload a new image, we will see the error: HTTP 403: Forbidden ('_xsrf' argument missing from POST). This is the CSRF protection at work. To get the application working again, it is enough to add {% module xsrf_form_html() %} to the upload form. In the page's HTML it turns into a hidden input field carrying the token, something like: <input type="hidden" name="_xsrf" value="..."/>.
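The idea behind this check can be sketched with the standard library alone. This is not Tornado's implementation (which also versions and masks the token), only the underlying "the form field must match the cookie" rule; the names issue_token and check_xsrf are hypothetical.

```python
import hmac
import secrets


def issue_token() -> str:
    """Create a random token; the server sets it as a cookie and embeds it in the form."""
    return secrets.token_hex(16)


def check_xsrf(cookie_token, form_token) -> bool:
    """Accept a POST only if the form field is present and matches the cookie."""
    if not cookie_token or not form_token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(cookie_token.encode(), form_token.encode())


token = issue_token()
print(check_xsrf(token, token))  # form rendered by our own page
print(check_xsrf(token, None))   # forged POST without the hidden field
```

A page on another site can make the browser send the cookie, but it cannot read the cookie's value, so it cannot fill in a matching form field.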

Image Thumbnails

When displaying thumbnails in the list of recent images, we used the full-size images for simplicity. It is time to fix that. We will need Pillow (a modern fork of PIL, the well-known image-processing library):

pip3 install pillow

However, Tornado is single-threaded, and a resource-intensive operation such as image processing would negate all our efforts with asynchrony. The simplest solution is to move this task to a separate thread:

import io
import os
from concurrent.futures import ThreadPoolExecutor

from PIL import Image
from tornado import gen, web
from tornado.concurrent import run_on_executor


class UploadHandler(web.RequestHandler):
    # One worker per CPU core: thumbnailing is CPU-bound.
    executor = ThreadPoolExecutor(max_workers=os.cpu_count())

    @gen.coroutine
    def post(self):
        file = self.request.files['file'][0]
        try:
            thumbnail = yield self.make_thumbnail(file.body)
        except OSError:
            raise web.HTTPError(400, 'Cannot identify image file')
        # Store the original and the thumbnail in parallel.
        orig_id, thumb_id = yield [
            gridfs.put(file.body, content_type=file.content_type),
            gridfs.put(thumbnail, content_type='image/png')]
        yield db.imgs.save({'orig': orig_id, 'thumb': thumb_id})
        self.redirect('')

    @run_on_executor
    def make_thumbnail(self, content):
        im = Image.open(io.BytesIO(content))
        # convert() returns a new image, so keep the result.
        im = im.convert('RGB')
        # Called Image.ANTIALIAS in older Pillow releases; same filter.
        im.thumbnail((128, 128), Image.LANCZOS)
        with io.BytesIO() as output:
            im.save(output, 'PNG')
            return output.getvalue()

First, we create a pool of workers limited to the number of CPU cores (this is optimal for processor-intensive tasks such as image processing). If more images are uploaded at the same time, the rest will wait in the queue. Then we create the thumbnail asynchronously by calling our make_thumbnail method, wrapped in the run_on_executor decorator, which makes the task run in one of the executor's threads.
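To see why moving blocking work to a thread matters, here is a minimal sketch that uses only the standard library: asyncio's run_in_executor stands in for Tornado's run_on_executor, and slow_resize is a hypothetical placeholder for the Pillow call. While the worker thread is busy, a second coroutine keeps running on the event loop.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def slow_resize(data: bytes) -> bytes:
    """Hypothetical stand-in for a blocking call such as thumbnail generation."""
    time.sleep(0.2)  # blocks only the worker thread, not the event loop
    return data[:4]


async def heartbeat(ticks: list) -> None:
    """Appends a timestamp every 10 ms for as long as the loop is free to run it."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.ensure_future(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    # The coroutine is suspended here; the loop keeps serving heartbeat().
    result = await loop.run_in_executor(executor, slow_resize, b'image-bytes')
    beat.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)
```

If slow_resize were called directly inside main(), the heartbeat would not tick at all until it returned, which is exactly what would happen to other requests in a single-threaded server.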

Notice how neatly we catch the OSError that Pillow raises when it cannot recognize the image format. We do not need to pass the error explicitly through the response, as is done with callback-based asynchrony (for example, in node.js). We simply work with exceptions in a synchronous style.
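The same behaviour can be reproduced without Tornado or Pillow. In this sketch, which again assumes asyncio's run_in_executor as a stand-in, the hypothetical parse_image function raises OSError inside the worker thread, and the exception surfaces at the point where the coroutine waits for the result.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def parse_image(data: bytes) -> str:
    """Hypothetical blocking parser: rejects anything without a PNG signature."""
    if not data.startswith(b'\x89PNG'):
        raise OSError('cannot identify image file')
    return 'png'


async def handle_upload(data: bytes, executor) -> int:
    """Return an HTTP-like status code for the upload."""
    loop = asyncio.get_running_loop()
    try:
        await loop.run_in_executor(executor, parse_image, data)
    except OSError:
        # Raised in the worker thread, re-raised here at the await point.
        return 400
    return 200


async def main():
    with ThreadPoolExecutor(max_workers=1) as executor:
        return [await handle_upload(b'\x89PNG\r\n', executor),
                await handle_upload(b'not an image', executor)]


statuses = asyncio.run(main())
print(statuses)  # [200, 400]
```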

Next, we save the original image and the thumbnail in GridFS. Note that instead of calling the two operations in sequence:

orig_id = yield gridfs.put(file.body, content_type=file.content_type) 
thumb_id = yield gridfs.put(thumbnail, content_type='image/png')

we use the parallel form orig_id, thumb_id = yield [ ... ], so both files are saved at the same time. Running coroutines in parallel like this makes sense for any operations that do not depend on each other. For example, we could run thumbnail creation in parallel with saving the original, but we cannot run the creation and the saving of the thumbnail in parallel, since the second operation depends on the result of the first.
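The difference between the two forms is easy to observe in a small standard-library sketch. Here asyncio.gather plays the role of yield [ ... ], and store is a hypothetical stand-in for gridfs.put that records when each write starts and ends.

```python
import asyncio


async def store(name: str, log: list) -> str:
    """Hypothetical stand-in for an asynchronous write such as gridfs.put."""
    log.append('start ' + name)
    await asyncio.sleep(0.05)
    log.append('end ' + name)
    return name + '-id'


async def sequential(log):
    # The second write does not begin until the first has finished.
    orig_id = await store('orig', log)
    thumb_id = await store('thumb', log)
    return orig_id, thumb_id


async def parallel(log):
    # Both writes are started before either one is waited on.
    return tuple(await asyncio.gather(store('orig', log), store('thumb', log)))


seq_log, par_log = [], []
seq_ids = asyncio.run(sequential(seq_log))
par_ids = asyncio.run(parallel(par_log))
print(seq_log)
print(par_log)
```

Both variants return the ids in the same order, but in the parallel log the two "start" entries come first, so the total wait is roughly that of the slower write rather than the sum of both.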

Finally, we save the image information to the imgs collection. This collection links the thumbnail to the original image. In the future it can also hold any other information about the image: the author, access rights, etc. With this collection in place, the handlers for displaying the list and an individual image change accordingly:

import bson
from tornado import gen, web


class UploadHandler(web.RequestHandler):
    ...

    @gen.coroutine
    def get(self):
        # Show the 20 most recent images, newest first.
        imgs = yield db.imgs.find().sort('_id', -1).to_list(20)
        self.render('upload.html', imgs=imgs)


class ShowImageHandler(web.RequestHandler):
    @gen.coroutine
    def get(self, img_id, size):
        try:
            img_id = bson.objectid.ObjectId(img_id)
        except bson.errors.InvalidId:
            raise web.HTTPError(404, 'Bad ObjectId')
        img = yield db.imgs.find_one(img_id)
        if not img:
            raise web.HTTPError(404, 'Image not found')
        # size is 'orig' or 'thumb'; each key holds a GridFS file id.
        gridout = yield gridfs.get(img[size])
        self.set_header('Content-Type', gridout.content_type)
        self.set_header('Content-Length', gridout.length)
        yield gridout.stream_to_handler(self)

As you can see, ShowImageHandler.get now receives an additional size parameter, which specifies whether we want the thumbnail or the original. The URL pattern has changed accordingly:

web.url(r'/imgs/([\w\d]+)/(orig|thumb)', ShowImageHandler, 
        name='show_image'),
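To check which paths this pattern accepts, it can be tried out with the re module. This sketch assumes the route has to match the whole path, hence fullmatch; the id values are made up, and parse_image_url is a hypothetical helper.

```python
import re

# Same pattern as in the route above.
ROUTE = re.compile(r'/imgs/([\w\d]+)/(orig|thumb)')


def parse_image_url(path: str):
    """Return (img_id, size) for a matching path, or None."""
    match = ROUTE.fullmatch(path)
    return match.groups() if match else None


print(parse_image_url('/imgs/53d4f1c2a1b2c3d4e5f60718/thumb'))
print(parse_image_url('/imgs/53d4f1c2a1b2c3d4e5f60718/large'))  # unknown size
print(parse_image_url('/imgs/not-an-id/orig'))                  # '-' is not in \w
print(parse_image_url('/imgs/zzz/orig'))
```

The last call still matches: the pattern only restricts the character set, so a string that is not a valid ObjectId reaches the handler and is rejected there with the 404 'Bad ObjectId' error.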

And this is how these URLs are reversed in the template:

{% for img in imgs %} 
     
{% end %}

Conclusion

That's it for today. The code for this and the previous part is available on GitHub.
