
Comparing the Effectiveness of Python Web Application Launch Methods
Python has recently been gaining popularity in web development, but its mass adoption is held back by the problem of launching applications in this language efficiently: so far that has mostly remained the preserve of dedicated or virtual servers. Modular languages, unlike PHP with its monolithic core functionality, must load at least a runtime library on every request, plus a few dozen more modules pulled in by the user's code. A classic mod_php-style approach is therefore a poor fit for Python and Perl, and keeping the application permanently in memory used to be expensive. But time moves on, hardware has become more powerful and cheaper, and for quite a while now one can seriously talk about permanently running application processes as part of mass hosting.
What is it about
From time to time, various suggestions on how to run a Python application appear on the net. For example, the hosting provider Gino recently patched mod_python in its own peculiar way and offered hosting built on it. Following them, another provider, Locum, rejected mod_python outright over its security (apparently genuine security is the one IT problem standing between us and nirvana) and ran a triumphant benchmark of mod_wsgi against FastCGI. The community, judging by my searching, is torn between mod_python and FastCGI; and FastCGI usually means the one bundled with Django, namely flup. As a popular host of Python applications ourselves, we could not pass this by and decided to contribute to the holy war.
I sincerely believe that any technology should strike the right balance between cleanliness of implementation, performance, usability, and versatility, and each of the solutions below is judged from that standpoint. I approached the choice subjectively, settling on the Apache web server as a universal process manager that everyone understands. From www.wsgi.org/wsgi/Servers I selected flup ( trac.saddi.com/flup ), python-fastcgi ( pypi.python.org/pypi/python-fastcgi ), and mod_wsgi ( www.modwsgi.org ). I also took mod_python ( www.modpython.org ) as the most popular way of running Python with the average hoster.
Closer to the body

I tried to create equally ideal conditions for all the options: no restarts after a set number of requests, everything configured in the ordinary, simple way. In essence, this benchmarks the efficiency of the Apache -> publisher -> application path. Many such tests for some reason also measure interpreter performance, but I found it hard to explain to myself why one would compare an interpreter against itself, and with different interpreters, how to formalize which functionality should be tested and why. I want to stress that all of these tests give only a comparative assessment of performance, so no special tuning or optimization was done. And to head off unnecessary ranting about PHP, mod_php is included in the test as well.
For all daemonized processes the same conditions were chosen: 2 pre-started processes with 5 threads each, with one special case for flup described below. Each application was tested with the ab utility: 100,000 (one hundred thousand) requests, 10 concurrent, plus an additional mod_python run of 10,000 (ten thousand) requests. The tests were run in sequence against Apache with 5, 30, and 100 pre-started processes (MPM prefork) to reveal trends.
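Each run boils down to an ab invocation along these lines (the URL and mount path are illustrative assumptions; each handler was actually mounted on its own path):

```shell
# 100,000 requests, 10 concurrent, against the application under test
ab -n 100000 -c 10 http://localhost/test/
```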
Experimental
Dual-processor Xeon E5335 2.00GHz, 4 GB RAM, SCSI-3 hard drives. Software: FreeBSD 7.2-RELEASE amd64, Apache 2.2.11, PHP 5.2.10, Python 2.5.4, mod_wsgi 3.0, mod_python 3.3.1, mod_fastcgi 2.4.6, python-fastcgi 1.1 with the fcgi devkit 2.4.0, flup 1.0.2. All tests were run locally on the server; the load average never exceeded 1.



flup
flup is a WSGI server with a FastCGI interface, and the main and, in effect, only standard way to start a Django application ( docs.djangoproject.com/en/dev/howto/deployment/fastcgi ). For the tests I used the following program:
#!/usr/local/bin/python
def my_wsgi_application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [output]

application = my_wsgi_application

from flup.server.fcgi import WSGIServer
wsgi_opts = {'maxSpare': 5, 'minSpare': 5, 'maxThreads': 5}
WSGIServer(application, **wsgi_opts).run()
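For reference, wiring such a flup program into Apache via mod_fastcgi takes something along these lines; the paths are illustrative assumptions, not the exact directives from the test rig:

```apache
# mod_fastcgi spawns and supervises the flup program itself;
# -processes 2 matches the two pre-started processes used in the tests.
FastCgiServer /var/www/app/app.fcgi -processes 2
Alias /app /var/www/app/app.fcgi
<Directory /var/www/app>
    Options +ExecCGI
    SetHandler fastcgi-script
</Directory>
```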
Several difficulties are inherent in this method of launching applications: you cannot restart the application without restarting the server; you cannot reload application code without a restart or third-party add-ons; and you must declare the FastCGI handler and its parameters inside the application yourself. The same applies to python-fastcgi. As the results show, flup saturates already on the test with 5 pre-started Apache processes. I also recorded that anything flup cannot process immediately, it simply drops: I got up to 40% request errors in the tests. This is a sad result, because programmers, in my experience, rarely watch how their programs actually behave under load, and for many this will be a revelation. Quite surprised, I decided to look at flup's behavior without a strict limit on running threads, and wrote the following program:
#!/usr/local/bin/python
def my_wsgi_application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [output]

application = my_wsgi_application

from flup.server.fcgi import WSGIServer
WSGIServer(application).run()
The result was as expected: no lost requests, flup creates threads as needed (I watched the ps output), but, predictably, throughput nearly halved.
So, fiery greetings to you, today's most popular way of launching Django...
modwsgi
mod_wsgi is a WSGI server implemented as an Apache module. Its main mode of use is daemon mode, in which the web server is merely an intermediary between the resident application processes and their management. This is the main recommended way to deploy Django ( docs.djangoproject.com/en/dev/howto/deployment/modwsgi ). Because it runs under Apache, you get all the usual Apache niceties like .htaccess, and system administrators find it unthreatening. That same fact greatly frightens developers who have heard of nginx and consider Apache evil incarnate. The program I used for testing looks like this:
def my_wsgi_application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [output]

application = my_wsgi_application
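One pleasant consequence of the pure WSGI interface is that the same program can be sanity-checked in plain Python before it is wired into any server. A minimal sketch (the stub start_response here is my own helper, not part of mod_wsgi):

```python
# Sanity-check the WSGI app from the listing above without any web server.
def my_wsgi_application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [output]

captured = {}

def fake_start_response(status, headers):
    # Record what the application reports instead of sending it to a client.
    captured['status'] = status
    captured['headers'] = headers

# A WSGI app returns an iterable of body chunks; join them into one body.
body = ''.join(my_wsgi_application({}, fake_start_response))
print(captured['status'])  # -> 200 OK
print(body)                # -> Hello World!
```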
The test results show performance growing as the number of Apache handlers grows, i.e., no saturation, and clearly better throughput than flup.
A few mod_wsgi features deserve mention. First, it has its own setting for how many requests a daemon process handles before being recycled, which lets it deal effectively with memory leaks. I did not use this setting in the examples (nor its equivalents for the other methods), since it would obviously reduce performance slightly. Second, uniquely among these methods, it has an idle-timeout setting after which the process is restarted; this means you need not keep a fully loaded application, or one with leaked memory, resident while it is not being used. Third, it automatically reloads when the application file is updated, so after modifying a program we can always be sure we will see the new version. None of the other methods here can do any of this without special modifications. Another important feature is that the launch mechanism is removed entirely from the application's area of responsibility: note in the example that the program really contains only the WSGI interface and nothing else.
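For illustration, a daemon-mode configuration matching the test conditions might look as follows; the process-group name and paths are assumptions, and the two recycling knobs are shown only in a comment because they were deliberately left unset during the benchmark:

```apache
# 2 pre-started daemon processes with 5 threads each, as in the tests.
WSGIDaemonProcess testapp processes=2 threads=5
# Recycling knobs discussed above (not used during the benchmark):
#   maximum-requests=10000 inactivity-timeout=300
WSGIProcessGroup testapp
WSGIScriptAlias /test /var/www/app/test.wsgi
```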
python-fastcgi
python-fastcgi is... bingo! another WSGI server with a FastCGI interface; in essence, a wrapper around the standard C++ FastCGI library. The program looks like this:
#!/usr/local/bin/python
import fastcgi

def my_wsgi_application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [output]

application = my_wsgi_application

s = fastcgi.ThreadedWSGIServer(my_wsgi_application, workers=5)
s.serve_forever()
The test results speak for themselves: throughput grows with the number of server handlers, and python-fastcgi is clearly the leader of our tests (hi, Locum). Overall, once I had essentially conquered bringing FastCGI up through Apache, this module raised the fewest questions and complaints. Naturally, it carries all the drawbacks of this launch method: complex configuration, the server's dependence on the application, no standard reload facility (e.g., for updating the program), and so on.
mod_python
mod_python is an Apache module like mod_php. It provides several hooks and has no WSGI interface. Its main problem is considered to be security, because without modification it runs with the server's privileges; however, any in-process module, including mod_php, suffers from the same drawback. I wrote the following program for testing:
#!/usr/local/bin/python
def index(req):
    return "Hello World!\n"
Unexpectedly, the results were rather modest. Testing also revealed one more peculiarity; here are the results for 10,000 (ten thousand) requests:

You can see that as the number of processes grows, throughput... decreases. This is because Apache does not load the application at server start, but only when a request hits one of the handlers; so the more handler processes I configured, the more requests arrived "for the first time". Clearly, with even 2-3 active applications such cold starts will be quite frequent. Whether to choose a launch method that can only be configured server-wide is up to you. mod_python also has code-reload issues: although it has the corresponding setting, we could not get it to reliably pick up changed application code without restarting the entire server. On some hosting sites the code-update problem is worked around with the diffpriv module, but then a second problem arises: the server loads the application on EVERY request, which, even extrapolating from our tests, means a serious drop in performance. And a separate painful question, of course, is the choice of "publishers" and working with them. All in all, mod_python lands at the very bottom of our rating on the sum of its scores.
mod_php
For comparison, I decided to put PHP through the same tests. The program is entirely obvious:
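The PHP listing itself did not survive in this copy; judging by the other test programs, it was presumably the one-line equivalent:

```php
<?php echo "Hello World!\n"; ?>
```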
The results are as expected but hardly staggering. Given the absence of extra inter-process communication and the monolithic nature of PHP itself, I expected a factor of 2 or more.
The summary is quite simple and obvious: the simpler the technology, the faster it is. The leader in the performance category is undoubtedly python-fastcgi; the leader in convenience is mod_wsgi. Moreover, mod_wsgi today is clearly the optimal solution by the sum of its characteristics, even though it is neither the fastest nor the most bug-free.