
Django under microscope
If Artyom Malyshev's (proofit404) talk were ever made into a film, the director would be Quentin Tarantino: he has already made one film about Django, so he can shoot the second one too. It covers every detail of the life of Django's internal mechanisms, from the first byte of the HTTP request to the last byte of the response: the extravaganza of form parsing, the action-packed compilation of SQL, the special effects of the HTML template engine. Who manages the connection pool, and how? All of this in the chronological order in which WSGI objects are processed. On every screen in the country: the transcript of "Django under microscope".


About the speaker: Artyom Malyshev is the founder of the Dry Python project and a core developer of Django Channels version 1.0. He has been writing Python for 5 years and has helped organize Python Rannts meetups in Nizhny Novgorod. You may know Artyom by the nickname PROOFIT404. The slides for the talk are stored here.
Once upon a time, we launched an old version of Django. Back then it looked scary and sad.
We saw that the self_check passed: everything was installed correctly, everything worked, and we could start writing code. To get there, we had to run the command django-admin runserver.
$ django-admin runserver
Performing system checks…
System check identified no issues (0 silenced).
You have unapplied migrations; your app may not work properly until they are applied. Run 'python manage.py migrate' to apply them.
August 21, 2018 - 15:50:53
Django version 2.1, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
The process starts and handles HTTP requests; all the magic happens inside it, and all the code that we want to show users as a site is executed there.
Installation
django-admin appears on the system when we install Django with, for example, the pip package manager.
$ pip install Django
# setup.py
from setuptools import find_packages, setup

setup(
    name='Django',
    entry_points={
        'console_scripts': [
            'django-admin = django.core.management:execute_from_command_line',
        ]
    },
)
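The console_scripts line maps a command name to a "module:function" string. As a rough sketch of what a launcher has to do with such a string (the helper name resolve_entry_point is hypothetical, and the example resolves a standard-library function so that it runs without Django installed):

```python
from importlib import import_module


def resolve_entry_point(spec: str):
    """Turn a 'package.module:attribute' string into the object it names."""
    module_path, _, attribute = spec.partition(":")
    module = import_module(module_path)
    return getattr(module, attribute)


# With Django installed, the spec from setup.py would be
# 'django.core.management:execute_from_command_line'.
dumps = resolve_entry_point("json:dumps")
print(dumps({"command": "runserver"}))
```

The real launcher script generated at install time does more (it also passes the process exit code back to the shell), so treat this only as an illustration of the lookup step.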
A setuptools entry_points entry appears that points at the function execute_from_command_line. This function is the entry point for any operation with Django, for any running process.
Bootstrap
What happens inside this function? Bootstrap, which is divided into two stages.
# django.core.management
django.setup()
Configure settings
The first stage is reading the configs:
import django.conf.global_settings
import_module(os.environ["DJANGO_SETTINGS_MODULE"])
The default settings, global_settings, are read first; then we try to find the module that the user wrote, whose name is taken from the DJANGO_SETTINGS_MODULE environment variable. These settings are combined into one namespace. Anyone who has written at least "Hello, world" in Django knows that it contains INSTALLED_APPS, which lists the apps where we write our own code.
Populate apps
In the second stage, all these applications, which are essentially packages, are iterated one by one. For each of them we create a Config, import its models for working with the database, and check the models for integrity. Then the framework runs its checks, that is, it verifies that each model has a primary key, that all foreign keys point to existing fields, and that a BooleanField is not declared nullable where a NullBooleanField should be used.
for entry in settings.INSTALLED_APPS:
    cfg = AppConfig.create(entry)
    cfg.import_models()
This is the minimum sanity check for models, for the admin panel, for anything, done without connecting to the database and without anything overly complicated or specific. At this stage, Django still does not know which command you asked it to execute, that is, it does not distinguish migrate from runserver or shell. Then we find ourselves in a module that tries to guess from the command-line arguments which command we want to execute and which application it lives in.
Management command
# django.core.management (simplified)
subcommand = sys.argv[1]
app_name = find(pkgutil.iter_modules(settings.INSTALLED_APPS))
module = import_module(
    '%s.management.commands.%s' % (app_name, subcommand)
)
cmd = module.Command()
cmd.run_from_argv(self.argv)
In this case, for runserver it will be the built-in module django.core.management.commands.runserver. After the module is imported, by convention the module-level class named Command is looked up and instantiated, and we tell it: "I found you; here are the command-line arguments that the user passed, do something with them." Next, we go into the runserver module and see that Django is made of "regexp and sticks", which I will talk about in detail today:
# django.core.management.commands.runserver
naiveip_re = re.compile(r"""^(?:
(?P<addr>
    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |         # IPv4 address
    (?P<ipv6>\[[a-fA-F0-9:]+\]) |               # IPv6 address
    (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*) # FQDN
):)?(?P<port>\d+)$""", re.X)
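To see what this pattern accepts, here is a small usage sketch. The pattern is copied with its named groups (addr, ipv4, ipv6, fqdn, port); parse_addrport is a hypothetical helper written for this illustration, not a Django function.

```python
import re

naiveip_re = re.compile(r"""^(?:
(?P<addr>
    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |         # IPv4 address
    (?P<ipv6>\[[a-fA-F0-9:]+\]) |               # IPv6 address
    (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*) # FQDN
):)?(?P<port>\d+)$""", re.X)


def parse_addrport(value: str):
    """Hypothetical helper: split 'addr:port' (or a bare port) into its parts."""
    match = naiveip_re.match(value)
    if match is None:
        raise ValueError(f"{value!r} is not a port number or address:port pair")
    return match.group("addr"), int(match.group("port"))


for candidate in ("8000", "127.0.0.1:8000", "[::1]:8000", "localhost:8000"):
    print(candidate, "->", parse_addrport(candidate))
```

A bare port leaves the addr group empty, which is why the address part of the pattern is wrapped in an optional group.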
Commands
Scroll down a screen and a half, and we finally reach the definition of our command that starts the server.
# django.core.management.commands.runserver
class Command(BaseCommand):
    def handle(self, *args, **options):
        httpd = WSGIServer(*args, **options)
        handler = WSGIHandler()
        httpd.set_app(handler)
        httpd.serve_forever()
BaseCommand performs a minimal set of operations so that the command-line arguments become the *args and **options arguments of the function call. We see that a WSGI server instance is created here and the global WSGIHandler is installed into it: this is exactly Django's God Object. We can say that this is the only instance of the framework. The instance is installed on the server globally, through set_app, and told: "Spin in the event loop, execute requests."
There is always an event loop somewhere and a programmer who gives it tasks.
WSGI server
What is WSGIHandler? WSGI is an interface that lets you process HTTP requests with a minimum level of abstraction, and it looks like a plain function.
WSGI handler
# django.core.handlers.wsgi
class WSGIHandler:
    def __call__(self, environ, start_response):
        signals.request_started.send()
        request = WSGIRequest(environ)
        response = self.get_response(request)
        start_response(response.status, response.headers)
        return response
For example, here it is an instance of a class that defines __call__. It expects a dictionary as input, in which the headers are presented as bytes, together with a file handler. The handler is needed to read the body of the request. The server itself also provides a start_response callback so that we can send response.headers and the status in one bundle. Then we can pass the response body to the server through the response object. The response is a generator that you can iterate over.
All servers written for WSGI, such as Gunicorn, uWSGI, and Waitress, work through this interface and are interchangeable. We are looking at the development server here, but any server ends up knocking on Django through environ and a callback.
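A minimal sketch of that contract, assuming nothing beyond the standard library: application is a bare WSGI callable, and call_app is a hypothetical helper that plays the role of the server for one request without opening a socket.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: a dict comes in, an iterable of bytes goes out."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Act as the server: build environ, supply the callback, drain the body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app(application, "/articles/"))
```

Because the only contact points are environ and start_response, the same application object could be handed to any WSGI server unchanged.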
What is inside God Object?
What happens inside this global God Object function inside Django?
- REQUEST.
- MIDDLEWARES.
- ROUTING request to view.
- VIEW - user code processing inside view.
- FORM - work with forms.
- ORM.
- TEMPLATE
- RESPONSE.
All the machinery we want from Django takes place within a single function, which is spread across the entire framework.
Request
We wrap the WSGI environment, which is a plain dictionary, in a special object that makes the environment more convenient to work with. For example, it is easier to find out the length of a user's request from something dictionary-like than from a byte string that has to be parsed for key-value entries. When working with cookies, I also don't want to work out by hand whether the storage period has expired, or to interpret it somehow.
# django.core.handlers.wsgi
class WSGIRequest(HttpRequest):
    @cached_property
    def GET(self):
        return QueryDict(self.environ['QUERY_STRING'])

    @property
    def POST(self):
        self._load_post_and_files()
        return self._post

    @cached_property
    def COOKIES(self):
        return parse_cookie(self.environ['HTTP_COOKIE'])
Request contains parsers, as well as a set of handlers that control how the body of a POST request is processed: whether a file is kept in memory or in temporary storage on disk. Everything is decided inside Request. In Django, Request is also an aggregator object into which all middlewares can put the information we need about the session, authentication, and user authorization. We can say that this is also a God Object, only a smaller one.
Next, the Request reaches the middleware.
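The idea of lazily parsed views over the environ dictionary can be sketched with the standard library alone. The Request class below is a hypothetical stand-in, not Django's WSGIRequest; it uses parse_qs and SimpleCookie where Django uses its own QueryDict and parse_cookie.

```python
from functools import cached_property
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


class Request:
    """Hypothetical stand-in for WSGIRequest: lazy views over the environ dict."""

    def __init__(self, environ):
        self.environ = environ

    @cached_property
    def GET(self):
        # Parsed on first access only, then cached on the instance.
        return parse_qs(self.environ.get("QUERY_STRING", ""))

    @cached_property
    def COOKIES(self):
        cookie = SimpleCookie(self.environ.get("HTTP_COOKIE", ""))
        return {name: morsel.value for name, morsel in cookie.items()}

    @cached_property
    def content_length(self):
        return int(self.environ.get("CONTENT_LENGTH") or 0)


request = Request({
    "QUERY_STRING": "page=2&tag=a&tag=b",
    "HTTP_COOKIE": "sessionid=abc",
    "CONTENT_LENGTH": "42",
})
print(request.GET, request.COOKIES, request.content_length)
```

Middlewares can then attach further attributes (session, user) to the same object, which is what makes it an aggregator.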
Middlewares
Middleware is a wrapper that wraps other functions, like a decorator. Before giving up control, a middleware either returns a response from its __call__ method or calls the middleware it already wraps.
This is what middleware looks like from a programmer's point of view.
Settings
# settings.py
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
]
Define
class Middleware:
    def __init__(self, get_response=None):
        self.get_response = get_response

    def __call__(self, request):
        return self.get_response(request)
From the point of view of Django, middlewares look like a kind of stack:
# django.core.handlers.base
def load_middleware(self):
    handler = convert_exception_to_response(self._get_response)
    for middleware_path in reversed(settings.MIDDLEWARE):
        middleware = import_string(middleware_path)
        instance = middleware(handler)
        handler = convert_exception_to_response(instance)
    self._middleware_chain = handler
Apply
def get_response(self, request):
    set_urlconf(settings.ROOT_URLCONF)
    response = self._middleware_chain(request)
    return response
We take the original function get_response and wrap it in a handler that translates, for example, a permission error or a not found error into the correct HTTP status code. Then we wrap the result in each middleware from the list. The middleware stack grows, and each next one wraps the previous one. This is very similar to applying the same stack of decorators to every view in a project, only centrally. There is no need to walk through the project and arrange the wrappers by hand; everything is convenient and logical. We went through the 7 circles of middlewares, our request survived, and it is time to process it in a view. Next we get to the routing module.
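The wrapping order is easier to see in a toy version. This is only a sketch under simplifying assumptions: requests and responses are plain dictionaries, make_middleware is a hypothetical factory, and exception conversion is left out.

```python
def view(request):
    """Innermost handler; stands in for the routed view."""
    return {"status": 200, "headers": {}, "trace": list(request["trace"])}


def make_middleware(name):
    """Build a middleware factory with the get_response -> callable shape."""
    def middleware(get_response):
        def handler(request):
            request["trace"].append(name)           # runs on the way in
            response = get_response(request)
            response["headers"][f"X-{name}"] = "1"  # runs on the way out
            return response
        return handler
    return middleware


MIDDLEWARE = [make_middleware("Security"), make_middleware("Session")]

handler = view
for factory in reversed(MIDDLEWARE):  # the last entry wraps the view first
    handler = factory(handler)

response = handler({"trace": []})
print(response["trace"], list(response["headers"]))
```

Iterating the list in reverse is what makes the first entry in the settings the outermost layer: it sees the request first and the response last.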
Routing
This is where we decide which handler to call for a particular request. The decision is based on the URL, which the WSGI specification calls PATH_INFO and which is available here as request.path_info.
# django.core.handlers.base
def _get_response(self, request):
    resolver = get_resolver()
    view, args, kwargs = resolver.resolve(request.path_info)
    response = view(request, *args, **kwargs)
    return response
Urls
We take the resolver, feed it the URL of the current request, and expect it to return the view function itself; from the same URL we get the arguments with which to call the view. Then get_response calls the view, handles exceptions, and does something with the result.
# urls.py
urlpatterns = [
    path('articles/2003/', views.special_case_2003),
    path('articles/<int:year>/', views.year_archive),
    path('articles/<int:year>/<int:month>/', views.month_archive),
]
Resolver
This is what the resolver looks like:
# django.urls.resolvers
_PATH_RE = re.compile(
    r'<(?:(?P<converter>[^>:]+):)?(?P<parameter>\w+)>'
)

def resolve(self, path):
    for pattern in self.url_patterns:
        match = pattern.search(path)
        if match:
            return ResolverMatch(
                self.resolve(match[0])
            )
    raise Resolver404({'path': path})
This is also a regexp, only recursive. It walks through the parts of the URL and looks for what the user wants: other users, posts, blogs, or perhaps a converter, for example a specific year that has to be resolved, put into the arguments, and cast to int.
Characteristically, the recursion depth of the resolve method is always equal to the number of arguments with which the view is called. If something went wrong and no matching URL was found, a not found error is raised.
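How a route with converters becomes a regular expression can be sketched as follows. This is a flat, non-recursive simplification: the CONVERTERS table, compile_route, and this resolve function are hypothetical names for the illustration, views are plain strings, and paths are written without a leading slash.

```python
import re

_PATH_RE = re.compile(r"<(?:(?P<converter>[^>:]+):)?(?P<parameter>\w+)>")

# Hypothetical converter table: (regex fragment, Python type).
CONVERTERS = {"int": (r"[0-9]+", int), "str": (r"[^/]+", str)}


def compile_route(route):
    """Translate 'articles/<int:year>/' into a regex plus per-argument casts."""
    parts, casts, position = ["^"], {}, 0
    for match in _PATH_RE.finditer(route):
        parts.append(re.escape(route[position:match.start()]))
        fragment, cast = CONVERTERS[match["converter"] or "str"]
        parts.append(f"(?P<{match['parameter']}>{fragment})")
        casts[match["parameter"]] = cast
        position = match.end()
    parts.append(re.escape(route[position:]) + "$")
    return re.compile("".join(parts)), casts


def resolve(path, urlpatterns):
    """Return (view, kwargs) for the first matching route, else raise LookupError."""
    for route, view in urlpatterns:
        regex, casts = compile_route(route)
        found = regex.match(path)
        if found:
            return view, {k: casts[k](v) for k, v in found.groupdict().items()}
    raise LookupError(path)


urlpatterns = [
    ("articles/2003/", "special_case_2003"),
    ("articles/<int:year>/", "year_archive"),
    ("articles/<int:year>/<int:month>/", "month_archive"),
]
print(resolve("articles/2005/03/", urlpatterns))
```

Order matters: the literal 2003 route is listed first, so it wins over the generic year route for that one URL.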
Then we finally get into the view: the code that the programmer wrote.
View
In its simplest form, it is a function that returns a response for a request, but inside it we carry out logic, "for, if, someday": many repetitive tasks. Django gives us class-based views, where you specify only the particular details and the class itself turns them into the correct behavior.
# django.views.generic.edit
class ContactView(FormView):
    template_name = 'contact.html'
    form_class = ContactForm
    success_url = '/thanks/'
Method flowchart
self.dispatch()
self.post()
self.get_form()
self.form_valid()
self.render_to_response()
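The first step of this flow, choosing a method by HTTP verb, can be sketched without Django. The View base class and the dictionary-shaped requests and responses below are hypothetical simplifications; the email check merely stands in for form validation.

```python
class View:
    """Hypothetical minimal class-based view: route by HTTP verb."""

    http_method_names = ["get", "post"]

    def dispatch(self, request):
        method = request["method"].lower()
        if method in self.http_method_names and hasattr(self, method):
            return getattr(self, method)(request)
        return {"status": 405, "body": "Method Not Allowed"}


class ContactView(View):
    def get(self, request):
        return {"status": 200, "body": "<form>...</form>"}

    def post(self, request):
        if request["data"].get("email"):  # stands in for form.is_valid()
            return {"status": 302, "location": "/thanks/"}
        return {"status": 200, "body": "<form>errors</form>"}


print(ContactView().dispatch({"method": "POST", "data": {"email": "a@example.com"}}))
```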
The dispatch method of this instance sits in the URL mapping in place of a function. Based on the HTTP verb, dispatch understands which method to call: a POST came in, so we most likely want to instantiate the form object and, if the form is valid, save it to the database and show the template. All of this is done through the large number of mixins that make up this class.
Form
The form must be read from the socket before it reaches the Django view, through the same file handler that lies in the WSGI environment. form-data is a byte stream in which separators are described; we can read the blocks between them and make something of them. A block can be a key-value pair if it is a field, or part of a file, and then again some field: everything is mixed together.
Content-Type: multipart/form-data;boundary="boundary"
--boundary
name="field1"
value1
--boundary
name="field2";
value2
Parser
The parser consists of 3 parts.
The first is a chunk iterator, which turns predictable reads from the byte stream into an iterator that can produce boundaries. It guarantees that if it returns something, it will be a boundary. This is necessary so that the parser does not have to store the state of the connection (whether the socket has been read or not), which keeps the data-processing logic minimal. Next, the generator is wrapped in LazyStream, which again makes a file-like object out of it, but with predictable reads. So the parser can walk through chunks of bytes and build key-value pairs from them.
field and data here will always be strings. If we receive a datetime in ISO format, the Django form (the one written by the programmer) will turn it, using the appropriate fields, into, for example, a timestamp.
# django.http.multipartparser
self._post = QueryDict(mutable=True)
stream = LazyStream(ChunkIter(self._input_data))
for field, data in Parser(stream):
    self._post.appendlist(field, force_text(data))
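The boundary splitting itself can be illustrated on an in-memory body. This is a deliberately naive sketch, assuming the whole body is already in memory and contains only text fields; parse_form_data is a hypothetical helper, whereas Django's parser streams the data in chunks and also handles files.

```python
import re


def parse_form_data(body: bytes, boundary: bytes) -> dict:
    """Split a multipart/form-data body into {field name: text value}."""
    fields = {}
    for block in body.split(b"--" + boundary):
        if block.startswith(b"--"):   # '--boundary--' closes the body
            break
        block = block.strip(b"\r\n")
        if not block:                 # nothing before the first boundary
            continue
        headers, _, value = block.partition(b"\r\n\r\n")
        name = re.search(rb'name="([^"]*)"', headers)
        if name:
            fields[name.group(1).decode("utf-8")] = value.decode("utf-8")
    return fields


body = (
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="field1"\r\n'
    b"\r\n"
    b"value1\r\n"
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="field2"\r\n'
    b"\r\n"
    b"value2\r\n"
    b"--boundary--\r\n"
)
print(parse_form_data(body, b"boundary"))
```

Note that every value comes out as a string; turning it into a date or a number is the job of the form fields, not of the parser.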
Next, the form most likely wants to save itself to a database, and this is where the Django ORM begins.
ORM
ORM queries are written in roughly this kind of DSL:
# models.py
Entry.objects.exclude(
    pub_date__gt=date(2005, 1, 3),
    headline='Hello',
)
Using keyword arguments, you can assemble SQL expressions like this one:
SELECT * WHERE NOT (pub_date > '2005-1-3' AND headline = 'Hello')
How does this happen?
Queryset
The exclude method has a Query object under the hood. The function's arguments are passed to this object, and it builds a hierarchy of objects, each of which can turn itself into a separate piece of the SQL query as a string. When the tree is traversed, each node polls its child nodes, receives the nested SQL fragments, and as a result we can build the SQL as a string. For example, a key-value pair will not be a separate SQL field but a comparison of a column with a value. Conjunction and negation of queries work the same way: as a recursive tree traversal in which a conversion to SQL is called on each node.
# django.db.models.query
sql.Query(Entry).where.add(
    ~Q(
        Q(F('pub_date') > date(2005, 1, 3)) &
        Q(headline='Hello')
    )
)
Compiler
# django.db.models.expressions
class Q(tree.Node):
    AND = 'AND'
    OR = 'OR'

    def as_sql(self, compiler, connection):
        return self.template % self.field.get_lookup('gt')
Output
>>> Q(headline='Hello')
# headline = 'Hello'
>>> F('pub_date')
# pub_date
>>> F('pub_date') > date(2005, 1, 3)
# pub_date > '2005-1-3'
>>> Q(...) & Q(...)
# ... AND ...
>>> ~Q(...)
# NOT …
A small helper compiler is passed to this method; it can tell the MySQL dialect from the PostgreSQL one and correctly arranges the syntactic sugar used in the dialect of a particular database.
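The recursive compilation can be sketched with three tiny node classes. Lookup, And, and Not are hypothetical names for this illustration and do not mirror Django's classes; dialect handling is omitted, and values are returned as separate parameters rather than interpolated into the string.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    column: str
    operator: str
    value: object

    def as_sql(self):
        # A leaf: one comparison with a placeholder for the value.
        return f"{self.column} {self.operator} %s", [self.value]


@dataclass
class And:
    children: list

    def as_sql(self):
        parts, params = [], []
        for child in self.children:  # each child compiles itself
            sql, child_params = child.as_sql()
            parts.append(sql)
            params.extend(child_params)
        return "(" + " AND ".join(parts) + ")", params


@dataclass
class Not:
    child: object

    def as_sql(self):
        sql, params = self.child.as_sql()
        return f"NOT {sql}", params


tree = Not(And([
    Lookup("pub_date", ">", "2005-01-03"),
    Lookup("headline", "=", "Hello"),
]))
sql, params = tree.as_sql()
print(sql, params)
```

Each node knows only how to render itself and how to ask its children, which is why adding a new kind of node does not require touching the others.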
DB routing
Once we have the SQL query, the model knocks on DB routing and asks which database it lives in. In 99% of cases it will be the default database; in the remaining 1%, some database of its own.
# django.db.utils
class ConnectionRouter:
    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'auth':
            return 'auth_db'
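The "99% default" behaviour follows from a simple fallback loop, sketched here under simplifying assumptions: AuthRouter takes an app label directly instead of a model, and the module-level db_for_read function is a hypothetical stand-in for the routing step.

```python
class AuthRouter:
    """Same idea as the router above; the app label is passed in directly."""

    def db_for_read(self, app_label):
        if app_label == "auth":
            return "auth_db"
        return None  # no opinion, let the next router decide


def db_for_read(app_label, routers, default="default"):
    """Ask each router in order; fall back to the default alias."""
    for router in routers:
        alias = router.db_for_read(app_label)
        if alias is not None:
            return alias
    return default


print(db_for_read("auth", [AuthRouter()]), db_for_read("blog", [AuthRouter()]))
```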
A wrapper around the database driver's library-specific interface, such as Python MySQL or Psycopg2, creates a universal object that Django can work with. There is a wrapper for cursors and a wrapper for transactions.
Connection pool
# django.db.backends.base.base
class BaseDatabaseWrapper:
    def commit(self):
        self.validate_thread_sharing()
        self.validate_no_atomic_block()
        with self.wrap_database_errors:
            return self.connection.commit()
Over this particular connection, we send queries to the socket that knocks on the database and wait for them to execute. The wrapper over the library reads the database's response in human-readable form, as records, and Django assembles a model instance from this data using Python types. This is not a complicated iteration.
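Both ideas, wrapping the driver and turning records into objects, fit in a short sketch using the standard-library sqlite3 driver. ConnectionWrapper, DatabaseError, and the Entry dataclass are hypothetical names for this illustration; Django's own wrappers are considerably richer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Entry:
    id: int
    headline: str


class DatabaseError(Exception):
    """Driver-independent error, so callers never import sqlite3 themselves."""


class ConnectionWrapper:
    def __init__(self, connection):
        self.connection = connection

    def fetch(self, model, sql, params=()):
        try:
            cursor = self.connection.execute(sql, params)
        except sqlite3.Error as exc:
            raise DatabaseError(str(exc)) from exc
        # Each record is a plain tuple; build one model instance per row.
        return [model(*row) for row in cursor.fetchall()]


db = ConnectionWrapper(sqlite3.connect(":memory:"))
db.connection.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
db.connection.execute("INSERT INTO entry (headline) VALUES (?)", ("Hello",))
entries = db.fetch(Entry, "SELECT id, headline FROM entry WHERE headline = ?", ("Hello",))
print(entries)
```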
We wrote something to the database, read something back, and decided to tell the user about it with an HTML page. For this, Django has a template language that the community dislikes; it looks something like a programming language, only inside an HTML file.
Template
from django.template.loader import render_to_string
render_to_string('my_template.html', {'entries': ...})
Code
{% for entry in entries %}
- {{ entry.name }}
{% endfor %}
Parser
# django.template.base
BLOCK_TAG_START = '{%'
BLOCK_TAG_END = '%}'
VARIABLE_TAG_START = '{{'
VARIABLE_TAG_END = '}}'
COMMENT_TAG_START = '{#'
COMMENT_TAG_END = '#}'
tag_re = (re.compile('(%s.*?%s|%s.*?%s|%s.*?%s)' %
          (re.escape(BLOCK_TAG_START),
           re.escape(BLOCK_TAG_END),
           re.escape(VARIABLE_TAG_START),
           re.escape(VARIABLE_TAG_END),
           re.escape(COMMENT_TAG_START),
           re.escape(COMMENT_TAG_END))))
Surprise: regexp again. Only there should be a comma at the end, and the list will run far down. This is probably the most complicated regexp I have seen in this project.
Lexer
The template handler and interpreter are pretty simple. There is a lexer that uses regexp to translate text into a list of small tokens.
# django.template.base
def tokenize(self):
    lineno = 1
    for bit in tag_re.split(template_string):
        lineno += bit.count('\n')
        yield bit
We iterate over the list of tokens and ask: "Who are you? Let's wrap you in a tag node." For example, if this is the start of some if or for, the tag handler picks the appropriate handler. The for handler in turn tells the parser: "Read me the list of tokens up to the closing tag." Control goes back to the parser.
A node, a tag, and the parser are mutually recursive, and the recursion depth usually equals the nesting depth of tags in the template itself.
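The lexing step can be tried out on the small template from this section. This is a sketch only: the pattern is written out by hand in the same shape as tag_re, and tokenize returns hypothetical (kind, text) pairs instead of Django's token objects.

```python
import re

# Same shape as tag_re above: block tags, variable tags, comment tags.
tag_re = re.compile(r"(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})")


def tokenize(template_string):
    """Yield (kind, text) pairs; kind is 'block', 'var', 'comment' or 'text'."""
    for bit in tag_re.split(template_string):
        if not bit:
            continue  # split() leaves empty strings between adjacent tags
        if bit.startswith("{%"):
            yield "block", bit[2:-2].strip()
        elif bit.startswith("{{"):
            yield "var", bit[2:-2].strip()
        elif bit.startswith("{#"):
            yield "comment", bit[2:-2].strip()
        else:
            yield "text", bit


template = "{% for entry in entries %}- {{ entry.name }}{% endfor %}"
for token in tokenize(template):
    print(token)
```

Because the pattern sits inside a capturing group, re.split keeps the tags in the output alongside the literal text between them.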
Parser
def parse():
    while tokens:
        token = tokens.pop()
        if token.startswith(BLOCK_TAG_START):
            yield TagNode(token)
        elif token.startswith(VARIABLE_TAG_START):
            ...
The tag handler gives us a specific node, for example a for loop node, which has a render method.
For loop
# django.template.defaulttags
@register.tag('for')
def do_for(parser, token):
    args = token.split_contents()
    body = parser.parse(until=['endfor'])
    return ForNode(args, body)
For node
class ForNode(Node):
    def render(self, context):
        with context.push():
            for i in self.args:
                yield self.body.render(context)
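The context.push() call is what gives each tag its own scope. One way to sketch such a stack of dictionaries is with collections.ChainMap; the Context class and the render_for helper below are hypothetical stand-ins and not Django's implementation.

```python
from collections import ChainMap
from contextlib import contextmanager


class Context:
    """Hypothetical stand-in for the template context: a stack of dicts."""

    def __init__(self, data):
        self.scopes = ChainMap(dict(data))

    @contextmanager
    def push(self, **values):
        self.scopes = self.scopes.new_child(values)  # enter a tag: new scope
        try:
            yield self
        finally:
            self.scopes = self.scopes.parents        # leave the tag: drop it

    def __getitem__(self, key):
        return self.scopes[key]

    def __setitem__(self, key, value):
        self.scopes[key] = value                     # writes hit the top scope only


def render_for(context, loop_var, sequence_name, body):
    """Render body once per item, binding loop_var inside a pushed scope."""
    output = []
    for item in context[sequence_name]:
        with context.push(**{loop_var: item}):
            output.append(body(context))
    return "".join(output)


context = Context({"entries": ["first", "second"]})
print(render_for(context, "entry", "entries", lambda c: f"- {c['entry']}\n"))
```

Anything a nested tag writes lands in the topmost dictionary, so it disappears as soon as that scope is popped.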
The render method walks the render tree. Each parent node can go to a child node and ask it to render itself. Programmers are used to displaying variables in a template. This is done through context, which looks like a regular dictionary. In fact it is a stack of dictionaries that emulates scope when we enter a tag. For example, if some other tag changes context inside the for loop, the changes are rolled back when we leave the loop. This is convenient, because it is hard to work when everything is global.
Response
Finally we have our string with the HTTP response:
Hello World!
We can hand the string to the user.
- The view returns this response.
- The view hands it back to the list of middlewares.
- The middlewares modify, complement, and improve this response.
- The response starts being iterated inside WSGIHandler and is written to the socket piece by piece, and the browser receives the response from our server.
All of the famous startups that were written in Django, such as Bitbucket or Instagram, started with this same small cycle that every programmer has gone through.
All of this, and the talk at Moscow Python Conf++, is meant to help you better understand what you hold in your hands and how to use it. In any magic there is a large share of regexp that you must know how to cook.
Artyom Malyshev and 23 other great speakers will again give us plenty of food for thought and discussion about Python on April 5 at the Moscow Python Conf++ conference. Study the schedule and join the exchange of experience in solving a variety of problems with Python.