Django under the microscope

    If Artyom Malyshev's (proofit404) talk were ever made into a film, the director would be Quentin Tarantino: he has already made one film about Django, so he could shoot the second one too. Every detail of the life of Django's internal mechanisms, from the first byte of the HTTP request to the last byte of the response. The extravaganza of form parsing, the action-packed compilation of SQL, the special effects of the HTML template engine. Who manages the connection pool, and how? All of it in the chronological order in which WSGI objects are processed. On screens across the country: the transcript of "Django under microscope".



    About the speaker: Artyom Malyshev is the founder of the Dry Python project and a core developer of Django Channels version 1.0. He has been writing Python for 5 years and has helped organize Python Rannts meetups in Nizhny Novgorod. You may know Artyom by the nickname PROOFIT404. The slides for the talk are available here.


    Once upon a time, we launched an old version of Django. Back then it looked scary and sad.



    We saw that the self-check passed: we had installed everything correctly, everything worked, and we could start writing code. To get there, we had to run the command django-admin runserver.

    $ django-admin runserver 
    Performing system checks…
    System check identified no issues (0 silenced).
    You have unapplied migrations; your app may not work properly until they are applied. Run 'python manage.py migrate' to apply them.
    August 21, 2018 - 15:50:53
    Django version 2.1, using settings 'mysite.settings'
    Starting development server at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.
    

    The process starts and handles HTTP requests; all the magic happens inside it, and all the code that we want to show users as a site is executed.

    Installation


    django-admin appears on the system when we install Django using, for example, the pip package manager.

    $ pip install Django
    # setup.py
    from setuptools import find_packages, setup
    setup(
        name='Django',
        entry_points={
            'console_scripts': [
                'django-admin = django.core.management:execute_from_command_line',
            ]
        },
    )
    

    The package declares a setuptools entry_points record that points to the function execute_from_command_line. This function is the entry point for any operation with Django, for any running process.
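
    How a console script string of this shape is turned into a callable can be sketched in a few lines. This is a simplified illustration, not the setuptools implementation: the helper name resolve_entry_point and the "join-paths" script are hypothetical, and a standard library function stands in for execute_from_command_line.

```python
from importlib import import_module


def resolve_entry_point(spec):
    """Resolve a 'name = package.module:attribute' string to the object it names."""
    # Hypothetical helper: split off the script name, then module and attribute.
    _, _, target = spec.partition("=")
    module_path, _, attribute = target.strip().partition(":")
    module = import_module(module_path)
    return getattr(module, attribute)


# posixpath.join stands in for django.core.management:execute_from_command_line.
func = resolve_entry_point("join-paths = posixpath:join")
print(func("a", "b"))
```

    The generated django-admin script does essentially the same thing: it imports the module on the left of the colon and calls the attribute on the right.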

    Bootstrap


    What happens inside this function? Bootstrap, which is divided into two stages.

    # django.core.management
    django.setup()
    

    Configure settings


    The first is reading the configs:

    import os
    from importlib import import_module
    import django.conf.global_settings
    import_module(os.environ["DJANGO_SETTINGS_MODULE"])
    

    The default settings are read from global_settings; then we look up the module named in the DJANGO_SETTINGS_MODULE environment variable, the one the user wrote. Both sets of settings are merged into one namespace.
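
    The merge itself can be pictured as copying attributes from two sources into one object. The sketch below is an assumption-laden simplification: plain namespaces stand in for the two settings modules, and merge_settings is a hypothetical helper. The one detail taken from Django is that only UPPERCASE names count as settings.

```python
from types import SimpleNamespace

# Stand-ins for django.conf.global_settings and the user's settings module.
global_settings = SimpleNamespace(DEBUG=False, INSTALLED_APPS=[], helper_value=1)
user_settings = SimpleNamespace(DEBUG=True, INSTALLED_APPS=["blog"])


def merge_settings(*sources):
    """Copy every UPPERCASE attribute into one namespace; later sources win."""
    merged = SimpleNamespace()
    for source in sources:
        for name in dir(source):
            if name.isupper():
                setattr(merged, name, getattr(source, name))
    return merged


settings = merge_settings(global_settings, user_settings)
print(settings.DEBUG, settings.INSTALLED_APPS)
```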

    Anyone who has written at least a "Hello, world" in Django knows that these settings include INSTALLED_APPS, the list of applications where our own code lives.

    Populate apps


    In the second stage, all these applications, which are essentially packages, are iterated one by one. For each one we create a Config, import the models that work with the database, and check the models for integrity. Next, the framework runs its checks: it verifies that each model has a primary key, that all foreign keys point to existing fields, and that null is not set on a BooleanField when a NullBooleanField should be used instead.

    for entry in settings.INSTALLED_APPS:
        cfg = AppConfig.create(entry)
        cfg.import_models()
    

    This is the minimal sanity check for models, for the admin panel, for anything, done without connecting to the database and without anything overly complicated or specific. At this stage, Django still does not know which command you asked it to execute; that is, it cannot tell migrate from runserver or shell.

    Then we end up in a module that tries to work out from the command-line arguments which command we want to execute and which application it lives in.

    Management command


    # django.core.management
    subcommand = sys.argv[1]
    app_name = find(pkgutil.iter_modules(settings.INSTALLED_APPS))
    module = import_module(
        '%s.management.commands.%s' % (app_name, subcommand)
    )
    cmd = module.Command()
    cmd.run_from_argv(sys.argv)
    

    In this case, for runserver, it will be the built-in module django.core.management.commands.runserver. After the module is imported, by convention a module-level class named Command is looked up inside it and instantiated, and we say: "I found you; here are the command-line arguments the user passed, do something with them."
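
    The "import by convention" step can be tried without Django at all. In this sketch the module name demo_commands_hello, the template string, and the run helper are all made up for illustration; a module object registered in sys.modules plays the part of a real management/commands package.

```python
import sys
import types
from importlib import import_module


class Command:
    """By convention, every command module exposes a class with this name."""

    def run_from_argv(self, argv):
        return "hello " + " ".join(argv[2:])


# Register a throwaway module so the sketch runs without files on disk.
fake_module = types.ModuleType("demo_commands_hello")
fake_module.Command = Command
sys.modules[fake_module.__name__] = fake_module


def run(argv, template="demo_commands_%s"):
    """Import the module named after the subcommand and run its Command."""
    subcommand = argv[1]
    module = import_module(template % subcommand)
    return module.Command().run_from_argv(argv)


print(run(["manage.py", "hello", "world"]))
```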

    Next, we go to the runserver module and see that Django is made of "regexp and sticks", which I will talk about in detail today:

    # django.core.management.commands.runserver
    naiveip_re = re.compile(r"""^(?:
    (?P<addr>
        (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |         # IPv4 address
        (?P<ipv6>\[[a-fA-F0-9:]+\]) |               # IPv6 address
        (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*) # FQDN
    ):)?(?P<port>\d+)$""", re.X)
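
    To see what this expression actually extracts, it can be run against a few addr:port strings. The pattern below uses the group names from Django's runserver module; the parse_addrport helper is a hypothetical wrapper added only for the demonstration.

```python
import re

naiveip_re = re.compile(r"""^(?:
(?P<addr>
    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |         # IPv4 address
    (?P<ipv6>\[[a-fA-F0-9:]+\]) |               # IPv6 address
    (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*) # FQDN
):)?(?P<port>\d+)$""", re.X)


def parse_addrport(value):
    """Return the named groups for an 'addr:port' or bare 'port' string."""
    match = naiveip_re.match(value)
    if match is None:
        raise ValueError("%r is not a valid port number or address:port pair" % value)
    return match.groupdict()


for value in ("8000", "127.0.0.1:8000", "[::1]:8000", "localhost:8000"):
    print(value, parse_addrport(value))
```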
    

    Commands


    Scroll down a screen and a half, and we finally reach the definition of the command that starts the server.

    # django.core.management.commands.runserver
    class Command(BaseCommand):
        def handle(self, *args, **options):
            httpd = WSGIServer(*args, **options)
            handler = WSGIHandler()
            httpd.set_app(handler)
            httpd.serve_forever()
    

    BaseCommand performs a minimal set of operations so that the command-line arguments become the *args and **options arguments of the function call. We see that a WSGI server instance is created here and that the global WSGIHandler is installed in it: this is exactly the God Object of Django. We can say that it is the only instance of the framework. The instance is installed on the server globally, through set_app, and the server is told: "Spin in the event loop, execute requests."

    There is always an event loop somewhere, and a programmer who gives it tasks.

    WSGI server


    What is WSGIHandler? WSGI is an interface that lets you process HTTP requests with a minimal level of abstraction, and it looks like a plain function.

    WSGI handler


    # django.core.handlers.wsgi
    class WSGIHandler:
        def __call__(self, environ, start_response):
            signals.request_started.send()
            request = WSGIRequest(environ)
            response = self.get_response(request)
            start_response(response.status, response.headers)
            return response
    

    For example, here it is an instance of a class that defines __call__. It expects a dictionary as input, in which the headers are presented as bytes, along with a file handle. The file handle is needed to read the body of the request. The server also passes in a start_response callback so that we can send the status and response.headers in one bundle.

    Next, we can pass the response body to the server through the response object. Response behaves like a generator that you can iterate over.

    All servers written for WSGI (Gunicorn, uWSGI, Waitress) work through this interface and are interchangeable. We are looking at the development server here, but every server ends up knocking on Django through environ and a callback.
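
    The whole contract fits in one function, so it is easy to try by hand. The sketch below is a minimal WSGI application, not Django's handler: the application function and the hand-written start_response are illustrative, and wsgiref from the standard library fills in the environ keys a real server would provide.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A complete WSGI application: a callable taking environ and a callback."""
    body = ("You asked for %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # any iterable of bytes works as the response body


# Drive the application by hand, the way a WSGI server would.
environ = {"PATH_INFO": "/articles/"}
setup_testing_defaults(environ)  # fills in the remaining required keys
sent = {}


def start_response(status, headers):
    sent["status"], sent["headers"] = status, headers


chunks = list(application(environ, start_response))
print(sent["status"], b"".join(chunks))
```

    Any of the servers named above could serve this same callable unchanged, which is exactly what makes them interchangeable.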

    What is inside God Object?


    What happens inside Django's global God Object function?

    • REQUEST.
    • MIDDLEWARES.
    • ROUTING request to view.
    • VIEW - user code processing inside view.
    • FORM - work with forms.
    • ORM.
    • TEMPLATE.
    • RESPONSE.

    All the machinery we want from Django takes place within a single function, which is spread across the entire framework.

    Request


    We wrap the WSGI environment, which is a simple dictionary, in a special object that makes the environment more convenient to work with. For example, it is easier to find out the length of a user's request through something resembling a dictionary than through a byte string that has to be parsed for key-value entries. When working with cookies, I also do not want to work out by hand whether the storage period has expired, or to interpret it somehow.

    # django.core.handlers.wsgi
    class WSGIRequest(HttpRequest):
        @cached_property
        def GET(self):
            return QueryDict(self.environ['QUERY_STRING'])
        @property
        def POST(self):
            self._load_post_and_files()
            return self._post
        @cached_property
        def COOKIES(self):
            return parse_cookie(self.environ['HTTP_COOKIE'])
    

    Request contains parsers, as well as a set of handlers that control how the body of a POST request is processed: whether a file is kept in memory or in temporary storage on disk. All of that is decided inside Request. Request in Django is also an aggregator object into which all middlewares can put the information we need about the session, authentication, and user authorization. We can say that this is also a God Object, only a smaller one.
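
    The idea of a lazy wrapper over environ can be shown with the standard library alone. This is a rough sketch under stated assumptions: the Request class here is hypothetical, and parse_qs and SimpleCookie stand in for Django's QueryDict and parse_cookie.

```python
from functools import cached_property
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


class Request:
    """Wrap a WSGI environ dict and parse its parts only on first access."""

    def __init__(self, environ):
        self.environ = environ

    @cached_property
    def GET(self):
        return parse_qs(self.environ.get("QUERY_STRING", ""))

    @cached_property
    def COOKIES(self):
        cookie = SimpleCookie(self.environ.get("HTTP_COOKIE", ""))
        return {name: morsel.value for name, morsel in cookie.items()}


request = Request({
    "QUERY_STRING": "page=2&tag=a&tag=b",
    "HTTP_COOKIE": "sessionid=abc123; theme=dark",
})
print(request.GET)
print(request.COOKIES)
```

    Because the properties are cached, the query string and the cookie header are parsed at most once per request, and not at all if the view never looks at them.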

    Next, Request goes to the middleware.

    Middlewares


    Middleware is a wrapper that wraps other functions, like a decorator. In its __call__ method, a middleware either returns a response itself or hands control to the middleware it wraps.

    This is what middleware looks like from a programmer’s point of view.

    Settings


    # settings.py
    MIDDLEWARE = [
        'django.middleware.security.SecurityMiddleware',
        'django.middleware.csrf.CsrfViewMiddleware',
        'django.contrib.sessions.middleware.SessionMiddleware',
        'django.contrib.auth.middleware.AuthenticationMiddleware',
    ]
    

    Define


    class Middleware:
        def __init__(self, get_response=None):
            self.get_response = get_response
        def __call__(self, request):
            return self.get_response(request)
    

    From the point of view of Django, middlewares look like a kind of stack:

    # django.core.handlers.base
    def load_middleware(self):
        handler = convert_exception_to_response(self._get_response)
        for middleware_path in reversed(settings.MIDDLEWARE):
            middleware = import_string(middleware_path)
            instance = middleware(handler)
            handler = convert_exception_to_response(instance)
        self._middleware_chain = handler
    

    Apply


    def get_response(self, request):
        set_urlconf(settings.ROOT_URLCONF)
        response = self._middleware_chain(request)
        return response
    

    We take the original get_response function and wrap it in a handler that translates, for example, a permission error or a not found error into the correct HTTP status code. Then we wrap the result in each middleware from the list. The middleware stack grows, and each next one wraps the previous one. This is very similar to applying the same stack of decorators to every view in a project, only done centrally. There is no need to go around the project placing the wrappers by hand; everything is convenient and logical.
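
    The order in which the wrappers run is easiest to see on a toy chain. The sketch below mirrors the reversed loop from load_middleware but leaves out exception conversion; make_middleware, build_chain, and the middleware names are illustrative only.

```python
def view(request):
    return "response to " + request


def make_middleware(name, log):
    """Build a middleware factory that records when it is entered and left."""
    def middleware(get_response):
        def handler(request):
            log.append(name + " in")
            response = get_response(request)
            log.append(name + " out")
            return response
        return handler
    return middleware


def build_chain(middlewares, innermost):
    """Wrap from the last middleware to the first, so the first ends up outermost."""
    handler = innermost
    for middleware in reversed(middlewares):
        handler = middleware(handler)
    return handler


log = []
chain = build_chain(
    [make_middleware("security", log), make_middleware("session", log)],
    view,
)
result = chain("GET /")
print(result)
print(log)
```

    The first entry in the list sees the request first and the response last, which is why the order of MIDDLEWARE in the settings matters.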

    We went through the 7 circles of middlewares; our request survived, and it is time to process it in a view. Next we get to the routing module.

    Routing


    This is where we decide which handler to call for a particular request. The decision is made:

    • based on the URL;
    • more precisely, on the part that the WSGI specification calls path info, available here as request.path_info.

    # django.core.handlers.base
    def _get_response(self, request):
        resolver = get_resolver()
        view, args, kwargs = resolver.resolve(request.path_info)
        response = view(request, *args, **kwargs)
        return response
    

    Urls


    We take the resolver, feed it the URL of the current request, and expect it to return the view function itself along with the arguments, extracted from the same URL, with which to call the view. Then get_response calls the view, handles exceptions, and does something with the result.

    # urls.py
    urlpatterns = [
        path('articles/2003/', views.special_case_2003),
        path('articles/<int:year>/', views.year_archive),
        path('articles/<int:year>/<int:month>/', views.month_archive)
    ]
    

    Resolver


    This is what the resolver looks like:

    # django.urls.resolvers
    _PATH_RE = re.compile(
        r'<(?:(?P<converter>[^>:]+):)?(?P<parameter>\w+)>'
    )
    def resolve(self, path):
        for pattern in self.url_patterns:
            match = pattern.search(path)
            if match:
                return ResolverMatch(
                    self.resolve(match[0])
                )
        raise Resolver404({'path': path})
    

    This is also regexp, but recursive. It walks through the parts of the URL and looks for what the user wants: other users, posts, blogs, or perhaps a converter, for example a specific year that has to be resolved, put into the arguments, and cast to int.

    Characteristically, the recursion depth of the resolve method is always equal to the number of arguments with which the view is called. If something goes wrong and we do not find a matching URL, a not found error is raised.
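
    How a route with converters becomes a regular expression and a set of casts can be sketched as follows. This is a flat simplification, not Django's recursive resolver: compile_route, resolve, and the CONVERTERS table are assumptions made for the example, and strings stand in for view functions.

```python
import re

PARAM_RE = re.compile(r"<(?:(?P<converter>[^>:]+):)?(?P<parameter>\w+)>")
# Each converter is a regex fragment plus a function that casts the match.
CONVERTERS = {"int": (r"[0-9]+", int), "str": (r"[^/]+", str)}


def compile_route(route):
    """Turn 'articles/<int:year>/' into a regex plus per-argument casts."""
    parts, casts, position = [], {}, 0
    for match in PARAM_RE.finditer(route):
        parts.append(re.escape(route[position:match.start()]))
        regex, cast = CONVERTERS[match.group("converter") or "str"]
        parts.append("(?P<%s>%s)" % (match.group("parameter"), regex))
        casts[match.group("parameter")] = cast
        position = match.end()
    parts.append(re.escape(route[position:]))
    return re.compile("^" + "".join(parts) + "$"), casts


def resolve(path, routes):
    """Return the first matching view and its keyword arguments."""
    for route, view in routes:
        regex, casts = compile_route(route)
        match = regex.match(path)
        if match:
            return view, {k: casts[k](v) for k, v in match.groupdict().items()}
    raise LookupError(path)


routes = [
    ("articles/2003/", "special_case_2003"),
    ("articles/<int:year>/", "year_archive"),
    ("articles/<int:year>/<int:month>/", "month_archive"),
]
print(resolve("articles/2005/12/", routes))
```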

    Then we finally get into view - the code that the programmer wrote.

    View


    In its simplest form, a view is a function that returns a response from a request, but inside it we perform logical tasks ("for, if, someday"), many of them repetitive. Django provides class-based views, where you specify only the particular details and the class itself takes care of the rest of the behavior in the correct way.

    # django.views.generic.edit
    class ContactView(FormView):
        template_name = 'contact.html'
        form_class = ContactForm
        success_url = '/thanks/'
    

    Method flowchart


    self.dispatch()
    self.post()
    self.get_form()
    self.form_valid()
    self.render_to_response()
    

    The dispatch method of this instance is what sits in the URL mapping instead of a function. Based on the HTTP verb, dispatch works out which method to call: a POST came in, so we most likely want to instantiate the form object and, if the form is valid, save it to the database and show the template. All of this is done through the large number of mixins that make up this class.
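
    Dispatching on the HTTP verb boils down to a getattr call. The sketch below is a stripped-down illustration, not Django's View class: requests are reduced to a method name and a dictionary, and responses to (status, body) tuples.

```python
class View:
    http_method_names = ["get", "post"]

    def dispatch(self, method, *args, **kwargs):
        """Pick the handler named after the lowercased HTTP verb."""
        name = method.lower()
        if name in self.http_method_names and hasattr(self, name):
            handler = getattr(self, name)
        else:
            handler = self.http_method_not_allowed
        return handler(*args, **kwargs)

    def http_method_not_allowed(self, *args, **kwargs):
        return 405, "Method Not Allowed"


class ContactView(View):
    def get(self):
        return 200, "empty form"

    def post(self, data):
        # A valid form redirects; an invalid one is shown again with errors.
        if data.get("email"):
            return 302, "/thanks/"
        return 200, "form with errors"


view = ContactView()
print(view.dispatch("GET"))
print(view.dispatch("POST", {"email": "user@example.com"}))
print(view.dispatch("DELETE"))
```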

    Form


    The form must be read from the socket before it gets into the Django view, through the same file handle that lies in the WSGI environment. form-data is a byte stream in which separators are declared, so we can read the blocks between them and build something out of them. A block can be a key-value pair if it is a field, then part of a file, then a field again: everything is mixed.

    Content-Type: multipart/form-data;boundary="boundary"
    --boundary
    name="field1"
    value1
    --boundary
    name="field2"
    value2
    

    Parser


    The parser consists of 3 parts.

    A chunk iterator that performs predictable reads from the byte stream is turned into an iterator that yields data split at boundaries. It guarantees that whatever it returns ends at a boundary. This way the parser does not need to store the state of the connection, or track whether the socket has been read, which keeps the data-processing logic minimal.

    Next, the generator is wrapped in LazyStream, which turns it back into a file-like object, again with predictable reads. Now the parser can walk through chunks of bytes and build key-value pairs from them.

    field and data here are always strings. If we receive a datetime in ISO format, the Django form (written by the programmer) converts it, using its fields, into, for example, a timestamp.

    # django.http.multipartparser
    self._post = QueryDict(mutable=True)
    stream = LazyStream(ChunkIter(self._input_data))
    for field, data in Parser(stream):
        self._post.appendlist(field, force_text(data))
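
    What "reading blocks between separators" means can be shown on an in-memory body. Unlike the streaming parser described above, this sketch loads the whole body at once and handles only simple text fields; iter_parts is a hypothetical helper, and the Content-Disposition headers follow the usual multipart/form-data layout.

```python
def iter_parts(body, boundary):
    """Yield (name, value) pairs from a small multipart/form-data body."""
    delimiter = b"--" + boundary
    for chunk in body.split(delimiter):
        chunk = chunk.strip(b"\r\n")
        if not chunk or chunk == b"--":  # preamble or closing delimiter
            continue
        headers, _, data = chunk.partition(b"\r\n\r\n")
        name = headers.split(b'name="', 1)[1].split(b'"', 1)[0]
        yield name.decode("utf-8"), data.decode("utf-8")


body = (
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="field1"\r\n\r\n'
    b"value1\r\n"
    b"--boundary\r\n"
    b'Content-Disposition: form-data; name="field2"\r\n\r\n'
    b"value2\r\n"
    b"--boundary--\r\n"
)
fields = list(iter_parts(body, b"boundary"))
print(fields)
```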
    

    Next the form most likely wants to save itself to a database, and this is where the Django ORM begins.

    ORM


    ORM queries are written in a DSL that looks roughly like this:

    # models.py
    Entry.objects.exclude(
        pub_date__gt=date(2005, 1, 3),
        headline='Hello',
    )
    

    From the keyword arguments, a corresponding SQL expression is assembled:

    SELECT * WHERE NOT (pub_date > '2005-1-3' AND headline = 'Hello')
    

    How does this happen?

    Queryset


    Under the hood, the exclude method has a Query object. The arguments of the function are passed to this object, and it builds a hierarchy of objects, each of which can turn itself into a separate piece of the SQL query as a string.

    When the tree is traversed, each node polls its child nodes, receives the nested SQL fragments, and as a result we can construct the SQL as a string. For example, a key-value pair does not become a separate SQL field; it becomes a comparison of the field with the value. Conjunction and negation of queries work the same way: as a recursive tree traversal in which a cast to SQL is called on each node.

    # django.db.models.query
    sql.Query(Entry).where.add(
        ~Q(
            Q(F('pub_date') > date(2005, 1, 3)) &
            Q(headline='Hello')
        )
    )
    

    Compiler


    # django.db.models.expressions
    class Q(tree.Node):
        AND = 'AND'
        OR = 'OR'
        def as_sql(self, compiler, connection):
            return self.template % self.field.get_lookup('gt')
    

    Output


    >>> Q(headline='Hello')
    # headline = 'Hello'
    >>> F('pub_date')
    # pub_date
    >>> F('pub_date') > date(2005, 1, 3)
    # pub_date > '2005-1-3'
    >>> Q(...) & Q(...)
    # ... AND ...
    >>> ~Q(...)
    # NOT …
    

    A small helper, the compiler, is passed to the as_sql method. It can tell the MySQL dialect from PostgreSQL and correctly arrange the syntactic sugar used in the dialect of a particular database.
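
    The recursive "every node renders itself" idea can be reproduced in miniature. The classes below are illustrative stand-ins, not Django's Q, F, or compiler: Expression, Lookup, and Node are hypothetical names, there is no dialect handling, and values are returned as separate parameters rather than embedded in the SQL string.

```python
class Expression:
    def __and__(self, other):
        return Node([self, other], connector="AND")

    def __invert__(self):
        return Node([self], negated=True)


class Lookup(Expression):
    """A leaf: one field compared with one value."""

    def __init__(self, field, operator, value):
        self.field, self.operator, self.value = field, operator, value

    def as_sql(self):
        # Values travel separately as parameters, never inside the SQL string.
        return "%s %s %%s" % (self.field, self.operator), [self.value]


class Node(Expression):
    """An inner node: joins the SQL of its children and may negate it."""

    def __init__(self, children, connector="AND", negated=False):
        self.children, self.connector, self.negated = children, connector, negated

    def as_sql(self):
        parts, params = [], []
        for child in self.children:
            sql, child_params = child.as_sql()
            parts.append(sql)
            params.extend(child_params)
        sql = (" %s " % self.connector).join(parts)
        if len(parts) > 1:
            sql = "(%s)" % sql
        if self.negated:
            sql = "NOT " + sql
        return sql, params


tree = ~(Lookup("pub_date", ">", "2005-01-03") & Lookup("headline", "=", "Hello"))
sql, params = tree.as_sql()
print(sql, params)
```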

    DB routing


    Once we have the SQL query, the model knocks on DB routing and asks which database it lives in. In 99% of cases it will be the default database; in the remaining 1%, some database of its own.

    # django.db.utils
    class ConnectionRouter:
        def db_for_read(self, model, **hints):
            if model._meta.app_label == 'auth':
                return 'auth_db'
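
    The "99% default" behavior follows from how routers are consulted: each one may answer or abstain, and the default alias is the fallback. The sketch below is a simplification under stated assumptions: routers receive an app label instead of a model class, and the module-level db_for_read function is a hypothetical stand-in for the master router.

```python
DEFAULT_DB_ALIAS = "default"


class AuthRouter:
    def db_for_read(self, app_label, **hints):
        if app_label == "auth":
            return "auth_db"
        return None  # no opinion: let the next router decide


def db_for_read(routers, app_label, **hints):
    """Ask each router in turn; fall back to the default database."""
    for router in routers:
        chosen = router.db_for_read(app_label, **hints)
        if chosen is not None:
            return chosen
    return DEFAULT_DB_ALIAS


routers = [AuthRouter()]
print(db_for_read(routers, "auth"))
print(db_for_read(routers, "blog"))
```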
    

    Wrapping the database driver's library-specific interface, such as Python MySQL or Psycopg2, produces a universal object that Django can work with. There is a wrapper for cursors and a wrapper for transactions.

    Connection pool


    # django.db.backends.base.base
    class BaseDatabaseWrapper:
        def commit(self):
            self.validate_thread_sharing()
            self.validate_no_atomic_block()
            with self.wrap_database_errors:
                return self.connection.commit()
    

    Over this particular connection, we send queries to the socket that knocks on the database and wait for them to execute. The wrapper over the library reads the database's response in human-readable form, as a record, and Django assembles a model instance from this data using Python types. This is not a complicated step.

    We wrote something to the database, read something back, and decided to tell the user about it with an HTML page. For this, Django has a template language, disliked by the community, that looks somewhat like a programming language, only inside an HTML file.

    Template


    from django.template.loader import render_to_string
    render_to_string('my_template.html', {'entries': ...})
    

    Code


    <ul>
      {% for entry in entries %}
        <li>{{ entry.name }}</li>
      {% endfor %}
    </ul>

    Parser


    # django.template.base
    BLOCK_TAG_START = '{%'
    BLOCK_TAG_END = '%}'
    VARIABLE_TAG_START = '{{'
    VARIABLE_TAG_END = '}}'
    COMMENT_TAG_START = '{#'
    COMMENT_TAG_END = '#}'
    tag_re = (re.compile('(%s.*?%s|%s.*?%s|%s.*?%s)' %
              (re.escape(BLOCK_TAG_START),
               re.escape(BLOCK_TAG_END),
               re.escape(VARIABLE_TAG_START),
               re.escape(VARIABLE_TAG_END),
               re.escape(COMMENT_TAG_START),
               re.escape(COMMENT_TAG_END))))
    

    Surprise: regexp again. Only there should be a comma at the end, and the list goes on far below. This is probably the most complicated regexp I have seen in this project.

    Lexer


    The template handler and interpreter are pretty simple. There is a lexer that uses regexp to translate text into a list of small tokens.

    # django.template.base
    def tokenize(self):
        lineno = 1
        for bit in tag_re.split(self.template_string):
            lineno += bit.count('\n')
            yield bit
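
    The splitting step is short enough to run on its own. In this sketch the pattern is written out literally instead of being built from the constants, and the (kind, content) tuples are an assumption made for readability; Django produces Token objects instead.

```python
import re

tag_re = re.compile(r"(\{%.*?%\}|\{\{.*?\}\}|\{#.*?#\})")


def tokenize(template_string):
    """Yield (kind, content) pairs; re.split alternates plain text and tags."""
    for index, bit in enumerate(tag_re.split(template_string)):
        if not bit:
            continue
        if index % 2 == 0:  # even positions are the text between tags
            yield ("text", bit)
        elif bit.startswith("{%"):
            yield ("block", bit[2:-2].strip())
        elif bit.startswith("{{"):
            yield ("var", bit[2:-2].strip())
        else:
            yield ("comment", bit[2:-2].strip())


template = "{% for entry in entries %}<li>{{ entry.name }}</li>{% endfor %}"
tokens = list(tokenize(template))
for token in tokens:
    print(token)
```

    Because the pattern has a single capturing group, re.split keeps the tags in the result, so text and tags strictly alternate and no state machine is needed.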
    

    We iterate over the list of tokens and ask each one: "Who are you? Let's wrap you in a tag node." For example, if this is the start of some if or for, the parser picks the appropriate tag handler. The for handler, in turn, tells the parser: "Read me the list of tokens up to the closing tag."

    The operation goes to the parser again.

    A node, tag, and parser are mutually recursive things, and the depth of the recursion is usually equal to the nesting of the template itself by tags.

    Parser


    def parse():
        while tokens:
            token = tokens.pop()
            if token.startswith(BLOCK_TAG_START):
                yield TagNode(token)
            elif token.startswith(VARIABLE_TAG_START):
                ...
    

    The tag handler gives us a specific node, for example a for loop node, which has a render method.

    For loop


    # django.template.defaulttags
    @register.tag('for')
    def do_for(parser, token):
        args = token.split_contents()
        body = parser.parse(until=['endfor'])
        return ForNode(args, body)
    

    For node


    class ForNode(Node):
        def render(self, context):
            with context.push():
                for i in self.args:
                    yield self.body.render(context)
    

    The render method walks the render tree. Each parent node can go to a child node and ask it to render itself. Programmers are used to displaying variables in the template. This is done through context, which looks like a regular dictionary. In fact it is a stack of dictionaries that emulates scope when we enter a tag. For example, if some other tag changes context inside the for loop, the changes are rolled back when we leave the loop. This is convenient, because it is hard to work when everything is global.
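
    The stack-of-dictionaries behavior is available in the standard library, which makes the rollback easy to demonstrate. ChainMap here is only a stand-in for Django's Context, and render_for is a hypothetical helper that plays the role of the for node.

```python
from collections import ChainMap


def render_for(context, loop_var, items, render_body):
    """Render the loop body once per item, each time in a fresh inner scope."""
    output = []
    for item in items:
        scope = context.new_child({loop_var: item})  # push a dictionary
        output.append(render_body(scope))
        # scope is discarded here, which plays the role of the pop
    return "".join(output)


context = ChainMap({"entries": ["first", "second"]})
html = render_for(
    context,
    "entry",
    context["entries"],
    lambda scope: "<li>%s</li>" % scope["entry"],
)
print(html)
print("entry" in context)
```

    The loop variable is visible inside the body, together with everything from the outer scope, but the outer context never sees it.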

    Response


    Finally we got our line with the HTTP response:

    Hello World!

    We can give the line to the user.

    • We return this response from the view.
    • The view hands it to the middlewares.
    • The middlewares modify, complement, and improve the response.
    • The response is iterated inside WSGIHandler and written to the socket in parts, and the browser receives the response from our server.

    All of the famous startups that were written in Django, such as Bitbucket or Instagram, started with such a small cycle that every programmer went through.

    All this, and the talk at Moscow Python Conf ++, is there to help you better understand what you have in your hands and how to use it. In any magic there is a large share of regexp that you must know how to cook.

    Artyom Malyshev and 23 other great speakers on April 5 will again give us a lot of food for thought and discussion on the topic of Python at the Moscow Python Conf ++ conference . Study the schedule and join the exchange of experience in solving a variety of problems using Python.
