Error in HTTP protocol

    In this article I want to talk not so much about the error in RFC 2616 as about my approach to writing an HTTP message parser, and to show its advantages and disadvantages. The approach rests on two principles: “better to lose an hour up front and then get there in five minutes” and “let the computer do the work while I rest”.

    So, the task as a whole: implement an HTTP server and, in particular, an HTTP parser. Version 1.1 of the protocol is described in RFC 2616. The specification contains two kinds of material: the narrative text and the BNF rules that define the syntax of messages. BNF rules are well formalized; there is even RFC 5234, which describes the BNF grammar using BNF rules themselves. Admittedly, RFC 5234 was published later than RFC 2616 (HTTP), and its notation has several minor differences.

    A Short Tour of BNF

    The BNF grammar is quite simple, so I will give one example with explanations; I think that is enough to get the idea (for those who are not familiar with it).
    start-line      = Request-Line | Status-Line
    generic-message = start-line
                     *(message-header CRLF)
                     CRLF
                     [ message-body ]
    

    In plain words, this reads:
    1) start-line is either a Request-Line or a Status-Line (these are also rules, defined elsewhere in the specification);
    2) generic-message is the sequence start-line, *(message-header CRLF), CRLF and, optionally, message-body (the square brackets [...] mark an element as optional). Here *(message-header CRLF) allows zero or more repetitions of the concatenation of the two rules message-header and CRLF.
    It resembles regular expressions, which is not surprising.
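
To make the analogy with regular expressions concrete, here is a minimal Python sketch of the generic-message rule. The patterns for start-line and message-header are deliberately simplified stand-ins, not the full RFC 2616 rules, and the names `GENERIC_MESSAGE` and `is_generic_message` are made up for this illustration.

```python
import re

CRLF = r"\r\n"
# Simplified stand-ins (assumptions): the real rules are much stricter.
START_LINE = r"[^\r\n]+" + CRLF        # Request-Line | Status-Line, both end in CRLF
MESSAGE_HEADER = r"[^\s:]+:[^\r\n]*"   # field-name ":" [ field-value ]

# generic-message = start-line *(message-header CRLF) CRLF [ message-body ]
GENERIC_MESSAGE = re.compile(
    START_LINE
    + r"(?:" + MESSAGE_HEADER + CRLF + r")*"  # *( ... ): zero or more repetitions
    + CRLF                                    # the empty line that ends the headers
    + r"[\s\S]*"                              # [ message-body ]: optional, may be empty
)


def is_generic_message(text: str) -> bool:
    """Return True if the whole text matches the simplified generic-message rule."""
    return GENERIC_MESSAGE.fullmatch(text) is not None
```

The BNF operators map almost one to one: `*( ... )` becomes `(?: ... )*`, and the optional `[ message-body ]` becomes a pattern that may match the empty string.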

    About the mistake

    To help trace the error, I have listed a number of rules below. Skimming through them, you can see the following: a Request consists of a Request-Line followed by repeated headers, each terminated by CRLF. Among the header groups there is entity-header, which, unlike general-header and request-header, includes extension-header. The extension-header rule allows non-standard headers; in other words, it is the rule that permits adding a header such as the following to a request
    My-Header: I am server
    while the request remains valid. In addition, this rule makes it possible to write protocol extensions. But since extension-header accepts any header, including the standard ones (From, Accept, Host, Referer, etc.), the following situation arises: if a message contains an invalid standard header, the rule that describes that header will reject it, but the extension-header rule will accept it, which is incorrect.
    Request       = Request-Line              ; Section 5.1
                    *(( general-header        ; Section 4.5
                     | request-header         ; Section 5.3
                     | entity-header ) CRLF)  ; Section 7.1
                     CRLF
                    [ message-body ]          ; Section 4.3
    entity-header  = Allow                    ; Section 14.7
                   | Content-Encoding         ; Section 14.11
                   | Content-Language         ; Section 14.12
                   | Content-Length           ; Section 14.13
                   | Content-Location         ; Section 14.14
                   | Content-MD5              ; Section 14.15
                   | Content-Range            ; Section 14.16
                   | Content-Type             ; Section 14.17
                   | Expires                  ; Section 14.21
                   | Last-Modified            ; Section 14.29
                   | extension-header
    extension-header = message-header
    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
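    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>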
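
The gap can be reproduced with a small Python sketch. It assumes a simplified field-value (any characters except CR and LF) and checks a header line against only two rules, Content-Length and extension-header; the function name `matching_rules` is hypothetical.

```python
import re

# RFC 2616 token: any CHAR except CTLs and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# Content-Length = "Content-Length" ":" 1*DIGIT
# (optional surrounding whitespace is allowed here for convenience)
CONTENT_LENGTH = re.compile(r"Content-Length:[ \t]*[0-9]+[ \t]*", re.IGNORECASE)

# extension-header = message-header = field-name ":" [ field-value ]
# Assumption: field-value is simplified to "anything except CR and LF".
EXTENSION_HEADER = re.compile(TOKEN + r":[^\r\n]*")


def matching_rules(line: str) -> list:
    """List the names of the rules that accept the given header line."""
    rules = []
    if CONTENT_LENGTH.fullmatch(line):
        rules.append("Content-Length")
    if EXTENSION_HEADER.fullmatch(line):
        rules.append("extension-header")
    return rules


print(matching_rules("My-Header: I am server"))  # accepted as an extension, as intended
print(matching_rules("Content-Length: 42"))      # accepted by both rules
print(matching_rules("Content-Length: abc"))     # rejected by its own rule, yet still accepted
```

The last call is the problematic case: the malformed Content-Length header fails the rule written for it but still passes as an extension-header, so the request as a whole remains syntactically valid.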

    Unfortunately, BNF has no way to express a rule of the form “something, excluding something else”. In the specification, such rules are described informally:
    ctext          = <any TEXT excluding "(" and ")">
    qdtext         = <any TEXT except <">>
    

    The correct field-name rule should look something like this:
    field-name     = <token, except the names of the standard headers>
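
Outside of BNF, an exclusion of this kind is easy to state as a set difference. The following Python sketch is one possible reading of the idea, not a rule taken from the specification: `STANDARD_NAMES` is an illustrative subset only, and `is_extension_field_name` is a hypothetical helper.

```python
import re

# RFC 2616 token: any CHAR except CTLs and separators.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# Assumption: a small illustrative subset; a real list would contain
# every header name defined in RFC 2616.
STANDARD_NAMES = {"accept", "content-length", "content-type", "from", "host", "referer"}


def is_extension_field_name(name: str) -> bool:
    """A token that is not one of the standard header names (case-insensitive)."""
    return TOKEN.fullmatch(name) is not None and name.lower() not in STANDARD_NAMES
```

With this check in place, a line such as `Content-Length: abc` can no longer slip through as an extension header; only the Content-Length rule itself gets to decide whether it is valid.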


    The main thing

    I had intended to talk about tools that build a DFA from BNF rules, that is, ideally, tools that generate the parser automatically. But it turned out that the attention-grabbing topic from the headline took up too much room, so the tools will have to wait until next time.

    UPD: To those who downvoted, please share your opinion. If I am wrong, I would like to know where.
