We'll figure it out once and for all: AJAX, "Cyrillic characters", encodings, prototype.js, jQuery, JsHttpRequest

    AJAX is a technology. One of its commonly used techniques is
    sending requests through an XMLHttpRequest object.


    How do you send and receive AJAX requests in the encoding you need? Can you stay with single-byte encodings, or is there no way around UTF-8? This article answers these questions once and for all.



    By the way, this is a reprint of my own text.

    One more note: strictly speaking, there are no classes in JavaScript, but for convenience we will use this terminology.

    The XMLHttpRequest documentation says that the browser should support the following types of
    HTTP requests: GET, POST, HEAD, PUT, DELETE, OPTIONS.

    As of today, though, JavaScript can send only GET and POST requests
    through an XMLHttpRequest object.

    So let's look at these two request types:

    1. GET requests:

    All information for the server-side script can be passed only through the URL and the headers.

    For example,

    GET moy-rebenok/ajax.php?f=324
    Host: moy-rebenok
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Language: ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3
    Accept-Encoding: gzip,deflate
    Accept-Charset: windows-1251,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    Referer: moy-rebenok/ajax.html

    On the server, ajax.php can then read the value of the variable f
    with the $_GET['f'] construct.

    Why is there a problem with Russian letters? Because, as you know, Russian letters cannot be used in a URL as they are. They have to be conveyed somehow using the Latin letters, digits and other characters allowed in a URL after the '?' sign.

    People agreed that they would do this using escape sequences.

    The escape sequence for the word “привет” (“hi”) in windows-1251:
    %EF%F0%E8%E2%E5%F2

    The escape sequence for the same word in UTF-8:
    %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82

    The escape sequence for the same word in KOI8-R:
    %D0%D2%C9%D7%C5%D4

    (A '%' sign followed by the hexadecimal code of each byte.)
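
    To make the rule concrete, here is a minimal sketch that builds such sequences from raw bytes. The helper name percentEncodeBytes is made up for this illustration, and TextEncoder is used only because it always yields UTF-8 bytes; the windows-1251 bytes are simply copied from the example above.

```javascript
// Percent-encode every byte of a byte sequence: '%' followed by two hex digits.
// (Hypothetical helper; it escapes all bytes, including ASCII ones.)
function percentEncodeBytes(bytes) {
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

// TextEncoder always produces the UTF-8 bytes of a string.
const utf8Escaped = percentEncodeBytes(new TextEncoder().encode("привет"));

// The windows-1251 bytes of the same word, taken from the text above.
const cp1251Escaped = percentEncodeBytes([0xef, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

console.log(utf8Escaped);   // %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
console.log(cp1251Escaped); // %EF%F0%E8%E2%E5%F2
```

    The same six letters become twelve escaped bytes in UTF-8 and six in windows-1251, which is why the server has to know which encoding the client used.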

    So you can pass Russian letters like this, for example:

    GET moy-rebenok/ajax.php?f=%EF%F0%E8%E2%E5%F2
    Host: ...

    or like this:

    GET moy-rebenok/ajax.php?f=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
    Host: ...

    Nothing restricts you here.

    By the way, a GET request does not need a Content-Type header,
    because there is no content: there is only a request to a specific address.
    All variables are passed to the server through the URL.

    How do you produce the escape sequence you need, in the encoding you need?

    You can build it by hand in any way you like, in JavaScript of course.
    Again, nothing restricts you.
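
    As an illustration of the "by hand" route, here is a rough sketch of a function that produces windows-1251 escape sequences. The name escapeWindows1251 is hypothetical, and the sketch covers only ASCII and the Russian alphabet; a real implementation would need the full windows-1251 table.

```javascript
// Hypothetical helper: percent-encode a string as windows-1251 bytes.
// Only ASCII and the Russian alphabet are handled in this sketch.
function escapeWindows1251(str) {
  let out = "";
  for (const ch of str) {
    const code = ch.codePointAt(0);
    let byte;
    if (code >= 0x0410 && code <= 0x044f) {
      byte = code - 0x0410 + 0xc0; // А..я occupy 0xC0..0xFF in windows-1251
    } else if (code === 0x0401) {
      byte = 0xa8; // Ё
    } else if (code === 0x0451) {
      byte = 0xb8; // ё
    } else if (code < 0x80) {
      byte = code; // ASCII is the same in windows-1251
    } else {
      throw new RangeError("Character not covered by this sketch: " + ch);
    }
    // Keep unreserved ASCII characters readable, escape everything else.
    if (/[A-Za-z0-9\-_.~]/.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(escapeWindows1251("привет")); // %EF%F0%E8%E2%E5%F2
```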

    But for convenience, people usually use one of the three functions already defined in JavaScript:

    a) escape()
    b) encodeURI()
    c) encodeURIComponent()

    In order:

    a) escape()

    Leaves Latin letters, digits and the characters @ * _ + - . / as they are and encodes everything else
    either as %xx or as %uxxxx.
    In the second form, xxxx is the character's code in Unicode, not in UTF-8.

    (Unicode assigns a number to each character; UTF-8 is one of the ways to write that number as bytes.)

    You should not use this function: its result depends on the browser, it is not standardized by the W3C, and it dates back to the wild 90s.

    Moreover, it is hard to process a string in such a hodgepodge format on the server in any sane (or at least fast) way.

    The escape() function is used by JsHttpRequest, a library written by our compatriot.
    Not because the library is bad, but because it was created to work with all browsers
    (including the oldest ones).
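
    A small sketch of what this mixed format looks like, and of the extra work it pushes onto the receiving side. The decoder decodeLegacyEscape is a hypothetical helper written for this illustration; it is not taken from JsHttpRequest.

```javascript
// escape() is a legacy global that still exists in browsers and in Node.js.
const legacy = escape("привет & co");
console.log(legacy); // %u043F%u0440%u0438%u0432%u0435%u0442%20%26%20co

// Hypothetical decoder: the receiver has to handle two escape forms at once,
// %uXXXX (a 16-bit character code) and %XX (a code below 256).
function decodeLegacyEscape(s) {
  return s.replace(/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/g, (match, u4, h2) =>
    String.fromCharCode(parseInt(u4 || h2, 16))
  );
}

console.log(decodeLegacyEscape(legacy)); // привет & co
```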

    b) encodeURI()

    Leaves Latin letters, digits and the characters ! @ # $ & * ( ) = : / ; ? + ' , - _ . ~ as they are; everything else
    is encoded as
    escape sequences of its UTF-8 bytes.

    c) encodeURIComponent()

    Leaves Latin letters, digits and the characters ! * ( ) ' - _ . ~ as they are; everything else is encoded as
    escape sequences of its UTF-8 bytes.
    Standardized: like encodeURI(), it is part of the ECMAScript specification.

    jQuery and prototype.js use it when sending requests with the GET method.
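
    The practical difference between the last two functions shows up as soon as a value contains '&' or '='. The sketch below is only an illustration; URLSearchParams stands in for the server-side parser, and the sample value is made up.

```javascript
const value = "тест&x=1";

// encodeURI() is meant for whole URLs, so it leaves '&' and '=' alone.
const viaEncodeURI = encodeURI(value);
// encodeURIComponent() is meant for a single value, so it escapes them.
const viaComponent = encodeURIComponent(value);

console.log(viaEncodeURI); // %D1%82%D0%B5%D1%81%D1%82&x=1
console.log(viaComponent); // %D1%82%D0%B5%D1%81%D1%82%26x%3D1

// URLSearchParams plays the role of the server-side query parser here.
const broken = new URLSearchParams("f=" + viaEncodeURI);
const intact = new URLSearchParams("f=" + viaComponent);

console.log(broken.get("f"), broken.get("x")); // тест 1
console.log(intact.get("f"), intact.get("x")); // тест&x=1 null
```

    With encodeURI() the value silently splits into two parameters, which is a good reason for libraries to prefer encodeURIComponent() for parameter values.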

    You may have heard from someone: “XMLHttpRequest only works with UTF-8.”
    Now you know that this is not entirely true.

    With a GET request, the encoding of the transmitted data is not declared anywhere at all (!).
    Once again: the 'Content-type' header, where a charset could be specified, is
    not used in GET requests.

    But since JavaScript has two convenient functions that turn any string into UTF-8 escape sequences, everyone uses them and works with UTF-8.

    That is why jQuery does not even let you specify a charset when sending a request.
    That is why Prototype.js still sends UTF-8 when you set encoding = 'windows-1251' and use a GET request.

    It is simply because the code of these libraries uses the encodeURIComponent() function.
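
    Roughly, the parameter serialization in such libraries boils down to something like the sketch below. This is not the actual jQuery or Prototype.js code; toQueryString is a hypothetical helper, and the XMLHttpRequest part runs only in a browser.

```javascript
// Hypothetical helper: serialize an object of parameters into a query string.
function toQueryString(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])))
    .join("&");
}

const url = "ajax.php?" + toQueryString({ f: "привет", page: 2 });
console.log(url); // ajax.php?f=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82&page=2

// Sending it only makes sense where XMLHttpRequest exists, i.e. in a browser.
if (typeof XMLHttpRequest !== "undefined") {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onload = () => console.log(xhr.responseText);
  xhr.send(null); // a GET request has no body, hence no Content-Type
}
```

    Whatever encoding the page itself uses, the escape sequences in the URL come out in UTF-8, because encodeURIComponent() knows no other.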

    Well, there is absolutely nothing wrong with that. All you need to do to work
    in PHP in your
    usual encoding is to use iconv:

    $f = iconv('UTF-8', 'windows-1251', $_GET['f']);

    By the way, we can do this precisely because $_GET understands
    escape sequences. Thanks to the creators of PHP.

    That is, when a GET request arrives, PHP looks at the URL and builds the $_GET array for us, and we
    then
    do what we want with it. But that much seems clear.
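
    To see why the source encoding passed to iconv has to match what the client really sent, here is a sketch of the receiving side, written in JavaScript for consistency with the other examples. It only mimics the unescaping step; percentDecodeToBytes is a hypothetical helper, and details such as '+' meaning a space are left out.

```javascript
// Turn "%D0%BF..." back into the raw bytes the client actually sent.
function percentDecodeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    if (s[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(s.slice(i + 1, i + 3))) {
      bytes.push(parseInt(s.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(s.charCodeAt(i) & 0xff); // outside escapes a query string is ASCII
    }
  }
  return Uint8Array.from(bytes);
}

const raw = percentDecodeToBytes("%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82");
console.log(raw.length); // 12

// Interpreting those bytes as UTF-8 recovers the text.
const text = new TextDecoder("utf-8", { fatal: true }).decode(raw);
console.log(text); // привет

// Bytes in another encoding are not valid UTF-8, so a strict decoder rejects them.
let rejected = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(percentDecodeToBytes("%EF%F0%E8%E2%E5%F2"));
} catch (e) {
  rejected = true;
}
console.log(rejected); // true
```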

    2. POST requests:

    Things are more interesting here.

    Suppose such a request arrives at the server. The PHP handler looks at the Content-type and, depending on it, fills the $_POST array and/or the $HTTP_RAW_POST_DATA variable.

    It fills $_POST when the Content-type is multipart/form-data or
    application/x-www-form-urlencoded.

    What kind of Content-type is this?
    A very convenient one: it lets you pass several variables to your PHP script.

    What is a POST request, in essence?
    It is headers followed by content. The content is, in general, arbitrary: just bytes, bytes, bytes.

    But JavaScript usually needs to send not just bytes, bytes, bytes, but several key=value pairs,
    as in a GET request.

    So people agreed on a convenient type for this: x-www-form-urlencoded.
    To pass f = 123 and gt = null, you send this content:

    f=123&gt=null

    Looks familiar? Of course it does; the type is called x-www-form-urlencoded for a reason.
    Everything is the same as in a GET request.
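
    A minimal sketch of such a POST request, assuming a browser environment for the sending part. The helper toFormBody is hypothetical, and ajax.php is just the script name used throughout the article.

```javascript
// Hypothetical helper: build an application/x-www-form-urlencoded body.
function toFormBody(fields) {
  return Object.entries(fields)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(String(value)))
    .join("&");
}

const body = toFormBody({ f: 123, gt: null });
console.log(body); // f=123&gt=null

// The sending part needs a browser, where XMLHttpRequest exists.
if (typeof XMLHttpRequest !== "undefined") {
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "ajax.php", true);
  // This header is what makes PHP parse the body into $_POST.
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.send(body);
}
```

    The only differences from the GET case are where the pairs travel (the body instead of the URL) and the Content-Type header that announces their format.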

    And how do the jQuery and prototype.js libraries build the content?

    That's right, with the same encodeURIComponent() function, so the escape sequences will be in UTF-8 (whatever encoding you set in prototype.js).

    That's all. There is one more possibility, though: you can send not x-www-form-urlencoded (i.e. not parameters) but ordinary text or binary content, which can then be read through $HTTP_RAW_POST_DATA.

    To do this, set the Content-type to text/xml or application/octet-stream and, in the same header, set charset="windows-1251".

    We pass a string in the desired encoding to the send() function. (Prototype.js wraps this call in the new Ajax.Request(...) construct.)

    And then... the XMLHttpRequest object converts this string to UTF-8, no matter what encoding it was meant to be in. This is what the W3C documentation says, and this is what it really does.
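
    The reason is easier to see once you remember that a JavaScript string carries no byte encoding of its own. In the sketch below TextEncoder is only a stand-in for the string-to-bytes step that send() performs; the <msg> element is a made-up payload, and the sending part assumes a browser.

```javascript
// A JavaScript string is a sequence of 16-bit code units, not bytes.
const message = "<msg>привет</msg>";
console.log(message.length); // 17

// Stand-in for what happens to a string body on the wire: UTF-8 bytes.
const wire = new TextEncoder().encode(message);
console.log(wire.length); // 23 (11 ASCII bytes + 2 bytes per Cyrillic letter)

if (typeof XMLHttpRequest !== "undefined") {
  const xhr = new XMLHttpRequest();
  xhr.open("POST", "ajax.php", true);
  // Declaring another charset does not re-encode the string body.
  xhr.setRequestHeader("Content-Type", "text/xml; charset=windows-1251");
  xhr.send(message);
}
```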

    Conclusions:

    1. Directly through XMLHttpRequest, only UTF-8 encoded strings can be sent.

    2. Strings can be sent “in any other encoding”, so to speak, if their non-Latin characters
    are escaped.

    3. JavaScript has three functions that escape non-Latin characters:
    escape(), encodeURI() and encodeURIComponent().

    The first produces a crooked Unicode form; the other two produce UTF-8.

    You can write your own functions that generate escape sequences in any encoding. It is possible, but not necessary. On the contrary, one should be glad that functions exist that turn text in any encoding into UTF-8. This is an extremely beautiful fact.

    The scheme in which all xhtml pages are in windows-1251, AJAX sends windows-1251 from the server to the client and UTF-8 from the client to the server is absolutely acceptable and is used on most sites.

    Just remember to use iconv as described above. And for the server to return the JSON (or whatever you have) in the correct encoding, i.e. the same encoding in which all the xhtml pages are served, simply send this header at the beginning of your ajax.php:

    header('Content-type: text/html; charset=windows-1251');

    And everything will be ok.

    Lastly, a little subjective opinion:

    Use jQuery, love people, give gifts.
