Solving problems with broken encoding: accept-charset

    I ran into the following problem: many programmers either do not know that accept-charset exists or ignore the attribute. When I joined my current company I started developing a REST API service, but from time to time bugs like “XML response is broken for ...” landed on me. I had to dig a little deeper into the GUI and found that my favorite attribute was missing. Why do we need yet another attribute, you ask?


    What accept-charset is good for has long been described by the W3C at this link (http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset)

    Let's imagine the situation:


    - you have a site
    - you specified utf-8 encoding in meta
    - you configured the server side to work with utf-8 (database, backend, etc.)

    You test it: go to the site, submit the form, and everything is fine. However, many people forget the following:
    1. in most cases the browser detects the encoding automatically, which is why your site posts data to the server side correctly
    2. there are people who set the encoding manually
    3. there are people who like to play with your site
    4. other: bots, testing software, etc.

    What happens in this case if the FORM tag does not have the attribute in question? Try it:
    1. open your site
    2. change the encoding in the browser, let it be ISO-8859-1
    3. try to enter data in Russian or, for example, in German with umlauts; if you want to go further, try special characters
    4. post your form
    5. open your record in the database and see in which encoding your characters arrived and how the server side processed them

    Answer: you will receive text encoded in ISO-8859-1. The browser follows the standards and a fixed order of steps when it determines the encoding, so if ISO-8859-1 is forced, the browser will obey and use ISO-8859-1 to send the form data.
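
    The effect can be reproduced without a browser. Below is a minimal sketch in Python that assumes the form data is percent-encoded the way a browser would do it and that the backend decodes every request as utf-8; the sample string and variable names are made up for illustration.

```python
from urllib.parse import quote, unquote

text = "Grüße"  # sample input with German umlauts (illustrative)

# Bytes on the wire when the browser is forced into ISO-8859-1.
sent_latin1 = quote(text, encoding="iso-8859-1")
# Bytes on the wire when the browser encodes the form as UTF-8.
sent_utf8 = quote(text, encoding="utf-8")

# The backend is configured for UTF-8, so it decodes both the same way.
# errors="replace" turns undecodable bytes into U+FFFD instead of raising.
seen_from_latin1 = unquote(sent_latin1, encoding="utf-8", errors="replace")
seen_from_utf8 = unquote(sent_utf8, encoding="utf-8", errors="replace")

print(sent_latin1)             # percent-encoded ISO-8859-1 bytes
print(sent_utf8)               # percent-encoded UTF-8 bytes
print(repr(seen_from_latin1))  # umlauts are lost
print(repr(seen_from_utf8))    # original text survives
```

    The ISO-8859-1 bytes are not valid utf-8, so the server ends up with replacement characters (or with an error, if it decodes strictly) instead of the umlauts.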

    How to deal with this?


    Look at the title of this post: yes, it is accept-charset="utf-8" in the FORM tag that saves you from this situation. The attribute gives the browser the necessary “knowledge” that the form data must be sent in utf-8 and in no other encoding.
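
    To illustrate, assume the form is declared as `<form method="post" accept-charset="utf-8">`. The sketch below simulates in Python the body such a form would submit and shows that a backend decoding strictly as utf-8 gets the text back unchanged; the field name `comment` and the sample text are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form field with German and Russian text.
form_data = {"comment": "Grüße, Привет"}

# With accept-charset="utf-8" the form data is encoded as UTF-8
# regardless of the encoding selected for the page.
body = urlencode(form_data, encoding="utf-8")

# Server side: decode strictly as UTF-8; invalid bytes would raise
# UnicodeDecodeError instead of being stored silently.
parsed = parse_qs(body, encoding="utf-8", errors="strict")

print(body)
print(parsed["comment"][0])
```

    Decoding with errors="strict" is a design choice for the sketch: it makes a client that ignores the attribute fail loudly rather than put broken characters into the database.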

    Conclusion: everything ingenious is simple, but in our time whoever owns the information owns the world.

    P.S. YouTube still remains a mystery to me: they deliberately do not use accept-charset and instead use some functionality of their own that does the same thing (it looks like JavaScript).
