Semantics in HTML 5

Original author: John Allsopp
I am going to make a bold prediction. Long after you and I are gone, HTML will still be around. Not just in the billions of archived pages from our era, but as a living, breathing entity. Too much effort, energy, and investment has gone into developing the web's tools, protocols, and platforms for it all to be abandoned lightly.

Let's pause to consider our responsibility. By an accident of history, we are associated with the development of an important tool of our civilization, one that will be used for communication for decades to come. So when we turn our minds, idly or seriously, to improving HTML, we need to understand how far-reaching the consequences of our decisions may be.

HTML 5, the W3C's recently redoubled effort to shape the next generation of HTML, has gained significant momentum over the past year or so. It is a huge project that covers not only the structure of HTML, but also parsing models, error-handling models, the DOM, algorithms for fetching resources, media content, 2D graphics, data templates, security models, page-loading models, client-side data storage, and much more.

There are also changes to the structure, syntax, and semantics of HTML, some of which Lachlan Hunt described in the article "A Preview of HTML 5" (translation on Habr).

But in this article, let's look only at the semantics of HTML. It is a topic that has interested me for many years, and I believe it is very important for the future of HTML.

The BBC recently announced that it would drop the hCalendar microformat from its programme listings because of accessibility and usability concerns about the abbr design pattern. This shows that we have, without a doubt, pushed the semantic capabilities of HTML far beyond what was ever intended, and indeed beyond what the language can support. We have simply run out of HTML elements and attributes that can enrich a document's semantics. If we keep playing tricks with existing HTML constructs, more and more problems like this will arise, because HTML, as a semantic markup language, suffers from a fundamental flaw: its semantics are fixed and not extensible.

This is not just a theoretical problem. Hundreds of thousands of developers use class and id to create more semantic markup (they also use them as "hooks" for CSS styles, but that is another question). Almost always, these developers use ad hoc vocabularies whose values they make up themselves, rather than values from existing schemes. This is pseudo-semantic markup at best.

Many pages across the web use microformats to add more structured semantics than the impoverished set of HTML elements and attributes allows. In this case, the values used for the class attribute come from agreed vocabularies, sometimes taken from other standards, such as vCard, and sometimes from newly created vocabularies where no firm standard exists (as is the case with hReview).

Extensible Semantics


There is a very serious problem to be solved here. We need mechanisms in HTML that clearly and unambiguously allow developers to add richer, more expressive semantics, rather than pseudo-semantics, to their markup. This is perhaps the most urgent task for the HTML 5 project.

But it is not easy to come up with a mechanism for creating richer semantics in HTML content: any solution faces significant constraints. Perhaps the biggest of these is backward compatibility. The solution cannot break the hundreds of millions of browsing devices in use today, which will remain in use for years to come. Any solution that is not backward compatible will not be widely adopted by developers, who fear losing readers. It will quickly wither on the vine.

The solution should also be forward compatible. Not in the sense that it should work in future browsers, which is the job of browser developers, but in the sense that it should be extensible. We cannot expect any single solution we develop now to meet every imaginable and unimaginable semantic need of the future. We can, however, develop a solution that can be extended to meet future needs as they arise.

Together, these constraints pose a huge challenge. But for a language whose major iterations take place over decades, and whose role as a global communications platform is of paramount importance, this is a problem that must be solved.

So how does HTML 5 address this? HTML 5 introduces a number of new elements. Some could be called structural: section, nav, aside, header, and footer. There is a dialog element, similar in kind and content to blockquote. There are also a number of data elements, such as meter, which is a "scalar measurement within a known range or fractional value, such as disk usage", and the time element (http://www.w3.org/html/wg/html5/#the-time), which represents a date and/or time.

These elements may be useful and have certainly aroused some interest. Whether they can really solve this problem is something we will judge against the constraints of backward and forward compatibility.

Let's consider each constraint in turn.

Backward compatibility


How do today's browsers handle new elements such as section? Well, the latest versions of Safari, Opera, Mozilla, and even IE7 all render the content of the following page.

Top Level Heading

Second Level Heading

this is text in a section element

Third Level Heading
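A minimal sketch of markup that would render this way, assuming the headings are h1 elements nested inside section elements. Only the text content appears above, so the exact tags and nesting are an assumption:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Section test</title>
</head>
<body>
  <h1>Top Level Heading</h1>
  <!-- Assumed nesting: each section holds its own heading -->
  <section>
    <h1>Second Level Heading</h1>
    <p>this is text in a section element</p>
    <section>
      <h1>Third Level Heading</h1>
    </section>
  </section>
</body>
</html>
```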

At first, this looks great. But when we try to style the section element with CSS, for example like this:

section {color: red}

... most of the browsers mentioned succeed, but IE7 (and, of course, IE6) does not.

So we have a backward compatibility problem with 75% of the browsers currently in use. Given the half-life of Internet Explorer versions, we can expect most users to still be on IE6 and IE7 even a few years from now.

If HTML 5 introduces new elements, how likely is it that the vast majority of developers will use them, given that they do not work in most of the browsers in use?

Let's turn to forward compatibility, the next problem.

Forward compatibility


First, let's ask: "Why are we inventing these new elements?" A reasonable answer would be: "Because HTML lacks semantic richness, and by adding these elements we increase the semantic richness of HTML. That can't be bad, can it?"

By adding these elements, we address the need to increase the semantic power of HTML, but only within a narrow scope. No matter how many elements we introduce, we will always think of more semantics we would like HTML to express. However many elements we add, we will not solve the problem. We do not need to add specific terms to the vocabulary of HTML; we need to add a mechanism that lets a document's semantics be extended as needed. In technical terms, we need to make HTML extensible. HTML 5 does not offer an extensibility mechanism.

Thus, HTML 5 introduces features that break in a significant percentage of today's browsers, yet it does not really let us add semantics to the language at all.

There are also a few questions about the new elements themselves. Where do their names come from? How was it decided that the navigation element should be called "nav"? Why are the terms page-level, site-level, and meta-site-level used for navigation?

Why not adopt an existing vocabulary such as DocBook? Its vocabulary of document structure is richer and has been developed by publishing experts over many years. This is not an argument for DocBook in particular. The point is that the extremely important task of devising a mechanism for HTML semantics is proceeding with little attention to practice in a field where work began more than 30 years ago. (The original work on GML began in the early 1970s.)

Some ideas for a solution


Since the current effort is so important, do I have any practical suggestions for solving this problem? Well, I have the beginnings of one.

If adding new elements is off the table, at least for this discussion, attributes are the other logical part of HTML to focus on. After all, for almost ten years we have used the class and id attributes as mechanisms for extending the semantics of HTML. Many developers are already familiar and comfortable with this. The microformats project has shown that the existing attributes are not sufficient as a mechanism for extending HTML's semantics. So if we want to use attributes to solve the problem, we must introduce one or more new attributes.

Before moving on to the mechanics of how this might work, it is only fair to hold this proposal to the same requirements as the new elements in HTML 5. The most important question about introducing new attributes is whether the resulting HTML remains backward compatible. And if so, does this provide a workable mechanism for extending the semantics of HTML?

Let's invent a new attribute. Let's call it “structure”, but the name is not important. We can use it like this:
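For instance, a minimal sketch in which the structure attribute is the hypothetical one proposed here and the value header is an assumed example:

```html
<!-- "structure" is the invented attribute; "header" is an assumed example value -->
<div structure="header">
  <h1>Site title</h1>
</div>
```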


Let's see how our browsers handle it.

Of course, all of our browsers will apply the following CSS rule.

div {color: red}

What about this:

div[structure] {font-weight: bold}

In fact, almost all browsers, including IE7, will style a div that has a structure attribute, even though no such attribute exists in HTML. Sadly, our luck runs out with IE6, which does not. Still, we can use this attribute in our HTML and every existing browser will accept it. We can even style our HTML by this attribute in all modern browsers. And if we want a workaround for older browsers, we can add a class as well and style that. Compared with the HTML 5 approach, which adds new elements that cannot be styled in Internet Explorer 6 or 7, this is certainly a more backward-compatible solution.
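The fallback described above can be sketched as follows, assuming a class value that mirrors the hypothetical structure value (both names are illustrative):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Attribute selector with class fallback</title>
  <style>
    /* Browsers that support attribute selectors, including IE7 */
    div[structure] { font-weight: bold; }
    /* Fallback for IE6, which ignores attribute selectors */
    div.header { font-weight: bold; }
  </style>
</head>
<body>
  <!-- "structure" is the invented attribute; the class repeats its value -->
  <div structure="header" class="header">Site title</div>
</body>
</html>
```

The two rules are kept separate on purpose: a browser that does not understand one selector in a comma-separated group discards the whole rule.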

Attribute Extensibility


Instead of new elements, HTML 5 should adopt a number of new attributes. Each of these attributes would belong to a category or type of semantics. For example, as I detailed in another article, HTML includes structural semantics, rhetorical semantics, role semantics (adopted from XHTML), and other classes and categories of semantics.

These new attributes could be used much like the class attribute: to attach semantics to an element, to describe the nature of the element, or to carry metadata about it.

This is not unlike the role attribute in XHTML, except that instead of having one attribute for all of an element's semantics, we would identify different types of semantics and keep them separate.

For example, the XHTML role attribute works as follows:


      
  • Downloads
  • Documentation
  • News
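A sketch of the kind of markup such a list might come from, assuming a navigation list. The role values and link targets are illustrative assumptions, since only the list text is given above:

```html
<!-- role takes a space-separated list of terms; these values are assumed examples -->
<ul role="navigation sitemap">
  <li><a href="downloads">Downloads</a></li>
  <li><a href="docs">Documentation</a></li>
  <li><a href="news">News</a></li>
</ul>
```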

The value of the role attribute is a space-separated list of words, defined either by a standard vocabulary or by a custom one.

Why not simply adopt the role attribute as it is? Because there are other types of semantics to which the notion of a role does not apply. For instance:

He's a fantastic person.


This demonstrates a hypothetical type of semantics, "rhetorical", which could be used to mark up the rhetorical nature of a piece of content. The element clearly does not play the role of irony in the document. Rather, its content is ironic.
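In markup, this could look like the following sketch, where the attribute name rhetoric and its value are assumptions made for illustration:

```html
<!-- "rhetoric" is a hypothetical attribute name; "irony" is an assumed value -->
<p rhetoric="irony">He's a fantastic person.</p>
```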

Here is another example. It is becoming increasingly apparent that HTML lacks a way to pair a human-readable value, such as a date, with a machine-readable equivalent. This underlies the BBC's problem with the hCalendar microformat, which we discussed earlier. Marking up "May Day next year" as an abbreviation of a machine-readable date does not really make sense, but marking up the same phrase with an attribute that states its machine-readable equivalent would.

Again, the specific term, whether "equivalent" or some other attribute name for this kind of semantics, is not the issue. What matters is that this is not the same as simply using the class or role attribute, where all kinds of semantic information are crammed into a single attribute. For a properly extensible solution that provides backward compatibility and sufficient flexibility, this direction is worth exploring.
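As a sketch of the contrast, assuming an attribute named equivalent and an ISO 8601 date chosen only for illustration:

```html
<!-- abbr design pattern: the title holds the machine-readable date -->
<abbr class="dtstart" title="2009-05-01">May Day next year</abbr>

<!-- Hypothetical alternative: a dedicated attribute for the machine-readable value -->
<span class="dtstart" equivalent="2009-05-01">May Day next year</span>
```

In the first form, a browser or screen reader that expands abbreviations may present the raw date string to the reader. In the second, the title attribute is left free for human-readable text.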

I called this section "Some ideas for a solution" because a significant amount of work remains before there is a truly workable solution. Open questions include the following.

  • How many different semantic attributes should there be? Will these categories be extensible and, if so, how?
  • How should a vocabulary be defined?
  • Do we simply make up the terms we want, much as developers do with class values, or should the possible values be defined by a standardized specification?
  • If two vocabularies conflict, for example when the same term is defined by two different vocabularies, how do we resolve it?
  • Do we need namespaces, or is there another mechanism?


Rather than rushing to answer these questions, I am putting them forward as questions that need to be addressed, in the hope of starting a dialogue. The ramifications and scope of the decisions made for HTML 5 are too great for those decisions to be made in haste; expertise in linguistics, semantics, semiotics, and related fields needs to be brought in.

I hope it is clear that simply introducing new elements into HTML is not a solution to the problem of expanding semantics in HTML.

Let's not rush into an easy decision. With climate change, our grandchildren will have problems enough as it is. Let's at least leave them the best HTML we can.
