
How the world of semantic markup works
I work on the semantic web team at Yandex. We build products based on semantic markup, create our own extensions and take part in the development of the Schema.org standard.
The world of semantic markup is not simple, and at first glance it is not always logical. To make life easier for those who want to understand it, we decided to write about what kinds of markup exist, what they give you and how to implement them. By micro-markup (or semantic markup) we mean marking up a page with additional tags and attributes that tell search robots what the page is about. Micro-markup consists of a dictionary (vocabulary) and a syntax.
A dictionary is a kind of "language": a set of classes and their properties that are used to describe the essence of the content on a page. For example, a dictionary determines which term denotes a name - “name”, “title” or “n”.
Syntax is the way such a language, i.e. a dictionary, is used. It determines which tags and attributes express entities and their properties, for example, on web pages.
Semantic markup developed in stages, and at different times various initiative groups took up the concept. As a result, we ended up with a hodgepodge of dictionaries and syntaxes: there are quite a lot of them, and it is far from easy to make sense of them all at first.

In this article we will analyze the most common dictionaries:
- Open Graph;
- Schema.org;
- Microformats;
- And a few other dictionaries: FOAF, Dublin Core, Data Vocabulary and Good Relations.
Open Graph is a dictionary developed by Facebook so that any site can become part of this social network and be displayed attractively in it. OG provides expanded link previews.
Schema.org is a dictionary jointly developed by the largest search engines so that webmasters do not have to add separate markup for each search engine. Schema.org markup allows sites to receive special snippets in search results.
Microformats were developed by a community of enthusiasts who wanted to build their standard on basic HTML elements. People often confuse microformats and micro-markup, so we note right away that they are not the same thing. Microformats are one of the micro-markup dictionaries, just like Schema.org, Open Graph or FOAF. The only difference is that microformats are a combined standard of syntax and vocabulary, whereas micro-markup, as we said above, is a collective term for ways of enriching a page with semantic data.

We will describe the idea behind each dictionary, how it was developed, and the entities and properties it describes, and give small markup examples for each one. In the following articles, we will write about syntaxes, products and ways of implementing micro-markup.
The most common dictionaries on the Internet
Open Graph

Open Graph (OG) is the most common and the simplest dictionary. Today Open Graph is mostly used so that links shared from sites appear expanded, attractive and understandable. With OG markup, links are displayed this way on all popular social networks.
Open Graph markup is also actively used by Facebook applications: it allows users to publish actions from applications on their pages.
Thanks to OG, while scrolling through endless news feeds you can watch videos, read a brief description of an article and quickly grasp the essence of what friends share. In addition to Facebook, Open Graph markup is recognized by Vkontakte, Google+, Twitter, LinkedIn, Pinterest and others.
The dictionary itself is quite easy to use. To get started you need 4 properties:
- og:title - the name of the object.
- og:type - the object type, for example, “video.movie” (a movie). Depending on the type, you can specify other properties.
- og:image - the URL of an image that describes the object.
- og:url - the canonical URL of the object, used as its permanent ID.
For example, the Open Graph markup for a person’s description looks like this:
...
...
Here, the robot recognizes that the page is dedicated to a person named Yuri Gagarin and that it contains a link to his photo. The url property gives the canonical URL of the page.
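To make the robot's side of this more concrete, here is a minimal sketch in Python of how the four basic properties could be read from a page. The page fragment, the example.com URLs and the names `OpenGraphParser` and `parse_open_graph` are assumptions made for illustration; this is neither the original example from this article nor the way any real crawler is implemented.

```python
from html.parser import HTMLParser

# Hypothetical page head; the URLs are placeholders.
PAGE = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>Yuri Gagarin</title>
  <meta property="og:title" content="Yuri Gagarin" />
  <meta property="og:type" content="profile" />
  <meta property="og:image" content="https://example.com/gagarin.jpg" />
  <meta property="og:url" content="https://example.com/gagarin" />
</head>
</html>
"""

REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("property") or ""
        if name.startswith("og:") and "content" in attrs:
            self.properties[name] = attrs["content"]


def parse_open_graph(html):
    """Returns a dict of all og:* properties found in the markup."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


og = parse_open_graph(PAGE)
missing = [name for name in REQUIRED if name not in og]
print(og["og:title"], og["og:type"], missing)
```

A check like `missing` is a simple way to verify that a page carries all four basic properties before publishing it.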
In og:type, in addition to the "profile" type, various other entity types can be specified (each with its own properties):
- Music (subtypes music.song, music.album, music.playlist, music.radio_station) - for songs you can specify the duration, album and artist; for albums - songs, artists and release dates.
- Video (video.movie, video.episode, video.tv_show, video.other) - movies can have actors and their roles, directors, screenwriters and a duration.
- No vertical (article, book, profile, website) - the types that do not fit into the categories above. An article can specify tags, an author and a publication date; a profile - gender, last name and first name.
If a page has no such markup, Facebook will still try to build a preview when the link is published. As a rule, though, the result is far less successful: the site logo may appear instead of the article's picture, the title may be replaced by the name of a site category, and the brief description may be filled with text from the company's history that does not reflect the essence of the article (and is unlikely to please the user).
In addition, search engines recognize the OG dictionary, and in some cases even supplement it.
Schema.org

Schema.org is a dictionary that appeared on the initiative of search engines in 2011. It is supported by Yandex, Google, Bing and Yahoo!
Schema.org also provides sets of classes that describe various entities and their properties. But while OG and Microformats.org have dozens of such classes, Schema.org already has hundreds. Every class has its place in a tree hierarchy.
This is a living and flexible dictionary. New entities are actively discussed before they are added: members of the initiative group meet weekly to discuss the implementation, extension and use of the schemas.
The most general entity type is Thing, which has subtypes. Consider a few of them:
- Action - describes an action performed by a specific agent (a person or an organization). An action can additionally specify the place, the object and the instruments with which it was performed. Like any action, it can have a result, participants and a time period during which it took place. This class is the basis for Yandex.Islands and for the Gmail Actions project.
- CreativeWork - describes creative works. Videos, pictures, recipes, diets - all of these can be described with this type. Any creative work can specify an author, genre, short description, reviews, audience, references and much more.
- Event - as for any event, you can describe the venue, date, participants, speakers, etc.
- Product - anything that is sold and bought. Some paid services (such as a haircut) can also be described with the Product type. A product can have a rating, brand, color, audience, price, model, etc.
- Person - as the Schema.org documentation states, this can be a person who is alive, dead, fictional or even “undead” (apparently the creators needed to describe zombies too and found no more suitable type). People can have contact information and information about work, family, relationships and much more.
The process of creating and introducing new types is quite fascinating and sometimes takes unexpected turns. In one discussion, it became clear that it is far from easy to make the introduced schemas fit both the Russian mentality and the international concept of beauty.
From our experience: it took almost a year to introduce 7 new fields into the schema.org/PeopleAudience type, because the doubts of politically correct Europeans and Americans knew no bounds: “How can you specify the maximum age of the target audience? The fact that a man is over 30 does not mean that he is not interested in books for little girls!” OK, the proposed fields maxAge and minAge turned into suggestedMaxAge and suggestedMinAge. Gender turned out to be difficult too. We could not convince anyone that gender can be specified unambiguously, because that is politically incorrect. So gender turned into suggestedGender.
This is how every property and every type is introduced: slowly and painstakingly. After all, besides covering as much of its field of use as possible and being international, the dictionary must reflect the interests of all participants and be unambiguous from the point of view of different countries and cultures. And still, it is always easier to introduce a new property or type than to delete or change one, because deleting means doing something about those who have already implemented these fields or types.
Users and webmasters can also propose extensions to the dictionary.
There is a public English-language mailing list, public-vocabs@w3.org, for discussing general issues, suggestions and bug reports; you can also write there with a question about markup if you are unable to implement something. There is an extension mechanism, and since May 2011 you can use lists hosted on external resources as values for various properties.
So if you want to take part in the development of semantic markup, in particular the Schema.org dictionary, you have the opportunity ;)
An example of Schema.org markup for the Person type:
Yuri Gagarin
Pilot-cosmonaut, Valentina Tereshkova, Russia
USSR Air Force, Sergei Korolev, Hero of the Soviet Union, Wikipedia page, Yuri Gagarin's website
In this markup, the search engine recognizes that a person named Yuri Gagarin is a cosmonaut and a colleague of Valentina Tereshkova. A lot of other data is also given: his award, nationality, date of death, acquaintances and more - some of these properties can only be specified with the Schema.org dictionary. Two links are marked up with the “sameAs” and “url” properties: the first points to a page with reliable information about the person, the second to his personal site.
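As a rough illustration of how such properties could be attached to the visible text, the sketch below embeds a small microdata fragment in Python and extracts its properties. It is an assumption-laden reconstruction: the choice of the microdata syntax, the assignment of `jobTitle`, `colleague`, `nationality` and `award` to particular strings, the example.com URL and the `FlatMicrodataParser` class are all hypothetical, and the parser deliberately handles only a single flat item without nested items.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the property-to-text mapping is a guess.
PAGE = """
<div itemscope itemtype="http://schema.org/Person">
  <h1 itemprop="name">Yuri Gagarin</h1>
  <span itemprop="jobTitle">Pilot-cosmonaut</span>
  <span itemprop="colleague">Valentina Tereshkova</span>
  <span itemprop="nationality">Russia</span>
  <span itemprop="award">Hero of the Soviet Union</span>
  <a itemprop="sameAs" href="https://en.wikipedia.org/wiki/Yuri_Gagarin">Wikipedia page</a>
  <a itemprop="url" href="https://example.com/gagarin">Yuri Gagarin's website</a>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects itemprop values of one flat item (no nested items or tags)."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # itemprop whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attrs:
            # For links, the property value is the URL, not the link text.
            self.properties[prop] = attrs["href"]
        else:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


parser = FlatMicrodataParser()
parser.feed(PAGE)
person = parser.properties
print(parser.item_type, person["name"], person["sameAs"])
```

Note how the two links differ only in the property name: the parser stores the `href` for both, and it is the dictionary that says `sameAs` identifies the person while `url` is the person's own site.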
I would like to note once again that Schema.org is an initiative of search engines, and the development of the dictionary will depend on the products that search engines create for sites. So do not treat this dictionary as an attempt to fit everything that exists in the world into a single ontology. Everything that exists on the Internet, perhaps, but only if search engines need it.
And search engines are certainly interested in creating a large number of Schema.org-based products for sites, including Russian-language ones.
The full description of the dictionary is available on the official website. There is also an unofficial and still incomplete translation of the standard into Russian.
Microformats.org

Microformats.org (Microformats) is an open standard created in 2007 by a community of enthusiasts. The community wanted to create a standard for semantic markup of sites using technologies that already existed. Six years ago this was a definite advantage of the standard, since it was easier to implement, but today adding microformat markup is no easier, and in some cases harder, than using other dictionaries. Compared to OG and Schema.org, it is used less and less.
There are currently about 10 common microformat specifications for several subject areas. Some of them are finished, but most are at the draft stage. There are microformats for publishing information about organizations, products, reviews, events and many other entities. Each entity has its own properties.
New microformats are developed in the open; there is a separate microformats wiki. Because the founders try to reach agreement and compromise with everyone when creating each microformat, the process takes a very long time and sometimes never ends. As a result, the approved microformats can be counted on the fingers of one hand, while many remain in draft status.
Currently, search engines support the following microformats:
- hCard - markup format for contact information (addresses, phone numbers, etc.);
- hRecipe - a format for describing recipes;
- hReview - markup format for reviews;
- hProduct - markup format for products.
Their use makes it possible to show special snippets in the results.
One of the most popular microformats is hCard. The hCard microformat is universal for describing people and organizations; it contains basic information about both.
Using hCard, you can specify properties such as:
- n - name;
- bday - date of birth;
- geo - geographical location;
- tz - time zone;
- uid - reference to an identical entity;
- photo - image;
- adr - address;
- org - name of the organization.
These are only some of the approved properties; many more are under discussion. Here is how hCard is used to mark up a description of a person:
Yuri Gagarin, Pilot-cosmonaut at USSR Air Force, Yu. Gagarin's page, March 9, 1934, The first man in space
Here, the search engine understands that this is an organization or a person named Yuri Gagarin, a pilot-cosmonaut who worked in the USSR Air Force. His date of birth is also given, along with the note “The First Man in Space”. The url property points to the home page of the described object.
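Because microformats combine vocabulary and syntax, the properties live in ordinary `class` attributes. The sketch below shows one way the description above could be marked up and read back. The fragment is a hypothetical reconstruction (the class names `vcard`, `fn`, `title`, `org`, `url`, `bday` and `note` come from hCard, but their placement here, the example.com URL and the `FlatHCardParser` class are assumptions), and the parser assumes a single flat card on the page.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the URL is a placeholder.
PAGE = """
<div class="vcard">
  <span class="fn">Yuri Gagarin</span>
  <span class="title">Pilot-cosmonaut</span> at
  <span class="org">USSR Air Force</span>
  <a class="url" href="https://example.com/gagarin">Yu. Gagarin's page</a>
  <abbr class="bday" title="1934-03-09">March 9, 1934</abbr>
  <span class="note">The first man in space</span>
</div>
"""

HCARD_PROPERTIES = {"fn", "title", "org", "url", "bday", "note"}


class FlatHCardParser(HTMLParser):
    """Reads one flat hCard: property names are taken from class attributes."""

    def __init__(self):
        super().__init__()
        self.found_vcard = False
        self.card = {}
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "vcard" in classes:
            self.found_vcard = True
            return
        if not self.found_vcard:
            return
        for prop in classes & HCARD_PROPERTIES:
            if tag == "a" and "href" in attrs:
                self.card[prop] = attrs["href"]   # links carry the value in href
            elif tag == "abbr" and "title" in attrs:
                self.card[prop] = attrs["title"]  # machine-readable date in title
            else:
                self._current = prop
                self.card[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.card[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


parser = FlatHCardParser()
parser.feed(PAGE)
card = parser.card
print(card["fn"], card["org"], card["bday"])
```

The `abbr` element shows a human-readable date to the visitor while keeping a machine-readable one in `title`, which is why the parser prefers the attribute over the text.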
In 2013, a new initiative was announced - microformats 2, which introduces new class names and simplifies the use of properties.
Microformats used to be quite common, but today, especially against the background of other fast-growing dictionaries, they look outdated. In addition, microformats are limited by their own format: they are a combined standard of syntax and vocabulary, so other dictionaries cannot be used within them. (The next article will cover what syntax is.)
We have looked at the most common and most developed dictionaries. But there are also quite a few small, highly specialized dictionaries that were created to solve the same problem of data transfer. I’ll tell you about the most interesting of them.
Other dictionaries
FOAF
The FOAF dictionary (an acronym for Friend of a Friend) specializes in relationships between people, their interactions and associations.
It contains classes such as Agent, Organization, Group and Person. They can have various properties that describe people or groups in real life. There are the usual ones - age, gender, surname, birthday - and there are also properties:
- linked to online services: skypeID, yahooChatID, jabberID;
- more specific ones: for example, knows, which states that people know each other, or myersBriggs, which reflects the results of the Myers-Briggs test (yes, we also only just found out what that is).
Markup example:
Jimmy Wales, Jimbo, Angela Beesley
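The three names above can be tied together with FOAF properties. The following sketch assumes the RDF/XML syntax and assigns the strings to `foaf:name`, `foaf:nick` and `foaf:knows`; that assignment is a plausible guess rather than the original markup, and the document is parsed with Python's standard XML library only to show which relationships it encodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical FOAF document in RDF/XML; the property assignment is assumed.
FOAF_DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="#JW">
    <foaf:name>Jimmy Wales</foaf:name>
    <foaf:nick>Jimbo</foaf:nick>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Angela Beesley</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

NS = {"foaf": "http://xmlns.com/foaf/0.1/"}

root = ET.fromstring(FOAF_DOC)
person = root.find("foaf:Person", NS)

name = person.findtext("foaf:name", namespaces=NS)
nick = person.findtext("foaf:nick", namespaces=NS)
# Everyone this person "knows": nested Person elements inside foaf:knows.
friends = [
    friend.findtext("foaf:name", namespaces=NS)
    for friend in person.findall("foaf:knows/foaf:Person", NS)
]
print(name, nick, friends)
```

The `knows` property is what turns two separate descriptions of people into a relationship between them.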
Yandex Blog Search uses this dictionary. Yandex has added its own extension to it, which helps describe user blogs accurately (in RuNet it is mainly this extension that is used).
Data Vocabulary
The Data Vocabulary dictionary was developed by Google. It is no longer being developed, since all development has flowed smoothly into Schema.org.

Previously it supported types such as Person, Organization, Breadcrumb, Review, Product and Address - you could say they became the prototypes of the Schema.org classes.
Dublin Core
The Dublin Core dictionary is used in digital libraries and for describing documents. It appeared on the initiative of a group of library and museum specialists.
Dublin Core appeared in 1995 with a basic set of 15 elements, such as Title, Creator, Subject, Description, Publisher, Rights, etc. Now there are many more classes and properties.
Since 2011, the state standard GOST R 7.0.10-2010 (ISO 15836:2003) “National Standard of the Russian Federation. System of standards on information, librarianship and publishing. Dublin Core Metadata Element Set” has been in force in Russia.
Dublin Core markup example:
Song of the Open Road
I think that I shall never see
A billboard lovely as a tree.
Indeed, unless the billboards fall
I'll never see a tree at all.
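One common way to carry Dublin Core elements in a web page is through `meta` tags whose names start with `DC.`. The sketch below describes the poem this way and reads the elements back. It is only an illustration under assumptions: the original example may have used a different syntax, the choice of the `title`, `description` and `language` elements is hypothetical, and `DublinCoreParser` is an invented helper.

```python
from html.parser import HTMLParser

# Hypothetical page head describing the poem with Dublin Core elements.
PAGE = """
<head>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" content="Song of the Open Road" />
  <meta name="DC.description"
        content="I think that I shall never see a billboard lovely as a tree." />
  <meta name="DC.language" content="en" />
</head>
"""


class DublinCoreParser(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> pairs keyed by element name."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.lower().startswith("dc.") and "content" in attrs:
            # Strip the "DC." prefix and normalize the element name.
            self.elements[name[3:].lower()] = attrs["content"]


parser = DublinCoreParser()
parser.feed(PAGE)
dc = parser.elements
print(dc["title"], dc["language"])
```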
Good Relations
The Good Relations dictionary has been used since 2008 as a standard for describing e-commerce products. Its creators expected that such markup would give goods and services a structured presentation in search engines.
With this dictionary, you can specify special properties for:
- A company - contact details, location, logo;
- A store - address, opening hours, phone;
- An individual product - product category, short description, code, payment and delivery methods, as well as options for services (repair, installation, rental, etc.).
Good Relations covers the following areas of e-commerce: books, cars, classified ads, concert tickets, consumer electronics, guided tours and outdoor events, and others.
In RuNet this dictionary is practically not used, but it can be found on some large foreign sites (Volkswagen UK, Strobelight-Shop, lux-case.se). Among search engines, GR markup is recognized by Google.
Example markup using Good Relations:
HTML for Idiots - Used Copy, $9.99
Price: $9.99
The Good Relations standard has been integrated into Schema.org since November 2012. The dictionary also has its own validator.
Yandex extensions for dictionaries
To get all the data it needs from sites, Yandex develops its own extensions for some dictionaries.
For example, extensions were needed to mark up:
- interactive answers in Yandex.Islands (to describe forms and buttons);
- dictionary entries (terms and scientific articles);
- ratings of organizations;
- target audience.
In the following posts, we want to talk in detail about other aspects of semantic markup - for example, syntaxes, products and implementation examples. If you are interested in any other topics, tell us in the comments.