SDMX (Statistical Data and Metadata eXchange): an international standard for data exchange
Very little information about SDMX is available on the Russian-language Internet, even though the standard has long been used by many countries and international organizations to publish and exchange data. The initiative to develop the standard was launched by seven international organizations that work with statistical data and became its sponsors. The main goal was to simplify the exchange of statistical data between such organizations by creating a standard for the exchange and describing the business process for implementing it. A unified approach not only simplifies access to statistical data; through metadata (data about data) it also makes the meaning and content of the data easier to understand.
The SDMX initiative's main site is sdmx.org; it also hosts the list of approved cross-domain concepts, code lists and classifiers. Each organization that adopts the standard can extend and supplement it through a special administrative registration procedure.
The standard is not a strict guide to action; organizations themselves choose which elements of SDMX they will use and for what purposes.
Version 1.0 was approved in September 2004 and adopted as an ISO technical specification (ISO/TS 17369:2005) in April 2005.
Version 2.0, completed and approved in November 2005, is fully compatible with version 1.0 and adds the ability to exchange reference (descriptive) metadata.
Version 2.1 (current as of 2018) was released in May 2011 and published in 2013 as the international standard ISO 17369.
Exchange formats based on CSV and JSON were specified later.
The SDMX standard description contains the following components:
- Information Model
- XSD schemas describing the structure, content model and data types
- Content-Oriented Guidelines
- A set of programs and tools for working with SDMX
The SDMX information model is the basis of the standard. It consists of concepts (CONCEPT), constraints (CONSTRAINT), rules and operations that determine the format and composition of the statistical data an organization publishes. This article does not attempt to describe every SDMX entity, only the main components.
SDMX Information Model
How do statistical data differ from ordinary data? In general, they do not.
Statistical data are an ordered, classified set of data about a mass phenomenon or process. They are characterized by a set of dimensions (concepts, in SDMX terms), one of which is usually the time period. BI tools are commonly used to process and analyze such data.
A statistical observation is a set of specific concept values that uniquely characterizes one unit of a data set.
The number "208.36" is a statistical observation defined by a set of concepts (all data is fictional)
In SDMX, a concept is the basic structural object and is a qualitative characteristic of statistical observations. A concept value can be a number, a string, a date, or a value from a code list (CODELIST). This representation can be overridden in the Data Structure Definition when the concept is used as a dimension or an attribute.
Code lists are simple key-value lists. Each one enumerates the values that can be used to represent indicators, attributes and other elements of the structural part of SDMX. They are complemented by other structural metadata, which can carry language-specific descriptions and a hierarchical organization of the codes.
The standard defines how SDMX structural objects are coded: uppercase Latin letters, digits and the underscore are allowed. Structures can also be versioned.
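A code list and the identifier rule described above can be sketched in a few lines of Python. This is only an illustration: the class `CodeList`, the helper `is_valid_id`, the list `CL_FREQ` and its codes are hypothetical names chosen for the example, and the pattern encodes the rule exactly as this article states it, so the specification itself should be consulted for the precise identifier grammar.

```python
import re
from dataclasses import dataclass, field

# Rule as stated in this article: uppercase Latin letters, digits, underscore.
ID_PATTERN = re.compile(r"[A-Z0-9_]+")


def is_valid_id(identifier: str) -> bool:
    """Return True if the whole identifier matches the stated rule."""
    return ID_PATTERN.fullmatch(identifier) is not None


@dataclass
class CodeList:
    """A key-value list of codes with language-specific names and a version."""
    id: str
    version: str = "1.0"
    # code -> {language tag -> human-readable name}
    codes: dict[str, dict[str, str]] = field(default_factory=dict)

    def add(self, code: str, names: dict[str, str]) -> None:
        if not is_valid_id(code):
            raise ValueError(f"invalid code id: {code!r}")
        self.codes[code] = names

    def name(self, code: str, lang: str = "en") -> str:
        return self.codes[code][lang]


# Hypothetical frequency code list used only for illustration.
CL_FREQ = CodeList(id="CL_FREQ")
CL_FREQ.add("A", {"en": "Annual", "ru": "Годовая"})
CL_FREQ.add("M", {"en": "Monthly", "ru": "Месячная"})
```

Keeping the names in a per-language mapping mirrors the idea that the code itself is language-neutral while its description is not.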
A Data Structure Definition (DSD) specifies the composition and order of the concepts that form the final data set (DATASET). Each concept included in the structure is assigned a role in the data set:
- Dimension (DIMENSION) - the main identifier of the data. The values of all dimensions except the time dimension together form the unique code (CODE), or key, of a series within one data structure.
- Attribute (ATTRIBUTE) - provides an additional description of either a data set or a specific observation. Examples of attributes are the unit of measurement and the observation status (preliminary, forecast, revised, etc.).
- Measure (MEASURE) - the observed value itself.
Thus, the example above can be described by the following data structure:
| unit of measurement | Attribute | Code list |
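The roles described above can be made concrete with a minimal Python sketch. The concept names (`FREQ`, `REF_AREA`, `INDICATOR` and so on), the `Role` and `Component` types and the sample observation are all assumptions made for illustration; they are not taken from the article's figure or from any real DSD. The sketch shows how the non-time dimension values, taken in DSD order, give the series key.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DIMENSION = "dimension"
    ATTRIBUTE = "attribute"
    MEASURE = "measure"


@dataclass(frozen=True)
class Component:
    concept: str
    role: Role


# Hypothetical DSD: an ordered list of concepts, each with its role.
DSD = [
    Component("FREQ", Role.DIMENSION),
    Component("REF_AREA", Role.DIMENSION),
    Component("INDICATOR", Role.DIMENSION),
    Component("TIME_PERIOD", Role.DIMENSION),
    Component("OBS_VALUE", Role.MEASURE),
    Component("UNIT_MEASURE", Role.ATTRIBUTE),
    Component("OBS_STATUS", Role.ATTRIBUTE),
]

TIME_CONCEPT = "TIME_PERIOD"  # assumed name of the time dimension


def series_key(dsd, observation):
    """Join the values of all non-time dimensions, in DSD order, with dots."""
    parts = [
        observation[c.concept]
        for c in dsd
        if c.role is Role.DIMENSION and c.concept != TIME_CONCEPT
    ]
    return ".".join(parts)


# Fictional observation; the values are made up.
observation = {
    "FREQ": "M",
    "REF_AREA": "RU",
    "INDICATOR": "CPI",
    "TIME_PERIOD": "2018-01",
    "OBS_VALUE": 208.36,
    "UNIT_MEASURE": "IDX",
    "OBS_STATUS": "P",
}

print(series_key(DSD, observation))  # prints M.RU.CPI
```

Attributes and the measure do not take part in the key; they only describe or carry the value of the observation that the key and the time period identify.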
A data set (DATASET) is a collection of homogeneous data that share a common DSD. It may contain time series or observations for several series at a single point in time (cross-sectional data).
Relationships between CONCEPTS, CODELISTS, DSD and DATASET
Sample data set from the website of the European Central Bank. The “Key” field contains the dimension values of each time series, separated by dots; together they form a unique key.
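The difference between the two views of a data set can be shown with a small Python sketch. The series keys, periods and values below are invented, and the function names `as_time_series` and `as_cross_section` are hypothetical; the point is only that the same observations can be read along the time axis or across series at one moment.

```python
from collections import defaultdict

# (series key, time period, value); all values are made up.
observations = [
    ("M.RU.CPI", "2018-01", 208.36),
    ("M.RU.CPI", "2018-02", 208.80),
    ("M.KZ.CPI", "2018-01", 131.10),
    ("M.KZ.CPI", "2018-02", 131.90),
]


def as_time_series(obs):
    """Group observations by series key: one series, many periods."""
    series = defaultdict(dict)
    for key, period, value in obs:
        series[key][period] = value
    return dict(series)


def as_cross_section(obs, period):
    """Select one period: many series, one point in time."""
    return {key: value for key, p, value in obs if p == period}


time_series = as_time_series(observations)
january = as_cross_section(observations, "2018-01")
```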
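Reading such a key back into named dimension values is a matter of splitting it on dots and pairing the parts with the dimensions in DSD order. The sketch below assumes a key and a dimension list modelled on ECB exchange-rate data; both are illustrative assumptions and are not copied from the figure, and `parse_key` is a hypothetical helper.

```python
# Assumed dimension order for an exchange-rate data structure (illustrative).
DIMENSIONS = ["FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX"]


def parse_key(key, dimensions):
    """Split a dot-separated series key and pair each part with its dimension."""
    values = key.split(".")
    if len(values) != len(dimensions):
        raise ValueError(
            f"key has {len(values)} parts, expected {len(dimensions)}"
        )
    return dict(zip(dimensions, values))


parsed = parse_key("D.USD.EUR.SP00.A", DIMENSIONS)
print(parsed["CURRENCY"], parsed["CURRENCY_DENOM"])  # prints USD EUR
```

The position of a value in the key is what gives it meaning, which is why the DSD fixes the order of the dimensions.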
Time series data
In SDMX, metadata is divided into two groups:
- Structural metadata - the set of concepts used to identify and describe statistical data and metadata.
- Reference metadata (or explanatory metadata) - a large set of concepts that define and qualify data sets and that usually describe not an observation or a data series but the entire data set, or even the organization that provides the data. Reference metadata is usually in text or HTML format and uses concepts that describe the content, methodology and quality of the data.
A Metadata Structure Definition (MSD) describes how reference metadata sets are organized, much as a DSD does for data. In particular, an MSD specifies which concepts are included in the metadata exchange, how they relate to each other, how they are represented (as free text or as values from a code list) and which type of object (agency, dataflow, data provider, dataset, etc.) they are attached to.
A reference metadata set (METADATASET) is the information itself, structured according to the MSD: a description of the statistical methodology, of the organization that provides the data or of a data structure, a publication calendar, data quality, and so on.
Presentation of reference metadata on the European Central Bank website
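The relationship between an MSD and a reference metadata set can be sketched as a simple validation step in Python. Everything named here is hypothetical: the types `MetadataAttribute` and `MetadataStructure`, the concepts `CONTACT`, `METHODOLOGY` and `FREQ_DISS`, the code list `CL_FREQ_DISS` and the `validate` function are assumptions for illustration, not part of the standard's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataAttribute:
    concept: str
    representation: str  # "text", or the id of a code list


@dataclass(frozen=True)
class MetadataStructure:
    id: str
    target_type: str  # type of object the metadata is attached to
    attributes: tuple[MetadataAttribute, ...]


# Hypothetical code list and MSD.
CODELISTS = {"CL_FREQ_DISS": {"D", "M", "Q", "A"}}

MSD = MetadataStructure(
    id="MSD_EXAMPLE",
    target_type="dataflow",
    attributes=(
        MetadataAttribute("CONTACT", "text"),
        MetadataAttribute("METHODOLOGY", "text"),
        MetadataAttribute("FREQ_DISS", "CL_FREQ_DISS"),
    ),
)


def validate(msd, target_type, metadata_set, codelists):
    """Return a list of problems found when checking a metadata set against an MSD."""
    errors = []
    if target_type != msd.target_type:
        errors.append(
            f"target type {target_type!r} does not match {msd.target_type!r}"
        )
    by_concept = {a.concept: a for a in msd.attributes}
    for concept, value in metadata_set.items():
        attribute = by_concept.get(concept)
        if attribute is None:
            errors.append(f"concept {concept!r} is not defined in {msd.id}")
        elif (
            attribute.representation != "text"
            and value not in codelists[attribute.representation]
        ):
            errors.append(f"{value!r} is not a code in {attribute.representation}")
    return errors


# Fictional reference metadata attached to a dataflow.
metadata_set = {
    "CONTACT": "statistics@example.org",
    "METHODOLOGY": "<p>Monthly survey of consumer prices.</p>",
    "FREQ_DISS": "M",
}
```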
The Content-Oriented Guidelines are a set of recommendations within the SDMX standard. Their goal is maximum interoperability in the exchange of data and metadata between organizations, and statistical organizations are encouraged to use them as widely as possible. The main documents are:
- Cross-domain concepts
- Statistical subject-matter domains
- Metadata Common Vocabulary
The list of cross-domain concepts (Cross-Domain Concepts) contains statistical concepts that relate to the statistical process and to data quality. It is based on the concepts used by the international sponsoring organizations. It is not exhaustive and will be extended in the future.
Concepts can be used for both data and metadata. Each concept has a unique code and a description of the context in which it can be used, as well as its representation in SDMX.
The list of statistical domains (Statistical Subject-Matter Domains) is a top-level classification based on the work of the United Nations Economic Commission for Europe (UNECE) on statistical domains. The classification offers a starting point for organizing the exchange of statistical data and metadata.
The Metadata Common Vocabulary (MCV) contains concepts and related definitions used in the structural and reference metadata of international organizations and national agencies. The MCV is a dictionary that recommends common terminology in order to simplify communication and understanding. It is closely tied to the cross-domain concepts and contains all of them, together with their definitions and context descriptions.
IT tools for working with SDMX
The list of tools for working with SDMX is available on the website sdmx.org.
The main tool for working with structural metadata is Fusion Registry, developed by the company Metadata Technology. It runs as a web application. There are two editions: Community (free, with limited capabilities) and Enterprise Edition (paid). The International Monetary Fund uses this software at sdmxcentral.imf.org as a single registry (a single point for collecting and disseminating data and metadata). The SDMX community registry at registry.sdmx.org also runs on it.
The latest version of Fusion Registry implements almost all of the standard's functionality. The application can also act as an SDMX registry. Unfortunately, it cannot generate data and metadata in SDMX format.
Data Structure Wizard - a Java application for creating structural metadata in versions 2.0 and 2.1; it supports the creation of all basic SDMX entities.
SDMX Converter, created by Eurostat, is the main tool for working with SDMX data. It can create a data set (but not metadata) from Excel, CSV and FLR files, and convert data between the various SDMX formats.
In lieu of a conclusion
Standardizing statistical information with SDMX greatly simplifies the distribution and analysis of data. Web services simplify the processing of data files and the integration of adjacent systems, giving anyone the opportunity to obtain and compare macroeconomic indicators of interest for different countries of the world. These advantages underlie the interdepartmental project currently under way in Russia to introduce the standard into the dissemination of statistical data, both for information exchange with international organizations and for providing data to an unrestricted range of users through portal technologies.