Why Sberbank's “open data” is not open data, and what to do about it

    A few days ago a notable event took place: one of the largest companies in Russia announced that it now publishes open data on its website. The company is Sberbank, and the data sits in a dedicated section of its site. The launch of the section got its own press release on the site, and dozens of financial and non-financial media outlets covered it as an important event.

    Has Sberbank really done something remarkable? Or is this an ordinary occurrence, and is what Sberbank published actually open data? That is what the rest of this post is about.

    By way of introduction

    Before continuing with Sberbank, let's return to the term “open data” itself.

    First, the official definition from law 112-ФЗ (which amends 8-ФЗ)

    Information that its owners post on the Internet in a format allowing automated processing, without prior modification by a person, for the purpose of its reuse is publicly available information posted in the form of open data.

    Second, the Wikipedia definition
    Open data is a concept reflecting the idea that certain data should be freely available in machine-readable form for use and further republication, without copyright, patent or other control restrictions. Data can be freed from copyright restrictions with free licenses such as the Creative Commons licenses. If a data set is not in the public domain and is not covered by a license granting free reuse rights, it is not considered open, even if it is posted on the Internet in machine-readable form.

    Third, the definition from the Open Data Charter
    Open data is digital data that is made available with the technical and legal characteristics necessary for it to be freely used, reused, and redistributed by anyone, anytime, anywhere.
    Or, in clumsy Russian:

    Open data is digital data made publicly available with the technical and legal characteristics needed for it to be freely used, reused and shared by anyone, anytime, anywhere.

    Open data also comes with clearly formulated publication principles, set out in the same Open Data Charter.
    The principles are:

    1. Open by default
    2. Timely and comprehensive
    3. Accessible and usable
    4. Comparable and interoperable
    5. For improved governance and citizen engagement
    6. For inclusive development and innovation

    In the 7 years that I have personally worked on open data in Russia, I have heard and seen a great many things called open data that were nothing of the kind. The most outstandingly silly question came up when the definition was given as “freely accessible machine-readable data”: “So machine-readable data is data that I can read in a machine?”

    But in all of the definitions, it is important to remember one thing: open data is aimed at a technically skilled consumer. The state does not build new information products itself; it makes it possible for start-ups, IT companies and civic activists to do so.

    Why publish open data?

    To understand this particular case, it helps to know why data owners publish data at all, especially companies and government agencies, for whom it can sometimes seem outright strange.

    PR, obligation or benefit

    These are the three main reasons why anyone publishes data (I deliberately set aside fun and vanity).

    If you see an organization being active in open data, or in openness and transparency more broadly, look for the explanation in one of these three reasons.


    Take PR built on open data first. Its main distinguishing feature is that it targets the mass consumer, the mass voter, the mass citizen.

    Technology and data questions are pushed to the sidelines. Site traffic, media coverage and the number of articles mentioning the project come first.

    A vivid example is the Moscow open data portal: the city authorities push out news about each publication, even when it is some meaningless 28-row data set.


    Obligation, or coercion, is when open data is published because the law requires it. The data owner may have no interest in openness, but complies with the law and publishes the data.

    For example, the Central Bank collects reporting forms from banks and discloses them in a special section of its site; this is a statutory obligation of both the banks and the Central Bank.

    Another example is the 112-ФЗ and 8-ФЗ laws mentioned above. Authorities are required to disclose basic data sets, and they publish them precisely as an obligation for which they are legally accountable if they fail to meet it.

    Obligation is the foundation of openness. For the same reason, many of those who are required to disclose data take no additional steps to make it accessible. They comply with the mandatory requirements and nothing more, and they do not write promotional press releases about it.

    For example, if the Moscow Government publishes a data set with the addresses of 28 military traders and pushes it out to news sites, it is by no means certain that it will also publish, say, the income declarations of city officials as open data and promote those through the media too.

    In other words, obligations are fulfilled as quietly as possible.


    How could anyone benefit from publishing their own data? It would seem wiser to hold on to it and keep quiet; nobody else needs to know.

    However, there are reasons for both government and commercial organizations to publish open data. For example, the Kaggle Datasets section fills up with data from owners looking for new findings, solutions and insights, which takes thousands of data scientists.

    Or consider why the Federal Treasury has for many years distributed data from the government procurement portal through an FTP server (since before open data became a topic): it is simply easier and cheaper to distribute a database needed by hundreds of counterparties across the federal subjects that way.

    Some companies organize hackathons to look for employees. Others publish data to build their reputation in the open source community, as Google does with its Transparency Report.

    So what about Sberbank?

    If you look again at the Sberbank open data section, you will find the following features:

    No free licenses

    Instead of freedom of use and distribution there is only a disclaimer, which reads:

    The information provided is the result of an analysis of the data of Sberbank PJSC, Q4 2016. The data are not management, accounting or financial statements. When using the specified information, a reference to Sberbank PJSC is required. Not an advertisement.

    This has nothing even remotely to do with free licenses.

    No data sets

    To download the data you have to find a special button on each chart and then, in its menu, find the option to export to XLSX, CSV or JSON. The catch is that all of these exports are generated client-side from JavaScript files.

    All of the data is in fact stored in 13 JavaScript files, from http://www.rdatascience.ru/opendata/data1.js to http://www.rdatascience.ru/opendata/data13.js.

    The export to CSV and the other formats is done by JavaScript code, and no data set can be downloaded directly. The emphasis is on visualization, not on letting analysts work with the data.
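
    To show what “working with the data” would take for a technically skilled consumer, here is a minimal Python sketch. It lists the 13 file URLs named above and parses the text of one such file into ordinary Python objects. The internal format of the files is not described here, so the sketch assumes that each file is a single assignment of a JSON-compatible literal; the function names and the sample content are invented for illustration. If the files use unquoted keys or other JavaScript-only syntax, `json.loads` will fail and a real JavaScript parser would be needed.

```python
import json
import re

# URL pattern taken from the article; the file contents are NOT known here.
BASE_URL = "http://www.rdatascience.ru/opendata/data{n}.js"


def data_file_urls(count=13):
    """Return the URLs data1.js .. data<count>.js named in the article."""
    return [BASE_URL.format(n=n) for n in range(1, count + 1)]


# Assumption: a file looks like `var name = <JSON-compatible literal>;`.
_ASSIGNMENT_PREFIX = re.compile(r"^\s*(?:var|let|const)?\s*[\w$.]+\s*=\s*")


def parse_js_assignment(source):
    """Strip the `var name =` prefix and trailing `;`, then parse the rest as JSON."""
    body = _ASSIGNMENT_PREFIX.sub("", source, count=1)
    body = body.strip().rstrip(";")
    return json.loads(body)


if __name__ == "__main__":
    # Hypothetical file content, invented purely for illustration.
    sample = 'var data1 = [{"region": "A", "value": 1}, {"region": "B", "value": 2}];'
    rows = parse_js_assignment(sample)
    print(len(data_file_urls()), "files;", len(rows), "rows in the sample")
```

    The point is not the code itself. A real open data section would publish the files directly, so that nobody has to reverse-engineer a visualization to get at the numbers.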

    No description of the data sets

    Although the site even uses the term “dataset passport”, which is widely used for real data set passports on government portals, there is of course nothing of the sort there. No information about the responsible persons, no description of the structure of the data sets: nothing.

    Selling services and mixing in big data

    The section ends with a sales pitch for Sberbank's research and the claim that all of it was built on big data. The presentation itself looks more like a longread from some info-business than an open data section.
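
    As a rough illustration of what a passport is for, the Python sketch below checks a passport record for the kind of fields mentioned above (who is responsible, how the set is structured), plus a license and a format. The field names are my own assumption for the example and not the schema of any particular government portal.

```python
# Hypothetical minimum set of passport fields; not an official schema.
REQUIRED_FIELDS = ("title", "description", "owner", "contact", "license", "format", "columns")


def missing_passport_fields(passport):
    """Return the required fields that are absent or empty in a passport dict."""
    return [name for name in REQUIRED_FIELDS if not passport.get(name)]


if __name__ == "__main__":
    # Invented example: a chart with a title and an export format, nothing else.
    chart_only = {"title": "Some chart", "format": "CSV"}
    print(missing_passport_fields(chart_only))
```

    A chart that has only a title and an export button fails such a check on almost every field.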


    From all this, only one conclusion can be drawn: Sberbank's goal for this section was PR and nothing more. I would like to hope that someday Sberbank will find a way of working with open data that benefits both the bank and the community. For now it looks more like an attempt to use a popular term to promote its commercial services.
