The history of open data and the Yandex Hackathon

    On September 14-15, Moscow will host the first Yandex Hackathon, where participants will spend two days and two nights building projects on open government data with Yandex technologies.

    For many years now, I have been trying to get Russian developers interested in working with open data. That is why the Information Culture non-profit partnership organized the Apps4Russia contest. This year the contest has a new category for applications built on open data and Yandex technologies. These events prompted me to give a systematic account here of the history of open data, its sources, examples of its use, and many other important things.

    [Image: graph from eugenyboger's LiveJournal]

    Today we take it for granted that we can see detailed election results for every polling station, yet until recently this was not the case even in highly developed countries.

    Open Data: Background


    There are several definitions of open data. One is on Wikipedia; I translated it from English myself for the Russian-language article. Another is on the Government's website, which gives the definition used in the law. There are a few others, but the gist is this. Open data is information published by the organizations that own it (government bodies, in the case of open government data), made available freely (that is, under free, non-restrictive licenses) and in a machine-readable form suitable for repeated automated processing. There are specific criteria that define data as open. A Creative Commons license is almost a prerequisite for open data.

    In principle, open data is not a new phenomenon: it has long existed in various forms, and the ideology of openness has been around for many years. Open source and free licenses appeared not five or ten years ago but much earlier. This is especially true of the scientific community, where it matters that research results can be checked, reproduced, published, and worked with in every way. Research data, as a rule, comes in special formats that are, to use today's term, machine-readable.

    Today, developed nations around the world are pursuing openness in a variety of ways. At the G8 summit in the UK this June, the host country proposed signing an Open Data Charter, and Russia signed it too. The Charter's main principles are open data by default, timely publication in machine-readable form, transparency, and a commitment to creating conditions in which developers can build applications on open data.

    "Governments and businesses accumulate huge amounts of data but do not always share it in ways that make it easy to find, use, and understand. These are missed opportunities. We have reached a turning point that heralds a new era, in which people will be able to use open data to generate ideas and create services that make our world a better place," the Charter says.

    Now all G8 countries are to commit to publishing data on crime rates, company registers, and land transactions. The leaders here, of course, are Britain and the US, which have been doing this for many years. But many other countries, including Russia, have now started publishing open data too.

    This was driven by the rise of technology companies of every kind and, with it, the growing value of data, the growth of the knowledge economy, and the emergence of large companies such as Yandex, whose entire business rests on the free flow of information. If every website were paid and data could not be aggregated, the question of open data might never have arisen. The practice of working with publicly available information has had a big influence here.

    As a result, several trends came together, and this very phenomenon appeared: freedom of access to information and open data. The idea is that information created primarily by the state, but in principle by anyone, should be accessible and reusable. If someone has conducted a study and presented the results in a table, we should get the table itself, not a picture of it, so that we can check it, use it, and perhaps even build a business on those results. If the state discloses information about its activities, it is useful for citizens not only to know about it but also to do something with it. The effect may be social, or economic, or a matter of "civic oversight" or "the civic fight against corruption"; even then it is still an economic effect, just in a slightly different form. Open data portals, where huge amounts of state-generated information are uploaded, are mostly created by governments. Something interesting and useful can be built from that information; this is how the ideology of openness turns into concrete products.

    But it all started not with officials and the state but with people who had begun this work much earlier. In Britain, before the open data portals appeared, there were many small groups of developers running projects such as Rewired State ("let's rewire the state"). ScraperWiki, for example, has also existed for a long time: a platform on which anyone who knows a little Python can write scripts that extract data from websites.

    Gradually this became so widespread that it no longer mattered whether governments opened their data: people had learned to extract it anyway. In the United States, before data.gov appeared, there were Sunlight Labs and the Knight Foundation, which extracted data from congressional reports: PDF files were converted to Excel, the Excel files were loaded into a database, and from there the data was exported to CSV. Strong public pressure in the English-speaking countries left officials and government representatives with a choice: either do it themselves or have it done for them. Had David Cameron not seized on open data, put it in his party's platform, and come to power with it, the Green Party would have taken up the cause; open data is now written into its platform. And what is at stake is openness not of information but of data.

    [Image: infographic from The Guardian Datablog]

    In such a situation, the right move for the state is to lead the trend rather than resist it. And that is what it does, trying to steer it in the directions it considers priorities. This is not necessarily bad, but it has its own peculiarities.

    In Russia, the situation is much the same. I have been working on open data since 2009; before that, our state had taken no steps in this direction. For two years we actively pushed the topic, and when it became clear that we had advanced so far that we no longer needed the state, its representatives suddenly realized that it was better to lead this trend.

    Moscow has some claim to leadership here: for example, it launched its budget portal before the federal government did. In my opinion, the data there is not entirely convenient, but you can work with it.

    Civic activists are usually the first to use open data. In the United States, for example, they compare members of Congress with one another and compile various rankings. Using speech transcripts, they work out how many words each member spoke during a quarter.
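    A minimal sketch of that word count in Python. It assumes a hypothetical transcript format, a list of (speaker, utterance) pairs, and a simple regular expression for what counts as a word; real transcripts would first have to be parsed into this shape.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Illustrative definition of a "word": letters/digits, optionally with an
# apostrophe part, so "don't" counts once. Not a standard tokenization.
WORD_RE = re.compile(r"\w+(?:'\w+)?")


def words_per_speaker(transcript: Iterable[Tuple[str, str]]) -> Counter:
    """Sum the number of words each speaker said across all utterances.

    `transcript` is a hypothetical format: (speaker, utterance) pairs.
    """
    counts: Counter = Counter()
    for speaker, utterance in transcript:
        counts[speaker] += len(WORD_RE.findall(utterance))
    return counts


if __name__ == "__main__":
    sample = [
        ("Speaker A", "I yield the floor."),
        ("Speaker B", "Thank you, Mr. Speaker."),
        ("Speaker A", "Point of order!"),
    ]
    # most_common() orders speakers from most to least talkative.
    for name, total in words_per_speaker(sample).most_common():
        print(name, total)
```

    Because `\w` matches Unicode letters in Python 3, the same function works on Russian-language transcripts without changes.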

    The state of open data


    Data usually exists in one of three rough states.

    First: the data is accessible and ready to use, that is, the state or another owner makes it machine-readable. Here the barrier to entry is minimal: we can take the data, put it on a map, or use it in a mobile app. Everything is ready right away.

    Second, and worse: the information exists, but it has to be extracted from various websites. For example, information on State Duma deputies is on the State Duma website, but only as web pages from which it has to be extracted.

    Information on water quality in Moscow by district is on the Mosvodokanal website, but only through a special service: you first enter the street, then the house number, and only then are you shown the district and the pollution levels for various indicators.

    To collect all this information, activists write various scrapers: programs that extract information from websites and turn it into databases.
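    A minimal sketch of a scraper of this kind, using only the Python standard library: it pulls the rows of an HTML table and loads them into SQLite. The page layout, the `deputies` table, and its three columns are hypothetical; a real scraper would first download the page (for example with `urllib.request.urlopen`) and adapt the parsing to the site's actual markup.

```python
import sqlite3
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being filled
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_to_sqlite(html, conn):
    """Parse a (hypothetical) three-column table and store it; return row count."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows  # assumes the first row is the header
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deputies (name TEXT, faction TEXT, region TEXT)"
    )
    conn.executemany("INSERT INTO deputies VALUES (?, ?, ?)", body)
    conn.commit()
    return len(body)


# Placeholder page standing in for a downloaded HTML document.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Faction</th><th>Region</th></tr>
  <tr><td>Deputy One</td><td>Faction X</td><td>Region X</td></tr>
  <tr><td>Deputy Two</td><td>Faction Y</td><td>Region Y</td></tr>
</table>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(scrape_to_sqlite(SAMPLE_HTML, conn), "rows stored")
```

    Storing the result in a database rather than printing it is what turns a one-off page read into a dataset that can be queried, joined with other sources, and re-scraped on a schedule.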

    Third: the information exists in some form but is not publicly available. Broadly speaking, everything we do is an effort to make information transparent. By "we" I mean not only myself but also the many other activists in Russia (including commercial companies) who are actively working toward openness of information, namely:

    1. That data already published in machine-readable form be usable and convenient to work with, with as few errors as possible.
    2. That data that is not machine-readable now be converted. If it is published at all, let it be published in a useful form; that is the most important thing.
    3. That what is not published now become publicly available.


    To this end, an Open Data Council has been set up within our Open Government. The state has declared its readiness to take part, and laws and regulations are being amended. In principle, nothing stands in the way of starting work on opening up data and using it.

    Open data sources


    Open data is not only government data. To a large extent it is also the data of huge crowdsourced internet projects. Not everyone knows that, for example, all of Wikipedia is available as dumps. Or take Wikidata, a truly remarkable project in its philosophy. DBpedia approaches the same goal from the other side: Wikidata is for people to gradually turn information into data, while DBpedia refines algorithms that convert existing infoboxes into linked data. The Freebase project, which Google has since bought, was built entirely on DBpedia and Wikipedia. Its creators simply downloaded the data, built an interface that let users add more, and turned this into a rather expensive product.

    Then there is the OpenStreetMap project, whose huge data dumps are likewise publicly available and can be used. There are several dozen open crowdsourced projects from which you can collect data, mainly encyclopedias, reference books, and user-contributed databases.

    For example, in France there are activists who track food products, entering their ingredients and EAN and EPC codes into a separate database, which they distribute. This creates a directory that people with dietary restrictions can use to work out which foods they can eat.

    So one part of the data is what activists create in various forms; another is what the state provides, since it is the largest data owner; and the third is data published by commercial and non-profit organizations.

    Companies usually publish data for one of two reasons: either under compulsion, or out of social responsibility or some other motivation. Some, for example, do it to attract developers. Nike publishes machine-readable information about its factories.

    How open data is used around the world


    Developers very often ask: "What can be built on open data? What are some examples?" I always suggest looking at what others have done. Just look at the competition sites NYC BigApps, Apps4Development, Apps4Berlin, Apps4Finland, and Apps4SanFrancisco. Not all of the examples there carry over to Russia, though.

    The creators of the "Don't Eat Here" project did not even use open data: they parsed the website of New York's food inspection authority. They worked out where on it the addresses, business names, and inspection results appeared, plotted them on a map, and built an app that works much like Foursquare. Based on the number of violations issued and not yet corrected, it shows where you should not go. The app even sold for a small fee, and people installed it.

    A huge number of applications are listed on City-Go-Round, a small US portal that aggregates information about transit agencies and the applications built on their data. It keeps a separate list of 2,000 agencies, 270 of which regularly publish transit data in a dedicated format, the General Transit Feed Specification (GTFS). Thanks to this, hundreds of applications have been built on that data.
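    A GTFS feed is a zip archive of CSV files; its `stops.txt` file lists stops with `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` columns. As a sketch of the simplest kind of app built on such feeds, the Python below reads the stops from a feed and finds the one nearest to a given point using the haversine formula. The function names are illustrative, and the straight-line great-circle distance is an approximation that ignores the street network.

```python
import csv
import io
import math
import zipfile


def load_stops(feed):
    """Read stops.txt from a GTFS zip (path or file-like) into a list of dicts."""
    with zipfile.ZipFile(feed) as zf:
        with zf.open("stops.txt") as raw:
            # utf-8-sig tolerates the byte-order mark some feeds include.
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
            return [
                {
                    "id": row["stop_id"],
                    "name": row.get("stop_name", ""),
                    "lat": float(row["stop_lat"]),
                    "lon": float(row["stop_lon"]),
                }
                for row in reader
                # Some GTFS location types may omit coordinates; skip them.
                if row.get("stop_lat") and row.get("stop_lon")
            ]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_stop(stops, lat, lon):
    """Return the stop closest to (lat, lon)."""
    return min(stops, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))
```

    A linear scan is fine for a few thousand stops; a city-scale app answering many queries would typically switch to a spatial index.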

    There are also new media projects such as Storify, which already hosts a huge amount of open data that you can use in your own mini-newspaper to create charts or other complex visualizations and enrich your stories. Storify provides an environment in which people themselves come up with ways to use open data. In the same category are many projects that let you create infographics online, draw charts, upload your own data, and work with data that is already open: Sacrato, Factual, and the same Freebase that Google bought from Metaweb.


    It is not always possible to make money from your application, because the data you use is not always enough for a complete product. But you can monetize the result in other ways.

    Data is like an ingredient. Without salt a dish will not taste good, but you can still eat it; with salt you can sell it for more, or the people you feed will be happier. Sometimes data is the dish itself, and sometimes it is the salt. As a rule, data is rarely valuable on its own, and many projects built on open data actually use it only as a supplement.

    For example, in the US and the UK, real estate services are changing very quickly. In addition to the familiar criteria everyone has long provided, they have started showing, for example, the crime situation or weather data for the city where you plan to live. Where does all this information come from? In the United States, weather data has been publicly available for the past twenty years. It is the most monetized open data in the world.

    Crime data is published by police departments, and several dozen projects are already built on it. Environmental data is also published; again, it comes either from state monitoring or from commercial sources. That is why I always tell developers to think not only about what they can build on their own but also about what else they can build into their work and how to earn extra from it.

    Another way to benefit from your work is indirect monetization: selling what you have built. For example, the team behind the Chicago Crime monitoring project sold it to MSN, which made it part of its portal.

    The British are also very proud that after data on the success rates of heart operations at different hospitals was opened up, the number of deaths fell, because people began choosing hospitals based on this information.

    A huge number of US startups built on open data exist to enrich existing ideas and products with that data.

    Open data in Russia


    One of the most important things in working with open data is a convenient format, and in Russia this is often ignored. Moreover, although we have adopted a law on open data, many government agencies neglect the information on their websites; for example, they often forget to update it.

    Some open data in our country is published by commercial organizations, for example the Russian National Corpus, which Yandex supports. Russian Railways publishes all the information on the benefits it provides: we can find out who received how much in benefits, as well as tariffs and financial statements. You just need to visit corporate websites and see what is published there.

    [Image: chart based on turnout data from the Moscow mayoral election]


    The Unified State Exam (USE), for all its shortcomings, has one important advantage: it makes it possible to measure the quality of education in schools. But the data is scattered, so there are no decent projects based on it yet. One could build a "Choose a School" app or add this information to real estate services.

    Another area is housing and utilities. The Moscow authorities have started disclosing a lot of information about the housing and utilities sector, and the gorod.mos.ru portal has information on every building. If you scrape the data for all buildings, you can find out how many people file complaints, how quickly the complaints are dealt with, and so on. You just need to assemble the database. The portal's developers did not set themselves this goal, but nothing prevents us from doing it ourselves.
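    A sketch of the aggregation step once complaint records for all buildings have been collected, in Python. The `Complaint` record (a building, a filing time, and an optional resolution time) is a hypothetical shape for the scraped data, not the portal's actual format; the summary reports, per building, the number of complaints, how many are still open, and the median response time in hours.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Complaint:
    """Hypothetical record shape for one scraped complaint."""
    building: str
    filed: datetime
    resolved: Optional[datetime]  # None while the complaint is still open


def summarize(complaints: List[Complaint]) -> Dict[str, dict]:
    """Group complaints by building and compute simple responsiveness stats."""
    by_building: Dict[str, List[Complaint]] = defaultdict(list)
    for c in complaints:
        by_building[c.building].append(c)

    summary = {}
    for building, items in by_building.items():
        # Response time in hours, only for complaints that were resolved.
        hours = [
            (c.resolved - c.filed).total_seconds() / 3600
            for c in items
            if c.resolved is not None
        ]
        summary[building] = {
            "complaints": len(items),
            "open": sum(1 for c in items if c.resolved is None),
            "median_response_h": median(hours) if hours else None,
        }
    return summary
```

    The median is used instead of the mean so that a single complaint left unresolved for months does not dominate a building's figure.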

    Ours is one of the few countries where public procurement data is fully disclosed. Processing it is not simple, because the volume is very large, but it can be used to build convenient services, for example for suppliers.

    Government data in Russia is currently scattered across many portals: every ministry and every federal agency has its own section. We have several open data portals, for Moscow and for the Ulyanovsk region, and portals for the Tula region, Perm Krai, and the city of Perm are on the way. Information Culture runs the hubofdata.ru portal, where we use scripts to bulk-load tens of gigabytes of data, some useful and some less so. Statistics alone account for 3,000 datasets there; there is also data on State Duma deputies' votes, economic registers, all of Moscow's data, and all of the Ulyanovsk region's data.


    A similar portal, ar.gov.ru, is run by the Ministry of Economic Development; for now they are simply cataloging everything that exists. Moscow's budget data is openly available on a dedicated portal, budget.mos.ru, which even has a section for developers.

    So far, publishing open data is mandatory only for federal authorities. The process is moving gradually. We have many laws that are not enforced, for example Federal Law No. 8-FZ on access to information about the activities of government bodies. Perhaps 10% of government agencies, at best, comply with it fully. The rest violate it in some respect, if only in minor details, and not always deliberately but rather through the negligence of the people who maintain official websites. Still, the signed Charter and the adopted law on open data show that working with data has already become part of state policy. A peculiarity of our situation is that we do not know what information exists in the first place. For example, there are transcripts of deputies' speeches: officially they are not machine-readable, but we have a machine-readable version.

    If you have an idea and need help, write to me: I will always tell you what data you can use for your purposes and where to get it.

    What is Apps4Russia


    One important task is to spark interest in open data. That is why we created Apps4Russia, a long-running developer competition that we started before the state took an interest in the topic. In 2011, seven people pooled their own money for the prize fund and held the first competition, which drew about fifteen substantial applications. After that, we founded the Information Culture non-profit partnership, and we are now holding the competition for the third time. Its main goal is to encourage developers to turn to open data and to show them that they can and should use it in their projects.

    One excellent project that took part in Apps4Russia was the Social Map: an application that used a mobile phone's coordinates to determine which public institutions were nearby and immediately displayed their phone numbers: the DEZ (housing maintenance office), the district administration, the police station, and so on. This is open data collected from different websites and systematized. Recently we held a small competition based on police data, which produced several applications that help you find your district police officer.

    This year Apps4Russia has a Yandex category, in which applications built on Yandex technologies will compete. The idea behind it is very concrete: Yandex is a service company that also works with open data and creates many opportunities for developers to improve the quality of their products. It is hard to measure how many projects have made money on Yandex.Maps, but it has certainly improved the product quality of many of them. You can use not only Yandex.Maps but also the Yandex.Search API and the APIs of other services.


    Besides the usual APIs, Yandex also has technologies designed specifically for processing free-form natural-language text. For example, some time ago the Tomita parser, built for exactly this purpose, was opened up. It is what helps services such as Yandex.News understand the meaning of texts.

    Using Search and the hospital registry, you could build a search engine for hospitals. Or you could build a mobile app for prosecutors, or for people interested in prosecutors' offices, by collecting data from all prosecutors' websites and aggregating their news into RSS, and then sell it to the prosecutors themselves.

    You can take a small piece of each dataset and put it to use. If a registry of organizations contains their web addresses, you can run a crawler over them, collect their RSS feeds, and build a "Latest Moscow City News" mobile app, since every Moscow department has an RSS feed. All of this can be done with Yandex technologies; just go to api.yandex.ru. This year, applications to Apps4Russia close on September 16, though the deadline may be extended.
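    A minimal sketch of the merging step for a "Latest Moscow City News" style app, using Python's standard library. It assumes the feeds are RSS 2.0, with `<item>` elements carrying `title`, `link`, and an RFC 822 `pubDate`; finding the feed URLs (the crawler part) and downloading them are left out, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_rss(xml_text, source):
    """Extract items from one RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if not raw_date:
            continue  # undated items cannot be ordered; skip them
        try:
            published = parsedate_to_datetime(raw_date)
        except (TypeError, ValueError):
            continue  # malformed date
        if published.tzinfo is None:
            # Assume UTC so that naive and aware datetimes can be compared.
            published = published.replace(tzinfo=timezone.utc)
        items.append({
            "source": source,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": published,
        })
    return items


def latest_news(feeds, limit=10):
    """Merge {source_name: rss_text} into one list, newest first."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_rss(xml_text, source))
    merged.sort(key=lambda i: i["published"], reverse=True)
    return merged[:limit]
```

    Keeping the source name on each item lets the app show which department a headline came from after the feeds are interleaved.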

    Yandex open data hackathon


    On September 14-15, Moscow will host the first Yandex Hackathon. Over two days and two nights, developers will build applications based on open government data and Yandex technologies. You can take part in teams of up to five people, and you can either arrive as a ready-made team or form one on the spot.

    In the competition you can think things over for a long, long time and then build something in two hours; at the hackathon you have to think fast. As a rule, you need to come prepared. So decide in advance what you will build, figure out where you will find the data, and learn the APIs. Of course, you will get help on site: there will be consultants on open data and on Yandex technologies.

    I want to stress once again that you do not have to build a product to sell right away. You can make it part of another product. You can sell yourself by doing a high-quality job on a particular piece of a project, and not necessarily to an employer: it can simply be work for your reputation. At the hackathon you will be able to show that you can build great things from data and tools.

    The main goal of both Apps4Russia and the Yandex Hackathon is to show that there is a wealth of information and technology all around us with which you can create something useful.
