Moscow Open Data and the API Challenge Built on It

    In my experience talking with developers who have taken part in open data competitions, they all say the same thing: they need data that is as detailed as possible.

    For example, not statistics by region but statistics by municipality. Not summary figures on crimes or accidents but records with addresses and coordinates.
    Not just the addresses and coordinates of institutions but detailed information about each one.

    Frankly, there is still little such detailed data available in a convenient form. Take Moscow as an example: even on the Moscow portal data.mos.ru, most of the data is geodata or geo-referenced data consisting of an address and a minimal set of other fields. Clearly, it is hard to build anything really interesting with that. So we thank the Moscow Government for at least publishing this data and for trying to work out where to get more interesting data and what to do with it.

    Contests and Competitions

    Let me answer right away why this matters: it is impossible to run a contest, hackathon or developer competition without enough interesting data. We ran into this at the Yandex hackathon, at the last Apps4Russia contest and at many others.

    So now that we are helping to prepare the API Challenge, we decided to gather as much useful data as possible. Since the API Challenge is organized by the Moscow authorities and focused on Moscow, we are collecting Moscow data. To that end, we have been going through dozens of websites, looking for anything that can be used legally and to good effect.




    How it went, and how it is going


    First, you need to know where to look for data. The universal formula covers four directions:
    1. Official websites of authorities
    2. Sites of territorial divisions of federal bodies (FSIN, Ministry of Justice, Ministry of Internal Affairs, etc.)
    3. Sites of state enterprises and state-regulated monopolies
    4. Sites of municipalities


    The last item applies to Moscow only weakly, and then only to the new territories, but all the others exist and are accessible.

    We looked at the websites of all the city departments, using the list on www.mos.ru. There is a fair amount of interesting data there, but not enough. Some of it is already published on data.mos.ru, while the rest takes significant effort to extract from PDF documents. For example, Mosecomonitoring's reports are large PDF documents that cannot be converted into data by hand.

    Next came the websites of the territorial offices of federal bodies. Moscow, like every region, hosts the offices of many federal bodies, because in Russia many powers are divided between the federal government and the regions. In particular, the Ministry of Internal Affairs, the Federal Penitentiary Service, the Bailiff Service, the Prosecutor's Office and many others are federal. We looked at many of their sites: first we found the list of bodies on the website of the Government of the Russian Federation, then we went through each site to find its Moscow section.

    Finally, data from state-owned enterprises and regulated corporations is the most complicated in terms of whether it can be used. Natural monopolies are obliged to publish a lot of data under the orders of the FAS and the Federal Tariff Service, and that data is effectively in the public domain, with no restrictions on its use. These sections of their websites are usually titled "Information disclosure." For the other information on their websites, the legal status is not clear-cut, and the city needs a policy on its openness. Nevertheless, for a developer competition such data is quite suitable if it has high social value.

    What we found


    Below I list the data along with links to the datasets we extracted, which can be downloaded and used right away.
    We upload everything we collect to our Hub of Open Data. It is an open, nonprofit project similar to thedatahub.io from the Open Knowledge Foundation. Everything placed on it will always stay open, and anyone who wishes can download all of the data through the CKAN API.
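    CKAN's Action API can be queried with plain HTTP. Below is a minimal sketch using only the Python standard library, assuming the hub exposes CKAN's standard version 3 endpoint `/api/3/action/package_show`; the helper names (`package_show_url`, `resource_links`, `fetch_resource_links`) are hypothetical, and the hub may not be online today.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the hub serves the standard CKAN Action API (v3) under /api/3/action/.
HUB_BASE = "http://hubofdata.ru"


def package_show_url(dataset_id: str, base: str = HUB_BASE) -> str:
    """Build the CKAN 'package_show' URL for one dataset (e.g. 'mosadv')."""
    return f"{base}/api/3/action/package_show?{urlencode({'id': dataset_id})}"


def resource_links(response: dict) -> list[tuple[str, str]]:
    """Extract (format, url) pairs from a decoded package_show response body."""
    if not response.get("success"):
        raise ValueError(f"CKAN reported an error: {response.get('error')!r}")
    resources = response["result"].get("resources", [])
    return [(r.get("format", ""), r["url"]) for r in resources]


def fetch_resource_links(dataset_id: str, base: str = HUB_BASE) -> list[tuple[str, str]]:
    """Query the hub over HTTP and list the downloadable files of one dataset."""
    with urlopen(package_show_url(dataset_id, base), timeout=30) as resp:
        return resource_links(json.load(resp))
```

    For example, `fetch_resource_links("mos-notary")` would list the JSON and CSV files of the notary dataset, provided the hub is reachable. Keeping the HTTP call separate from the parsing makes the parsing easy to check against a saved response.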

    Register of Lawyers


    This data is published on the website of the Moscow office of the Ministry of Justice of Russia.

    We downloaded it and converted it to JSON, CSV and XLS with normalized fields. The data can now be downloaded here: http://hubofdata.ru/dataset/mosadv
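    "Normalized fields" can mean many things; here is one minimal sketch, assuming the spreadsheet rows have already been read into lists of strings (reading the XLS itself needs a separate library or a tool such as OpenRefine). Headers become lowercase snake_case names, values are trimmed, and the records are written out as JSON and CSV. The function names and sample columns are hypothetical.

```python
import csv
import io
import json
import re


def normalize_field(name: str) -> str:
    """Turn a raw column header into a lowercase snake_case field name.

    \\W matches non-word characters; Cyrillic letters count as word
    characters in Python 3, so Russian headers keep their letters.
    """
    return re.sub(r"\W+", "_", name.strip().lower()).strip("_")


def normalize_rows(header: list[str], rows: list[list[str]]) -> list[dict]:
    """Map each raw row onto normalized field names, trimming whitespace in values."""
    fields = [normalize_field(h) for h in header]
    return [{f: (v or "").strip() for f, v in zip(fields, row)} for row in rows]


def to_json(records: list[dict]) -> str:
    """Serialize records as pretty-printed UTF-8 JSON (no \\u escapes)."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```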

    Notary Registry


    This data, again, comes from the website of the Ministry of Justice.
    The story is exactly the same: it was an XLS file from the start, so we simply downloaded it, processed it in OpenRefine, converted it to JSON and CSV, and posted it here: http://hubofdata.ru/dataset/mos-notary

    Moscow prisons


    A very short list of prisons is available on the website of the FSIN's Moscow office: http://www.77.fsin.su/structure/
    A very simple parser turned it into the same JSON, CSV and XLS formats, and it is posted here: http://hubofdata.ru/dataset/mos-prisons
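    A "very simple parser" for a short HTML list can be written with the standard library alone. The sketch below assumes, hypothetically, that each institution is an `<li>` element with an optional link; the real markup of the FSIN page may differ, and nested lists are not handled.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the text and the first link of every <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._text = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # Start a new record for each list item.
            self._in_li, self._text, self._href = True, [], None
        elif tag == "a" and self._in_li and self._href is None:
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            # Collapse runs of whitespace in the collected text.
            name = " ".join("".join(self._text).split())
            self.items.append({"name": name, "url": self._href})
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._text.append(data)


def parse_list(html: str) -> list[dict]:
    """Return one {'name', 'url'} record per <li> in the given HTML."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

    The resulting records can then be written to JSON or CSV like any other list of dictionaries.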

    Contacts of Mosgaz units by street

    While the previous three datasets came from federal authorities, the next one contains the contacts of Mosgaz, a Moscow enterprise subject to laws and regulations on information disclosure.

    Mosgaz has a section where you can enter a street name to find the contacts of the units serving it: http://www.mos-gaz.ru/services/territory/

    Since this section turned out to run on fairly simple AJAX code, we were able to extract all the contacts and all the units in a short time. We posted a large contacts dataset at http://hubofdata.ru/dataset/mosgaz-contacts; it includes files mapping streets to districts and files mapping units to districts.
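    Extracting data behind a per-street lookup usually comes down to calling the same endpoint once per street and merging the answers. A minimal sketch, assuming a hypothetical JSON payload of the form `{"district": ..., "units": [{"name": ..., "phone": ...}]}`; the real endpoint and response format are not known here, so the HTTP call is left to an injected `fetch` function.

```python
import json
import time
from typing import Callable, Iterable


def crawl_streets(streets: Iterable[str], fetch: Callable[[str], str], delay: float = 0.0):
    """Query the lookup once per street and build two tables:
    street -> district, and unit name -> {"phone", "districts" served}."""
    street_district = {}
    units = {}
    for street in streets:
        # Hypothetical payload shape; adapt to whatever the real AJAX call returns.
        data = json.loads(fetch(street))
        district = data["district"]
        street_district[street] = district
        for unit in data["units"]:
            entry = units.setdefault(
                unit["name"], {"phone": unit.get("phone"), "districts": set()}
            )
            entry["districts"].add(district)
        if delay:
            time.sleep(delay)  # be polite to the server between requests
    return street_district, units
```

    With a real site, `fetch` would wrap an HTTP request to the endpoint the page's script calls (visible in the browser's network tab). The two returned tables correspond to the street-to-district and unit-to-district files described above.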

    Addresses of Mosenergo thermal power plants, hydroelectric power stations and state district power stations


    The website of Mosenergo, one of Moscow's natural monopolies, lists the addresses of its thermal power plants, hydroelectric power stations and state district power stations: http://www.mosenergo.ru/catalog/228.aspx. The list is very short but useful for anyone interested in such data.

    It was easy to parse, and we posted it here: http://hubofdata.ru/dataset/mosenergo-filials. This data is useful for anyone building applications about the environmental situation in Moscow. I should say right away that we have not yet processed all of Mosenergo's data. Their "2-TP (Air)" statistical report section contains many public reports, with XLS data for each station on how much it emits. Perhaps someone will be willing to collect and consolidate them.

    Addresses and characteristics of post offices of Russia


    Russian Post is not a government body but a state-owned enterprise, one often criticized for the quality of its work. It has data on its offices and publishes it on several of its websites, the main one being its official site.

    We extracted data on its Moscow offices, including coordinates, addresses, postal codes, working hours and so on. This data could not be packed into a CSV in any simple way, so it is available as a single JSON file: http://hubofdata.ru/dataset/ruspost-msk
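    The difficulty is structural: an office record can mix nested objects (such as coordinates) with variable-length lists (such as a weekly schedule). Below is a minimal sketch of one common workaround, using hypothetical field names: nested objects become dotted column names, while lists, which have no fixed set of columns, can only be kept as embedded JSON strings. That last step shows why a plain JSON file can be the more convenient format.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted column names; lists are kept as JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, e.g. {"geo": {"lat": ...}} -> "geo.lat".
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # A variable-length list has no fixed set of columns, so serialize it.
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat
```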

    Noise complaints


    On the website of the already mentioned Mosecomonitoring we found a small but curious dataset of residents' complaints about noise. The complaints are collected at http://www.mosecom.ru/noise/territ/noise_stroy_pl_2013.php, and they even include addresses, so they can be plotted on maps if desired.

    We extracted this data with a parser as well and posted it on the hub: http://hubofdata.ru/dataset/msk-noise-req

    Nonprofit Addresses


    This is where the largest datasets come in. We looked at the website of the Ministry of Justice and found that its register of non-profit organizations can be queried by region: http://unro.minjust.ru/NKOs.aspx
    In fact, we did this long ago, at the beginning of this year, and the data had been "gathering dust on the shelf." We have now converted it into convenient working formats and posted it on the hub: http://hubofdata.ru/dataset/mos-nko-2013

    Note that the data is split by organization type, in case you want to work with religious organizations separately from the others.

    Databases of Moscow buildings linked to polling precincts, with construction dates


    And finally, the data that may be the most useful. Several websites publish detailed data for each building in Moscow, including dom.mos.ru, gorod.mos.ru, reformagkh.ru, mosgorizbirkom.ru and a number of others.

    We have not yet had time to process them all and fulfill the dream of bringing all building data into a single database, but we took the first step: we processed several databases and made it possible to combine them later.

    Now available:
    • A database of all buildings linked to their precinct election commissions (PECs): http://hubofdata.ru/dataset/mos-elect-houses. For each PEC there is plenty of additional information, including the polling place.
    • A database of building construction dates: http://hubofdata.ru/dataset/mos-buildings-years. The source site actually has much more information on each building, but so far we have collected only the construction dates, and we hope some people will want to help collect all of the data.
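    Combining such databases generally means joining them on a normalized address. Below is a minimal sketch with hypothetical normalization rules: lowercase, "ё" folded to "е", and punctuation collapsed to single spaces. Real Moscow addresses also need abbreviation handling (for example "ул." versus "улица"), which this sketch does not attempt.

```python
import re


def address_key(address: str) -> str:
    """Normalize an address string into a join key (hypothetical, simplified rules)."""
    key = address.lower().replace("ё", "е")
    key = re.sub(r"\W+", " ", key)  # punctuation and extra spaces -> single space
    return " ".join(key.split())


def merge_by_address(*datasets: list[dict], address_field: str = "address") -> dict:
    """Merge records from several datasets that share the same normalized address.

    Later datasets overwrite earlier ones on conflicting field names.
    """
    merged: dict[str, dict] = {}
    for records in datasets:
        for rec in records:
            key = address_key(rec[address_field])
            merged.setdefault(key, {}).update(rec)
    return merged
```

    For example, a polling-precinct record for "ул. Тверская, д. 1" and a construction-date record for "ул.  Тверская д.1" end up in a single merged entry.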


    That, of course, is not all. There is more data, and we will keep uploading it to the hub regularly.
    All the code we use is published on GitHub: https://github.com/infoculture/mosopendata

    To sum up, here are the conclusions and suggestions:
    1. We will propose that officials at DIT (the Moscow Department of Information Technology) officially publish everything we are collecting and parsing in Moscow. I don't think they will refuse, since it is already clear where to find the data. That applies at least to data under the jurisdiction of the Moscow authorities; for federal data, persuading the federal bodies will take longer.
    2. You can do the same in your own region or city: build an open city data portal, or upload the data to our hub or elsewhere for public access.
    3. Take part in contests and competitions, both the one described above and any future ones. They are not only a chance to test your skills but also a chance to win a significant prize.
