How we built a project about the 2018 presidential election in Russia
In the fall of 2017, the guys from Golos decided that online election coverage should be taken to a new level:
- firstly, it was frustrating that maps of the constituent entities of the Russian Federation still offered no detail down to the level of territorial commissions,
- secondly, the CEC website is extremely convoluted and does not let you quickly find the results for your polling station,
- and thirdly, it was decided to start laying the groundwork for an encyclopedia of results, where you can see the results of different elections for a single polling station. Roughly speaking, to find out how your neighbors voted.
And all this not only to find out in which region 39% of voters backed Grudinin, but also to check whether there is a polling station anywhere in the country where Putin did not win.
It quickly became clear that, alas, no funding for this project should be expected, so from then on everyone worked almost purely on enthusiasm.
The Association of Non-Profit Organizations "In Defense of Voters' Rights 'Golos'" is a Russian public organization founded in 2000. Its declared task is to protect the rights of voters. As of mid-2013, the organization was active in 40 regions of Russia.
In its work, Golos informs citizens about Russian electoral legislation and conducts long-term and short-term observation at all stages of elections.
I joined the project when the guys already had a clear idea of where the data would come from, as well as of the project structure and design. The team needed someone to handle the server side, and I took that role.
Structurally, the project was divided into two main parts.
The first part: before the election
The first part (uik.golosinfo.org) was supposed to launch a week before the election and let a visitor:
- find their polling station by address;
- see where it is on the map;
- recall how people voted at this polling station in the past.
At this stage, everything starts with a search over the voter address database: more than 17 million records that can be parsed from the CEC website. In this database, each component of an address (house number, street, city, district and so on) is a separate field, and there are 16 types of such components. The set of non-empty components varies greatly from address to address. A naive search for even a single address took more than a minute, which is too long in itself and unacceptable for a project under load. Sphinx came to the rescue and cut the search time to hundredths of a second.
Sphinx (SQL Phrase Index) is a full-text search system developed by Andrey Aksyonov and distributed under the GNU GPL license. Its distinctive features are high indexing and search speed, integration with existing DBMSs (MySQL, PostgreSQL), and APIs for common web programming languages (PHP, Python and Java are officially supported; there are community-implemented APIs for Perl, Ruby, .NET and C++). If you are not familiar with Sphinx, I highly recommend it: working with it feels like magic, it is easy to configure and it just works. By the way, rumor has it that the Habr search also uses this engine.
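The article does not say how the address records were fed to the full-text index. One common approach, shown here only as a sketch, is to flatten the sparse component fields into a single text document per record. The component names in `COMPONENT_ORDER` and the sample row are hypothetical; the real database has 16 component types.

```python
from typing import Mapping, Optional

# Hypothetical component names, ordered from broadest to narrowest.
# The real database has 16 component types; only a few are shown here.
COMPONENT_ORDER = [
    "region", "district", "city", "settlement", "street", "house", "building",
]


def address_to_document(components: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty address components into one lowercase string."""
    parts = []
    for name in COMPONENT_ORDER:
        value = components.get(name)
        if value and value.strip():
            # Collapse repeated whitespace inside a component.
            parts.append(" ".join(value.split()).lower())
    return " ".join(parts)


# A made-up record: most components are empty, as in the real data.
row = {
    "region": "Moscow Oblast",
    "district": None,
    "city": "Khimki",
    "street": "Lenina  St",
    "house": "12",
    "building": "",
}
print(address_to_document(row))  # moscow oblast khimki lenina st 12
```

A string like this can then be indexed as one full-text field, so a query does not have to know in advance which of the component fields are filled in.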
A search in the address database tells you which election commission a voter belongs to in 2018, but not in 2012 (the year of the previous presidential election). At first glance it seems that nothing much could have changed, and that people who voted at the nearest school will simply go there again. In fact there were changes, and many of them: in 2013, the polling station numbers of more than half of the PECs (precinct election commissions) changed. To show how a precinct voted in the previous presidential election, you need to establish a correspondence between its 2018 number and its 2012 number.
This matching turned out to be very difficult.
Unfortunately, the 2012 address database could not be found anywhere (otherwise a direct address-to-precinct correspondence could have been implemented). So I had to compare the PEC addresses for 2012 and 2018. The addresses were in different formats, so a script was written to extract the name of the settlement and the street (avenue, alley, etc.). This made it possible to match about 60 thousand PECs. We launched that same day and immediately faced unhappy users reporting that they could not find their PEC. On the same day, thanks to the help of people who cared, we matched the remaining PECs by coordinates using the Python geopy library. As a result, we could show the turnout and the number of votes for each candidate in the previous election for almost every address.
The data for 2016 (the State Duma elections) was also ready, but we ran out of time to integrate it into the backend and frontend; that remains in the plans.
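The two-stage matching described above can be sketched as follows. This is an illustration under assumptions, not the original script: the noise-word list, the record layout, the 0.5 km threshold and the sample data are all made up, and the distance is computed with a standard-library haversine formula instead of geopy so that the example has no dependencies (how exactly geopy was used is not described in the article). Precinct numbers are unique only within a region, so the sketch assumes both lists come from the same region.

```python
import math
import re

# Illustrative noise words (street types, "house", "city"); a real list is longer.
NOISE = {"g", "gorod", "s", "selo", "ul", "ulitsa", "pr", "prospekt",
         "per", "pereulok", "d", "dom"}


def address_key(address):
    """Keep only the significant words (settlement and street names)."""
    tokens = re.findall(r"[^\W_]+", address.lower())
    words = [t for t in tokens
             if t not in NOISE and not any(c.isdigit() for c in t)]
    return " ".join(sorted(words))


def haversine_km(a, b):
    """Great-circle distance in km between two records with lat/lon."""
    p1, p2 = math.radians(a["lat"]), math.radians(b["lat"])
    dlon = math.radians(b["lon"] - a["lon"])
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def match_precincts(new, old, max_km=0.5):
    """Map each new precinct number to an old number, or None if unmatched."""
    by_key = {}
    for p in old:
        by_key.setdefault(address_key(p["address"]), p)
    result = {}
    for p in new:
        hit = by_key.get(address_key(p["address"]))  # stage 1: address text
        if hit is None and old:
            nearest = min(old, key=lambda o: haversine_km(p, o))
            if haversine_km(p, nearest) <= max_km:   # stage 2: coordinates
                hit = nearest
        result[p["number"]] = hit["number"] if hit else None
    return result


old_2012 = [
    {"number": 101, "address": "g. Khimki, ul. Lenina, d. 5", "lat": 55.8890, "lon": 37.4450},
    {"number": 102, "address": "g. Khimki, ul. Mira, d. 10", "lat": 55.9000, "lon": 37.4600},
]
new_2018 = [
    {"number": 3001, "address": "Khimki gorod, Lenina ulitsa, dom 5", "lat": 55.8890, "lon": 37.4450},
    {"number": 3002, "address": "Khimki, School No. 7", "lat": 55.9001, "lon": 37.4602},
    {"number": 3003, "address": "Khimki, Yubileyny prospekt, 1", "lat": 55.9500, "lon": 37.5000},
]
print(match_precincts(new_2018, old_2012))
# {3001: 101, 3002: 102, 3003: None}
```

In this sketch two precincts housed in the same building produce the same key, so only the first one is kept; real data would need a tie-breaking rule for such cases.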
Part Two: Election Day
While discussing plans in the fall, the guys from Golos got so ambitious that it was decided to show results in real time, detailed down to each polling station! This idea, however, is where we got burned.
According to the plan, on election day the site was supposed to show live voter turnout at every level from the whole country down to individual polling stations and, once the vote count was over, switch to showing how Russians in different regions voted.
The second part of the project was supposed to launch when voting began and show the turnout dynamics on the map with some delay. The plan was that the Far Eastern regions would light up with color first and, as polling stations opened, the colored border would move west.
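The westward-moving border follows directly from time zones. As a rough sketch (not the project's code), the set of regions that should already be lit at a given moment can be computed from each region's UTC offset. The region sample and the function name are illustrative; the 08:00 local opening time and the election date of 18 March 2018 are treated as known facts.

```python
from datetime import datetime, timedelta, timezone

POLLS_OPEN_HOUR = 8  # polling stations open at 08:00 local time

# UTC offsets in hours for a small sample of regions (2018 values).
REGION_UTC_OFFSET = {
    "Kamchatka Krai": 12,
    "Primorsky Krai": 10,
    "Novosibirsk Oblast": 7,
    "Moscow": 3,
    "Kaliningrad Oblast": 2,
}


def opened_regions(now_utc):
    """Return the regions whose polling stations have already opened."""
    opened = []
    for region, offset in REGION_UTC_OFFSET.items():
        local_tz = timezone(timedelta(hours=offset))
        opening = datetime(2018, 3, 18, POLLS_OPEN_HOUR, tzinfo=local_tz)
        if now_utc >= opening:
            opened.append(region)
    return opened


# 22:00 UTC on 17 March is 01:00 on 18 March in Moscow.
moment = datetime(2018, 3, 17, 22, 0, tzinfo=timezone.utc)
print(opened_regions(moment))  # ['Kamchatka Krai', 'Primorsky Krai']
```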
During voting, precinct commissions must submit turnout data at 10:00, 12:00, 15:00 and 18:00; then comes the vote count, after which the commissions submit detailed statistics for their polling stations. We had a parser set up that was ready to collect data from the CEC website as soon as it was updated and load it into the database on the server, so that website visitors could follow events with minimal delay.
All the backend and frontend logic was built on aggregating the data for the upper levels from the lowest level (the PECs). This was a mistake built in from the start: we assumed that results would become available for all PECs in a region at once, and in reality this is not so. Data did not appear on the CEC website in a timely manner; to be honest, almost nothing came in from anywhere. The result was that a whole region could be represented by data from a single PEC.
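To illustrate the pitfall, here is a minimal sketch of a bottom-up aggregation that refuses to publish a regional figure until enough PECs have reported. It is not the project's code: the `PecReport` structure, the `min_coverage` threshold of 90% and the numbers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PecReport:
    region: str
    registered: int        # voters on the precinct list
    voted: Optional[int]   # None until the PEC has reported


def regional_turnout(reports, min_coverage=0.9):
    """Return region -> turnout in percent, or None if too few PECs reported."""
    by_region = defaultdict(list)
    for report in reports:
        by_region[report.region].append(report)

    result = {}
    for region, items in by_region.items():
        reported = [r for r in items if r.voted is not None]
        coverage = len(reported) / len(items)
        registered = sum(r.registered for r in reported)
        if coverage < min_coverage or registered == 0:
            result[region] = None  # not enough data to show anything honest
        else:
            voted = sum(r.voted for r in reported)
            result[region] = round(100 * voted / registered, 1)
    return result


reports = [
    PecReport("A", 1000, 600),
    PecReport("A", 500, 300),
    PecReport("B", 800, 400),   # the only PEC in region B that has reported
    PecReport("B", 900, None),
    PecReport("B", 700, None),
]
print(regional_turnout(reports))  # {'A': 60.0, 'B': None}
```

Without the coverage check, region B would be shown with a turnout of 50% on the basis of one precinct out of three.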
In addition, we parsed the pages of the territorial commissions, each of which hosts a summary table with statistics for its precinct commissions, and this takes a long time because there are as many as 2800 such pages. By comparison, there are only 85 pages with statistics on TECs, one per constituent entity of the Russian Federation. We should have written separate parsers for the country, the constituent entities and the TECs and refreshed those promptly, updating the PECs whenever possible; then it would have worked.
Because we had not foreseen these difficulties, we decided during the voting not to move to the second stage of the project until complete and reliable data was available. So the project lost the ability to follow the turnout in near real time. The opportunity to examine the turnout map carefully and study the distribution of votes among candidates did not disappear, though: once we had complete data (by the evening of the next day), we switched the project to the mode of showing election results.
As a result, all three big goals were accomplished (the map, the search for your PEC by address, and the results of past elections), but the live mode did not work out. All the mistakes have been studied scrupulously and we now know how to do it right; still, we expect more communication from the Central Election Commission and hope that it will finally start publishing its data in real time through an API.
And also:
- I learned that Russia has every kind of address and place name imaginable. It turns out that many people permanently live and are registered on a street that is literally called a dead end (tupik).
- the result turned out to be very addictive: you can pan around the map endlessly and watch the numbers change.
We thank everyone involved in this project, in particular the DataMap laboratory team, who made the map (a huge amount of work!): Andreev Vyacheslav, Balashov Anton, Mandzhiev Khongor, as well as Elena Nikitin, Gleb Suvorov, Sergey Ustinov, Valery Viskalin, Marat Haliulin.