
Election and Candidate Data Service
Good afternoon, colleagues!
As many of you know, September 14, 2014 is a single voting day: many regions of Russia will elect deputies and, in some places, mayors.
At the same time, the information support is, in my opinion, poor. The main inconvenience is that you cannot view detailed candidate information as a list: there is either a list without details (paginated by 20 people, at that) or a single person with details.
One sunny summer day I had the idea of extracting this information so that it could be analyzed conveniently and candidates chosen clearly and wisely. Unfortunately, the CEC does not offer a bulk export for elections (at least I did not find one), so the solution is to parse the pages with a robot.
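A minimal sketch of how a robot might walk a listing paginated by 20 rows. The `fetch_page` callable and the fake data are hypothetical stand-ins, not the project's actual code: in a real run `fetch_page` would download and parse one page of the CEC site.

```python
def collect_rows(fetch_page, page_size=20):
    """Collect rows from a paginated listing until a short page appears."""
    rows = []
    page = 1
    while True:
        batch = fetch_page(page)
        rows.extend(batch)
        # A page with fewer rows than page_size is taken to be the last one.
        if len(batch) < page_size:
            return rows
        page += 1


# Stand-in for the real site: 45 invented candidates served 20 per page.
fake_candidates = [f"candidate-{i}" for i in range(45)]


def fake_fetch(page, page_size=20):
    start = (page - 1) * page_size
    return fake_candidates[start:start + page_size]


all_rows = collect_rows(fake_fetch)
print(len(all_rows))
```

Passing the fetch function in as a parameter keeps the paging logic testable without touching the network.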
The first version was in Scala: I wanted to consolidate my knowledge of the language and get to know the Play framework, which was new to me. I wrote and tested the parser but, unfortunately, could not get through the Play documentation; for a long time I could not find the answer to a fairly basic question. After that I decided to try the Django framework, where the documentation was much better, so the parser was rewritten in Python.
The project can be viewed on Github; the scala-parser folder is left over from the Scala parser.
During development, while designing the models, a wonderful bonus turned up: we can get the entire history of a candidate's participation in elections. The history goes back to 2007, when the CEC switched to the current format. I did not parse the older format, especially since it would add at most one more election to the history (the resource itself started in 2003). This can, in fact, be considered the main value of the project, since the voter can now get the full picture of where, when and alongside whom a given candidate ran. The election list has a column showing how many times each candidate has taken part in elections, and you can go to the candidate's page to see all of their elections and all the information. As far as I know, the pair of full name + date of birth is unique for Russian citizens, so there should be no mix-ups.
The models are obvious: election objects (name, date and link), person objects (name and date of birth), and information objects that hold all the data from a given election, with links to the specific election and the specific person.
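The three kinds of objects and the participation history can be sketched roughly as follows. This uses plain dataclasses rather than Django models so that it runs on its own; the class names, field names, URLs and sample people are all invented for illustration and are not taken from the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Election:
    name: str
    held_on: date
    url: str


@dataclass(frozen=True)
class Person:
    # Full name plus date of birth is treated as the identity key.
    full_name: str
    birth_date: date


@dataclass(frozen=True)
class Info:
    election: Election
    person: Person
    details: str  # placeholder for everything else on the candidate page


def history_by_person(records):
    """Group Info records by person, each list sorted by election date."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.person].append(record)
    for items in grouped.values():
        items.sort(key=lambda r: r.election.held_on)
    return dict(grouped)


# Invented sample data.
duma_2009 = Election("City Duma 2009", date(2009, 10, 11), "https://example.org/e/1")
duma_2014 = Election("City Duma 2014", date(2014, 9, 14), "https://example.org/e/2")
ivanov = Person("Ivanov Ivan Ivanovich", date(1970, 1, 1))
petrov = Person("Petrov Petr Petrovich", date(1980, 5, 5))

records = [
    Info(duma_2014, ivanov, "party A"),
    Info(duma_2009, ivanov, "independent"),
    Info(duma_2014, petrov, "party B"),
]
history = history_by_person(records)
print({p.full_name: len(items) for p, items in history.items()})
```

The length of each list corresponds to the "how many times the candidate participated" column, and the list itself is what a candidate page would show.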
The site parsing code, written in Python with the BeautifulSoup library, can be found here. During development I had to deal with the election commissions sometimes mixing up candidates' dates and full names when entering them into their database. To handle this, at the end of processing each election I check the update date of all of its information records. If a candidate record's update date is much earlier than the election's update date, the record is no longer needed and can be deleted.
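A minimal sketch of that cleanup check, assuming every record carries an `updated_at` timestamp that is refreshed each time the robot sees it. The dictionary layout, the one-hour `tolerance` and the sample names are assumptions made for the example; the post does not say how large "much earlier" is.

```python
from datetime import datetime, timedelta


def split_stale(records, election_updated_at, tolerance=timedelta(hours=1)):
    """Return (fresh, stale) lists of records.

    A record is stale when it was last updated more than `tolerance`
    before the election itself was last updated.
    """
    fresh, stale = [], []
    for record in records:
        if election_updated_at - record["updated_at"] > tolerance:
            stale.append(record)
        else:
            fresh.append(record)
    return fresh, stale


election_updated_at = datetime(2014, 8, 20, 12, 0)
records = [
    {"name": "Ivanov Ivan Ivanovich", "updated_at": datetime(2014, 8, 20, 11, 58)},
    # Same person under a misspelled name, left over from an earlier run.
    {"name": "Ivanov Ivan Ivanovitch", "updated_at": datetime(2014, 8, 1, 9, 0)},
]
fresh, stale = split_stale(records, election_updated_at)
print([r["name"] for r in stale])
```

Returning the stale records instead of deleting them inside the function makes it easy to log or review them before removal.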
After that it is a perfectly ordinary Django project, which is of little interest.
For dynamic filtering and sorting, the table uses the JS library http://tablefilter.free.fr/
The project was originally hosted on Heroku, but I quickly exceeded the free database limit (no more than 10,000 rows); now, after parsing the elections in the Moscow region, Moscow and St. Petersburg, there are about 50 thousand candidates. A call on Facebook for project sponsorship got me a free virtual server from Sergei Arsentyev, for which many thanks to him!
This was my first experience setting up a Linux server over SSH for a Django project running through Gunicorn with Nginx, so I learned a tremendous amount. One question remains: for some reason the logs are not written when starting via Upstart. If anyone knows about this, please help. The Upstart and Nginx configs can also be found on Github.
And here is the link to the working site:
elections.istra-da.ru
For example, information on elections to the Moscow City Duma can be found here:
elections.istra-da.ru/election/1399
If needed, name your regions and districts, and I will add them to the robot's tasks. I have not yet scanned all regions, only the Moscow region, Moscow and St. Petersburg; I am afraid the CEC might take offense and block the parser.
Comments, advice, suggestions, ideas for further development, and help with development are welcome.