Creating your own hub for publishing open data

Open government and open data are gaining momentum in many countries, among governments and organizations alike. Russia recently adopted a law on open data, a sign of growing interest in the topic, and the government of Ukraine is also moving towards publishing open data. Since the topic is popular, you can earn money on it or simply join the movement. Competitions, festivals and hackathons devoted to building sites and applications for publishing open data are also held every year.

Open data is publicly available information presented in machine-readable form, that is, in a form that developers can load into databases, analyze and present far more clearly than government systems usually do.

I would like to share my personal experience of “creating” (in fact, installing) a site for publishing open data. I used the open source platform CKAN. Whether to go this way, use another platform or write your site from scratch is up to you. I hope this article helps you make the right choice.

CKAN is a data management system that makes data accessible by providing tools that simplify publishing, sharing, finding and using it. More than 50 countries, organizations and cities have chosen this platform to publish their data, among them the UK, the USA, the Czech Republic, Australia and Brazil. The list is impressive. The platform itself is written in Python. Detailed articles about it are available in English and in Russian.

Install CKAN


The CKAN installation guide contains detailed instructions for installing the platform. In practice, not everything goes as smoothly as described there: it took me quite a few days to figure everything out and get the platform installed. The developers, for their part, offer paid installation, hosting and maintenance. Prices used to be posted on their website, but no longer are. We, however, are interested in CKAN as a free platform. You can also fork the project if you wish; one of the most popular forks is the UK government open data hub.

There are two ways to install the platform: from a package or from source. The first saves a huge amount of nervous energy, but it only works if you have the right system. At the moment that is Ubuntu 12.04 (until recently it was 10.04), and that is the system I recommend for this platform. If you are confident in your abilities, or already have a configured system that you do not want to give up, the project wiki will help you. My own setup is Ubuntu 12.04 on OpenVZ.

So, the first way is the package installation. It did not work for me, for the reason mentioned above (an OS version mismatch). Still, I can give you a couple of tips. This was my first experience of administering a virtual server (and of administration in general), so my advice may seem childish to experienced (bearded) administrators, but I hope beginners will find it useful.


Important: pay attention to the version of the platform you install. CKAN is currently being translated into more than 30 languages, with varying degrees of success. The translation is done by volunteers on the Transifex website, and each new version ships with a different set of translations. Check the translation status of the version you are about to install. I had to take part in translating the Russian and Ukrainian locales (versions 2.0 to 2.1) because the translation was not ready. You have a choice: either install the latest version that already has a translation, or take part in the translation yourself. See also the translation status of the Russian locale.

Installing CKAN from Package


1. Install the CKAN Package

We do everything according to the instructions. This rule applies to every step: if there are no errors, move on; if there are errors, switch to the second method. But first look into the cause of the error: the problem may lie with you or with the server settings.

2. Install PostgreSQL and Solr

Before installing the database, we need to give ourselves write access to /dev/null, otherwise we will get the error /dev/null: Permission denied.
The fix is simple: get root rights and recreate the device:
# rm /dev/null && mknod -m 0666 /dev/null c 1 3
Check:
# ls -la /dev/null
The permissions should look like this:
crw-rw-rw-
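
If you would rather check before deleting anything, the test can be scripted. This is only a sketch: check_null_device is a name made up for illustration, and stat -c assumes GNU coreutils, which is what Ubuntu ships.

```shell
#!/bin/sh
# check_null_device (hypothetical helper, not part of CKAN):
# succeeds only if the path is a character device with mode 666.
check_null_device() {
    dev="${1:-/dev/null}"
    # -c tests for a character special file
    [ -c "$dev" ] || return 1
    # %a prints the permission bits in octal (GNU stat syntax)
    [ "$(stat -c '%a' "$dev")" = "666" ]
}

if check_null_device /dev/null; then
    echo "/dev/null looks fine"
else
    echo "/dev/null needs the fix described above"
fi
```

If the script reports a problem, apply the rm and mknod command as root and run the check again.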
After installing PostgreSQL, you need to set the locale and text encoding. Install the language packs in the system:
apt-get install language-pack-ru-base
(for Ukrainian: apt-get install language-pack-uk-base)
Stop and remove the default cluster (this deletes any databases already in it):
pg_dropcluster --stop 9.1 main
Then recreate it with the required locale (note that all databases will share this locale):
pg_createcluster --locale ru_RU.UTF8 9.1 main
(for Ukrainian: pg_createcluster --locale uk_UA.UTF8 9.1 main)
Reboot and check that the databases now have the locale and encoding we need:
reboot
sudo -u postgres psql -l
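
Reading the psql -l table by eye is easy to get wrong, so the check can be scripted roughly like this. A sketch only: check_db_locale is a hypothetical helper, and it assumes the unaligned output of psql -lAt, where fields are separated by "|" and the fourth one is the collation. The exact spelling PostgreSQL reports (ru_RU.UTF8 or ru_RU.UTF-8) may differ on your system, so compare against what psql -l actually shows.

```shell
#!/bin/sh
# check_db_locale (hypothetical helper): reads rows in "psql -lAt" format
# on stdin and fails if any database has a collation other than $1.
check_db_locale() {
    expected="$1"
    awk -F'|' -v want="$expected" '
        # Rows with fewer than 5 fields are continuation lines of the
        # access-privileges column, so they are skipped.
        NF >= 5 && $4 != want { printf "%s uses %s\n", $1, $4; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Assumed usage (needs a running PostgreSQL, so it is left commented out):
# sudo -u postgres psql -lAt | check_db_locale ru_RU.UTF8 && echo "locale OK"
```

The function prints every database whose collation differs, which makes it clear whether the cluster was really recreated.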

The developers recommend installing the solr-jetty package. In my experience, however, it does not work. I tried everything and could not find out why, so I had to work around it. If you cannot get Solr running the standard way either, here is the fix. Set the Jetty version (the latest at the time of writing):
JETTY_VERSION=7.6.10.v20130312
Download it:
wget download.eclipse.org/jetty/$JETTY_VERSION/dist/jetty-distribution-$JETTY_VERSION.tar.gz
Unpack:
tar xfz jetty-distribution-$JETTY_VERSION.tar.gz
Download the latest Solr version:
wget apache-mirror.telesys.org.ua/lucene/solr/3.6.2/apache-solr-3.6.2.zip
Unpack:
unzip -q apache-solr-3.6.2.zip
Change to the example directory:
cd apache-solr-3.6.2/example/
Run Solr in the background:
nohup java -jar start.jar &
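
Solr takes a few seconds to start, so it helps to wait until it answers before moving on. A minimal sketch: wait_for is a hypothetical helper, and the URL in the usage comment assumes the default port 8983 of the Solr example configuration and its ping handler; adjust it if your setup differs.

```shell
#!/bin/sh
# wait_for TRIES CMD... (hypothetical helper): runs CMD up to TRIES times,
# one second apart, and succeeds as soon as CMD succeeds.
wait_for() {
    tries="$1"; shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow
        [ "$i" -lt "$tries" ] && sleep 1
    done
    return 1
}

# Assumed usage (needs a running Solr, so it is left commented out):
# wait_for 30 curl -sf http://localhost:8983/solr/admin/ping && echo "Solr is up"
```

If the wait times out, the nohup.out file in the example directory is the first place to look for the reason.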

Follow all the instructions in the manual carefully, and you will soon see a working site.

Now for the second way, in case you do not have Ubuntu 12.04.
Once again, I draw your attention to the wiki page on installing CKAN.

Installing CKAN from Source


1. Install the required packages

The documentation suggests this set of packages:
sudo apt-get install python-dev postgresql libpq-dev python-pip python-virtualenv git-core solr-jetty openjdk-6-jdk
I recommend installing the following set instead (do not forget to run apt-get update first and to fix /dev/null as described above):
sudo aptitude install python-dev postgresql-9.1 libpq-dev python-pip python-virtualenv git-core openjdk-6-jdk curl nginx gcc bcc tcc
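
After the installation it is worth confirming that the main tools actually ended up on the PATH. A small sketch, assuming the usual command names provided by these packages (psql, pip, virtualenv, git, java, curl, nginx, gcc); missing_cmds is a hypothetical helper.

```shell
#!/bin/sh
# missing_cmds NAME... (hypothetical helper): prints, one per line,
# every command from the argument list that is not found on the PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Assumed usage; an empty result means everything was found:
# missing_cmds psql pip virtualenv git java curl nginx gcc
```

Anything the function prints should be installed before continuing with the next steps.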

3. Setup a PostgreSQL database

Plus the additional locale setup described above.

5. Setup Solr

described above

9. You're done!

The documentation suggests this command:
paster serve /etc/ckan/default/development.ini
My suggestion, to run it in the background:
nohup paster serve /etc/ckan/default/development.ini &
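
With plain nohup the output ends up in nohup.out in the current directory and the process ID is easy to lose. A small wrapper can keep a log and a PID file so the server can be stopped later. This is a sketch: start_bg and the log and PID paths in the usage comment are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# start_bg LOGFILE PIDFILE CMD... (hypothetical helper): starts CMD in the
# background under nohup, sends its output to LOGFILE and stores its
# process ID in PIDFILE.
start_bg() {
    log="$1"; pidfile="$2"; shift 2
    nohup "$@" >"$log" 2>&1 &
    # $! holds the PID of the most recent background job
    echo $! >"$pidfile"
}

# Assumed usage (paths are examples only):
# start_bg /tmp/ckan.log /tmp/ckan.pid paster serve /etc/ckan/default/development.ini
# To stop the server later:
# kill "$(cat /tmp/ckan.pid)"
```

The PID file makes restarting simpler than hunting for the process with ps.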

For testing on a local machine, the steps above are enough. But if you want to move your platform to a server, I have one more piece of advice.

Deploying a Source Install


My advice (for which many thanks to ibegtin) is this: use Nginx. It will greatly speed up your site. There is a great guide to installing the paster + Nginx combination, and it really helped me sort out running the platform on a virtualized server.
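
The general idea of the paster + Nginx combination can be sketched as follows. This is not the configuration from the guide mentioned above, only a minimal reverse-proxy example under stated assumptions: paster listens on 127.0.0.1:5000 (check the port in your development.ini), and ckan_nginx_conf is a hypothetical helper that prints a server block to stdout.

```shell
#!/bin/sh
# ckan_nginx_conf [PORT] [SERVER_NAME] (hypothetical helper): prints a
# minimal Nginx server block that proxies all requests to paster.
ckan_nginx_conf() {
    port="${1:-5000}"
    name="${2:-localhost}"
    # \$host and \$remote_addr are escaped so that Nginx, not the shell,
    # expands them.
    cat <<EOF
server {
    listen 80;
    server_name $name;

    location / {
        proxy_pass http://127.0.0.1:$port;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
    }
}
EOF
}

# Assumed usage (the file name and domain are examples only):
# ckan_nginx_conf 5000 data.example.org | sudo tee /etc/nginx/sites-available/ckan
```

After writing the file, enable the site and reload Nginx in the usual way for your distribution.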

For the rest, just follow the instructions and you will succeed. If you have any questions, you can ask me or write to the developers. You can also subscribe to the mailing list or follow the project on Twitter.

Useful resources


CKAN Storage Extension for Google Refine
Integrating CKAN and Drupal

Sites on the CKAN platform


List of sites running on this platform
A directory site running on CKAN that collects data about existing data hubs.
Hub of open data in the Russian Federation
Hub of open data in the Russian Federation on the activities of law enforcement authorities
An international hub operating on the CKAN platform. You do not have to create your own hub: you can upload any open data here and use the API or link to this resource. The choice is yours. Good luck!
