Yandex supported Wikidata

    Today at the SemTechBiz conference in San Francisco, it was announced that the Wikidata project has received a grant of 150,000 euros from Yandex.

    Wikidata is a Wikimedia Foundation project: a collaboratively edited knowledge base for the centralized storage of structured data. Specially for our tech blog on Habr, we asked Denny Vrandečić, one of the founders of the project, to explain in detail what Wikidata is, how it differs from similar projects, and what benefits it can bring to the infrastructure of the future Internet and to all its users.

    What is Wikidata? What are the goals of this project? Why is Wikidata the first new Wikimedia Foundation project since 2006?

    Wikidata is a new project from the Wikimedia Foundation, whose main goal is to give everyone on the planet free access to all possible knowledge. Our best-known project is Wikipedia, an open encyclopedia available in more than 200 languages.


    Some of these language versions (Russian or English, for example) are maintained by very active communities. For many others, however, it is impossible to reach the same level of completeness and currency. It also turns out that an encyclopedia in a language with too few editors is easier to damage: there are not enough people to correct everything and to keep the information from becoming outdated.

    Wikidata was created to partially fix this. We are building an open, multilingual database of structured data that can be used on Wikipedia and in other projects, including ones outside Wikimedia. Our data can be used freely: the license permits almost any use. Anyone can make changes to the project's data, which is now available in more than 300 languages.

    In general, Wikimedia launched this project to improve the quality of the language versions of Wikipedia and allow editors to spend their time more efficiently.

    How is Wikidata different from other similar projects - Freebase, DBpedia? Why make another machine-readable database of structured information?

    DBpedia extracts data from Wikipedia, i.e. it does almost the opposite of what Wikidata does. It also follows that data in DBpedia cannot be edited directly.

    Freebase is a project very similar to Wikidata, and I can imagine the two cooperating in the future, from checking the consistency of our data to exchanging data within the limits that our licenses allow. Let's see what happens. The main difference between Freebase and Wikidata is that multilingualism and the availability of sources matter much more to the latter. Both are in fact present in Freebase, but they are not easy to make out in its interface. The second obvious difference is that Freebase is made by Google, while Wikidata is run by a non-profit organization. This, we hope, somewhat reduces the risks of using its data.

    Are you planning to integrate with existing data repositories?

    We are already integrating with a growing number of external databases, mainly by linking identifiers. Hundreds of thousands of Wikidata entries are already linked to VIAF, GND, MusicBrainz, IMDB and many other catalogs and databases. We believe this may turn out to be one of Wikidata's biggest contributions to the future infrastructure of the Internet, to the creation of a web of knowledge and to connecting entities across the Internet.
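
    Linking through identifiers can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the helper names `build_index` and `same_entity`, and all identifier values are hypothetical placeholders, not Wikidata's actual data model or API.

```python
# Hypothetical records: each entry stores the identifiers that other
# catalogs use for the same entity. All IDs below are made-up placeholders.
entries = [
    {
        "id": "item-1",
        "label": "Example Author",
        "external_ids": {"VIAF": "viaf-0001", "GND": "gnd-0001"},
    },
    {
        "id": "item-2",
        "label": "Example Film",
        "external_ids": {"IMDB": "imdb-0001"},
    },
]


def build_index(records):
    """Map (catalog, external identifier) pairs back to the local entry ID."""
    index = {}
    for record in records:
        for catalog, external_id in record["external_ids"].items():
            index[(catalog, external_id)] = record["id"]
    return index


def same_entity(index, catalog_a, id_a, catalog_b, id_b):
    """Two external identifiers denote one entity if both map to the same entry."""
    entry_a = index.get((catalog_a, id_a))
    return entry_a is not None and entry_a == index.get((catalog_b, id_b))


index = build_index(entries)
```

    With such an index, a record known only by its identifier in one catalog can be matched to the same entity in another catalog through the shared central entry.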

    How is Wikidata related to Wikipedia, and how does it interact with its language versions?

    Wikidata provides data that can be used in the language versions of Wikipedia. Our first step was to centralize the links between versions of an article in different languages, which used to be stored in a decentralized way, in each article separately. Wikidata is now the single central place for such links, and this has removed a lot of pointless duplicate information from the language versions of Wikipedia.
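
    The effect of centralizing these links can be shown with a small Python sketch. The item ID, the dictionary layout and the helper names are hypothetical and chosen only for illustration; this is not Wikidata's actual storage format.

```python
# Hypothetical illustration: interlanguage links kept in one central record
# instead of being repeated inside every language version of an article.

# Before: every article carries its own copy of the links to all other versions.
decentralized = {
    "en:Berlin": {"de": "Berlin", "ru": "Берлин"},
    "de:Berlin": {"en": "Berlin", "ru": "Берлин"},
    "ru:Берлин": {"en": "Berlin", "de": "Berlin"},
}

# After: one central item (the ID "Q-example" is made up) lists each version once.
central_item = {
    "id": "Q-example",
    "sitelinks": {"en": "Berlin", "de": "Berlin", "ru": "Берлин"},
}


def links_for(item, own_language):
    """Return the links that the article in own_language would display."""
    return {
        lang: title
        for lang, title in item["sitelinks"].items()
        if lang != own_language
    }


def stored_links(per_article):
    """Count how many link entries the decentralized layout has to store."""
    return sum(len(links) for links in per_article.values())
```

    In this toy case three articles store six link entries between them, while the central record stores three, and the gap grows quickly as more languages are added.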

    The second step (still an early one) is to provide Wikipedia with other kinds of structured data. One example is IMDB identifiers, which some language versions of Wikipedia already take from Wikidata and display. We hope that this practice will gradually spread and become more and more useful for Wikipedia, although the process cannot be quick: Wikidata first has to earn the trust of Wikipedians, and they, in turn, have to learn to use the new possibilities properly. The two communities overlap widely, which will help a lot, but how exactly Wikipedians start using Wikidata will be the most important and interesting question for us in the future.

    Who do you see as Wikidata's users? Are there any success stories?

    We now have over 8,000 active editors on Wikidata. By number of editors, that would put Wikidata among the ten largest Wikipedias. And since Wikipedia is our main field of application, we are very pleased that Wikidata is already so useful to it. So this is our main example and indicator of success.

    There are other great examples of Wikidata in use: Wiri, a system that takes questions in natural language (English, in this case) and answers them; Genealogy Visualizer; and “Tree of Life”, an alternative interface for browsing Wikipedia. Some research projects already use Wikidata data, for example to analyze gender on Wikipedia and to study the completeness of its different language versions. Such things become much easier to research with Wikidata.

    I think this is very good for a project that appeared just a few months ago. And as new capabilities arrive (data types for time, coordinates and numbers, or a query interface) we hope to become even more useful. We know that several companies already maintain internal copies of Wikidata. I hope they get some benefit from them too. :)

    Photo from the SemTechBiz conference

    You often speak at conferences and at various universities. How did the active community respond to Wikidata?

    They were simply happy. Almost everyone who has ever worked with articles in other languages was delighted to see Wikidata. And many are very curious about where infobox data will lead us as a community. Almost every Wikipedian I spoke with mentioned that they had long been waiting for such a project and had even thought of building it themselves. So they are very happy to see that it has finally appeared. Wikidata did not appear overnight: the idea of such a project has been discussed since the first Wikimania conference in 2005, and even earlier. So, like many others, I am happy to see it realized.

    Naturally, a community as heterogeneous, intellectual and critically minded as Wikipedia's cannot be expected to hold a single opinion. Quite a few participants are worried about problems that may arise with Wikidata, and their wish to wait, see how it works, make sure the project is useful, and only then use it is understandable.

    Voluntariness is one of the basic principles of Wikidata. It is an offer: every community can decide whether to accept it or not. Moreover, communities can choose, down to the smallest detail, what to use and what not to.

    So far, at least, I have been very pleased with how the community is reacting, and I hope that its members will continue to engage with us constructively, whether by showing enthusiasm or by offering considered criticism.

    Tell us a little about the team. How long did it take to develop the first version?

    We started with a team of 12 people working full time, because we wanted to get going quickly. The first year of work, full of ambitious goals, was planned in detail. Our task was to show that we could really cope with the large and complex problems that arose in the course of the project. Everything went well, and the release took place about six months later. After that, we began adding more and more features. After 10 months, the first Wikipedians began to use our data, and the data in Wikidata itself began to grow richer.

    It also took us some time to work out our development and deployment cycles and to learn how to communicate effectively with the main office in San Francisco. The Wikidata team is located in Berlin - the German Wikimedia chapter plays the leading role in development - and this is the first time we are working on a project of this magnitude without the direct involvement of the Wikimedia Foundation. There were many things that had to be settled before work could begin.

    At the end of the first year of development we slowed the pace, and the team shrank accordingly. There are currently 10 people working on Wikidata, not all of them full time. There is still much to be done, but no longer in emergency mode: we must be careful, give the community a breather and let it develop further together with us. We continue to add many new features and are working off our technical debt.

    The first version was launched about a year ago, and the second - more recently. Can you share some statistics? How many objects have already been added? Does this happen automatically, semi-automatically or completely manually?

    Our system now describes more than 13 million objects. The numbers are absolutely amazing: support for claims was added only in February, and now, at the end of May, we have passed the mark of 10 million claims. This is very good compared with our expectations: when we had to estimate the number of objects we should have by the end of the first year, we settled on 100,000.

    The work is heavily skewed towards semi-automatic editing. About 85-90% of all edits are made by three or four dozen bots. But because the total number of edits on Wikidata is growing incredibly fast (it is even ahead of the English-language Wikipedia), the number of manual changes is still large in absolute terms. Currently, about one million edits per month are made by more than 8,000 people. In addition, the changes made by bots are very limited in scope and tightly controlled by their creators. This is exactly what we expected and hoped for: an environment in which bots and people can work together more efficiently than in a regular wiki.
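
    As a rough back-of-the-envelope check, the two figures quoted above can be combined in a few lines of Python. This is only a sketch: it assumes that the one million monthly edits are the manual ones and that both figures describe the same period, neither of which is stated explicitly, and the function name is made up for illustration.

```python
# Figures quoted in the interview; treating them as describing the same month
# is an assumption made only for this estimate.
human_edits_per_month = 1_000_000
bot_share_percent = (85, 90)  # low and high estimate of the bot share


def implied_total(human_edits, bot_percent):
    """Total edits if human edits make up the remaining (100 - bot_percent) percent."""
    return human_edits * 100 // (100 - bot_percent)


low, high = (implied_total(human_edits_per_month, p) for p in bot_share_percent)
print(low, high)
```

    Under these assumptions the implied total is roughly 6.7 to 10 million edits per month.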

    What future do you see for Wikidata? What are your short- and long-term goals? How do you decide what to do first? Who can take part in such decisions?

    In the short term, we still lack several important features: data types for time, coordinates, numbers, text and URLs, as well as some basic capabilities, such as the ability to sort and rank content. In addition, we are constantly working on supporting more export formats for our data and on the ability to query Wikidata. Also this year, a visual editor will appear on Wikipedia, and we are planning how to integrate with its interface to make the interaction between information on Wikipedia and on Wikidata as convenient as possible. We are also working to support not only Wikipedia but other Wikimedia projects as well in the near future. Finally, we want to make sure that our software can be used in other scenarios.

    As for Wikidata's long-term plans, the key question for us is whether we can become what we hope to be: the main repository of entities with IDs on the Web. We see a future in which all entities are identified using Wikidata. Applications may or may not use data from Wikidata, but we seriously hope that these identifiers will become an important part of the Web in 2015. If Wikidata succeeds in this, I will consider that we have laid an important stone in the foundation of a more intelligent Web, where exchanging data between heterogeneous sources is easier and which is more useful to every user than we can even imagine now.

    In the meantime, our tasks are more modest: to support Wikipedia by improving its quality and reducing the complexity of maintaining it, and thereby to support the encyclopedia in its overarching mission of bringing knowledge to all the people of the world.
