Twitter has handed over to the six universities the entire database of tweets since 2006
Every day, 500 million posts are posted on Twitter. Such an array of information with personal data is a real gold mine for data mining. On the basis of tweets, scientists study patterns in human behavior, social connections, the spread of infectious diseases, risk factors for the human body, and much more, writes the June issue of Scientific American.
For example, researchers from Microsoft have developed an algorithm that determines the risk of developing postpartum depression by the content of a pregnant woman’s tweets. The U.S. Geological Survey tracks tweets to determine the epicenter of the quake .
Until now, scientists have been forced to work with a very limited selection of data. The only way to search all tweets was to access the standard Twitter API, and it gives access to only 1% of all messages.
But now Twitter has turned to the scientific community. In February, the company announced that it would provide them with a complete database with all messages for analysis, starting in 2006.
In April, Twitter announced that it had received more than 1,300 applications from 60+ countries for access to the database for scientific purposes, with more than half of the requests coming from outside the United States. After selecting candidates, the company selected six universities from four countries to which it agreed to provide information.
Although access was granted only to selected universities, but still this is very positive news. In the future, the base will become available to a wider circle of researchers, which may lead to an explosive increase in the number of scientific papers based on data mining tweets. With more data, scientists can track more complex and specific patterns. In the end, the base can fall into open access.
True, a number of questions inevitably arise. For example, will Twitter get any rights to research results? Do I need to ask permission from users to use their data for data mining?
To agree on the nuances in advance, a group of scientists from the Virginia Polytechnic University proposed the Rules for the Ethical Use of Twitter Data, which anyone who intends to use data from Twitter can subscribe to. Among other things, the rules prohibit the publication of user names and nicknames, as well as the requirement to openly state the objectives of the study. The authors of the document believe that it is important to agree on such rules before a lot of scientific papers made using this database appear in print.
It should be added that software tools have already been developed that directly contradict the Rules for the Ethical Use of Twitter Data, namely, they automatically collect data about specific users and organizations. Among such programs areMaltego and Creepy .