Series: Big Data - like a dream. Episode 8. Non-technical. Modular journalism

    In previous episodes: Big Data is not just a lot of data. Big Data is a positive feedback process. The Obama Button as the embodiment of rtBD & A. Big Data Development Philosophy. BD is also Bolt Data. BD analysts. In this episode, let's talk about the impact of BD on an industry as non-technical as journalism.

    Very schematically, programmers are akin to reporters: both professions draw on skills from the past (applying the knowledge of previous generations, the creators of methods and languages); both aim to improve the present (bettering the lives of specific people); and the most advanced practitioners in both strive to rise to the next level, oriented toward the future: to become architects and writers who influence huge groups of followers and whole nations.

    Both professions have changed dramatically over the past decades: stenographers and typists have been “buried” together, proofreaders are disappearing, the intermediate role of “PC operator” is vanishing, and Visual Basic and the Internet have widened the “entrance door” to both professions many times over. Writing programs and publishing one's own opinions have become available to "young and old alike."

    And both professions live under the sword of Damocles of standardization: the speed at which new tasks and information arrive from every corner of a globe that has become very small has automatically led to a growth in duplicated cliches. More and more programs are “assembled” from library modules, and more and more short news items can be compiled automatically from standard wrappers around “bare facts”. Experiments in automating programming and article writing with artificial intelligence have already entered the pre-industrial phase.

    But there is no cause for pessimism: humanity does not squander first-class specialists, and the usual flow will take place from industries that are drying up to those that are gaining strength.

    About 15 years ago, when the first Internet wave in journalism was already taking shape, at a meeting in the office of Yasen Zasursky, dean of the Faculty of Journalism of Moscow State University, the question “Well, surely there are a lot of journalists here?!” made Yasen Nikolaevich shake his head sadly. In the large office stood several tables on which Everests of books, magazines, newspapers, and almanacs towered, saturated with thoughts and time. And we spent an entire hour eagerly discussing with the doctor of philology where the paths of future graduates of the journalism faculty would lead.

    Honestly, life turned out to be richer than our speculations: blogs and Facebook, YouTube and news aggregators, the unique identity of Twitter, communities around traveling photographers, and groups of like-minded commentators around all kinds of thematic resources. None of this diversity could have been foreseen at the turn of the century, even in the rosiest dreams. It is telling that Yasen Nikolaevich's grandson, Internet provider Ivan Zasursky, heads the New Media department at the Faculty of Journalism.

    Let's continue the dose of optimism and look at which trends can strengthen the position of leading journalists in their desire to influence ever larger groups of people, and how the Big Data industry can help in this r/evolutionary explosion:

    1. Identification of new trends
    A typical example is the Chelyabinsk meteorite. That is, a completely new entity that did not previously exist, or an object whose information background was minimal for a long time.

    2. Change in existing trends.
    For example, the elections in France: there are three political forces and three persons (Hollande, Sarkozy, Marine Le Pen), and society is divided into those who have already made their choice and those who are still hesitating. The goal of the “players” is to win the largest possible share of undecided voters, which requires constantly analyzing trends and reactions to campaign events and promises.

    3. Ratings of interests (media persons, thematic events)
    In any area of society there are recognized leaders: individuals (football players, politicians, scientists, musicians) and objects (Coca-Cola, Moscow, Sberbank, Zenit). The very fact of entering a thematic rating, or of changing position in one, signals a change in the public information field.

    A previous episode already discussed the need to distinguish between two approaches:
    A) personalized “many-data-by-object”,
    B) information field of data in the industry and around objects.

    Until recently, journalists and society, which received their main stream of data almost exclusively from the media, had virtually no alternative to option “A” (the “Dossiers” topic). Newcomers (people, brands) had to interact actively with the press in order to enter the circle of media mentions.

    With the development of social media, the direction of the vector has changed significantly:
    - “Sensors” (people, companies, structures, robots) generate a lot of data in the information field;
    - The media are adapting to the new reality: because of competition over the speed of news release, editorial offices are creating monitoring groups for early detection of emerging stories;
    - The role of the media is changing: because of the sharp increase in the total flow of materials in the world, specific media are sharpening their thematic focus and polarizing their opinions;
    - The role of the media is decreasing: people increasingly use other information flows, and competition is intensifying for people's time, which is still limited to 24 hours a day.

    Here are some interesting data from Brand Analytics on the Russian-language information field for one week in April:
    Date          News       News comments
    04/02/2015    147 607    68 957
    04/03/2015    126 685    103 503
    04/04/2015    69 924     85 015
    04/05/2015    58 961     78 819
    04/06/2015    121 247    92 784
    04/07/2015    148 011    104 245
    04/08/2015    189 650    92 011

    That is, 862 thousand news items were published during the week, and they received 625 thousand comments.
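    The weekly totals can be re-derived from the daily figures. The snippet below is only a minimal sketch: the numbers are copied from the table above, while the names `daily_counts` and `weekly_totals` are invented for illustration and are not part of any Brand Analytics tool.

```python
# Daily counts copied from the table above (2-8 April 2015):
# (date, news items, comments on news).
daily_counts = [
    ("2015-04-02", 147_607, 68_957),
    ("2015-04-03", 126_685, 103_503),
    ("2015-04-04", 69_924, 85_015),
    ("2015-04-05", 58_961, 78_819),
    ("2015-04-06", 121_247, 92_784),
    ("2015-04-07", 148_011, 104_245),
    ("2015-04-08", 189_650, 92_011),
]


def weekly_totals(rows):
    """Return (total news, total comments) for rows of (date, news, comments)."""
    news = sum(n for _, n, _ in rows)
    comments = sum(c for _, _, c in rows)
    return news, comments


news_total, comments_total = weekly_totals(daily_counts)
print(f"news: {news_total:,}  comments: {comments_total:,}")
print(f"comments per news item: {comments_total / news_total:.2f}")
```

    Rounded to thousands, the two sums match the 862 thousand and 625 thousand quoted in the text.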

    For those who prefer to think in monthly figures, here are the data for March: 3.7 million news items and 2.7 million comments on them, for a total of 6.4 million “near-media” materials. That is just a small fraction (0.5%) of the flow of 1 BILLION Russian-language posts on social networks (plus a few blogs, forums, review sites, etc.) generated by “sensor” people around the world.

    It is therefore not surprising that an unexpected event such as the "Chelyabinsk meteorite" took only a few minutes to appear on social networks (Twitter, YouTube, VKontakte), while the first reports in the media took as long as 2-3 hours.

    Thus, for modern media (whatever format they operate in now), three things become extremely important: quickly receiving information from the "sensors" (manufacturers' sites, people at the scene, robotic systems); thematic filtering, to reduce "omnivorousness" and target readers more precisely; and operational rating, which lets the media "stay in shape" and adjust their picture of the world in time to how ordinary people perceive it.

    As a real-world case, let's look at three ratings that the largest Russian media regularly prepare by analyzing large volumes of unstructured information from social media:

    1. Profile, thematic: monthly citation rating of Russian media on / bamarch

    2. Non-core, thematic, situational: during the World Cup in Brazil, the RT (Russia Today) website published a daily rating of the most popular materials of “humanity” on the football topic

    3. Profile, general: monthly media person rating on the RIA Novosti website

    Regarding the last of these, the media person rating: a daily rating that reflects changes in public perception is, of course, much more interesting. Here are the Top 50 tables for yesterday (April 26), one for the media and one for social media. Spot the many differences :-)

    A “+” sign next to a person means that this media person entered the current daily Top 50 and was not in the previous day's Top 50. The rating lets you check at a glance how well you are aware of yesterday's events: do you understand, for example, what caused Nazarbayev to appear on the list? (Hint: presidential elections were held in Kazakhstan yesterday.)
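    The “+” rule amounts to a set comparison between two consecutive daily lists. Here is a minimal sketch of that idea; the function name `mark_new_entrants` and the three-name sample lists are made up for illustration (in particular, the "yesterday" list is not real data), and this is not necessarily how the published rating is actually built.

```python
def mark_new_entrants(today, yesterday):
    """Append ' +' to every name in today's ranking that was absent yesterday."""
    previous = set(yesterday)
    return [name if name in previous else f"{name} +" for name in today]


# Hypothetical three-name excerpts, for illustration only.
yesterday_top = ["Vladimir Putin", "Ramzan Kadyrov", "Barack Obama"]
today_top = ["Vladimir Putin", "Nursultan Nazarbayev", "Ramzan Kadyrov"]

for line in mark_new_entrants(today_top, yesterday_top):
    print(line)
```

    A set is used for the previous day's list so that each membership check stays cheap even for long rankings; the order of today's ranking is preserved.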

    Media Rating:

    No. Qty. Person
    1 4528 Vladimir Putin
    2 1435 Petro Poroshenko
    3 1070 Barack Obama
    4 959 Nursultan Nazarbayev
    5 692 Vladimir Klitschko
    6 590 Ramzan Kadyrov
    7 438 Arseniy Yatsenyuk
    8 390 Joseph Stalin
    9 387 Dmitry Medvedev
    10 375 Dmitry Rogozin
    11 348 Dmitry Peskov
    12 337 Adolf Hitler
    13 310 Merkel
    14 296 Nemtsov
    15 295 Klitschko
    16 257 Vladimir Lenin
    17 252 Hollande
    18 243 Yanukovych
    19 240 Yeltsin
    20 209 Alexander Lukashenko
    21 188 Akhmetov
    22 188 Joseph Kobzon
    23 177 Shoigu
    24 152 Federica Mogherini +
    25 151 Ban Ki-moon
    26 145 Sergey Lavrov
    27 137 Vladimir Solovyov +
    28 131 Stepan Bandera
    29 123 Yulia Tymoshenko
    30 119 Mikhail Saakashvili
    31 118 Mikhail Khodorkovsky +
    32 115 German Gref +
    33 111 Alexander Zakharchenko
    34 103 Jose Mourinho
    35 98 Robert Downey +
    36 96 Kim Kardashian +
    37 90 Alexey Navalny
    38 86 Mikhail Gorbachev +
    39 82 Fabio Capello
    40 77 Nadezhda Savchenko
    41 75 Alexander Pushkin
    42 73 John Kerry
    43 72 Oleg Lyashko
    44 68 Lionel Messi
    45 68 Napoleon Bonaparte +
    46 65 Igor Kolomoisky
    47 65 Nikita Mikhalkov +
    48 65 Jen Psaki +
    49 64 Arkady Rotenberg +
    50 59 Cristiano Ronaldo +

    Social Media rating:

    No. Qty. Person
    1 144539 Vladimir Putin
    2 35769 Petro Poroshenko
    3 31902 Barack Obama
    4 25706 Ramzan Kadyrov
    5 23137 Joseph Stalin
    6 18729 Vitali Klitschko
    7 14936 Adolf Hitler
    8 14328 Arseniy Yatsenyuk
    9 13440 Vladimir Klitschko +
    10 13027 Alexander Pushkin
    11 12799 Vladimir Lenin
    12 12267 Nursultan Nazarbayev +
    13 11995 Harry Potter
    14 11579 Boris Nemtsov
    15 9377 Cristiano Ronaldo
    16 9326 Erich Maria Remarque
    17 8816 Leo Tolstoy
    18 8469 Tim
    19 8405 Lionel Messi
    20 7974 Albert Einstein
    21 7799 William Shakespeare
    22 7606 Omar Khayyam
    23 7429 Sergey Yesenin
    24 7156 Viktor Yanukovych
    25 6843 Sergei Shoigu
    26 6452 Dmitry Rogozin
    27 6402 Vladimir Vysotsky
    28 6189 Dmitry Medvedev
    29 5974 Bernard Shaw +
    30 5838 Vera Brezhneva
    31 5661 Vladimir Solovyov +
    32 5642 Alexey Navalny
    33 5622 Yulia Tymoshenko
    34 5608 Dmitry Peskov
    35 5526 Vladimir Zhirinovsky
    36 5513 Fabio Capello
    37 5426 Zemfira +
    38 5391 Johnny Depp
    39 5198 Napoleon Bonaparte
    40 5006 Anatoly Shariy
    41 4943 Stepan Bandera
    42 4934 Andrey Lenitsky +
    43 4929 Boris Yeltsin +
    44 4922 Fyodor Dostoevsky
    45 4899 Sergey Lavrov
    46 4583 Joseph Brodsky +
    47 4572 Faina Ranevskaya +
    48 4467 Nietzsche
    49 4467 Polina Gagarina +
    50 4446 Stephen King +

    To build the daily rating, a stream of 30-40 million Russian-language messages is processed using “non-objective linguistics of social media”, which makes it possible to identify new entities rather than merely filter the data against a predefined list of persons.

    To summarize, and to continue the conversation with Yasen Nikolaevich on the topic “How can a small number of journalists satisfy a larger number of readers?”, we propose “modular journalism” as a technocratic approach:
    - A sea of information generated by “sensors” (people, companies, structures, robots): for the Russian-language segment, this is 30-50 million messages per day from 12-15 million "sensors";
    - An analytical system that identifies new trends ("Chelyabinsk meteorite", "Earthquake in Nepal");
    - An analytical system that tracks changes in existing trends;
    - An analytical system that extracts named entities (objects) and ranks media objects;
    - A rating system for monitoring thematically related objects.
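
    To make the "new trends" module a little more concrete, here is a deliberately naive sketch. It treats a new trend as a term that is mentioned often today but had a minimal information background before, as with the meteorite example. Everything in it is an assumption made for illustration: the function `find_new_trends`, its thresholds, the whitespace tokenization, and the sample messages are invented, and a production system processing tens of millions of messages would need real linguistic processing.

```python
from collections import Counter


def find_new_trends(today_messages, baseline_counts, min_mentions=3, max_baseline=0):
    """Return terms seen in at least `min_mentions` messages today whose
    count in the historical baseline is at most `max_baseline`."""
    # Count each term once per message so one long post cannot dominate.
    today = Counter(
        word for msg in today_messages for word in set(msg.lower().split())
    )
    return sorted(
        term
        for term, mentions in today.items()
        if mentions >= min_mentions and baseline_counts.get(term, 0) <= max_baseline
    )


# Hypothetical baseline and message stream, for illustration only.
baseline = Counter({"putin": 1000, "football": 500})
messages = [
    "meteorite over chelyabinsk",
    "chelyabinsk meteorite video",
    "huge meteorite flash",
    "football tonight",
    "putin football",
]

print(find_new_trends(messages, baseline))
```

    The two thresholds are the design choice here: `min_mentions` filters out noise, while `max_baseline` separates genuinely new entities from familiar ones whose popularity merely changed, which is the job of the second module in the list.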

    Such a set of modules makes it possible to generate from 10 to 90% of the materials on your content resource and to respond quickly to readers' expectations for information. In one way or another, most publications have been moving in this direction for a long time (for example, the selection of 5 news items on the popular TJournal). It all comes down to the tools used and to the “trust” placed in us, the readers, when it comes to understanding our interests.

    Of course, a smart reader will immediately see the danger of standardization and the "deadening" of publications: if every publication (even within its own niche, and there are still a great many of them) publishes materials on the principle of "modular journalism", then why do we need so many similar publications?! We do not have an answer to this question yet, but we are sure that creative people will find a creative way up to the next level :-)

    UPD: Popularity ratings for commenting on news Sites, LiveJournal, Twitter, YouTube, VK blogs are available in the public domain at

    P.S. Bloggers, journalists, and publications that need or want to keep abreast of events can send a request to have their address included in the daily mailing of the previous day's Media Person ratings. Our colleagues from Brand Analytics are ready to provide the TOP-50 media person ratings free of charge until the end of May.
