Visualizing comments on YouTube channels of international and local Touhou communities

    Hello! We build on the ideas of the first post and continue to visualize and study YouTube comments. This time we work with global and local YouTube communities. How do commentators who write in different languages interact? Do the many local groups merge into a single global community, or is the picture more complicated than it seems? And where does the Touhou Project come in? Let's find out.



    Comments and communities: genre specifics, size, language spectrum


    To answer these questions, we explored the relationships between groups of commentators on YouTube channels related to the Touhou Project community (the “Eastern” Project). As a rule, these channels are associated with the series of computer games of the same name in the danmaku genre (vertical shooters with a huge number of bullets). The games have inspired a large body of fan art, and fan works make up the main content of these YouTube channels: game-related let's plays, streams, music, animation and so on.

    The audience of such channels is relatively small. On the one hand, this makes the data easy to process and visualize; on the other hand, the findings can be extrapolated only to small thematic YouTube communities.

    For the study we selected comments from three local-language communities: Russian, Spanish and Korean (named after the dominant language of the comments). For the nominally international community, we considered comments in English and, in part, in Japanese. Since Touhou Project content is originally produced in Japan, kana, for example, is used to label its elements in all the other languages.

    Community info


    All channel data was coded. Each channel was assigned a unique code that is a multiple of 1000, and each video was assigned its channel code plus the sequence number of the video.
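
    The coding scheme can be illustrated with a short Python sketch. It assumes that every channel has fewer than 1000 videos and that sequence numbers start at 1; the article states neither, and the function names are hypothetical.

```python
from typing import Tuple

CHANNEL_STEP = 1000  # channel codes are multiples of 1000, as in the article


def split_video_code(video_code: int) -> Tuple[int, int]:
    """Split a video code into (channel_code, sequence_number).

    Assumption: a channel never has 1000 or more videos, so the last
    three digits always hold the sequence number.
    """
    sequence = video_code % CHANNEL_STEP
    return video_code - sequence, sequence


def make_video_code(channel_code: int, sequence: int) -> int:
    """Build a video code from a channel code and a sequence number."""
    if channel_code <= 0 or channel_code % CHANNEL_STEP != 0:
        raise ValueError("channel code must be a positive multiple of 1000")
    if not 0 < sequence < CHANNEL_STEP:
        raise ValueError("sequence number must be between 1 and 999")
    return channel_code + sequence


print(split_video_code(7024))    # (7000, 24)
print(split_video_code(52003))   # (52000, 3)
print(make_video_code(3000, 15)) # 3015
```

    Under these assumptions, video 7024 is the 24th video of channel 7000, and video 52003 is the third video of channel 52000.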

    1) The international community is represented by 25 channels. A total of 243281 comments were processed. Codes: 1000 - 25000
    (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000)

    2) The Russian community is represented by 9 channels. A total of 6417 comments were processed. Codes: 30000 - 38000
    (30000, 31000, 32000, 33000, 34000, 35000, 36000, 37000, 38000)

    3) The Spanish community is represented by 8 channels. A total of 14483 comments were processed. Codes: 40000 - 47000
    (40000, 41000, 42000, 43000, 44000, 45000, 46000, 47000)

    4) The Korean community is represented by 8 channels. A total of 12968 comments were processed. Codes: 50000 - 57000
    (50000, 51000, 52000, 53000, 54000, 55000, 56000, 57000)
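
    For reference, the code ranges and comment counts listed above can be written down as a small lookup table. This is only an illustrative sketch: the dictionary layout and the function name `community_of` are assumptions, while the ranges and counts are taken from the list.

```python
# Code ranges and comment counts as listed in the article.
COMMUNITIES = {
    "international": {"codes": range(1000, 25001, 1000), "comments": 243281},
    "russian": {"codes": range(30000, 38001, 1000), "comments": 6417},
    "spanish": {"codes": range(40000, 47001, 1000), "comments": 14483},
    "korean": {"codes": range(50000, 57001, 1000), "comments": 12968},
}


def community_of(code: int) -> str:
    """Return the community of a channel code or a video code.

    Assumption: the channel code is the video code with its last
    three digits zeroed.
    """
    channel = code - code % 1000
    for name, info in COMMUNITIES.items():
        if channel in info["codes"]:
            return name
    raise KeyError(f"code {code} does not belong to any listed community")


total_channels = sum(len(info["codes"]) for info in COMMUNITIES.values())
total_comments = sum(info["comments"] for info in COMMUNITIES.values())

print(community_of(7024))   # international
print(community_of(38000))  # russian
print(total_channels)       # 50
print(total_comments)       # 277149
```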

    Visualization results


    1) International community: a directed graph with 50552 nodes and 117906 edges.



    Although the field of comments is homogeneous overall, two autonomous regions with clear contours and one diffuse region stand out.

    The autonomous gray region in the west consists of comments on the videos of channel 8000.



    The isolated region corresponds to the non-Touhou content of channel 8000. These are mainly videos with soundtracks from the Final Fantasy games (for example, this one).

    The autonomous green region in the northeast consists of comments on the videos of channel 7000.



    Video 7024 attracted many unique commentators. The video is a playthrough of the game Undertale. This game has its own fan community, which is probably where the unique commentators came from.

    The scattered beige region in the south is channel 3000.



    Its isolated regions are mostly videos about GTA and other non-Touhou games (3015, 3036, 3038, 3049, 3051, 3063 and others).

    In other words, most of the isolated regions in the international community correspond to content unrelated to Touhou.

    2) Russian community: a directed graph with 3655 nodes and 5180 edges.



    There is a general field of comments, which tends to split into two parts, and a separate purple region.

    The purple region is a video from channel 38000 with original content: English subtitles that the channel's authors prepared for a song in Japanese. The comments on this video are in English, and most of its commentators do not appear elsewhere in the channel's sample.

    3) Spanish community: a directed graph with 5866 nodes and 9843 edges.



    Three autonomous regions can be seen. The red region is channel 40000, the orange-black region is channel 45000, and the blue-violet-green region is channel 46000.

    The content of all three channels is fan art. The split into isolated regions probably reflects divisions within the community itself. For example, the content of channel 40000 is mainly about cosplay, and the links posted on it also lead to cosplay channels.

    4) Korean community: a directed graph with 4113 nodes and 6763 edges.



    There are two large autonomous regions (purple-blue and green-black) and several small ones (crimson, orange, dark green and so on).

    The purple-blue region is channel 57000. The green-black region is channels 51000, 52000 and 53000.

    In general, all regions consist of comments on Touhou content. Comments on videos with non-Touhou content, for example 52003, occasionally appear at a distance from the rest.

    The Spanish and Korean communities are similar: most commentators gather around the channels with a large amount of content, and the remaining channels are clearly separated from them. The Russian community is more tightly interconnected by comparison, as the shared region of comments shows. The explanation is that most of the Russian YouTube channels in the sample are connected to each other through links posted on the channels.

    5) All communities, local-language (2, 3, 4) and international (1): a directed graph with 62340 nodes and 185412 edges.



    There is a general cluster of comments with branches extending from it.

    The dark green branch extending to the north-west is the Russian community.



    Spanish community (gray): its main part is concentrated in the south-west.



    A separate branch of the Spanish community, represented by channel 40000, extends to the northeast.



    The black branch extending to the south-east is the Korean community.



    The Korean community is noticeably more strongly connected with the international one: its main part (channel 57000) has practically merged with the region of channel 13000.



    The situation is similar for the Spanish community: its south-western branch (channel 46000) merges with an international region (channel 20000).



    The main part of the Russian community (channel 38000) is located at a great distance from the nearest international region.



    Channel ranking by degree of interaction


    To conclude our review, we turn to channel ranking. Our approach rests on a very simple observation. If we assume that the spatial position of a comment correlates with how much it intersects with other comments, then the closer a comment is to the central cluster, the greater this intersection.

    On this basis, the comments of individual channels and videos, both local-language and international, can be ranked. The table gives an example of ranking channels by how close their comments are to the common global center (channels are listed in descending order of interaction).
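
    One way to turn this idea into numbers is sketched below in Python. It is not the procedure behind the table: the input format (a list of nodes with a channel label and layout coordinates), the use of the mean Euclidean distance, and the choice of center are all assumptions made for illustration, and the channel labels and coordinates are invented.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Point = Tuple[str, float, float]  # (channel label, x, y) in layout coordinates


def rank_channels_by_proximity(
    points: Iterable[Point],
    center: Optional[Tuple[float, float]] = None,
) -> List[Tuple[str, float]]:
    """Rank channels by the mean distance of their nodes from the center.

    If no center is given, the centroid of all nodes is used (an assumption;
    the article does not say how the global center is defined).
    Channels closest to the center come first.
    """
    points = list(points)
    if not points:
        return []
    if center is None:
        center = (
            sum(x for _, x, _ in points) / len(points),
            sum(y for _, _, y in points) / len(points),
        )
    distances = defaultdict(list)
    for channel, x, y in points:
        distances[channel].append(math.hypot(x - center[0], y - center[1]))
    means = {channel: sum(d) / len(d) for channel, d in distances.items()}
    return sorted(means.items(), key=lambda item: item[1])


# Invented toy layout: three hypothetical channels around the origin.
toy_points = [
    ("channel_a", 1.0, 0.0), ("channel_a", 0.0, 1.0),
    ("channel_b", 3.0, 4.0), ("channel_b", 6.0, 8.0),
    ("channel_c", 0.0, 2.0), ("channel_c", 2.0, 0.0),
]

for channel, mean_distance in rank_channels_by_proximity(toy_points, center=(0.0, 0.0)):
    print(channel, mean_distance)
# channel_a 1.0
# channel_c 2.0
# channel_b 7.5
```

    The mean is sensitive to a few distant nodes; a median could be substituted if outliers dominate a channel.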



    Note that this is only one of the simplest ranking options, in which local groups depend directly on the global region. However, the visualization of local groups revealed that commentators' external (global) and internal (local) links are distributed unevenly. Some channels of the Spanish and Korean communities are so strongly connected to the global region that they are practically part of it, yet they are weakly connected to other local channels. For example, the Spanish community is relatively well integrated with the global region but internally splits into two practically unconnected regions. The Russian community, by contrast, is relatively distant from the international region and shows strong internal cohesion. These examples show the limitations of a simple model that ranks local groups relative to a common global center. They also point to the possibility of a model that includes local specifics among its evaluation criteria. This task clearly requires a separate study, which we plan to undertake.
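
    As a rough illustration of what separating external and internal links could look like, the Python sketch below computes two shares for each local channel: the fraction of its commentators who also comment on international channels, and the fraction who also comment on other channels of the same local community. This is a hypothetical measure, not a model from the study; the data structures, names and toy data are invented.

```python
from typing import Dict, Set, Tuple


def link_shares(
    commenters_by_channel: Dict[str, Set[str]],
    community_by_channel: Dict[str, str],
    global_name: str = "international",
) -> Dict[str, Tuple[float, float]]:
    """Return {local channel: (global_share, local_share)}.

    global_share: share of the channel's commentators also seen on
                  channels of the global community.
    local_share:  share also seen on other channels of the same
                  local community.
    """
    shares = {}
    for channel, users in commenters_by_channel.items():
        community = community_by_channel[channel]
        if community == global_name or not users:
            continue  # only non-empty local channels are scored
        global_users: Set[str] = set()
        local_users: Set[str] = set()
        for other, other_users in commenters_by_channel.items():
            if other == channel:
                continue
            if community_by_channel[other] == global_name:
                global_users |= other_users
            elif community_by_channel[other] == community:
                local_users |= other_users
        shares[channel] = (
            len(users & global_users) / len(users),
            len(users & local_users) / len(users),
        )
    return shares


# Invented toy data: one global channel and two local communities.
toy_commenters = {
    "int_1": {"u1", "u2", "u3"},
    "es_1": {"u1", "u2", "u4", "u5"},
    "es_2": {"u6", "u7"},
    "ru_1": {"u8", "u9"},
    "ru_2": {"u9", "u10"},
}
toy_communities = {
    "int_1": "international",
    "es_1": "spanish", "es_2": "spanish",
    "ru_1": "russian", "ru_2": "russian",
}

for channel, (g, l) in sorted(link_shares(toy_commenters, toy_communities).items()):
    print(channel, g, l)
# es_1 0.5 0.0
# es_2 0.0 0.0
# ru_1 0.0 0.5
# ru_2 0.0 0.5
```

    In this toy data, the "spanish" channels resemble the pattern described above (globally connected but internally split), while the "russian" channels are cohesive internally and detached from the global channel.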
