The "Schastematrinstva" public and a small statistical study of it

    Introduction (January 2018)


    Sometimes people take on things they cannot handle. I am no exception.

    There is an interesting VK group, #parenthood ( https://vk.com/zaiki_luzhaiki ). It is one of the most enchanting sources of crude realism. If you want to become disillusioned with family, children, husbands and everything else, this is the place for you. An existential crisis is guaranteed (if only because 15 posts a day are written there, and by real people). And, of course, this public is attractive in many ways.

    At some point my wife, who works as a perinatal psychologist, and I became interested in studying what goes on in this public. For example, we could apply trivial statistical methods to its content and see whether anything interesting turns up. I especially wanted to reach some loud conclusion. Say, that the public helps people ... Or that the public breeds hatred in people ... Or something else equally striking.


    As a result, the amount of material studied grew.
    The number of interim findings grew.
    The number of graphs and tables grew.
    But the understanding of how to evaluate it all did not.

    Intermediate conclusions carried the imagination off to complex constructions based on next to nothing, but by and large there was only one obvious conclusion. It is all very interesting and exciting, but quite static. An endless cycle of the same recurring problems, which the participants always assess in the same way. A kind of endless samsara in which nothing really changes. Waves come and waves go, leaving no trace.

    All that remained was to summarize and write something beautiful about it. And there everything stalled. For half a year. The task turned out to be impossible. I could not do it, and neither could other people.

    But something has been done and it needs to be shown. So have a look. It is not exactly objective and unbiased. Many things in this public put me off, and that shows. But you can always look only at the graphs and tables and draw your own conclusions.

    Briefly, what is in the text:

    • General trends
    • Violations of the rule of neutrality on the part of the administration and participants
    • Popular words, pairs of words, and all sorts of combinations
    • Use of memes and profanity (mat)
    • The concept of an ideal post in the public


    (There are a number of bad words in the text, but purely for scientific reasons: their frequency of use is being studied.)

    Introduction (August 2017)


    The #motherhood group ( https://vk.com/zaiki_luzhaiki ) is an extremely interesting phenomenon of the social network era. The posting frequency is huge: 13-17 posts per day on average. There is no advertising and there are no reposts distracting from the point. Only authentic content. The group's concept is based on anonymous posts with comments disabled. The authors of the posts are mothers worn out by the various circumstances of motherhood. In general, the group has fairly reasonable rules for such a community and its content.

    For all that, the rather ideological administration allows itself to comment or to insert links to its programmatic literature, such as the book “Men who hate women and women who love them” and entries in the personal blog of the group's main creator. Moms themselves also periodically try to correspond with each other by inserting links to previous posts in their own. For some time the administration fought this, appending to such posts notes in the spirit of “From the administration: you understand that this is at your own risk, anyone can answer you. Be vigilant.” Then it gave up. In general, the process was quite lively.

    It would be even more interesting to follow the husbands' reactions to all this. But there is no separate reaction group for this community, so there are no statistics. There are rumors, though, that the husbands are seriously fuming. Especially at the affectionate names they are given in the group: "nitakoy" and "my asshole." However, none of this is verified, unfortunately.

    I will try to look, rather superficially, at some of these processes (administration interventions, the use of characteristic words, mother-to-mother communication, the dynamics of negativity, etc.) from the point of view of all sorts of numbers and simple mathematical models.

    I can't say that something exceedingly unusual and exciting turned up everywhere, but certain points are extremely telling.

    Posts were collected from the creation of the community until August 25, 2017.

    Number of words in a post


    I wanted to check: what if people have grown tired of writing over all this time? What if everyone has become more concise and dull? But no. Nothing changes.



    The average number of words stays about the same throughout. Although, if you ignore the outliers in the middle, one can tentatively assume that people are becoming a little more verbose. Just a little. Apparently, being well read in this very group gives mothers additional turns of phrase for describing their misfortune.

    Number of posts per month


    Here the question is this. How active has the group been all this time? Are there more posts? Or fewer? Or what? We did the simplest thing: we counted the number of posts per month over the group's entire existence (the red trend line is a 6th-order polynomial fit; do not ask why the 6th):



    If, looking at the picture, we assume that the drop in activity in June and July 2016 was an anomaly, then a fairly obvious seasonality emerges in the flow of posts from disgruntled mothers.

    Moms are most active in expressing dissatisfaction in summer and least active in winter.
    There are many possible explanations. For instance:

    1. In winter you can't do anything much anyway, while in summer it feels as if all of life is passing you by while you sit with your child.
    2. Winter is bad enough as it is, so there is no pressing need to rationalize that through the problems of motherhood.
    3. In winter, moms (???) give birth less often (???), and a rather large share of the discontent is associated with childbirth and what comes after it. See the data on the birth rate by month.

    Choose the explanation you like ...
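
    The monthly counts behind the chart in this section could be obtained roughly as follows. This is only a minimal sketch, not the code used for the study: it assumes each post carries a Unix timestamp (the function name and the sample values are made up for illustration).

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_month(timestamps):
    """Count posts per (year, month); timestamps are assumed to be Unix seconds (UTC)."""
    counts = Counter()
    for ts in timestamps:
        d = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(d.year, d.month)] += 1
    # Return the months in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample: two posts in January 2017 and one in February 2017.
sample = [1483228800, 1483315200, 1485907200]
monthly = posts_per_month(sample)
```

    A trend line like the red one could then be fitted to these counts, for example with `numpy.polyfit(x, y, 6)`, where `x` is the month index and `y` the count.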

    Number of likes per month


    It is rather pointless to look at the average number of likes per month: the number of people in the group is constantly growing, so clearly something similar should happen with likes. But let's look anyway.



    Since I cannot get my grubby little hands on the group's official statistics, we can assume that the number of users in the group changed in roughly this way, and that the number of likes, by and large, simply depends on the number of users. But I will try to use a trickier indicator.

    I think the number of posts per month, N_i, is a good indicator of activity. If we now divide the average number of likes in month i by N_i, we get a tricky indicator along the lines of “what share of the average number of likes was generated by one post this month”. That is, a sort of estimate of the "generative ability" of posts to produce likes.
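
    To make the indicator concrete, here is a small sketch of the calculation: the average number of likes in a month divided by the number of posts N_i in that month. The input format (pairs of month key and like count) and the numbers are assumptions made for the example.

```python
from collections import defaultdict


def likes_generativity(posts):
    """posts: iterable of (month_key, likes). Returns {month: mean_likes / N_i}."""
    likes_by_month = defaultdict(list)
    for month, likes in posts:
        likes_by_month[month].append(likes)

    result = {}
    for month, likes in likes_by_month.items():
        n_i = len(likes)                   # N_i: number of posts in the month
        mean_likes = sum(likes) / n_i      # average likes per post in the month
        result[month] = mean_likes / n_i   # the "generative ability" indicator
    return result


# Hypothetical sample data.
sample = [("2017-01", 100), ("2017-01", 300), ("2017-02", 200)]
indicator = likes_generativity(sample)
```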



    And here an interesting thing appears. We see a seasonality that mirrors the seasonality of posts. Obviously, because the number of posts is in the denominator. What does this tell us? It suggests that mothers may write fewer posts in winter, but others read and like them no less actively than in summer. Or that moms have nothing to do with it, and the likes mostly come from people who do not write to the group. That seems to me the most realistic explanation.

    The number of posts per month does not work as an indicator of liking activity. And this is an interesting conclusion for such a group: the hype is not created by the people who create the group's content.

    Activity by day of the week


    We reasonably assumed that the number of likes is a good indicator of the number of people in the group. Looking at the likes chart, we can assume that the number of users more or less stabilized in the first half of 2017. Therefore, activity by day of the week was computed for this first half of 2017, as the group's stable period. 0 is Monday, 6 is Sunday.
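
    The same numbering (0 for Monday, 6 for Sunday) is what Python's `date.weekday()` returns, so the per-day counts can be sketched like this. The sample dates are invented; the real input would be the post dates for the first half of 2017.

```python
from collections import Counter
from datetime import date


def posts_by_weekday(dates):
    """Return a list of 7 counts, index 0 = Monday ... index 6 = Sunday."""
    counts = Counter(d.weekday() for d in dates)
    return [counts.get(day, 0) for day in range(7)]


# Hypothetical sample: two posts on Monday 2017-01-02, one on Sunday 2017-01-08.
sample = [date(2017, 1, 2), date(2017, 1, 2), date(2017, 1, 8)]
by_day = posts_by_weekday(sample)
```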



    Comments are almost superfluous, although one can assume that on Sunday the admins slack off on posting and publish most of the backlog on Monday.

    An alternative explanation is that the worst of it happens on the weekend, when everyone is at home, the husband makes demands, the child makes demands, and there is no light at the end of the tunnel. And with the husband around, of course, such posts do not get written. So as soon as one leaves for work in the morning and the other for kindergarten or school, mothers sit down to write their essay for the public: “How I spent the weekend.”

    Administration Intervention


    With grubby little hands, of course, the first thing you want to look for is who messed up where, who broke the rules (because they can) or did some other nasty thing. And the main character here, of course, is the administration, which barges in with its judgments and advice on how to live right while not letting others do the same.

    The administrators kindly marked their statements in posts with the prefixes “from the admin:” or “from Demakova:”, etc. But not all of them were out of line. Some were purely informational, like the one quoted in the introduction: you can't do this, don't write that, be careful ...

    So I filtered out the informational messages and kept only the impudent (because they cannot be discussed) advice on how to live addressed to the unhappy authors. And I got this interesting graph:



    It is immediately obvious that someone wanted to play God but quickly got fed up with it. Over the past six months the urge to chime in has faded a little. True, in recent months there has been some revival. Apparently the summer increase in activity affects them too.
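
    The filtering described in this section might look something like the sketch below. The markers are the ones quoted in the text (in English here, while the real posts are in Russian); the list of "informational" hint phrases is purely hypothetical, and in practice the sorting into informational notes and advice may well have been done by hand.

```python
# Markers quoted in the text; English placeholders for the Russian originals.
ADMIN_MARKERS = ("from the admin:", "from demakova:")
# Hypothetical phrases that flag a purely informational note.
INFO_HINTS = ("be vigilant", "at your own")


def admin_comment(post_text):
    """Return the text after an admin marker, or None if the post has none."""
    lower = post_text.lower()
    for marker in ADMIN_MARKERS:
        pos = lower.find(marker)
        if pos != -1:
            return post_text[pos + len(marker):].strip()
    return None


def is_advice(comment):
    """True if the admin comment does not look like an informational note."""
    lower = comment.lower()
    return not any(hint in lower for hint in INFO_HINTS)


# Hypothetical sample posts.
posts = [
    "My husband never helps. From the admin: leave him.",
    "Replying to an earlier post. From the admin: be vigilant, anyone can answer you.",
    "Nobody sleeps in this house.",
]
comments = [c for c in map(admin_comment, posts) if c is not None]
advice = [c for c in comments if is_advice(c)]
```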

    Communication of moms bypassing the rules


    Moms are no less eager than the administrators to break something and write something extra in circumvention of the rules. To do this they, again kindly, insert at the beginning of the post a link to the post they are responding to. That makes it easier for me to count all this ... right?



    Interest in communication is awakened and sustained by the arrival of new users. When new users do not come, responding the same way to very similar complaints apparently just becomes uninteresting. Thus the period in which the group's composition was most stable is marked by a rather sharp decrease in the number of replies.

    True, there is another possibility: the admins now delete replies more strictly.

    Word frequency


    Depicting the dynamics of word popularity (frequency) in posts is a great torment, so I will leave only 2017 here, although priorities have shifted somewhat since 2015. Naturally, all words are represented by their “roots”, so that the different forms of one word (for example, the various case forms of “child”) are merged into one.

    It is worth mentioning that “child” is not just the word child. It also covers words like children, son, daughter, etc. “Husband” is also “nitakoy”, “faithful”, etc. “Time” includes “year”, “day”, “hour”, “week”, etc. If they are not merged, these forms of words with the same meaning fill the entire table of popular words.
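
    The merging of related words can be sketched as a simple lookup applied before counting. The mapping below is a hypothetical English stand-in built from the examples in the text; for the Russian originals the word forms would first be reduced to stems, for instance with nltk's `SnowballStemmer("russian")`.

```python
import re
from collections import Counter

# Hypothetical mapping from word to the concept it is merged into.
CONCEPTS = {
    "child": "child", "children": "child", "son": "child", "daughter": "child",
    "husband": "husband", "nitakoy": "husband",
    "time": "time", "year": "time", "day": "time", "hour": "time", "week": "time",
}


def concept_counts(texts):
    """Count tokens, replacing each token by its concept when one is defined."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            counts[CONCEPTS.get(token, token)] += 1
    return counts


# Hypothetical sample posts.
sample = ["My son and my daughter", "The children had a long day and week"]
top = concept_counts(sample)
```

    Calling `top.most_common(10)` on the counts for one month would give one column of a table like the one in this section.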

    The most popular words are at the top; popularity decreases downward.

    (2017, 1)  (2017, 2)  (2017, 3)  (2017, 4)  (2017, 5)  (2017, 6)  (2017, 7)  (2017, 8)
    child      child      child      child      child      child      child      child
    time       time       time       time       time       time       time       time
    husband    husband    husband    husband    husband    husband    husband    husband
    moms       is simple  is simple  is simple  is simple  moms       moms       is simple
    is simple  house      moms       moms       house      is simple  is simple  moms
    could      moms       wants      house      moms       one        wants      kind
    house      one        one        could      one        could      house      house
    wants      could      den        den        works      house      one        one
    den        works      house      one        wants      wants      life       den
    a dialect  a dialect  works      works      a dialect  a dialect  works      life


    Interestingly, in the early stages of the group, “husband” did not have the significance it has had since 2016 and could not make the top three. Apparently the general, somewhat miserable discourse shaped by the creators increased the importance of men as the cause of the troubles of motherhood (it is hard to imagine that husbands really became much worse over the last 2 years).

    In general, the main concerns of mothers are fairly obvious: lack of time, lack of opportunities, lack of help from the husband, unfulfilled desires, problems with work and with the house, and who said what to whom.

    Tag Frequency


    One important indicator of the group's content is the hashtags used. They show which topics are taking shape in a given period. Next to each hashtag is the number of times it was mentioned. Hashtags used fewer than 5 times are not shown.

    (2017, 4)                       (2017, 5)                       (2017, 6)                       (2017, 7)                       (2017, 8)
    Motherhood - 52.00              Motherhood - 54.00              motherhood - 78.00              motherhood - 81.00              Motherhood - 60.00
    happiness of motherhood - 7.00  Happiness of motherhood - 7.00  happy wife - 11.00              childbirth - 31.00              childbirth - 58.00
    Happiness of motherhood - 5.00  Happiness - 7.00                Happiness - 6.00                Happiness of motherhood - 9.00  nitakoy - 6.00
    Happiness to be married - 7.00
    have a daughter - 5.00


    Basically, until the summer of 2017 hashtags were not widely used, apart from the hashtag with the group's name in various forms. In the summer of 2017 the theme of "rejuvenation by childbirth" became popular. The hashtag "nitakoy" did not take root.

    TF-IDF


    The most common words usually say nothing specific about the subject. It is clear enough that, since the group is about motherhood, it talks about moms, husbands, children and all that. But it would be interesting to know what specifically agitated people in different periods of the group's existence. For this, the TF-IDF ranking criterion is used. Here it is a variation that uses 6-month periods (windows) for calculating the IDF.

    I will not explain what it is, but roughly speaking it picks out what matters most to people in a given period beyond the general line of the whole public: words that occur very often in a given month and are practically absent in the previous 6 months.
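
    For readers who do want to see the idea, here is one possible way to compute such a windowed TF-IDF. The exact formula used for the study is not given in the text, so everything here is an assumption: TF is the raw count of a word in the current month, and IDF is a smoothed logarithm based on how many posts in the preceding window contain the word.

```python
import math
from collections import Counter


def windowed_tfidf(docs_by_month, month_index, window=6):
    """docs_by_month: chronological list of months, each a list of token lists.

    Returns (token, score) pairs for the given month, highest score first.
    """
    current = docs_by_month[month_index]
    start = max(0, month_index - window)
    previous = [doc for month in docs_by_month[start:month_index] for doc in month]

    tf = Counter(token for doc in current for token in doc)        # term frequency
    df = Counter(token for doc in previous for token in set(doc))  # document frequency
    n_prev = len(previous)

    scores = {}
    for token, freq in tf.items():
        # Smoothed IDF (an assumption): rare-in-window words get a larger weight.
        idf = math.log((1 + n_prev) / (1 + df[token])) + 1
        scores[token] = freq * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical two-month corpus of already tokenized posts.
months = [
    [["child", "husband"], ["child", "time"]],
    [["child", "childbirth", "childbirth"], ["childbirth", "husband"]],
]
ranked = windowed_tfidf(months, 1)
```

    In this toy corpus "childbirth" ranks first for the second month because it is frequent there and absent from the window before it, while "child" ranks last because it appears in every earlier post.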

    (2017, 1)      (2017, 2)      (2017, 3)      (2017, 4)      (2017, 5)      (2017, 6)      (2017, 7)      (2017, 8)
    christmas      globally       March          running wild   nitak          chaos          childbirth     childbirth
    is dead        Samoyed        zalipa         ukat           sat down       choking        rejuvenated    kuren
    product        old            rent out       tumble         bolt           medicament     fag            cheslov
    hovering       are silent     tumble         new pass       brought        sarcasm        fire           episode
    howled         zakida         cram           haste          bacter         will           thirties       scoliosis
    crash          lived          diplomat       though         call           elm            banged         outright
    drinking       candy wrapper  boiled         antics         comfort        pass           suffered       remote
    flat           parent         feminine       will come      fit            hostess        lining         thick
    on duty        mood           huyn           destroy        five year      hospitalized   pulse          hyperhidrosis
    bibik          intimate       by asking      right now      will leave     push           creep          hell


    It should be noted that "rejuvenation by childbirth" has an extremely high TF-IDF compared with other first-place words: about 40, roughly 10 times the average first-place value (about 3-4). Only the word “flashmob” reached a comparable value, in the spring of 2016, along with some other words:

    • flashmob 17.95
    • gender 16.32
    • yellow 10.88
    • beige 9.30
    • mimocrocodile 8.8

    I'm afraid to even imagine what it was.

    Bigrams


    The most common pairs of words.

    (2017, 4)            (2017, 5)            (2017, 6)            (2017, 7)            (2017, 8)
    I feel               everyday             everyday             everyday             after childbirth
    everyday             just me              all day              after childbirth     everyday
    Eat me               after birth          even                 I feel               all day
    all day              Eat me               I feel               it was necessary to  I feel
    guilt                I feel               it was necessary to  all day              after birth
    just me              may be               of my life           Eat me               it was necessary to
    after childbirth     it was necessary to  after birth          after birth          thank God
    it could be          all day              all day long         Lately               right after
    even                 the moment when      in order             most                 it could be
    all day              after that           all this             all day              in a month


    One senses that the routine of what is happening and the feeling of missed opportunities clearly do not bring joy. However, this conclusion is banal, as is the fact that right after childbirth some kind of mess always happens.

    Purely out of sporting interest, it should be noted that frequent bigrams are closely tied to the equally frequent theme of time in the texts. There are far fewer stable pairs about childbirth and even fewer about husbands.
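
    Counting such pairs takes only a few lines. The sketch below assumes the posts are already split into tokens (the sample posts are invented) and works for any n, so the same function also covers word triples.

```python
from collections import Counter


def ngram_counts(token_lists, n=2):
    """Count n-grams (as tuples of n consecutive tokens) over all posts."""
    counts = Counter()
    for tokens in token_lists:
        # zip over n shifted copies of the token list yields consecutive n-tuples.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical tokenized posts.
posts = [
    ["every", "day", "alone"],
    ["every", "day", "after", "childbirth"],
]
bigrams = ngram_counts(posts, 2)
trigrams = ngram_counts(posts, 3)
```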

    Augmented Bigrams


    Bigrams alone do not reveal enough emotion or context. So for each of the most popular bigrams we tried to find the words that most often appear near it (5 words for each).

    Bigram               Words that often appear next to the bigram
    I feel               [(mater, 10), (women, 7), (husband, 6), (could, 6), (horrible, 6)]
    everyday             [(one, 21), (kid, 17), (affairs, 14), (husband, 14), (each, 11)]
    all day              [(husband, 8), (game, 6), (kid, 6), (many, 5), (want, 5)]
    just me              [(forces, 10), (could, 4), (reb, 3), (loved, 3), (thought, 3)]
    after childbirth     [(first, 14), (year, 14), (pregnant, 13), (month, 11), (immediately, 10)]
    it was necessary to  [(thought, 7), (children, 5), (affairs, 5), (talk, 5), (mat, 5)]
    all day              [(house, 10), (husband, 10), (mouth, 8), (night, 8), (kid, 8)]
    after birth          [(kid, 28), (son, 11), (month, 10), (reb, 9), (nka, 9)]
    even                 [(game, 6), (bud, 5), (husband, 5), (evening, 4), (kid, 4)]
    Eat me               [(very, 6), (could, 6), (son, 6), (husband, 5), (one, 5)]


    The number next to each word form in the second column shows how many times in 2017 this word occurred fewer than 4 words away from the bigram in the first column.
    How can this be interpreted?

    For example, like this: the most common problem is that “every day” mom is “alone”, as can be seen from the second row. And after the “first” birth something happens “right away.”

    However, the abundance of the “most common words”, which are characteristic of any text in this public, is confusing. To fix this somewhat, we exclude the most popular words from the search for neighboring words. This way we can see which words are specific to these bigrams rather than to the public as a whole.

    Bigram               Words that often appear next to the bigram
    I feel               [(mater, 10), (women, 7), (terrible, 6), (happy, 6), (last, 6)]
    everyday             [(each, 11), (simple, 11), (cheat, 10), (mouth, 9), (hate, 9)]
    all day              [(game, 6), (many, 5), (cartoon, 5), (gathering, 4), (hands, 4)]
    just me              [(forces, 10), (loved, 3), (thought, 3), (killed, 3), (know, 3)]
    after childbirth     [(first, 14), (pregnant, 13), (immediately, 10), (hair, 9), (became, 9)]
    it was necessary to  [(thought, 7), (talk, 5), (mat, 5), (simple, 4), (neighbor, 4)]
    all day              [(mouth, 8), (mornings, 7), (move, 7), (slept, 5), (yells, 5)]
    after birth          [(nka, 9), (junior, 9), (immediately, 5), (simple, 4), (beginning, 4)]
    even                 [(game, 6), (bud, 5), (evening, 4), (equally, 4), (sleeping, 4)]
    Eat me               [(sem, 4), (simple, 4), (familiar, 3), (friends, 3), (feelings, 3)]
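
    Both tables in this section can be produced by one routine: collect the words within a few positions of each occurrence of a bigram, optionally skipping a set of overly common words. This is only a sketch under assumptions: the `radius` of 3 reflects "fewer than 4 words away", and the sample posts and the stop-word set are invented.

```python
from collections import Counter


def neighbours(token_lists, bigram, radius=3, stop_words=frozenset(), top=5):
    """Most frequent words within `radius` tokens of each occurrence of `bigram`."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) != bigram:
                continue
            left = tokens[max(0, i - radius):i]       # words before the bigram
            right = tokens[i + 2:i + 2 + radius]      # words after the bigram
            counts.update(t for t in left + right if t not in stop_words)
    return counts.most_common(top)


# Hypothetical tokenized posts.
posts = [
    ["every", "day", "i", "am", "alone", "husband"],
    ["husband", "works", "every", "day", "alone"],
]
plain = neighbours(posts, ("every", "day"))
filtered = neighbours(posts, ("every", "day"), stop_words={"i", "am", "husband"})
```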


    Trigrams


    The most frequent triples of words.

    (2017, 4)               (2017, 5)               (2017, 6)               (2017, 7)               (2017, 8)
    guilty feeling          In a few days           instead of              love my son             immediately after childbirth
    strong enough to        instead of              mother is to blame too  instead of              after the first birth
    be strong enough        every time when         after giving birth      every time when         biggest mistake
    need to be enough       day after childbirth    only when               after the second birth  I can afford
    mother Mother Mother
    fuss. fuss. fuss.
    most of me


    As we can see, writing posts about childbirth was typical for August, but overall, for the entire period since mid-2015, the main themes of the trigrams were:

    • Expressions of love for a child, such as "I love my son", "I love my children", ...
    • Expressions of guilt: “I feel guilty”, “guilt before”, ...
    • Expressions of the feeling that every time, mom something-or-other ...

    “Author”, “nitakoy”, “patriarchal”, “mimocrocodile”


    Of particular interest is the use of certain words specific to the group and its discourse.

    Feminist discourse had a rather strong influence on the group because of the ideological nature of the administration. So the dynamics of feminist newspeak in posts is interesting. The most commonly used item is the artificial feminitive of “author”, applied to the moms who write.



    Interestingly, this word saw a certain decline in use at the beginning of 2017. Perhaps this is precisely because at that time the administration did not interfere much in the life of the group. It is the administration that uses this word most often, in its comments.

    The word "patriarchal" is used less often, but it does occur.



    In general, everything hints that the peak of interest in this ideology came in the middle of 2016, together with that very “flashmob” that was mentioned so often at the time.

    But there are other characteristic words taken from different contexts. For example, the word “mimocrocodile”. For those who are not familiar with it, this word denotes, for example, a commentator who wandered into the public with his very important and useful opinion. And in general, someone who was passing by and said something, when it would have been better just to pass by.



    The beginning and the peak of this word's use coincide with the peak of moms replying to posts in the group. The word clearly arose from dissatisfaction with the results of that commenting. Later, replies to posts became fewer and the word stopped being used so actively.

    And finally, the designation of the husband as "nitakoy."



    The most beautiful graph. It shows how the meme takes root in the group, how its use becomes widespread, and how the number of mentions begins to grow exponentially.
    In general, it is worth noting that feminist words are used much less often and take root worse than group-specific expressions.

    Group negative dynamics


    A question arises. How does the group influence its authors? How much do they change? Maybe the group breeds anger and intolerance in the writers, growing with the number of posts? Or, on the contrary, does the realization that many have similar problems calm them down?

    We decided to check it like this. We compiled lists of "bad" words. There were two lists; I will give a shortened version of the second one here:

    fucking, fucking, dick, fag, cock, fucking, fucking, fucking, fucking, fucking, shit, fucking, fucking, fucking, fucking, fucking, fucking, fucking, fucking, fucking, fucking

    Next we looked at how the average number of these bad words per post changes from month to month.



    In general, the amount of swearing declines over time, though not steadily. Perhaps this is the administration's policy. But maybe not, since the administration does not mind husbands, children and relatives being cursed out. Maybe the group just makes us a little kinder. Or everyone is simply tired.
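
    The monthly average of bad words per post can be sketched as follows. The matching by word prefix and the mild placeholder stems are assumptions for the example; the actual lists and matching rule are not given in the text.

```python
import re
from collections import defaultdict


def bad_words_per_post(posts, bad_stems):
    """posts: iterable of (month_key, text). Returns {month: mean bad words per post}."""
    totals = defaultdict(lambda: [0, 0])  # month -> [bad word count, post count]
    stems = tuple(bad_stems)
    for month, text in posts:
        tokens = re.findall(r"\w+", text.lower())
        # A token counts as "bad" if it starts with any of the stems.
        bad = sum(1 for token in tokens if token.startswith(stems))
        totals[month][0] += bad
        totals[month][1] += 1
    return {month: bad / n for month, (bad, n) in sorted(totals.items())}


# Hypothetical sample with mild placeholder stems.
sample = [
    ("2017-01", "Damn this damned day"),
    ("2017-01", "All fine"),
    ("2017-02", "All quiet"),
]
averages = bad_words_per_post(sample, ["damn", "crap"])
```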

    And how do readers rate all this? Does swearing make a post more attractive? We chose the last 6 months (02.2017-08.2017) as the most stable period in the group's history. For that period we calculated the average number of likes depending on the number of bad words in the post.



    On average, the correlation is not very convincing given the scatter of the estimates. So we can safely assume that swearing like a sailor will hardly get you more likes.

    The most "loyal" words


    A question remains. Which words lead to a post being rated positively? We seem to have shown that swearing does not help much. So the experiment should be set up as follows.

    We took the posts from the last 6 months. For each word occurring in a post, we recorded how many likes that post received. We went through all the posts this way, so that each word accumulated a sample of like counts. The sample was averaged, provided it was large enough.

    In this way we singled out the words that were present ONLY in posts that usually gained far more likes than average:

    go, discharge, give birth, man, says, must, man, years, child, cook, childhood, fuck, new, ours, money, your

    The average "number of likes" for these words ranges from 370 to 440, against an overall average of 290.
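
    The procedure described above can be sketched like this: every word collects the like counts of the posts it appears in, and the sample is averaged if it is large enough. The `min_posts` threshold and the sample data are assumptions; the text does not say what counted as "large enough".

```python
from collections import defaultdict
from statistics import mean


def mean_likes_per_word(posts, min_posts=2):
    """posts: iterable of (tokens, likes). Returns {word: mean likes of posts with it}."""
    samples = defaultdict(list)
    for tokens, likes in posts:
        for token in set(tokens):  # count each post once per word
            samples[token].append(likes)
    # Keep only words seen in at least `min_posts` posts.
    return {word: mean(values) for word, values in samples.items()
            if len(values) >= min_posts}


# Hypothetical tokenized posts with their like counts.
sample = [
    (["husband", "money"], 400),
    (["husband", "cough"], 300),
    (["cough", "fever"], 200),
]
per_word = mean_likes_per_word(sample)
```

    Sorting `per_word` by value in descending or ascending order then gives the candidates for the most and least successful words.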

    Least successful words


    If we can find the most successful words, we can also find the words that "guaranteed" an absence of likes, i.e. those whose average number of likes "per word" was far below the overall average.

    temperature, scary, frustrating, hysteria, survive, refuses, coughing, tantrums, face

    The average number of likes "for such words" ranges from 214 to 230, against an overall average of 290.

    Words Leading to the Least Standard Deviation in Grades


    Besides the words with the best and worst ratings, you can also find words for which the ratings of posts containing them were always very similar. Words that, in a sense, “guaranteed” that people's assessment of the post would not vary much. Words that most strongly determine the rating, whichever it is, negative or positive.

    her, there are, screaming, wild, breasts, only, few, suddenly, alone, her, mom, together, wanted

    The standard deviation for these words ranges from 73 to 88, against an average of 190.
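
    The same per-word samples of like counts can be ranked by spread instead of by mean. A minimal sketch, assuming the population standard deviation and an invented minimum sample size:

```python
from collections import defaultdict
from statistics import pstdev


def likes_spread_per_word(posts, min_posts=2):
    """posts: iterable of (tokens, likes). Returns (word, std dev) pairs, smallest first."""
    samples = defaultdict(list)
    for tokens, likes in posts:
        for token in set(tokens):
            samples[token].append(likes)
    spread = {word: pstdev(values) for word, values in samples.items()
              if len(values) >= min_posts}
    return sorted(spread.items(), key=lambda kv: kv[1])


# Hypothetical tokenized posts with their like counts.
sample = [
    (["mom", "alone"], 290),
    (["mom", "husband"], 310),
    (["husband", "alone"], 500),
]
stable = likes_spread_per_word(sample)
```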

    Ideal Post Concept


    It remains to figure out which plot can cause the greatest and the least resonance. With the perfectly underrated post everything is quite simple. Its plot can be traced quite clearly from the set of "underrated" words.

    Mine got sick. Temperature 39.8, cough. Refuses to eat, throws tantrums, throws things and is terribly angry. I break down and have a tantrum too. I walk around the house all the time with a displeased face. How to survive all this?

    Naturally, such a post, which will be super-underrated and contains all the “bad” words, can be fleshed out with lots of details and made more lifelike, but my job is simply to convey a scenario that arouses no compassion in others.

    The interesting aspect of this scenario is that it is underrated because there is no enemy figure. The child is sick and hysterical. Mom can't take it either. This is all logical and understandable, albeit unpleasant. There is no one here to fling poop at. In short ... there is no one to pity and nothing to sympathize with.

    With the set of good words things are a little more complicated. No ideal picture emerges, except that there should be a husband, childbirth, discharge from the hospital, money and years ... preferably wasted. But we can try.

    Immediately after discharge, on the same day, the man says that he will not do a fucking thing. Cleaning our apartment, giving birth and cooking is what women must do. Meanwhile, when it comes to earning money, he is not exactly at work either. A man, what can you say. I spent years of my life on this freak, and now I should give as many to his child? Bitches like that: “go fuck yourselves, all of you.”

    A clearly defined antagonist in the form of the husband may very well guarantee you quite a few likes. Obviously, almost anyone can play the role of the antagonist. For example, a doctor at the hospital or the grandparents.

    Summary / Conclusion


    The huge number of disparate measurements does not allow me (at least) to write a beautiful, juicy summary with a global conclusion about life.

    So here are a few tentative micro-conclusions as a list:

    • On average there are about 100 words per post; this does not depend on anything and does not change. But not necessarily
    • In winter, mothers write about their problems less actively
    • Activity peaks on Mondays
    • People like at about the same rate all the time, and the number of likes depends mainly on the size of the group
    • At first the admins intervened, then they got tired. But they are still wrong
    • The culture of “commenting on a post in another post” was happily born and died within these two years
    • In recent months a meme about giving birth ("rejuvenation by childbirth") has been rampant. Highly successful
    • The group is forming its own newspeak and uses it (mimocrocodile, nitakoy, asshole, ...)
    • Profanity in the group is no longer as popular as before, but there is a chance that a good round of swearing will be appreciated. But not necessarily
    • If you want likes, write that someone is very bad and is hurting you. If everything is bad but nobody is to blame, you get zilch, not likes

    That, in fact, is all ... There are methodological flaws here to suit every taste. There is no proper comparison, for example, of the public's specific vocabulary with an external (baseline) vocabulary. Some slightly deeper and more fun questions involving neural networks and the generation of posts were also left out. And again, my own code is not shown. That would have bloated the text even more, and most likely everyone can count words in Python and use nltk themselves (besides, I am not the best role model of a Pythonista to show off my code).

    If you have your own insights and interesting ideas from all this, I am always ready to listen.
