Measuring the dynamics of entity mentions in the information space

    Today we will show a dashboard that visualizes the dynamics of popular entities. We have set up a separate instance for Habr users, where you can track your own indicators by adding a regex.

    More on what's going on here

    We study the Internet. Among other things, we can crawl every registered domain in the world within a day and process the information we collect. The full product is fairly complex, so to popularize research on open data we launched a smaller instance. Every day it crawls the world's top 1 million sites according to Alexa, analyzes their content with 300+ regexes and displays the resulting indicators on a dashboard.

    To gauge interest, we published an article earlier, and the results of its poll could not have pleased us more:

    Despite its frankly clickbait headline, the article received a fairly good rating. More importantly:

    • 191 (52%): unequivocally said they wanted to run a study
    • 123 (34%): we've signed you up for our gang
    • 53 (14%): OK, but do drop by anytime

    That makes a target audience of 314 hub users (191 + 123). We couldn't leave you unattended, so we set about building dashboards for this party.

    We have published the dashboard at statoperator.com

    So that you have something to compare your own indicators against, we have made public the data on the existing entities, with their dynamics over the past couple of months.

    • every day at 19:00 Moscow time, the crawler goes through the list of the top 1,000,000 sites (this takes about an hour)
    • each successful web server response is parsed with all the entity regexes you now see in the legend, plus the ones you add yourself
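    The post does not describe the crawler's internals, so here is only a minimal sketch of the daily parsing step, assuming each successful response is already available as a string keyed by host. The class `DailyCrawlAggregator`, the method `aggregate` and the sample entities are hypothetical. Each regex is compiled once and applied to every response, and both indicator types from the form further down are accumulated per entity: total mentions and the number of hosts with at least one match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DailyCrawlAggregator {

    /** Per-entity totals for one crawl iteration (hypothetical structure). */
    public static final class Totals {
        public long mentions; // sum of matches over all responses
        public long hosts;    // number of hosts with at least one match

        @Override
        public String toString() {
            return "mentions=" + mentions + ", hosts=" + hosts;
        }
    }

    /**
     * Applies every entity regex to every successful response body.
     * responses: host -> response body; regexes: entity name -> Java regex.
     */
    public static Map<String, Totals> aggregate(Map<String, String> responses,
                                                Map<String, String> regexes) {
        // Compile each regex once instead of once per response.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        regexes.forEach((entity, regex) -> patterns.put(entity, Pattern.compile(regex)));

        Map<String, Totals> result = new LinkedHashMap<>();
        patterns.keySet().forEach(entity -> result.put(entity, new Totals()));

        for (String body : responses.values()) {
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                Matcher m = e.getValue().matcher(body);
                long count = 0;
                while (m.find()) {
                    count++;
                }
                Totals t = result.get(e.getKey());
                t.mentions += count;
                if (count > 0) {
                    t.hosts++; // this host counts once, however many matches it has
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> responses = new LinkedHashMap<>();
        responses.put("a.example", "<script src=\"/jquery.js\"></script><p>jQuery</p>");
        responses.put("b.example", "<link href=\"bootstrap.css\">");
        responses.put("c.example", "<script src=\"jquery.min.js\"></script>");

        Map<String, String> regexes = new LinkedHashMap<>();
        regexes.put("jquery", "(?i)jquery");
        regexes.put("bootstrap", "(?i)bootstrap");

        aggregate(responses, regexes).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

    With this sample data, `jquery` gets 3 mentions spread over 2 hosts and `bootstrap` gets 1 mention on 1 host, which shows why the two regex types can tell different stories about the same entity.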

    All indicators and settings you choose while working with the dashboard are stored in the URL.

    How do I add a regex?

    Fill out the form:

    Data source: header / html / text (the web server's response headers / the HTML code / the text extracted from the document)
    Regex type: mentions / hosts (the number of matches in the document / whether anything was found at all)
    JAVA regex: the regular expression itself.
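    To make the three fields concrete, here is a minimal sketch of how they could combine for a single document. It assumes the response is available as a headers string plus an HTML string, and it approximates "text extracted from the document" by stripping script/style blocks and tags with regexes. That is a crude stand-in, since the post does not say how text is extracted. The names `RegexRule`, `Source`, `Type` and `evaluate` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexRule {

    /** Where the regex is applied (the "Data source" field). */
    public enum Source { HEADER, HTML, TEXT }

    /** How the result is counted (the "Regex type" field). */
    public enum Type { MENTIONS, HOSTS }

    private final Source source;
    private final Type type;
    private final Pattern pattern;

    public RegexRule(Source source, Type type, String javaRegex) {
        this.source = source;
        this.type = type;
        this.pattern = Pattern.compile(javaRegex);
    }

    /** Naive text extraction (assumption): drops script/style blocks, then all tags. */
    static String extractText(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ");
        return noScripts.replaceAll("(?s)<[^>]*>", " ");
    }

    /** MENTIONS: number of matches; HOSTS: 1 if anything matched, else 0. */
    public long evaluate(String headers, String html) {
        String input;
        switch (source) {
            case HEADER: input = headers; break;
            case HTML:   input = html; break;
            default:     input = extractText(html); break;
        }
        Matcher m = pattern.matcher(input);
        if (type == Type.HOSTS) {
            return m.find() ? 1 : 0;
        }
        long count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String headers = "Server: nginx\r\nContent-Type: text/html\r\n";
        String html = "<html><script>var shop = 1;</script><p>Shop online at our shop</p></html>";

        // "shop" occurs 3 times in the HTML but only twice in the visible text.
        System.out.println(new RegexRule(Source.HTML, Type.MENTIONS, "(?i)shop").evaluate(headers, html));
        System.out.println(new RegexRule(Source.TEXT, Type.MENTIONS, "(?i)shop").evaluate(headers, html));
        System.out.println(new RegexRule(Source.HEADER, Type.HOSTS, "(?i)^server:\\s*nginx").evaluate(headers, html));
    }
}
```

    The example prints 3, 2 and 1: the same expression yields different values depending on the data source, so choosing between html and text matters as much as the regex itself.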

    You can conveniently test your expression here.

    All valid regexes will appear on the dashboard after the next crawl iteration.
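    The post does not say how "valid" regexes are selected. One obvious precondition is that the expression compiles under `java.util.regex`, which you can check locally before submitting. Below is a minimal sketch with hypothetical names (`RegexCheck`, `compileError`, `countMatches`): it reports compilation errors via `PatternSyntaxException` and counts matches against a sample snippet of your own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCheck {

    /** Returns null if the regex compiles, otherwise the compiler's error description. */
    public static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    /** Counts non-overlapping matches of the regex in a sample string. */
    public static int countMatches(String regex, String sample) {
        Matcher m = Pattern.compile(regex).matcher(sample);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The second candidate is deliberately broken (unclosed group).
        String[] candidates = { "(?i)wordpress", "wp-(content|includes", "\\bbitrix\\b" };
        String sample = "<link href=\"/wp-content/style.css\"> Powered by WordPress";
        for (String regex : candidates) {
            String error = compileError(regex);
            if (error != null) {
                System.out.println(regex + " -> invalid: " + error);
            } else {
                System.out.println(regex + " -> " + countMatches(regex, sample) + " match(es)");
            }
        }
    }
}
```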
