GitHub Visualizer: a service for visualizing the history of GitHub repositories

    As a fan of repository visualization software such as code_swarm and gource, one day I was visited by a muse who inspired me to create an online service for visualizing repository statistics from GitHub.
    Today I want to present my GitHub Visualizer project for your judgment (the project itself is on GitHub).
    Here is a screencast for a first look.

    And a small GIF:
    [GIF]

    What is used



    Description of the charts and their implementation


    The project has three main visualizations that present information about repositories: their history and their quantitative metrics.

    Repository List Visualization

    [Image: repository graph (repository list)]

    • Circles (vertices) are repositories.
    • The size of a vertex depends on the age of the repository: the older it is, the smaller the vertex.
    • Opacity depends on the date of the last change.
    • The color and grouping of the vertices depend on the main language of the repository.
      [Image: main language]
    • Histogram of languages:
      • shows summary information for each language;
      • displays the language color;
      • filters vertices on hover.
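
    The size and opacity rules above amount to linear scales over dates. Below is a minimal sketch in plain JavaScript. It assumes each repository object carries GitHub's created_at and pushed_at timestamps. The radius range [4, 20], the opacity range [0.2, 1], and the choice to draw recently changed repositories more opaque are all assumptions. linearScale is a hypothetical stand-in for d3.scale.linear so that the sketch runs without D3.

```javascript
// Plain-JS stand-in for d3.scale.linear(): maps [d0, d1] onto [r0, r1].
function linearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1], r0 = range[0], r1 = range[1];
  return function (value) {
    if (d1 === d0) return r1; // degenerate domain: everything maps to the upper end
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0);
  };
}

// Hypothetical helper: repos are GitHub repository objects with ISO date strings.
function styleRepos(repos) {
  var created = repos.map(function (r) { return Date.parse(r.created_at); });
  var pushed = repos.map(function (r) { return Date.parse(r.pushed_at); });
  // Older repositories (earlier created_at) get smaller radii.
  var radius = linearScale(
    [Math.min.apply(null, created), Math.max.apply(null, created)], [4, 20]);
  // Assumption: more recently pushed repositories are drawn more opaque.
  var opacity = linearScale(
    [Math.min.apply(null, pushed), Math.max.apply(null, pushed)], [0.2, 1]);
  return repos.map(function (r, i) {
    return {
      name: r.name,
      language: r.language, // drives color and grouping
      radius: radius(created[i]),
      opacity: opacity(pushed[i])
    };
  });
}
```

    With two repositories, the older one gets radius 4 and the newer one radius 20. The repository pushed least recently gets opacity 0.2, and the one pushed most recently gets opacity 1.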


    To build the graph, I used d3.layout.force together with the clustering method proposed in this example.
    Code from the example:
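    // Excerpt from the example (D3 v3 API): nodes, circle, width, height, radius
    // (a d3 scale) and padding are defined elsewhere in the example.
    var force = d3.layout.force()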
        .nodes(nodes)
        .size([width, height])
        .gravity(.02)
        .charge(0)
        .on("tick", tick)
        .start();
    function tick(e) {
      circle
          .each(cluster(10 * e.alpha * e.alpha))
          .each(collide(.5))
          .attr("cx", function(d) { return d.x; })
          .attr("cy", function(d) { return d.y; });
    }
    // Move d to be adjacent to the cluster node.
    function cluster(alpha) {
      var max = {};
      // Find the largest node for each cluster.
      nodes.forEach(function(d) {
        if (!(d.color in max) || (d.radius > max[d.color].radius)) {
          max[d.color] = d;
        }
      });
      return function(d) {
        var node = max[d.color],
            l,
            r,
            x,
            y,
            i = -1;
        if (node == d) return;
        x = d.x - node.x;
        y = d.y - node.y;
        l = Math.sqrt(x * x + y * y);
        r = d.radius + node.radius;
        if (l != r) {
          l = (l - r) / l * alpha;
          d.x -= x *= l;
          d.y -= y *= l;
          node.x += x;
          node.y += y;
        }
      };
    }
    // Resolves collisions between d and all other circles.
    function collide(alpha) {
      var quadtree = d3.geom.quadtree(nodes);
      return function(d) {
        var r = d.radius + radius.domain()[1] + padding,
            nx1 = d.x - r,
            nx2 = d.x + r,
            ny1 = d.y - r,
            ny2 = d.y + r;
        quadtree.visit(function(quad, x1, y1, x2, y2) {
          if (quad.point && (quad.point !== d)) {
            var x = d.x - quad.point.x,
                y = d.y - quad.point.y,
                l = Math.sqrt(x * x + y * y),
                r = d.radius + quad.point.radius + (d.color !== quad.point.color) * padding;
            if (l < r) {
              l = (l - r) / l * alpha;
              d.x -= x *= l;
              d.y -= y *= l;
              quad.point.x += x;
              quad.point.y += y;
            }
          }
          return x1 > nx2
              || x2 < nx1
              || y1 > ny2
              || y2 < ny1;
        });
      };
    }
    



    In fact, this example was the muse that visited me.
    I took the functions practically unchanged, with a few exceptions and additions.
    The code that visualizes the repository list is in two files: repo.js and langHg.js.
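
    To see what cluster and collide do without the D3 machinery, here is a minimal plain-JavaScript sketch of the same two steps. It is not the example's code. clusterStep keeps the same math as cluster. collideStep replaces d3.geom.quadtree with a brute-force O(n²) pair loop. The node shape {x, y, radius, color} and the PADDING value are assumptions.

```javascript
// Nodes are plain objects {x, y, radius, color}; PADDING is an assumed constant.
var PADDING = 1.5;

// Pull each node toward the largest node of its color cluster (same math as cluster()).
function clusterStep(nodes, alpha) {
  var max = {};
  nodes.forEach(function (d) {
    if (!(d.color in max) || d.radius > max[d.color].radius) max[d.color] = d;
  });
  nodes.forEach(function (d) {
    var node = max[d.color];
    if (node === d) return;
    var x = d.x - node.x, y = d.y - node.y;
    var l = Math.sqrt(x * x + y * y);
    var r = d.radius + node.radius;
    if (l !== r && l > 0) {     // l > 0 avoids dividing by zero for coincident nodes
      l = (l - r) / l * alpha;
      x *= l; y *= l;
      d.x -= x; d.y -= y;       // move the node toward the cluster center...
      node.x += x; node.y += y; // ...and the center toward the node
    }
  });
}

// Push overlapping pairs apart; nodes of different colors keep extra padding.
function collideStep(nodes, alpha) {
  for (var i = 0; i < nodes.length; i++) {
    for (var j = i + 1; j < nodes.length; j++) {
      var a = nodes[i], b = nodes[j];
      var x = a.x - b.x, y = a.y - b.y;
      var l = Math.sqrt(x * x + y * y);
      var r = a.radius + b.radius + (a.color !== b.color ? PADDING : 0);
      if (l < r && l > 0) {
        l = (l - r) / l * alpha; // negative, so the nodes move apart
        x *= l; y *= l;
        a.x -= x; a.y -= y;
        b.x += x; b.y += y;
      }
    }
  }
}
```

    Repeated small clusterStep calls converge to the point where two same-colored circles touch. For radii 10 and 5, that is a center distance of 15.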


    Repository History Visualization

    After the list of a user's repositories has loaded, you can select the repository you are interested in, either on the graph or in the repository list in the stage-two panel (where you can also set how many recent commits to analyze).
    [Image: stage-two panel]
    Then analyze it by clicking the “Analyze” button. The analysis builds a chart of the repository history covering the number of recent commits you specified (100 by default; fewer if the repository has fewer commits).
    [Image: history chart]

    • The X axis shows commit dates.
    • Each red dot represents a commit.
    • Arcs pointing up and down show the number of lines added and deleted in the commit.
    • The areas in the background show the number of files changed:
      • added files
      • modified files
      • deleted files

    • Participant chart: shows each participant's activity by various metrics.
      [Image: participant chart]

    To draw the charts, I combined several tools from the d3.js library.
    The areas are computed by the d3.svg.area() component (see the Stacked Area example). I compute the stack myself; everything else is standard d3.js.
    The code that computes the stack:
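    // Three stacked layers per commit: deleted files below the axis (negative y),
    // modified files from zero up, and added files stacked on top of modified.
    var layers =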
        [
            {
                color: colors.deletedFile,
                values: sorted.map(function (d) {
                    return {t : 1, x: d.date, y0 : 0, y: (d.stats ? -d.stats.f.d : 0)}
                })
            },
            {
                color: colors.modifiedFile,
                values: sorted.map(function (d) {
                    return {x: d.date, y0 : 0, y: (d.stats ? d.stats.f.m : 0)}
                })
            },
            {
                color: colors.addedFile,
                values: sorted.map(function (d) {
                    return {x: d.date, y0: (d.stats ? d.stats.f.m : 0), y : (d.stats ? d.stats.f.a : 0)}
                })
            }
        ]
    ;
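    // Custom interpolator: joins consecutive points with cubic Bezier curves whose
    // control points sit at the horizontal midpoint, giving smooth Sankey-style steps.
    function interpolateSankey(points) {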
        var x0 = points[0][0], y0 = points[0][1], x1, y1, x2,
            path = [x0, ",", y0],
            i = 0,
            n = points.length;
        while (++i < n) {
            x1 = points[i][0];
            y1 = points[i][1];
            x2 = (x0 + x1) / 2;
            path.push("C", x2, ",", y0, " ", x2, ",", y1, " ", x1, ",", y1);
            x0 = x1;
            y0 = y1;
        }
        return path.join("");
    }
    var y1 = d3.scale.linear()
            .range([h6 * 4.5, h6 * 3, h6 * 1.5])
            .domain([-data.stats.files, 0, data.stats.files]),
        area = d3.svg.area()
            .interpolate(interpolateSankey /*"linear"  "basis"*/)
            .x(function(d) { return x(d.x); })
            .y0(function(d) { return y1(d.y0); })
            .y1(function(d) { return y1(d.y0 + d.y); })
        ;
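
    Each layer above gives, for every commit, a band from y0 to y0 + y. area.y0 and area.y1 then pass these values through the y1 scale. The sketch below shows the same stacking without D3. It assumes that a commit's stats.f holds the counts a (added), m (modified) and d (deleted), as in the code above. fileBands is a hypothetical name.

```javascript
// Returns [y0, y0 + y] for each file band of one commit, mirroring the layers above:
// deleted below zero, modified from zero up, added stacked on top of modified.
function fileBands(commit) {
  var f = commit.stats ? commit.stats.f : { a: 0, m: 0, d: 0 };
  return {
    deleted:  [0, -f.d],        // y0 = 0, y = -d (drawn below the axis)
    modified: [0, f.m],         // y0 = 0, y = m
    added:    [f.m, f.m + f.a]  // y0 = m, y = a
  };
}
```

    Take a commit with 2 added, 3 modified and 1 deleted file. Its bands are deleted [0, -1], modified [0, 3] and added [3, 5].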
    


    To build the arcs, I use d3.svg.arc() (many examples use this component, e.g. Arc Tween and Pie Multiples).
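
    d3.svg.arc builds the path for you. To show the geometry behind an arc pointing up or down, here is a plain-JavaScript sketch that writes the SVG path for a half-circle above or below a commit's dot. The function names are hypothetical. The square-root mapping from line counts to radius is also an assumption, not the project's code.

```javascript
// Hypothetical mapping: radius grows with the square root of the line count,
// so the half-disc's area (not its radius) is proportional to the number of lines.
function arcRadius(lines, k) {
  return Math.sqrt(lines) * k;
}

// SVG path for a half-circle of radius r centered on (cx, cy).
// up = true draws it above the point (sweep flag 1 in SVG's y-down space);
// up = false draws it below (sweep flag 0).
function halfArcPath(cx, cy, r, up) {
  return "M" + (cx - r) + "," + cy +
         "A" + r + "," + r + " 0 0," + (up ? 1 : 0) + " " + (cx + r) + "," + cy;
}
```

    For example, halfArcPath(50, 100, arcRadius(25, 2), true) draws a half-circle of radius 10 above the point (50, 100) for a commit that adds 25 lines.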
    I generate the X axis with two components, d3.time.scale() and d3.svg.axis; the implementation is taken from the Custom Time Format example.
    The participant chart is computed by d3.layout.pack() (see the Circle Packing example). To sort and resize the circles, I change its sort and value properties.
    The code for this visualization is in two files: stat.js and usercommit.js.
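
    The pack layout takes a hierarchy whose leaves carry a value. This plain-JavaScript sketch prepares that input from a list of commits, with a switchable metric and a largest-first sort, which is the role pack.value and pack.sort play. The commit fields (author, additions, deletions), the metric names and the helper names are assumptions for illustration, not the project's data model.

```javascript
// Aggregate per-author totals from a list of commits.
function authorStats(commits) {
  var byAuthor = {};
  commits.forEach(function (c) {
    var s = byAuthor[c.author] ||
      (byAuthor[c.author] = { name: c.author, commits: 0, additions: 0, deletions: 0 });
    s.commits += 1;
    s.additions += c.additions;
    s.deletions += c.deletions;
  });
  return Object.keys(byAuthor).map(function (k) { return byAuthor[k]; });
}

// Build pack-layout input for the chosen metric: each child gets a value,
// and children are sorted largest first.
function packInput(commits, metric) {
  var children = authorStats(commits).map(function (s) {
    return { name: s.name, value: s[metric] };
  });
  children.sort(function (a, b) { return b.value - a.value; });
  return { name: "participants", children: children };
}
```

    Switching the metric from "commits" to "deletions" changes both the circle sizes and their order, without touching the layout code.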


    Dynamic visualization

    It was for this visualization that the whole venture was started. I like what code_swarm renders, but it is inconvenient to clone a repository to your computer every time and then render it.
    In this visualization, I tried to implement all the ideas used in code_swarm and to let you change the settings on the fly.
    Visualization of song-of-github (launch link; article about song-of-github on Habrahabr)

    • Each particle is a file. Files move from developer to developer.
    • The particle size depends on how often the file changes: the more often it is changed, the larger it is.
    • The color of a particle depends on the file extension.
    • Over time a particle fades away; once all of a user's particles have disappeared, the user fades away too. (This can be adjusted with the User Life and File Life settings in the stage-three panel; a value of 0 means immortal.)
    • Each participant gathers around himself the files he has worked with.
    • If a file leaves a user's orbit and does not fly to anyone else, it has been deleted.
    • Every second is one day (I plan to add the ability to change the step).
    • The histogram shows the number of files in the current commit, grouped by extension.
    • The legend shows the number of files currently existing for each extension.
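
    The fading rules can be modeled as a countdown that resets whenever an entity takes part in a commit. This minimal sketch assumes that life is measured in simulated days and that 0 means immortal, as with the User Life and File Life settings. The entity shape and the function names are hypothetical, and the project's own blink() logic may differ.

```javascript
// Hypothetical lifecycle model for files and users.
function makeEntity(life) {
  return { life: life, alive: life, visible: true };
}

// Called when the entity appears in the current commit: restore full life.
function touch(entity) {
  entity.alive = entity.life;
  entity.visible = true;
}

// Called once per simulated day; returns whether the entity is still visible.
function age(entity) {
  if (entity.life === 0) return entity.visible; // 0 = immortal, never fades
  entity.alive = Math.max(0, entity.alive - 1);
  if (entity.alive === 0) entity.visible = false;
  return entity.visible;
}

// A user disappears once none of their files is visible any more.
function userAlive(files) {
  return files.some(function (f) { return f.visible; });
}
```

    Consider a file with a life of 2 days that is not touched again. It stays visible after the first day and disappears after the second, and if it was the user's only file, the user disappears too.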


    The physics is computed by the already familiar d3.layout.force, with one twist: there are two of them. One computes the positions of users, the other computes the positions of files depending on their users' positions. How does this work? Each file has an author property; if the file is in the current commit, the commit's author is written to it. The clustering method described above reads this property and computes the file's position in space.
    Clustering function
        // From show.js; setting, radius, nr, blink and authorHash are defined elsewhere in the file.
        function tick() {
            if (_force.nodes()) {
                _force.nodes()
                    .forEach(cluster(0.025));
                _forceAuthor.nodes(
                    _forceAuthor.nodes()
                        .filter(function(d) {
                            blink(d, !d.links && setting.userLife > 0);
                            if (d.visible && d.links === 0 && setting.userLife > 0) {
                                d.flash = 0;
                                d.alive = d.alive / 10;
                            }
                            return d.visible;
                        })
                );
            }
            _forceAuthor.resume();
            _force.resume();
        }
        // Move d to be adjacent to the cluster node.
        function cluster(alpha) {
            authorHash.forEach(function(k, d) {
                d.links = 0;
            });
            return function(d) {
                blink(d, setting.fileLife > 0);
                if (!d.author || !d.visible)
                    return;
                var node = d.author,
                    l,
                    r,
                    x,
                    y;
                if (node == d) return;
                node.links++;
                x = d.x - node.x;
                y = d.y - node.y;
                l = Math.sqrt(x * x + y * y);
                r = radius(nr(d)) / 2 + (nr(node) + setting.padding);
                if (l != r) {
                    l = (l - r) / (l || 1) * (alpha || 1);
                    x *= l;
                    y *= l;
                    d.x -= x;
                    d.y -= y;
                }
            };
        }
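
    The author hand-off described above can be sketched on its own. This is a minimal plain-JavaScript illustration, not the project's code. The commit shape ({author, files: [name]}), the Map-based registries, getNode and applyCommit are all assumptions. Each file in a commit gets the commit's author as its author, so the cluster step pulls it into that author's orbit on the next tick.

```javascript
// Registries of author nodes and file nodes, keyed by name.
var authors = new Map();
var files = new Map();

// Return the node for name, creating it (with extra fields) on first use.
function getNode(map, name, extra) {
  if (!map.has(name)) map.set(name, Object.assign({ name: name, x: 0, y: 0 }, extra));
  return map.get(name);
}

// Apply one commit: every file it touches now orbits the commit's author.
function applyCommit(commit) {
  var author = getNode(authors, commit.author, { links: 0 });
  commit.files.forEach(function (fileName) {
    var file = getNode(files, fileName, { author: null });
    file.author = author; // read by cluster() on the next tick
  });
  return author;
}
```

    If ann commits a.js and b.css and bob later commits a.js, then a.js flies over to bob while b.css stays with ann.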
    

    And here is where the force layouts are initialized:
    _force = (_force || d3.layout.force()
        .stop()
        .size([w, h])
        .friction(.75)
        .gravity(0)
        .charge(function(d) {return -1 * radius(nr(d)); } )
        .on("tick", tick))
        .nodes([])
        ;
    .....
    _forceAuthor = (_forceAuthor || d3.layout.force()
        .stop()
        .size([w, h])
        .gravity(setting.padding * .001)
        .charge(function(d) {
            return -(setting.padding + d.size) * 8;
        }))
        .nodes([])
        ;
    


    Two “threads” are at work (if I may call them that): one is driven by setInterval, the other by requestAnimationFrame. The first advances time, the second renders. In fact, force also has its own timers, and asyncForEach also launches setTimeouts (it keeps the system responsive and makes files from a single commit fly out one after another with a slight delay rather than all at once).
    The code can be found in the show.js file.
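
    The time stepping and the staggered release of files can be sketched in plain JavaScript. This is a hypothetical reimplementation under stated assumptions: the project's own asyncForEach is not shown here, and simulatedTime and DAY_MS are illustrative names. With realMsPerDay = 1000, one second of wall time advances the timeline by one day.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// Simulated clock: realMsPerDay of wall time advances the timeline by one day.
function simulatedTime(startTime, elapsedRealMs, realMsPerDay) {
  return startTime + (elapsedRealMs / realMsPerDay) * DAY_MS;
}

// Hypothetical asyncForEach: handles one item, then yields to the event loop
// via setTimeout before the next, so a large commit does not release all its
// files in a single frame. Resolves once every item has been processed.
function asyncForEach(items, fn, delayMs) {
  return new Promise(function (resolve) {
    var i = 0;
    (function step() {
      if (i >= items.length) return resolve();
      fn(items[i], i);
      i += 1;
      setTimeout(step, delayMs);
    })();
  });
}
```

    In the browser, a setInterval callback would advance simulatedTime and pass each new commit's files to asyncForEach. A separate requestAnimationFrame loop would redraw whatever the force layouts have computed so far.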

    Data retrieval


    I get the data from api.github.com.
    The data is requested via JSONP.
    According to the GitHub API documentation, unauthenticated requests are limited to 60 per hour per IP address; to raise the limit, a request needs a client_id and client_secret. That's why I registered an application in my GitHub profile settings, and the required authorization parameters are added to every request.
    Why am I telling you all this? With this authorization method the limit is 5000 requests per hour, and some repositories, such as mc, have a long history. If you explore such a history thoroughly, the limit runs out quickly, and the system will tell you so. If this happens, you can enter the client_id and client_secret of your own application in the System settings menu on the right (after creating the application, if you don't have one yet).
    GitHub has a very good API: a single request, say for user information (api.github.com/users/{user}), is enough, because all other links are in the response. Moreover, if a request is paginated (suppose we request a list of repositories and the response contains only 10 of them), the meta field of the response object contains a link to the next page with the full set of authorization parameters.
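
    Here is a minimal sketch of the two pieces described here. The first builds a request URL with the application credentials and a JSONP callback. The second reads the next-page link from the response. It assumes the JSONP envelope GitHub documented for its v3 API, where meta.Link is an array of [url, {rel: ...}] pairs. The function names and placeholder credentials are hypothetical. The script-tag injection that actually performs the JSONP call is omitted.

```javascript
// Build an api.github.com URL carrying the app credentials and a JSONP callback.
function apiUrl(path, params) {
  var query = Object.keys(params).map(function (k) {
    return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
  }).join("&");
  return "https://api.github.com" + path + (query ? "?" + query : "");
}

// Extract the "next" page URL from a JSONP response's meta.Link, or null if none.
function nextPageUrl(response) {
  var links = (response.meta && response.meta.Link) || [];
  for (var i = 0; i < links.length; i++) {
    if (links[i][1] && links[i][1].rel === "next") return links[i][0];
  }
  return null;
}
```

    Following nextPageUrl until it returns null walks a paginated listing page by page.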

    Overall, I am grateful to the API developers and to those who wrote its documentation; it is a pleasure to work with.
    I also thank the D3.js developers for the rich collection of examples (without which I probably would not have been inspired to do this) and for the thorough, well-explained documentation.

    Conclusion


    When I started the project, it was a toy for myself, and in fact it has remained one. If you fork my repository and find a bunch of bugs or bolt on something new, please open a pull request or write in Issues.
    During development, the application was tested only in Google Chrome Dev (of course, I fixed the glitches I noticed in other browsers). If you know how to make it work correctly in your favorite browser, I will be infinitely grateful.
    I look forward to constructive criticism.
    Thank you for your attention!


    PS
    Some interesting repositories:


    Poll: What is your main browser?

    • Google Chrome: 60.6% (336)
    • Mozilla Firefox: 21.4% (119)
    • Safari: 2.5% (14)
    • Opera: 12.2% (68)
    • Internet Explorer: 1.4% (8)
    • Other: 1.6% (9)

    Poll: Does everything work correctly?

    • Yes: 51.6% (175)
    • No (say in the comments what exactly, and in which browser): 7.6% (26)
    • Haven't tried yet: 40.7% (138)
