i_shutov December 22, 2017 at 11:33

"Very skilled hands": making Tableau / Qlik out of R and "blue electrical tape"

This post continues my previous publications.

The title is tongue-in-cheek, of course, but, as they say, there is some truth in every joke. The topic came up when, for the hundredth time, I heard the persistent request for a “flexible report / chart designer”. At some point it becomes easier to just build one than to explain yet again that tidyverse covers all the necessary needs.

The problem statement itself is extremely simple: provide a graphical interface for drawing a variety of charts from arbitrary tabular data. The classic solution consists of two coupled parts:

an interface with a great many menus and buttons, plus multiple behind-the-scenes IFs that keep the states of these elements consistent with each other;
a “flexible plotter” with a large number of nested IFs that renders the chart according to the supplied data and the positions of the sliders set in the UI.

On the one hand, building “Yet Another Tableau” is completely uninteresting. On the other hand, a formulation like “everything should work, but nothing needs to be done” is a classic TRIZ task.

After some deliberation, I arrived at a solution that almost satisfies the latter formulation. The Shiny application itself is still under NDA; a prototype that can be shared freely is shown in the picture.

Two key ideas simplify the task (nothing new here, it has all been invented before us):

instead of a statically defined UI, we switch to a dynamically generated UI;
we use the R interpreter not only to run the source code, but also from inside the code itself.

Idea 1. Dynamic web-based interface

An approach where all controls are defined statically and only their parameters change (label, state, choice lists, selected items, ...) is convenient at the design stage. Everything is clear and obvious, and you can touch it with your own hands. But if the allowed states of these elements depend heavily both on the input data for analysis (the data.frame) and on each other's states, we end up with a very large number of non-trivial event handlers for each element. Lots and lots of tangled code.

Let's do it differently. Instead of UI elements with complex behavior, we scatter uiOutput placeholders across the page, and for each of them we dynamically compute and generate the element's representation with shiny::renderUI. All external parameters needed to generate an element are treated as reactive values. Moreover, every such interactive element acts as an "autonomous agent" that looks at its environment and adapts to it. When the user changes the state of one element, all dependent elements recalculate their state in turn (we do not handle events explicitly; we rely on Shiny's reactive model). As their states change, new induced changes may follow, and so on until everything stabilizes.
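A minimal sketch of the placeholder pattern, assuming the built-in mtcars data set stands in for the analyst's data.frame. The input IDs x_axis_value and y_axis_value mirror those used in the handler in this post; numeric_columns_except() is a hypothetical helper. The Y-axis selector does not exist in the static UI at all: renderUI regenerates it whenever the X-axis choice changes, so its choices always match the current state without any explicit event handler.

```r
library(shiny)

# Hypothetical helper: numeric columns of df, minus the ones already in use
numeric_columns_except <- function(df, exclude) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  setdiff(num_cols, exclude)
}

ui <- fluidPage(
  selectInput("x_axis_value", "X axis", choices = names(mtcars)),
  # Placeholder only: the real control is generated on the server side
  uiOutput("y_axis_ui")
)

server <- function(input, output, session) {
  # Re-runs automatically whenever input$x_axis_value changes
  output$y_axis_ui <- renderUI({
    x_col <- req(input$x_axis_value)
    choices <- numeric_columns_except(mtcars, x_col)
    # Keep the previous selection if it is still valid; isolate() avoids
    # making this output depend on the control it generates
    selected <- isolate(input$y_axis_value)
    if (is.null(selected) || !selected %in% choices) selected <- choices[1]
    selectInput("y_axis_value", "Y axis", choices = choices, selected = selected)
  })
}

if (interactive()) shinyApp(ui, server)
```

Any further control that reads input$y_axis_value would, in turn, be re-rendered after this one, which is the "chain of autonomous agents" described above.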

As a result, only one event handler remains in the code (for the “Go” button):

  # uses glue::glue(), magrittr's %>% and styler::style_text()
  observeEvent(input$gen_plot, { # the code illustrates the principle
    escname <- function(x){
      # column names must be quoted (backticked)
      # .....
    }
    point_code <- ""
    if(input$shape_type!="__NO_MAPPING__") {
      aes <- c("shape"=escname(input$aes_shape_col), "color"=escname(input$aes_color_col))
      point_code <- buildPointCode(fixed=c("shape"=input$shape_type, "color"=glue("'{input$plot_color}'")), aes=aes)
    }
    line_code <- ""
    if(input$line_type!="__NO_MAPPING__") {
      aes <- c("linetype"=escname(input$aes_linetype_col), "color"=escname(input$aes_color_col))
      line_code <- buildLineCode(fixed=c("linetype"=input$line_type, "color"=glue("'{input$plot_color}'")), aes=aes)
    }
    gcode <- glue("ggplot(data_df(), aes(x=`{input$x_axis_value}`, y=`{input$y_axis_value}`))\\
                  {point_code} {line_code} + xlab('{input$x_axis_label}')") %>%
      style_text(scope="spaces")
    plot_Rcode(gcode)
  })
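The helpers buildPointCode() and buildLineCode() are not shown in the post. Purely as an assumption about how such a helper might work, here is a hypothetical build_layer_code() that turns named vectors of fixed parameters and aesthetic mappings into a "+ geom_...()" fragment, plus a hypothetical backtick() playing the role of escname(). It only manipulates strings, so it needs no packages.

```r
# Hypothetical stand-in for escname(): wrap a column name in backticks
backtick <- function(x) paste0("`", x, "`")

# Hypothetical stand-in for buildPointCode()/buildLineCode():
# builds a "+ geom_xxx(aes(...), param=value, ...)" string
build_layer_code <- function(geom, fixed = character(), aes = character()) {
  # Drop mappings that are missing or empty
  aes <- aes[!is.na(aes) & nzchar(aes)]
  aes_part <- if (length(aes) > 0) {
    sprintf("aes(%s)", paste(names(aes), aes, sep = "=", collapse = ", "))
  } else {
    ""
  }
  fixed_part <- if (length(fixed) > 0) {
    paste(names(fixed), fixed, sep = "=", collapse = ", ")
  } else {
    ""
  }
  # Keep only the non-empty argument groups
  args <- c(aes_part, fixed_part)
  args <- args[nzchar(args)]
  sprintf("+ %s(%s)", geom, paste(args, collapse = ", "))
}

point_code <- build_layer_code(
  "geom_point",
  fixed = c(shape = "16", size = "3"),
  aes = c(color = backtick("cyl"))
)
print(point_code)
```

Such a fragment can be glued after the ggplot(...) call exactly the way point_code and line_code are in the handler. The whole "plotter" then reduces to string concatenation, and ggplot2 itself does the rendering.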

Dependencies between elements can be very complex and multi-stage, but equilibrium is reached quickly, and all of this is invisible to the user. For the analyst-developer it is likewise hidden under the hood. To make life easier, we show the end user pictures of point shapes and line types instead of the numeric codes that ggplot uses.

Idea 2. Reusing the R interpreter

Many people like to point out that R is "slow because it is an interpreter." Setting aside how unfounded that claim is (nobody wants to go into the details), let's turn this "weakness" into a strength.

Instead of writing a complex “flexible plotter” that would take parameterized source data (welcome to Non-Standard Evaluation!) and account for every nuance of the UI state while drawing, we write a generator of R code in the tidyverse dialect (as strings). When that code is later executed programmatically (eval), it produces the required chart:

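  output$staticPlot <- renderPlot({
    # plot_Rcode holds the generated code (it is set via plot_Rcode(gcode)),
    # so it must be called to read its current value
    base::eval(parse(text=req(plot_Rcode())))
  })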
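To show the round trip "UI state, then code string, then result" without pulling in ggplot2 or Shiny, the sketch below swaps the plot for base R's aggregate(); make_summary_code() and its arguments are hypothetical. Backticks keep arbitrary column names syntactically valid, and a dedicated environment makes explicit where the generated code finds its data. Because the string is assembled from user-controlled values, the column names are checked against the data before anything is evaluated.

```r
# Hypothetical generator: turns "UI state" into R source code as a string
make_summary_code <- function(value_col, group_col, fun = "mean") {
  sprintf("aggregate(`%s` ~ `%s`, data = df, FUN = %s)", value_col, group_col, fun)
}

data <- mtcars
value_col <- "mpg"
group_col <- "cyl"
# Only accept column names that really exist in the data
stopifnot(all(c(value_col, group_col) %in% names(data)))

code <- make_summary_code(value_col, group_col)
print(code)

# Evaluate the generated code in an environment where `df` is the data
eval_env <- list2env(list(df = data))
result <- eval(parse(text = code), envir = eval_env)
print(result)
```

The generated string is also a ready-made, human-readable artifact: it can be shown to the analyst or copied into a script, which a hand-written "flexible plotter" cannot offer.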

Fin

The Shiny prototype “à la Tableau” fits into 250 lines of code, including the UI, comments and extensive validation (assertions), and costs 0 rubles 0 kopecks in licenses.

Happy New Year 2018!

Previous publication: “R and Information Security. How to eliminate a conflict of interest and run R on Linux offline”.

