Using the computing power of R to test the hypothesis of equality of means
Recently, a need arose to solve a seemingly classical problem of mathematical statistics.
A test of the effect of a certain push on a group of people is conducted, and the effect has to be evaluated. Of course, you can do this using a probabilistic approach.
But talking to business about null hypotheses and p-values is completely useless and counterproductive.
How, as of February 2019, can this be done as simply and quickly as possible with an average laptop at hand? An abstract-style note, no formulas.
This is a continuation of previous publications.
Formulation of the problem
There are two statistically identical groups of users being measured (A and B). Group B is subjected to an effect. Does this effect lead to a change in the mean value of the measured indicator?
The most popular option is to compute statistical criteria and draw a conclusion. I like the example "Classic Statistical Methods: Chi-square Test". In this case it does not matter how it is done: with specialized programs, Excel, R, or anything else.
However, the reliability of the findings can be very doubtful for the following reasons:
- In fact, few people understand mathematical statistics from beginning to end. You should always keep in mind the conditions under which a given method may be applied.
- As a rule, tools are applied and results are interpreted on the principle of a single calculation and a "traffic light" decision. The fewer questions, the better for everyone involved in the process.
Criticism of p-value
There is plenty of material on this; here are links to the most striking pieces I found:
- Nature. "Scientific method: Statistical errors. P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume." Regina Nuzzo. Nature 506, 150-152.
- Nature Methods. "The fickle P value generates irreproducible results." Lewis G Halsey, Douglas Curran-Everett, Sarah L Vowler & Gordon B Drummond. Nature Methods 12, 179-185 (2015).
- Elsevier. "A Dirty Dozen: Twelve P-Value Misconceptions." Steven Goodman. Seminars in Hematology 45(3), July 2008, 135-140.
What can be done?
Now everyone has a computer at hand, so the Monte Carlo method saves the day. Instead of computing p-values, we move to computing confidence intervals for the difference of means.
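The core idea fits in a few lines of base R. Below is a minimal hand-rolled sketch on synthetic normal data, just to show the mechanics before we pull in a dedicated package; the names a and b, the seed, and the 5000-resample count are illustrative choices, not anything prescribed by a library:

# Percentile bootstrap CI for the difference of two group means
# (illustrative sketch on synthetic data)
set.seed(42)
a <- rnorm(10^3, mean = 500, sd = 150)  # control group
b <- rnorm(10^3, mean = 525, sd = 150)  # test group
boot_diff <- replicate(
  5000,
  mean(sample(b, replace = TRUE)) - mean(sample(a, replace = TRUE))
)
quantile(boot_diff, probs = c(0.025, 0.975))  # 95% CI for mean(b) - mean(a)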
There are many books and materials on this, but the idea in a nutshell (resampling & fitting) is presented very compactly in Jake Vanderplas's talk "Statistics for Hackers" at PyCon 2016. The slides are available separately.
One of the early works on this topic, including proposals for graphical visualization: "Confidence intervals rather than P values: estimation rather than hypothesis testing." MJ Gardner and DG Altman. Br Med J (Clin Res Ed). 1986 Mar 15; 292(6522): 746-750.
How to use R for this task?
In order not to do everything by hand at a low level, let's look at the current state of the ecosystem. Not long ago a very convenient package was ported to R: dabestr, Data Analysis using Bootstrap-Coupled Estimation.
The principles behind the calculations and the interpretation of the results used in dabestr are described here: ESTIMATION STATISTICS BETA: ANALYZE YOUR DATA WITH EFFECT SIZES.
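The package is on CRAN, so installation is a one-liner. Note that everything below was run with v0.2.0; later releases may change the API, so pinning the version is a reasonable precaution (the remotes call is one way to do that):

# dabestr is on CRAN; the analysis below was run with v0.2.0
install.packages("dabestr")
# If a newer release behaves differently, one option is to pin the version:
# remotes::install_version("dabestr", version = "0.2.0")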
---
title: "A/B testing using bootstrap"
output:
  html_notebook:
    self_contained: TRUE
editor_options:
  chunk_output_type: inline
---
library(tidyverse)  # data manipulation, tibble, gather
library(magrittr)   # the %T>% tee operator used below
library(tictoc)     # simple timing
library(glue)       # string interpolation
library(dabestr)    # bootstrap-coupled estimation
Simulation
We generate a lognormal distribution of operation durations.
my_rlnorm <- function(n, mean, sd){
  # Convert the desired arithmetic mean and sd into the lognormal
  # location/shape parameters:
  # https://en.wikipedia.org/wiki/Log-normal_distribution#Arithmetic_moments
  location <- log(mean^2 / sqrt(sd^2 + mean^2))
  shape <- sqrt(log(1 + (sd^2 / mean^2)))
  print(paste("location:", location))
  print(paste("shape:", shape))
  rlnorm(n, location, shape)
}
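A quick way to convince yourself that the moment conversion is right is to draw a large sample and compare the empirical moments with the requested ones (a throwaway check, not part of the analysis):

# Sanity check: with a large n the empirical moments should be close
# to the requested mean = 500 and sd = 150
x <- my_rlnorm(n = 10^5, mean = 500, sd = 150)
c(mean = mean(x), sd = sd(x))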
# N users in category A (Control)
A_control <- my_rlnorm(n = 10^3, mean = 500, sd = 150) %T>%
  {print(glue("mean = {mean(.)}; sd = {sd(.)}"))}
# N users in category B (Test)
B_test <- my_rlnorm(n = 10^3, mean = 525, sd = 150) %T>%
  {print(glue("mean = {mean(.)}; sd = {sd(.)}"))}
We put the data into the long format required by dabestr and run the analysis.
df <- tibble(Control = A_control, Test = B_test) %>%
  gather(key = "group", value = "value")
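In newer tidyr (>= 1.0), gather() is superseded by pivot_longer(); a hedged equivalent for readers on recent versions of the tidyverse:

# Equivalent reshaping for tidyr >= 1.0, where pivot_longer() supersedes gather()
df <- tibble(Control = A_control, Test = B_test) %>%
  pivot_longer(cols = everything(), names_to = "group", values_to = "value")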
tic("bootstrapping")
two_group_unpaired <- df %>%
  dabest(group, value,
         # The idx below passes "Control" as the control group,
         # and "Test" as the test group. The mean difference
         # will be computed as mean(Test) - mean(Control).
         idx = c("Control", "Test"),
         paired = FALSE,
         reps = 5000
  )
toc()
Let's take a look at the results:
two_group_unpaired
plot(two_group_unpaired)
The result as a confidence interval:

DABEST (Data Analysis with Bootstrap Estimation) v0.2.0
=======================================================

Unpaired mean difference of Test (n=1000) minus Control (n=1000)
223 [95CI  209; 236]

5000 bootstrap resamples.
All confidence intervals are bias-corrected and accelerated.
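If you need the numbers programmatically (e.g., to feed a report), they can be pulled from the returned object. The field names below are an assumption based on the v0.2.0 object structure; verify them with str(two_group_unpaired) before relying on this:

# Extract the point estimate and BCa interval from the dabest object
# (field names assumed from dabestr v0.2.0; check with str())
res <- two_group_unpaired$result
glue("mean difference = {round(res$difference, 1)} ",
     "[{res$ci}%CI {round(res$bca_ci_low, 1)}; {round(res$bca_ci_high, 1)}]")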
The chart produced by plot() is understandable and convenient for talking with business. All the calculations took about as long as a cup of coffee.
Previous publication: "Data Science 'special forces' in-house".