Sending statistics from a DataFrame to BigQuery using Yandex.Direct statistics as an example

  • Tutorial
In this article, I’ll talk about how I reduced statistics across the entire context in one place using BigQuery and Data Studio

. I had a need to visualize data for all traffic sources.

For this task, I use Data Studio. Google Ads is being pulled there in an excellent way, but with Yandex Direct everything is more complicated.

I chose the semi-manual method of statistics statistics for myself: receiving and sending statistics from my computer.

I get statics daily in various sections in a DataFrame.



And this is great because pandas has a pandas_gbq library that can send DataFrames to BigQuery.

What statistics to store in BigQuery


To begin with, let's decide which sections of the statistics we will send.
Since we can store a lot of information in BigQuery, I threw out statistics for myself in terms of days.

We also need to take into account the fact that we need to get statistics for at least a month for rewriting, as advertising systems update old statistics due to a click.

Therefore, for the time being, I do not use statistics in terms of keys for myself, since it is often redundant.
I will tell everything on the example of daily statistics in the context of the Republic of Kazakhstan. (columns: date and PK)

Next, we will talk about how to modify the DataFrame so that it goes.

Modification DataFrame



Problem 1


For some RCs, we will get empty fields in behavioral factors and conversions.



To get around this problem, you need to replace the cons by "0".

f['Conversions'].replace(['--'],[0],inplace=True)
f['CostPerConversion'].replace(['--'],[0],inplace=True)
f['ConversionRate'].replace(['--'],[0],inplace=True)
f['AvgPageviews'].replace(['--'],[0],inplace=True)
f['BounceRate'].replace(['--'],[0],inplace=True)

And assign a numeric data type.

f[["Conversions","CostPerConversion","ConversionRate","AvgPageviews","BounceRate"]]=f[["Conversions","CostPerConversion","ConversionRate","AvgPageviews","BounceRate"]].apply(pd.to_numeric)

Problem 2


Add a separately indexed column. If we send such a DataFrame or look at the data type with dtipe, we see that there is no column with dates.

To fix this, create a column that will equal the indexed column.

f['Date']=f.index


Get the next DataFrame



Sending a DataFrame to BigQuery


pandas_gbq.to_gbq(f,'мой датасет.файл',project_id='мой проект',if_exists='replace')

We indicate the name of the DataFrame, the name of the dataset, the name of the file (it may not exist yet), the name of the project and how the file will be recorded (in our case, the file will be completely overwritten).

We get the following in BigQuery: All that



remains is to visualize the data in DataStudio. End Code:

f['Date']=f.index
f['Conversions'].replace(['--'],[0],inplace=True)
f['CostPerConversion'].replace(['--'],[0],inplace=True)
f['ConversionRate'].replace(['--'],[0],inplace=True)
f['AvgPageviews'].replace(['--'],[0],inplace=True)
f['BounceRate'].replace(['--'],[0],inplace=True)
f[["Conversions", "CostPerConversion","ConversionRate","AvgPageviews","BounceRate"]] = f[["Conversions", "CostPerConversion","ConversionRate","AvgPageviews","BounceRate"]].apply(pd.to_numeric)
pandas_gbq.to_gbq(f, 'YD_Days.my_client_rk', project_id='my_project',if_exists='replace')

Thanks for attention!

Also popular now: