Automating Google BigQuery Data Import with Google Cloud Functions

    We are constantly working with Google BigQuery: importing data about users, their orders, and advertising costs from different sources so that we can combine it all with each other. What does this give us? For example, if you have an online store and a customer places an order by phone and then logs into the site, Google BigQuery lets you retroactively link all of their actions. You can track the customer's entire journey through the marketing funnel, from the first visit to the site to a purchase in a brick-and-mortar store, and evaluate advertising campaigns with such offline sales taken into account.

    This collection contains six Python scripts for automating data import into Google BigQuery using Google Cloud Functions: FTP, FTPS, HTTP(S), Intercom, MySQL, and SFTP. The principle of operation is the same for all of them: an HTTP POST request calls the Cloud Function, which receives data from the source and loads it into a Google BigQuery table. If the table already exists in the selected dataset, it is overwritten.
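
    The overwrite behavior corresponds to BigQuery's WRITE_TRUNCATE write disposition. Below is a minimal, hypothetical sketch of the load step such a function might perform internally, assuming the google-cloud-bigquery client library; the actual modules may differ in the details.

    from google.cloud import bigquery

    def load_to_bq(data_file, project_id, dataset_id, table_id,
                   source_format="NEWLINE_DELIMITED_JSON", delimiter=","):
        """Load a file-like object into BigQuery, overwriting the target table."""
        client = bigquery.Client(project=project_id)
        job_config = bigquery.LoadJobConfig(
            source_format=source_format,         # "CSV" or "NEWLINE_DELIMITED_JSON"
            autodetect=True,                     # infer the table schema from the file
            write_disposition="WRITE_TRUNCATE",  # overwrite the table if it already exists
        )
        if source_format == "CSV":
            job_config.field_delimiter = delimiter
        destination = "{}.{}.{}".format(project_id, dataset_id, table_id)
        job = client.load_table_from_file(data_file, destination, job_config=job_config)
        job.result()  # wait for the load job to finish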

    Primary requirements


    1. Project in Google Cloud Platform with activated billing.
    2. Edit access (the BigQuery Data Editor role) and permission to run jobs (the BigQuery Job User role) for the Cloud Function's service account in the BigQuery project into which the table will be loaded;
    3. HTTP client to perform POST requests that call the Cloud function.

    Setup steps


    • Go to the Google Cloud Platform Console and log in with your Google account, or sign up if you don’t have an account yet.
    • Go to the project with activated billing or create a new billing account for the project.
    • Go to the Cloud Functions section and click "Create Function". Please note that there is a fee for using Cloud Functions.
    • Fill in the following fields:

    Name: for example, ftp-bq-integration or any other suitable name;

    Allocated memory: 2 GB or less, depending on the size of the file being processed;

    Trigger: HTTP;

    Source code: Built-in editor;

    Runtime: Python 3.X.

    • Copy the contents of the main.py file into the built-in editor, the main.py tab.
    • Copy the contents of the requirements.txt file into the built-in editor, the requirements.txt tab.
    • As the function to execute, specify ftp, ftps, https, and so on, depending on the module you are using.
    • In the advanced settings, increase the timeout from 60 seconds to up to 540 seconds (or less, depending on the size of the file being processed).
    • Complete the creation of the Cloud function by clicking on the "Create" button.
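
    Once created, the function gets an HTTP trigger URL of the form https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME; this is the URL used in the usage examples below.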

    FTP / FTPS / SFTP


    This module is designed to transfer files from FTP (FTPS, SFTP) servers to Google BigQuery using the Google Cloud function. The solution allows you to automatically download data to Google BigQuery from a file that is regularly updated on an FTP server.

    The file to be fetched from the corresponding server can have any suitable extension (.json, .txt, .csv), but its contents must be in one of the following formats: newline-delimited JSON or comma-separated values (CSV).
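
    For illustration, the same two records might look like this in each of the accepted formats (the field names here are hypothetical):

    # orders.json: newline-delimited JSON, one JSON object per line
    {"order_id": 1, "total": 10.5}
    {"order_id": 2, "total": 7.0}

    # orders.csv: CSV with "," as the delimiter
    order_id,total
    1,10.5
    2,7.0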

    Usage example


    import json
    from httplib2 import Http

    # HTTP trigger URL of the deployed Cloud Function
    trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/ftp/"
    headers = {"Content-Type": "application/json"}
    payload = {
        # FTP source connection settings
        "ftp": {
            "user": "ftp.user_name",
            "psswd": "ftp.password",
            "path_to_file": "ftp://server_host/path/to/file/"
        },
        # destination table in BigQuery
        "bq": {
            "project_id": "my_bq_project",
            "dataset_id": "my_bq_dataset",
            "table_id": "my_bq_table",
            "delimiter": ",",
            "source_format": "NEWLINE_DELIMITED_JSON",
            "location": "US"
        }
    }
    # send the payload as a JSON body to match the Content-Type header
    Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)
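
    Any HTTP client that can send a JSON body will work equally well. For instance, the same call with the requests library (assuming it is installed) might look roughly like this:

    import requests

    response = requests.post(trigger_url, json=payload, timeout=540)
    response.raise_for_status()  # raise an error if the function reported a failure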

    HTTP(S)


    Module for transferring files from HTTPS servers to Google BigQuery.

    Usage example


    import json
    from httplib2 import Http

    # HTTP trigger URL of the deployed Cloud Function
    trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/https/"
    headers = {"Content-Type": "application/json"}
    payload = {
        # HTTP(S) source settings
        "https": {
            "path_to_file": "https://server_host/path/to/file/",
            "user": "https.user_name",
            "psswd": "https.password"
        },
        # destination table in BigQuery
        "bq": {
            "project_id": "my_bq_project",
            "dataset_id": "my_bq_dataset",
            "table_id": "my_bq_table",
            "delimiter": ",",
            "source_format": "CSV",
            "location": "US"
        }
    }
    # send the payload as a JSON body to match the Content-Type header
    Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)

    Intercom


    Module for automating data transfer from Intercom to Google BigQuery using a Google Cloud Function. Currently, the module can import the following entities from Intercom: users, companies, contacts, admins, conversations, teams, tags, and segments. Custom attributes are not supported.

    Usage example


    import json
    from httplib2 import Http

    # HTTP trigger URL of the deployed Cloud Function
    trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/intercom/"
    headers = {"Content-Type": "application/json"}
    payload = {
        # Intercom access token and the entities to import
        "intercom": {
            "accessToken": "INTERCOM ACCESS TOKEN",
            "entities": [
                "users",
                "companies",
                "contacts",
                "admins",
                "conversations",
                "teams",
                "tags",
                "segments"
            ]
        },
        # destination dataset in BigQuery
        "bq": {
            "project_id": "YOUR GCP PROJECT",
            "dataset_id": "YOUR DATASET NAME",
            "location": "US"
        }
    }
    # send the payload as a JSON body to match the Content-Type header
    Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)

    MySQL


    This module is used to transfer data from MySQL servers to Google BigQuery using a Google Cloud Function. The solution lets you automatically load data into Google BigQuery from tables that are regularly updated on a MySQL server.

    Usage example


    import json
    from httplib2 import Http

    # HTTP trigger URL of the deployed Cloud Function
    trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/mysql/"
    headers = {"Content-Type": "application/json"}
    payload = {
        # MySQL source connection settings and the query to run
        "mysql": {
            "user": "mysql.user",
            "psswd": "mysql.password",
            "host": "host_name",
            "port": 3306,
            "database": "database_name",
            "table_id": "table_name",
            "query": "SELECT * FROM table_name"
        },
        # destination table in BigQuery
        "bq": {
            "project_id": "my_bq_project",
            "dataset_id": "my_bq_dataset",
            "table_id": "my_bq_table"
        }
    }
    # send the payload as a JSON body to match the Content-Type header
    Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)

    More detailed documentation for each module can be found in the README files in the corresponding sections.

    This is just the beginning: we are now working on scripts for Bitrix and amoCRM, because we see that these are the most popular among our customers. Share which methods you use to merge data and which integrations you are still missing.
