How DeviceLock DLP prevents leaks of sensitive data to GitHub

    Recently, there has been a lot of news about accidental leaks of confidential data through GitHub, the web service for hosting IT projects and developing them collaboratively.



    I emphasize that we will talk about accidental leaks, i.e. those caused by negligence, with no malicious intent on the part of those responsible. Such leaks cannot be written off to employees' inexperience in IT matters, because GitHub users are overwhelmingly developers, i.e. well-qualified and competent personnel. Unfortunately, even very good specialists sometimes make trivial mistakes, especially when it comes to security. Let's treat it as negligence.


    Here are some well-known examples involving GitHub:


    • 2014 - Uber leaked the personal data of 50,000 of its drivers. The cause was that Uber developers had saved access keys to the Amazon cloud (AWS), where the leaked data was stored, in a public GitHub repository.
    • 2017 - it turned out that developers at the quadcopter manufacturer DJI had saved the private key of the company's SSL certificate and the AES keys used to encrypt firmware in a public GitHub repository. Credentials for Amazon Web Services were stored there as well, and that cloud storage in turn held flight logs and passport and driver's license data of DJI customers.
    • 2017 - an engineer at DXC Technology, a major US IT outsourcer, uploaded AWS access keys to a public GitHub repository.
    • 2017 - source code, reports and development plans of several major financial institutions in Canada, the United States and Japan were found in a public GitHub repository. They had been placed there by employees of the Indian outsourcing company Tata Consultancy Services, whose customers the affected institutions were.

    Obviously, all of these accidental leaks could easily have been prevented by monitoring the data uploaded to GitHub. Nobody is suggesting a total ban on access to GitHub: that is a pointless and even harmful idea (if the service is needed but banned, developers will find a way around the ban). What is needed is a leak-prevention solution with a real-time content analyzer that blocks only the data that must not end up on GitHub for security reasons (for example, access keys to the Amazon cloud).


    I will show how to solve this particular problem using DeviceLock DLP as an example. Our starting setup is:


    • GitHub account,
    • AWS key,
    • DeviceLock DLP version 8.3.

    To begin with, we define the AWS key as the protected data whose upload to GitHub must be prevented.



    Since the key is a set of bytes without any distinctive signature (yes, I know about the “BEGIN / END PRIVATE KEY” text at the beginning and the end, but that is a very weak signature and it is better not to rely on it), we will identify it by digital fingerprints.
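
    To illustrate why the marker text makes a weak signature, here is a minimal Python sketch. The regular expression is only an assumption of how a marker-based check might look (it is not part of DeviceLock DLP), and the key bodies are made-up placeholders. The check flags a production key and a throwaway test key alike, and it misses the same key material once the marker lines are stripped.

```python
import re

# Hypothetical marker-based signature: any PEM private-key header line.
PEM_MARKER = re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")


def looks_like_private_key(text: str) -> bool:
    """Return True if the text contains a PEM private-key header."""
    return PEM_MARKER.search(text) is not None


# Placeholder contents, not real key material.
production_key = (
    "-----BEGIN PRIVATE KEY-----\nPLACEHOLDERPRODUCTION\n-----END PRIVATE KEY-----\n"
)
test_key = (
    "-----BEGIN RSA PRIVATE KEY-----\nPLACEHOLDERTEST\n-----END RSA PRIVATE KEY-----\n"
)
stripped_key = "PLACEHOLDERPRODUCTION\n"  # same body, markers removed

print(looks_like_private_key(production_key))  # True
print(looks_like_private_key(test_key))        # True (false positive)
print(looks_like_private_key(stripped_key))    # False (missed)
```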



    We add the key file to the DeviceLock DLP digital fingerprint database so that the product “knows” our key “by sight” and can later identify it unambiguously (and not confuse it, for example, with test keys, which may well be uploaded to GitHub).
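
    The idea of a fingerprint database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not describe how DeviceLock DLP computes fingerprints, so a whole-file SHA-256 digest stands in for the fingerprint here, and the `FingerprintDB` class and the key contents are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Stand-in fingerprint: SHA-256 digest of the whole file (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class FingerprintDB:
    """Hypothetical in-memory database of fingerprints of protected files."""

    def __init__(self) -> None:
        self._known = {}  # fingerprint -> human-readable label

    def add(self, label: str, data: bytes) -> None:
        self._known[fingerprint(data)] = label

    def match(self, data: bytes):
        """Return the label of the protected file, or None if unknown."""
        return self._known.get(fingerprint(data))


# Placeholder contents, not real key material.
production_key = b"placeholder-production-key"
test_key = b"placeholder-test-key"

db = FingerprintDB()
db.add("production AWS key", production_key)

print(db.match(production_key))  # production AWS key
print(db.match(test_key))        # None
```

    Unlike a marker-based check, the lookup recognizes exactly the registered file and leaves test keys alone. An exact-match digest like this one would miss a key that was modified or embedded in a larger file.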



    Now let's create a content filtering rule for file storages in DeviceLock DLP (GitHub falls under our “file storages” classification, which supports more than 15 file sharing and synchronization services in addition to GitHub).



    Under this rule, all users are prohibited from uploading data whose digital fingerprints match the ones specified above. When prohibited data is detected, the upload to GitHub is blocked, and in addition the corresponding events (incident records) and shadow copies are written to the central archive logs.
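
    The logic of such a rule can be outlined in Python. This is a hedged sketch, not DeviceLock DLP's configuration format or API: the `ContentRule` class, its fields and the SHA-256 fingerprint are all assumptions made for illustration. The rule lets ordinary data through, and for protected data it denies the upload, records an incident and keeps a shadow copy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentRule:
    """Hypothetical content-filtering rule for uploads to a file storage."""

    protected: set  # SHA-256 digests of protected files (assumed fingerprint)
    incidents: list = field(default_factory=list)      # stand-in for the event log
    shadow_copies: list = field(default_factory=list)  # stand-in for the shadow log

    def allow_upload(self, user: str, destination: str, data: bytes) -> bool:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.protected:
            return True  # nothing protected detected: let the upload through
        # Protected data detected: log the incident, keep a copy, deny.
        self.incidents.append((user, destination, digest))
        self.shadow_copies.append(data)
        return False


aws_key = b"placeholder-aws-key"  # placeholder, not real key material
rule = ContentRule(protected={hashlib.sha256(aws_key).hexdigest()})

print(rule.allow_upload("alice", "github.com", b"print('hello')"))  # True
print(rule.allow_upload("alice", "github.com", aws_key))            # False
print(len(rule.incidents), len(rule.shadow_copies))                 # 1 1
```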


    Let's now try to upload the AWS key to the GitHub repository.



    As you can see, the upload failed “for some reason”, and DeviceLock DLP warned us that it had blocked the operation (of course, the message is configurable and can be disabled).



    At the same time, if you look at the DeviceLock DLP shadow copy log, you will find the key there.



    Thus, this example has shown how DeviceLock DLP can be used to solve the particular problem of preventing leaks of any confidential data (digital fingerprints can be taken from almost any file) to cloud storages.


    Of course, in addition to preventing data leaks to GitHub, you can also periodically inventory your repositories and look for information that should not be there. GitHubs, Git Secrets, Git Hound, Truffle Hog, and many other free utilities have been created for scanning GitHub repositories.
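
    The kind of check such scanners perform can be approximated with a short Python sketch. It is not how any of the tools named above is implemented: the two patterns (a commonly used shape for AWS access key IDs and the PEM private-key header) and the function names are assumptions, and real tools ship far more extensive rule sets. The sketch only looks at files in a checked-out working tree and does not inspect Git history.

```python
import re
from pathlib import Path

# Assumed patterns for illustration only.
PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "PEM private key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list:
    """Return the names of all patterns found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def scan_tree(root: str) -> dict:
    """Scan every file under root (skipping .git) and map path -> findings."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it
        hits = scan_text(text)
        if hits:
            findings[str(path)] = hits
    return findings


# AKIAIOSFODNN7EXAMPLE is the example key ID used in AWS documentation.
print(scan_text("aws_access_key_id = AKIAIOSFODNN7EXAMPLE"))  # ['AWS access key ID']
print(scan_text("nothing secret here"))                       # []
```

    Calling `scan_tree(".")` from the root of a local clone returns a dictionary of suspicious files, which can serve as a starting point for a periodic inventory.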

