Classification of unstructured data - why is it needed?

    The bulk of the data stored by modern companies is unstructured, that is, data created by employees rather than, say, a database or an export from an automated service. And even with a perfectly tuned system of access rights, there is no guarantee that a given folder really contains the content we expect to find there. Passport and credit card numbers in the contractors folder? No problem. Photos from a no doubt fascinating vacation in Goa in the financial statements folder? Easy! Newly released movies in the employee training directory? Just as easy! Are you still surprised?


    Most of our customers are sure that in their company, at least, everything is fine on this front. Those who do have doubts often do not suspect the true scale of the disaster. When, after a classifier scan, you show them a pile of confidential documents in a folder succinctly named "!!! for Vasya" on the main file share, the representatives of the IT security department start shifting uncomfortably in their chairs. And if you find a publicly accessible document listing senior management bonuses ... Yes, yes, that has happened.

    To identify and prevent such situations, you need data classification. It can be configured to work with metadata (file name, type, size, creation date, etc.) and with content. First you create a set of rules, each made up of filters, logical operations and regular expressions, and specify a scan schedule, because we do not want the analysis to run during the hours of peak load on the server. To make the task easier, most full-text analysis products ship with a set of predefined templates, for example for PCI DSS compliance, but in practice you still have to sit down and think through the filters best suited to your specific business problems.
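
    As a rough illustration of what such a rule might look like, here is a minimal Python sketch. The `Rule` class, its field names and the `in_scan_window` helper are hypothetical and do not correspond to any particular product; they only show metadata filters, regular expressions, a logical AND/OR switch and a scan window working together.

```python
import re
from dataclasses import dataclass, field
from datetime import time
from pathlib import Path
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical classification rule: metadata filters plus content patterns."""
    name: str
    extensions: frozenset = frozenset()      # empty set means "any file type"
    max_size_bytes: Optional[int] = None     # None means "no size limit"
    patterns: List[str] = field(default_factory=list)
    require_all: bool = False                # False: OR the patterns, True: AND them

    def matches(self, path: str, size_bytes: int, text: str) -> bool:
        # Metadata filters run first because they are cheap.
        if self.extensions and Path(path).suffix.lower() not in self.extensions:
            return False
        if self.max_size_bytes is not None and size_bytes > self.max_size_bytes:
            return False
        if not self.patterns:
            return True  # metadata-only rule
        hits = [re.search(p, text, re.IGNORECASE) is not None for p in self.patterns]
        return all(hits) if self.require_all else any(hits)


def in_scan_window(now: time, start: time, end: time) -> bool:
    """True if `now` is inside the scan window; the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


rule = Rule(
    name="confidential-docs",
    extensions=frozenset({".docx", ".txt"}),
    patterns=[r"\bconfidential\b", r"\binternal use only\b"],
)
```

    A scanner built on this sketch would call `in_scan_window` before starting a pass (for example with a 22:00 to 06:00 window) and `rule.matches` for every file it reads.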

    Among the standard rules that we usually configure for our clients are the search for passport data and credit card numbers, the detection of confidential and "for official use" data, and the identification of audio and video recordings and executable files (software). Many clients do not stop there and add their own searches for SNILS and TIN numbers, for financial statements matched by complex conditions, and much more.
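
    To show why a bare regular expression is rarely enough for such rules, here is a minimal Python sketch of a credit card search. It assumes card numbers are 13 to 19 digits, optionally separated by spaces or hyphens, and filters candidates with the Luhn checksum to cut down false positives. The function names are illustrative; real products may use different checks.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

    For example, the widely used test number `4111 1111 1111 1111` passes the check, while the same number with the last digit changed does not.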

    Okay, let's say we have classified the data. What next? Of course, you need to put everything in order in line with security policies: move passport data and credit card numbers away from prying eyes, delete the personal photos, delete the films (or upload them back to the Internet), and have an educational conversation with the creator of the "for Vasya" folder. For convenience, you can use the corresponding reports, which show clearly what exactly turns up in your files, how often, and where those files are located.
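
    The kind of report described here can be sketched in a few lines of Python. The input format, a list of (rule name, file path) pairs, and the `summarize` function are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def summarize(findings):
    """Aggregate (rule_name, file_path) pairs into counts and folders per rule."""
    per_rule = Counter()
    folders = defaultdict(set)
    for rule_name, file_path in findings:
        per_rule[rule_name] += 1
        folders[rule_name].add(str(PurePosixPath(file_path).parent))
    return {
        rule: {"files": count, "folders": sorted(folders[rule])}
        for rule, count in per_rule.items()
    }


# Hypothetical scan results.
findings = [
    ("credit-cards", "/share/contractors/a.xlsx"),
    ("credit-cards", "/share/contractors/b.docx"),
    ("video", "/share/training/movie.mkv"),
]
report = summarize(findings)
```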

    That sounds good, but it still does not solve the problem of relapses and new cases. For that, it is worth setting up alerts that fire when new files matching the configured classification rules are detected, so that we learn about policy violations quickly, without the need for periodic "cleanups". Why do everything by hand if it can be automated? Unfortunately, administrators do not always respond to notifications quickly enough, so to minimize risk, newly discovered files can first be moved to quarantine automatically, with the debriefing held afterwards. Fast, convenient and safe.
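
    A minimal Python sketch of the "quarantine first, investigate later" step might look like this. The `quarantine` function, the directory layout and the `notify` callback are hypothetical; in practice the callback would send an email or a message to whatever alerting channel is in use.

```python
import shutil
from pathlib import Path
from typing import Callable


def quarantine(path: Path, quarantine_dir: Path, rule_name: str,
               notify: Callable[[str], None]) -> Path:
    """Move a matched file into quarantine, then alert; return the new location."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    target = quarantine_dir / path.name
    counter = 1
    # Avoid overwriting an earlier quarantined file with the same name.
    while target.exists():
        target = quarantine_dir / f"{path.stem}.{counter}{path.suffix}"
        counter += 1
    shutil.move(str(path), str(target))
    # The alert goes out after the move, so the file is already out of reach.
    notify(f"Rule '{rule_name}' matched {path}; moved to {target}")
    return target
```

    Moving the file before sending the alert means that even if nobody reads the message for hours, the data is no longer exposed in its original location.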

    As a result, you gain an understanding of the structure of your data and full control over how it spreads within the organization, you can identify those responsible for security policy violations, and you can automatically take measures to minimize risk when new cases arise. We believe data classification is too important a part of controlling unstructured information to be simply ignored: without it, you cannot be sure that the data is exactly where it should be.
