1cloud December 21, 2017 at 14:37

How data warehouses work: a beginner's review

The international market for hyper-scalable data centers is growing at an annual rate of 11%. The main "drivers" - enterprises, connected devices and users - they ensure the constant emergence of new data. Along with market size, requirements for storage reliability and data availability are growing.

A key factor affecting both criteria is storage systems. Their classification is not limited to equipment types or brands. In this article, we will consider the types of storage - block, file and object - and determine for what purpose each of them is suitable. / Flickr / jason baker / cc

Types of storages and their differences

Block-level storage is at the heart of a traditional hard drive or tape. Files are divided into “pieces” of the same size, each with its own address, but without metadata. An example is the situation when the HDD driver writes and reads blocks to addresses on a formatted disk. Such storage systems are used by many applications, for example, most relational DBMSs, in the list of which are Oracle, DB2, etc. In networks, access to block hosts is organized through the SAN using Fiber Channel, iSCSI or AoE protocols.

A file system is an intermediate between block storage and application I / O. The most common example of file type storage is NAS. Here, the data is storedas files and folders collected in a hierarchical structure, and are accessible through client interfaces by name, directory name, etc.

/ Wikimedia / Mennis / CC

At the same time, it should be noted that the separation “SAN is only network drives and NAS is a network file system” is artificial. When iSCSI appeared, the boundary between them began to blur. For example, in the early 2000s, NetApp began to provide iSCSI on its NAS, and EMC began to “put” NAS gateways on SAN arrays. This was done to improve the usability of the systems.

As for object storages, they differ from file and block storages in the absence of a file system. The file storage tree structure is replaced by a flat address space. No hierarchy — just objects with unique identifiers that allow the user or client to retrieve data.

Mark Goros, CEO and co-founder of Carnigo, compares this way of organizing with a car parking service. You just leave your car to a valet who takes it to a parking place. When you come to pick up the transport, you just show the ticket - you get the car back. You do not know what parking place he was standing on.

Most object stores allow you to attach metadata to objects and aggregate them into containers. Thus, each object in the system consists of three elements: data, metadata and a unique identifier - an assigned address. At the same time, object storage, unlike block storage, does not limitmetadata with file attributes - here you can configure them.

/ 1cloud

Applicability of different types of storage systems

Block storage

Block storage has a set of tools that provide increased performance: the host bus adapter offloads the processor and frees its resources for other tasks. Therefore, block storage systems are often used for virtualization. Also good for working with databases.

The disadvantages of block storage are the high cost and complexity of management. Another drawback of block storage (which applies to file storage, which will be discussed later) is the limited amount of metadata. Any additional information must be processed at the application and database level.

File storages

Among the advantages of file storage distinguish simplicity. The file is given a name, it receives metadata, and then "finds" a place in directories and subdirectories. File storages are usually cheaper compared to block systems, and the hierarchical topology is convenient when processing small amounts of data. Therefore, with their help, file sharing systems and local archiving systems are organized.

Perhaps the main drawback of file storage is its "limitedness." Difficulties arise as a large amount of data accumulates - it becomes difficult to find the necessary information in the heap of folders and attachments. For this reason, file systems are not used in data centers where speed is important.

Object Repositories

As for object storages, they are well scalable, therefore they are able to work with petabytes of information. According to statistics, the volume of unstructured data worldwide will reach 44 zettabytes by 2020 - this is 10 times more than in 2013. Object stores, due to their ability to work with growing volumes of data, have become the standard for most of the most popular services in the cloud: From Facebook to DropBox.

Storage sites like Haystack Facebook replenish 350 million photos daily and store 240 billion media files. The total amount of this data is estimated at 357 petabytes.

Storing copies of data is another feature that object storages do well. According to research , 70% of the information is in the archive and rarely changes. For example, such information may be system backups needed for disaster recovery.

But it’s not enough just to store unstructured data, sometimes they need to be interpreted and organized. File systems have limitations in this regard: the management of metadata, hierarchy, backup - all this becomes an obstacle. Object storages are equipped with internal mechanisms for checking the correctness of files and other functions that ensure data availability.

Flat address space is also an advantage of object storage - data located on a local or cloud server is retrieved equally easily. Therefore, such repositories are often used to work with Big Data and media . For example, they are used by Netflix and Spotify. By the way, the capabilities of object storage are now available in the 1cloud service .

Thanks to built-in data protection tools using object storage, you can create a reliable geographically distributed backup center. Its API is HTTP-based, so it can be accessed, for example, through a browser or cURL. To send a file to the object store from a browser, you can specify the following:

After sending, the necessary metadata is added to the file. There is a request for this:

curl -i [url_storage/account/container/object] -X POST 
-H "X-Auth-Token: [token]" -H "X-Object-Meta-ValueA: [value-a]"

The rich meta-information of the objects will optimize the storage process and minimize its costs. These advantages - scalability, extensibility of metadata, high speed of access to information - make object storage systems the best choice for cloud applications.

However, it is important to remember that for some operations, for example, working with transactional workloads, the effectiveness of the solution is inferior to block storage. And its integration may require changes in application logic and workflows.

PS A few more materials about storing data from 1cloud blog:

Tags: