DmitryMe March 26, 2012 at 15:03

The cloud: what is it and why?

Recently we launched the ABBYY Cloud OCR SDK service, which runs on the Windows Azure cloud, and gained a ton of experience along the way. For example, we learned that many people use the word "cloud" and have heard that "clouds are fashionable," but very few understand what a cloud is and, most importantly, why you would build a service in the cloud. The word "cloud" is used everywhere and seems to have started growing its own urban legends.

Watch, for example, this video:

You won't miss much if you ignore what is being said and simply note that the blonde looks good and has a pleasant voice.

Let us look in detail at what a public cloud is, why it might make sense to run software on it, and whether it is true that "soon everything will be in the clouds".

Unprecedented opportunities for your customers

For starters: how does a service "in the cloud" differ, from the client's point of view, from a service "not in the cloud"?

It is believed that a "cloud" service has a unique property: it is accessible to any user. The cloud has nothing to do with that. Our service runs in the cloud, yet to the user it looks like a regular website (some requests even return ordinary-looking web pages); for example, its user account area consists of regular web pages.

For comparison, look at Stack Exchange (best known for its Stack Overflow site) or Yandex.Mail: to the user they look exactly the same. They are also available to any user from anywhere. They also have a web server that accepts HTTP requests and does not care what operating system the client runs, what architecture the client's machine has, or what language the client's programs are written in.

You can find claims that because a service is in the cloud, "user data is accessible to them from anywhere." Yes, users can upload images to our service from anywhere and fetch the results from anywhere, too. But users of Stack Exchange or Yandex.Mail can also work with those services from anywhere: ask questions, get answers, send and receive email.

Functionally, a cloud service is no different for the user. In the cloud or not, at some IP address there is a server (usually a web server) that accepts and processes requests. If access to the server is not restricted to specific IP address ranges and the client is not sitting behind a paranoid firewall, then the service is accessible from anywhere and from any device. Being in the cloud has no effect on this.

Cloud Services for Cloud Services

It is also believed that a service is put in the cloud so that other services in the cloud can interact with it, something along the lines of "for use by developers of cloud services", as the authors of one recent press release put it. In particularly delusional presentations you can find pictures of a naively sketched cloud with pegs stuck into it: here is the cloud, here are the services in it, and this is where they interact.

Let's look at this from the point of view of our service. Its goal is to provide functionality that is programmatically accessible from anywhere in the world, so that third-party developers whose programs lack optical text recognition can build software that uses our service for recognition. For example, a smartphone app that photographs a receipt, extracts the data from it, and saves it into a budgeting app on the same smartphone. Captain Obvious points out: the smartphone is not in the cloud. Our service is not only for "developers of cloud services"; it is for developers of any programs that are ready to use a third-party service for text recognition. Whether those programs run in the cloud or not makes no difference, and our service simply doesn't care.

It is believed that a cloud service exists to serve numerous external requests. Usually yes, but not necessarily. Nothing stops you from running the factoring of numbers into primes on your service, keeping the source data somewhere outside so that the service fetches it from there, and uploading the results to an external FTP server.

Cloud architecture of cloud services

Further, it is believed that a service running in the cloud is fundamentally different in structure and that developing it requires a fundamentally different architecture compared to a service that does not run in the cloud. Some differences do exist, but they are secondary.

Imagine that you need to build a web service that receives images from a user, puts them in a processing queue (because recognition takes some time), processes them, and afterwards gives the user a link to download the result. How would you do it? Most likely, you would create a "task" in internal storage (most likely a database) for each received image, give it a unique identifier, recognize the image in a separate thread or a separate process, and then, on the next "how is task such-and-such doing" request, return a link to the result. This is a completely obvious architecture for such a service, and being in the cloud has nothing to do with it either.
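The architecture described above can be sketched in a few dozen lines. This is only an illustration, not the code of our service: the names `submit_image`, `get_task_status`, and `recognize` are made up for the example, the "internal storage" is a plain dictionary instead of a database, and recognition is replaced by a placeholder.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for the "internal storage" (a real
# service would most likely use a database) and the processing queue.
tasks = {}                    # task_id -> {"status": ..., "result_url": ...}
tasks_lock = threading.Lock()
pending = queue.Queue()


def recognize(image: bytes) -> str:
    """Placeholder for the slow recognition step; returns a made-up result link."""
    return f"/results/{len(image)}-bytes.txt"


def submit_image(image: bytes) -> str:
    """Create a task for an uploaded image and return its unique identifier."""
    task_id = uuid.uuid4().hex
    with tasks_lock:
        tasks[task_id] = {"status": "queued", "result_url": None}
    pending.put((task_id, image))
    return task_id


def get_task_status(task_id: str) -> dict:
    """Answer the client's "how is this task doing?" request."""
    with tasks_lock:
        return dict(tasks[task_id])


def worker() -> None:
    """Process queued images one by one in a separate thread."""
    while True:
        task_id, image = pending.get()
        url = recognize(image)
        with tasks_lock:
            tasks[task_id] = {"status": "completed", "result_url": url}
        pending.task_done()


# The worker runs in the background for as long as the process lives.
threading.Thread(target=worker, daemon=True).start()
```

Nothing in this sketch knows or cares whether it runs in the cloud, on a server in the office, or on a laptop.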

It is believed that the cloud uses a "cloud operating system." Usually this is just a tweaked "regular operating system". On Windows Azure, it is Windows Server 2008 R2 with a few nuts tightened (for example, the temporary folder is very small). All the "cloudiness" in such an environment comes from additional services, such as long-term data storage that is not tied to the machine on which the user's service is running.

Some time ago we wrote that FineReader Engine now supports running in Windows Azure. That change did not require a complete rewrite of FRE; we just took the platform's limitations into account, made some adjustments for them, tested, updated the documentation, and committed to ongoing support. Painstaking and important work, but nothing more.

Unprecedented Reliability

It is also believed that a cloud service is necessarily more reliable, because behind it there is a cloud provider offering a lot of nines after the decimal point. Nines are one thing; reliability is another.

First of all, you need to read the fine print in the agreement about the nines (the SLA, Service Level Agreement). It states exactly what these nines mean, which specific properties of the service they cover, and what the provider is liable for.

Usually the provider's liability is capped at the relatively small amount you paid them, while an outage of your service can cost your company far more money and damage its reputation. Yes, the provider will be held to account, but that may not make you feel any better.

A similar real-life example: on average once a year, the power in a building goes out for one second and the computers restart. From the electricity supplier's point of view this is a measly second per year (how many nines is that?). From your point of view it is several minutes of lost work for every employee, who has to wait for the OS to boot and all the programs to start, and then remember where they left off. There are plenty of nines, but that doesn't make it any easier for you.
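The question about the nines is easy to answer with a little arithmetic. The sketch below counts the nines for one second of outage per year and compares that with the working time lost. The office of 100 employees losing 5 minutes each is an assumption made up for the example, as are the function names.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def count_nines(downtime_seconds: float, period_seconds: float = SECONDS_PER_YEAR) -> int:
    """Number of leading nines in the availability for the given downtime."""
    unavailable_fraction = downtime_seconds / period_seconds
    return math.floor(-math.log10(unavailable_fraction))


def lost_work_minutes(employees: int, minutes_each: float) -> float:
    """Working time lost when every employee needs a few minutes to recover."""
    return employees * minutes_each


print(count_nines(1))              # one second of outage per year
print(lost_work_minutes(100, 5))   # hypothetical office: 100 people, 5 minutes each
```

One second per year works out to seven nines of availability for the supplier, while the hypothetical office loses 500 minutes of work, more than eight hours.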

The agreement may guarantee the availability of only specific services (for example, that the virtual machines your software runs on will be up and connected to the network). A situation can then arise where, say, a secondary service for managing those virtual machines is down for a long time: the machines keep running, but you cannot launch new ones or reconfigure existing ones. And you happened to need a hundredfold increase in throughput to handle the peak load from a very important and generously funded advertising campaign that has just started. The provider has not even violated the agreement, because the agreement says nothing about this secondary-looking service.

Hosting a service in the cloud does not make it guaranteed more reliable or less reliable. The risks don't go away; they just become different.

So what is it?

Now that there is a little less obscurantism, let us return to the question of what a public cloud is. It is a remotely managed service that provides you with computing capacity and data storage on a pay-as-you-go basis. You use the capacity to run your software (your service) and the storage to hold the data that software works with.

You can have different levels of control over the resources provided. For example, you can be allocated a virtual machine with a specific OS and given remote access to it, so that you configure it yourself as you need and it stays at your disposal. Or (as in Windows Azure) you can upload a special archive with the executable code of your service and a configuration file that says "run this on 5 machines with 2 cores each". The cloud infrastructure itself finds suitable virtual machines, deploys, starts, and configures the OS on them, then deploys your archive there and passes control to the entry point (a fixed function like main()). It also monitors whether something has broken, in which case it restarts your service on the same machine or (if the machine itself has failed) on another one. In the first case, you have more control

What's the benefit?

The benefit is flexibility and delegation of duties. Need to increase the number of machines your service runs on? A few mouse clicks, a wait of about 10 minutes, and new virtual machines have been found for you with your service already running on them. Need to scale down? Same thing.

The same goes for storage. Need storage? A few mouse clicks, and it is provisioned and you are given its address and access keys. The storage is usually elastic, and what you pay depends on the volume actually used.

A provider can also offer, for example, a database server, likewise "somewhere out there" and likewise billed by the volume used. On Windows Azure this is SQL Azure, based on a specially configured and tweaked SQL Server 2008.

Need to try a new feature that risks breaking the service? You can do this: create a separate storage and a separate database, configure your service to use them, and deploy it on additionally allocated virtual machines. Once you have tried it out, release the machines; if a lot of data has accumulated in the storage and the database, delete those as well so you don't pay for them.

At the end of our automated build, the service is deployed straight to the cloud, onto a virtual machine allocated specifically for this purpose, and tests are run there. The machine is allocated anew for each build and released afterwards, so on weekends and at night, when there are no code changes, we don't pay for it. The code is tested in exactly the same environment in which it will later run.

Such flexibility is very convenient. This is the bright side of the cloud, and the main reason it is valuable. When you need something, you rent it; when you no longer need it, you stop renting. Both take a few clicks (or a programmatic request) and a fairly short wait.

This is convenient for a company of any size. There is no need to push every piece of hardware through accounting or to buy equipment in reserve, and you can get much less idle time for equipment and much more flexibility in management.

Plus, you shift part of your duties onto the provider. You no longer buy servers, assemble racks, or deal with electrical wiring; you don't need space for equipment; you may not even have to configure the OS (that depends on the cloud). Note that we are talking about shifting duties, not responsibility; more on this below.

As usual, there is a dark side

The dark side of the cloud is that there are many things you cannot influence. According to the Stack Exchange team's blog, their service runs not in the cloud but on their own hardware, precisely because they are not satisfied with the level of control cloud providers offer.

For example, the virtual machines are standardized, and you may not even know the characteristics of the real hardware. Most likely, when you deploy a service on a single one-core node in Windows Azure, you actually get a virtual machine running on some 16-core server under Hyper-V. Perhaps something there could be tuned for an effortless 15 percent performance gain, but there is nothing you can do about it.

If you are paranoid, or bound by strict legal or contractual requirements, you may be unhappy about having so little control over the hardware in general. For example, you upload documents containing trade secrets, they get copied to a bunch of hard drives, and you have no way to ensure they are really deleted. Yes, the provider promises they will be, but you cannot verify it.

The same goes for reliability. You cannot be sure that the racks will not one day be flooded with condensate from a torn-off pipe in the air conditioning system. If your server were in the office or in colocation, you could do something about it, even something that seems crazy, such as routing water away from the space above your equipment. Here you can do nothing: you don't control where the equipment is, whether it is properly secured, or whether mice are running around it. All the crazy events that you could have foreseen (or failed to foresee, and then felt remorse over a job poorly done) are now completely out of your control.

Crazy events come in many varieties. Here are examples of real data center failures.

FAIL. A car crashed into a power line pole near the data center; the high-voltage wires broke and fell to the ground in front of the substation feeding the data center. The switchover to backup power began. Current from the wires lying on the ground leaked into the earth, the data center's protective circuits detected the ground fault, and they shut down the entire data center.

Another FAIL. Presumably because of a lightning strike, the transformer supplying the data center failed and the switchover to backup power began. For some reason the generators could not be synchronized (most likely the equipment that performs the synchronization had no power), the data center could not switch to backup power, and all the equipment shut down.

Note that we know about these cases because they affected hundreds or thousands of cloud users. How many similar events happen to servers sitting in offices, we simply do not know.

Of course, something similar could happen to servers in the office, but then part of the blame would be yours: you could have foreseen it and did not. You would be ashamed of a job poorly done. When the equipment is "somewhere out there", you have no such opportunity and are forced to trust the provider.

This is not bad in itself; you just need to understand it clearly. By hosting a service in the cloud, you hand a significant share of the duties over to the provider, but not the responsibility for keeping your service alive. Cloud does not automatically mean more reliable, nor does it automatically mean less reliable. You still need a risk assessment, and for critical services you need duplication across different data centers and load redistribution. It may well turn out that once you account for all the costs of duplication and of synchronizing data between data centers, the price tag will upset you.
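To see what duplication buys in the best case, here is a rough estimate. It assumes that the copies fail independently of each other and that one surviving copy is enough to serve users; both are strong assumptions, and the 99.9% figure is made up for the example.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of several independent copies when one survivor is enough."""
    return 1 - (1 - single) ** copies


# Hypothetical figure: each data center alone is available 99.9% of the time.
print(combined_availability(0.999, 1))
print(combined_availability(0.999, 2))
```

Under these assumptions two data centers at 99.9% each give about 99.9999%. Real failures are not always independent, and the estimate says nothing about the cost of keeping data synchronized between the copies.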

Cloud architecture of cloud services again

Finally, about the special requirements for cloud services. Such requirements do exist: you need to be prepared for anything to break at any moment. If you like extremes, you can, like Netflix, build a service that breaks something in your own service at random moments. You especially need to be prepared for occasional short-lived failures. For example, the connection to SQL Azure will sometimes drop briefly; your code should not panic or break, but wait a little and try again.
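"Wait a little and try again" might look roughly like this. It is a sketch under assumptions: `TransientError` is a made-up exception standing in for whatever your database driver raises on a dropped connection, and the number of attempts and the delays are arbitrary.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def call_with_retries(operation, attempts=5, first_delay=0.5, sleep=time.sleep):
    """Run operation(); on a transient failure wait a bit and try again."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise              # out of attempts: let the caller see the failure
            sleep(delay)
            delay *= 2             # wait longer before each further attempt
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. Any error that is not a `TransientError` is deliberately not retried and goes straight to the caller.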

Just remember what usually annoys users in programs: all kinds of "could not find the server, here are 18 things to check" messages. In a distributed system such failures are absolutely normal; your service should try to deal with them itself, and then try a few more times. A user who sees the browser's "no response from server" message usually just presses F5, and your service should likewise simply retry the action. For this to work, repeating any action must do no harm; the fancy word for this is idempotency. If you ignore this, your service will fail at the most inopportune moment over some trifle.
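One common way to make an action safe to repeat is to have the caller attach its own identifier to the request and to remember which identifiers have already been handled. The sketch below illustrates the idea only; `TaskStore` and `request_id` are names invented for the example, and a real service would keep the mapping in a database rather than in memory.

```python
import uuid


class TaskStore:
    """Hypothetical in-memory store; a real service would keep this in a database."""

    def __init__(self):
        self._task_by_request = {}   # request_id -> task_id
        self.created = 0             # how many tasks were actually created

    def create_task(self, request_id: str) -> str:
        """Create a task once per request_id; repeats return the same task."""
        if request_id in self._task_by_request:
            return self._task_by_request[request_id]
        task_id = uuid.uuid4().hex
        self._task_by_request[request_id] = task_id
        self.created += 1
        return task_id
```

If the response to the first call gets lost and the client repeats the request with the same `request_id`, it receives the same task instead of creating a duplicate.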

Similarly, the service must be prepared to be stopped at any time, on all nodes or on some of them, and then started again without data corruption. The loss of the most recent data should be minimal, and after a restart the service should be able to carry on as if nothing had happened. This happens, for example, when Windows Azure automatically installs software updates: the nodes are stopped one by one, and the service then starts on each node with the software already updated.
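A simple way to survive being stopped at an arbitrary moment is to record progress after every completed step and to resume from the last record on startup. The sketch below keeps the record in a local JSON file purely for illustration; in a cloud the file would have to live in storage that outlives the machine, and all the names here are assumptions.

```python
import json
import os


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item (0 if no checkpoint yet)."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)   # readers see either the old or the new file


def process_all(items, handle, checkpoint_path: str) -> None:
    """Process items in order, resuming after the last recorded checkpoint."""
    for index in range(load_checkpoint(checkpoint_path), len(items)):
        handle(items[index])
        save_checkpoint(checkpoint_path, index + 1)
```

If the service is stopped after `handle` has run but before the checkpoint is saved, that one item will be handled again after the restart, so `handle` itself has to tolerate being repeated.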

The requirements are substantial but achievable; it's just that Murphy will be a frequent visitor to your service. Whether a small FAIL turns into an epic failure depends on you.

A cloud is not a bunch of words like "scalable", "accessibility", "migration", "productivity", and "trend" arranged in random order in a marketing text. It is just a model of owning computing capacity. In certain cases this model is very convenient.

By the way, we have a service for developers working in the cloud.

Dmitry Meshcheryakov,
product department for developers
