Practical tips for choosing a cloud provider

    Choosing a cloud provider is a difficult task. In this post I will explain how to approach it, what to look at first, where the pitfalls may hide, and how to build your communication with the provider in general. Below I describe the most demanding scenario: moving the entire IT infrastructure to the cloud, including a critical part whose unavailability even for a few hours can cause significant damage to the company's business.

    Memo


    How to weed out hosting providers
    1. Is server virtualization used in principle?
    2. Is storage or network virtualization used? These are optional requirements, but they indicate the technological level of the cloud provider.
    3. How are the services managed? Is there a self-service portal? Can you launch new servers yourself and monitor the performance of those already running? Can you add disks, configure internal addressing, and manage routing? Can you set up a backup schedule yourself and run data recovery tasks? And so on.
    4. How are resources accounted for? Is there automated billing (per second or per hour)? Or is everything counted by hand?


    Site
    1. Where is the data center located: abroad or in the Russian Federation? How far away are your office and your second data center, if you have one? What about latency?
    2. Who owns the data center? Can you come and see it?
    3. Is it certified? What accidents have occurred at this site before?
    4. Which telecom providers are present at the site?
    5. How can I connect to the "cloud"?


    Cloud Services
    1. What is a vCPU (virtual core)? What does it equal: a whole physical processor core or, say, a quarter of one?
    2. What disk resources are used? Local or SAN-connected?
    3. How are the Internet channels made redundant?
    4. What if the standard functionality of the "cloud" is not enough? Is it possible, for example, to connect specialized network equipment or non-x64 machines to the "cloud", and so on?
    5. Is hybrid mode available? How is integration done in this case?
    6. Is there a backup service?
    7. Which information security (IS) tools are included in the base offering, and which ones need to be ordered separately?
    8. If you need to build HA (high availability) or DR (disaster recovery) solutions, can the parts of the hosted IT service be split between two data centers? Does the provider have a second cloud for building such solutions?


    Support
    1. Does the 24/7 support respond quickly and to the point, rather than "we'll figure it out later"?
    2. Languages: Russian and English?
    3. How far will the provider go beyond the SLA if you really need it? (As a rule, Western providers will not take a single step outside it.)
    4. Do you need to contact support to monitor resources and your balance, or is all that data available through the self-service portal?
    5. Is there a demo mode? How does it differ from the production one, and in what exactly?


    Note: many of these questions will be of little practical use if all you are moving to the "cloud" is a website - that is a different scale. Although websites, of course, come in all sizes.

    1. Site selection



    The choice of a cloud provider begins with the physical site of the data center. If the provider's data center goes down, the "cloud" goes down with it, which means that all IT systems running in it become inaccessible. Many people ask the provider about the internal means of ensuring the availability of the cloud platform. This is correct, but not sufficient.

    Find out where the data center is located
    In Russia or abroad? This matters because certain types of data may not legally be taken outside the Russian Federation. The reverse is also true: some companies try to keep part of their data in a Western data center in order to at least partially protect themselves from inspections. In addition, if you need to move the entire IT infrastructure to the "cloud", latency becomes a real issue. Say you decide to place part of your systems in a public "cloud" somewhere in Ireland. Are you sure the existing latency on the communication channels will let you work comfortably with those systems? This consideration stops many, and the choice falls on local data centers.
    Our example: all three of CROC's own data centers are located in Moscow and are united by a single optical ring.
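    A quick way to sanity-check latency during provider selection is to measure it yourself from your office. Below is a minimal sketch that times TCP connection setup to a test endpoint each candidate exposes; the hostnames are made up for illustration, so substitute real test addresses given to you by the providers.

```python
# Measure round-trip latency from the office to candidate data centers by
# timing TCP connection setup to a test endpoint each provider exposes.

import socket
import statistics
import time

endpoints = {
    "provider_moscow": ("demo.cloud-msk.example", 443),   # hypothetical hosts
    "provider_ireland": ("demo.cloud-eu.example", 443),
}

for name, (host, port) in endpoints.items():
    samples = []
    for _ in range(10):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=3):
                samples.append((time.perf_counter() - start) * 1000)
        except OSError:
            pass
        time.sleep(0.2)
    if samples:
        print(f"{name}: median connect time ~{statistics.median(samples):.1f} ms")
    else:
        print(f"{name}: unreachable")
```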


    Who owns the data center?
    It is also worth paying attention to whether the data center is owned by the provider or rented from a partner. There is nothing wrong with a rented site as such: the partner company may well employ highly qualified specialists who can act quickly and smoothly in any situation. But still, with a rented site, the path of your request from the moment you submit it to the moment an engineer of the data center's owner starts working on it takes a little longer. The cloud provider acts as an intermediary, and the workflow is less flexible and responsive.

    When choosing a site, you should also pay attention to whether you can visit the data center. If the data center is provider-owned and the provider has nothing to hide, you will gladly be invited on a tour. If a tour is not possible, that raises doubts. A refusal may mean that the site is rented and the cloud provider could not arrange the visit with the data center's owner, or that something is wrong at the site itself. There may of course be other reasons, but doubts about the reliability of the data center arise all the same.
    All CROC data centers are company-owned. Tours of the data centers are held regularly, both group and individual.


    Is the site certified?
    The leader in the data center certification market is the Uptime Institute. It maintains an up-to-date set of reliability requirements for data center components. Its requirements and recommendations are based on practical experience of operating data centers around the world, taking real data center failures into account.

    Uptime certification consists of two stages: certification of the design documentation and certification of the finished site. Site certification takes up to three weeks: Uptime specialists come and personally check that all the technical solutions at the site match the design documentation, and a certificate of conformity is then issued. There are currently five certified data centers in Russia, and one of them belongs to us.

    There is an alternative to Uptime Institute certification: verifying compliance with the TIA-942 standard. This is an American standard containing recommendations for building data centers. Its disadvantage is that it has not been updated for a long time and lags behind Uptime in a number of requirements. Another big minus is that the standard is advisory in nature, and data centers are not audited for compliance with it, at least in Russia. You have to take your cloud provider at its word.

    In general, data center certification is a source of perpetual debate. Some say that certification is useless, because they trust the certification body (the Uptime Institute) no more than they trust the data center service provider. Others, on the contrary, trust only the practice of working with external auditors.

    If you approach the issue constructively and look at things soberly, then, other things being equal, a certified data center inspires more confidence. In the event of an accident at such a facility, the reputation of not only the provider but also the external auditor will suffer, and Uptime's reputation is worth a lot. An uncertified site has many nuances that are very difficult to verify unless you are a specialist; a certified site has been verified by external specialists and contains significantly fewer questionable technical solutions.
    Let me give an example of a seemingly small detail that can distinguish a certified data center from an uncertified one. It is the kind of thing a customer would never even think to check. A data center has an air conditioning system: an indoor unit (fan coil) located in the machine hall and an external cooling unit (chiller and cooling tower) installed on the roof. The air conditioning system is redundant according to the N+1 scheme, so the failure of any single air conditioning unit does not stop the data center. The problem is that in order to replace a failed unit, the coolant supply has to be shut off. And if there is only one supply line and it is not redundant, all the air conditioners will shut down, which means the data center will stop - and with it the "cloud" hosting your systems.

    Here is another example: a few years ago there was a hurricane in Moscow. The wind tore a metal sheet from the roof of a neighboring building and dropped it onto the roof of a data center. The sheet severed the coolant supply to the outdoor air conditioning unit on the roof. Who could have imagined such a scenario? Who would have thought that something could happen to a cooling system located on the roof and enclosed by a fence? As a result, the coolant leaked out, the data center stopped, and all customer systems went down.

    If the site had been certified and built in accordance with the Uptime Institute Tier III standard, it would have been possible to switch to the backup coolant supply and isolate the damaged section of the pipeline. So if you are choosing a cloud provider for serious tasks, you have to pay attention to many aspects, right down to the data center level. Cloud services are a nesting doll, and the data center is its core.

    Someone may object that it does not matter what kind of data center the provider has: "we have a clear SLA and we work within it, and we have no desire to go and inspect the site - we just need a cloud." But few people consider that if the data center, and with it all your systems, goes down, the penalty payments will be the last thing your management cares about. First of all there will be a scandal, because the company's work has ground to a halt and something needs to be done urgently.

    CROC currently owns three sites:
    • Volochaevskaya-1 (70 racks + Cloud 1), Tier III per TIA-942
    • Volochaevskaya-2 (110 racks), Tier III per TIA-942
    • Compressor (800 racks + Cloud 2), Tier III per the Uptime Institute

    Read more about the principles of certification here. You can check a site's certification with the Uptime Institute here.

    Communication with the outside world
    You should always ask the cloud provider how you can connect to the data center and, in particular, to the "cloud" from the outside. What are the default Internet access options? Is it possible to connect to the "cloud" over point-to-point channels?

    If you can connect to the "cloud" with your own channels, be sure to ask which telecom providers are present in the data center, since communication services are monopolized at some sites. For example, there is only provider X and that is it: you cannot bring in your own providers.
    There are currently 13 telecom providers in the CROC data center network, and we are ready to accept any provider that is convenient for you.


    2. Cloud platform



    CPU resources: what are you paying for?
    We have figured out what to look for when choosing a physical site. Now let's move on to the questions you should ask about the cloud platform itself, starting with the principles of allocating and selling computing resources, namely processor capacity. There are several ways of selling processor capacity on the market:
    • Selling "shared" resources: whatever is free right now can be occupied by your servers.
    • Guaranteed resource allocation.

    The first option is usually offered in a beautiful marketing wrapper. You choose the lower and upper bounds for the allocation of computing resources: the lower bound is guaranteed, and the resources between the lower and upper bounds are allocated on demand and paid for as used. The approach looks attractive, but there is a nuance: when you urgently need all the resources available to your virtual server, it is far from certain that your neighbors on the physical server are not using them and that they can actually be allocated.
    The second option is less flexible in terms of payment, but more stable in terms of resource allocation and the operation of your systems.
    Each approach has advantages and disadvantages that can be put to good use in a particular situation; both have a right to exist.
    CROC, in particular, provides guaranteed resources.


    What is vCPU?
    Cloud providers measure the processing power of their servers in vCPUs. Let's see what that actually means. Here are the definitions of a vCPU that I have come across:
    • 0.4-1 GHz of a physical processor core's clock frequency
    • 1/3 or 1/4 of a physical processor core
    • 1 whole physical processor core

    This can still be sorted out by asking the provider for its calculation methodology. But there are other pitfalls. The performance of processor cores has grown roughly 3.5 times since 2007, which can be seen from the virtual server types available on Amazon. In 2007, Amazon began providing cloud services and purchased hardware for it; Intel Celeron processors were used back then. Their performance was measured and taken as the reference unit, named the ECU (Elastic Compute Unit). Today on Amazon you can order virtual servers whose physical cores are rated at 3.5 ECU. From this we can conclude that core performance has grown about 3.5 times over the past 6-7 years.

    Now consider that a cloud provider's vCPU may be not a whole physical core but a fraction of one, and the provider may also be running old hardware. This means that a vCPU can differ between providers by a factor of 20-30. Always ask what a vCPU is, how it relates to physical cores, and which processors are used in the first place.
    CROC, in particular, follows Amazon's methodology for measuring processor power. Our vCPU is rated at 3.23 ECU and corresponds to the power of a physical core of an Intel Xeon X5650 processor at 2.6 GHz.
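    To make offers comparable, it helps to normalize the price of a vCPU by how much compute it actually represents. Below is a minimal sketch of that arithmetic in Python; the prices and ECU ratings are invented purely for illustration.

```python
# Normalize vCPU offers to a common compute unit (ECU-like) so that
# prices can be compared per unit of actual processing power.
# All numbers below are made up for illustration.

offers = {
    # name: (price per vCPU per month, ECU per vCPU as stated/measured)
    "provider_a": (30.0, 3.23),  # a full modern core
    "provider_b": (12.0, 0.90),  # a fraction of an older core
}

for name, (price, ecu_per_vcpu) in offers.items():
    print(f"{name}: {price / ecu_per_vcpu:.2f} per ECU per month")

# provider_a works out cheaper per unit of compute (~9.29 vs ~13.33)
# despite the higher sticker price per vCPU.
```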


    Disk resources
    When choosing a cloud provider, pay attention to the disk resources provided to virtual machines. First of all, ask how the data storage is physically organized:
    • Local drives inside the servers
    • A directly attached disk shelf
    • SAN-connected storage systems

    The first and second options are fraught with data loss, or with long unavailability when a server fails. Amazon, in particular, gives customers' virtual machines disk space on local drives as a bonus; a server failure there means the loss of all that data. For an additional fee you can rent disk space on an external storage system (EBS).

    When designing the architecture of its cloud platform, CROC decided against using local server disks for storing virtual machine data. All virtual server disks reside on SAN-connected storage systems. The failure of a physical server leads to an automatic restart of the virtual servers on the surviving part of the "cloud".


    The second key point when considering disk resources is a guaranteed SLA in terms of IOPS, the speed of reading and writing data. Does your provider guarantee a certain level of storage performance? CROC cloud platform disks are provided without a guaranteed IOPS SLA. The company makes every effort to scale existing storage systems and add new ones in time, and performance is constantly monitored. If a customer needs guaranteed IOPS next to the cloud, they can always place a physical storage system in the data center that satisfies those requirements. Our large customers do exactly that in practice. Fortunately, there are three company-owned sites ready to accept physical equipment.
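    During testing you can get a rough idea of disk performance yourself. The sketch below measures single-threaded random 4 KB read IOPS against a pre-created test file inside a virtual machine; it is only a quick estimate (a dedicated tool such as fio gives far more reliable numbers), and the file path and size are assumptions.

```python
# Rough single-threaded random-read IOPS estimate for a cloud disk.
# Pre-create a large test file first (e.g. 1 GB of data) at PATH.
# Reads may be served from the page cache, so treat the result as an upper bound.

import os
import random
import time

PATH = "/mnt/test/iops_test.bin"  # hypothetical pre-filled test file
BLOCK = 4096                      # 4 KB blocks
DURATION = 10                     # seconds to run the test

size = os.path.getsize(PATH)
blocks = size // BLOCK
fd = os.open(PATH, os.O_RDONLY)

ops = 0
start = time.time()
while time.time() - start < DURATION:
    offset = random.randrange(blocks) * BLOCK
    os.pread(fd, BLOCK, offset)
    ops += 1
os.close(fd)

print(f"~{ops / DURATION:.0f} random read IOPS (single-threaded)")
```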

    How to connect to the "cloud"?
    Cloud providers give access to virtual servers over the Internet by default. And of course they all do it differently.
    First of all, ask whether this connection is redundant. Are different telecom providers used? How does switching happen if one of the communication channels fails?
    CROC, in particular, provides Internet access over two 1 Gbit/s channels from different providers, operating in active-passive mode with automatic switchover between them at the level of an autonomous system.
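    If you get demo access, you can check such failover claims yourself. The sketch below is a minimal reachability probe: it connects to a test endpoint in the "cloud" once a second and reports how long any outage lasts, which gives a rough idea of the switchover time when one channel is taken down. The address is an assumption; point it at your own test VM.

```python
# Simple reachability probe for estimating failover time of Internet channels.
# Run it from the office while the provider simulates a channel failure;
# stop it with Ctrl-C.

import socket
import time

HOST, PORT = "203.0.113.10", 443  # hypothetical external address of your test VM

last_ok = None
while True:
    try:
        with socket.create_connection((HOST, PORT), timeout=2):
            if last_ok is not None and time.time() - last_ok > 5:
                print(f"recovered after ~{time.time() - last_ok:.0f}s outage")
            last_ok = time.time()
    except OSError:
        print(f"{time.strftime('%H:%M:%S')} unreachable")
    time.sleep(1)
```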


    The second important question is whether the provider can shape the bandwidth of the Internet channels, that is, guarantee you a certain throughput.
    CROC does not provide this service, but it constantly monitors channel utilization and expands channel capacity in good time.


    All our large customers work with the telecom providers convenient for them and organize point-to-point connections. This is a safer way to connect than going over the Internet.

    What if the standard functionality of the "cloud" is not enough?
    A cloud platform is not a panacea for all problems; it cannot solve absolutely every IT task. Here are examples where a cloud platform will not help you:
    • The available virtual server types do not suit you; you need a machine with 40 physical cores.
    • You have a heavy database server running on AIX or HP-UX that needs to move to the "cloud". These are not x64 servers but SPARC and EPIC machines, and such a server cannot be hosted in a virtual machine in the "cloud".
    • The provider does not guarantee storage IOPS, and you need dedicated storage.
    • GOST encryption is required for data transfer, which means installing special crypto gateways.
    • You need to connect dedicated communication channels and control the physical equipment that switches between them.

    The list of examples could go on. The fact remains that some day you will outgrow the built-in functionality of the cloud platform, if you have not already. What will you do when you hit that ceiling? Ask the provider in advance whether additional physical equipment can be connected to the "cloud" and placed in the data center.
    CROC provides such services. Moreover, many of our customers rent physical equipment from us and consume it alongside the cloud. Most often they rent network routers for connecting dedicated communication channels.


    Integration with the cloud: hybrid mode of operation
    Most likely, when you decide to move to the "cloud", you will not do it in one step. There will be a long transition period during which you live both on the local site and in the "cloud". In some companies this process can drag on for a year, or never be completed at all.
    In this case, it is important to ask the provider about the mechanisms for integrating your local IT infrastructure with the cloud.
    In a typical public "cloud", by default you have practically no control over network settings on the "cloud" side:
    • You cannot manage internal addressing. (If every customer could, addressing would collapse into chaos.)
    • You cannot control routing.
    • With the majority of Russian providers, networks inside the "cloud" are built on VLANs, typically one customer = one VLAN, so all machines live on the same network. Not every provider can create additional VLANs for you.

    Managing networks inside the "cloud" is nowhere near the level of flexibility and convenience you have on your own site, so you can forget about most network integration tools. At best, you can build a VPN tunnel between the sites.
    Fortunately, there are technical solutions that make working with networks in the "cloud" just as convenient as on the local site.

    CROC uses network virtualization software, which was described in a previous post. Through the self-service portal, this software allows the customer to independently:
    • Manage the internal addressing of cloud networks
    • Create as many additional networks as needed
    • Control access between them with a firewall
    • Configure static or dynamic routing
    • Export configurations for local physical network equipment to set up a VPN

    Thus, this functionality lets you configure internal addressing in the "cloud" in whatever way is convenient for you, up to building stretched L2 networks between sites. You can set up your own VPN and define your own routes. In effect, you manage network settings in the "cloud" just as you would on your own site. The "cloud" becomes a logical continuation of your local infrastructure at the customer's site, and IT systems can work with each other completely transparently even though they are located in physically different places.

    If you intend to tightly integrate the systems moved to the "cloud" with your local systems, be sure to ask the provider about its network management capabilities.
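    One practical chore in hybrid setups is planning the address space so that cloud networks do not collide with local ones. Below is a minimal sketch of such a check using Python's standard ipaddress module; the subnet lists are made-up examples.

```python
# Check that planned cloud subnets do not overlap with existing local subnets
# before stretching networks or building VPN tunnels between the sites.

import ipaddress

local_subnets = ["10.0.0.0/16", "192.168.10.0/24"]
cloud_subnets = ["10.1.0.0/16", "192.168.10.0/24"]  # the second one collides

for local in map(ipaddress.ip_network, local_subnets):
    for cloud in map(ipaddress.ip_network, cloud_subnets):
        if local.overlaps(cloud):
            print(f"conflict: local {local} overlaps cloud {cloud}")
```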

    Data backup
    Data backup is one of the basic services of a cloud platform, yet many people forget about it. First, ask the provider whether such a service is available at all. If it is, you can move on to further questions. Is a commercial product used, or open source? If free software is used, factor in the risk that the solution is supported by the provider itself rather than by a vendor.

    Hence the second risk. A vendor ships its backup software with all the agents needed for consistent backups of application software. It is unlikely that you will manage to back up SAP with an open-source backup solution. Always ask the provider for the compatibility list of its backup solution.

    It is also important to ask the provider what the copies are stored on: disk or tape. Tape is a cheaper medium, but the risk of failing to recover from it is much higher, and working with tape is slower than with disk media. If the information is stored on disks and you additionally need copies periodically written out to removable media, ask the provider whether this is possible.

    And finally, an important point is how the service is delivered. How is it managed? Is there a self-service portal? Can you manage the backup schedule, back up and restore data yourself, without involving the provider?
    CROC built its backup solution on the EMC Avamar hardware and software appliance. Information is stored on disks in deduplicated form. The service is managed through a fully functional self-service portal.
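    What "self-service" should mean in practice is that routine operations need no support ticket. The sketch below illustrates the idea against a hypothetical REST portal API (the URL, endpoints and field names are invented, not those of any real portal): list the restore points of a VM and trigger a test restore to a separate machine.

```python
# Illustration only: driving backup operations through a hypothetical
# self-service REST API instead of filing tickets with support.

import requests

API = "https://portal.example-cloud.ru/api/v1"  # hypothetical portal URL
HEADERS = {"Authorization": "Bearer <token>"}    # token issued by the portal

# List restore points for a VM and pick the newest one.
points = requests.get(f"{API}/vms/vm-123/restore-points", headers=HEADERS).json()
latest = max(points, key=lambda p: p["created_at"])
print("latest restore point:", latest["created_at"])

# Trigger a test restore to a separate VM to verify the backups are usable.
requests.post(
    f"{API}/vms/vm-123/restore",
    json={"restore_point_id": latest["id"], "target": "vm-123-restore-test"},
    headers=HEADERS,
)
```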


    Information Security
    This question is the most frequently discussed and the scariest one for cloud providers. So what should you pay attention to? In addition to questions about availability (redundant equipment and communication channels, separate data storage, RAID, backups), ask about the built-in access control tools and the list of additional information security services.

    As for the built-in tools, ask about password protection for Windows machines and key-based protection for Linux machines. You have created a new machine: how is access to it protected? Does it immediately become available to the entire Internet on an external IP address with a standard password?
    It is also worth asking about network access control. Is there a built-in firewall at all, and if so, can you manage it?
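    A simple check during testing is to see which management ports a freshly created machine exposes to the Internet. The sketch below probes a few common ports on the VM's external address; the IP is an example, and you should of course only scan machines you own.

```python
# Probe common management ports on a newly created VM's external IP
# to see whether it is exposed to the whole Internet by default.

import socket

EXTERNAL_IP = "203.0.113.25"  # hypothetical external IP of your new test VM
PORTS = {22: "SSH", 3389: "RDP", 5900: "VNC"}

for port, name in PORTS.items():
    try:
        with socket.create_connection((EXTERNAL_IP, port), timeout=3):
            print(f"{name} (port {port}) is open to the Internet")
    except OSError:
        print(f"{name} (port {port}) is closed or filtered")
```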

    As for additional services, ask the provider about buying protection against (D)DoS attacks, intrusion detection and prevention systems (IDS/IPS), and renting antivirus and antispam solutions.

    Building disaster recovery or high availability solutions
    There is a chance that your tasks will become so critical that you will need to spread them across data centers, stretch clusters between sites, and configure replication between storage systems. Is your provider ready for this turn of events, or will you have to bring in a second provider?

    Bringing in a second provider is fraught with blurred responsibility and working with two technical support teams, possibly under different SLAs. This can adversely affect the operation of your IT infrastructure.
    CROC currently has two of its own cloud platforms located in different data centers (Volochaevskaya-1 and Compressor). Both platforms are managed through a single self-service portal. This allows you to build highly available distributed solutions, or at the very least to store backups at a site remote from the main data.

    Moreover, there is no need to sign an additional contract: for the customer, working with the two cloud platforms is completely transparent.

    Conclusion


    Even after asking the candidate provider this whole list of questions and getting answers, you should not stop there. Ask for demo access to verify in practice everything you have heard, and then test the performance of servers, storage and networks on your own tasks. Testing is the clearest indicator of whether a particular provider suits you: does it stand behind its words, and do its promises match reality?

    But that is not all. Unfortunately, many aspects of working with a cloud platform cannot be felt at the testing stage; some surprises will inevitably come up during long-term operation of the service. These surprises may turn out to be minor, or so noticeable that further work with the provider is called into question.

    At the end of your discussions with the provider, always raise the question of migrating away to another service provider. Find out what options exist for retrieving your data, converting virtual machines to the required format, and so on.
