Introducing the Content Delivery Network

    Contents: what is a CDN? How did it come about? Why is it needed? Who needs it and who doesn’t? Entry threshold and costs. Key technologies.

    CDN stands for “content delivery network”. Most often it is a large number of servers with specialized software that speed up the delivery of content to the end user. The servers are located all over the world so that the response time for site visitors is minimal. “Content” most often means video and the static elements of websites (those that require no server-side code or database queries, such as CSS and JS), but less expected things also count as content: for example, games on Steam (which uses a CDN to distribute games), operating system updates, and so on.



    A bit of history

    The sharp growth of the Internet in the mid-90s meant that the servers of those years could not handle the load alone (how much can even a powerful dual-processor Pentium Pro server at 266 MHz with 128 megabytes of memory serve?). The limits of server performance and the ever-growing demand for capacity gave rise to now-forgotten terms: “server farm”, “hierarchical caching”... IT newspeak ages surprisingly badly, and phrases like “server farm” or “information superhighway” are now associated with cozy old CRT monitors rather than with progress. While various solutions were being developed and deployed, one important distinction was noticed: there are two types of content, static and dynamic.

    Dynamic content is generated by the server at the moment it receives the request, most often with the active involvement of a database. The line “page was generated in 0.333 seconds” at the bottom of a page is a telltale sign of dynamic content.

    Static content sits on the server ready-made: no matter who sends the request, the server returns the same thing (subject to any ACLs). What matters is that the content does not change from request to request.
    Static and dynamic content load the server in different ways. When serving dynamic content, the processor, IO (for the database) and some memory matter. When serving static content, the processor hardly matters, IO matters only for files that are not cached, and the main requirement is network speed. You can make the servers that serve dynamic content serve static files too, but the two roles get in each other's way. It gets especially painful when IO from static files starts to compete with IO from dynamic content, and the interrupt (IRQ) load slows down the execution of dynamic scripts.

    An even more important detail: dynamic content usually implies “state” (a session and its associated data), while static content has none. Static content can be scaled horizontally without complex two-way synchronization with a central server. With dynamic content this does not work: you need either a shared database or synchronization and locking mechanisms.

    Medium and large companies began serving static and dynamic content from different servers in different parts of the world, reducing the load on the dynamic sites by moving static content off them onto easily scalable servers. From there it was a small step to outsourcing static delivery, and companies began to appear that made static delivery the basis (or at least a large part) of their business.

    The main thing


    Note that a CDN solves a problem even more important than making life easier for application servers.
    All modern CDNs place copies of content on servers around the world and direct each client to the server closest to it. The result is lower latency, that is, a shorter delay between request and response. If a page contains many images (even small ones), then the sooner they reach the client, the sooner the client sees the page. And if we leave aside the poor souls on dial-up or GPRS, the time it takes to display a page is determined almost exclusively by network delay. Over distances of hundreds of kilometers (~10 ms delay) this is not significant. But over intercontinental distances, a delay of hundreds of milliseconds (up to 500-600!) starts to play a decisive role. And if the content is served from a server a few kilometers from the user, a miracle happens: Australia sees data from a US site within a few milliseconds, China from a Russian site, France from a Brazilian site. All without touching the ocean cables.
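
    A back-of-the-envelope sketch of why round-trip time dominates page display time. It rests on simplifying assumptions that are not from any measurement: the browser fetches assets over a fixed number of parallel connections, each asset costs one round trip, and transfer time is negligible. The function name and the default of 6 connections are illustrative.

```python
import math


def fetch_time_ms(num_assets, rtt_ms, parallel_connections=6):
    """Estimate the time to fetch all page assets, counting only round trips.

    Assumption: assets are fetched in batches of `parallel_connections`,
    one round trip per asset, with transfer time ignored.
    """
    batches = math.ceil(num_assets / parallel_connections)
    return batches * rtt_ms


# 60 small images: an intercontinental origin vs a nearby point of presence.
far_ms = fetch_time_ms(60, rtt_ms=500)
near_ms = fetch_time_ms(60, rtt_ms=10)
print(f"far origin: {far_ms} ms, nearby PoP: {near_ms} ms")
```

    Under these assumptions the same page takes seconds from another continent and a fraction of a second from a nearby server, regardless of how fast the origin itself is.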

    This also works on a smaller scale. For example, Yandex at one point used a CDN to significantly speed up its mail service in the regions of Russia from which it is a long, long way over fiber to Moscow.

    Accelerated content delivery has become the main killer feature of CDNs, and everything else (load reduction, load balancing, etc.) has become secondary: important, but not critical. In the end, any load problem can be solved by throwing money at it. But no amount of money will get a signal from Perm to San Francisco in tens of milliseconds without local points of presence.

    Cost savings are not the killer feature, but they matter too. In some situations a CDN can save a significant amount on traffic. Transferring files to another continent once, keeping them there on a local server and serving them over local links is cheaper than pushing the same traffic across the Atlantic ten thousand times. Most often people start thinking about savings only when they become critical (video hosting above all).

    However, servers around the world, a system for synchronizing content, directing clients to nearby servers and so on: none of this is free. Most often CDNs charge a premium compared to regular uplink traffic, although in some regions CDN traffic may turn out cheaper than uplink traffic (which, if anything, suggests that the Internet in that region is in poor shape).

    How does it work in practice?


    From the site visitor's side: the visitor goes to example.com and receives an HTML page. In that page, all the CSS, JS, images and videos point to cdn.example.com, and the content is loaded from there. When the visitor's browser requests that address, the magic of BGP sends the request to the nearest point of presence. The magic is this: the visitor's provider receives several announcements for the IP network that contains cdn.example.com, from different networks (each of which hosts a point of presence), and the provider's router picks the closest of them. As a result the request goes to the nearest server, which answers it, and the answer travels back the same way, also along a short route.
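
    The route choice described above can be illustrated with a deliberately simplified sketch. Real BGP best-path selection weighs many attributes; here the only criterion is the length of the AS path, which is an assumption made for illustration. The prefix comes from the documentation range, the AS numbers from the private-use range, and the point-of-presence labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str      # the CDN's IP network
    pop: str         # hypothetical label of the point of presence
    as_path: tuple   # AS numbers the announcement travelled through


def select_route(announcements):
    """Pick the announcement with the shortest AS path (simplified BGP)."""
    return min(announcements, key=lambda a: len(a.as_path))


# The same prefix is announced from three points of presence.
seen_by_provider = [
    Announcement("203.0.113.0/24", "san-francisco", (64512, 64520, 64530, 64600)),
    Announcement("203.0.113.0/24", "frankfurt", (64513, 64600)),
    Announcement("203.0.113.0/24", "singapore", (64514, 64521, 64600)),
]
best = select_route(seen_by_provider)
print(best.pop)
```

    The visitor's browser sees a single address; which physical server answers depends entirely on which announcement the provider's router prefers.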

    From the site owner's side there are two options:
    1. Static files are uploaded to an object store via FTP, SCP or another convenient method. In the control panel the object store is assigned a DNS name (either the owner's own or one issued by the provider, depending on the technology), and that name is used in the HTML page.
    2. The site owner specifies an 'origin' for the domain. After that, on a client's request, the CDN goes to the site it is attached to, downloads the files to itself and serves them to the user's browser.
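
    The second option (pulling from an origin) boils down to a cache that goes to the origin only on a miss. A minimal sketch follows; the class name is hypothetical, and the origin is simulated with a dictionary rather than a real HTTP request.

```python
class PullThroughCache:
    """Edge cache that fetches from the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: path -> bytes
        self._store = {}
        self.origin_requests = 0

    def get(self, path):
        if path not in self._store:
            self.origin_requests += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


# A stand-in origin; a real one would be an HTTP request to the site.
origin_files = {"/css/site.css": b"body { margin: 0 }"}
edge = PullThroughCache(origin_files.__getitem__)

first = edge.get("/css/site.css")    # miss: goes to the origin
second = edge.get("/css/site.css")   # hit: served from the edge copy
print(edge.origin_requests)
```

    The sketch leaves out expiry and invalidation, which a real CDN has to handle.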

    Magically, the data is available to the client much faster than the main html page.

    By the way, the main HTML page can also be static. For example, pages on github.io work on this principle: it is a pure CDN, and everything on it is served statically.

    Who needs a CDN?


    Those who need to serve static content quickly to many visitors who are far from the company's servers. (The situation is even more acute for companies whose visitors are scattered over a large territory: even moving the servers “closer” makes no sense, because the majority will still be “far away”.)

    Those who have a very large volume of files, where the cost of CDN traffic is lower than the cost of traffic through their own uplinks (large sites usually pay different rates for different traffic: local is cheaper, “global” is more expensive).

    Those who have reached a bandwidth at which moving static content to a CDN is more cost-effective than upgrading network equipment. Static content usually takes up a significant share of the bandwidth, and instead of upgrading from 1G to 10G, or from 10G to 40G, it is much cheaper to push 80% of the traffic onto the CDN and stay on reasonably priced servers.
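
    The arithmetic behind the last point, as a small sketch. The peak load of 3000 Mbit/s is a made-up figure; the 80% share is the one mentioned above.

```python
def uplink_after_offload(peak_mbit, static_share):
    """Peak uplink load left after moving static traffic to a CDN."""
    return peak_mbit * (1 - static_share)


peak = 3000   # hypothetical peak load in Mbit/s, too much for a 1G link
remaining = uplink_after_offload(peak, static_share=0.8)
fits_1g = remaining <= 1000
print(f"{remaining:.0f} Mbit/s left on the uplink, fits 1G: {fits_1g}")
```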

    Differences


    If CDNs themselves are now clear, what about their providers? There are many companies, and they differ in price, services and quality.
    Here are the main factors to settle for yourself when choosing a provider:

    1. Number of points of presence (PoPs)
    The more points, the better. But why would you need points of presence in China if the site is in Russian? And what use are points of presence in Australia when you are entering the American market? When comparing CDNs, count the points of presence in the countries and regions you care about. Mere assurances of many points of presence and good connectivity are not enough: for an informed choice you need to see the list of points of presence and compare it with the site's potential audience.

    The points of presence themselves are not equal either: connectivity and peering agreements with local providers are very important. Unfortunately, it is rather difficult for outsiders to assess connectivity (you need to understand the balance of power in the local provider market), but when comparing offers it is worth asking each candidate for its list of peers at the points of presence most important to you.

    2. Caching policy
    To deliver content quickly from a local server, the content first has to get onto that server (and stay there). There are many caching schemes; here are the most obvious:
    • Replication of all content. The upside: it is fast right away, even the first request is served quickly. The downside: it is expensive.
    • Replication on first request (the most common scheme). The first request is slow, subsequent ones are fast. It can be expensive, depending on the retention policy.
    • Asynchronous replication once a certain access threshold is exceeded. A more economical option, but more clients get the slower service.
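
    The third scheme can be sketched as a per-object request counter. This is an illustration under assumptions, not any provider's implementation: the class name is hypothetical, and the copy to the edge is modelled as instantaneous where a real system would do it asynchronously.

```python
from collections import Counter


class ThresholdReplicator:
    """Replicate an object to the edge once it has been requested often enough."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.hits = Counter()
        self.replicated = set()

    def request(self, path):
        """Return where the request was served from: 'edge' or 'origin'."""
        if path in self.replicated:
            return "edge"
        self.hits[path] += 1
        if self.hits[path] >= self.threshold:
            # A real system would schedule the copy in the background.
            self.replicated.add(path)
        return "origin"


pop = ThresholdReplicator(threshold=3)
served_from = [pop.request("/video/intro.mp4") for _ in range(5)]
print(served_from)
```

    With a threshold of 1 this degenerates into replication on first request; a higher threshold saves edge storage at the price of more slow responses.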

    Closely tied to the caching policy is the retention policy: when exactly is an object deleted from the server at a point of presence? On a timeout, when the number of requests drops below a certain value, “never”, after a fixed time? And who pays for storing the copy?

    3. SLA
    Yes, yes, the legendary and boundless Service Level Agreement. Before you rejoice at a long row of nines, find out: is the SLA for the CDN “as a whole” or for every point of presence? If the server in the location most important to you breaks down and content is delivered “from a neighboring country”, will the SLA count that as downtime? And above all, what does the provider risk by failing to meet the SLA? Will they refund a few pennies from the monthly payment, or are there substantial penalties?
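
    To put the “row of nines” in perspective, a small sketch converts an availability percentage into the downtime it permits. A 30-day month is assumed; how downtime is actually measured is defined by each provider's SLA.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Minutes of downtime per period that still satisfy the availability figure."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


for nines in (99.0, 99.9, 99.99):
    print(f"{nines}% -> {allowed_downtime_minutes(nines):.1f} min/month")
```

    Each extra nine cuts the permitted downtime tenfold, which is why it matters whether the figure applies to the whole network or to the one location you depend on.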

    By the way, even though the sales manager will resist, it is great if you can get failure statistics for the preceding period. There will be failures, and they happen to everyone (hint: if you are told that a provider has never had an outage, it is either very young or very brazen). What matters is how long they last and how often they occur.

    4. Value added services
    A CDN may provide additional services. Examples (the list is not exhaustive):
    • Real-time reporting of individual node failures
    • Analytics
    • Integration with CMS
    • DRM for content
    • A ready-made HTML / Flash video player for video files, with support for CDN features
    • Management of caching policies


    It is very important to check for support of the protocols and file types you need. Find out whether the provider supports streaming of Flash and media files (RTMP, RTSP) if that is the kind of content you plan to deliver.

    Perhaps the provider is very good in everything else, but if it does not support the technologies you need, you are unlikely to like it.

    5. Technical nuances
    Forwarding technology: this is either anycast at the DNS level or forwarding through redirects. Anycast, for obvious reasons, is faster.

    Forwarding accuracy: unfortunately, the provider itself cannot evaluate this objectively, yet it is a very important metric: what share of the target audience lands on the nearest server. “Nearest” usually means lowest expected delay, since nobody cares about the physical distance but everybody cares about packet transit time. For example, the junction between two networks may be overloaded so that packets crawl, and then it is better to go a little further but faster.

    6. Accounting
    How exactly does the provider charge? Per megabyte transferred or per megabyte per second of bandwidth? Is there a minimum commit (“if you used less than the amount stipulated in the agreement, you pay up to the minimum”)? What happens on overcommit (exceeding the limit): are you disconnected or charged more? Is there a minimum contract period? Is there a contract at all (concluded between the site owner and the CDN provider), or is it automatic self-service on-demand provisioning, that is, “put money into the account and got a control panel”?
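
    How a minimum commit and an overage rate shape the bill can be shown with a short sketch. All figures and parameter names are hypothetical; the model assumes the overcommit is billed at a separate rate rather than leading to disconnection.

```python
def monthly_bill(used_gb, price_per_gb, commit_gb=0, overage_price_per_gb=None):
    """Bill for one month under a minimum-commit agreement.

    The committed volume is always paid for; anything above it is charged
    at the overage price (assumed equal to the base price by default).
    """
    if overage_price_per_gb is None:
        overage_price_per_gb = price_per_gb
    overage_gb = max(0, used_gb - commit_gb)
    return commit_gb * price_per_gb + overage_gb * overage_price_per_gb


# A quiet month still costs the full commit; a busy one adds overage.
quiet_month = monthly_bill(400, price_per_gb=0.05, commit_gb=1000)
busy_month = monthly_bill(1500, price_per_gb=0.05, commit_gb=1000,
                          overage_price_per_gb=0.08)
print(f"quiet: ${quiet_month:.2f}, busy: ${busy_month:.2f}")
```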

    At what volumes does it make sense to think about a CDN?


    To repeat the point: if you need to serve clients quickly, the amount of traffic no longer matters; what matters is having points of presence close to the target audience.

    If there is no pressing need for low latency and the CDN is used to take load off the server, then the traffic volume at which it makes sense to start thinking about a CDN is a few terabytes per month.

    The main question is: how much does it cost?


    The price depends heavily on the specifics of the CDN, how “cool” the provider is, and how far the CDN is adapted to particular special needs. Market prices range from $1 to $140 per megabit of bandwidth, or $0.03-$0.30 per GB of traffic. The actual price very often depends on the added services and capabilities of the CDN. Traffic in the USA and Europe is usually the cheapest, followed by Asia / Australia; traffic outside these regions is the most expensive.
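
    To compare a per-megabit offer with a per-gigabyte one, the two need a common unit. The sketch below assumes that the per-megabit price is charged per Mbit/s per 30-day month (the article does not spell this out) and that the link runs at a given average utilization; both are assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # a 30-day month


def gb_per_mbit_month(utilization=1.0):
    """GB moved in a month over 1 Mbit/s at the given average utilization."""
    bits = 1_000_000 * SECONDS_PER_MONTH * utilization
    return bits / 8 / 1_000_000_000


def price_per_gb(price_per_mbit, utilization=1.0):
    """Effective per-GB price of bandwidth bought per Mbit/s per month."""
    return price_per_mbit / gb_per_mbit_month(utilization)


print(f"{gb_per_mbit_month():.0f} GB per Mbit/s per month at full load")
print(f"${price_per_gb(10, utilization=0.5):.3f} per GB at $10/Mbit, half-loaded")
```

    The lower the average utilization of the purchased bandwidth, the more expensive each delivered gigabyte effectively becomes.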

    Market Overview


    All companies fall into two categories: those that work at published public tariffs and those that work on the basis of individual agreements. Companies in the second category are extremely difficult to compare, since their terms can vary greatly. However, “private” does not mean “small”: private companies often have very large clients with huge volumes, hundreds of terabits of bandwidth, and they do not bother with “small fry” at a dozen gigabits.

    Here is a list of popular CDNs (so as not to offend anyone, the list is in random order):

    Public CDNs:
    • NetDNA, 2009, minimum contract 1 year, prices from 1¢ to 6¢ per GB depending on volume, traffic outside the EU / US is one and a half times more expensive, free storage
    • Rackspace Cloud Files - 4¢-12¢ per GB of traffic, 10¢ per GB of storage (resells Akamai)
    • MaxCDN - from 3¢ to 8¢ per GB of traffic
    • Amazon CloudFront - EU / US: 6¢ to 12¢ per GB, free storage
    • CacheFly - 20¢-30¢, minimum contract $99/month, excess storage space is charged ($15/GB)
    • CDN77 - 3¢-15¢ per GB
    • CloudFlare - traffic is not charged; different service levels cost different amounts, from the free basic tier, through the next one at $5/month, up to $200/month for the top tier
    • BitGravity - from 7¢ to 20¢ per GB, depending on volume and region
    • Level 3 - from $100 per month, 10¢-25¢ per GB
    • Leaseweb - from 6¢ to 8¢ per GB, with a minimum charge of $60/month
    • Windows Azure CDN - 3¢ to 20¢
    • CDNsun - from 3¢ to 5¢ per GB

    Private CDNs:
    • Internap
    • Akamai
    • Limelight Networks
    • AT&T
    • Peer1
    • Edgecast


    Additional Information


    This article was written with the support of our colleagues at UCDN, who are too modest to include themselves in the list above.
