DNS-based fault tolerance
Setting up fault tolerance inside a single data center is easy: there are plenty of tools and techniques for it.
But what if you need fault tolerance across several data centers?
Below I describe what is, in my opinion, an elegant and very cheap solution, though not without drawbacks, of course.
The idea is that each data center runs its own NS server, and that server answers only with the IP addresses of its own data center.
Now in pictures, which IMHO makes it clearer and easier to follow...
So, here is what happens when a browser tries to open the web page (simplified version):
If a DNS server does not respond, the DNS client queries the next NS server:
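The failover step can be sketched roughly as follows. This is a simplified simulation, not a real resolver: the NS names, the addresses (taken from the documentation ranges), and the `alive` set are all made up for illustration, and real resolvers typically pick between NS servers by measured response time rather than strictly in order.

```python
# Simplified simulation of how a DNS client falls back between NS servers.
# All names and addresses here are hypothetical.

# Each data center's NS server answers only with its own data center's IPs.
NS_ANSWERS = {
    "ns1.example.com": ["192.0.2.10"],                      # data center 1
    "ns2.example.com": ["198.51.100.10", "198.51.100.11"],  # data center 2
    "ns3.example.com": ["203.0.113.10"],                    # data center 3
}


def resolve(nameservers, alive):
    """Ask each NS server in turn and return the first answer received.

    `alive` is the set of NS servers that currently respond; a server
    missing from it stands in for a query that timed out.
    """
    for ns in nameservers:
        if ns in alive:
            return ns, NS_ANSWERS[ns]
    raise RuntimeError("no name server responded")


servers = list(NS_ANSWERS)

# All data centers are up: the first NS server asked answers.
print(resolve(servers, alive=set(servers)))

# Data center 1 is down: the client moves on to the next NS server.
print(resolve(servers, alive={"ns2.example.com", "ns3.example.com"}))
```

The point of the sketch is that the client never needs to know which data center is healthy: a dead data center simply stops answering DNS queries, so its addresses stop being handed out.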
Zone settings for each data center:
Note that some data centers can have more than one front end.
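As a rough sketch of what the per-data-center zones might look like, the script below renders one zone file per data center from shared data. Everything in it is an assumption for illustration: the domain `example.com`, the `dc1`/`dc2` labels, the NS host names, the SOA values, and the addresses. It also assumes that every copy of the zone lists all NS servers, so that clients have somewhere to fail over to.

```python
# Sketch: render one zone file per data center from shared data.
# The domain, NS names, SOA values, and addresses are hypothetical placeholders.

DOMAIN = "example.com"
NS_HOSTS = {"ns1": "192.0.2.1", "ns2": "198.51.100.1"}  # one NS per data center

# Front-end addresses per data center; dc2 has more than one front end.
FRONTENDS = {
    "dc1": ["192.0.2.10"],
    "dc2": ["198.51.100.10", "198.51.100.11"],
}


def render_zone(dc, serial=1):
    """Return zone file text in which `www` points only at `dc`'s front ends."""
    lines = [
        "$TTL 60 ; 1 minute",
        f"@ IN SOA ns1.{DOMAIN}. hostmaster.{DOMAIN}. ({serial} 3600 600 86400 60)",
    ]
    # Every copy of the zone lists all NS servers, so clients can fail over.
    for ns, ip in NS_HOSTS.items():
        lines.append(f"@ IN NS {ns}.{DOMAIN}.")
        lines.append(f"{ns} IN A {ip}")
    # Only this part differs between data centers.
    for ip in FRONTENDS[dc]:
        lines.append(f"www IN A {ip}")
    return "\n".join(lines) + "\n"


for dc in FRONTENDS:
    print(f"; zone file served in {dc}")
    print(render_zone(dc))
```

Only the `www` records differ between the generated files, which is what makes the zones easy to template.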
That, in general, is the idea. You can build a lot of interesting things on top of it.
Advantages:
- If a data center goes down, all clients move to the working sites within a minute.
- If you need to carry out maintenance, stop named, wait a minute, and you can get to work.
Disadvantages:
- A very small share of clients will still keep hitting the "turned off" data center.
- A separate zone file has to be maintained for each data center, but this is easy to automate, for example with Puppet.
- The load is not distributed evenly, but it is tolerable.
P.S. Be sure to put this in the zone file:
$TTL 60 ; 1 minute
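
To see why the one-minute TTL matters, here is a toy model of a caching resolver. It is only a sketch under simplifying assumptions: time is a plain number of seconds, the addresses are hypothetical, and the resolver is assumed to honour the TTL exactly, which not every real resolver does.

```python
# Sketch: why a 60-second TTL bounds how long clients keep a dead data center.
# Times are plain numbers of seconds; addresses are hypothetical.

TTL = 60  # matches "$TTL 60" in the zone file


class CachingResolver:
    """Toy resolver cache that honours the record TTL."""

    def __init__(self):
        self.cached_ip = None
        self.expires_at = 0

    def lookup(self, now, authoritative_ip):
        # Re-query the authoritative side only when the cached answer expired.
        if self.cached_ip is None or now >= self.expires_at:
            self.cached_ip = authoritative_ip
            self.expires_at = now + TTL
        return self.cached_ip


resolver = CachingResolver()
dc1, dc2 = "192.0.2.10", "198.51.100.10"

print(resolver.lookup(now=0, authoritative_ip=dc1))   # cached from data center 1
# Data center 1 fails at t=10; only data center 2's NS server still answers.
print(resolver.lookup(now=30, authoritative_ip=dc2))  # still the stale dc1 answer
print(resolver.lookup(now=60, authoritative_ip=dc2))  # TTL expired, now dc2
```

With a longer TTL, the stale answer in the middle step would be served for correspondingly longer.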