Save on matches: how to increase locality in OpenStack using filters
Author: Alexey Ovchinnikov
Quite often, when creating a virtual machine in the cloud, one wants to associate it with some storage device, and just as often one wants that machine to run as fast as possible. When a storage device is attached to a virtual machine (VM), the exchange of data between the two can significantly degrade the performance of the pair. It is clear, then, that if the storage device sits on the same physical node where the VM is deployed, the latency will be minimal. What is not obvious is how to achieve such a convenient placement using the OpenStack platform.
Unfortunately, OpenStack does not yet provide the means for such fine-grained tuning by default; however, being an open and easily extensible platform, it lets you add this functionality yourself. In this post I will discuss how such add-ons can be implemented and the pitfalls that may come up during their development and use.
I will begin with a simple question: how can a VM be placed on a particular node in the first place?
As everyone (probably) knows, the scheduler (the nova-scheduler component) is responsible for placing VMs on nodes, so to reach our goal we need to modify its behaviour so that it takes the distribution of storage devices into account. The standard approach is to use scheduler filters. Filters influence the scheduler's choice of node and can be controlled from the command line by passing them the characteristics that the selected nodes must satisfy. There are several standard filters that cover a fairly wide class of scheduling tasks; they are described in the OpenStack project documentation. For less trivial tasks there is always the option of developing your own filter, and that is what we will do now.
A few words about filters
The general idea of scheduling with filtering is quite simple: the user specifies the characteristics that a node must satisfy, and the scheduler selects the set of nodes that match them. The VM is then started on one of the nodes selected at this stage; exactly which one is determined by its load and a number of other characteristics that do not matter at the filtering stage. Let us consider the filtering procedure in more detail.
Quite often several filters are present in the system at once. The scheduler first builds a list of all available nodes, then applies each filter to this list, discarding unsuitable nodes on every pass. In such a model the task of a filter is very simple: look at the node it is given and decide whether the node meets the filtering criterion. Each filter is an object of one of the filter classes, which has at least one method, host_passes(). This method takes the node and the filtering criteria as input and returns True or False depending on whether the node satisfies the given criteria. All filter classes must inherit from the base class BaseHostFilter defined in nova.scheduler.filters. At startup the scheduler imports all the modules specified in the list of available filters. Then, when a user sends a request to start a VM, the scheduler creates an object of each filter class and uses these objects to weed out unsuitable nodes. It is important to note that these objects exist only for the duration of a single scheduling session.
As an example, consider the RAM filter, which selects nodes with enough memory. It is a standard filter with a fairly simple structure, so more sophisticated filters can be built on its basis:
class RamFilter(filters.BaseHostFilter):
    """Ram Filter with over subscription flag"""

    def host_passes(self, host_state, filter_properties):
        """Only return hosts with sufficient available RAM."""
        instance_type = filter_properties.get('instance_type')
        requested_ram = instance_type['memory_mb']
        free_ram_mb = host_state.free_ram_mb
        total_usable_ram_mb = host_state.total_usable_ram_mb

        memory_mb_limit = total_usable_ram_mb * FLAGS.ram_allocation_ratio
        used_ram_mb = total_usable_ram_mb - free_ram_mb
        usable_ram = memory_mb_limit - used_ram_mb
        if not usable_ram >= requested_ram:
            LOG.debug(_("%(host_state)s does not have %(requested_ram)s MB "
                        "usable ram, it only has %(usable_ram)s MB usable ram."),
                      locals())
            return False

        # save oversubscription limit for compute node to test against:
        host_state.limits['memory_mb'] = memory_mb_limit
        return True
To determine whether a given node is suitable for the future VM, the filter needs to know how much RAM is currently available on that node and how much memory the VM requires. If the node turns out to have less free memory than the VM needs, host_passes() returns False and the node is removed from the list of available nodes. All the information about the state of the node is contained in the host_state argument, while the information needed to make the decision is placed in the filter_properties argument. Constants that reflect a general scheduling policy, such as ram_allocation_ratio, can be defined elsewhere, in configuration files or in the filter code, but this is largely unimportant, because everything needed for scheduling can be passed to the filter by means of so-called scheduler hints.
Scheduler hints
Scheduler hints are nothing more than a dictionary of key-value pairs contained in every request generated by the nova boot command. If nothing is done, this dictionary stays empty and nothing interesting happens. If the user decides to pass some hint and thereby add to the dictionary, this is easily done with the --hint key, as in the following command: nova boot … --hint your_hint_name=desired_value. Now the dictionary of hints is no longer empty; it contains the given pair. If some scheduler extension knows how to use this hint, it has just received information it should take into account; if there is no such extension, nothing happens once again. The second case is not as interesting as the first, so let us focus on the first and see how an extension can take advantage of the hints.
To use the hints, they obviously need to be extracted from the request. This procedure is also quite simple: all the hints are stored in the filter_properties dictionary under the scheduler_hints key. The following snippet fully explains how to retrieve a hint:
scheduler_hints = filter_properties['scheduler_hints']
important_hint = scheduler_hints.get('important_hint', False)
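In a filter, this snippet would typically live inside host_passes(). The following minimal sketch shows the whole picture; the class name and the important_hint key are made up purely for illustration and do not exist in nova:
from nova.scheduler import filters


class ImportantHintFilter(filters.BaseHostFilter):
    """Sketch: react to a hypothetical 'important_hint' scheduler hint."""

    def host_passes(self, host_state, filter_properties):
        scheduler_hints = filter_properties['scheduler_hints']
        important_hint = scheduler_hints.get('important_hint', False)
        # No hint supplied: stay neutral and let the host through.
        if not important_hint:
            return True
        # Otherwise compare the hint with some property of the host,
        # for example its name.
        return host_state.host == important_hint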
In nova, scheduler_hints is always present in the request, so no unpleasant surprises await you here when developing your extension; you should, however, be careful when reading the value of a particular hint. We can now receive arbitrary hints; to reach our original goal, it remains to discuss how to use them to
Improve the affinity between VMs and their storage!
With this knowledge of how to extend the scheduler, you can easily design a filter that runs VMs on the very nodes where the storage devices of interest to the user are physically located. Obviously, we need some way to identify the storage device we are going to use, and here the volume_id string, unique for each device, comes to our aid. From the volume_id we must somehow obtain the name of the node the volume belongs to, and then select that node at the filtering stage. Both of these tasks have to be solved by the filter, and for the whole thing to work, the filter must be told which volume we care about with the help of an appropriate hint.
First, let us use the hint mechanism to pass volume_id to the filter; we agree to use the name same_host_volume_id for this hint. Having solved this simple task, we immediately run into a less obvious one: how do we find the host name, knowing only the identifier of the storage device? Unfortunately there seems to be no easy way to do this on our own, so we will ask for help from the component responsible for data storage: cinder. There are many ways to use cinder's services: for example, a combination of API calls could fetch the metadata associated with a given volume_id, and the node name could then be extracted from it. This time, however, we will take a simpler route and rely on the cinderclient module to generate the necessary requests and work with what it returns:
volume = cinder.cinderclient(context).volumes.get(volume_id)
vol_host = getattr(volume, 'os-vol-host-attr:host', None)
It should be noted that this approach only works for the Grizzly release and later, since the cinder extension that exposes the information we are interested in is only available there.
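Putting the hint handling and the cinder call together, a minimal sketch of such a filter could look as follows. This is only an illustration of the idea, not the exact code of the VolumeAffinityFilter shipped in the packages mentioned below; in particular, the import paths and the use of filter_properties['context'] reflect Grizzly-era nova and may differ elsewhere:
from nova.scheduler import filters
from nova.volume import cinder


class VolumeAffinityFilter(filters.BaseHostFilter):
    """Sketch: pass only the host that owns the volume named in the
    same_host_volume_id scheduler hint."""

    def host_passes(self, host_state, filter_properties):
        context = filter_properties['context']
        scheduler_hints = filter_properties.get('scheduler_hints') or {}
        volume_id = scheduler_hints.get('same_host_volume_id')

        # Without the hint the filter stays neutral and passes every host.
        if not volume_id:
            return True

        # Ask cinder which host the volume lives on (Grizzly and later).
        volume = cinder.cinderclient(context).volumes.get(volume_id)
        vol_host = getattr(volume, 'os-vol-host-attr:host', None)

        return vol_host is not None and vol_host == host_state.host
To actually use such a filter, it would be registered in nova.conf via scheduler_available_filters, added to scheduler_default_filters, and the desired placement then requested with something like nova boot … --hint same_host_volume_id=<volume-id>.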
The further implementation is trivial: compare vol_host with the name of the host under consideration and return True only when they match, just as the sketch above does. Implementation details can be found either in the package for Grizzly or in the implementation for Havana. After some reflection on the resulting filter, a question inevitably arises:
Is that the best you can do?
No, the approach considered here is neither optimal nor the only one possible. In a straightforward implementation there is the problem of repeated calls to cinder, which are quite expensive, as well as a number of other issues that slow the filter down. These problems are insignificant for small clusters but can lead to noticeable delays when working with a large number of nodes. To improve the situation, the filter can be modified: for example, by adding a cache for the host name, so that a single cinder call per VM boot is enough, or by adding flags that effectively switch the filter off as soon as the desired node has been found.
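As an illustration of the first idea, the volume-to-host lookup can be cached in filter_properties, which nova hands unchanged to every host_passes() call of one scheduling request. This is again only a sketch under that assumption; the class name and the _volume_host_cache key are invented for the example:
from nova.scheduler import filters
from nova.volume import cinder


class CachingVolumeAffinityFilter(filters.BaseHostFilter):
    """Sketch: look the volume's host up in cinder only once per request."""

    def host_passes(self, host_state, filter_properties):
        scheduler_hints = filter_properties.get('scheduler_hints') or {}
        volume_id = scheduler_hints.get('same_host_volume_id')
        if not volume_id:
            return True

        # The same filter_properties dict is passed to every host_passes()
        # call of one scheduling request, so it can serve as a per-request
        # cache; '_volume_host_cache' is an ad hoc key, not a nova convention.
        cache = filter_properties.setdefault('_volume_host_cache', {})
        if volume_id not in cache:
            context = filter_properties['context']
            volume = cinder.cinderclient(context).volumes.get(volume_id)
            cache[volume_id] = getattr(volume, 'os-vol-host-attr:host', None)

        vol_host = cache[volume_id]
        return vol_host is not None and vol_host == host_state.host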
To summarize, I note that VolumeAffinityFilter is just the beginning of work on using locality to improve cloud performance, and there is room for development in this direction.
Instead of an afterword
The example I have examined shows how to develop a filter for the nova scheduler that has one feature distinguishing it from the others: it uses the API of another OpenStack component to do its job. Along with the added flexibility, this approach can hurt overall performance, since the services may be located far from one another. A possible way out of this fine-tuning problem would be to merge the schedulers of all services into a single one with access to all the characteristics of the cloud, but at the moment there is no simple and efficient way to do this.
Original article in English