How does Cisco monitor security on its internal network?
From a cybersecurity standpoint, we usually face only three main tasks. They are, of course, broken down into smaller subtasks and projects, but, simplifying a little, there are only three: preventing threats, detecting threats, and responding to them.
After gaining experience with nfdump and OSU Flow Tools, and as we moved to IPv6 and NetFlow version 9, we began looking for a tool free of the shortcomings we had encountered. That tool was Stealthwatch from Lancope, a company that Cisco later acquired. Stealthwatch follows the classic “sensor - collector - analyzer” architecture.
Our sensors are the network infrastructure itself, which carries all internal traffic, converts it into NetFlow and sends it to collectors for analysis. As I wrote above, network equipment does not always support NetFlow or process it efficiently. For those cases we use dedicated hardware or virtual FlowSensor appliances (we have 13 in total). Given Cisco's geographically distributed infrastructure, we do not funnel all flows into one or two collectors; instead we run a distributed cluster of 21 FlowCollectors, which process about 20 billion NetFlow flows every day in search of malicious activity in our corporate backbone and data centers. We have only two management consoles, and Cisco incident response staff access them according to their roles.
Perhaps the main obstacle to using open source NetFlow monitoring tools effectively in our network (and in general) was their lack of proper analytics. They offer flexible filtering and selection, but without a human they cannot decide whether network flows contain a problem. Stealthwatch does not have this drawback: its key advantage is a built-in library of algorithms that evaluate NetFlow and recognize various security violations in it - network scanning, DoS, malware propagation, data leakage, and so on.
Key scenarios in which we use Stealthwatch (actually there are more):
NetFlow is an ideal source of information for incident investigation: it contains everything needed to establish who did what, when, and how. All of this is stored in a database, and incident response staff can query it, filtering and selecting by the fields they need, and quickly find answers to their questions. Integration with Cisco ISE ties the information not only to the IP addresses of the nodes involved in an incident but also to the company's user accounts in Active Directory. This is not only convenient; it also significantly reduces the time needed to map a user name to the dynamic IP address that user had at a particular moment. And time is a critical success factor in incident investigation.
The second case that shows the power of Stealthwatch is detecting communication with the command-and-control servers of botnets and malware. It might seem that this can be done on the perimeter, but recall how this note began. Today a user can be attacked in many different ways, and not always through the perimeter. What if ransomware made its way into the network through unprotected Wi-Fi and used the same channel to leak information or receive updates? This can only be caught by monitoring internal network traffic, and here Stealthwatch is indispensable. Previously we did this with nfdump, but it had one limitation: the list of command server IP addresses had to be updated manually, collected from different sources. Stealthwatch solves this automatically - it regularly downloads updated indicators of compromise containing information about command servers. The feature is also useful because it tracks the aging of addresses on the list and removes them as necessary. With nfdump we had to do this manually, which took up valuable time.
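To illustrate the idea of a command server list that ages out stale entries, here is a minimal Python sketch. It is not how Stealthwatch implements its indicator feed; the `IndicatorList` class, the 30-day lifetime and the dates are assumptions made for the example, and the two addresses are simply borrowed from the flow-filter example in this note.

```python
from datetime import datetime, timedelta


class IndicatorList:
    """Watchlist of C&C addresses; entries expire unless a feed refreshes them."""

    def __init__(self, max_age):
        self.max_age = max_age
        self._last_seen = {}  # ip -> time a feed last reported this address

    def update(self, ips, now):
        """Record that a feed reported these addresses at time `now`."""
        for ip in ips:
            self._last_seen[ip] = now

    def prune(self, now):
        """Remove addresses not reported for longer than max_age; return them."""
        stale = [ip for ip, seen in self._last_seen.items()
                 if now - seen > self.max_age]
        for ip in stale:
            del self._last_seen[ip]
        return stale

    def __contains__(self, ip):
        return ip in self._last_seen


# Hypothetical feed history: one address is refreshed, the other is not.
t0 = datetime(2024, 1, 1)
watchlist = IndicatorList(max_age=timedelta(days=30))
watchlist.update(["69.50.180.3", "66.182.153.176"], now=t0)
watchlist.update(["69.50.180.3"], now=t0 + timedelta(days=20))
expired = watchlist.prune(now=t0 + timedelta(days=40))
```

In this run only the address that stopped appearing in the feed is dropped, which is the housekeeping that had to be done by hand with nfdump.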
Detecting denial-of-service attacks is another popular use case on our network. Such incidents are not frequent, but they do happen. “Floods”, “storms” and “avalanches” of requests over various network and application protocols are easily detected with Stealthwatch.
Network Traffic Analysis solutions, which include Stealthwatch, have no DLP functionality and cannot inspect the content of communications over various protocols. However, they do offer a way to combat data leaks, based on a slightly different principle. Since NetFlow lets us track the amount of data transferred, we can set, for each node or user, average values for the amount of data that can be downloaded from or uploaded to the Internet over different protocols. Say, for HTTP this figure is 100 MB per day.
Accordingly, exceeding this value is considered an anomaly, and a significant excess, for example by a factor of 5 or more, a clear violation of the security policy. Uploading large amounts of data to cloud storage may mean that the user is trying to steal confidential information. I use the word “may” deliberately, since this can also be a sign of entirely legitimate activity: for example, a user sending a new software distribution, a set of documents or a training video through the cloud. In any case, exceeding the threshold should trigger an investigation.
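The threshold logic just described can be sketched in a few lines of Python. The 100 MB baseline and the factor of 5 come from the text; the function names, the host addresses and the byte counts are hypothetical, and a real product would learn baselines per host rather than use one constant.

```python
MB = 1024 * 1024
BASELINE_BYTES = 100 * MB   # daily HTTP baseline from the example in the text
VIOLATION_FACTOR = 5        # "5 times or more" is treated as a clear violation


def daily_totals(flows):
    """Sum transferred bytes per host from (host, byte_count) flow records."""
    totals = {}
    for host, nbytes in flows:
        totals[host] = totals.get(host, 0) + nbytes
    return totals


def classify(bytes_per_day, baseline=BASELINE_BYTES, factor=VIOLATION_FACTOR):
    """Return 'normal', 'anomaly' or 'violation' for one host's daily volume."""
    if bytes_per_day >= baseline * factor:
        return "violation"
    if bytes_per_day > baseline:
        return "anomaly"
    return "normal"


# Hypothetical one-day sample of HTTP uploads.
flows = [
    ("10.10.71.100", 80 * MB),
    ("10.10.71.100", 60 * MB),
    ("10.10.71.101", 600 * MB),
    ("10.10.71.102", 10 * MB),
]
verdicts = {host: classify(total) for host, total in daily_totals(flows).items()}
```

Either non-normal verdict is only a reason to start an investigation, not proof of a leak.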
Another scenario in which we actively use Stealthwatch is checking access control rules by tracking unauthorized traffic between segments. Segmentation is one of the most useful tools in the information security arsenal: it can significantly reduce the attack surface, localize problems, speed up investigations, and so on. In our network, segmentation is implemented on the network equipment and managed by Cisco ISE. With Stealthwatch we verify that segmentation is configured correctly and spot traffic that should not appear in a particular segment.
The same feature lets you verify the configuration of perimeter firewalls, which may be letting some unauthorized traffic through. In effect, in this use case Stealthwatch becomes an additional tool for monitoring the actions of administrators.
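A segmentation check of this kind can be sketched as follows. The segment names, address ranges and the single allowed pair are invented for illustration, and each flow is assumed to be already expressed as an (initiator, responder) pair; raw unidirectional NetFlow records would first need to be paired up.

```python
import ipaddress

# Hypothetical segments and policy; real ones would come from the access control system.
SEGMENTS = {
    "users": ipaddress.ip_network("10.10.0.0/16"),
    "datacenter": ipaddress.ip_network("10.20.0.0/16"),
    "guest_wifi": ipaddress.ip_network("10.30.0.0/16"),
}
ALLOWED = {("users", "datacenter")}  # (initiator segment, responder segment)


def segment_of(ip):
    """Return the name of the segment containing ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for name, net in SEGMENTS.items():
        if addr in net:
            return name
    return None


def policy_violations(flows):
    """Return flows that cross segments in a way the policy does not allow."""
    found = []
    for src, dst in flows:
        pair = (segment_of(src), segment_of(dst))
        if None in pair or pair[0] == pair[1]:
            continue  # outside known segments, or traffic within one segment
        if pair not in ALLOWED:
            found.append((src, dst, pair))
    return found


flows = [
    ("10.10.71.100", "10.20.5.5"),  # users -> datacenter: allowed
    ("10.30.1.9", "10.20.5.5"),     # guest Wi-Fi -> datacenter: not allowed
    ("10.10.1.1", "10.10.2.2"),     # inside one segment: ignored
]
violations = policy_violations(flows)
```

Any hit means either a real policy violation or a rule that does not match what the administrators intended, and both are worth a look.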
Although Stealthwatch can be used by network engineers and IT specialists, it is intended for security professionals. At Cisco it is operated by our incident response service, Cisco CSIRT. We collect data from 180 key network devices installed in data centers, large corporate hubs and the DMZ, receiving approximately 180 thousand flows per second.
In one of my previous notes I wrote about the APIs available in our products. Stealthwatch has such an API too, and our incident response service uses it very actively. In particular, it is through the API that we update information about the nodes included in particular groups.
It is also through the API that we update information on new malicious nodes, communication with which is monitored by Stealthwatch. Using the API, we integrate Stealthwatch with our open source threat intelligence platform, CRiTS. This allows us, when we receive new indicators of compromise, to distribute them through the API to all the security tools integrated with CRiTS.
The API also lets us pull the events and flows we need from Stealthwatch and pass them to Splunk, which is the main monitoring tool at Cisco and is used, among other things, for more detailed investigations.
An interesting practice that I have not seen anywhere else is the Mobile SOC (Security Operations Center), which we use to monitor security at remote sites - companies we acquire, new plants, partners - or when conducting investigations at sites that are not connected to the central monitoring system. The Mobile SOC is a transportable rack of security equipment that includes not only Stealthwatch but also the NetFlow Generation Appliance, Splunk, Firepower, Web Security Appliance, etc.
We are not resting on what has already been achieved and plan to actively expand the use of Stealthwatch in our infrastructure. Among the priority plans:
On the whole, we must admit that NetFlow analysis with Stealthwatch helps our information security service detect more incidents than the usual set of security tools deployed on the perimeter of the corporate network. You can see this in how the sources of data on incidents in our company have changed. Previously these were mainly attack signatures from IDS; today that source accounts for only one fifth of all incidents. Another fifth comes from behavioral analysis and 40% from indicators of compromise. Detection of the remaining 20% of incidents is based specifically on NetFlow.
- The “Stealthwatch at Cisco” success story (in English)
- A story about using Stealthwatch at Cisco from the head of CSIRT (video)
- The Stealthwatch page on Cisco
- threat prevention
- threat detection
- response to threats.
Whatever solutions we consider, they all fit into these three tasks, which we must carry out everywhere in the corporate network. This threat life cycle (BEFORE - DURING - AFTER) underlies the work of the Cisco information security service. Note also that, since Cisco has no concept of a perimeter, we try to carry out the three tasks described above everywhere - in data centers, in the clouds, in the Wi-Fi segment, on employees' mobile devices, at Internet access points and, of course, in our internal network, whose monitoring we will talk about today.
Why do I need to monitor the internal network?
After all, nobody asks this question about the perimeter (why we have no perimeter, I will tell you some other time), where we put firewalls, IDS, content filtering and many other network security tools. Why should the internal network be an exception? Is it impossible to get into it from the outside, bypassing the perimeter? There are plenty of ways. Through your own unprotected Wi-Fi, or through the access point of the Shokoladnitsa cafe next door, to which the mobile devices of employees who are used to grabbing a coffee or lunch there connect automatically. Through a compromised home laptop, tablet or smartphone that an executive brought into the office “for the IT people to sort out”. Through a flash drive dropped at the office and loaded with undetectable malicious code. Through the encrypted channel of a client-server application that was developed without security in mind. And, in the end, through vulnerabilities in the perimeter itself (let's not assume that our firewall is absolutely invulnerable). In other words, the internal network needs the same protection as the perimeter, on which many organizations unjustifiably concentrate their defenses, completely forgetting the old truth about the weakest link.
Isn't IDS enough?
I have already described how threats are prevented in Cisco's internal network: for this we use the Cisco Identity Services Engine (ISE), which acts as a distributed firewall, turning every switch or router (and of the latter alone we have over 40 thousand) into part of an internal access control system driven by dynamic policies. But segmentation and prevention alone are not enough for us. We must also monitor activity within the connections that are permitted, and watch for any violations of the established internal access rules (we are all human and we all make mistakes). On the perimeter we would install an intrusion detection system (IDS) and consider the problem solved. Installing IDS on the internal network is not so easy, although Cisco would be glad to sell as many Cisco NGIPS sensors as possible, especially since we have once again been named a leader in this market by Gartner. Often it is simply impossible (except in certain places inside the network, such as data centers or individual segments). It is impossible architecturally: not every switch port can have an IDS sensor attached, not every trunk or SPAN port that an IDS is usually connected to can carry the traffic of all switch ports, and the SPAN port is not always free. It is impossible financially: buying high-performance sensors for every switch or router is very expensive. Even at Cisco, although we make NGIPS ourselves, we cannot afford to monitor our whole internal network with IDS. Network taps do not solve all the problems either, architecturally or financially. Moreover, while on the perimeter security staff have somehow learned to live with IT staff (network engineers), in the internal network the conflict continues to smolder. Do we have an alternative to classic network IDS that solves the same problem?
How to monitor the internal network without IDS?
Yes, and it is called NetFlow. This protocol was originally developed by Cisco for diagnosing network problems (troubleshooting) and then became the de facto standard for many network vendors, which either supported NetFlow in their equipment or created their own analogues - Cflow, sFlow, Jflow, NetStream, Rflow, etc. But since today we are talking about how Cisco's internal network is monitored, and we use our own network equipment, we will concentrate only on NetFlow, which is now available on almost any decent enterprise-grade network hardware (do not expect NetFlow support from home devices - at home it is simply not needed and would only make the product heavier and more expensive).
So, NetFlow is available on the network equipment through which all the traffic we need to control passes. This means we can use it not only to detect network problems but also for information security purposes. Moreover, NetFlow support in network equipment means we do not have to build a separate overlay network for monitoring and can use the equipment already in place for security tasks. On the one hand, this protects the investments already made and reduces the cost of the monitoring solution; on the other, it simplifies both the architecture and the implementation - we do not have to steer the right flows to the tens or hundreds of IDS sensors that we would otherwise be forced to install in our network. Working with NetFlow gives us one more advantage that is not immediately obvious. With conventional IDS, we have to solve the problem of directing traffic, or a copy of it, to the sensors. If for some reason this does not happen (for example, because of a topology change or insufficient sensor throughput), we will see nothing at all and will believe that the traffic of interest contains no attacks. The sensor itself works; it simply does not receive and analyze the traffic it should. This cannot happen with NetFlow: we see everything that flows through a network device, which can be not only a router or switch (including virtual ones) but also, for example, a firewall (Cisco ASA, for instance, supports NSEL, NetFlow Security Event Logging).
It is time for a couple of remarks drawn from our experience of using NetFlow for security on the Cisco network. First, NetFlow can be unsampled or sampled. The difference is the same as between reading a whole book from cover to cover and leafing through it, stopping only at every tenth page. Many network devices support NetFlow, but only sampled, which is not suitable for security purposes because we see only a small part of the traffic we need to monitor. So check which kind of NetFlow you have. Second, processing NetFlow, especially unsampled, loads the CPU of a network device, which must be taken into account when planning a network and building a NetFlow-based security monitoring system. If your switches or routers are already running at the limit, with CPU utilization of 80-90% in normal operation, think ten times before enabling NetFlow on them, because device performance will certainly suffer even more. There are two ways out: upgrading the network infrastructure (which would have to be done sooner or later anyway) and using dedicated NetFlow generation appliances. We at Cisco used both. Where NetFlow monitoring was critical and an upgrade was due, we installed new switches and routers that process NetFlow in hardware without loading the CPU. In other cases, we used a solution called FlowSensor.
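To put a number on the “every tenth page” analogy, here is a small Python sketch. It assumes random 1-in-N packet sampling with independent choices per packet, which is a simplification (devices may instead sample deterministically); the function name and the flow sizes are chosen purely for illustration.

```python
def detection_probability(packets_in_flow, sampling_rate):
    """Chance that at least one packet of a flow is sampled.

    Assumes each packet is picked independently with probability
    1/sampling_rate (random 1-in-N sampling).
    """
    return 1.0 - (1.0 - 1.0 / sampling_rate) ** packets_in_flow


# With 1-in-10 sampling, short flows (a scan probe, a brief C&C beacon)
# are usually missed, while long flows are almost always seen.
p_single_packet = detection_probability(1, 10)
p_three_packets = detection_probability(3, 10)
p_hundred_packets = detection_probability(100, 10)
```

Under these assumptions a three-packet flow is seen only about 27% of the time, which is why sampled NetFlow is a poor fit for security monitoring even though it works for capacity planning.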
What can Netflow tell us?
There are several versions of the NetFlow protocol, the most common today being versions 5 and 9. The latter served as the basis for the open IPFIX specification. Version 5 lets you collect the following information about network traffic:
- Source Address
- Destination Address
- Source port for UDP and TCP
- Destination port for UDP and TCP
- Message Type and Code for ICMP
- IP Protocol Number
- Network interface (ifindex SNMP parameter)
- Type of Service Value
- Time parameters
- The amount of bytes and packets transmitted
- TCP flag values
- Route Information
- Information about autonomous systems.
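For readers who want to see these fields on the wire, below is a minimal Python sketch that decodes a NetFlow v5 export datagram using the publicly documented layout (a 24-byte header followed by 48-byte flow records). The function and key names are my own, the sample datagram is synthetic, and a production collector would also check sequence numbers and handle truncated packets.

```python
import socket
import struct

HEADER = struct.Struct("!HHIIIIBBH")                 # 24-byte v5 header
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")  # 48-byte v5 flow record


def parse_v5(datagram):
    """Decode one NetFlow v5 datagram into a list of flow dictionaries."""
    version, count = HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError("not a NetFlow v5 datagram")
    flows = []
    for i in range(count):
        (src, dst, nexthop, if_in, if_out, pkts, octets, first, last,
         sport, dport, tcp_flags, proto, tos, src_as, dst_as,
         src_mask, dst_mask) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
            "sport": sport, "dport": dport, "proto": proto, "tos": tos,
            "if_in": if_in, "if_out": if_out,
            "packets": pkts, "bytes": octets,
            "first_ms": first, "last_ms": last,  # router uptime, milliseconds
            "tcp_flags": tcp_flags,
            "nexthop": socket.inet_ntoa(nexthop),
            "src_as": src_as, "dst_as": dst_as,
        })
    return flows


# Synthetic datagram with a single TCP flow, built only for this example.
sample = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0) + RECORD.pack(
    socket.inet_aton("10.10.71.100"), socket.inet_aton("69.50.180.3"),
    socket.inet_aton("0.0.0.0"), 58, 98, 10, 1200, 0, 1000,
    8343, 31337, 0x12, 6, 0, 0, 0, 24, 0)
flows = parse_v5(sample)
```

Every item in the list above maps to one or two fields of the record, which is all a collector has to work with in version 5.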
Version 9 supports additional fields related to IPv6, MPLS and BGP. There are also extensible variants, such as IPFIX and Flexible NetFlow, which support custom fields.
At first glance this information is superficial and reflects only what is in the packet headers of network sessions, but in fact, as we saw in the story about Encrypted Traffic Analytics (ETA), in many cases it is enough to classify and recognize traffic. For example, an excessive number of packets or bytes may indicate a DoS attack, a large number of destination addresses within a short time interval may indicate a network scan, and so on. With NetFlow you can profile nodes and track deviations from their normal behavior.
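As an illustration of the scan heuristic mentioned above, here is a minimal Python sketch that counts distinct destination addresses per source in fixed time windows. The window length, the threshold and the sample traffic are arbitrary assumptions; real detection algorithms are considerably more elaborate.

```python
from collections import defaultdict


def find_scanners(flows, window=60, max_destinations=100):
    """Return sources that contacted too many distinct hosts in one window.

    flows: iterable of (start_time_in_seconds, src_ip, dst_ip).
    """
    seen = defaultdict(set)  # (src, window index) -> distinct destinations
    for start, src, dst in flows:
        seen[(src, int(start // window))].add(dst)
    return sorted({src for (src, _), dsts in seen.items()
                   if len(dsts) > max_destinations})


# Hypothetical traffic: one host sweeps 150 addresses in 30 seconds,
# another talks to three servers.
flows = [(i * 0.2, "10.10.71.100", f"10.20.0.{i}") for i in range(150)]
flows += [(5, "10.10.71.101", "10.20.0.1"),
          (9, "10.10.71.101", "10.20.0.2"),
          (40, "10.10.71.101", "10.20.0.3")]
scanners = find_scanners(flows)
```

Fixed windows keep the sketch short but can split a scan that straddles a window boundary; a sliding window would avoid that at the cost of more bookkeeping.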
How did we monitor our network before?
Initially, Cisco monitored its network with two free tools: nfdump ( https://github.com/phaag/nfdump) and OSU FlowTools (OSU stands for Ohio State University). Both work with Netflow, letting you filter, select, and perform other operations on network flows. Both are fast enough (they can process several tens of gigabytes of flows per day), easy to use for anyone with experience of classic utilities like tcpdump, and flexible in filtering. But these tools also have problems, which fall into three groups. First, there are the shortcomings of the utilities themselves. They did not aggregate flows properly, which often led to duplication (if the same flow passes through 5 network devices, the utilities see and record it 5 times). In addition, nfdump and Flow Tools did not let us track flow loss, which created a false sense of security (to paraphrase the classic line from the movie “DMB”, “you think you see the flow, but really you don't”). In a small network this is not so critical, but in a network as distributed as Cisco's it caused growing difficulties as the monitoring system was rolled out to new sites. Second, any open source project has difficulties with support, adding new features, extending the set of supported Netflow versions, and so on. And finally, working with nfdump and OSU Flow Tools required not only highly qualified personnel but also considerable effort to automate the typical, routine tasks of incident investigation and response.
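The duplication problem can be illustrated with a short sketch. It assumes that records describing the same flow share a 5-tuple and have start times within a couple of seconds of each other; the `ExportedFlow` type and the `tolerance` value are hypothetical, and real exporters may need clock-skew handling that this example ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportedFlow:
    exporter: str   # device that exported the record
    start: int      # flow start, whole seconds
    src: str
    src_port: int
    dst: str
    dst_port: int
    proto: int
    nbytes: int


def deduplicate(records, tolerance=2):
    """Collapse records that describe the same flow seen by several exporters.

    Two records are treated as the same flow when their 5-tuples match and
    their start times differ by at most `tolerance` seconds.
    Returns a list of (representative record, sorted exporter names).
    """
    merged = []   # list of [record, set of exporters]
    latest = {}   # 5-tuple -> index in `merged` of the most recent flow
    for rec in sorted(records, key=lambda r: r.start):
        key = (rec.src, rec.src_port, rec.dst, rec.dst_port, rec.proto)
        idx = latest.get(key)
        if idx is not None and rec.start - merged[idx][0].start <= tolerance:
            merged[idx][1].add(rec.exporter)
        else:
            latest[key] = len(merged)
            merged.append([rec, {rec.exporter}])
    return [(rec, sorted(exporters)) for rec, exporters in merged]
```

Keeping the list of exporters rather than discarding it has a side benefit: it shows the path the flow took through the network.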
For example, to detect infected internal machines that communicate with command-and-control servers, you need to create and keep up to date a list of those servers in Cisco ACL format, and then pass it to the flow-filter utility, which applies it to the collected Netflow flows.
[mynfchost]$ head bot.acl
ip access-list standard bot permit host 69.50.180.3
ip access-list standard bot permit host 66.182.153.176
[mynfchost]$ flow-cat /var/local/flows/data/2007-02-12/ft* | flow-filter -S bot.acl
Start End Sif SrcIPaddress SrcP DIf DstIPaddress DstP
0213.08:39:49.911 0213.08:40:34.519 58 10.10.71.100 8343 98 69.50.180.3 31337
0213.08:40:33.590 0213.08:40:42.294 98 69.50.180.3 31337 58 10.10.71.100 83
Clearly, to work effectively you need to keep many pre-built lists up to date: attackers, C&C servers, internal nodes segmented by various criteria, and so on. The last of these lists will itself change constantly, because Cisco's infrastructure is so dynamic.
Another problem with nfdump and OSU Flow Tools is that they cannot tell who initiated a particular connection (which matters in an investigation), because flows are unidirectional. We had to do additional work to understand who was first in a client-server connection. Finally, we ran into one more subtlety in how these utilities work. They record only completed flows, which can make it impossible to track attacks quickly as they happen. For example, an attacker may already have scanned the network, compromised a host, and penetrated it, while neither nfdump nor Flow Tools is aware of this, because they have not yet recorded the network flow.
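The "additional work" of finding the initiator can be sketched roughly as follows. The heuristic assumed here is that, of the two unidirectional flows that make up a conversation, the one with the earlier start time belongs to the initiator; the `UniFlow` type is hypothetical, and the heuristic can be wrong when clocks differ between exporters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniFlow:
    start: float    # flow start, seconds
    src: str
    src_port: int
    dst: str
    dst_port: int
    proto: int = 6  # TCP by default


def pair_flows(flows):
    """Group unidirectional flows into conversations and guess the initiator.

    Returns a dict: conversation key -> (initiator address, responder address).
    """
    conversations = {}
    for f in flows:
        a, b = (f.src, f.src_port), (f.dst, f.dst_port)
        key = (f.proto, min(a, b), max(a, b))   # same key for both directions
        first = conversations.get(key)
        if first is None or f.start < first.start:
            conversations[key] = f   # keep the earliest-starting direction
    return {key: (f.src, f.dst) for key, f in conversations.items()}
```

Applied to two flows like those in the listing above, where the internal host's flow starts about 44 seconds before the reply, this would name the internal host as the initiator.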
What are we using now?
After gaining experience with nfdump and OSU Flow Tools, and as we moved to IPv6 and Netflow version 9, we began to look for a tool free of the shortcomings we had encountered. That tool was Stealthwatch from Lancope, a company that Cisco later acquired. Stealthwatch is built on the classic analyzer architecture of “sensor - collector - analyzer”.
As sensors, we use our network infrastructure, which carries all internal network traffic, translates it into Netflow, and passes it to collectors for analysis. As I wrote above, network equipment does not always support Netflow or cannot always process it efficiently. In those cases we use separate hardware or virtual FlowSensors (we have 13 of them in all). Given Cisco's geographically distributed infrastructure, we do not bring all flows to one or two collectors but use a whole distributed cluster of 21 FlowCollectors, which process about 20 billion Netflow flows every day in search of malicious activity in our corporate backbone and data centers. We have only two management consoles; members of the Cisco incident response service access them according to their roles.
Use cases
Perhaps the main obstacle to using open source Netflow monitoring tools effectively in our network (and in general) was their lack of proper analytics. They have flexible means of filtering and selection, but without a person they cannot decide whether there is a problem in the network flows. Stealthwatch does not have this drawback. Its key advantage is a built-in library of algorithms that evaluate Netflow and recognize various security violations in it: network scanning, DoS, malicious code distribution, information leakage, and so on.
Key scenarios in which we use Stealthwatch (actually there are more):
- investigation
- detect interaction with C&C
- DoS attack detection
- data leak detection
- checking access control and firewall rule settings.
Netflow is an ideal source of information for incident investigation: it contains all the necessary information about who did what and when. All of this is stored in a database, and response service employees can query it, filtering and selecting by the fields they need, to find answers to their questions quickly. Integration with Cisco ISE ties this information not only to the IP addresses of the nodes involved in the incident but also to the company's user accounts in Active Directory. This is not only convenient; it also significantly reduces the time needed to map a user name to the dynamic IP address it had at a particular moment. Saving time is a critical success factor in investigating incidents.
The second case that shows the power of Stealthwatch is detecting interaction with the command servers of botnets and malware. It might seem that this can be done on the perimeter, but recall what this note began with. Today a user can be attacked in many different ways, and not always through the perimeter. What if malware made its way into the network through unprotected Wi-Fi and then leaked information or received updates over that same channel? This can only be caught by monitoring internal network traffic, and Stealthwatch is indispensable here. Previously we performed this task with nfdump, but it had one limitation: the list of command server IP addresses had to be updated manually, collecting it from different sources. Stealthwatch solves this automatically: it regularly downloads updated indicators of compromise containing information about command servers. The function is also useful because it tracks when addresses on the list become obsolete and removes them as necessary. With nfdump, we had to do that manually, which took up valuable time.
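The list maintenance described here, refreshing indicators and aging out stale ones, can be sketched in a few lines. This is a hypothetical illustration of the chore we used to do by hand, not Stealthwatch's mechanism; the `IndicatorList` class and the 30-day `max_age` are assumptions.

```python
from datetime import datetime, timedelta


class IndicatorList:
    """C&C addresses with a last-seen time, so stale entries can age out."""

    def __init__(self, max_age=timedelta(days=30)):
        self.max_age = max_age
        self._last_seen = {}   # address -> datetime the feed last reported it

    def update(self, addresses, now):
        """Record every address present in the latest feed download."""
        for addr in addresses:
            self._last_seen[addr] = now

    def expire(self, now):
        """Drop addresses the feed has not reported within max_age."""
        cutoff = now - self.max_age
        self._last_seen = {a: t for a, t in self._last_seen.items() if t >= cutoff}

    def matches(self, flows):
        """Yield (src, dst) pairs where either end is a listed address."""
        for src, dst in flows:
            if src in self._last_seen or dst in self._last_seen:
                yield src, dst
```

Matching on either end of the flow catches both the infected host calling out and the command server's replies.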
Denial of service attack detection is another popular use case on our network. Such incidents are not regular, but they do happen. “Floods”, “storms”, and “avalanches” of requests over various network and application protocols are easily detected with Stealthwatch.
Solutions of the Network Traffic Analysis class, which includes Stealthwatch, do not have DLP functionality and cannot inspect the content of communications over various protocols. However, there are ways to use them against information leaks, based on a slightly different principle. Since Netflow lets us track the amount of information transmitted, we can set for each node or user an average amount of data, per protocol, that the user can download from or upload to the Internet. Let's say for HTTP this figure will be 100 MB per day.
Accordingly, exceeding this value will be considered an anomaly, and a significant excess, for example 5 times or more, a clear violation of the IS policy. Uploading large amounts of data to cloud storage may mean that the user is trying to steal confidential information. I use the word “may” deliberately, since this can also be a sign of completely legitimate activity, for example a user sending a new software distribution, a set of documents, or a training video through the cloud. In any case, exceeding the threshold should trigger an investigation.
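The threshold logic can be written down directly. The sketch below uses the figures from the text (a 100 MB daily baseline and a factor of 5); the function name, the input format, and the three labels are assumptions made for the example.

```python
from collections import defaultdict

MB = 1024 * 1024


def classify_uploads(flows, baseline_bytes=100 * MB, violation_factor=5):
    """Sum uploaded bytes per (user, protocol) for one day and label the total.

    `flows` is an iterable of (user, protocol, bytes_uploaded) tuples.
    Returns {(user, protocol): "normal" | "anomaly" | "violation"}.
    """
    totals = defaultdict(int)
    for user, protocol, nbytes in flows:
        totals[(user, protocol)] += nbytes

    labels = {}
    for key, total in totals.items():
        if total >= violation_factor * baseline_bytes:
            labels[key] = "violation"   # 5x the baseline or more
        elif total > baseline_bytes:
            labels[key] = "anomaly"     # above baseline, worth a look
        else:
            labels[key] = "normal"
    return labels
```

Either label is only a reason to investigate, not a verdict: the same volume can come from a perfectly legitimate upload.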
Another scenario in which we actively use Stealthwatch is checking access control rule settings to spot unauthorized traffic between segments. Segmentation is one of the most useful tools in the arsenal of an information security service: it can significantly reduce the attack surface, localize problems, speed up investigations, and so on. In our network, segmentation is enforced on the network equipment and managed by Cisco ISE. With Stealthwatch, we check that segmentation is configured correctly and see traffic that should not appear in a particular segment.
The same feature lets you check the settings of perimeter firewalls, which may be letting some unauthorized traffic through. In effect, Stealthwatch in this use case becomes an additional tool for monitoring the actions of administrators.
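In essence, this kind of check compares observed flows with the intended policy. Here is a minimal sketch under stated assumptions: the segment names, address ranges, and allowed rules are all invented for the example and do not describe Cisco's network.

```python
import ipaddress

# Hypothetical segments and allowed (source segment, destination segment, port) rules.
SEGMENTS = {
    "users": ipaddress.ip_network("10.10.0.0/16"),
    "datacenter": ipaddress.ip_network("10.20.0.0/16"),
    "lab": ipaddress.ip_network("10.30.0.0/16"),
}
ALLOWED = {
    ("users", "datacenter", 443),
    ("lab", "datacenter", 22),
}


def segment_of(address):
    """Return the name of the segment containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def policy_violations(flows):
    """Return the cross-segment flows that no rule in ALLOWED permits.

    `flows` is an iterable of (src, dst, dst_port) tuples.
    """
    violations = []
    for src, dst, port in flows:
        src_seg, dst_seg = segment_of(src), segment_of(dst)
        if src_seg == dst_seg:
            continue   # traffic inside one segment is out of scope here
        if (src_seg, dst_seg, port) not in ALLOWED:
            violations.append((src, dst, port))
    return violations
```

Any flow this returns means either that the rule set on the devices differs from the intended policy or that the policy itself is incomplete; both are worth knowing.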
Who uses Stealthwatch and where?
Although Stealthwatch can be used by network engineers and IT specialists, it is intended for security professionals. At Cisco, it is used by Cisco CSIRT, our incident response service. We collect data from 180 key network devices installed in data centers, large corporate hubs, and the DMZ, receiving approximately 180 thousand flows per second.
In one of the previous notes, I already wrote about the availability of API in our products. There is such an API in Stealthwatch and it is very actively used by our incident response service. In particular, it is through the API that we update information about the nodes included in certain groups.
It is also through the API that we update information on new malicious nodes, interaction with which is monitored by Stealthwatch. Using the API, we integrate Stealthwatch with CRiTS, our open source Threat Intelligence platform. This lets us, on receiving new indicators of compromise, distribute them through the API to all the security tools integrated with CRiTS.
The API allows us to collect from Stealthwatch the events and flows we need to transfer them to Splunk, which is the main monitoring tool at Cisco, including for conducting more detailed investigations.
An interesting practice that I haven't seen anywhere else is the Mobile SOC (Security Operations Center), which we use to monitor information security at remote sites: companies we acquire, new factories, partners, or sites under investigation that are not connected to the central monitoring system. The Mobile SOC is a transportable rack of information security equipment that includes not only Stealthwatch but also the Netflow Generation Appliance, Splunk, Firepower, Web Security Appliance, etc.
Development plans
We are not satisfied with what has already been achieved and plan to actively develop the use of Stealthwatch in our infrastructure. Among the priority plans:
- Continued integration with ISE, not only to obtain contextual information about the nodes and users involved in an incident but also to implement blocking. In the future, ISE should tie together Stealthwatch at the network level and AMP4E at the PC level, which will let us localize information security problems more quickly.
- As we upgrade to the new version of Stealthwatch, Encrypted Traffic Analytics features will appear automatically, making it possible to detect malicious code in encrypted traffic.
- Introducing Stealthwatch Cloud to monitor the IaaS and PaaS cloud platforms that are widely used in Cisco.
- Integration with AnyConnect, which every Cisco employee has installed on their laptop or smartphone, in order to receive data on user and application activity in Netflow format and correlate it with Netflow at the network level.
Overall, we must admit that analyzing Netflow with Stealthwatch helps our information security service detect more incidents than the usual set of security tools used on the perimeter of the corporate network. You can see this in how the sources of data on our incidents have changed. Earlier it was mainly attack signatures from IDS; today this source accounts for only one fifth of all incidents. Another fifth comes from behavioral analysis and 40% from indicators of compromise. Detection of the remaining 20% of incidents is based specifically on Netflow.
Additional Information:
- The “Stealthwatch at Cisco” success story (Eng.)
- A story about using Stealthwatch at Cisco from the head of CSIRT (video)
- The Stealthwatch page on Cisco