By Josh Liburdi, Sqrrl Security Technologist, and George Aquila
The Hunter’s Den blog series aims to go beyond framework and theory and dig into practical tips and techniques for threat hunting. In our previous post, we examined practical ways to hunt for Internal Reconnaissance. In this post, we will take a look at how to hunt for Command and Control (C2) activity. Command and control is the process through which an attacker establishes a connection to an asset they have compromised in a target network. C2 is a critical step in carrying out an attack on a network, and the category is broad enough to have its own kill chain step (KC6, “Command and Control”). Because it is such a broad tactic, this post surveys the general ways an adversary might carry it out.
Understanding Command and Control
C2 enables remote access for attackers into target networks. Architecturally, C2 is fairly predictable: it generally follows one of two models, a Client-Server model or a Peer-to-Peer model. Attackers have multiple options for building their C2 channel, each of which is outlined below.
Command and Control can be identified in either network or endpoint data. Network data, including network session and application protocol metadata, is likely the most common source that organizations have access to for identifying C2 activity. Endpoint data, like process execution and file metadata, can also be used to find C2, but this type of data is less commonly available. As such, this post focuses on network data.
High-level TTP overview
Attackers generally build C2 channels into either common protocols (ComPro) or custom protocols (CusPro). Examples of common protocols include HTTP/S, SSL/TLS, and DNS. Custom protocols are harder to predict, but include techniques such as encrypting packet data with an XOR cipher.
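As a concrete illustration of the XOR technique, the minimal Python sketch below applies a repeating-key XOR to a message. The key and the beacon string are hypothetical; real malware families choose their own keys and message framing. It shows why such traffic defeats simple content signatures: the plaintext is not visible on the wire, yet applying the same operation again recovers it.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Apply a repeating-key XOR to data.

    XOR is its own inverse, so the same call both encodes and decodes.
    """
    if not key:
        raise ValueError("key must be non-empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical beacon message and key, for illustration only.
beacon = b"GET /tasks?id=42"
key = b"\x5a\x13"

encoded = xor_bytes(beacon, key)   # what would cross the network
decoded = xor_bytes(encoded, key)  # what the C2 server recovers

# A signature looking for the string "GET" would not match the encoded bytes.
print(encoded.hex())
print(decoded)
```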
As with protocols, attackers generally use either common network ports (ComPor) or uncommon network ports (UncPor) for their C2 channels. Examples of common ports include 80/TCP (HTTP), 443/TCP (SSL/TLS), and 53/UDP (DNS). Uncommon ports are difficult to predict, but they typically fall outside the ports registered with IANA. Attackers can use any combination of the above protocols and ports:
- ComPro + ComPor
- ComPro + UncPor
- CusPro + ComPor
- CusPro + UncPor
In practice, these are the four ways an attacker may configure a command and control channel. Each is briefly surveyed below.
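The four combinations can be expressed as a small classification function. This Python sketch is an illustration under stated assumptions, not a detection rule: the table of common ports is hypothetical and deliberately tiny, and a protocol that a sensor could not identify is treated as custom. A common protocol counts as being on a common port only when that port normally carries that protocol, so HTTPS sent to a DNS port falls under Method 3.

```python
# Hypothetical, deliberately small table of common ports and the protocol
# each is expected to carry. A real hunt would use IANA registrations or
# local statistics instead.
EXPECTED_PROTOCOL = {
    (80, "tcp"): "http",
    (443, "tcp"): "ssl",
    (53, "udp"): "dns",
}
COMMON_PROTOCOLS = set(EXPECTED_PROTOCOL.values())


def classify_c2_method(protocol, port, transport):
    """Place a connection into one of the four protocol/port combinations.

    protocol: application protocol identified by a sensor (e.g. "http"),
              or None if it could not be identified (treated as custom).
    """
    expected = EXPECTED_PROTOCOL.get((port, transport))
    if protocol in COMMON_PROTOCOLS:
        # Common protocol: the port is "common" only if it is that protocol's port.
        por = "ComPor" if expected == protocol else "UncPor"
        return f"ComPro + {por}"
    # Custom or unidentified protocol: any well-known port counts as common.
    por = "ComPor" if expected is not None else "UncPor"
    return f"CusPro + {por}"


print(classify_c2_method("http", 80, "tcp"))
print(classify_c2_method("ssl", 53, "tcp"))
print(classify_c2_method(None, 443, "tcp"))
print(classify_c2_method(None, 31337, "tcp"))
```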
Method 1 - Utilizing a common protocol on a common port
This method of establishing C2 relies on “blending in” with normal network traffic. It often involves using HTTPS over high-traffic ports so that the C2 channel looks like legitimate traffic. From the attacker's perspective, the risk of this method is that common ports and protocols are often the most closely monitored by automated detection systems such as a SIEM or IDS. For example, the “Cozy attackers” (Cozy Bear / CozyDuke) have used the standard HTTPS protocol in the past to establish command and control channels, in those cases encrypting data inside the HTTP header.
Method 2 - Utilizing a custom protocol on a common port
Some types of malware have been observed using custom protocols on common ports. For example, the FakeM RAT uses a modified implementation of the SSL protocol to transmit data (and it can also mimic common messaging protocols). This helps an attacker establish a connection out of a network, because common ports (like 80/TCP, 53/UDP, and 443/TCP) can be expected to be open in most networks. From the attacker's perspective, the risk of this method is that common ports are usually closely watched, and a custom protocol might make the transmission stand out from the rest of the traffic. A defender can quickly investigate simple questions to determine whether this is happening (e.g., “is there any traffic on port 80 that is not HTTP?”).
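The question “is there any traffic on port 80 that is not HTTP?” can be answered from Bro conn logs, which record the service Bro identified for each connection. The sketch below assumes records have already been parsed into Python dicts keyed by Bro conn.log field names (`uid`, `id.resp_h`, `id.resp_p`, `proto`, `service`); the sample records and addresses are made up.

```python
def non_http_on_port_80(conn_records):
    """Return conn.log records to 80/TCP whose service field lacks "http".

    Bro writes "-" when it could not identify a service, and may list
    several services separated by commas.
    """
    hits = []
    for rec in conn_records:
        if rec["id.resp_p"] != 80 or rec["proto"] != "tcp":
            continue
        services = set(rec["service"].split(","))
        if "http" not in services:
            hits.append(rec)
    return hits


# Hypothetical sample records using a subset of conn.log fields.
sample = [
    {"uid": "C1", "id.resp_h": "203.0.113.10", "id.resp_p": 80, "proto": "tcp", "service": "http"},
    {"uid": "C2", "id.resp_h": "198.51.100.7", "id.resp_p": 80, "proto": "tcp", "service": "-"},
    {"uid": "C3", "id.resp_h": "198.51.100.8", "id.resp_p": 443, "proto": "tcp", "service": "ssl"},
]

for rec in non_http_on_port_80(sample):
    print(rec["uid"], rec["id.resp_h"], rec["service"])
```

In practice, expect noise from failed or incomplete connections, which Bro also cannot label; filtering on fields such as `conn_state` or byte counts can help narrow the results.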
A comparison of normal MSN Messenger traffic and FakeM RAT’s mimicked traffic
Method 3 - Utilizing a common protocol on an uncommon port
Various kinds of malware use common protocols to establish communication links, but do so over uncommon ports that may be open yet less closely monitored than common ones. This might include sending traffic in a common protocol (for example, HTTPS) to a port usually reserved for another protocol (such as DNS). Cisco has documented examples of malware using TLS over uncommon ports.
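When packet payloads are available, a protocol can be recognized from its bytes rather than from its port. The hypothetical helper below checks whether a payload starts like a TLS record carrying a ClientHello (content type 0x16, major version 0x03, handshake type 0x01) and flags such sessions on ports other than 443/TCP. It is a heuristic sketch; the session tuples and the single expected TLS port are assumptions.

```python
TLS_PORTS = {443}  # hypothetical: ports where TLS is expected


def looks_like_tls_client_hello(payload: bytes) -> bool:
    """Heuristic check that a TCP payload begins with a TLS ClientHello.

    Byte 0:    record content type (0x16 = handshake)
    Bytes 1-2: record version (major 0x03, minor 0x00-0x04)
    Bytes 3-4: record length
    Byte 5:    handshake message type (0x01 = ClientHello)
    """
    if len(payload) < 6:
        return False
    return (
        payload[0] == 0x16
        and payload[1] == 0x03
        and payload[2] <= 0x04
        and payload[5] == 0x01
    )


# Hypothetical sessions: (destination port, first bytes sent by the client).
sessions = [
    (443, b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03"),
    (8081, b"\x16\x03\x01\x00\xc8\x01\x00\x00\xc4\x03\x03"),
    (8081, b"GET / HTTP/1.1\r\nHost: example.com\r\n"),
]

tls_on_uncommon_ports = [
    port for port, payload in sessions
    if port not in TLS_PORTS and looks_like_tls_client_hello(payload)
]
print(tls_on_uncommon_ports)
```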
Method 4 - Utilizing a custom protocol on an uncommon port
To circumvent the regular communication channels that might be monitored on a network, attackers can use a custom protocol and send data through an uncommon port. In some target networks, this may be the stealthiest C2 method available to attackers, and it has been observed in both commodity malware and custom attack tools. One example of this method is used by Kryptik malware, as described on Malware-Traffic-Analysis.
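One common heuristic for spotting encrypted custom protocols, not described in the original post, is to measure the byte entropy of session payloads: encrypted or compressed data tends toward 8 bits per byte. The Python sketch below computes Shannon entropy. Treat it as one weak signal only; as the example shows, a single-byte XOR merely relabels byte values and leaves entropy unchanged, so XOR-encoded C2 can look as low-entropy as plaintext.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


plaintext = b"hello hello hello"
single_byte_xor = bytes(b ^ 0x41 for b in plaintext)
uniform = bytes(range(256))  # every byte value exactly once: maximum entropy

print(round(shannon_entropy(plaintext), 3))
print(round(shannon_entropy(single_byte_xor), 3))  # same as the plaintext
print(shannon_entropy(uniform))
```

A hunt might compute this per session for traffic on uncommon ports and review the highest-entropy sessions first; any cutoff is a local tuning decision.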
A packet sample of Kryptik's custom protocol
As covered in previous posts, hypotheses should be the foundation of any hunt you undertake, specifying exactly what you’re looking for. A number of common hypotheses for hunting C2 correspond directly to the four kinds of TTPs described above; they can be derived by working out how an attacker might use common or custom protocols and ports. Here are generalized hypotheses that align with each type of C2 approach:
ComPro + ComPor
- Attackers may be operating on a C2 channel that uses a common protocol on a common network port
- Look for unique artifacts pertinent to the protocol you are interested in. For example, if you are interested in identifying C2 in HTTP traffic, then you might consider looking for anomalous domains, URLs, or User-Agent strings.
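For HTTP, one way to look for anomalous User-Agent strings is to count how many distinct internal hosts send each one: a string used by only one or two hosts stands out from browser traffic shared across the network. This Python sketch assumes parsed Bro http.log records with the `id.orig_h` and `user_agent` fields; the `max_hosts` cutoff and the sample data are hypothetical.

```python
from collections import defaultdict


def rare_user_agents(http_records, max_hosts=1):
    """Map each User-Agent seen from at most max_hosts distinct source hosts
    to the sorted list of those hosts."""
    hosts_by_ua = defaultdict(set)
    for rec in http_records:
        hosts_by_ua[rec["user_agent"]].add(rec["id.orig_h"])
    return {ua: sorted(hosts) for ua, hosts in hosts_by_ua.items()
            if len(hosts) <= max_hosts}


browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    {"id.orig_h": "10.0.0.5", "user_agent": browser},
    {"id.orig_h": "10.0.0.6", "user_agent": browser},
    {"id.orig_h": "10.0.0.7", "user_agent": browser},
    {"id.orig_h": "10.0.0.7", "user_agent": "Updater/1.0"},  # hypothetical odd agent
]

print(rare_user_agents(sample))
```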
ComPro + UncPor
- Attackers may be operating on a C2 channel that uses a common protocol on an uncommon network port
- Look for identifiable artifacts pertinent to the protocol you are interested in on uncommon ports. For example, look for HTTP artifacts over ports that are not 80/TCP or 8080/TCP. Uncommon ports can be identified externally (for example, ports not registered with IANA) or internally (for example, by using statistical analysis of network session metadata to determine which ports are infrequently used).
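The internal, statistical approach to identifying uncommon ports can be as simple as computing each destination port's share of all sessions. This sketch assumes parsed Bro conn.log records (`id.resp_p`, `proto`); the 1% cutoff and the synthetic sample are arbitrary choices for illustration.

```python
from collections import Counter


def infrequent_ports(conn_records, max_share=0.01):
    """Return ((port, proto), count) pairs whose share of all sessions is at
    most max_share, least frequent first."""
    counts = Counter((rec["id.resp_p"], rec["proto"]) for rec in conn_records)
    total = sum(counts.values())
    rare = [(key, n) for key, n in counts.items() if n / total <= max_share]
    return sorted(rare, key=lambda item: item[1])


# Synthetic sample: 1,000 sessions dominated by HTTPS and DNS.
sample = (
    [{"id.resp_p": 443, "proto": "tcp"}] * 600
    + [{"id.resp_p": 53, "proto": "udp"}] * 395
    + [{"id.resp_p": 4444, "proto": "tcp"}] * 3
    + [{"id.resp_p": 8081, "proto": "tcp"}] * 2
)

for (port, proto), n in infrequent_ports(sample):
    print(f"{port}/{proto.upper()}: {n} sessions")
```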
CusPro + ComPor
- Attackers may be operating on a C2 channel that uses custom encryption on a common network port
- Look for anomalies in monitored network port channels. For example, look for connections that have no identifiable SSL metadata over port 443/TCP.
CusPro + UncPor
- Attackers may be operating on a C2 channel that uses custom encryption on an uncommon network port
- Look for anomalous characteristics (e.g., connection duration, bytes transferred) in traffic over uncommon ports. Once again, uncommon ports can be identified externally (for example, ports not registered with IANA) or internally (for example, by using statistical analysis of network session metadata to determine which ports are infrequently used).
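To look for anomalous connection durations or byte counts, one option (an assumption, not a method prescribed by this post) is a robust outlier test based on the median and median absolute deviation (MAD), which are less distorted by extreme values than the mean and standard deviation. The 0.6745 scaling constant and 3.5 cutoff are a common convention for this “modified z-score”; the session data is made up.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds threshold."""
    if not values:
        return []
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical sessions on an uncommon port (duration in seconds).
sessions = [
    {"uid": "C1", "duration": 1.2},
    {"uid": "C2", "duration": 0.8},
    {"uid": "C3", "duration": 1.0},
    {"uid": "C4", "duration": 1.1},
    {"uid": "C5", "duration": 0.9},
    {"uid": "C6", "duration": 3600.0},
]

durations = [s["duration"] for s in sessions]
print([sessions[i]["uid"] for i in robust_outliers(durations)])
```

The same function can be applied to total bytes per session (for example, `orig_bytes + resp_bytes` from conn.log) instead of duration.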
Datasets to Explore
The type of dataset that you use to hunt for C2 depends on what you are hunting for and, by extension, what your hypothesis is.
For identifying use of custom protocols, you will want to focus primarily on network session metadata, including:
- Netflow (“flow” data in general)
- Firewall logs (should log allowed / accepted packets)
- Bro Conn log
For identifying use of common protocols, you will want to focus primarily on application protocol metadata, including:
- Proxy logs, IIS logs
- DNS resolution logs
- Bro HTTP, SSL, DNS, SMTP logs
Techniques to Use
After developing the right hypotheses and choosing the necessary datasets, a hunter must still know which techniques to use to investigate a hypothesis. Here we survey three types of techniques you can use to investigate the hypotheses above: indicator search, stacking, and machine learning.
As with any use of indicators in hunting, the value of this approach depends on the value of the indicators. Locally sourced indicators generally provide high value because they tend to be timely and relevant to the networks or systems you are trying to protect. These types of indicators can be gathered from previous incidents or from internal threat intelligence teams.
It’s also important to remember that it is relatively easy for attackers to change the infrastructure that they use to conduct attacks; if you are using indicators to hunt, then you should be aware that the indicators may no longer be relevant to a particular attacker or attack tool.
Some common network session indicators to search for include:
- IP address
Some common application protocol indicators to search for include:
- Domain (HTTP, DNS, SSL)
- URL (HTTP)
- User-Agent string (HTTP)
- X.509 Certificate Subject (SSL)
- X.509 Certificate Issuer (SSL)
- Email address (SMTP)
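Indicator search itself is a set-membership check. The Python sketch below matches parsed log records against hypothetical indicator sets of IP addresses and domains, treating subdomains of a listed domain as matches. It assumes records have been normalized so the destination address is in `id.resp_h` and the domain is in a `host` field (Bro's HTTP log uses `host`; its DNS and SSL logs use `query` and `server_name`). All addresses and domains are documentation examples.

```python
def match_indicators(records, bad_ips, bad_domains):
    """Return (uid, reason) pairs for records that hit an IP or domain
    indicator; a subdomain of a listed domain also counts as a hit."""
    hits = []
    for rec in records:
        ip = rec.get("id.resp_h")
        if ip in bad_ips:
            hits.append((rec["uid"], "ip:" + ip))
        host = rec.get("host", "").lower().rstrip(".")
        for domain in bad_domains:
            if host == domain or host.endswith("." + domain):
                hits.append((rec["uid"], "domain:" + domain))
                break
    return hits


bad_ips = {"203.0.113.66"}
bad_domains = {"c2.example"}
records = [
    {"uid": "C1", "id.resp_h": "192.0.2.50", "host": "updates.c2.example"},
    {"uid": "C2", "id.resp_h": "198.51.100.9", "host": "www.example.org"},
    {"uid": "C3", "id.resp_h": "203.0.113.66", "host": "-"},
]

for uid, reason in match_indicators(records, bad_ips, bad_domains):
    print(uid, reason)
```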
Stacking is a technique commonly used in many different kinds of hunts. When hunting for command and control activity, a hunter will want to stack for anomalous instances of inbound or outbound traffic. The same metadata types used for indicator search above can be used for stacking, including:
- X.509 Certs
To find C2, focus on either bidirectional or internal-to-external (outbound) connection flows. The effectiveness of stacking depends on a finely tuned input: too little data won’t reveal enough, and too much will make it hard to tease out meaningful deviations. If a given result set is too large, consider filtering the input data set further (e.g., isolate your focus to specific internal subnets). Alternatively, change the metadata being stacked (e.g., switch from stacking destination IP addresses to stacking destination ports).
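A basic stack is a frequency count with the rarest values listed first. This sketch assumes parsed Bro conn.log records; the optional `src_subnet` filter and the `field` argument mirror the two tuning options described above (narrowing to an internal subnet, or changing which metadata is stacked). The function name, addresses, and ports are hypothetical.

```python
import ipaddress
from collections import Counter


def stack(records, field, src_subnet=None, limit=10):
    """Count values of field, optionally only for sources in src_subnet,
    and return up to limit (value, count) pairs, least common first."""
    net = ipaddress.ip_network(src_subnet) if src_subnet else None
    counts = Counter(
        rec[field]
        for rec in records
        if net is None or ipaddress.ip_address(rec["id.orig_h"]) in net
    )
    return sorted(counts.items(), key=lambda kv: (kv[1], str(kv[0])))[:limit]


records = (
    [{"id.orig_h": "10.1.0.5", "id.resp_h": "198.51.100.1", "id.resp_p": 443}] * 3
    + [{"id.orig_h": "10.1.0.6", "id.resp_h": "198.51.100.1", "id.resp_p": 443}] * 2
    + [{"id.orig_h": "10.1.0.7", "id.resp_h": "203.0.113.9", "id.resp_p": 8443}]
    + [{"id.orig_h": "10.2.0.9", "id.resp_h": "192.0.2.77", "id.resp_p": 4444}]
)

# Stack destination IPs across everything, then destination ports for one subnet.
print(stack(records, "id.resp_h"))
print(stack(records, "id.resp_p", src_subnet="10.1.0.0/16"))
```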
A more advanced technique involves using machine learning to isolate malicious C2 activity. Supervised machine learning uses labeled training data to make predictions about unlabeled data. Given a set of known-good and known-bad examples, you can create a binary classifier that takes in new transactions and decides whether they look more similar to the good training set or the bad training set. Once the classifier is trained, assuming you have done a good job, you can feed your HTTP (or other network) logs through it and get back a much smaller set of records that require analyst attention.
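To make the idea of a binary classifier concrete, here is a deliberately simple stand-in: a nearest-centroid classifier over three hypothetical numeric features of Bro http.log records (URI length, digits in the URI, User-Agent length). It is not how Clearcut works, and the features are unscaled, so longer fields dominate the distance; a real deployment would need better features, scaling, and far more training data. The training examples are made up.

```python
import math


def features(rec):
    """Hypothetical features of one http.log record."""
    uri = rec["uri"]
    return [len(uri), sum(ch.isdigit() for ch in uri), len(rec["user_agent"])]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class NearestCentroid:
    """Label a record by whichever class centroid (good or bad) is closer."""

    def fit(self, good, bad):
        self.good = centroid([features(r) for r in good])
        self.bad = centroid([features(r) for r in bad])
        return self

    def predict(self, rec):
        x = features(rec)
        return "bad" if math.dist(x, self.bad) < math.dist(x, self.good) else "good"


BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
good = [{"uri": u, "user_agent": BROWSER} for u in ("/", "/index.html", "/images/logo.png")]
bad = [
    {"uri": "/gate.php?id=8812734619&k=55102938", "user_agent": "Mozilla/4.0"},
    {"uri": "/gate.php?id=1029384756&k=99817263", "user_agent": "Mozilla/4.0"},
]

model = NearestCentroid().fit(good, bad)
new_records = [
    {"uri": "/about.html", "user_agent": BROWSER},
    {"uri": "/gate.php?id=5550001112&k=12345678", "user_agent": "Mozilla/4.0"},
]
for rec in new_records:
    print(model.predict(rec), rec["uri"])
```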
Clearcut is a proof-of-concept/demonstration tool created by David Bianco and Chris McCubbin that uses machine learning to find C2-like traffic in Bro HTTP logs. In one extreme example, it processed approximately 180,000 log entries and, after training, produced fewer than 300 (a reduction of more than 99.8%). The entries the machine flags can then be manually reviewed for C2 activity. You can find Clearcut at https://github.com/DavidJBianco/Clearcut.
For more information on using machine learning for hunting, we highly recommend watching the presentation ‘Practical Cyborgism: Getting Started with Machine Learning for Incident Detection’ from David Bianco and Chris McCubbin (presented at BSides DC 2016).
The example below illustrates how to use the hypotheses laid out above with the data and techniques enumerated.
Command and Control
1. What are you looking for? (Hypothesis)
Hypothesis: Attackers may be operating on a C2 channel that uses custom encryption (a custom protocol) on a common network port
2. Investigation (Data)
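Datasets: For identifying use of a custom protocol on a common port, you will want network session metadata (such as Netflow, firewall logs, or the Bro conn log) covering all connections on that port, plus application protocol metadata (such as Bro SSL logs) to identify which of those connections use the expected protocol.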
3. Uncover Patterns and IOCs (Techniques)
Starting from the session data on the common port, remove the connections that use the common protocol. This should leave only uncommon-protocol connections on the common port.
Stack the remaining data on the attributes useful for investigating your hypothesis (for example, destination IP addresses).
4. Inform and Enrich Analytics (Takeaways)
The destination IP addresses involved in the C2 activity you discover can be treated as IOCs and added to an indicator database to extend automated detection.
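Step 3 of the example above can be sketched in Python, assuming Bro conn.log and ssl.log records already parsed into dicts. Bro gives related entries in different logs the same `uid`, so sessions to 443/TCP whose `uid` never appears in ssl.log are those with no identifiable SSL metadata; stacking their destination IPs then surfaces candidates for the IOCs described in step 4. The function names and sample records are hypothetical.

```python
from collections import Counter


def custom_protocol_candidates(conn_records, ssl_records, port=443):
    """Sessions to port/TCP that have no matching ssl.log entry (by uid)."""
    ssl_uids = {rec["uid"] for rec in ssl_records}
    return [
        rec for rec in conn_records
        if rec["id.resp_p"] == port and rec["proto"] == "tcp"
        and rec["uid"] not in ssl_uids
    ]


def stack_destinations(records):
    """Stack the remaining sessions by destination IP, least common first."""
    counts = Counter(rec["id.resp_h"] for rec in records)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


conn = [
    {"uid": "C1", "id.resp_h": "203.0.113.5", "id.resp_p": 443, "proto": "tcp"},
    {"uid": "C2", "id.resp_h": "198.51.100.20", "id.resp_p": 443, "proto": "tcp"},
    {"uid": "C3", "id.resp_h": "198.51.100.20", "id.resp_p": 443, "proto": "tcp"},
    {"uid": "C4", "id.resp_h": "192.0.2.44", "id.resp_p": 443, "proto": "tcp"},
    {"uid": "C5", "id.resp_h": "203.0.113.5", "id.resp_p": 80, "proto": "tcp"},
]
ssl = [{"uid": "C1", "server_name": "www.example.com"}]

candidates = custom_protocol_candidates(conn, ssl)
print(stack_destinations(candidates))
```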
Always keep in mind that for any hunt, there are multiple paths a hunter can take to address a given hypothesis.