By Josh Liburdi, Security Technologist at Sqrrl, and George Aquila
In part 1 of this hunter’s den post, we looked at the adversary tactic of internal reconnaissance, including the kinds of artifacts it can leave behind on your network. In this post, we look at the types of data and the hunting techniques you can use to hunt for the different kinds of internal reconnaissance.
Datasets to explore
Data is a critical component of hunting, and many different kinds of datasets can be useful depending on the type of hunt you are carrying out. For internal reconnaissance, two major data types are useful to a hunt: process execution metadata and network connection metadata.
Process execution metadata
This type of data contains information about processes that were executed on a host. In this context, the critical metadata associated with process execution is the command line (the command and its arguments) and the process filename; both can be used to identify the artifacts described in part 1. This metadata should also include the name of the host the process was executed on and the name of the user who executed it.
Sources and tools that produce this kind of data include:
- Carbon Black
- PowerShell auditing
- Process creation auditing
Network connection metadata
This type of data contains information on network connections between hosts. In this context, critical metadata associated with network connections include the source IP address, destination IP address, destination port, start time of the connection, and end time/duration of the connection. For hunting network enumeration with this type of metadata, it’s best to have data that includes internal-to-internal connections between hosts on a local subnet.
Sources and tools that produce this kind of data include:
Figure: Network connection metadata should allow a hunter to map out the connections between specific IPs on a network, especially around a central node, as represented here in Sqrrl.
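As a rough illustration of mapping internal-to-internal connections, the Python sketch below counts, for each source IP, the distinct destination IP and port pairs it contacted within one subnet; a host that fans out to many peers on its local subnet is a natural central node to pivot on when hunting network enumeration. The `Connection` record, its field names, and the default `10.0.0.0/24` subnet are assumptions for the example, not the schema of any particular tool, and fan-out ranking is just one simple way to pick a starting node.

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Connection:
    # Hypothetical connection record; field names are illustrative only.
    src_ip: str
    dst_ip: str
    dst_port: int
    start: float  # connection start, epoch seconds
    end: float    # connection end, epoch seconds


def internal_peers(conns, subnet="10.0.0.0/24"):
    """Map each source IP to the set of (dst_ip, dst_port) pairs it contacted,
    keeping only connections where both ends are inside `subnet`."""
    net = ipaddress.ip_network(subnet)
    peers = defaultdict(set)
    for c in conns:
        if ipaddress.ip_address(c.src_ip) in net and ipaddress.ip_address(c.dst_ip) in net:
            peers[c.src_ip].add((c.dst_ip, c.dst_port))
    return dict(peers)


def rank_by_fan_out(conns, subnet="10.0.0.0/24"):
    """Return (src_ip, distinct_peer_count) pairs, highest fan-out first."""
    peers = internal_peers(conns, subnet)
    return sorted(((src, len(p)) for src, p in peers.items()),
                  key=lambda item: (-item[1], item[0]))
```

The start and end times are carried along but unused here; restricting the input to a short time range before ranking makes a sudden sweep of the subnet stand out more clearly.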
In general, process execution metadata is preferred over network connection metadata because it covers many more internal reconnaissance hypotheses, whereas network connection metadata is mainly useful for hunting network enumeration.
Techniques to Use
After developing the right hypotheses and isolating the necessary datasets, a hunter still needs to know which techniques to use to investigate each hypothesis. The available techniques depend on an organization’s hunting maturity, but there are effective ways to hunt even at the lower levels of the Hunting Maturity Model.
Different techniques fit different stages of the Hunting Maturity Model, though many of them can still be carried out, with varying effectiveness, at lower maturity levels.
The following are a few techniques that can be used to look for artifacts of internal reconnaissance:
Searching for internal reconnaissance commands and patterns can be useful if the search is thoughtfully filtered, especially with friendly intelligence (what you know about your own environment, such as which hosts serve which business function). Consider these three scenarios:
- Searching for executions of ‘whoami’ across an entire enterprise will return far too many results to be useful
- Searching for executions of ‘whoami’ across a particular class of workstations (e.g., engineering systems) will return fewer results that may be more useful
- Searching for executions of ‘whoami’ across a particular class of workstations that should not normally execute the command (e.g., C-suite laptops) will return far fewer results that may be very useful
To make this technique effective at finding internal reconnaissance, it’s best to have an explicit goal in mind, such as the one in the last scenario above.
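A minimal Python sketch of a filtered search follows. It assumes process execution records with hypothetical fields (`host`, `user`, `timestamp`, `command_line`) and a hand-maintained `host_classes` mapping that stands in for friendly intelligence, such as an asset inventory; none of these names come from a specific product.

```python
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    # Hypothetical process execution record; real sources use their own field names.
    host: str
    user: str
    timestamp: float  # epoch seconds
    command_line: str


def executable_name(command_line: str) -> str:
    """Lowercase executable name without directory or '.exe' suffix.

    Naive: takes the first whitespace-separated token, so quoted paths
    containing spaces are not handled.
    """
    tokens = command_line.split()
    if not tokens:
        return ""
    name = tokens[0].replace("\\", "/").rsplit("/", 1)[-1].lower()
    return name[:-4] if name.endswith(".exe") else name


def search(events, command, host_class=None, host_classes=None):
    """Return events that executed `command`.

    If `host_class` is given, keep only events from hosts that
    `host_classes` (host name -> class label) assigns to that class.
    """
    classes = host_classes or {}
    hits = []
    for event in events:
        if executable_name(event.command_line) != command:
            continue
        if host_class is not None and classes.get(event.host) != host_class:
            continue
        hits.append(event)
    return hits
```

Calling `search(events, "whoami")` corresponds to the first, enterprise-wide scenario, while `search(events, "whoami", host_class="c-suite", host_classes=classes)` corresponds to the last; the value of the result depends on how accurate the class mapping is.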
Grouping internal reconnaissance commands is similar to searching, except that you can review multiple artifacts across multiple assets in one result set. It’s valuable to take the commands related to a specific platform (e.g., Windows), put them into a single group, and look for executions of commands from that group on a single asset.
For example, if a group contains 14 unique commands related to process information gathering, finding at least 5 of those unique commands executed on a single asset may indicate that internal reconnaissance occurred. Setting a threshold (at least 5 of the 14 commands must appear) separates potentially compromised assets from the larger data set.
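A minimal sketch of threshold-based grouping, assuming events have already been reduced to `(host, command_name)` pairs. The 14-command `RECON_GROUP` below is a hypothetical stand-in: the post does not list its group, so substitute your own.

```python
from collections import defaultdict

# Hypothetical 14-command group of Windows information-gathering commands.
RECON_GROUP = frozenset({
    "whoami", "hostname", "ipconfig", "systeminfo", "tasklist", "net",
    "netstat", "qprocess", "query", "ver", "set", "sc", "wmic", "arp",
})


def group_hits(events, group=RECON_GROUP, threshold=5):
    """Return {host: sorted distinct group commands} for every host that ran
    at least `threshold` distinct commands from `group`.

    `events` is an iterable of (host, command_name) pairs.
    """
    seen = defaultdict(set)
    for host, command in events:
        if command in group:
            seen[host].add(command)
    return {host: sorted(cmds) for host, cmds in seen.items() if len(cmds) >= threshold}
```

Because the threshold counts distinct commands rather than executions, a host that ran `whoami` fifty times is not flagged, while a host that ran five different commands from the group is.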
This technique can be enhanced by adding a time window. We can expect the information-gathering commands that adversaries use for internal reconnaissance to appear in our logs in quick succession (and possibly even in a consistent order across multiple assets). Applying a time threshold therefore increases the chances of finding internal reconnaissance.
For example, without a time threshold, the grouping described above is performed across “all time” for a given data set, that is, from the earliest event in the data set to the latest. This “all time” range may span hours, days, weeks, or even months, whereas internal reconnaissance is more realistically conducted within a short time range, for example 10 minutes. Applying this concept requires splitting the “all time” range into distinct buckets of time (again, 10 minutes is sufficient) and then performing the grouping per asset within each bucket. If the “all time” range is 24 hours, there are 144 ten-minute buckets (24 hours × 6 buckets per hour), and grouping occurs in each bucket individually. The original threshold (at least 5 of the 14 commands must appear) still applies, but over a much smaller time range, which increases the chance of finding malicious activity.
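The bucketing step can be sketched in Python as follows, assuming events are `(timestamp_seconds, host, command_name)` tuples and the command group is supplied by the caller. Each event is assigned to a fixed 10-minute bucket, and the distinct-command threshold is then applied per host per bucket.

```python
from collections import defaultdict


def bucketed_group_hits(events, group, threshold=5, bucket_seconds=600):
    """Return {(host, bucket_start): sorted distinct commands} for every host and
    time bucket in which at least `threshold` distinct commands from `group` ran.

    `events` is an iterable of (timestamp_seconds, host, command_name) tuples.
    Buckets are fixed, non-overlapping windows aligned to multiples of
    `bucket_seconds`; the default of 600 s gives 10-minute buckets, so
    24 hours of data is split into 144 buckets.
    """
    seen = defaultdict(set)
    for ts, host, command in events:
        if command in group:
            bucket_start = int(ts // bucket_seconds) * bucket_seconds
            seen[(host, bucket_start)].add(command)
    return {key: sorted(cmds) for key, cmds in seen.items() if len(cmds) >= threshold}
```

Fixed buckets are simple, but a burst that straddles a boundary (say, minutes 8 to 12) is split across two buckets and may fall below the threshold in both; overlapping or sliding windows avoid this at the cost of a little more computation.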
This technique works best when you are hunting for multiple, related instances of unique artifacts.
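Multiple visualizations may be applied to hunting for internal reconnaissance, but for this example we will focus on one: box plots. Box plots visually describe the distribution of data: the box spans the middle 50% of values (from the first to the third quartile) with a line at the median, and the whiskers typically extend to the most extreme values within 1.5 times the interquartile range of the box; points beyond the whiskers are drawn individually as outliers. It may be useful to visualize each of the following with a box plot: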
- Frequency of command execution across hosts
- Command execution across hosts over time
- Variety of command execution across hosts
Taking the last example from the list above, box plots are a simple way to visualize the variety of command executions across hosts. Once again, it helps to filter the input to the box plot thoughtfully with friendly intelligence, for example by grouping hosts based on type or function. The simplest version of this is to group hosts into two categories by type: workstations and servers. In the visualization below, we have done this and also included a box plot for both categories combined (the first box plot, labeled “All”). Each data point represents a unique host, and the Y axis represents the number of unique commands that host executed. There are 17 hosts in the dataset, and each host may have executed up to 8 unique commands related to internal reconnaissance.
A significant portion of hunting with visualizations is collecting relevant data, preparing it, and choosing the visualization that best represents it. With the visualization complete, the outliers are easy to see: at least one server has an anomalously high number of unique command executions (the server median is 2.5), and two workstations are also high, though less anomalous (the workstation median is 6.5). These three hosts are good candidates for investigation.
Figure: A box plot of the number of unique recon commands executed by workstations and servers. There are 17 hosts in the dataset, and each host may have executed up to 8 unique commands related to internal reconnaissance. There are three potential points of interest: the two upper-outlier workstations and the one upper-outlier server.
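The outlier logic behind a box plot can also be reproduced without a plotting library. The Python 3.8+ sketch below assumes two hypothetical inputs, a dict of per-host unique reconnaissance command counts and a host-to-type mapping, computes quartiles with `statistics.quantiles(..., method="inclusive")`, and flags hosts above the conventional upper fence of Q3 + 1.5 × IQR. Plotting libraries may compute quartiles slightly differently, so borderline hosts can differ.

```python
import statistics


def box_stats(values, whisker=1.5):
    """Quartiles, Tukey fences, and outliers for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"q1": q1, "median": median, "q3": q3,
            "low_fence": low_fence, "high_fence": high_fence, "outliers": outliers}


def outlier_hosts(unique_counts, host_types, whisker=1.5):
    """For each host type, list hosts whose unique-command count is above the upper fence.

    unique_counts: {host: number of distinct recon commands executed}
    host_types:    {host: type label, e.g. "server" or "workstation"}
    """
    by_type = {}
    for host, count in unique_counts.items():
        by_type.setdefault(host_types[host], []).append((host, count))
    result = {}
    for host_type, rows in by_type.items():
        stats = box_stats([count for _, count in rows], whisker)
        result[host_type] = sorted(h for h, count in rows if count > stats["high_fence"])
    return result
```

Running `box_stats(list(unique_counts.values()))` on all hosts together gives the equivalent of the combined “All” box.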
Pulling it all together - An example hunt
The example below illustrates how to use the hypotheses laid out in part 1 along with the data and techniques outlined above.
1. What are you looking for? (Hypothesis)
Hypothesis: An attacker conducting internal reconnaissance would attempt to carry out host enumeration and automate these commands with a script.
Look for these commands to be spawned by a script:
2. Investigation (Tools and Data)
Datasets: Process execution metadata
This data can be explored with many log analysis tools, such as Sqrrl, Kibana, or simply the command line.
3. Uncover Patterns and IOCs (Techniques)
Using grouping, search process execution metadata for the artifacts above, and require that the commands be executed within a given time frame.
In this example, doing so reveals a previously unidentified script that runs commands to enumerate host information and saves the results to a unique file.
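Step 3 could look roughly like the following Python sketch. It assumes process creation records that carry the parent process ID and image name (the field names are hypothetical) and treats a hypothetical set of interpreters (`powershell.exe`, `cmd.exe`, `cscript.exe`, `wscript.exe`) as “script” parents. It uses a sliding time window rather than fixed buckets, so a burst that straddles a bucket boundary is not split.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical set of parent images treated as script interpreters (bare names, lowercase).
SCRIPT_HOSTS = frozenset({"powershell.exe", "cmd.exe", "cscript.exe", "wscript.exe"})


@dataclass
class ProcEvent:
    # Hypothetical process creation record; real sources name these fields differently.
    timestamp: float  # epoch seconds
    host: str
    parent_pid: int
    parent_image: str  # bare image name, e.g. "cmd.exe"
    command: str


def max_distinct_in_window(events, window):
    """Largest number of distinct commands within any `window`-second span (sliding window)."""
    events = sorted(events, key=lambda e: e.timestamp)
    counts = Counter()
    best = 0
    left = 0
    for ev in events:
        counts[ev.command] += 1
        # Drop events that fell out of the window ending at ev.timestamp.
        while ev.timestamp - events[left].timestamp > window:
            old = events[left].command
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            left += 1
        best = max(best, len(counts))
    return best


def script_spawned_recon(events, group, threshold=5, window=600):
    """Return sorted (host, parent_pid, parent_image) keys for script parents that
    spawned at least `threshold` distinct `group` commands within `window` seconds."""
    by_parent = defaultdict(list)
    for ev in events:
        image = ev.parent_image.lower()
        if image in SCRIPT_HOSTS and ev.command in group:
            by_parent[(ev.host, ev.parent_pid, image)].append(ev)
    return sorted(key for key, evs in by_parent.items()
                  if max_distinct_in_window(evs, window) >= threshold)
```

Grouping by parent process, not just by host, is what ties the commands to a single script run; the parent's command line then usually points to the script file itself.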
4. Inform and Enrich Analytics (Takeaways)
With the script and its output files in hand, you can add their file names to your indicator database and to your automated detection tools’ watchlists. That way, if the attacker tries to use this script on another host, it will be detected automatically. The indicators can also be used to identify other previously compromised systems.
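Reusing new indicators against historical data can be as simple as the following Python sketch. It assumes process or file events reduced to `(host, file_path)` pairs and matches on case-insensitive base names; because attackers can rename files, a name match is a lead to investigate rather than proof of compromise.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def retro_hunt(file_events, indicator_names):
    """Return {host: sorted matched names} for hosts whose events contain an indicator file name.

    file_events: iterable of (host, file_path) pairs; Windows or forward-slash paths.
    indicator_names: file names to look for, compared case-insensitively.
    """
    wanted = {name.lower() for name in indicator_names}
    matches = defaultdict(set)
    for host, path in file_events:
        name = PureWindowsPath(path).name.lower()
        if name in wanted:
            matches[host].add(name)
    return {host: sorted(names) for host, names in matches.items()}
```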
In any hunt, there are multiple paths a hunter can take to address a given hypothesis. To find instances of a specific attacker behavior, such as internal reconnaissance, a hunter may also need to iterate through several types of hypotheses.