Sqrrl Blog

Mar 9, 2017 8:00:00 AM

The Nuts and Bolts of Detecting DNS Tunneling

DNS-based attacks have been in common use since the early 2000s, yet over 40% of firms still fall prey to DNS tunneling attacks. Tunneling is hard to catch from both directions. It arrives through uncommon vectors, so traditional automated tools such as SIEMs have difficulty detecting it. It is also buried in massive volumes of DNS data, so hunting for it manually is challenging as well. So how can we use more advanced analytic techniques to isolate these adversary behaviors? In a separate publication we covered Domain Generation Algorithms (DGAs) and the best data sources for detecting them. In this piece, we cover how best to sniff out malicious DNS tunneling on your network.

As we discussed in last week’s webinar, attackers can encode data in DNS queries to send it to an external server. This is useful for both command and control and data exfiltration, because DNS traffic is usually allowed through common security measures such as firewalls and proxies. This is where deep knowledge of the domains your firm commonly interacts with pays off. To receive the data, an attacker needs a domain whose authoritative name server they control (one they have registered or compromised), so you can use that knowledge to separate suspicious domains from popular, trusted ones. This filters the DNS data down to a more manageable size. The remaining data can then be collated into smaller channels, or sessions, each identified by a source IP and a destination domain and containing the data sent to that domain.
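
A minimal Python sketch of the filtering and collation step follows. It assumes queries arrive as (source IP, query name) pairs. The `TRUSTED_DOMAINS` set and the helper names are hypothetical. Taking the last two labels as the registered domain is a simplification; a real implementation would consult the Public Suffix List.

```python
from collections import defaultdict

# Hypothetical allowlist of popular, trusted registered domains. In practice this
# would come from your own traffic baseline or a domain popularity list.
TRUSTED_DOMAINS = {"google.com", "microsoft.com", "slack.com"}


def registered_domain(qname: str) -> str:
    """Naively treat the last two labels of a query name as its registered domain.

    Simplification: this is wrong for multi-label suffixes such as co.uk, which a
    real implementation would handle with the Public Suffix List.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def collate_sessions(queries):
    """Group (source_ip, qname) pairs into sessions keyed by (source_ip, domain).

    Queries to trusted domains are dropped first to shrink the data set. Each
    session holds the subdomain strings (everything left of the registered
    domain) that the source sent to that domain, in arrival order.
    """
    sessions = defaultdict(list)
    for src_ip, qname in queries:
        domain = registered_domain(qname)
        if domain in TRUSTED_DOMAINS:
            continue
        name = qname.rstrip(".").lower()
        subdomain = name[: -len(domain)].rstrip(".")
        sessions[(src_ip, domain)].append(subdomain)
    return dict(sessions)
```

For example, feeding it one query for www.google.com and the two queries abc.x.evil-tunnel.net and def.x.evil-tunnel.net, all from 10.0.0.5, yields a single session keyed by ("10.0.0.5", "evil-tunnel.net") that holds ["abc.x", "def.x"]. The Google query is filtered out.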


For each collated session, we want to determine whether it contains tunneling. This can be done in a number of ways. First, count the number of queries the channel makes: exfiltration tunnels generally produce an abnormally large number of queries. DNS tunnels also produce an unusually large number of unique subdomains. Finally, look at the average subdomain length and the average information content (entropy) of the subdomains, since encoded data tends to produce long, random-looking labels.
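
The four features can be computed per session roughly as below. This sketch reads "information content" as Shannon entropy in bits per character, which is an assumption about the metric. `session_features` is a hypothetical helper that takes the list of subdomain strings for one session.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character, from its character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def session_features(subdomains):
    """Per-session features: query count, unique subdomains, mean length, mean entropy."""
    n = len(subdomains)
    if n == 0:
        return {"queries": 0, "unique_subdomains": 0, "avg_length": 0.0, "avg_entropy": 0.0}
    return {
        "queries": n,
        "unique_subdomains": len(set(subdomains)),
        "avg_length": sum(len(s) for s in subdomains) / n,
        "avg_entropy": sum(shannon_entropy(s) for s in subdomains) / n,
    }
```

For instance, `session_features(["ab", "ab", "abcd"])` reports 3 queries, 2 unique subdomains, an average length of 8/3, and an average entropy of 4/3 bits per character.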


Once you compute the features for each session, you want to classify them as either “malicious” or “normal” traffic. If we have particular examples of malware in mind, we could look for sessions that closely match the feature values we expect to get from the malware. However, we can also take a broader view that looks for any outliers from the “normal” data. This allows us to detect unknown malware and avoid overfitting to specific examples.
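
Matching against a known tool could look like the sketch below. It checks whether each profiled feature falls within a relative tolerance of an expected value. The profile numbers and the 20% tolerance are hypothetical placeholders, not measurements of any real malware family.

```python
# Hypothetical profile for a known tunneling tool: placeholder values only.
KNOWN_PROFILE = {"avg_length": 50.0, "avg_entropy": 4.5}


def matches_profile(features, profile, rel_tol=0.2):
    """True if every feature named in `profile` is within rel_tol (relative) of its value.

    Features not mentioned in the profile are ignored.
    """
    return all(
        abs(features[name] - expected) <= rel_tol * abs(expected)
        for name, expected in profile.items()
    )
```

The weakness the paragraph points out is visible here: a tool that encodes shorter or lower-entropy labels than the profile simply falls outside the tolerance and goes unmatched.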

The vast majority of sessions should be normal traffic, so we can treat any significant outliers as potentially malicious sessions. A number of tools can be used for this classification and outlier detection, including scikit-learn and TensorFlow. We use a multivariate Bayesian approach that looks for outliers across multiple features at once. This approach takes into account the observed distribution of feature values, the expected rate of attacks, and the observation time to find the most suspicious sessions.
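
Sqrrl's multivariate Bayesian model is not described in enough detail to reproduce here, so the sketch below swaps in a much simpler and plainly different technique. It computes a per-feature robust z-score (median and median absolute deviation on log-scaled values) and flags any session that is a high-side outlier on at least one feature. The 3.5 threshold is a common rule of thumb, not a tuned value. Unlike the Bayesian approach, this ignores the expected attack rate and the observation time.

```python
import math
import statistics


def robust_z_scores(values):
    """Robust z-scores of log1p-scaled values, centred on the median.

    log1p compresses heavy-tailed counts. The median absolute deviation (MAD) is
    scaled by 1.4826 so it estimates a normal standard deviation; when MAD is zero
    (many tied values), fall back to 1.2533 x the mean absolute deviation.
    """
    logged = [math.log1p(v) for v in values]
    med = statistics.median(logged)
    deviations = [abs(x - med) for x in logged]
    mad = statistics.median(deviations)
    if mad > 0:
        scale = 1.4826 * mad
    else:
        scale = 1.2533 * sum(deviations) / len(deviations)
    if scale == 0:
        return [0.0] * len(logged)  # every value identical: nothing stands out
    return [(x - med) / scale for x in logged]


def flag_outliers(sessions, feature_names, threshold=3.5):
    """Return keys of sessions scoring above `threshold` on any listed feature.

    `sessions` maps a session key to a dict of feature values. Only high-side
    outliers are flagged, since tunnels inflate these features.
    """
    keys = list(sessions)
    flagged = set()
    for name in feature_names:
        scores = robust_z_scores([sessions[k][name] for k in keys])
        for key, z in zip(keys, scores):
            if z > threshold:
                flagged.add(key)
    return flagged
```

The median and MAD are used instead of the mean and standard deviation so that the tunnels being hunted do not inflate the baseline they are measured against.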

You also want to tune your system to avoid false positives. Deciding what counts as a "false positive" can be tricky, because even when tunneling is genuinely present, it is not necessarily malicious. A common example of non-malicious tunneling is someone using DNS tunneling to get around a Wi-Fi paywall at an airport or motel. Additionally, internal DNS and popular services such as Slack, Spotify, and anti-virus and anti-spam software can be mistaken for malicious traffic, since they also embed long strings of random-looking data in their queries. These issues can generally be resolved by whitelisting approved domains, or by applying a hard cut that alerts only when the number of unique subdomains per user per hour exceeds a fixed threshold.
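
Both mitigations are easy to layer on top of the scoring. The sketch below drops allowlisted domains and then applies a hard cut on unique subdomains per user per clock hour. The allowlist entries and the cut of 200 are hypothetical values to tune for your environment. The sketch also keys on the destination domain, matching the sessions built earlier. Removing `domain` from the bucket key gives a literal per-user, per-hour count across all domains.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical values: tune both to your own environment.
ALLOWLIST = {"slack.com", "spotify.com", "corp.internal"}
MAX_UNIQUE_SUBDOMAINS_PER_HOUR = 200


def hard_cut_alerts(events, allowlist=ALLOWLIST, cut=MAX_UNIQUE_SUBDOMAINS_PER_HOUR):
    """Alert when a user sends more than `cut` unique subdomains to one domain in a clock hour.

    events: iterable of (timestamp: datetime, user, domain, subdomain) tuples.
    Returns a sorted list of (user, domain, hour_start, unique_count) tuples.
    """
    buckets = defaultdict(set)
    for ts, user, domain, subdomain in events:
        if domain in allowlist:
            continue  # approved services are never alerted on
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(user, domain, hour)].add(subdomain)
    return sorted(
        (user, domain, hour, len(subs))
        for (user, domain, hour), subs in buckets.items()
        if len(subs) > cut
    )
```

A hard cut like this is blunt but transparent. Analysts can see exactly why an alert fired, which makes it a reasonable backstop alongside a statistical score.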

So, that’s the nuts and bolts of detecting DNS tunneling. For a more in-depth exploration of how Sqrrl detects DNS-related behaviors (including DGA detection), you can check out a recorded session of our webinar here. If you’re interested in Sqrrl’s threat hunting technology, consider requesting a demo of our software here.


Topics: Machine Learning, UEBA, DNS