An Interview with a Threat Hunter, Sqrrl’s David Bianco
By George Aquila
Big Data Security Analytics techniques are critical to hunting for advanced cyber threats. Starting with just some hypotheses, a seasoned threat hunter can use a Big Data tool, such as Sqrrl’s threat hunting platform, to iterate through large amounts of data and detect anomalies that would otherwise go unnoticed by traditional defenses. While more and more companies are attempting to build cyber threat hunting capabilities, few tools exist to assist analysts in the challenges of the hunt. The expansion of data science capabilities into the cybersecurity realm holds great promise for the advancement of cyber hunting. Sqrrl’s David Bianco sheds some light on these crucial developments surrounding the rise of threat hunting, and on how Sqrrl’s platform can provide these much-needed capabilities.
Could you describe your background and role at Sqrrl?
I’ve been in the IT industry for a little under 20 years, and I have been focused mostly on incident detection and response for about 12 of those years. At Sqrrl, I am a subject matter expert for hunting and incident detection and response, working on new ways to implement big data security analytics in threat hunting. Prior to Sqrrl, I was the Hunt Team manager at FireEye, and I was a founding member of General Electric’s Computer Incident Response Team (GE-CIRT).
How do you define hunting?
Hunting is the practice of searching iteratively through your data to detect and isolate advanced threats that evade more traditional security solutions. You are not really starting with automated alerts, just a bunch of data and some questions. You interact with the data and try out some techniques to get the answers to your questions, exploring to see what works and what does not. Each step gets you a bit closer and you try to build on that to figure out what is the next piece of data that you need.
From a general cyber defense perspective, there is detection of bad things, and there is response to bad things, and it is a continuum with detection flowing right into response. You start by detecting something, and while you’re investigating that, you are already starting to figure out what to do about it. Hunting is typically viewed as a type of proactive detection, but it can also bleed over into response. As you track your prey, you may also realize that damage has already happened that requires immediate attention.
Do you use indicators of compromise to organize and drive hunting trips, or do you use other approaches?
Your starting point is never an indicator; it’s always a question, or a hypothesis. Your question might be “Is data exfiltration happening?” or your hypothesis might be “If there is data exfiltration happening, it’s most likely going on through this part of the network.” So you check to see whether there’s any exfiltration going through that subnet, and you try to figure out what protocol it would be sent with and what that would look like. There might be multiple ways you can look for it. For example, you might have HTTP logs, Netflow logs and FTP logs that you’re using in your hunt. An adversary could be exfiltrating data by FTPing it straight out, or using HTTP to more easily bypass firewalls. Having some hypotheses helps you figure out what data you need to examine and what analytic techniques might be most fruitful.
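As a minimal sketch of testing a hypothesis like that, assume your flow or transaction logs have already been parsed into simple records. The field names, the sample values, and the 10.20.0.0/16 subnet below are all hypothetical and not taken from any particular tool. The idea is just to total the bytes leaving the suspect subnet, broken down by protocol:

```python
import ipaddress
from collections import defaultdict

# Hypothetical subnet named in the hypothesis; substitute your own.
SUSPECT_SUBNET = ipaddress.ip_network("10.20.0.0/16")


def outbound_bytes_by_protocol(records, subnet=SUSPECT_SUBNET):
    """Sum bytes sent from hosts inside `subnet` to hosts outside it, per protocol."""
    totals = defaultdict(int)
    for rec in records:
        src = ipaddress.ip_address(rec["src"])
        dst = ipaddress.ip_address(rec["dst"])
        # Only count traffic that starts in the subnet and leaves it.
        if src in subnet and dst not in subnet:
            totals[rec["proto"]] += rec["bytes_out"]
    return dict(totals)


# Made-up records standing in for parsed HTTP, Netflow and FTP logs.
sample_records = [
    {"src": "10.20.1.5", "dst": "203.0.113.9", "proto": "ftp", "bytes_out": 5_000_000},
    {"src": "10.20.1.5", "dst": "198.51.100.7", "proto": "http", "bytes_out": 120_000},
    {"src": "10.20.2.8", "dst": "10.20.1.5", "proto": "http", "bytes_out": 900_000},  # stays inside the subnet
    {"src": "10.30.0.4", "dst": "203.0.113.9", "proto": "ftp", "bytes_out": 7_000},   # different subnet
]

protocol_totals = outbound_bytes_by_protocol(sample_records)
print(protocol_totals)
```

A protocol that dominates the totals is only a lead, not a finding; it tells you which log source to dig into next.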
Alternatively, you might approach it more generally; you don’t know what they are doing or what they are sending out, but you know they will be sending large amounts of data, so you can look for that in your logs. Sometimes you may want to take a dataset that you’ve somehow arrived at and enrich it with intel, because maybe that will help you pick the suspicious things out. So for example, if you say “I’m concerned about a specific adversary” you can use all the intel that you know about that adversary to see if you have any matches against their infrastructure. Or you can use what we call friendly intelligence (i.e., knowledge of the services that your company usually uses) to filter out known good traffic.
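The sketch below illustrates that kind of enrichment under some loud assumptions: the record fields, the friendly and adversary lists, and the one-megabyte threshold are all invented for the example. It drops known good destinations, totals what is left per source and destination pair, and keeps anything that either matches adversary infrastructure or sends a large volume:

```python
from collections import defaultdict

# Hypothetical "friendly intelligence": services the company normally uses.
FRIENDLY_DESTINATIONS = {"backup.example.com", "mail.example.com"}
# Hypothetical intel on one adversary's infrastructure.
ADVERSARY_INFRASTRUCTURE = {"198.51.100.23", "bad.example.net"}


def triage(records, min_bytes=1_000_000):
    """Filter out friendly traffic, then flag intel matches and large senders."""
    totals = defaultdict(int)
    for rec in records:
        if rec["dest"] in FRIENDLY_DESTINATIONS:
            continue  # known good traffic, not worth an analyst's time
        totals[(rec["src"], rec["dest"])] += rec["bytes_out"]

    findings = []
    for (src, dest), total in totals.items():
        intel_match = dest in ADVERSARY_INFRASTRUCTURE
        if intel_match or total >= min_bytes:
            findings.append(
                {"src": src, "dest": dest, "bytes_out": total, "intel_match": intel_match}
            )
    # Intel matches first, then largest volume first.
    findings.sort(key=lambda f: (not f["intel_match"], -f["bytes_out"]))
    return findings


# Made-up log records.
sample_records = [
    {"src": "10.0.0.5", "dest": "backup.example.com", "bytes_out": 9_000_000},
    {"src": "10.0.0.7", "dest": "203.0.113.50", "bytes_out": 800_000},
    {"src": "10.0.0.7", "dest": "203.0.113.50", "bytes_out": 700_000},
    {"src": "10.0.0.9", "dest": "bad.example.net", "bytes_out": 4_000},
    {"src": "10.0.0.3", "dest": "192.0.2.10", "bytes_out": 20_000},
]

findings = triage(sample_records)
for finding in findings:
    print(finding)
```

Note that the small transfer to the adversary host still surfaces because of the intel match, while the large backup job is filtered out as friendly.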
In pursuing adversaries via hunting, is there a point in the kill chain where an adversary is more exposed?
Mostly you’re looking for where they will leave the largest digital footprints, which is mostly in the command and control and act on objectives phases. You can examine the cyber kill chain by phase:
Reconnaissance - You might find adversaries here because recon leaves footprints, but so many other people are doing the same thing, and in most cases it will turn out to be nothing, so it’s not useful unless you have some intelligence that tells you why you should care about one set of recon activity over another.
Weaponization - This happens completely on the attacker’s side, so you can’t find or hunt for it. You can sometimes find artifacts of it, but only at another Kill Chain phase (usually Delivery).
Delivery - You could hunt focusing on this phase, but usually you’ll need some other information. For example with spear phishing, it is difficult just to go through log files and pick out suspicious emails, because you either have to parse the message or find malicious attachments. Usually you can’t do that just from monitoring, although there are programs that try to detonate potentially malicious attachments for that reason.
Installation - This usually happens over a few seconds and is generally confined to a single host. It is tricky, but you can find things if you have the right data. For example, if you were to collect process audit logs from your endpoints, you might be able to use them to find when malicious services are started on a user’s desktop. You can hunt for that if you are prepared to invest in collecting that data in the first place.
Command and Control - Here is where you can start hunting for things you’ve never really seen before because you can capture almost everything on the network, if you have good network monitoring, like netflow data or transactional data like HTTP or FTP logs.
Act on Objectives - Adversaries can be doing practically anything by this point, and hopefully you have a lot of logs on your network that you can find them in. The difficulty is that you don’t know their objectives, so you have to prioritize what kinds of activities you’re looking for. It depends on what you’re trying to protect. Nevertheless this is where an adversary will be most exposed, and you’ll be more likely to catch something they do.
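To make the Installation example a little more concrete, here is one simple heuristic for working with process audit logs. It is offered as an illustrative assumption rather than a method described in this interview: count how many distinct hosts have run each executable, and review the ones that appear on very few machines. The event fields, host names, and paths are made up.

```python
from collections import defaultdict


def rare_executables(events, max_hosts=1):
    """Return executable paths seen on at most `max_hosts` distinct hosts."""
    hosts_by_image = defaultdict(set)
    for ev in events:
        hosts_by_image[ev["image"]].add(ev["host"])
    return sorted(
        image for image, hosts in hosts_by_image.items() if len(hosts) <= max_hosts
    )


# Made-up process-start events, as if collected from endpoint audit logs.
sample_events = [
    {"host": "desk-01", "image": "/usr/sbin/sshd"},
    {"host": "desk-02", "image": "/usr/sbin/sshd"},
    {"host": "desk-03", "image": "/usr/sbin/sshd"},
    {"host": "desk-01", "image": "/usr/bin/python3"},
    {"host": "desk-03", "image": "/usr/bin/python3"},
    {"host": "desk-02", "image": "/tmp/.cache/updaterd"},
]

rare = rare_executables(sample_events)
print(rare)
```

Rarity alone does not mean malicious; plenty of legitimate software runs on only one machine. It is a way to shrink a large pile of process data down to something an analyst can actually read.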
To what extent does the use and manipulation of Big Data assist with hunting?
You need data to hunt, and with more data, you can have more detailed and thorough hunting trips. However, more data also brings its own challenges. It is easy to eyeball anomalies in a small dataset. With Big Data, there’s just too much; you can’t manually read it line by line. But the advantage to having all of those logs in a central place is that you have a lot of data that can support answering different types of questions.
With Big Data, advanced analytic techniques become even more important, as security analysts need assistance to filter through and prioritize data. However, many security analysts don’t possess advanced data science skillsets to directly manipulate and filter Big Data. In this sense, automated algorithms and prioritization are needed. The power of the Sqrrl Enterprise platform is the ability to pivot in real time between advanced data science techniques and the underlying linked data that powers them. An optimal hunting platform, such as Sqrrl’s, enables a threat hunter to filter and prioritize Big Data and iteratively ask the data questions and explore the relationships in the data.
Are there any specific factors that you think have motivated organizations to transition to a more offensive threat hunter approach? Is it being pulled by the necessity of trying to keep up with new threats or is it being pushed by new technological advances that are enabling better hunting?
The question of whether it is a push or a pull is interesting. I don’t claim to know the definitive answer. It starts with organizations realizing that their existing traditional security solutions, such as firewalls and SIEMs, are not finding everything that they need to find. So on the detection side they’re not necessarily performing 100%. They’re doing well for what they do, but the problem is that signature-based or even intelligence-based network monitoring systems are limited. But attackers are virtually unlimited in what they can do. Adversaries are very flexible and agile, but traditional alerting systems usually are not.
Right now, though, it’s difficult for organizations to get into hunting. There are very few trained hunters and there is a lack of good tools. The people who are trained and know how to hunt are not necessarily data scientists, so they usually have only a few basic techniques in their inventory. The demand for tools like Sqrrl Enterprise is growing, because many companies want to combine hunting skill sets with Big Data Security Analytics. In Sqrrl, we have a solution that will allow them to do that. We are providing the analytics, the machine learning, and statistics that are focused on helping the analysts hunt for things.
From the hunting perspective, what are the emerging trends or tools that have changed how we might respond to cyber threats?
There have been three major drivers. One is the increased adoption of what I call Enterprise Security Monitoring (ESM). It’s gathering the right combination of host-based, network-based and log data, everything you can get out of your enterprise, and using it to identify and research security incidents. This is becoming more accepted, and it is pretty well known as a best practice, but not everyone is doing it quite yet.
The second driver is threat intelligence. In the last couple of years people and companies have been talking about sharing information so that they can better counter threats. This works, but it’s not magic. Many people don’t really understand how to best use the information they’re gathering or, in some cases, don’t know what is good intelligence versus bad intelligence. For example, a lot of people think that threat intelligence is lists of IPs and domain names, but that is just too limited. I can say from experience that just those two things are not really useful unless you know what the type of threat is so that you can look intelligently. Just looking at IPs and saying “go for it” is ineffective.
The third driver is the rise of analytics - the “rise of the machines” - and using machine learning in a way that makes sense and provides useful results. However, machine learning is still relatively new to most security analysts. There are a number of factors that make doing machine learning and analytics like this in the security field much more difficult than they are in other fields.
In security, you’re not dealing with predictable patterns that you can depend on seeing again, like detecting the failure rate of hardware. You have an adversary who is making decisions on the fly, actively trying to trick you based on a secret goal that you don’t necessarily know in advance. There are so many factors that you have to consider, and it is very difficult for us to automate a defense against that with the present level of understanding. We are also not talking about hackers sitting in their parents’ basement anymore. Right now we’re talking about sophisticated attackers who do this as a full-time job, who by and large have more money and resources than the average defender at an organization has. It’s very asymmetric.
We may always be playing catch up, but the name of the game is not to detect everything that they do right away, it’s to make it as difficult as possible for them to do what they are trying to do without being detected.
What are the ideal skills or qualities a threat hunter should possess? From your experience, where do analysts acquire the requisite skills to be one?
I would say the best skills are inquisitiveness, persistence, and a real willingness to learn. If you have those three you can find things successfully. The more experience you have in security the better, but I think there probably will be a new class of security analysts at some point soon who are specialists not only in IT security, but also in data science in at least some ways. I don’t think every analyst will be that, but I think it will be difficult to do effective hunting or investigation without having some data science training; there’s a high overlap between those two.
The main thing is that if you’re doing hunting, most of it will not pan out. You might try different techniques and realize those are not working either, so you continue and try more. You do that a few times until suddenly you figure out “this is how to get my answer; this is the best way.” I think that hunting is mostly experience based, so you have to learn from someone. For a lot of the people who are doing hunting, we’re barely scratching the surface. So I probably would not consider any hunter I know an expert, including myself, because as a discipline we haven’t really learned much of what there is to learn yet. There’s still a lot to be explored.
If you are interested in seeing a live demo of Sqrrl Enterprise in use in an APT Hunting scenario, please request a demo here.