Full video is here: http://www.youtube.com/watch?v=GOWGIMRhAz4
Hi, everybody. This is Dave Vellante, and we’re here at Wikibon.org. This is theCUBE, where we bring you the best guests that we can find, the smartest people, the tech athletes we like to say. Adam Fuchs is here. He’s the CTO and one of the founders of Sqrrl, a company that is building a big data platform on top of Accumulo, and we’re going to do a drill down into security. Adam, welcome.
Thanks, Dave. Good to be here. Yeah, we’ve broken out the whiteboard again to try to go into some details on security in general, with specific implications for big data applications and the big data ecosystem.
All right. So the first thing I want to cover really is just some basics on where security requirements come from and where privacy policies and legal restrictions on data access can originate. So here I’ve drawn a basic example where we have some source IP, destination IP, maybe user information in here. This could be log data derived from a proxy, a web proxy, some systems that are commonly mined for cyber security information, and as I’ve drawn it, we have a couple of security policies that might be in place. So you might consider that this is a table combined from a lot of different sources, where some of those sources are maybe a little more sensitive than others. If I have a proxy that’s going to one particular part of my subnet, maybe I want to restrict access to those logs to administrators that are dealing with that part of the subnet. So we may have a row-based policy, where I might say this particular row can only be seen by a subset of administrators. Maybe I’ve got some higher level administrators that can see across all of the data, so I still want to put this into the same table, but I have a fine-grained access control problem here.
Typically the policies that are applied to data come from legal restrictions, so they might be something like the Health Insurance Portability and Accountability Act, which restricts personally identifiable information. They might be Sarbanes-Oxley, which has similar restrictions, or any number of the geographically-based legal restrictions on how data can be used in various countries. Typically a lot of those attributes, or a lot of those security policies, are implemented as schema controls, so you might select the user column here. You might say anything in this user column is personally identifiable, so I am restricted in how that data can be used, and it’s a subset of users that can see that data, even if I give broader access to maybe the structure formed by the IP addresses in that same log data.
So with the combination of row-level access controls and column-level access controls, we have a little bit of cell-level access control already in place, but we want to go a little bit further than that. So typically with big data analysis, we’ll take this data, and we’ll transform it. We’ll de-normalize it. We’ll create graph structures around it. And as we do that transformation, we want to preserve the provenance of the data and the policy that was applied to the original data in that transformed version.
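A minimal Python sketch of how a row-based policy and a column-based policy combine into cell-level labels. Everything named here (`Cell`, `label_cells`, `ROW_POLICY`, `COLUMN_POLICY`, and the label strings `admin_subnet_a` and `pii`) is hypothetical and is not Sqrrl's or Accumulo's API. A label is modeled simply as the set of attributes a reader must hold, which is a simplification of a real visibility expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    label: frozenset  # attributes a reader must hold (all of them)


# Hypothetical policies: one keyed by row (source), one keyed by column (schema).
ROW_POLICY = {"row2": {"admin_subnet_a"}}
COLUMN_POLICY = {"user": {"pii"}}


def label_cells(table):
    """Turn {row: {column: value}} into cells labeled by row and column policy."""
    cells = []
    for row, columns in table.items():
        for column, value in columns.items():
            # A cell requires everything its row AND its column require.
            required = ROW_POLICY.get(row, set()) | COLUMN_POLICY.get(column, set())
            cells.append(Cell(row, column, value, frozenset(required)))
    return cells


table = {
    "row1": {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "user": "alice"},
    "row2": {"src_ip": "10.0.1.5", "dst_ip": "10.0.1.7", "user": "bob"},
}
cells = label_cells(table)
labels = {(c.row, c.column): set(c.label) for c in cells}
```

In this sketch the `user` cell of `row2` ends up requiring both `admin_subnet_a` and `pii`, while the IP addresses in `row1` carry no restriction at all.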
Here I have another graphic, which displays really the interactions between different users here, Alice, Bob, and Charlie. Maybe all of this data is restricted by that user column-level security, but some of it is also indicative of more row-based or more source-based access here. I can look at the connections between Bob and Charlie here and say that those were really derived from one of my rows in the previous diagram, where there’s going to be a different set of policies applying to those. Here we don’t necessarily have any column-based or row-based orientation preserved from our original view of data. We need cell-level security to be able to represent with high fidelity that access control and the policy surrounding those particular data elements. This is something that, certainly in my experience, is fairly ubiquitous across any kind of big data application that covers data with diverse policy requirements.
All of this stuff fits inside of really a bigger ecosystem of security, so that system is going to be defined by some data with policies applied to it and a set of users that are going to access that data. But in order to create an overall secure architecture, we also need things like authentication mechanisms. These could be your Active Directory, your identity management systems. There are user attributes, which could be stored in LDAP or some other user database. There are auditing systems. There are key management systems. All of this stuff provides the necessary components and services required to create the overall secure ecosystem, and what I’m going to draw here are some data flows that preserve all of the policies and all of the cell-level access controls that are provided in this particular database here. So here what we have is data flowing in through some labeling mechanism, and this is going to decide what were the source-based attributes, what were the column-based attributes that were in there before, and really those are all derived from policy. So policy and data together form some sort of labeled data. We store that in a fine-grained access control database, Sqrrl Enterprise, backed by Accumulo in there, and then as we access it, applications need to see a view of that, which is then filtered in a way so that the end user that’s querying that application really is getting a view that is only derived from data that they’re allowed to see.
In order to do that, there’s an additional step here. The user authenticates against their authentication system. That triggers a user attribute lookup. This also grants access to the application itself and tells Sqrrl Enterprise that this is the end user that’s expecting to see that data. The user attributes are fed to a policy engine, which also understands the policy that’s applied to the data. That policy engine lets us know what are the data labels that that particular user on the query side has access to. So that feeds into a filter, which is implemented inside of the database. We’ve pushed all of that fine-grained access control into the database, so that guarantees that all of this work has been done on the outside and that the user only sees data that they’re allowed to see. At the same time, the database ties into audit management systems, and for more of the traditional security aspect, it ties into key management systems to do proper encryption at rest and in motion, and that forms sort of an overall secure flow of information.
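The query-side flow described here (attribute lookup, policy engine, filter inside the database, audit) can be sketched in Python as follows. This is an illustration under stated assumptions, not Sqrrl Enterprise's implementation: `USER_ATTRIBUTES`, `policy_engine`, `query`, `AUDIT_LOG`, and the attribute and label names are all hypothetical, authentication is assumed to have already succeeded, and key management and encryption are not modeled.

```python
# Hypothetical user attribute store (stands in for LDAP or a similar database).
USER_ATTRIBUTES = {
    "alice": {"role": "senior_admin", "pii_cleared": True},
    "bob": {"role": "subnet_a_admin", "pii_cleared": False},
}

# Labeled data as (value, labels a reader must hold).
LABELED_DATA = [
    ("10.0.0.1", frozenset()),
    ("alice", frozenset({"pii"})),
    ("10.0.1.5", frozenset({"admin_subnet_a"})),
    ("bob", frozenset({"admin_subnet_a", "pii"})),
]

AUDIT_LOG = []  # stands in for an external audit system


def policy_engine(attributes):
    """Map user attributes to the set of data labels the user may read."""
    allowed = set()
    if attributes.get("role") in ("senior_admin", "subnet_a_admin"):
        allowed.add("admin_subnet_a")
    if attributes.get("pii_cleared"):
        allowed.add("pii")
    return allowed


def query(user):
    """Return only the values whose labels the (already authenticated) user holds."""
    attributes = USER_ATTRIBUTES[user]           # user attribute lookup
    authorizations = policy_engine(attributes)   # labels this user may see
    # The filter: a value is visible only if every required label is held.
    visible = [v for v, required in LABELED_DATA if required <= authorizations]
    AUDIT_LOG.append((user, len(visible)))       # record the access
    return visible


alice_view = query("alice")
bob_view = query("bob")
```

Because the filter runs next to the data, the application only ever receives `bob_view`, which omits every value labeled `pii`.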
I mentioned that typically the source information, when you’re dealing with big data applications, gets transformed. So there’s actually another flow in here, and that’s really coming from this app back through that labeling mechanism. This is one of the places where fine-grained access control is very important, so instead of having to kind of remodel the security as I ingest that data back into the system, whatever the results of this application were, this process is a lot simpler once I have labeled data. It’s really a matter of preserving that label, rather than recreating it and totally remodeling all the data.
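A small Python sketch of preserving labels through a transformation, assuming labels are modeled as sets of required attributes. The helper names (`derive_edge`, `visible_edges`) and the label strings are hypothetical. A graph edge derived from two labeled cells simply carries the union of their labels, so nothing has to be remodeled when the result is written back.

```python
def derive_edge(src_cell, dst_cell):
    """Derive a graph edge from two labeled cells, keeping both cells' labels."""
    (src_value, src_label), (dst_value, dst_label) = src_cell, dst_cell
    return (src_value, dst_value), src_label | dst_label


def visible_edges(edges, authorizations):
    """Return the edges whose required labels are all held by the reader."""
    return [edge for edge, label in edges if label <= authorizations]


pii = frozenset({"pii"})
restricted = frozenset({"pii", "admin_subnet_a"})

# Row 1 (column policy only) connects Alice and Bob;
# row 2 (restricted source as well) connects Bob and Charlie.
edges = [
    derive_edge(("alice", pii), ("bob", pii)),
    derive_edge(("bob", restricted), ("charlie", restricted)),
]

pii_only_view = visible_edges(edges, {"pii"})
full_view = visible_edges(edges, {"pii", "admin_subnet_a"})
```

A reader holding only `pii` sees the Alice to Bob connection, while the Bob to Charlie connection still requires the source-based label it inherited from its original row.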
So there you have it. There’s the overall secure ecosystem. So what we covered today is that big data implies complex security. When you have big data, you have diverse data typically, you have diverse security requirements, and you need a complex security system in order to handle them. Data-centric security is the approach that we advocate for dealing with that, and Sqrrl Enterprise is at the core of that data-centric security model. And really Sqrrl provides a complete ecosystem, where we can provide the labeling mechanism, the policy engine, access to the key management systems, access to the auditing databases, ties into the enterprise identity and access management systems, and a great API that apps can be developed on top of to take advantage of all of this security.
Awesome. Well, thank you, Adam, for coming by and sharing your deep knowledge in this really important area. Check out Sqrrl.com if you want to see more of Adam’s work. Go to YouTube.com/siliconangle, or go to SiliconAngle.com for all the news, and also check out Wikibon.org for all the research. No pay walls. It’s all free open source. Thanks for watching, everybody.