by Joe Travaglini, Director of Product Marketing for Sqrrl
It has been said that we entered the age of Big Data when the opportunity cost of throwing data away exceeded the monetary cost of keeping it. This inflection point has led to the adoption of Hadoop, the emergence of tools to tame the wild yellow elephant, and more recently the attractive notion of a ‘data hub’ or ‘data lake’. Driven by the commoditization of both hardware and software, this fundamental change in technology economics may very well be real, but it still takes proper execution to convert this opportunity cost into actual income.
Many folks are still struggling to find that “killer app” – the big data use case that will produce significant, measurable impact on their line of business. This is to be expected for such a nascent field: the concepts and technology are still very new, especially when compared to incumbent systems. Beyond the obvious obstacle – coming up with the right idea – there is a wide variety of tactical challenges that are easy to overlook.
Risky Business – Avoiding the Big Data Deluge
Many folks would claim that it is best to minimize the number of discrete ‘clusters’ of big data, for reasons of operational efficiency, infrastructure utilization, economies of scale, and the promise of data science over multi-structured sources of data. In other words, the ideal big data platform is massively multitenant.
In practice, early adopters have found that rolling out a multitenant system comes with a fair share of obstacles – some technical, others having to do with business processes and regulatory compliance. Consider traditional data systems: issues like masking, audit, metadata management, and security are all well understood and have mature solutions to meet their needs. Additionally, these implementations tend to be contained in various ‘silos’, making it easier to ‘secure the perimeter’ and allow direct access only to a small set of trusted users and applications.
Now, take a look at our new, multitenant, ‘big data’ world: not only is there a dearth of solutions addressing these needs, but the stakes are also amplified by the consolidation of data and the compounding of risk that come along with a multitenant system. Without the right architecture and controls in place, your ‘data hub’ will look more like a ‘data dump’ – the ‘data lake’ will inevitably swell into a data tsunami.
Realizing the Opportunity
We’re extremely excited to see the community acknowledge these challenges and the room for improvement in the Hadoop sphere, as evidenced by broader support for the Apache Accumulo project as well as the introduction of fine-grained access control into Apache HBase. Embracing “cell-level security” is a great first step, but it’s not nearly enough. At Sqrrl, we’ve built Data-Centric Security into our product, end to end. This includes not only an enhanced version of Accumulo’s cell-level security, but also Enterprise-grade encryption and key management, secure search indexing, seamless integration with Enterprise authentication and authorization systems, and the ability to specify attribute-based access control – a more expressive, advanced form of role-based access control.
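To make “cell-level security” a little more concrete, here is a minimal Python sketch of the general idea: every cell carries a visibility label, and a reader sees the cell only if the authorizations they hold satisfy that label. The label syntax (`&`, `|`, and parentheses) is modeled loosely on Accumulo-style visibility expressions. The function names, the sample cells, and the label tokens are all hypothetical, and this is an illustration only, not how Accumulo or Sqrrl Enterprise actually implements it.

```python
import re

# Label tokens are simple names; operators are & (and), | (or), and parentheses.
_TOKEN = re.compile(r"[A-Za-z0-9_.:/-]+|[&|()]")


def tokenize(expression):
    """Split a visibility expression into tokens, rejecting stray characters."""
    tokens = _TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError(f"invalid characters in expression: {expression!r}")
    return tokens


def is_visible(expression, authorizations):
    """Return True if a reader holding `authorizations` may see a cell
    labeled with `expression`. An empty label means visible to everyone."""
    if expression == "":
        return True
    tokens = tokenize(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations  # a bare label: does the reader hold it?

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            # Assumption for this sketch: mixing & and | needs parentheses.
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


def readable_cells(cells, authorizations):
    """Filter (row, label, value) cells down to those the reader may see."""
    return [c for c in cells if is_visible(c[1], authorizations)]


# Hypothetical sample data: each cell is (row, visibility label, value).
CELLS = [
    ("patient1", "", "public note"),
    ("patient1", "doctor|nurse", "blood pressure"),
    ("patient1", "doctor&(billing|audit)", "insurance claim"),
]
```

With this sketch, a reader holding only the `nurse` authorization would get the first two cells back from `readable_cells(CELLS, {"nurse"})`, while the third stays hidden unless the reader holds `doctor` together with `billing` or `audit`. The point of the design is that the access decision travels with each piece of data rather than being enforced only at the perimeter.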
We believe that in order to build a secure, scalable, and flexible platform for big data applications, you must have:
- Robust security – Sqrrl’s Data-Centric Security is the most mature approach to Big Data security available today
- Easy ways to acquire data – Sqrrl Enterprise provides custom data input formats, bulk loaders, and a labeling engine that empowers users to dictate and control access to their data
- Easy ways to retrieve data – Sqrrl Enterprise provides higher level APIs for data access. We publish Apache Thrift bindings that allow access from most major programming languages, in addition to an SQL-like query language for manipulating data, SqrrlQL
- Compelling ways to interact with data – Sqrrl Enterprise ships with a number of tutorials and demo applications that allow the user to query and visualize data