Sqrrl Blog

May 31, 2013 3:39:26 PM

Sqrrl Named a Hot Tech Startup by CRN

We are excited that CRN has named us a “Hot Tech Startup for May.” Check out the post here:

http://www.crn.com/slide-shows/channel-programs/240155799/10-hot-tech-startups-for-may.htm?pgno=6

Topics: Blog Post

May 29, 2013 2:25:00 PM

Is Accumulo the World's Most Scalable Graph Store?

The NSA has just released some fascinating new data on Apache Accumulo's performance as a massively scalable graph store. The data was presented last week at Carnegie Mellon University, and the abstract of the report reads:

May 28, 2013 11:30:59 AM

New Sqrrl Video Series

We are starting a new video series here at Sqrrl in partnership with Wikibon and SiliconAngle. These videos will be whiteboard sessions given by our CTO and Co-Founder, Adam Fuchs.

May 7, 2013 10:36:00 PM

How to Choose a NoSQL Database

The world of NoSQL databases is a very noisy (and confusing) space. Matt Aslett at the 451 Group has done an amazing job of cataloging various databases (including NoSQL) in his Database Landscape Map.

To simplify the NoSQL world, let's look at the top three databases in terms of current popularity and how they compare to Apache Accumulo, which is at the core of our product, Sqrrl Enterprise.

MongoDB: It is a wonderfully easy-to-use document store that many select as a flexible replacement for a SQL database, as it (like all NoSQL databases) does not require pre-defined schemas. However, MongoDB has difficulty scaling to very large datasets (e.g., 100+ TB) and does not natively work with your Hadoop cluster. It also does not possess fine-grained security controls.

Cassandra: This is an excellent choice if your data is too big for MongoDB and you require multi-datacenter replication. Although Cassandra was not originally designed to run natively on your Hadoop cluster, it now has integrations with MapReduce, Pig, and Hive. It does not possess fine-grained security controls.

HBase: HBase natively integrates with Hadoop, and it can handle very large datasets. However, it does not have fine-grained security controls.

Accumulo: Accumulo's architecture is most similar to HBase's, which also allows it to plug natively into your Hadoop cluster. It is far more scalable than MongoDB, and with reported cluster sizes of several thousand nodes within the Intelligence Community, it is also significantly more scalable than HBase and Cassandra. Accumulo is the only NoSQL database with cell-level security capabilities. Accumulo also has features beyond security and scalability that could lead one to choose it over HBase or Cassandra. For example, Accumulo has a powerful server-side programming mechanism called Iterators, which enable a variety of real-time aggregations and analytics.

These high-level differences between MongoDB, Cassandra, HBase, and Accumulo are summarized in the decision tree diagram below. Of course, there is a wide variety of more detailed technical differences, which we will explore in a later post. The decision tree boils down to a few simple statements:

  • If you need a quick, simple solution and have “small” Big Data (e.g., a few dozen terabytes), MongoDB may be the answer.
  • If you need cell-level security or multi-petabyte scalability, Accumulo is the right answer.
  • If you have data that is too big for MongoDB and don't need cell-level security or massive scalability, we recommend testing HBase, Cassandra, and Accumulo for your specific workloads. Each has its own nuanced advantages and disadvantages.
  • If you don’t need real-time analytics, you are probably on the wrong decision tree and can stick with the Hadoop Distributed File System and batch analytics.
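To make the order of these questions explicit, here is a minimal sketch of the decision tree as a Python function. It is only our reading of the four statements above: the `Requirements` fields, the `recommend` function, and the terabyte thresholds are hypothetical illustrations, not part of any product, and the thresholds are rough stand-ins for "a few dozen terabytes" and "multi-petabyte."

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical inputs to the decision tree (field names are illustrative)."""
    needs_real_time: bool            # real-time queries/analytics required?
    needs_cell_level_security: bool  # per-cell access control required?
    data_size_tb: float              # expected dataset size in terabytes
    wants_quick_simple: bool         # is ease of getting started the top priority?


# Assumed thresholds, not from the post: "a few dozen TB" and "multi-petabyte".
SMALL_BIG_DATA_TB = 50
MASSIVE_SCALE_TB = 2000  # roughly 2 PB


def recommend(req: Requirements) -> list:
    """Walk the decision tree from the root and return candidate stores."""
    # Root question: without real-time needs, HDFS plus batch analytics suffices.
    if not req.needs_real_time:
        return ["HDFS + batch analytics"]
    # Cell-level security or multi-petabyte scale points to Accumulo.
    if req.needs_cell_level_security or req.data_size_tb >= MASSIVE_SCALE_TB:
        return ["Accumulo"]
    # "Small" Big Data and a need for a quick, simple solution points to MongoDB.
    if req.wants_quick_simple and req.data_size_tb <= SMALL_BIG_DATA_TB:
        return ["MongoDB"]
    # Everything else: benchmark the three candidates on your own workloads.
    return ["HBase", "Cassandra", "Accumulo"]


if __name__ == "__main__":
    print(recommend(Requirements(True, False, 20, True)))
    print(recommend(Requirements(True, False, 500, False)))
```

Checking for real-time needs first mirrors the last statement in the list, which treats that question as the root of the tree. A workload that is small but does not prioritize simplicity falls through to the "test all three" branch, which is an assumption on our part.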

It is worth noting that the NoSQL databases above are all open source. Sqrrl Enterprise builds upon Accumulo and adds a number of features, including streaming ingest, JSON support, encryption, identity management integrations, full-text search, SQL queries, graph search, and statistics. We believe that these features set Sqrrl Enterprise apart from other Big Data platforms.

May 6, 2013 9:50:06 AM

The History of Sqrrl

Interested in the history of Sqrrl? Check out this podcast with Ely Kahn from Sqrrl, Luke Fretwell from FedScoop, and Gunnar Hellekson (Red Hat’s Public Sector Chief Technology Strategist).

http://fedscoop.com/fedoss-sqrrl-brings-open-source-big-data/

May 2, 2013 5:33:35 PM

CSO Article on Securing Big Data Infrastructure

CSO discusses the difficulty of bolting security onto Big Data infrastructure. Sqrrl Enterprise is the only Big Data platform with security baked in from the start.
