Sqrrl Blog

Jan 3, 2014 4:04:00 AM

When To Use Sqrrl vs. SQL-on-Hadoop Tools?

Over the last year or so, a large number of SQL-on-Hadoop tools have come to market. Hadapt has developed a useful taxonomy of these tools, which include Hive, Stinger, Hadapt, Impala, HAWQ, and Drill.

Occasionally, Sqrrl Enterprise gets compared to these tools, because Sqrrl Enterprise can run low-latency searches and queries over data held in Accumulo and HDFS. However, this is typically not an appropriate comparison, because Sqrrl Enterprise is an "operational database" and not a SQL-on-Hadoop tool.

Gartner defines an operational database as "including both relational and non-relational databases that are suitable for a broad range of enterprise-level transactional applications." Operational databases are designed to ingest data at very fast rates, write it to the database (or update/delete it), and enable large numbers of concurrent users to query the database to retrieve specific records.

SQL-on-Hadoop tools are more closely aligned to traditional Analytical Databases (i.e., OLAP) or Enterprise Data Warehouse (EDW) tools. These tools are designed to support ad hoc queries, query-side aggregations, and joins on data. Analytical databases have lower requirements to support high ingest rates, large numbers of concurrent users, and ACID-style transactions.

More important than the technological differences, operational databases support different types of use cases than SQL-on-Hadoop tools do. Operational NoSQL databases do not support ad hoc joins, so they are not typically used as platforms for traditional business intelligence tools that require query-side aggregations of data. Instead, operational NoSQL databases are typically used to power interactive, highly scalable web applications that serve large numbers of concurrent users, where the types of queries can be reasonably defined ahead of time and/or the searches focus on returning specific records rather than aggregations of data.

Despite these clear differences between operational databases and SQL-on-Hadoop tools, it is important to note that Sqrrl Enterprise does have some analytic capabilities. These analytic capabilities can be grouped into two categories:

  • Ingest-Side Aggregations: Sqrrl Enterprise can utilize Accumulo's Iterator Framework to keep counts, metrics, and dashboards up to date as new data streams into the system.
  • Hadoop Integrations: Sqrrl Enterprise has integrations with MapReduce and Pig, so that users can utilize various higher-latency, batch-oriented Hadoop analytical tools, such as Mahout for machine learning or Hive for ad hoc aggregations. These analytics can be run from the same Hadoop cluster that also supports Sqrrl Enterprise.
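
To make the idea of an ingest-side aggregation more concrete, here is a minimal conceptual sketch in plain Python. It does not use Accumulo's Iterator Framework or any Sqrrl Enterprise API; the class `IngestSideCounter`, the function `query_side_count`, and the `event_type` field are hypothetical names chosen purely for illustration. The sketch only contrasts updating a count at write time with scanning all records at query time.

```python
from collections import defaultdict


class IngestSideCounter:
    """Toy in-memory store (hypothetical) that keeps counts current on write."""

    def __init__(self):
        self.records = []               # raw records, in arrival order
        self.counts = defaultdict(int)  # aggregate maintained at ingest time

    def ingest(self, record):
        # Update the aggregate as part of the write, so reads never rescan.
        self.records.append(record)
        self.counts[record["event_type"]] += 1

    def count(self, event_type):
        # Single lookup of a precomputed value; .get avoids creating new keys.
        return self.counts.get(event_type, 0)


def query_side_count(records, event_type):
    # The alternative: scan every record at the moment the question is asked.
    return sum(1 for r in records if r["event_type"] == event_type)


store = IngestSideCounter()
for event in ["login", "search", "login", "logout", "login"]:
    store.ingest({"event_type": event})

print(store.count("login"))                      # ingest-side answer
print(query_side_count(store.records, "login"))  # query-side answer
```

Both calls return the same number, but the ingest-side version pays a small cost on every write in exchange for cheap reads, which is roughly the trade-off that suits dashboards and counters with many concurrent readers.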

In this sense, Sqrrl Enterprise is an operational database, but also provides some analytical capabilities. Depending on your use case, you may be able to use only Sqrrl Enterprise or you may want to use a SQL-on-Hadoop tool in conjunction with Sqrrl Enterprise.


Topics: NoSQL, Big Data, Hadoop, Blog Post, Sqrrl Enterprise