Quick Accumulo Install
Since joining sqrrl, I’ve been introducing many people to Apache Accumulo. While everyone is eager to take advantage of Accumulo’s unique technical strengths, inevitably their first question is “How do I get started?” Even though all of the steps are documented, getting started can be intimidating, especially if you haven’t used Hadoop before.
So, I assembled this guide to getting Accumulo running quickly on a single machine. Most of these steps are documented in the Hadoop Single Node Setup Guide, the ZooKeeper Getting Started Guide, and the README installed with Accumulo. Take a look at these three documents if you would like to learn more about the steps below.
If you have questions or suggestions, contact me on Twitter at @jbpopp. Our team at sqrrl is always working on making it even easier to get started with Accumulo, so we’d love to hear your feedback. If this manual walk-through isn’t your bag, you may want to download our pre-canned Accumulo 1.4.2 VirtualBox VM or check out sqrrl’s Accumulo setup shell script that accomplishes the same steps.
-Ben Popp, Director of Engineering at sqrrl
Install single-node Accumulo in minutes
The following instructions will:
- Install Apache Hadoop 1.0.4
- Install Apache ZooKeeper 3.3.6
- Install Apache Accumulo 1.4.2
Pre-requisites:
- This guide assumes you are running Linux.
- Java 1.6.x must be installed and the ‘java’ command must be on the path.
- ssh must be installed and sshd must be running so that the Hadoop scripts will be able to manage various processes. You need to be able to ssh to localhost without using a passphrase. The Hadoop Single Node Setup Guide has directions for this if needed.
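If you want to check the prerequisites before you start, here is a minimal sketch. The function names (have_cmd, can_ssh_localhost, check_prereqs) are my own and not part of any of the three projects. It only checks that the commands exist and that a non-interactive ssh to localhost succeeds; it does not verify the Java version.

```shell
# Succeeds if the named command is on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# BatchMode makes ssh fail instead of prompting, so this only
# succeeds when login to localhost works without a passphrase.
can_ssh_localhost() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true >/dev/null 2>&1
}

# Report each prerequisite and return non-zero if any is missing.
check_prereqs() {
  status=0
  for cmd in java ssh; do
    if have_cmd "$cmd"; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      status=1
    fi
  done
  if can_ssh_localhost; then
    echo "ok: passphraseless ssh to localhost"
  else
    echo "failed: passphraseless ssh to localhost"
    status=1
  fi
  return "$status"
}
```

Run check_prereqs after defining the functions. A failed ssh check can also mean that the host key for localhost has not been accepted yet, so try a plain "ssh localhost" once by hand before concluding that your keys are wrong.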
Step 1: Install Hadoop 1.0.4
Download hadoop-1.0.4-bin.tar.gz from an Apache mirror and unpack the archive.
In the distribution, edit the conf/hadoop-env.sh file to define JAVA_HOME to be the root of your Java installation.
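If you are not sure where your Java installation lives, one common trick is to work backwards from the ‘java’ binary on your path. The sketch below assumes GNU readlink (for the -f option) and a conventional layout where the binary sits at <root>/bin/java; the function names are hypothetical.

```shell
# Print the installation root for a java binary path by
# stripping the trailing /bin/java.
java_home_for() {
  dirname "$(dirname "$1")"
}

# Resolve symlinks to the real binary first, then derive the root.
guess_java_home() {
  java_bin=$(command -v java) || return 1
  java_home_for "$(readlink -f "$java_bin")"
}
```

Calling guess_java_home prints a candidate value. Check that it looks right before adding a line such as "export JAVA_HOME=<that value>" to conf/hadoop-env.sh.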
Even though we’re running on a single node, we’ll install in “Pseudo-Distributed Operation” where each Hadoop daemon runs in a separate Java process. Edit the Hadoop configuration files to include the following. Make sure that the parent directory of dfs.data.dir and dfs.name.dir already exists and that the user running Hadoop can write to it.
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
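To create the parent directory for the HDFS paths, something like the following sketch should do. The function name prepare_hdfs_parent is hypothetical, and the default path is simply the parent of the dfs.data.dir and dfs.name.dir values used in this guide.

```shell
# Create the shared parent of dfs.name.dir and dfs.data.dir.
# $1 is the parent directory; the default matches this guide's hdfs-site.xml.
prepare_hdfs_parent() {
  parent=${1:-/var/lib/hadoop/hdfs}
  mkdir -p "$parent" || return 1
  # The Hadoop daemons run as the current user in this setup,
  # so that user needs write access to the directory.
  [ -w "$parent" ]
}
```

On most systems a regular user cannot create directories under /var/lib, so you will probably need to create the directory as root and then change its owner to your own account, or pick a path under your home directory and use that in hdfs-site.xml instead.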
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
View the web interface for the NameNode (http://localhost:50070/) and the JobTracker (http://localhost:50030/) to confirm that the processes are launched.
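You can also confirm the daemons from the command line with jps, the JDK tool that lists running Java processes. The sketch below reads jps output on standard input and prints any of the five Hadoop 1.x daemon names that are absent. The function name missing_daemons is my own, and it assumes the usual "<pid> <name>" output format.

```shell
# Read `jps` output on stdin and print each expected Hadoop daemon
# that does not appear in it. Prints nothing when all five are running.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare the second column exactly, so that "SecondaryNameNode"
    # is not mistaken for "NameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v name="$daemon" '$2 == name { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}
```

Use it as "jps | missing_daemons". If a name is printed, look in the Hadoop logs directory for that daemon's log file to see why it failed to start.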
Step 2: Install ZooKeeper 3.3.6
Download zookeeper-3.3.6.tar.gz from an Apache mirror and unpack the archive.
Create a new conf/zoo.cfg file with the following contents. Make sure to choose a valid local path as a value for dataDir.
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=100
Start ZooKeeper with the command:
$ bin/zkServer.sh start
Use the ZooKeeper shell to validate that ZooKeeper is running as intended. Start the shell with the command:
$ bin/zkCli.sh
Enter “ls /” to see the contents of ZooKeeper (not much at this point), and type “quit” to exit the shell.
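For a non-interactive check, ZooKeeper also answers short "four-letter word" commands on its client port: send "ruok" and a healthy server replies "imok". Here is a minimal sketch, assuming nc (netcat) is installed; the function name zk_is_ok is hypothetical and the defaults match the clientPort in the zoo.cfg above.

```shell
# Send ZooKeeper's "ruok" command and succeed only if the reply is "imok".
# $1 is the host (default localhost), $2 the client port (default 2181).
zk_is_ok() {
  zk_host=${1:-localhost}
  zk_port=${2:-2181}
  reply=$(echo ruok | nc "$zk_host" "$zk_port")
  [ "$reply" = "imok" ]
}
```

For example: "zk_is_ok && echo ZooKeeper is up". Some netcat variants need an extra option to close the connection once the input ends, so if the command hangs, fall back to the zkCli.sh check.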
Step 3: Install Accumulo 1.4.2
Download accumulo-1.4.2-dist.tar.gz from an Apache mirror and unpack the archive.
Copy the example configuration to set up your Accumulo environment. For testing on a single computer, use a fairly small configuration:
$ cp conf/examples/512MB/standalone/* conf
Edit conf/accumulo-env.sh to set your JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME.
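As an illustration, the three settings might end up looking like the lines below. Every path here is an assumption: the JAVA_HOME value is a hypothetical location, and the other two assume you unpacked the Hadoop and ZooKeeper archives in your home directory. Substitute your own locations.

```shell
# Hypothetical locations: adjust to wherever Java lives on your
# machine and to wherever you unpacked the two archives.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export HADOOP_HOME="$HOME/hadoop-1.0.4"
export ZOOKEEPER_HOME="$HOME/zookeeper-3.3.6"
```

HADOOP_HOME and ZOOKEEPER_HOME should point at the top-level directories created when you unpacked the archives in Steps 1 and 2, the ones that contain bin and conf.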
Create the Accumulo logs directory at the default location, a ‘logs’ directory inside the Accumulo home directory.
Run “bin/accumulo init” to create the HDFS directory structure and initial ZooKeeper settings. Choose a name and root password for your instance when prompted.
Start Accumulo using the bin/start-all.sh script.
Visit the Accumulo monitor page at http://localhost:50095 to confirm that you’re live!
Use the command “bin/accumulo shell -u root” to run an Accumulo shell as the Accumulo root user. (Enter the root password you chose during “bin/accumulo init”.) Now you have full access to your instance.
Congratulations!
After you get Accumulo up and running, jump into the Accumulo User Manual to learn more, and get involved with the project.