
A guest post by Kristen Hardwick, who has worked with several different parallel paradigms – including Grid, Cluster, and Cloud. She currently works at Spry where her focus is on designing and developing Big Data analytics for the Hadoop ecosystem. Kristen holds both a Bachelor of Science degree and a Master’s degree in Computer Science from Clemson University, with an emphasis on Parallel Computing.

Apache Giraph is a component in the Hadoop ecosystem that provides iterative graph processing on top of MapReduce or YARN. It is the open source answer to Google’s Pregel, and has been around since early 2012. The tool has a very active developer community that contributes to the code base. However, since the emphasis is on making Giraph as stable and robust as possible, there are some gaps in the documentation about how to use the tool – specifically in the realm of interpreting error messages.

If you haven’t seen them already, take a look at the Introduction to Apache Giraph and Understanding an Apache Giraph Application blog posts. This post will contribute information on some of the most common error messages that might be encountered when running the Giraph examples.

Class Not Found Exceptions

In order to avoid these types of errors:

Ensure that all of the necessary Giraph JARs can be located on the classpath. One way to do this is to copy the JAR files from giraph-core/target and giraph-examples/target into the Hadoop lib folder with the following commands:
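The exact commands depend on your build and installation layout. A minimal sketch, assuming the jar-with-dependencies artifacts produced by a standard Giraph build and a Hadoop installation rooted at $HADOOP_HOME, might look like this:

    # Copy the Giraph core and examples jars (with their bundled dependencies)
    # onto the Hadoop classpath. Adjust the version portion of the file names
    # to match your build output.
    cp giraph-core/target/giraph-*-jar-with-dependencies.jar $HADOOP_HOME/lib/
    cp giraph-examples/target/giraph-examples-*-jar-with-dependencies.jar $HADOOP_HOME/lib/

Alternatively, the jars can be added to the HADOOP_CLASSPATH environment variable or passed to the job with -libjars instead of being copied into the lib folder.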

ZooKeeper Null Pointer Exceptions

At this point, Giraph requires ZooKeeper to be configured externally. If the ZooKeeper connection information is not passed in as an argument when the example is launched, a NullPointerException will be thrown.

In order to avoid that issue, be sure to specify the ZooKeeper address on the command line in the following way:
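The example below is a sketch rather than the exact command from the original post; the jar name, computation class, input and output paths, and ZooKeeper hosts are placeholders. The important piece is the giraph.zkList custom argument passed with the -ca flag:

    # Point Giraph at the external ZooKeeper quorum via the giraph.zkList property.
    hadoop jar giraph-examples-jar-with-dependencies.jar org.apache.giraph.GiraphRunner \
        org.apache.giraph.examples.SimpleShortestPathsComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/hadoop/input/tiny_graph.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/hadoop/output/shortestpaths \
        -w 1 \
        -ca giraph.zkList=zkhost1:2181,zkhost2:2181

Multiple ZooKeeper servers can be supplied as a comma-separated list of host:port pairs.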

Processing Hangs at “Wait To Finish ..”

When setting up the application, Giraph will attempt to launch all of the containers it needs at once. If the system is unable to accommodate the container requests, Giraph will sit at the “Wait To Finish” stage until enough resources are free. In an environment where there will never be enough resources to accommodate the requests, the application will appear to hang at this stage.

In order to address this situation, increase the memory specified in the following configuration properties (in Ambari or whichever configuration management tool you use) until the containers are able to start up successfully:
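The property list from the original post is not reproduced above. On a typical YARN cluster, container sizing is governed by settings such as the following (names as they appear in yarn-site.xml and mapred-site.xml); treat this as a starting point rather than a definitive list:

    yarn.nodemanager.resource.memory-mb    # total memory a NodeManager can hand out to containers
    yarn.scheduler.maximum-allocation-mb   # largest single container the ResourceManager will grant
    mapreduce.map.memory.mb                # container size for the map tasks that host the Giraph workers
    yarn.app.mapreduce.am.resource.mb      # container size for the MapReduce application master

After raising these values, restart the affected YARN and MapReduce services so the new limits take effect. Reducing the number of requested workers (the -w argument) is another way to fit the job onto a small cluster.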

“Unable to delete file /tmp/giraph-conf.xml”

Even if the hadoop.tmp.dir property is set to something other than the default in core-site.xml, Giraph will attempt to write its temporary files into the local /tmp directory. If the permissions there are not set appropriately, this will result in the “Unable to delete file /tmp/giraph-conf.xml” error.

In order to fix this problem, make sure that the group permissions on the local file system give the user that Giraph is running as the access it needs. In most cases this user will be “yarn,” but the exact username is printed in the application logs.

To ensure that the correct file permissions are set on the /tmp folder, perform the following commands where <groupID> is the group that has ownership of the local tmp folder:
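The original commands are not reproduced here; a minimal sketch, assuming /tmp/giraph-conf.xml is the stale file on the local file system of the node reporting the error, might be:

    # Hand the stale Giraph temp file over to the group that owns the local tmp
    # folder and grant group read/write access so the "yarn" (or equivalent)
    # user can overwrite and remove it on the next run.
    sudo chgrp <groupID> /tmp/giraph-conf.xml
    sudo chmod g+rw /tmp/giraph-conf.xml

If the file was simply left behind by an earlier run under a different account, deleting it (sudo rm /tmp/giraph-conf.xml) also clears the error.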

Conclusion

Hopefully this troubleshooting guide will help you quickly resolve these common error messages the next time you encounter them while using Apache Giraph.

Look below for some great Big Data books from Safari Books Online.

Not a subscriber? Sign up for a free trial.

Safari Books Online has the content you need

Hadoop Real-World Solutions Cookbook provides in-depth explanations and code examples. The book covers loading data into and out of HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine learning approaches with Mahout, debugging and troubleshooting MapReduce, and columnar storage and retrieval of structured data using Apache Accumulo.
Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 is written by YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli and demonstrates how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
Professional Hadoop Solutions is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth.

