Achieving the first major milestone in Hadoop Compute Server Implementation

Today, I got multiple map tasks running in the same JVM. This is the first major milestone toward building a memory-efficient, intelligent Compute Server on each Hadoop node, one that could perform optimizations for multi-core systems. More steps remain before we can run large-scale experiments, but getting multiple map tasks to run successfully in the same JVM is a major milestone! Here is a summary of what I did and the challenges I faced in the last phase of the implementation.

The key step to getting multiple virtual JVM threads running in the same real JVM is an abstraction for per-thread state. To that end, I created a VirtualJVMContext object that holds the variables associated with each virtual JVM thread, including the taskid and a few other flags:


package org.apache.hadoop.mapred;

/**
 * A utility class to encapsulate virtual-JVM-specific information.
 */
public class VirtualJVMContext {

  public TaskAttemptID taskid = null;

  public boolean currentJobSegmented = true;

  public boolean isCleanup;

  public String cwd;
}
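One way to keep these per-thread contexts from leaking between map tasks that share a real JVM is to hand each worker thread its own instance through a ThreadLocal. The sketch below is an assumption about how this could be wired up, not the actual HJ-Hadoop code; the class names and the simplified String taskid are illustrative.

```java
// Sketch (not the actual implementation): one VirtualJVMContext per
// worker thread via ThreadLocal, so fields like taskid stay isolated
// even though all virtual JVM threads share one real JVM.
class VirtualJVMContext {
    public String taskid = null;          // simplified to String for this sketch
    public boolean currentJobSegmented = true;
    public boolean isCleanup;
    public String cwd;
}

public class ContextDemo {
    static final ThreadLocal<VirtualJVMContext> CTX =
        ThreadLocal.withInitial(VirtualJVMContext::new);

    public static void main(String[] args) throws InterruptedException {
        // Two "virtual JVM" threads each see an independent context.
        Thread t1 = new Thread(() -> CTX.get().taskid = "attempt_0001_m_000000_0");
        Thread t2 = new Thread(() -> CTX.get().taskid = "attempt_0001_m_000001_0");
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The main thread's context was never touched by t1 or t2.
        System.out.println(CTX.get().taskid);
    }
}
```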

One thing I learned that was very helpful is to do everything in an incremental fashion. I changed one variable at a time and made sure everything still worked with a single map slot. (In a single-map-slot setting, everything should work just the same as with the original Child.java.) This helped me catch a lot of issues early on.

Once I got the abstraction for the virtual JVM worker thread working, I started to increase the number of mapper slots. I increased it to 2 and it crashed immediately because tasks failed. The key to the problem lies in the runChild method. runChild runs in a separate thread, and as long as the JVM is running, the call to runChild should not return; if it returns, the TaskTracker assumes the JVM has exited. Since we turned on unlimited JVM reuse, it should not exit at all. The following segment of code in JvmManager.java is critical:

// check whether the port is taken
if (!isTaken(computeServerPortNumber)) {
  System.out.println("port: " + computeServerPortNumber + " is available");

  // if not taken, the JVM has not been launched yet:
  // start a new JVM and pass in the task info
  t.start();
  exitCode = tracker.getTaskController().launchTask(user,
      jvmId.jobId.toString(), taskAttemptIdStr, env.setup,
      env.vargs, env.workDir, env.stdout.toString(),
      env.stderr.toString());
  System.out.println("JvmManager: finished launching the child JVM");
  System.out.println("JvmManager: exit code is " + exitCode);
} else {
  System.out.println("port: " + computeServerPortNumber + " is not available");

  // if taken, the JVM has already been launched: just pass in the task info
  t.start();
  t.join();
  System.out.println("JvmManager: exit code is " + exitCode);

  // the infinite loop makes the TaskTracker think the second JVM is still alive
  while (true);
}

The while (true) loop keeps the runChild thread alive, maintaining the illusion of a running JVM for the TaskTracker.
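A busy-wait like while (true); does the job but spins on a CPU core. A hedged alternative sketch, under the assumption that all we need is for runChild never to return, is to park the thread on a CountDownLatch; the class and method names below are my own, not part of the Hadoop code.

```java
import java.util.concurrent.CountDownLatch;

// Sketch of an alternative to the busy-wait: parking runChild on a
// CountDownLatch keeps the thread alive (so the TaskTracker still
// believes the JVM is running) without burning a core.
public class RunChildPark {
    private static final CountDownLatch JVM_ALIVE = new CountDownLatch(1);

    // Called where the original code does `while (true);`.
    public static void blockUntilShutdown() throws InterruptedException {
        JVM_ALIVE.await();   // blocks without consuming CPU
    }

    // Called if the Compute Server ever needs to let runChild return.
    public static void releaseRunChild() {
        JVM_ALIVE.countDown();
    }
}
```

The trade-off is the same as with the loop: the TaskTracker is being deliberately misled, so any real shutdown path has to call releaseRunChild explicitly.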

Notes:

  • Checking the TaskTracker log is the single most effective debugging approach when developing complex extensions to the Hadoop MapReduce runtime. The only reason I realized I needed the while (true) loop is that the log showed one JVM launching and then exiting immediately. This happened because the runChild thread returned prematurely, and as a result the task assigned to the supposedly terminated JVM failed.
  • When I ran 8 map tasks in one JVM, it ran out of heap space immediately. I found that by looking through the log and seeing the "heap space" error message.
  • In conclusion, always look at the logs first and understand the root cause of the problem before jumping to a conclusion. This is especially important when building large, complex systems.
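For the heap-space failure, one lever is the per-child JVM heap, which in this generation of Hadoop is controlled by the mapred.child.java.opts property in mapred-site.xml. The fragment below is a sketch; the 2000 MB figure is illustrative, not a value from my experiments.

```xml
<!-- mapred-site.xml: raise the child JVM heap so that several map
     tasks can fit in one JVM. The -Xmx value here is illustrative. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2000m</value>
</property>
```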

Next Steps:

Following the same incremental develop-and-test technique I have been employing, I will run some applications I have on hand, hashJoin, KNN, and KMeans, using the new implementation and make sure they work. (Always, always make sure the existing changes work, even if there are more changes to come.)

The new ComputeServer implementation should be no worse than the original version. To revert to the original version, we just need to put back JvmManager and Child in the org.apache.hadoop.mapred package and recompile the package with ant clean followed by ant.

After that, I will start extending the ComputeServer implementation so that multiple map tasks can share read-only, static data structures. This way, we can run all the benchmarks again and measure the improvement in memory efficiency.
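One plausible shape for that sharing, sketched here as an assumption about the planned design rather than existing code, is a read-only structure loaded once per real JVM behind a double-checked lock, so only the first map task pays the loading cost and later tasks in the same JVM reuse the same in-memory copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch (assumed design, not existing code): a read-only lookup table
// loaded once per real JVM and shared by every map task running in it.
public class SharedStaticData {
    private static volatile Map<String, float[]> model;  // e.g. KMeans centroids

    public static Map<String, float[]> get() {
        Map<String, float[]> local = model;
        if (local == null) {
            synchronized (SharedStaticData.class) {
                if (model == null) {
                    // Only the first task entering this block loads the data.
                    model = Collections.unmodifiableMap(load());
                }
                local = model;
            }
        }
        return local;
    }

    // Placeholder loader; a real task would read the data from HDFS
    // or the DistributedCache instead of hard-coding it.
    private static Map<String, float[]> load() {
        Map<String, float[]> m = new HashMap<>();
        m.put("centroid-0", new float[] {0.0f, 1.0f});
        return m;
    }
}
```

Wrapping the map in unmodifiableMap keeps any one task from mutating state that its JVM-mates depend on, which matters once 8 map tasks share one heap.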

This entry was posted in Hadoop Research, HJ-Hadoop Improvements.
