Hadoop Compute Server Implementation Notes – JVMId Assignment

This post follows up on the previous post on the implementation of the Hadoop Compute Server. We can now establish a connection from the TaskTracker to the ChildJVM and pass in certain task-specific arguments.

The next step is to allow multiple tasks that were assigned different JVMIds to run in the same JVM. To accomplish this, we need to know (1) where the JVMId is assigned, and (2) whether it is enough to change only the runChild part of the code, which would limit the changes to a single function in the TaskTracker process.

1. Where is the JVMId assigned?

To find out, we can trace from the place where the TaskTracker communicates with the JobTracker and gets back a task. Once the task is received, the TaskTracker launches a new JVM and, presumably, assigns the JVMId in the process.

I traced through this path and found that:

  1. The JVMId is assigned at the time a new JVM is launched (spawnNewJvm).
  2. The JVMId is tied to a JvmRunner; it is generated in the JvmRunner constructor.

    private void spawnNewJvm(JobID jobId, JvmEnv env,
        TaskRunner t) {
      JvmRunner jvmRunner = new JvmRunner(env, jobId, t.getTask());
      jvmIdToRunner.put(jvmRunner.jvmId, jvmRunner);
      //spawn the JVM in a new thread. Note that there will be very little
      //extra overhead of launching the new thread for a new JVM since
      //most of the cost is involved in launching the process. Moreover,
      //since we are going to be using the JVM for running many tasks,
      //the thread launch cost becomes trivial when amortized over all
      //tasks. Doing it this way also keeps code simple.
      jvmRunner.setDaemon(true);
      jvmRunner.setName("JVM Runner " + jvmRunner.jvmId + " spawned.");
      setRunningTaskForJvm(jvmRunner.jvmId, t);
      LOG.info(jvmRunner.getName());
      jvmRunner.start();
    }

The actual JvmRunner constructor:


    public JvmRunner(JvmEnv env, JobID jobId, Task firstTask) {
      this.env = env;
      this.jvmId = new JVMId(jobId, isMap, rand.nextInt());
      this.numTasksToRun = env.conf.getNumTasksToExecutePerJvm();
      this.firstTask = firstTask;
      LOG.info("In JvmRunner constructed JVM ID: " + jvmId);
    }

The line that assigns this.jvmId generates the JVMId from the job ID, the isMap flag, and a random integer. As a result, I believe it may be enough to change just the runChild method to use the new JVMId.
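
To make the flow above easier to follow, here is a minimal, self-contained sketch of it outside Hadoop. The names JvmIdSketch, SimpleJvmId, SimpleRunner, spawn and lookup are hypothetical stand-ins I made up for illustration; they are not Hadoop classes, and the job and task are reduced to plain strings. The only parts taken from the code above are the shape of the ID (job ID, isMap flag, random integer), the fact that it is created in the runner's constructor, and the registration of the runner in a map keyed by that ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

// Hypothetical, simplified model of the ID flow described above; not Hadoop code.
class JvmIdSketch {

    // Stand-in for JVMId: a job ID, a map/reduce flag, and a random integer.
    static final class SimpleJvmId {
        final String jobId;
        final boolean isMap;
        final int id;

        SimpleJvmId(String jobId, boolean isMap, int id) {
            this.jobId = jobId;
            this.isMap = isMap;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleJvmId)) return false;
            SimpleJvmId other = (SimpleJvmId) o;
            return jobId.equals(other.jobId) && isMap == other.isMap && id == other.id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobId, isMap, id);
        }

        @Override
        public String toString() {
            // Illustrative format only; not the format Hadoop prints.
            return jobId + (isMap ? "/map/" : "/reduce/") + id;
        }
    }

    // Stand-in for JvmRunner: the ID is created once, in the constructor.
    static final class SimpleRunner {
        final SimpleJvmId jvmId;
        final String firstTask;

        SimpleRunner(String jobId, boolean isMap, String firstTask, Random rand) {
            this.jvmId = new SimpleJvmId(jobId, isMap, rand.nextInt());
            this.firstTask = firstTask;
        }
    }

    private final Map<SimpleJvmId, SimpleRunner> jvmIdToRunner = new HashMap<>();
    private final Random rand;

    JvmIdSketch(long seed) {
        this.rand = new Random(seed);
    }

    // Mirrors the shape of spawnNewJvm: build a runner, then register it by its ID.
    SimpleRunner spawn(String jobId, boolean isMap, String firstTask) {
        SimpleRunner runner = new SimpleRunner(jobId, isMap, firstTask, rand);
        jvmIdToRunner.put(runner.jvmId, runner);
        return runner;
    }

    SimpleRunner lookup(SimpleJvmId jvmId) {
        return jvmIdToRunner.get(jvmId);
    }

    public static void main(String[] args) {
        JvmIdSketch manager = new JvmIdSketch(42L);
        SimpleRunner first = manager.spawn("job_0001", true, "task_a");
        SimpleRunner second = manager.spawn("job_0001", true, "task_b");
        System.out.println("first runner ID:  " + first.jvmId);
        System.out.println("second runner ID: " + second.jvmId);
        System.out.println("lookup finds first runner: "
            + (manager.lookup(first.jvmId) == first));
    }
}
```

In this sketch, the ID does not exist until a runner is constructed, and each spawn registers a fresh ID. Under that assumption, running tasks with different JVMIds in one JVM would mean deciding, somewhere after this point, which already-registered ID a task should be launched under.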
