Meeting notes with Prof. Mellor-Crummey on Feb 17th

This is a summary of a meeting with Prof. Mellor-Crummey covering the latest progress, TODO tasks, and feedback on the HJ-Hadoop research.

The meeting covered three major parts:

  1. A recap of the two approaches to multi-core parallelization for Hadoop: ParMapper and ParJVM.
  2. The steps I took to analyze the performance characteristics of an application.
  3. Analyzing the heap memory footprint using VisualVM.

  1. The first part was a short recap of how I implemented the two approaches. TODO: Prof. Mellor-Crummey suggested an improved scheduling algorithm based on guided scheduling, which shrinks the chunk size as the remaining work decreases. The source code for the scheduling algorithm can be found at
  2. Using two ParMappers is similar to the Charm++ paper's approach of virtualizing MPI processes. TODO: I should read the Charm++ paper.
  3. I briefly tracked down the chain of classes that hold the cluster data:
      1. Vectorized Object -> string, sparseDoubleVector (1:1)
      2. ISparseArray<Double>, int
      3. LinearSparseArray -> int[] indices, Vector<T> data
    1. The VisualVM output for the driver I wrote, which analyzes the different parts of the cluster centroids data:
      1. cluster-heap-dum-visualvm
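
To make the guided scheduling suggestion from the first item concrete, here is a minimal sketch of how such a chunk dispenser might look. It is not the HJ-Hadoop implementation or the code Prof. Mellor-Crummey pointed to. The class name `GuidedChunker`, its parameters, and the rule "chunk = ceil(remaining / numWorkers), never below minChunk" are assumptions for illustration, modeled on the usual description of guided scheduling.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of guided scheduling: each request hands out a chunk
// whose size is proportional to the work still remaining, so chunks start
// large and shrink as the input is consumed.
class GuidedChunker {
    private final int total;       // total number of work items (assumed known up front)
    private final int numWorkers;  // number of worker threads sharing the items
    private final int minChunk;    // lower bound on the chunk size
    private final AtomicInteger next = new AtomicInteger(0); // index of first unassigned item

    GuidedChunker(int total, int numWorkers, int minChunk) {
        if (numWorkers < 1 || minChunk < 1) {
            throw new IllegalArgumentException("numWorkers and minChunk must be >= 1");
        }
        this.total = total;
        this.numWorkers = numWorkers;
        this.minChunk = minChunk;
    }

    // Returns {start, endExclusive} for the next chunk, or null when no work is left.
    // Safe to call from several threads: the compare-and-set retries on contention.
    int[] nextChunk() {
        while (true) {
            int start = next.get();
            int remaining = total - start;
            if (remaining <= 0) {
                return null;
            }
            // ceil(remaining / numWorkers), clamped to [minChunk, remaining]
            int size = Math.max(minChunk, (remaining + numWorkers - 1) / numWorkers);
            size = Math.min(size, remaining);
            if (next.compareAndSet(start, start + size)) {
                return new int[] {start, start + size};
            }
        }
    }
}
```

Each worker would loop on `nextChunk()` until it returns null, processing the items in the returned range. Early chunks are large, which keeps scheduling overhead low, while the small chunks near the end help balance the load across workers.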