This is a summary of a meeting with Prof. Mellor-Crummey covering the latest progress, TODO tasks, and feedback on the HJ-Hadoop research.
The meeting covered three major parts:
- A recap of the two approaches to multi-core parallelization for Hadoop: ParMapper and ParJVM.
- The steps I took to analyze the performance characteristics of an application
- Analyzing the heap memory footprint using VisualVM
- The first part was just a short recap of how I implemented the two approaches. TODO: Prof. Mellor-Crummey suggested an improved scheduling algorithm, guided scheduling, which shrinks the chunk size as the remaining work decreases. The source code for the scheduling algorithm can be found at https://code.google.com/p/ompt-intel-openmp/source/browse/itt/libomp_oss/src/kmp_sched.cpp
- Using two ParMappers is similar to the Charm++ paper's approach of virtualizing MPI processes. TODO: I really should read the Charm++ paper.
- I briefly tracked down the class path for the cluster data
- Vectorized Object -> string, sparseDoubleVector (1:1)
- ISparseArray<Double>, int
- LinearSparseArray -> indices, Vector<T> data
- The VisualVM profile of the driver I wrote, which analyzes the different parts of the cluster centroids data
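
As a note to self on the guided scheduling TODO: the idea can be sketched in a few lines of Java. This is only a minimal sketch under my own assumptions, not the logic in the Intel OpenMP runtime linked above. The class name `GuidedChunks`, the method `guidedChunks`, and the rule "next chunk = ceil(remaining / numWorkers), but never below `minChunk`" are all hypothetical choices made for illustration; the real runtime uses its own formula and hands chunks out dynamically to whichever thread asks next.

```java
import java.util.ArrayList;
import java.util.List;

public class GuidedChunks {

    /**
     * Splits the range [0, totalItems) into chunks whose size shrinks as
     * less work remains. Each chunk is returned as {start, endExclusive}.
     *
     * Assumed rule (illustrative only): chunk = ceil(remaining / numWorkers),
     * clamped below by minChunk and above by the remaining item count.
     */
    public static List<int[]> guidedChunks(int totalItems, int numWorkers, int minChunk) {
        if (numWorkers <= 0 || minChunk <= 0) {
            throw new IllegalArgumentException("numWorkers and minChunk must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        int start = 0;
        while (start < totalItems) {
            int remaining = totalItems - start;
            // Proportional share of the remaining work, rounded up.
            int size = (remaining + numWorkers - 1) / numWorkers;
            // Do not let chunks become too small to amortize scheduling overhead.
            size = Math.max(size, minChunk);
            // The last chunk may be shorter than minChunk.
            size = Math.min(size, remaining);
            chunks.add(new int[] {start, start + size});
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Example: 100 input records, 4 worker threads, minimum chunk of 4.
        for (int[] c : guidedChunks(100, 4, 4)) {
            System.out.println("[" + c[0] + ", " + c[1] + ") size=" + (c[1] - c[0]));
        }
    }
}
```

With 100 items, 4 workers, and a minimum chunk of 4, the chunk sizes come out as 25, 19, 14, 11, 8, 6, 5, 4, 4, 4. In a ParMapper-style setting, the chunks would presumably be pulled from a shared queue by worker threads, so that large early chunks keep overhead low and small late chunks even out the load at the end.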