Monthly Archives: January 2014

Documentation on the scripts I used for the STIC Cluster at Rice

This is a record for myself documenting a series of scripts I wrote, since I realized that I quickly forget how they work. The scripts compile and update the Hadoop source files, analyze the log files for garbage collection information, record …

Posted in Hadoop Research, Tools

How to log in, log out, and log in as a different user in my iPhone application

It is a bit tricky to switch Facebook users in iOS applications. This post documents my experience.

Posted in Facebook API, IPhone App Development

Meeting notes with Prof. Cox on 1/18/2014

This post summarizes my meeting with Prof. Alan Cox on January 18th and outlines the next steps for getting the project ready for publication. It covers the following items: further improvements to the Java memory footprint measurements and analysis; quantifying …

Posted in Hadoop Research, HJ-Hadoop Improvements, Java

Presentation Feedback from ACM Student Research Competition at SPLASH 13

This post records the feedback I received from participating in the ACM Student Research Competition at SPLASH 13, where I won third place in the undergraduate category. I gave a 6-minute presentation on HJ-Hadoop during the competition. The feedback …

Posted in Presentation

Classifying the MapReduce Data Analytics Applications (Part I)

This is the first part of a series of posts analyzing the performance characteristics of common MapReduce data analytics applications. Most of the applications are chosen from the Apache Mahout project and the PUMA benchmark suite developed at Purdue University (http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1438&context=ecetr).

Posted in Algorithms, MapReduce Algorithms

K Nearest Neighbor

The first part of the post introduces the K-Nearest Neighbor (KNN) algorithm, a popular classification algorithm used in data analytics. The second part describes its MapReduce implementation.
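The excerpt does not show the post's MapReduce code, so the following is only a minimal sketch of one common formulation, simulated in plain Java without any Hadoop API: each "mapper" computes distances from a query point to the training points in its own input split and keeps just its local k nearest, and a single "reducer" merges those candidate lists, keeps the global k nearest, and takes a majority vote. The class name, method names, and toy data are hypothetical; squared Euclidean distance is an assumed metric, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class KnnSketch {
    // A labeled training example (hypothetical type for this sketch).
    record Point(double[] features, String label) {}

    // A candidate neighbor: its squared distance to the query and its label.
    record Neighbor(double distance, String label) {}

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Map phase: a mapper sees only its own split and keeps that split's k closest points.
    static List<Neighbor> mapLocalTopK(List<Point> split, double[] query, int k) {
        // Max-heap on distance, so the farthest current candidate is evicted first.
        PriorityQueue<Neighbor> heap =
                new PriorityQueue<>(Comparator.comparingDouble(Neighbor::distance).reversed());
        for (Point p : split) {
            heap.offer(new Neighbor(squaredDistance(p.features(), query), p.label()));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        return new ArrayList<>(heap);
    }

    // Reduce phase: merge local candidates, keep the global k nearest, take a majority vote.
    static String reduceVote(List<List<Neighbor>> localResults, int k) {
        List<Neighbor> all = new ArrayList<>();
        for (List<Neighbor> local : localResults) {
            all.addAll(local);
        }
        all.sort(Comparator.comparingDouble(Neighbor::distance));
        Map<String, Integer> votes = new HashMap<>();
        for (Neighbor n : all.subList(0, Math.min(k, all.size()))) {
            votes.merge(n.label(), 1, Integer::sum);
        }
        // Most votes wins; ties go to the lexicographically smaller label for determinism.
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    // Driver: run one "mapper" per split, then the single "reducer".
    static String classify(List<List<Point>> splits, double[] query, int k) {
        List<List<Neighbor>> localResults = new ArrayList<>();
        for (List<Point> split : splits) {
            localResults.add(mapLocalTopK(split, query, k));
        }
        return reduceVote(localResults, k);
    }

    // Hypothetical toy data: two input splits holding two well-separated classes.
    static List<List<Point>> sampleSplits() {
        return List.of(
                List.of(new Point(new double[] {0, 0}, "A"),
                        new Point(new double[] {0, 1}, "A"),
                        new Point(new double[] {5, 5}, "B")),
                List.of(new Point(new double[] {1, 0}, "A"),
                        new Point(new double[] {5, 6}, "B"),
                        new Point(new double[] {6, 5}, "B")));
    }

    public static void main(String[] args) {
        System.out.println(classify(sampleSplits(), new double[] {0.5, 0.5}, 3)); // A
        System.out.println(classify(sampleSplits(), new double[] {5.5, 5.5}, 3)); // B
    }
}
```

Keeping only k candidates per mapper is safe because the global k nearest neighbors must lie within the union of every split's local k nearest, and it limits the data shuffled to the reducer to k records per split.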

Posted in Algorithms, Hadoop Research, MapReduce Algorithms

Apache Mahout Clustering Algorithms Implementation

This post analyzes the implementation of several clustering algorithms, including KMeans and FuzzyKMeans, in the Apache Mahout package (http://mahout.apache.org). I analyzed the memory footprint and other performance factors of these implementations.

Posted in Algorithms, Mahout, MapReduce Algorithms

MapReduce KMeans Algorithm Implementation

This post introduces the KMeans clustering algorithm and explains its MapReduce implementation.
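As a minimal sketch, assuming the usual formulation in which each MapReduce job performs one KMeans iteration, the plain Java below simulates the phases in memory: the map step assigns every point to its nearest centroid, the shuffle groups points by centroid id, the reduce step replaces each centroid with the mean of its group, and a driver loop reruns the job until the centroids stop moving. This is not Hadoop or Mahout API code and not the post's own implementation; the class and method names, the policy of keeping an empty cluster's old centroid, and the convergence threshold are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the closest centroid; ties go to the lower index.
    static int nearest(double[][] centroids, double[] p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(centroids[c], p) < squaredDistance(centroids[best], p)) {
                best = c;
            }
        }
        return best;
    }

    // One simulated MapReduce job (one KMeans iteration).
    static double[][] iterate(List<double[]> points, double[][] centroids) {
        // Map + shuffle: group each point under the id of its nearest centroid.
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(centroids, p), id -> new ArrayList<>()).add(p);
        }
        // Reduce: each new centroid is the mean of the points assigned to it.
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> members = groups.get(c);
            if (members == null) {
                // No points assigned: keep the old centroid (one possible policy).
                next[c] = centroids[c].clone();
                continue;
            }
            double[] mean = new double[centroids[c].length];
            for (double[] m : members) {
                for (int d = 0; d < mean.length; d++) {
                    mean[d] += m[d];
                }
            }
            for (int d = 0; d < mean.length; d++) {
                mean[d] /= members.size();
            }
            next[c] = mean;
        }
        return next;
    }

    // Driver: rerun the job until no centroid moves more than epsilon, or maxIter is reached.
    static double[][] run(List<double[]> points, double[][] initial, int maxIter, double epsilon) {
        double[][] centroids = initial;
        for (int i = 0; i < maxIter; i++) {
            double[][] next = iterate(points, centroids);
            boolean converged = true;
            for (int c = 0; c < centroids.length; c++) {
                if (Math.sqrt(squaredDistance(centroids[c], next[c])) > epsilon) {
                    converged = false;
                }
            }
            centroids = next;
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[] {0, 0}, new double[] {0, 2},
                new double[] {10, 10}, new double[] {10, 12});
        double[][] result = run(points, new double[][] {{0, 0}, {10, 10}}, 10, 1e-9);
        System.out.println(Arrays.deepToString(result)); // [[0.0, 1.0], [10.0, 11.0]]
    }
}
```

In a real distributed job, a combiner that emits per-centroid partial sums and counts would reduce shuffle traffic, and the driver would have to make the updated centroids available to the next job's mappers; the sketch omits both for brevity.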

Posted in Algorithms, MapReduce Algorithms

Maven Notes

This post describes how Maven works. The example pom.xml is from the Apache Mahout project (http://mahout.apache.org). General description: Maven is a system for building projects. It manages project dependencies and can fetch them from a central …

Posted in Tools

Meeting Notes with Prof. Vivek Sarkar and Prof. Alan Cox on Hadoop Project

This is a summary of two separate meetings with Prof. Alan Cox and Prof. Vivek Sarkar for feedback on the latest implementation of a multicore-optimized Hadoop MapReduce runtime. The meetings covered the implementation strategy for the multicore …

Posted in Hadoop Research, HJ-Hadoop Improvements