Monthly Archives: January 2014

Documentation of the scripts I used on the STIC cluster at Rice

This post documents a series of scripts I wrote, since I realized that I quickly forget how they work: compile and update Hadoop source files; analyze the log files for garbage collection information; record … Continue reading

Posted in Hadoop Research, Tools | Leave a comment

How to log in, log out, and log in as a different user in my iPhone application

It is a bit tricky to switch Facebook users in iOS applications. This post documents my experience.

Posted in Facebook API, IPhone App Development | Leave a comment

Meeting notes with Prof. Cox on 1/18/2014

This post summarizes the meeting with Prof. Alan Cox on Jan 18th and outlines the next steps to get the project ready for publication. It includes the following items: further improvements on Java memory footprint measurements and analysis; quantifying … Continue reading

Posted in Hadoop Research, HJ-Hadoop Improvements, Java | Leave a comment

Presentation Feedback from ACM Student Research Competition at SPLASH 13

This post records the feedback I got from participating in the ACM Student Research Competition at SPLASH 13. I won third place in the undergraduate category. I gave a 6-minute presentation on HJ-Hadoop during the competition. The feedback … Continue reading

Posted in Presentation | Leave a comment

Classifying the MapReduce Data Analytics Applications (Part I)

This is the first part of a series of posts analyzing the performance characteristics of common MapReduce data analytics applications. Most of the applications are chosen from the Apache Mahout project and the PUMA benchmark developed at Purdue University (http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1438&context=ecetr).

Posted in Algorithms, MapReduce Algorithms | Leave a comment

K Nearest Neighbor

The first part of the post introduces the K Nearest Neighbor Algorithm, a popular classification algorithm used in data analytics. The second part describes the MapReduce implementation.

Posted in Algorithms, Hadoop Research, MapReduce Algorithms | 1 Comment

Apache Mahout Clustering Algorithms Implementation

This post analyzes the implementation of a series of clustering algorithms, including KMeans and FuzzyKmeans, in the Apache Mahout package (http://mahout.apache.org). I analyzed the memory footprint and other performance factors of the implementation.

Posted in Algorithms, Mahout, MapReduce Algorithms | 8 Comments