The link to the paper
Abstract: The original Hadoop system can only deliver a map task's output to reduce tasks after that map task has finished (note: not after the entire map phase). HOP (Hadoop Online Prototype) modifies Hadoop to allow pipelining between map and reduce tasks. In addition, HOP supports two novel features: Online Aggregation and Continuous Queries.
Introduction: Hadoop MapReduce was originally designed for batch-processing jobs. Pipelining in Hadoop enables the following features:
Online Aggregation: Reducers can generate and refine their results throughout the execution. Initial estimates can be produced orders of magnitude faster than the final result.
Continuous Queries: Analyze data as it arrives.
- Map task
- map: reads the input split and applies the map function to each key/value pair.
- commit: after the map function has been applied to every record, a commit phase registers the final output with the TaskTracker. The output is arranged by partition number, and the records within each partition are sorted.
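The map phase described above can be sketched as a toy model in Python (not Hadoop's actual Java implementation; `wc_map` and the hash-based partitioner are illustrative assumptions):

```python
from operator import itemgetter

def run_map_task(input_split, map_fn, num_reducers):
    """Toy model of a map task: apply map_fn to every record, bucket the
    output by partition number, then sort each partition by key -- roughly
    what the commit phase registers with the TaskTracker."""
    partitions = {p: [] for p in range(num_reducers)}
    for key, value in input_split:
        for out_key, out_value in map_fn(key, value):
            # Hash of the key modulo the number of reducers picks the partition.
            p = hash(out_key) % num_reducers
            partitions[p].append((out_key, out_value))
    # Records within each partition are sorted by key.
    for p in partitions:
        partitions[p].sort(key=itemgetter(0))
    return partitions

# Word-count-style map function (hypothetical example).
def wc_map(_offset, line):
    for word in line.split():
        yield word, 1
```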
- Reduce task
- shuffle: fetches the input with the right partition number. The reduce task does this by issuing HTTP requests to a number of TaskTrackers; each TaskTracker serves the stored map output for that reduce task's partition.
- sort: groups records with the same key together. Since each map task's output is already sorted, this is a merge sort over the map outputs rather than a full sort.
- reduce: applies the user-defined reduce function to each key and its group of values.
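The shuffle/sort/reduce steps can be sketched the same way (a toy model; each per-mapper run is assumed to be already sorted, as the notes describe, so `heapq.merge` stands in for the merge sort):

```python
import heapq
from itertools import groupby
from operator import itemgetter

def run_reduce_task(sorted_map_outputs, reduce_fn):
    """Toy model of a reduce task. `sorted_map_outputs` stands in for the
    shuffled partitions fetched from each mapper, each already sorted by
    key. Because every input run is sorted, one merge pass suffices."""
    merged = heapq.merge(*sorted_map_outputs, key=itemgetter(0))
    results = []
    # groupby works here because the merged stream is sorted by key.
    for key, group in groupby(merged, key=itemgetter(0)):
        results.append(reduce_fn(key, [v for _, v in group]))
    return results

# Word-count-style reduce function (hypothetical example).
def wc_reduce(key, values):
    return (key, sum(values))
```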
- Always write to disk design
- The output of map tasks is written to local disk, and the output of reduce tasks to HDFS, before either can be consumed. This is good for fault tolerance, but disk I/O is an expensive operation.
- Naive Implementation (Pipelining within a single job)
- Each reduce task contacts every map task directly to pull its output.
- Full implementation (Pipelining within a single job)
- Eager pipelining: the reducer reads partial output from the map tasks as it is produced
- A dedicated I/O thread sends map output downstream so the map thread is not blocked on the network, ensuring full CPU utilization
- The output of a map task is written to disk if its reduce task has not yet been scheduled
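A minimal sketch of the dedicated-I/O-thread idea (assumptions: a bounded in-memory queue stands in for HOP's buffer, and `send` stands in for the network transfer to a reducer or the spill to disk):

```python
import queue
import threading

def pipelined_map(records, map_fn, send):
    """Sketch of eager pipelining: map output goes into a bounded buffer,
    and a dedicated I/O thread drains it downstream, so the map thread
    keeps the CPU busy instead of blocking on the (slow) network."""
    buf = queue.Queue(maxsize=64)   # bounded buffer of map output
    DONE = object()                 # sentinel marking end of stream

    def io_thread():
        while True:
            item = buf.get()
            if item is DONE:
                break
            send(item)              # e.g. push to a reducer / spill to disk

    t = threading.Thread(target=io_thread)
    t.start()
    for key, value in records:
        for out in map_fn(key, value):
            buf.put(out)            # may block if the I/O thread falls behind
    buf.put(DONE)
    t.join()
```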
- Pipelining between jobs
- The reduce tasks of job 1 can optionally pipeline their output to the map tasks of the next job (job 2). HOP cannot, however, overlap the reduce phase of job 1 with the map phase of job 2, because the final output of job 1's reduce tasks depends on the completion of all of job 1's map tasks.
- Single job: apply the reduce function to the input the reduce tasks have received so far, taking periodic "snapshots". Applications can consume the latest snapshot by polling a predictable location in HDFS.
- Multiple jobs: use the snapshots produced by the previous job as input to the next. NOT SURE…
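Online aggregation can be sketched as re-applying the reduce function to the partial input received so far (a toy model; `snapshot` and the running word count are illustrative, not HOP's API):

```python
def snapshot(partial_groups, reduce_fn):
    """Online-aggregation sketch: apply the reduce function to whatever
    input the reduce task has received *so far*, producing an early
    estimate that is refined as more map output arrives."""
    return {k: reduce_fn(k, vs) for k, vs in partial_groups.items()}

# Running word count: the estimate improves as more map output arrives.
received = {"a": [1], "b": [1, 1]}
early = snapshot(received, lambda k, vs: sum(vs))   # early estimate
received["a"].append(1)                             # more map output arrives
later = snapshot(received, lambda k, vs: sum(vs))   # refined estimate
```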
- The reduce tasks must periodically pull output from the map tasks. Currently the connections between map tasks and reduce tasks are manually configured (not elastic; subject to future work).
- Overall, total execution time is not very different from the original (blocking) Hadoop MapReduce. The main difference is that the reduce phase reaches a higher completion rate much earlier in the job.
- The future directions also mention producing a lightweight event-driven system like SEDA (a work first suggested by Prof. Cox).