
Sort and shuffle optimization in Hadoop MapReduce


I'm looking for a research/implementation-based project on Hadoop, and I came across the list posted on the wiki page http://wiki.apache.org/hadoop/ProjectSuggestions. However, this page was last updated in September 2009, so I'm not sure whether some of these ideas have already been implemented. I was particularly interested in "Sort and Shuffle optimization in the MR framework", which talks about "combining the results of several maps on rack or node before the shuffle. This can reduce seek work and intermediate storage".

Has anyone tried this before? Is this implemented in the current version of Hadoop?

asked February 22, 2011 4:53 pm CST
posted via StackOverflow

3 Answers


The project description is aimed at "optimization". The sort and shuffle feature is already present in the current Hadoop MapReduce, and it can probably be made to run in a lot less time. Sounds like a valuable enhancement to me.

answered February 23, 2011 2:17 am CST

I think it is a very challenging task. As I understand it, the idea is to build a computation tree instead of a "flat" map-reduce. A good example of this is Google's Dremel engine (now called BigQuery). I would suggest reading this paper: http://sergey.melnix.com/pub/melnik_VLDB10.pdf
If you are interested in this kind of architecture, you can also take a look at the open source clone of this technology, Open Dremel: http://code.google.com/p/dremel/

answered February 27, 2011 2:24 am CST

There is the combiner functionality (described under the "Combine" section of http://wiki.apache.org/hadoop/HadoopMapReduce), which is more or less an in-memory shuffle. However, I believe the combiner only aggregates key-value pairs from a single map task, not all the pairs for a given node or rack.
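
To make the difference in scope concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Hadoop code: the cluster layout, the word-count workload, and every function name are assumptions made up for illustration. It counts how many records would reach the shuffle when combining inside each map only, versus when the outputs of all maps on a node are also combined, which is what the project suggestion describes.

```python
from collections import Counter


def map_task(lines):
    """Emit (word, 1) pairs for the lines in one map's input split."""
    return [(word, 1) for line in lines for word in line.split()]


def combine(pairs):
    """Sum the values for each key, like a word-count combiner."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


# Hypothetical cluster layout: two nodes, each running two maps.
SPLITS_BY_NODE = {
    "node1": [["a b a"], ["a c"]],
    "node2": [["b b"], ["a b"]],
}


def shuffle_input_per_task(splits_by_node):
    """Records sent to the shuffle when combining within each map only."""
    records = []
    for splits in splits_by_node.values():
        for split in splits:
            records.extend(combine(map_task(split)))
    return records


def shuffle_input_per_node(splits_by_node):
    """Records sent to the shuffle when also combining across a node's maps."""
    records = []
    for splits in splits_by_node.values():
        node_pairs = [pair for split in splits for pair in combine(map_task(split))]
        records.extend(combine(node_pairs))
    return records


per_task = shuffle_input_per_task(SPLITS_BY_NODE)
per_node = shuffle_input_per_node(SPLITS_BY_NODE)

print(len(per_task), len(per_node))             # 7 5
print(combine(per_task) == combine(per_node))   # True
```

In this toy input the final counts are the same either way; only the number of records crossing the shuffle changes (7 versus 5). This works because summing is associative and commutative, so the sketch says nothing about reduce functions that lack that property.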

answered February 27, 2011 2:24 am CST
