In MapReduce, how do I sort intermediate output based on values?
How to sort intermediate output based on values in MapReduce?
What I have tried:
How to sort intermediate output based on values in MapReduce?
"The MapReduce sort the intermediate data(between mapper and reducer phase) by key by default. If we want the data should be sort based on value, then we need secondary sorting. There are 2 approaches to fulfill the same. 1. If reducers will get all the value for a particular key and buffer them all. Then we can do an in-reducers sort based on value. But this is not a good approach reducer will be receiving all the values for the key and there might be a chance that reducer will go with out of memory. But this can work well for the lesser data. 2. The next approach is to create a composite key which is having 2 values, Natural Key, and Natural values, where the natural key will be used for partitioning and value will be used for sorting. This is the best approach as it will not turn out to out of memory error. We will be writing the partitioner code just to make sure that all data with the same key go to the same reducer and data arrives at reducer is grouped by the natural key. "