MapReduce Wiki – Software Framework
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
A MapReduce framework is usually composed of three operations:
- Map: each worker node applies the map function to its local data and writes the output to temporary storage. A master node ensures that only one copy of any redundant input data is processed.
- Shuffle: worker nodes redistribute data based on the output keys produced by the map function, so that all data belonging to one key ends up on the same worker node.
- Reduce: worker nodes process each group of output data, one group per key, in parallel.
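The three operations above can be sketched in plain Python. This is a single-process illustration of the data flow, not a real framework's API; the function names and the word-count example are our own:

```python
from collections import defaultdict
from itertools import chain

def map_phase(documents, map_fn):
    # Each "worker" applies the map function to its local chunk of input,
    # producing intermediate (key, value) pairs.
    return [list(map_fn(doc)) for doc in documents]

def shuffle_phase(mapped):
    # Group all intermediate pairs by key, so that every value for a
    # given key ends up with the same reducer.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Each key group is reduced independently, and so can run in parallel.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word-count example.
def count_map(doc):
    for word in doc.split():
        yield word, 1

def count_reduce(word, counts):
    return sum(counts)

docs = ["the cat sat", "the dog sat"]
result = reduce_phase(shuffle_phase(map_phase(docs, count_map)), count_reduce)
# result == {"the": 2, "cat": 1, "sat": 2, "dog": 1}
```

In a real deployment each phase runs on many machines and the shuffle moves data over the network, but the key-grouping contract is the same.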
MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, singular value decomposition, document clustering, and machine learning.
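One of those applications, distributed pattern-based searching (a "grep" over many files), fits the model directly: the map step emits matching lines keyed by filename, and the reduce step simply collects each file's matches. A minimal sketch, with hypothetical names and sample data of our own:

```python
from collections import defaultdict

def grep_map(filename, lines, pattern):
    # Map: emit (filename, (line number, line)) for every matching line.
    for lineno, line in enumerate(lines, 1):
        if pattern in line:
            yield filename, (lineno, line)

def shuffle(pairs):
    # Shuffle: group matches by filename; the reduce here is the
    # identity, so grouping already yields the final result.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

files = {
    "a.txt": ["hello world", "goodbye"],
    "b.txt": ["hello again"],
}
pairs = [p for name, lines in files.items()
         for p in grep_map(name, lines, "hello")]
matches = shuffle(pairs)
# matches == {"a.txt": [(1, "hello world")], "b.txt": [(1, "hello again")]}
```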
At Google, MapReduce was used to completely regenerate Google’s index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.
The MapReduce model has been adapted to several computing environments, including multi-core and many-core systems, desktop grids, multi-cluster setups, mobile environments, and high-performance computing environments.