MapReduce Basics (Part 1)

Hi there! I’m Shrijith Venkatrama, the founder of Hexmos. Right now, I’m building LiveAPI, a super-convenient tool that simplifies engineering workflows by generating awesome API docs from your code in minutes.

The MapReduce (MR) programming model wa…


This content originally appeared on DEV Community and was authored by Shrijith Venkatramana

Hi there! I'm Shrijith Venkatrama, the founder of Hexmos. Right now, I’m building LiveAPI, a super-convenient tool that simplifies engineering workflows by generating awesome API docs from your code in minutes.

  • The MapReduce (MR) programming model was invented to solve large-scale data processing problems
  • In particular, Google, eBay and some specialized academic communities such as particle physics needed petabyte scale (10^15 bytes) data processing on a daily basis
  • MapReduce is a set of principles for solving large-scaale data processing problems
  • The two main principles in action are: (1) Divide and Conquer (2) Parallelization
  • Divide and Conquer is about - how to decompose a large problem into smaller ones
  • Parallelization is about - how to solve each smaller subproblem in parallel and finally integrate them to get a consolidated solution to the original large problem
  • Most traditional solutions/frameworks for parallelization the developer has to worry about many details: how to decompose the problem, how to distribute compute across cores, machines, how to distribute data efficiently, how to deal with errors/failures in the distributed system, how to synchronize different workers, etc. The older approaches have big cognitive burden and due to that room for errors in implementation.
  • MapReduce provides solutions for handling petabyte data efficiently. Instead of moving data where computation will happen, MR brings computation to data.
  • MapReduce has roots in functional programming. In particular, it is rooted in two functions: map and fold. Given a list of values, the map function is applied to each element to get a transformed value. Map is inherently parallelizable. The next thing is the fold function. A fold function takes two values: initial value (or prev value) and the next value (which is the result of map, usually).

Map & Fold

  • In summary - map is the transformation operation, while fold is the aggregation operation
  • The fold operation requires at least two data elements to be "brought together" in a distributed setup (hence more tricky sometimes)
  • In real-world scenarios, many times fold is not required for all elements; rather fold happens in "groups", leading to higher parallelization.
  • For commutative and associative operations, fold can be made much faster through local aggregation and sensible reordering
  • MR is practically implemented at Google (proprietary) and also open sourced in the Hadoop project.

Next Steps

In the next part of the article series, I will explore:

  1. Mappers and Reducers
  2. Partitioners and Combiners


This content originally appeared on DEV Community and was authored by Shrijith Venkatramana


Print Share Comment Cite Upload Translate Updates
APA

Shrijith Venkatramana | Sciencx (2025-01-31T18:02:39+00:00) MapReduce Basics (Part 1). Retrieved from https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/

MLA
" » MapReduce Basics (Part 1)." Shrijith Venkatramana | Sciencx - Friday January 31, 2025, https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/
HARVARD
Shrijith Venkatramana | Sciencx Friday January 31, 2025 » MapReduce Basics (Part 1)., viewed ,<https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/>
VANCOUVER
Shrijith Venkatramana | Sciencx - » MapReduce Basics (Part 1). [Internet]. [Accessed ]. Available from: https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/
CHICAGO
" » MapReduce Basics (Part 1)." Shrijith Venkatramana | Sciencx - Accessed . https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/
IEEE
" » MapReduce Basics (Part 1)." Shrijith Venkatramana | Sciencx [Online]. Available: https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/. [Accessed: ]
rf:citation
» MapReduce Basics (Part 1) | Shrijith Venkatramana | Sciencx | https://www.scien.cx/2025/01/31/mapreduce-basics-part-1/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.