Shuffle Phase: Executed only in the case of one or more reducers


  1. Shuffle Phase: Executed only when the job has one or more reducers. Transfers data between the mappers and the reducers. Groups records by their keys to ensure local processing in the reduce phase. (01/23/2018)

  2. Shuffle Phase: [Diagram: map tasks Map 1, Map 2, Map 3, …, Map M connected to reduce tasks Reduce 1, Reduce 2, …, Reduce N.]

  3. Shuffle Phase (Map-side): [Diagram: Map i applies the map function to its input split and partitions the resulting key-value pairs (keys k A … k Z) into partitions 0, 1, …, N-1, which feed Reduce 1, Reduce 2, …, Reduce N.]

  4. Shuffle Phase (Reduce-side): [Diagram: Reduce j copies part 1, part 2, part 3, …, part M from Map 1, Map 2, Map 3, …, Map M, sorts the copied key-value pairs, and then reduces them.]

  5. Reduce Phase: Apply the reduce function to each group of similar keys. [Diagram: the values for each key k 1, k 2, k 3, …, k N are passed to a separate reduce call, and the results of all calls form the output.]

  6. Output Writing: Materializes the final output to disk. All results from one process (mapper or reducer) are stored in a subdirectory. An OutputFormat is used to: create any files in the output directory; write the output records one by one to the output; merge the results from all the tasks (if needed). While the output writing runs in parallel, the final commit step runs on a single machine.

  7. MapReduce Examples: Input: a log file. Examples: filter, aggregation, conversion.

  8. Advanced Issues: map failures; reduce failures; the straggler problem; custom keys and values; efficient sorting on serialized data; pipelining MapReduce jobs.
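
The map-side partitioning, reduce-side copy and sort, and per-key reduce described in the shuffle and reduce slides can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not how any particular framework implements it: the function names (`partition`, `map_side`, `reduce_side`, `run_reduce`, `shuffle_and_reduce`), the CRC32-based hash partitioner, and the choice of two reducers are all assumptions made for the example.

```python
from collections import defaultdict
from heapq import merge
from itertools import groupby
from operator import itemgetter
from zlib import crc32

NUM_REDUCERS = 2  # hypothetical N, chosen only for this sketch


def partition(key, num_reducers):
    """Assumed hash partitioner: the same key always goes to the same reducer."""
    return crc32(key.encode("utf-8")) % num_reducers


def map_side(records, num_reducers):
    """Split one mapper's output into partitions 0..N-1, each sorted by key."""
    parts = defaultdict(list)
    for key, value in records:
        parts[partition(key, num_reducers)].append((key, value))
    return {p: sorted(kv, key=itemgetter(0)) for p, kv in parts.items()}


def reduce_side(reducer_id, all_map_outputs):
    """Copy this reducer's partition from every mapper, then merge the sorted runs."""
    runs = [output.get(reducer_id, []) for output in all_map_outputs]
    return list(merge(*runs, key=itemgetter(0)))


def run_reduce(sorted_records, reduce_fn):
    """Call reduce_fn once per group of equal keys."""
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(sorted_records, key=itemgetter(0))
    ]


def shuffle_and_reduce(mapper_outputs, reduce_fn, num_reducers=NUM_REDUCERS):
    """Run the whole pipeline; returns {reducer_id: [(key, reduced_value), ...]}."""
    map_outputs = [map_side(records, num_reducers) for records in mapper_outputs]
    return {
        r: run_reduce(reduce_side(r, map_outputs), reduce_fn)
        for r in range(num_reducers)
    }


if __name__ == "__main__":
    # Three mappers, each emitting (word, 1) pairs.
    mappers = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)], [("b", 1), ("a", 1)]]
    print(shuffle_and_reduce(mappers, lambda key, values: sum(values)))
```

Across all reducers the totals are a: 3, b: 2, c: 1, and each key appears in exactly one reducer's output. Which reducer receives which key depends on the hash, so it is not shown here.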
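
The three log-file examples named in the slides (filter, aggregation, conversion) could look roughly like the sketch below. The slides do not give the log format or the exact tasks, so everything concrete here is an assumption: the `<ip> <status> <bytes>` line format, the "keep errors" filter, the CSV conversion, the bytes-per-IP aggregation, and all function names. Filter and conversion are written as map-only jobs, which need no shuffle; aggregation needs a reducer and therefore a shuffle, which a dictionary stands in for here.

```python
from collections import defaultdict

# Hypothetical log format (an assumption, not from the slides): "<ip> <status> <bytes>"
LOG = [
    "10.0.0.1 200 512",
    "10.0.0.2 404 0",
    "10.0.0.1 200 1024",
    "10.0.0.3 500 128",
]


def parse(line):
    ip, status, size = line.split()
    return ip, int(status), int(size)


def filter_map(line):
    """Filter (map-only): emit the line only if its status is an error (>= 400)."""
    _, status, _ = parse(line)
    if status >= 400:
        yield line


def conversion_map(line):
    """Conversion (map-only): rewrite each record as a CSV line."""
    ip, status, size = parse(line)
    yield f"{ip},{status},{size}"


def aggregation_map(line):
    """Aggregation, map step: emit (ip, bytes)."""
    ip, _, size = parse(line)
    yield ip, size


def aggregation_reduce(ip, sizes):
    """Aggregation, reduce step: total bytes per ip."""
    return ip, sum(sizes)


def run_map_only(map_fn, lines):
    """No reducers, so no shuffle: the map output is the job output."""
    return [out for line in lines for out in map_fn(line)]


def run_map_reduce(map_fn, reduce_fn, lines):
    """Group map output by key (a stand-in for the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    print(run_map_only(filter_map, LOG))
    print(run_map_only(conversion_map, LOG))
    print(run_map_reduce(aggregation_map, aggregation_reduce, LOG))
```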
