CS 327E Class 6 October 14, 2019

1) PTransforms such as ParDo mutate their input elements.
A. True
B. False

2) What kind of object does the ParDo transform expect?
A. A DoFn subclass
B. A DoFn superclass
C. A DoFn abstract class

3) Does ParDo support random access to PCollection elements? For example, is the highlighted code allowed?
A. Yes
B. No

    class ComputeWordLengthFn(beam.DoFn):
        def process(self, element):
            element0 = words[0]
            if len(element0) >= len(element):
                return [element0]

    word_lengths = words | beam.ParDo(ComputeWordLengthFn())

4) ParDo resembles which SQL operation?
A. FROM clause
B. WHERE clause
C. ORDER BY clause
D. JOIN clause

5) CoGroupByKey resembles which SQL operation?
A. FROM clause
B. WHERE clause
C. ORDER BY clause
D. JOIN clause

Recall: ParDo Transform
● Maps 1 input element to (1, 0, many) output elements
● Invokes a user-specified function on each of the elements of the input PCollection
● User code is implemented as a subclass of DoFn with a process(self, element) method
● Input elements are processed independently and in parallel
● Output elements are bundled into a new PCollection
● Typical usage: filtering, formatting, extracting parts of data, performing computations on data elements
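The bullets above can be modeled in plain Python. This is only a sketch of the semantics, not the Beam API: `SplitWordsFn` stands in for a DoFn subclass, and `par_do` is a hypothetical helper that plays the role of `beam.ParDo` by calling `process` once per element and collecting all outputs into a new list. The sample lines are made up for illustration.

```python
from typing import Any, Iterable, List


class SplitWordsFn:
    """Stand-in for a DoFn subclass: one input line, zero or more output words."""

    def process(self, element: str) -> Iterable[str]:
        # An empty line yields no outputs; a line with n words yields n outputs.
        return element.split()


def par_do(elements: Iterable[Any], fn: Any) -> List[Any]:
    """Hypothetical helper (not a Beam function): apply fn.process to each
    element independently and bundle every output into one new list."""
    outputs: List[Any] = []
    for element in elements:
        outputs.extend(fn.process(element))
    return outputs


lines = ["hello world", "", "beam"]
words = par_do(lines, SplitWordsFn())
print(words)  # ['hello', 'world', 'beam']
```

The first line produces two outputs, the empty line produces none, and the last line produces one. The input list itself is left unchanged.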

GroupByKey Transform
● Takes a PCollection as input where each element is a (key, value) pair
● Groups the values by unique key
● Produces a PCollection as output where each element is a (key, list(value)) pair
● Resembles GROUP BY in SQL

Input:
('Nicole', '100 Avenue A')
('Erik', '21 Guadalupe')
('Sameer', '7071 Hamilton')
('Nicole', '200 Avenue B')

Output after GroupByKey:
('Nicole', ['100 Avenue A', '200 Avenue B'])
('Erik', ['21 Guadalupe'])
('Sameer', ['7071 Hamilton'])
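The grouping on this slide can be reproduced with a few lines of plain Python. This is a sketch of what GroupByKey computes, not Beam code: `group_by_key` is a hypothetical helper, and unlike this list-based model, a real PCollection gives no guarantee about element order.

```python
from collections import defaultdict
from typing import Any, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, List[Any]]]:
    """Hypothetical helper (not a Beam function): collect the values of
    (key, value) pairs into one list per unique key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


# The (name, address) pairs from the slide.
pairs = [
    ("Nicole", "100 Avenue A"),
    ("Erik", "21 Guadalupe"),
    ("Sameer", "7071 Hamilton"),
    ("Nicole", "200 Avenue B"),
]

grouped = group_by_key(pairs)
for key, values in grouped:
    print(key, values)
```

Every key maps to a list, including keys such as 'Erik' that occur only once.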

Demo: Student_single.py
git pull origin master

Hands-on Exercise 1
Run Student_single.py

iClicker Question 1
How many records are in the resulting Student table?
A. 0
B. 12
C. 15

Demo: Student_cluster.py
Converting to Dataflow pipeline

Hands-on Exercise 2
Create Teacher_cluster.py from Teacher_single.py
Run Teacher_cluster.py on Dataflow

iClicker Question 2
How many nodes are in the job’s execution graph?
A. 3
B. 4
C. 9

ParDo Side Inputs
● A side input is an optional input passed to DoFn
● Passed as an extra argument to process method: process(self, element, side_input1)
● Side inputs can be ordinary values or entire PCollections
● DoFn reads side inputs while processing an individual element
● Multiple side inputs per DoFn are supported: process(self, element, side_input1, side_input2, ..., side_input_n)
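A plain-Python sketch of the idea above: the side input arrives as an extra argument to `process` and is read while each element is handled. `FilterByMinLengthFn`, `par_do`, and the sample data are hypothetical and are not part of Beam. In Beam itself, the extra arguments are supplied to `beam.ParDo` after the DoFn instance.

```python
from typing import Any, Iterable, List


class FilterByMinLengthFn:
    """Stand-in for a DoFn whose process method takes one side input."""

    def process(self, element: str, min_length: int) -> Iterable[str]:
        # min_length is the side input: the same value is visible to every call.
        if len(element) >= min_length:
            return [element]
        return []


def par_do(elements: Iterable[Any], fn: Any, *side_inputs: Any) -> List[Any]:
    """Hypothetical helper (not a Beam function): forward the side inputs to
    every process call, one call per element."""
    outputs: List[Any] = []
    for element in elements:
        outputs.extend(fn.process(element, *side_inputs))
    return outputs


words = ["a", "beam", "pipeline", "io"]
long_words = par_do(words, FilterByMinLengthFn(), 4)
print(long_words)  # ['beam', 'pipeline']
```

Passing a different value, such as `par_do(words, FilterByMinLengthFn(), 1)`, changes the result without touching the function's code, which is the main reason to pass configuration or lookup data as a side input.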

Demo: Takes_single.py
Show Side Inputs

Flatten Transform
● Takes a list of PCollections as input
● Produces a single PCollection as output
● Results contain all the elements from the input PCollections
● Note: Input PCollections must have matching schemas

    a_pcoll = p | 'Read File 1' >> ReadFromText('oscars_data_archive.tsv')
    b_pcoll = p | 'Read File 2' >> ReadFromText('oscars_data_2019.tsv')

    # Union the two PCollections
    c_pcoll = (a_pcoll, b_pcoll) | 'Merge PCollections' >> beam.Flatten()
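In plain Python, Flatten behaves like concatenating several lists, much like UNION ALL in SQL. The sketch below models that with `itertools.chain`. The `flatten` helper and the tab-separated sample rows are hypothetical stand-ins for the two files read on the slide.

```python
from itertools import chain
from typing import Any, Iterable, List


def flatten(collections: Iterable[Iterable[Any]]) -> List[Any]:
    """Hypothetical helper (not a Beam function): merge several collections
    into one that contains every element of every input."""
    return list(chain.from_iterable(collections))


# Made-up rows; both inputs share the same layout (year, category, winner).
archive_rows = ["2017\tBest Picture\tFilm A", "2018\tBest Picture\tFilm B"]
rows_2019 = ["2019\tBest Picture\tFilm C"]

all_rows = flatten((archive_rows, rows_2019))
print(len(all_rows))  # 3
```

Duplicates are not removed: an element that appears in both inputs appears twice in the output.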

CoGroupByKey Transform
● Takes two or more PCollections as input
● Every element in the input is a (key, value) pair
● Groups values from all input PCollections by common key
● Produces a PCollection as output where each element is a (key, value) pair
● Output value is a list of dictionaries containing all data associated with each unique key
● Analogous to the FULL OUTER JOIN in SQL

CoGroupByKey Transform

    q1 = 'SELECT sid, cno, grade FROM college_modeled.Takes'
    q2 = 'SELECT cno, cname FROM college_modeled.Class'

    # Read each query result into its own PCollection
    takes_pcoll = p | 'Run Q1' >> beam.io.Read(beam.io.BigQuerySource(query=q1))
    class_pcoll = p | 'Run Q2' >> beam.io.Read(beam.io.BigQuerySource(query=q2))

    # Convert each record to a (key, value) pair (MakeTuple is not shown on this slide)
    takes_tuple = takes_pcoll | 'Takes Tuple' >> beam.ParDo(MakeTuple())
    class_tuple = class_pcoll | 'Class Tuple' >> beam.ParDo(MakeTuple())

    # Join the two keyed PCollections on their common key
    joined_pcoll = (takes_tuple, class_tuple) | 'Join' >> beam.CoGroupByKey()
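The join can be modeled in plain Python to show why it behaves like a FULL OUTER JOIN. This is a sketch, not Beam code: `co_group_by_key` is a hypothetical helper, and the records are made-up rows keyed by cno, an assumption about how the two inputs would be keyed. For each key, the output holds one list of values per input, and a key missing from one input gets an empty list on that side.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def co_group_by_key(*keyed: Iterable[Tuple[Any, Any]]) -> Dict[Any, Tuple[List[Any], ...]]:
    """Hypothetical helper (not a Beam function): for each key, build one list
    of values per input collection, in the order the inputs were given."""
    grouped = defaultdict(lambda: tuple([] for _ in keyed))
    for index, collection in enumerate(keyed):
        for key, value in collection:
            grouped[key][index].append(value)
    return dict(grouped)


# Made-up (cno, record) pairs standing in for the Takes and Class inputs.
takes = [
    ("CS327E", {"sid": "s1", "grade": "A"}),
    ("CS327E", {"sid": "s2", "grade": "B"}),
    ("M408C", {"sid": "s1", "grade": "B"}),
]
classes = [
    ("CS327E", {"cname": "Databases"}),
    ("CS303E", {"cname": "Programming"}),
]

joined = co_group_by_key(takes, classes)
for cno, (takes_values, class_values) in joined.items():
    print(cno, takes_values, class_values)
```

Here 'M408C' has no matching class record and 'CS303E' has no matching takes record, yet both keys still appear in the output, each with an empty list on the unmatched side.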

Milestone 6
1) Requirements and rubric: assignment sheet
2) Debugging assistance: sign-up sheet
