10/30/17

Today’s Objectives
• Phil’s Talk Review
• Amazon Web Services
  Ø Elastic MapReduce (EMR)

Oct 30, 2017 Sprenkle - CSCI325

Phil’s Talk

AMAZON WEB SERVICES (AWS)

What is Amazon Web Services?
• A collection of remote computing services that together make up a cloud computing platform
  Ø Offered over the Internet by Amazon.com
• Grew out of Amazon’s need to rapidly provision and configure machines of standard configurations for its own business
http://aws.amazon.com

Amazon Web Services Architecture
• AWS is located in 16 geographical Regions
  Ø Region: determines geographic location, price, applicable laws, and network locality
  Ø Each Region is wholly contained within a single country, and all of its data and services stay within the designated Region
• Each Region has multiple Availability Zones
  Ø Distinct data centers providing AWS services
  Ø Isolated from each other to prevent outages from spreading between Zones
  Ø 44 Availability Zones in total
https://aws.amazon.com/about-aws/global-infrastructure/

Terminology
• Instance: One running virtual machine.
• Instance Type: Hardware configuration (cores, memory, disk).
• Instance Store Volume: Temporary disk associated with an instance.
• Image (AMI): Stored bits that can be turned into instances.
• Key Pair: Credentials used to access a VM from the command line.

The Amazon Web Services Universe
• Cross Service Features
• Management Interface
• Platform Services
• Infrastructure Services

Management Interface
• Web: Management Console, http://aws.amazon.com/console/
• SDK: SDKs, IDEs, http://aws.amazon.com/tools/
• CLI: Command-line interface, http://aws.amazon.com/cli/

Infrastructure Services
• EC2: http://aws.amazon.com/ec2/
• VPC: http://aws.amazon.com/vpc/
• S3: http://aws.amazon.com/s3/
• EBS: http://aws.amazon.com/ebs/

Platform Services
• EMR: https://aws.amazon.com/emr/
• RDS: http://aws.amazon.com/rds/
• DynamoDB: http://aws.amazon.com/dynamodb/
• Beanstalk: http://aws.amazon.com/elasticbeanstalk/

Amazon Elastic MapReduce (EMR)
• Web service that makes it easy to quickly and cost-effectively process vast amounts of data using Hadoop
• Distributes data and processing across a resizable cluster of Amazon EC2 instances
• Can launch a persistent cluster that stays up indefinitely or a temporary cluster that terminates after the analysis is complete
  Ø You probably want to terminate the cluster

Amazon Elastic MapReduce (EMR)
• Supports a variety of Amazon EC2 instance types and Amazon EC2 pricing options (On-Demand, Reserved, and Spot).
• When launching an Amazon EMR cluster (also called a "job flow"), you choose how many and what type of Amazon EC2 instances to provision.
• The Amazon EMR price is in addition to the Amazon EC2 price.
• Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.
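
Because the EMR price is charged on top of the EC2 price, a rough cluster estimate adds the two per-instance rates before multiplying by instance count and running time. The sketch below only illustrates that arithmetic: the class name, method name, and every rate in it are hypothetical placeholders, not actual AWS prices, and real billing rules may differ.

```java
/** Sketch of EMR cost arithmetic; all names and rates are hypothetical. */
class EmrCostSketch {

    // Total = instances * hours * (EC2 rate + EMR rate), both rates per instance-hour.
    static double clusterCost(int instances, double hours,
                              double ec2RatePerHour, double emrRatePerHour) {
        return instances * hours * (ec2RatePerHour + emrRatePerHour);
    }

    public static void main(String[] args) {
        // Placeholder rates picked for easy arithmetic, NOT real AWS prices.
        double cost = clusterCost(4, 2.0, 0.10, 0.025);
        System.out.println("Estimated cost in dollars: " + cost);
    }
}
```

The same formula shows why a forgotten persistent cluster is expensive: the hours term keeps growing until the cluster is terminated.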

WordCount Mapper in Java

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emits (word, 1) for every whitespace-separated token in the input value.
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

WordCount Reducer in Java

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // Sums the counts collected for one word and emits (word, total).
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
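
The mapper and reducer logic can be traced without a cluster. The following is a minimal plain-Java sketch of the same data flow, assuming a single machine and in-memory collections; it uses no Hadoop classes, and the class and method names (WordCountSketch, map, shuffle, reduce, wordCount) are illustrative only. In a real job the framework performs the grouping step between map and reduce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-Java sketch of the WordCount data flow; no Hadoop classes are used. */
class WordCountSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle" step: group the emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum the values collected for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Run all three steps over the input lines and return word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {cat=2, ran=1, sat=1, the=2}
        System.out.println(wordCount(Arrays.asList("the cat sat", "the cat ran")));
    }
}
```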

WordCount.java

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // TokenizerMapper and IntSumReducer are declared here as static nested classes.

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);  // reducer also serves as combiner
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Nested Classes
• Nested class: member of enclosing class
• Non-static nested classes (inner classes)
  Ø Have access to members of enclosing class, even if private
• Static nested classes cannot directly access instance members of the enclosing class (only through an object reference)
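
The difference between the two kinds of nested class can be seen in a small standalone sketch. All names here (NestedDemo, Inner, Nested, and the fields) are made up for illustration and are not part of the WordCount code.

```java
/** Sketch contrasting an inner class with a static nested class. */
class NestedDemo {
    private int instanceCount = 7;          // instance member
    private static String label = "outer";  // static member

    // Inner (non-static) class: tied to a NestedDemo instance,
    // so it can read that instance's private fields directly.
    class Inner {
        int read() {
            return instanceCount;
        }
    }

    // Static nested class: has no enclosing instance. Static members are
    // reachable directly; instance members need an explicit reference.
    static class Nested {
        String readStatic() {
            return label;
        }

        int readVia(NestedDemo outer) {
            return outer.instanceCount;
        }
    }

    public static void main(String[] args) {
        NestedDemo outer = new NestedDemo();
        NestedDemo.Inner inner = outer.new Inner();         // needs an enclosing instance
        NestedDemo.Nested nested = new NestedDemo.Nested(); // does not
        // prints 7 outer 7
        System.out.println(inner.read() + " " + nested.readStatic() + " " + nested.readVia(outer));
    }
}
```

In WordCount, the mapper and reducer are declared static, so they can be created without a WordCount object existing first.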

Solutions
• Original code given
  Ø All part of one Java class file
• Alternative:
  Ø Classes in separate Java class files, not nested classes
  Ø The way I organized your example code in GitHub so that you may have an easier time with sharing/collaborating

Getting Data To The Mapper
[Diagram: Input files → InputSplits → RecordReaders → Mappers → (intermediates). The InputFormat divides the input files into InputSplits; each InputSplit has its own RecordReader, which feeds one Mapper.]
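
For line-oriented text input, each record a RecordReader hands to the mapper is a (key, value) pair in which the key is the byte offset of the line within its file and the value is the line itself. The sketch below imitates that in plain Java, assuming UTF-8 text with '\n' line endings; LineRecordSketch and records are hypothetical names, and no Hadoop classes are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of turning file contents into (byte offset, line) records. */
class LineRecordSketch {

    // Returns offset -> line in file order, assuming '\n' line endings.
    static Map<Long, String> records(String contents) {
        Map<Long, String> out = new LinkedHashMap<>();
        long offset = 0;  // byte offset of the current line
        int start = 0;    // character index where the current line begins
        while (start < contents.length()) {
            int end = contents.indexOf('\n', start);
            if (end < 0) {
                end = contents.length();  // last line has no trailing newline
            }
            String line = contents.substring(start, end);
            out.put(offset, line);
            // Advance past the line's bytes plus one byte for the newline.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {0=the cat, 8=sat}
        System.out.println(records("the cat\nsat\n"));
    }
}
```

WordCount ignores this key, which is why its mapper declares the input key type as Object.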

Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
• FileInputFormat: the key is the offset of the data in its file

Finally: Writing The Output
[Diagram: Reducers → RecordWriters → output files. The OutputFormat supplies a RecordWriter for each Reducer, and each RecordWriter writes one output file.]

Project 3
• Use MapReduce and Amazon clusters to create an inverted index
  Ø What is an inverted index?
• Write mapper and reducer
• Write query
• Check out resources, run through the tutorials
  Ø Don’t get overwhelmed!
  Ø Important part of CS is learning tools, systems on your own
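
As a starting point for the question above: an inverted index maps each word to the set of documents that contain it, so a query for a word becomes a single lookup. The sketch below builds one in memory in plain Java. It is only an illustration of the data structure, not the project's required design: the class and method names, the lowercasing, and the sample file names are all assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of an inverted index: word -> names of documents containing it. */
class InvertedIndexSketch {

    // docs maps a document name to its text.
    static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            StringTokenizer itr = new StringTokenizer(doc.getValue());
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken().toLowerCase(Locale.ROOT);
                // Map-like step: the pair (word, document name).
                // Reduce-like step: collect all document names for the word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // Returns the documents containing the word, or an empty set if none do.
    static Set<String> query(Map<String, Set<String>> index, String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "the cat");  // hypothetical sample documents
        docs.put("b.txt", "the dog");
        Map<String, Set<String>> index = build(docs);
        System.out.println(index);
        System.out.println(query(index, "The"));
    }
}
```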