Perl Introduction

Practical Extraction and Report Language (Perl): Introduction. What is Perl? The name is commonly expanded as Practical Extraction and Report Language. It is an interpreted language optimized for scanning arbitrary text files, extracting information from them, and


  1. • Example:
      @color = qw(red blue green black);
      $first = shift @color;       # $first gets "red", and @color becomes
                                   # (blue green black)
      unshift(@color, "white");    # @color becomes (white blue green black)
      Internet & Web Based Technology 35

  2. pop and push
      • They operate on the end of the array.
        – 'pop' removes and returns the last element of the array.
        – 'push' appends one or more elements to the end of the array.

  3. • Example:
      @color = qw(red blue green black);
      $first = pop @color;       # $first gets "black", and @color becomes
                                 # (red blue green)
      push(@color, "white");     # @color becomes (red blue green white)
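
The four functions above can be combined in one short run. This is a minimal sketch; the array name @queue and the extra colors are illustrative and not from the slides.

```perl
use strict;
use warnings;

my @queue = qw(red blue green black);

my $front = shift @queue;          # remove and return the first element
unshift @queue, 'white';           # add an element at the front
my $back = pop @queue;             # remove and return the last element
push @queue, 'yellow', 'cyan';     # append several elements at the end

print "front=$front back=$back queue=@queue\n";
```

shift/unshift work on the front of the array and pop/push on the end, so the same array can serve as either a queue or a stack.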

  4. Reversing an Array
      • By using the 'reverse' function.
      @names = ("Mina", "Tina", "Rina");
      @rev = reverse @names;      # Reversed list stored in @rev.
      @names = reverse @names;    # Original array is reversed.

  5. Printing an Array
      • Example:
      @colors = qw(red green blue);
      print @colors;       # prints without spaces: redgreenblue
      print "@colors";     # prints with spaces: red green blue

  6. Sort the Elements of an Array
      • By default, the 'sort' function sorts the elements of an array lexicographically.
        – Elements are compared as strings.
      @colors = qw(red blue green black);
      @sort_col = sort @colors;    # @sort_col is (black blue green red)

  7. – Another example:
      @num = qw(10 2 5 22 7 15);
      @new = sort @num;                  # @new will contain (10 15 2 22 5 7)
      – How do we sort numerically?
      @num = qw(10 2 5 22 7 15);
      @new = sort { $a <=> $b } @num;    # @new will contain (2 5 7 10 15 22)
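
The comparison block passed to 'sort' can express other orderings as well. The sketch below assumes two extra cases that the slides do not cover: descending numeric order (swap $a and $b) and case-insensitive string order (compare lowercased copies with 'cmp'). The sample words are made up for illustration.

```perl
use strict;
use warnings;

my @num = (10, 2, 5, 22, 7, 15);

my @ascending  = sort { $a <=> $b } @num;   # numeric, smallest first
my @descending = sort { $b <=> $a } @num;   # numeric, largest first

my @words       = qw(pear apple Banana);
my @default     = sort @words;                        # plain string order
my @ignore_case = sort { lc($a) cmp lc($b) } @words;  # case-insensitive

print "@ascending\n@descending\n@default\n@ignore_case\n";
```

Use <=> for numbers and cmp for strings; mixing them up gives the surprising order shown in the first example on the slide.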

  8. The 'splice' function
      • Arguments to the 'splice' function:
        – The first argument is an array.
        – The second argument is an offset (the index of the element at which to begin splicing).
        – The third argument is the number of elements to remove.
      @colors = ("red", "green", "blue", "black");
      @middle = splice(@colors, 1, 2);    # @middle contains the removed elements (green, blue);
                                          # @colors is now (red, black)
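
'splice' also accepts an optional replacement list after the length, which the slide does not show. A minimal sketch, with color names chosen only for illustration:

```perl
use strict;
use warnings;

my @colors = qw(red green blue black);

# Remove two elements starting at index 1 and put three others in their place.
my @removed = splice(@colors, 1, 2, 'cyan', 'magenta', 'yellow');

# A length of 0 removes nothing, so this simply inserts at index 1.
splice(@colors, 1, 0, 'white');

print "removed: @removed\n";
print "colors:  @colors\n";
```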

  9. File Handling

  10. Interacting with the user
      • Read from the keyboard (standard input).
        – Use the file handle STDIN with the line input operator: <STDIN>.
        – Very simple to use.
      print "Enter your name: ";
      $name = <STDIN>;    # Read a line from the keyboard
      print "Good morning, $name. \n";
        – $name also contains the trailing newline character.
      • We need to remove it.

  11. The 'chop' Function
      • The 'chop' function removes the last character of whatever it is given to chop.
      • In the following example, it chops the newline.
      print "Enter your name: ";
      chop($name = <STDIN>);    # Read from keyboard and chop newline
      print "Good morning, $name. \n";
      • 'chop' removes the last character irrespective of whether it is a newline or not.
        – Sometimes dangerous.

  12. Safe chopping: 'chomp'
      • The 'chomp' function works like 'chop', with the difference that it removes the last character only if it is a newline.
      print "Enter your name: ";
      chomp($name = <STDIN>);    # Read from keyboard and chomp newline
      print "Good morning, $name. \n";
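
The difference between the two functions shows up when the input has no trailing newline. The sketch below uses fixed strings instead of keyboard input so it can be run unattended; the variable names are illustrative.

```perl
use strict;
use warnings;

my $chop_nl     = "Indranil\n";
my $chop_plain  = "Indranil";
my $chomp_nl    = "Indranil\n";
my $chomp_plain = "Indranil";

chop $chop_nl;        # removes the newline
chop $chop_plain;     # removes the final "l" as well: data is lost
chomp $chomp_nl;      # removes the newline
chomp $chomp_plain;   # no trailing newline, so nothing is removed

print "chop:  [$chop_nl] [$chop_plain]\n";
print "chomp: [$chomp_nl] [$chomp_plain]\n";
```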

  13. File Operations
      • Opening a file
        – The 'open' function opens a file and associates it with a file handle.
        – For standard input, we have the predefined handle STDIN.
      $fname = "/home/isg/report.txt";
      open XYZ, $fname;
      while (<XYZ>) {
          print "Line number $. : $_";
      }

  14. – Checking for errors:
      $fname = "/home/isg/report.txt";
      open XYZ, $fname or die "Error in open: $!";
      while (<XYZ>) {
          print "Line number $. : $_";
      }
      – $. holds the current line number (starting at 1)
      – $_ holds the line just read
      – $! holds the error code/message

  15. • Reading from a file:
        – The last example also illustrates file reading.
        – The angle brackets (< >) form the line input operator.
      • When the operator is the only thing in a 'while' condition, the line read goes into $_.

  16. • Writing into a file:
      $out = "/home/isg/out.txt";
      open XYZ, ">$out" or die "Error in write: $!";
      for $i (1..20) {
          print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
      }

  17. • Appending to a file:
      $out = "/home/isg/out.txt";
      open XYZ, ">>$out" or die "Error in write: $!";
      for $i (1..20) {
          print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
      }

  18. • Closing a file:
      close XYZ;
      where XYZ is the file handle of the file being closed.

  19. • Printing a file:
        – This is very easy to do in Perl.
      $input = "/home/isg/report.txt";
      open IN, $input or die "Error in open: $!";
      while (<IN>) {
          print;
      }
      close IN;
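
The write, append, read and close steps above can be tried end to end. The sketch below is not the slides' style: it assumes the three-argument form of 'open' with lexical file handles, which is a common modern alternative to bareword handles such as XYZ, and it uses a scratch file from the core File::Temp module instead of the /home/isg paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file (illustrative); removed automatically when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write: '>' truncates the file or creates it.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "line $_\n" for 1 .. 3;
close $out;

# Append: '>>' adds to the end of the existing contents.
open($out, '>>', $path) or die "Cannot append to $path: $!";
print {$out} "line 4\n";
close $out;

# Read: '<' opens for input; in list context <$in> returns all lines.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

print "Read ", scalar(@lines), " lines; the last one is $lines[-1]";
```

Keeping the mode separate from the file name avoids surprises when a name happens to begin with '>' or another special character.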

  20. Command Line Arguments
      • Perl uses a special array called @ARGV.
        – It holds the list of arguments passed after the script name on the command line.
        – Example: if you invoke Perl as:
            perl test.pl red blue green
          then @ARGV will be (red blue green).
        – Printing the command line arguments:
      foreach (@ARGV) {
          print "$_ \n";
      }

  21. Standard File Handles
      • <STDIN> – Read from standard input (keyboard).
      • <STDOUT> – Print to standard output (screen).
      • <STDERR> – For outputting error messages.
      • <ARGV> – Reads from the files named on the command line, opening each in turn.

  22. – The @ARGV array contains the text after the program's name on the command line.
      • <ARGV> takes each file in turn.
      • If no files are specified on the command line, it reads from standard input.
      – Since this is very commonly used, Perl provides an abbreviation for <ARGV>, namely <>.
      – An example is shown.

  23. $lineno = 1;
      while (<>) {
          print "$lineno: $_";
          $lineno++;
      }
      – In this program, the names of the files are given on the command line.
          perl list_lines.pl file1.txt
          perl list_lines.pl a.txt b.txt c.txt

  24. Control Structures

  25. Introduction
      • There are many control constructs in Perl.
        – Similar to those in C.
        – Illustrated here through examples.
        – The available constructs:
          • for
          • foreach
          • if/elsif/else
          • while
          • do, etc.

  26. Concept of Block
      • A statement block is a sequence of statements enclosed in a matching pair of { and }.
      if ($year == 2000) {
          print "You have entered the new millennium.\n";
      }
      • Blocks may be nested within other blocks.

  27. Definition of TRUE in Perl
      • In Perl, the following scalar values are considered FALSE:
        – The value 0 (including the string "0")
        – The empty string ("")
        – undef
      • Every other scalar value is TRUE.
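
The rule can be checked directly. The sketch below tests a handful of values chosen for illustration; several of them ('0.0', a single space, '00', the word 'false') look false but are non-empty strings other than "0", so Perl treats them as TRUE.

```perl
use strict;
use warnings;

my @candidates = (0, '0', '', undef, '0.0', ' ', '00', 'false', 1);

my @labels;
for my $v (@candidates) {
    my $shown = defined $v ? "'$v'" : 'undef';
    push @labels, $shown . ' is ' . ($v ? 'TRUE' : 'FALSE');
}

print "$_\n" for @labels;
```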

  28. if .. else
      • General syntax:
      if (test expression) {
          # if TRUE, do this
      } else {
          # if FALSE, do this
      }

  29. Examples:
      if ($name eq 'isg') {
          print "Welcome Indranil. \n";
      } else {
          print "You are somebody else. \n";
      }
      if ($flag == 1) {
          print "There has been an error. \n";
      }   # The else block is optional

  30. elsif
      Example:
      print "Enter your id: ";
      chomp($name = <STDIN>);
      if ($name eq 'isg') {
          print "Welcome Indranil. \n";
      } elsif ($name eq 'bkd') {
          print "Welcome Bimal. \n";
      } elsif ($name eq 'akm') {
          print "Welcome Arun. \n";
      } else {
          print "Sorry, I do not know you. \n";
      }

  31. while
      Example: (Guessing the correct word)
      $your_choice = ' ';
      $secret_word = 'India';
      while ($your_choice ne $secret_word) {
          print "Enter your guess: \n";
          chomp($your_choice = <STDIN>);
      }
      print "Congratulations! Mera Bharat Mahan.";

  32. for
      • Syntax same as in C.
      • Example:
      for ($i = 1; $i < 10; $i++) {
          print "Iteration number $i \n";
      }

  33. foreach
      • Very commonly used construct that iterates over a list.
      • Example:
      @colors = qw(red blue green);
      foreach $name (@colors) {
          print "Color is $name. \n";
      }
      • We can use 'for' in place of 'foreach'.

  34. • Example: Counting odd numbers in a list
      @xyz = qw(10 15 17 28 12 77 56);
      $count = 0;
      foreach $number (@xyz) {
          if (($number % 2) == 1) {
              print "$number is odd. \n";
              $count++;
          }
      }
      # Print the total once, after the loop has finished
      print "Number of odd numbers is $count. \n";

  35. Breaking out of a loop
      • The statement 'last', if it appears in the body of a loop, causes Perl to exit the loop immediately.
        – Usually used with a conditional.
      last if ($i > 10);

  36. Skipping to end of loop
      • For this we use the statement 'next'.
        – When executed, the remaining statements in the loop body are skipped, and the next iteration begins.
        – Also usually used with a conditional.
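
'next' and 'last' are often used together in the same loop. A minimal sketch, with a made-up list and a made-up cutoff of 50:

```perl
use strict;
use warnings;

my @numbers = (3, 8, 15, 4, 42, 7, 100);
my @kept;

for my $n (@numbers) {
    next if $n % 2 == 1;   # odd: skip the rest of the body, go to the next element
    last if $n > 50;       # first even number above 50: leave the loop entirely
    push @kept, $n;
}

print "kept: @kept\n";
```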

  37. Relational Operators

  38. The Operators Listed
      Comparison          Numeric    String
      Equal               ==         eq
      Not equal           !=         ne
      Greater than        >          gt
      Less than           <          lt
      Greater or equal    >=         ge
      Less or equal       <=         le

  39. Logical Connectives
      • If $a and $b are logical expressions, then the following connectives are supported by Perl:
        – $a and $b      $a && $b
        – $a or $b       $a || $b
        – not $a         ! $a
      • The two forms in each row have the same logical meaning; the word forms are more readable, but they have lower precedence than the symbolic forms.

  40. String Functions

  41. The Split Function
      • 'split' is used to split a string into multiple pieces using a delimiter, and create a list out of it.
      $_ = 'Red:Blue:Green:White:255';
      @details = split /:/, $_;
      foreach (@details) {
          print "$_\n";
      }
        – The first parameter to 'split' is a regular expression that specifies what to split on.
        – The second specifies what to split.

  42. • Another example:
      $_ = 'Indranil isg@iitkgp.ac.in 283496';
      ($name, $email, $phone) = split / /, $_;
      • Called with no arguments, 'split' splits $_ on whitespace.

  43. The Join Function
      • 'join' is used to concatenate several elements into a single string, with a specified delimiter in between.
      $new = join ' ', $x1, $x2, $x3, $x4, $x5, $x6;
      $sep = '::';
      $new = join $sep, $x1, $x2, $x3, @abc, $x4, $x5;
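
'split' and 'join' are natural inverses. The sketch below reuses the colon-separated record from the earlier slide; the padded greeting string is an assumption added to show the special pattern ' ' (a single-space string), which splits on runs of whitespace and ignores leading whitespace.

```perl
use strict;
use warnings;

my $record   = 'Red:Blue:Green:White:255';
my @fields   = split /:/, $record;       # five fields
my $rejoined = join '::', @fields;       # same fields, new delimiter

my @words = split ' ', "  Hello   good morning  ";

print scalar(@fields), " fields: $rejoined\n";
print scalar(@words), " words: @words\n";
```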

  44. Regular Expressions

  45. Introduction
      • One of the most useful features of Perl.
      • What is a regular expression (RegEx)?
        – A pattern written according to a defined syntax.
        – It describes a set of strings, that is, the chunks of text it can match.
        – Very powerful way to specify string patterns.

  46. An Example: without RegEx
      $found = 0;
      $_ = "Hello good morning everybody";
      $search = "every";
      # Note: 'eq' compares whole words, so this finds "every" only when it
      # appears as a separate word (it does not match inside "everybody").
      foreach $word (split) {
          if ($word eq $search) {
              $found = 1;
              last;
          }
      }
      if ($found) {
          print "Found the word 'every' \n";
      }

  47. Using RegEx
      $_ = "Hello good morning everybody";
      if ($_ =~ /every/) {
          print "Found the word 'every' \n";
      }
      • Very easy to use.
      • The text between the forward slashes defines the regular expression.
      • If we use "!~" instead of "=~", the test succeeds when the pattern is not present in the string.

  48. • The previous example illustrates literal text as a regular expression.
        – Simplest form of regular expression.
      • Point to remember:
        – When performing the matching, all the characters in the string are considered to be significant, including punctuation and white space.
      • For example, /every / will not match in the previous example.

  49. Another Simple Example
      $_ = "Welcome to IIT Kharagpur, students";
      if (/IIT K/) {
          print "'IIT K' is present in the string\n";
      }
      if (/Kharagpur students/) {
          print "This will not match\n";
      }

  50. Types of RegEx
      • Basically two types:
        – Matching
          • Checking if a string contains a substring.
          • The operator 'm' is used (optional if forward slash is used as the delimiter).
        – Substitution
          • Replacing a substring by another substring.
          • The operator 's' is used.

  51. Matching

  52. The =~ Operator
      • Tells Perl to apply the regular expression on the right to the value on the left.
      • The regular expression is contained within delimiters (forward slash by default).
        – If some other delimiter is used, then a preceding 'm' is essential.

  53. Examples
      $string = "Good day";
      if ($string =~ m/day/) {
          print "Match successful \n";
      }
      if ($string =~ /day/) {
          print "Match successful \n";
      }
      • Both forms are equivalent.
      • The 'm' in the first form is optional.

  54. $string = "Good day";
      if ($string =~ m@day@) {
          print "Match successful \n";
      }
      if ($string =~ m[day]) {
          print "Match successful \n";
      }
      • Both forms are equivalent.
      • The character following 'm' is the delimiter; a bracket used as the opening delimiter is closed by its matching bracket.

  55. Character Class
      • Use square brackets to specify "any value in the list of possible values".
      my $string = "Some test string 1234";
      if ($string =~ /[0123456789]/) {
          print "Found a number \n";
      }
      if ($string =~ /[aeiou]/) {
          print "Found a vowel \n";
      }
      if ($string =~ /[0123456789ABCDEF]/) {
          print "Found a hex digit \n";
      }

  56. Character Class Negation
      • Use '^' at the beginning of the character class to specify "any single character that is not one of these values".
      my $string = "Some test string 1234";
      # Note: [^aeiou] matches any character that is not a lowercase vowel,
      # so digits, spaces and uppercase letters also count, not only consonants.
      if ($string =~ /[^aeiou]/) {
          print "Found a non-vowel character\n";
      }

  57. Pattern Abbreviations
      • Useful in common cases
      .     Anything except newline (\n)
      \d    A digit, same as [0-9]
      \w    A word character, [0-9a-zA-Z_]
      \s    A space character (tab, space, etc.)
      \D    Not a digit, same as [^0-9]
      \W    Not a word character
      \S    Not a space character

  58. $string = "Good and bad days";
      if ($string =~ /d..s/) {
          print "Found something like days\n";
      }
      if ($string =~ /\w\w\w\w\s/) {
          print "Found a four-letter word!\n";
      }

  59. Anchors
      • Three ways to define an anchor:
      ^     :: anchors to the beginning of the string
      $     :: anchors to the end of the string
      \b    :: anchors to a word boundary

  60. if ($string =~ /^\w/)         :: does the string start with a word character?
      if ($string =~ /\d$/)         :: does the string end with a digit?
      if ($string =~ /\bGood\b/)    :: does the string contain the word "Good"?

  61. Multipliers
      • There are three multiplier characters.
      *    :: Find zero or more occurrences
      +    :: Find one or more occurrences
      ?    :: Find zero or one occurrence
      • Some example usages:
      $string =~ /^\w+/;
      $string =~ /\d?/;
      $string =~ /\b\w+\s+/;
      $string =~ /\w+\s?$/;
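
Anchors and multipliers are easiest to understand by checking what they accept and reject. The sketch below reuses the string "Good and bad days" from an earlier slide; the result variable names are illustrative.

```perl
use strict;
use warnings;

my $string = 'Good and bad days';

my $starts_with_word = $string =~ /^\w+/      ? 1 : 0;  # begins with word characters
my $ends_with_digit  = $string =~ /\d$/       ? 1 : 0;  # last character is "s", not a digit
my $has_word_good    = $string =~ /\bGood\b/  ? 1 : 0;  # "Good" stands alone as a word
my $has_word_day     = $string =~ /\bday\b/   ? 1 : 0;  # only "days" occurs, so no match
my $has_day_prefix   = $string =~ /\bdays?\b/ ? 1 : 0;  # optional "s" lets "days" match

print "$starts_with_word $ends_with_digit $has_word_good $has_word_day $has_day_prefix\n";
```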

  62. Substitution

  63. Basic Usage
      • Uses the 's' operator.
      • Basic syntax is:
      $new =~ s/pattern_to_match/new_pattern/;
      • What does this do?
        – Looks for pattern_to_match in $new and, if found, replaces it with new_pattern.
        – It looks for the pattern once. That is, only the first occurrence is replaced.
        – There is a way to replace all occurrences (to be discussed shortly).

  64. Examples
      $xyz = "Rama and Lakshman went to the forest";
      $xyz =~ s/Lakshman/Bharat/;
      $xyz =~ s/R\w+a/Bharat/;
      $xyz =~ s/[aeiou]/i/;
      $abc = "A year has 11 months \n";
      $abc =~ s/\d+/12/;
      $abc =~ s/\n$/ /;

  65. Common Modifiers
      • Two common modifiers:
      /i    :: ignore case
      /g    :: match/substitute all occurrences
      $string = "Ram and Shyam are very honest";
      if ($string =~ /RAM/i) {
          print "Ram is present in the string";
      }
      $string =~ s/m/j/g;    # Ram -> Raj, Shyam -> Shyaj
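
The effect of /g is clearest when the same substitution is run with and without it. The sketch below works on copies so that the original string stays unchanged; the copy-then-substitute idiom `(my $copy = $string) =~ s/.../.../` and the match-counting idiom are additions for illustration.

```perl
use strict;
use warnings;

my $string = 'Ram and Shyam are very honest';

my $found = $string =~ /RAM/i ? 1 : 0;       # case-insensitive match

(my $first_only = $string) =~ s/m/j/;        # replaces the first "m" only
(my $all        = $string) =~ s/m/j/g;       # replaces every "m"

my $count = () = $string =~ /m/g;            # number of matches of /m/

print "$first_only\n$all\ncount=$count\n";
```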

  66. Use of Memory in RegEx
      • We can use parentheses to capture a piece of matched text for later use.
        – Perl memorizes the matched texts.
        – Multiple sets of parentheses can be used.
      • How to recall the captured text?
        – Use \1, \2, \3, etc. while still inside the pattern.
        – Use $1, $2, $3 after the match.
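
Both ways of recalling captured text can be seen in a short run. The sketch below reuses the contact line from the 'split' slide; the pattern and the repeated-word test string are assumptions made for illustration.

```perl
use strict;
use warnings;

my $line = 'Indranil isg@iitkgp.ac.in 283496';

# After a successful match, $1, $2 and $3 hold the three captured pieces.
my ($name, $user, $domain);
if ($line =~ /^(\w+)\s+(\w+)\@([\w.]+)/) {
    ($name, $user, $domain) = ($1, $2, $3);
}

# Inside the pattern, \1 refers back to what the first group captured,
# so this detects a word that is immediately repeated.
my $has_repeat = 'this is is a test' =~ /\b(\w+)\s+\1\b/ ? 1 : 0;

print "name=$name user=$user domain=$domain repeat=$has_repeat\n";
```

Copy $1, $2, ... into named variables right after the match, since the next successful match overwrites them.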
