Practical Extraction and Report Language (PERL)
Introduction
What is PERL?
• Practical Extraction and Report Language.
• It is an interpreted language optimized for scanning arbitrary text files, extracting information from them, and
• Example:
    @color = qw(red blue green black);
    $first = shift @color;       # $first gets "red", and @color becomes (blue, green, black)
    unshift(@color, "white");    # @color becomes (white, blue, green, black)
Internet & Web Based Technology 35
pop and push
• They operate on the end (right-hand side) of the array.
  – ‘pop’ removes and returns the last element of the array.
  – ‘push’ appends one or more elements after the last element of the array.
• Example:
    @color = qw(red blue green black);
    $first = pop @color;       # $first gets "black", and @color becomes (red, blue, green)
    push(@color, "white");     # @color becomes (red, blue, green, white)
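The four functions can be seen side by side in a minimal sketch. The variable names and the extra colour "yellow" are illustrative and not taken from the slides.

```perl
use strict;
use warnings;

my @color = qw(red blue green black);

my $first = shift @color;     # removes "red" from the front
unshift @color, "white";      # adds "white" at the front
my $last = pop @color;        # removes "black" from the end
push @color, "yellow";        # adds "yellow" at the end

print "first=$first last=$last\n";
print "@color\n";             # white blue green yellow
```

Used in pairs, push/pop treat the array as a stack, while push/shift treat it as a queue.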
Reversing an Array
• By using the ‘reverse’ function.
    @names = ("Mina", "Tina", "Rina");
    @rev = reverse @names;      # Reversed list stored in @rev.
    @names = reverse @names;    # Original array is reversed.
Printing an Array
• Example:
    @colors = qw(red green blue);
    print @colors;      # prints without spaces: redgreenblue
    print "@colors";    # prints with spaces: red green blue
Sort the Elements of an Array
• Using the ‘sort’ function, by default we can sort the elements of an array lexicographically.
  – Elements are compared as strings.
    @colors = qw(red blue green black);
    @sort_col = sort @colors;    # Array @sort_col is (black blue green red)
  – Another example:
    @num = qw(10 2 5 22 7 15);
    @new = sort @num;               # @new will contain (10 15 2 22 5 7)
  – How do we sort numerically?
    @num = qw(10 2 5 22 7 15);
    @new = sort {$a <=> $b} @num;   # @new will contain (2 5 7 10 15 22)
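The difference between the default string ordering and a numeric comparison block can be checked with a short sketch. The descending variant, which swaps $a and $b, is an addition that does not appear in the slides.

```perl
use strict;
use warnings;

my @num = qw(10 2 5 22 7 15);

my @as_strings = sort @num;                 # string comparison (default)
my @ascending  = sort { $a <=> $b } @num;   # numeric, smallest first
my @descending = sort { $b <=> $a } @num;   # numeric, largest first

print "@as_strings\n";   # 10 15 2 22 5 7
print "@ascending\n";    # 2 5 7 10 15 22
print "@descending\n";   # 22 15 10 7 5 2
```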
The ‘splice’ function
• Arguments to the ‘splice’ function:
  – The first argument is an array.
  – The second argument is an offset (index number of the list element to begin splicing at).
  – The third argument is the number of elements to remove.
    @colors = ("red", "green", "blue", "black");
    @middle = splice (@colors, 1, 2);
    # @middle contains the elements removed (green, blue);
    # @colors becomes (red, black)
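A small sketch of the call above, followed by the four-argument form of ‘splice’, which inserts a replacement list at the offset. The replacement colours are made up for illustration.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue", "black");

# Remove 2 elements starting at index 1.
my @middle = splice(@colors, 1, 2);
print "@middle\n";   # green blue
print "@colors\n";   # red black

# A fourth (list) argument is inserted at the offset; length 0 removes nothing.
splice(@colors, 1, 0, "white", "pink");
print "@colors\n";   # red white pink black
```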
File Handling
Interacting with the user
• Read from the keyboard (standard input).
  – Use the file handle <STDIN>.
  – Very simple to use.
    print "Enter your name: ";
    $name = <STDIN>;    # Read from keyboard
    print "Good morning, $name. \n";
  – $name also contains the newline character.
    • Need to chop it off.
The ‘chop’ Function
• The ‘chop’ function removes the last character of whatever it is given to chop.
• In the following example, it chops the newline.
    print "Enter your name: ";
    chop ($name = <STDIN>);    # Read from keyboard and chop newline
    print "Good morning, $name. \n";
• ‘chop’ removes the last character irrespective of whether it is a newline or not.
  – Sometimes dangerous.
Safe chopping: ‘chomp’
• The ‘chomp’ function works similarly to ‘chop’, with the difference that it removes the last character only if it is a newline.
    print "Enter your name: ";
    chomp ($name = <STDIN>);    # Read from keyboard and chomp newline
    print "Good morning, $name. \n";
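To see why ‘chop’ can be dangerous, the following sketch applies both functions to a string with and without a trailing newline. It uses fixed strings instead of <STDIN> so that it runs without user input; the variable names are illustrative.

```perl
use strict;
use warnings;

my $with_newline    = "Indranil\n";
my $without_newline = "Indranil";

# chop always removes the last character.
chop(my $a1 = $with_newline);       # "Indranil"
chop(my $a2 = $without_newline);    # "Indrani" (a real character is lost)

# chomp removes only a trailing newline.
chomp(my $b1 = $with_newline);      # "Indranil"
chomp(my $b2 = $without_newline);   # "Indranil" (unchanged)

print "chop:  [$a1] [$a2]\n";
print "chomp: [$b1] [$b2]\n";
```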
File Operations
• Opening a file
  – The ‘open’ function opens a file and associates it with a file handle.
  – For standard input, we have a predefined handle, read with <STDIN>.
    $fname = "/home/isg/report.txt";
    open XYZ, $fname;
    while (<XYZ>) {
        print "Line number $. : $_";
    }
– Checking the error code:
    $fname = "/home/isg/report.txt";
    open XYZ, $fname or die "Error in open: $!";
    while (<XYZ>) {
        print "Line number $. : $_";
    }
– $. holds the current line number (starting at 1)
– $_ holds the line most recently read by <XYZ>
– $! holds the error code/message
• Reading from a file:
  – The last example also illustrates file reading.
  – The angle brackets (< >) are the line input operators.
    • The data read goes into $_
• Writing into a file:
    $out = "/home/isg/out.txt";
    open XYZ, ">$out" or die "Error in write: $!";
    for $i (1..20) {
        print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
    }
• Appending to a file:
    $out = "/home/isg/out.txt";
    open XYZ, ">>$out" or die "Error in append: $!";
    for $i (1..20) {
        print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
    }
• Closing a file:
    close XYZ;
  where XYZ is the file handle of the file being closed.
• Printing a file:
  – This is very easy to do in Perl.
    $input = "/home/isg/report.txt";
    open IN, $input or die "Error in open: $!";
    while (<IN>) {
        print;
    }
    close IN;
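The write, append and read steps can be combined into one round trip. This is a sketch under a few assumptions: it uses the three-argument form of ‘open’ with lexical file handles (a common modern style, not the bareword handles of the slides), and a scratch file created by the core File::Temp module instead of the /home/isg paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file so the sketch does not touch real paths.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write (">" truncates the file).
open(my $out, '>', $path) or die "Error in write: $!";
print $out "line $_\n" for 1 .. 3;
close $out;

# Append (">>" adds to the end).
open($out, '>>', $path) or die "Error in append: $!";
print $out "line 4\n";
close $out;

# Read back; $. is the current line number, $_ the current line.
open(my $in, '<', $path) or die "Error in open: $!";
my @seen;
while (<$in>) {
    chomp;
    push @seen, "$.:$_";
}
close $in;

print join(", ", @seen), "\n";   # 1:line 1, 2:line 2, 3:line 3, 4:line 4
```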
Command Line Arguments
• Perl uses a special array called @ARGV.
  – List of arguments passed along with the script name on the command line.
  – Example: if you invoke Perl as:
        perl test.pl red blue green
    then @ARGV will be (red blue green).
  – Printing the command line arguments:
        foreach (@ARGV) {
            print "$_ \n";
        }
Standard File Handles
• STDIN
  – Read from standard input (keyboard), using <STDIN>.
• STDOUT
  – Print to standard output (screen).
• STDERR
  – For outputting error messages.
• ARGV
  – <ARGV> treats the command line arguments as file names, opens each file in turn, and reads lines from it.
– The @ARGV array contains the text after the program’s name on the command line.
  • <ARGV> takes each file in turn.
  • If there is nothing specified on the command line, it reads from the standard input.
– Since this is very commonly used, Perl provides an abbreviation for <ARGV>, namely, < >
– An example is shown.
    $lineno = 1;
    while (<>) {
        print "$lineno: $_";
        $lineno++;
    }
– In this program, the names of the files to read are given on the command line.
    perl list_lines.pl file1.txt
    perl list_lines.pl a.txt b.txt c.txt
Control Structures
Introduction
• There are many control constructs in Perl.
  – Similar to those in C.
  – They are illustrated through examples.
  – The available constructs:
    • for
    • foreach
    • if/elsif/else
    • while
    • do, etc.
Concept of Block
• A statement block is a sequence of statements enclosed in a matching pair of { and }.
    if ($year == 2000) {
        print "You have entered the new millennium.\n";
    }
• Blocks may be nested within other blocks.
Definition of TRUE in Perl
• In Perl, the following values are considered FALSE:
  – The number 0 (and the string "0")
  – The empty string ("")
  – undef
• An empty list is also FALSE when used as a condition.
• Everything else in Perl is TRUE.
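The rules can be checked directly. The sketch below tests a few values, including some that look false but are not (a string containing a single space, "0.0" and "00"); the labels are only used for printing.

```perl
use strict;
use warnings;

# Each pair is [label, value]; the labels are only for printing.
my @cases = (
    ['0',     0],
    ['"0"',   "0"],
    ['""',    ""],
    ['undef', undef],
    ['"0.0"', "0.0"],
    ['"00"',  "00"],
    ['" "',   " "],
    ['1',     1],
);

my %truth;
for my $case (@cases) {
    my ($label, $value) = @$case;
    $truth{$label} = $value ? "TRUE" : "FALSE";
    printf "%-6s is %s\n", $label, $truth{$label};
}
```

Only the first four values print FALSE; the strings "0.0", "00" and " " are non-empty and differ from "0", so they are TRUE.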
if .. else
• General syntax:
    if (test expression) {
        # if TRUE, do this
    }
    else {
        # if FALSE, do this
    }
Examples:
•   if ($name eq 'isg') {
        print "Welcome Indranil. \n";
    } else {
        print "You are somebody else. \n";
    }

    if ($flag == 1) {
        print "There has been an error. \n";
    }
    # The else block is optional
elsif
Example:
•   print "Enter your id: ";
    chomp ($name = <STDIN>);
    if ($name eq 'isg') {
        print "Welcome Indranil. \n";
    } elsif ($name eq 'bkd') {
        print "Welcome Bimal. \n";
    } elsif ($name eq 'akm') {
        print "Welcome Arun. \n";
    } else {
        print "Sorry, I do not know you. \n";
    }
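Note that Perl spells the keyword ‘elsif’; ‘elseif’ and ‘else if’ are syntax errors. A non-interactive sketch of the same chain, wrapped in a hypothetical helper named greeting() so that it can be tried without keyboard input:

```perl
use strict;
use warnings;

# greeting() is a hypothetical helper; the ids are the ones used in the example.
sub greeting {
    my ($id) = @_;
    if    ($id eq 'isg') { return "Welcome Indranil."; }
    elsif ($id eq 'bkd') { return "Welcome Bimal."; }
    elsif ($id eq 'akm') { return "Welcome Arun."; }
    else                 { return "Sorry, I do not know you."; }
}

print greeting($_), "\n" for qw(isg akm xyz);
```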
while
Example: (Guessing the correct word)
•   $your_choice = ' ';
    $secret_word = 'India';
    while ($your_choice ne $secret_word) {
        print "Enter your guess: \n";
        chomp ($your_choice = <STDIN>);
    }
    print "Congratulations! Mera Bharat Mahan.\n";
for
• Syntax same as in C.
• Example:
    for ($i=1; $i<10; $i++) {
        print "Iteration number $i \n";
    }
foreach
• Very commonly used loop construct that iterates over a list.
• Example:
    @colors = qw(red blue green);
    foreach $name (@colors) {
        print "Color is $name. \n";
    }
• We can use ‘for’ in place of ‘foreach’.
• Example: Counting odd numbers in a list
    @xyz = qw(10 15 17 28 12 77 56);
    $count = 0;
    foreach $number (@xyz) {
        if (($number % 2) == 1) {
            print "$number is odd. \n";
            $count++;
        }
    }
    print "Number of odd numbers is $count. \n";
Breaking out of a loop
• The statement ‘last’, if it appears in the body of a loop, will cause Perl to immediately exit the loop.
  – Used with a conditional.
    last if ($i > 10);
Skipping to end of loop
• For this we use the statement ‘next’.
  – When executed, the remaining statements in the loop will be skipped, and the next iteration will begin.
  – Also used with a conditional.
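A minimal sketch combining ‘last’ and ‘next’ in one loop. It reuses the number list from the odd-number example; the stopping value 77 and the array name are chosen only for illustration.

```perl
use strict;
use warnings;

my @xyz = qw(10 15 17 28 12 77 56);
my @odd_before_77;

foreach my $number (@xyz) {
    last if $number == 77;        # leave the loop entirely
    next if $number % 2 == 0;     # skip even numbers
    push @odd_before_77, $number;
}

print "@odd_before_77\n";   # 15 17
```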
Relational Operators
The Operators Listed
    Comparison          Numeric   String
    Equal               ==        eq
    Not equal           !=        ne
    Greater than        >         gt
    Less than           <         lt
    Greater or equal    >=        ge
    Less or equal       <=        le
Logical Connectives
• If $a and $b are logical expressions, then the following connectives are supported by Perl:
  – $a and $b      $a && $b
  – $a or $b       $a || $b
  – not $a         ! $a
• The two forms in each row perform the same logical operation; the word forms are more readable, but they have lower precedence than the symbolic forms.
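Two points from this section are easy to get wrong: choosing the numeric or the string comparison operator, and the precedence difference between ‘||’ and ‘or’. The sketch below illustrates both; lookup() is a hypothetical helper that always returns 0.

```perl
use strict;
use warnings;

# Numeric vs. string comparison of the same two values.
my ($x, $y) = (10, 9);
my $numeric_gt = ($x >  $y) ? 1 : 0;   # 1: 10 is the larger number
my $string_lt  = ($x lt $y) ? 1 : 0;   # 1: "10" sorts before "9" as a string

# Precedence: || binds tighter than =, while "or" binds looser.
sub lookup { return 0 }                # hypothetical helper that "fails"

my ($r1, $r2, $note);
$r1 = lookup() || "fallback";                 # $r1 = (lookup() || "fallback")
$r2 = lookup() or $note = "assigned first";   # ($r2 = lookup()) or (...)

print "numeric_gt=$numeric_gt string_lt=$string_lt\n";
print "r1=$r1 r2=$r2 note=$note\n";   # r1=fallback r2=0 note=assigned first
```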
String Functions
The Split Function
• ‘split’ is used to split a string into multiple pieces using a delimiter, and create a list out of it.
    $_ = 'Red:Blue:Green:White:255';
    @details = split /:/, $_;
    foreach (@details) {
        print "$_\n";
    }
  – The first parameter to ‘split’ is a regular expression that specifies what to split on.
  – The second specifies what to split.
• Another example:
    $_ = 'Indranil isg@iitkgp.ac.in 283496';    # single quotes, so that @iitkgp is not interpolated as an array
    ($name, $email, $phone) = split / /, $_;
• By default (when called with no arguments), ‘split’ breaks the string in $_ using whitespace as the delimiter.
The Join Function
• ‘join’ is used to concatenate several elements into a single string, with a specified delimiter in between.
    $new = join ' ', $x1, $x2, $x3, $x4, $x5, $x6;
    $sep = '::';
    $new = join $sep, $x1, $x2, $x3, @abc, $x4, $x5;
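‘split’ and ‘join’ are natural inverses, which a short round trip makes visible. The sketch reuses the sample strings from the examples; the extra spaces in the second string are added to show that a bare ‘split’ treats a run of whitespace as one delimiter.

```perl
use strict;
use warnings;

my $record = "Red:Blue:Green:White:255";

my @details = split /:/, $record;     # 5 fields
my $joined  = join '::', @details;    # put them back with a new delimiter

# With no arguments, split breaks $_ on runs of whitespace.
local $_ = "Indranil   isg\@iitkgp.ac.in 283496";
my ($name, $email, $phone) = split;

print scalar(@details), " fields\n";  # 5 fields
print "$joined\n";                    # Red::Blue::Green::White::255
print "$name | $email | $phone\n";
```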
Regular Expressions
Introduction
• One of the most useful features of Perl.
• What is a regular expression (RegEx)?
  – A pattern, written according to a fixed set of syntax rules, that describes a set of strings.
  – Basically specifies a chunk of text to look for.
  – Very powerful way to specify string patterns.
An Example: without RegEx
    $found = 0;
    $_ = "Hello good morning everybody";
    $search = "every";
    foreach $word (split) {
        if ($word eq $search) {
            $found = 1;
            last;
        }
    }
    if ($found) {
        print "Found the word 'every' \n";
    }
• Note: ‘eq’ compares whole words, so this loop finds "every" only when it occurs as a separate word; it does not find it inside "everybody".
Using RegEx
    $_ = "Hello good morning everybody";
    if ($_ =~ /every/) {
        print "Found the word 'every' \n";
    }
• Very easy to use.
• The text between the forward slashes defines the regular expression.
• If we use “!~” instead of “=~”, the test succeeds when the pattern is not present in the string.
• The previous example illustrates literal texts as regular expressions.
  – Simplest form of regular expression.
• Point to remember:
  – When performing the matching, all the characters in the string are considered to be significant, including punctuation and white spaces.
    • For example, /every / will not match in the previous example.
Another Simple Example
    $_ = "Welcome to IIT Kharagpur, students";
    if (/IIT K/) {
        print "'IIT K' is present in the string\n";
    }
    if (/Kharagpur students/) {
        print "This will not match\n";
    }
Types of RegEx
• Basically two types:
  – Matching
    • Checking if a string contains a substring.
    • The symbol ‘m’ is used (optional if forward slash used as delimiter).
  – Substitution
    • Replacing a substring by another substring.
    • The symbol ‘s’ is used.
Matching
The =~ Operator
• Tells Perl to apply the regular expression on the right to the value on the left.
• The regular expression is contained within delimiters (forward slash by default).
  – If some other delimiter is used, then a preceding ‘m’ is essential.
Examples
    $string = "Good day";
    if ($string =~ m/day/) {
        print "Match successful \n";
    }
    if ($string =~ /day/) {
        print "Match successful \n";
    }
• Both forms are equivalent.
• The ‘m’ in the first form is optional.
    $string = "Good day";
    if ($string =~ m@day@) {
        print "Match successful \n";
    }
    if ($string =~ m[day]) {
        print "Match successful \n";
    }
• Both forms are equivalent.
• The character following ‘m’ is the delimiter; a bracket delimiter is closed by its matching bracket.
Character Class
• Use square brackets to specify “any value in the list of possible values”.
    my $string = "Some test string 1234";
    if ($string =~ /[0123456789]/) {
        print "Found a digit \n";
    }
    if ($string =~ /[aeiou]/) {
        print "Found a vowel \n";
    }
    if ($string =~ /[0123456789ABCDEF]/) {
        print "Found a hex digit \n";
    }
Character Class Negation
• Use ‘^’ at the beginning of the character class to specify “any single element that is not one of these values”.
    my $string = "Some test string 1234";
    if ($string =~ /[^aeiou]/) {
        print "Found a character that is not a lowercase vowel\n";
    }
• Note that [^aeiou] also matches spaces, digits, punctuation and uppercase letters, not only consonants.
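A negated class matches any one character outside the list, which is broader than “a consonant”. The sketch below tries /[^aeiou]/ against a few made-up sample strings.

```perl
use strict;
use warnings;

my %matches;
for my $s ("aei", "a e", "xyz", "42") {
    # True if the string contains at least one character that is not a, e, i, o or u.
    $matches{$s} = ($s =~ /[^aeiou]/) ? 1 : 0;
    printf "%-6s %s\n", "'$s'", $matches{$s} ? "matches" : "no match";
}
```

Only 'aei' fails to match; 'a e' matches because of the space, and '42' because digits are not vowels either.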
Pattern Abbreviations
• Useful in common cases
    .     Anything except newline (\n)
    \d    A digit, same as [0-9]
    \w    A word character, [0-9a-zA-Z_]
    \s    A space character (tab, space, etc)
    \D    Not a digit, same as [^0-9]
    \W    Not a word character
    \S    Not a space character
    $string = "Good and bad days";
    if ($string =~ /d..s/) {
        print "Found something like days\n";
    }
    if ($string =~ /\w\w\w\w\s/) {
        print "Found a four-letter word!\n";
    }
Anchors
• Three ways to define an anchor:
    ^    :: anchors to the beginning of the string
    $    :: anchors to the end of the string
    \b   :: anchors to a word boundary
    if ($string =~ /^\w/)       :: does the string start with a word character?
    if ($string =~ /\d$/)       :: does the string end with a digit?
    if ($string =~ /\bGood\b/)  :: does the string contain the word “Good”?
Multipliers
• There are three multiplier characters.
    *   :: Find zero or more occurrences
    +   :: Find one or more occurrences
    ?   :: Find zero or one occurrence
• Some example usages:
    $string =~ /^\w+/;
    $string =~ /\d?/;
    $string =~ /\b\w+\s+/;
    $string =~ /\w+\s?$/;
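The anchor and multiplier patterns can be tried against the sample string used earlier. This is a sketch: the questions are illustrative labels, and qr// (which precompiles a pattern so it can be stored in a variable) is not covered in the slides.

```perl
use strict;
use warnings;

my $string = "Good and bad days";

# Each entry pairs a question with a precompiled pattern (qr//).
my @checks = (
    [ 'starts with a word character', qr/^\w/      ],
    [ 'ends with a digit',            qr/\d$/      ],
    [ 'contains the word "Good"',     qr/\bGood\b/ ],
    [ 'contains the word "day"',      qr/\bday\b/  ],
    [ 'ends with a word',             qr/\w+\s?$/  ],
);

my %answer;
for my $check (@checks) {
    my ($question, $pattern) = @$check;
    $answer{$question} = ($string =~ $pattern) ? "yes" : "no";
    printf "%-30s %s\n", $question, $answer{$question};
}
```

The word "day" is reported as absent because \b requires a word boundary after the "y", and in "days" the next character is another word character.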
Substitution
Basic Usage
• Uses the ‘s’ character.
• Basic syntax is:
    $new =~ s/pattern_to_match/new_pattern/;
  What does this do?
  • Looks for pattern_to_match in $new and, if found, replaces the matched text with new_pattern.
  • It looks for the pattern once. That is, only the first occurrence is replaced.
  • There is a way to replace all occurrences (to be discussed shortly).
Examples
    $xyz = "Rama and Lakshman went to the forest";
    $xyz =~ s/Lakshman/Bharat/;
    $xyz =~ s/R\w+a/Bharat/;
    $xyz =~ s/[aeiou]/i/;
    $abc = "A year has 11 months \n";
    $abc =~ s/\d+/12/;
    $abc =~ s/\n$/ /;
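Tracing the substitutions one step at a time shows what each one changes. The sketch below repeats the first five statements of the example with the intermediate values as comments; the final newline substitution is left out.

```perl
use strict;
use warnings;

my $xyz = "Rama and Lakshman went to the forest";

$xyz =~ s/Lakshman/Bharat/;   # Rama and Bharat went to the forest
$xyz =~ s/R\w+a/Bharat/;      # Bharat and Bharat went to the forest
$xyz =~ s/[aeiou]/i/;         # Bhirat and Bharat went to the forest (first vowel only)

my $abc = "A year has 11 months \n";
$abc =~ s/\d+/12/;            # "11" becomes "12"

print "$xyz\n";
print $abc;
```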
Common Modifiers
• Two commonly used modifiers are:
    /i  :: ignore case
    /g  :: match/substitute all occurrences
    $string = "Ram and Shyam are very honest";
    if ($string =~ /RAM/i) {
        print "Ram is present in the string";
    }
    $string =~ s/m/j/g;    # Ram -> Raj, Shyam -> Shyaj
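The effect of /g is easiest to see by running the same substitution with and without it. In this sketch the copies ($first_only, $all, $copy) are illustrative names, and the last statement relies on the fact that s/// returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $string = "Ram and Shyam are very honest";

my $found = ($string =~ /RAM/i) ? 1 : 0;   # case-insensitive match succeeds

# Copy the string, then substitute in the copy.
(my $first_only = $string) =~ s/m/j/;      # Raj and Shyam are very honest
(my $all        = $string) =~ s/m/j/g;     # Raj and Shyaj are very honest

# s///g returns the number of substitutions made.
my $count = (my $copy = $string) =~ s/a/A/gi;

print "$first_only\n$all\n";
print "replaced $count occurrences of 'a'\n";   # 4
```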
Use of Memory in RegEx
• We can use parentheses to capture a piece of matched text for later use.
  – Perl memorizes the matched texts.
  – Multiple sets of parentheses can be used.
• How to recall the captured text?
  – Use \1, \2, \3, etc. if still in RegEx.
  – Use $1, $2, $3 if after the RegEx.
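Both ways of recalling captured text can be sketched briefly. The first pattern pulls three fields out of the contact string used in the ‘split’ example; the second uses \1 inside the pattern to find a doubled word in a made-up sentence.

```perl
use strict;
use warnings;

my $line = "Indranil isg\@iitkgp.ac.in 283496";

# $1, $2, $3 hold the captured groups after a successful match.
my ($name, $email, $phone);
if ($line =~ /^(\w+)\s+(\S+)\s+(\d+)$/) {
    ($name, $email, $phone) = ($1, $2, $3);
    print "name=$name email=$email phone=$phone\n";
}

# \1 refers back to group 1 inside the same pattern: find a doubled word.
my $text    = "this is is a test";
my $doubled = ($text =~ /\b(\w+)\s+\1\b/) ? $1 : "";
print "doubled word: $doubled\n";   # is
```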