
Awk, Awk: Pattern matching and processing language



  1. CSC209 Fall 2001

     Awk, Awk

     What is AWK?
     - Pattern matching and processing language
     - Looks for a pattern in a file
     - If the pattern matches, do something
     - Many details handled automatically
     - Very easy to write one-off programs (write and throw away)

     What's it good for?
     - data manipulation (omitting part of a file, counting occurrences)
     - rapid prototyping
     - converting file formats

     Features
     - awk is data-driven as opposed to procedural
     - This means you think about the format of the data you're trying to manipulate vs. what to do
     - Highly automated (record retrieval, breakdown into fields, type conversion)
     - no variable declarations
     - usual programming constructs

     Karen Reid

  2. History
     - The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan
     - created in 1977 at AT&T Bell Labs
     - 1985: nawk
     - many versions: awk, nawk, POSIX awk, gawk (gawk is on cdf)

     The command line
     - Use as part of a pipeline
     - For simple things, specify the program directly on the command line:
     - example: ls -l | awk '{ print $2 }' prints the second column
     - for more complex things, put the program in a file and use -f
     - example: awk -f myscript inputfile | lp
     - awk reads from stdin and prints to stdout

     Patterns and Actions
     - 3 main blocks: BEGIN, processing block, END
     - 2 parts to statements: pattern and action
     - patterns tell awk what to match
     - actions tell awk what to do if there is a match
     - can omit either one, but not both
     - no pattern = match everything
     - no action = print

     Example:

         seawolf:~% head -5 file1
         Baker, Chase 29 GMUP 56.28 57.79
         Frohlich, Jon 29 UTAH 49.10 49.20
         Kittredge, Brad 25 TOC 45.05 46.22
         Liggett, Michael 27 DYNA 47.25 48.12
         Linderman, Ross 25 PNA 52.55 51.17
         seawolf:~% head -5 file1 | awk '/^Kit/ {print $0}'
         Kittredge, Brad 25 TOC 45.05 46.22
         seawolf:~% head -5 file1 | awk '/.*/'
         Baker, Chase 29 GMUP 56.28 57.79
         Frohlich, Jon 29 UTAH 49.10 49.20
         Kittredge, Brad 25 TOC 45.05 46.22
         Liggett, Michael 27 DYNA 47.25 48.12
         Linderman, Ross 25 PNA 52.55 51.17
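The pattern/action defaults listed above can be checked with a small sketch. The three records are copied from the slides' file1; the shell variable names (`data`, `both`, `pattern_only`, `action_only`) are made up for this illustration.

```shell
# Sample records copied from the slides' file1.
data='Baker, Chase 29 GMUP 56.28 57.79
Frohlich, Jon 29 UTAH 49.10 49.20
Kittredge, Brad 25 TOC 45.05 46.22'

# Pattern and action: print the first field of records that start with "Kit".
both=$(printf '%s\n' "$data" | awk '/^Kit/ { print $1 }')

# Pattern only: the default action prints the whole matching record.
pattern_only=$(printf '%s\n' "$data" | awk '/UTAH/')

# Action only: with no pattern, the action runs for every record.
action_only=$(printf '%s\n' "$data" | awk '{ print $2 }')

printf '%s\n' "$both" "$pattern_only" "$action_only"
```

The first command prints only `Kittredge,`, the second prints the full Frohlich record, and the third prints the second field of all three records.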

  3. Input
     - awk works with records; a record defaults to a single line
     - reading records is automatic, no read statement
     - next tells awk to stop processing the current record and move on to the next one
     - exit causes the program to skip the remaining input and go to the END block
     - exit within END causes awk to quit

     Example:

         seawolf:~% head -5 file1
         Baker, Chase 29 GMUP 56.28 57.79
         Frohlich, Jon 29 UTAH 49.10 49.20
         …
         seawolf:~% cat testawk
         /.*/ {print $1}
         /.*/ {print $2}
         seawolf:~% head -5 file1 | awk -f testawk
         Baker,
         Chase
         Frohlich,
         Jon
         …
         seawolf:~% cat testawk2
         /.*/ {print $1; next}
         /.*/ {print $2}
         seawolf:~% head -5 file1 | awk -f testawk2
         Baker,
         Frohlich,
         …

     Fields
     - a field is a group of characters separated by the field separator
     - the variable FS holds the field separator
     - to change it, set it with BEGIN { FS = "," } or with -Fchar on the command line
     - likewise, OFS holds the output field separator
     - predefined: $1 = 1st field, $2 = 2nd field, etc.; $0 = entire record (line)

     Example:

         seawolf:~% head -5 file1
         Baker, Chase 29 GMUP 56.28 57.79
         Frohlich, Jon 29 UTAH 49.10 49.20
         Kittredge, Brad 25 TOC 45.05 46.22
         Liggett, Michael 27 DYNA 47.25 48.12
         Linderman, Ross 25 PNA 52.55 51.17
         seawolf:~% head -5 file1 | awk -F, '{print $2}'
          Chase 29 GMUP 56.28 57.79
          Jon 29 UTAH 49.10 49.20
          Brad 25 TOC 45.05 46.22
          Michael 27 DYNA 47.25 48.12
          Ross 25 PNA 52.55 51.17
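As a rough sketch of the input and field rules above, the commands below exercise `next`, `exit`, `-F` and `OFS` on three records copied from the slides' file1. The shell variable names are hypothetical, and the `|` output separator is chosen only to make the effect of `OFS` visible.

```shell
# Sample records copied from the slides' file1.
data='Baker, Chase 29 GMUP 56.28 57.79
Frohlich, Jon 29 UTAH 49.10 49.20
Kittredge, Brad 25 TOC 45.05 46.22'

# next: stop processing the Frohlich record, so the second rule never sees it.
next_demo=$(printf '%s\n' "$data" | awk '/^Fro/ { next } { print $1 }')

# exit: stop reading at record 2, but the END block still runs.
exit_demo=$(printf '%s\n' "$data" | awk 'NR == 2 { exit } { print $1 } END { print "read", NR }')

# -F, splits on commas; OFS is placed between the arguments of print.
fs_demo=$(printf '%s\n' "$data" | awk -F, 'BEGIN { OFS = "|" } { print $1, $2 }')

printf '%s\n' "$next_demo" "$exit_demo" "$fs_demo"
```

Note that with a comma as the separator, the second field keeps its leading space, because the space is no longer treated as a delimiter.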

  4. Format
     - mostly free format
     - if more than 1 statement per line, use ; to separate them
     - good idea to just always use ; (like C)
     - at least the opening { of an action must be on the same line as its pattern
     - comments use #

     Patterns
     - 6 types in total:
       - BEGIN
       - END
       - Expressions
       - String Patterns
       - Range Patterns
       - Compound Patterns

     BEGIN and END
     - BEGIN always matches before the 1st input record
     - used to initialize variables
     - must be the 1st pattern if used (some versions)
     - END always matches after the last input record is read
     - use it for things like printing totals
     - must be the last pattern if used (some versions)

     Example:

         seawolf:~% cat swimresults | wc -l
         15
         seawolf:~% awk 'END { print("Total lines",NR); }' swimresults
         Total lines 15
         seawolf:~% head -5 swimresults
         Stanford, Jeffrey 25 HIMA 47.07 46.32
         Liggett, Michael 27 DYNA 47.25 48.12
         Baker, Chase 29 GMUP 56.28 57.79
         ...
         seawolf:~% cat testawk3
         BEGIN { OFS=",";
                 print "First Name","Last Name";
                 print "----------","---------"; }
         { print $2,$1;}
         seawolf:~% head -5 swimresults | sed -e 's/,/ /g' | awk -f testawk3
         First Name,Last Name
         ----------,---------
         Jeffrey,Stanford
         Michael,Liggett
         Chase,Baker
         ...
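The order in which the three blocks run can be illustrated with a minimal sketch. The records are the three swimresults lines shown on the slide; the header text `Team` and the variable name `report` are invented for this example.

```shell
# Three records copied from the slides' swimresults listing.
data='Stanford, Jeffrey 25 HIMA 47.07 46.32
Liggett, Michael 27 DYNA 47.25 48.12
Baker, Chase 29 GMUP 56.28 57.79'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "Team"; }          # runs once, before the first record is read
      { print $4; }              # runs for every input record
END   { print "Total:", NR; }    # runs once, after the last record
')

printf '%s\n' "$report"
```

The header appears first, then one team code per record, then the record count taken from NR.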

  5. Expressions
     - an expression is an operator together with its operands
     - can compare both numbers and strings
     - type conversion is automatic
     - type of operand depends on operator

         Operator   Meaning
         ==         is equal to
         <          less than
         >          greater than
         <=         less than or equal to
         >=         greater than or equal to
         !=         not equal to
         ~          matched by
         !~         not matched by

     Automatic type conversion
     - if using a comparison operator:
       - if both operands are numbers, they are compared numerically
       - if both are strings, they are compared in collation order
       - if one is a number and the other is a string, both are treated as strings

     Example:

         $ cat awktest1
         $6 < $5 {print $1,$2,$5,$6;}
         $ cat swimresults
         Stanford, Jeffrey 25 HIMA 47.07 46.32
         Liggett, Michael 27 DYNA 47.25 48.12
         Baker, Chase 29 GMUP 56.28 57.79
         ...
         $ awk -f awktest1 swimresults
         Stanford, Jeffrey 47.07 46.32
         ...
         $ cat awktest2
         $1 > "P" {print $1,$2;}
         $ awk -f awktest2 swimresults
         Stanford, Jeffrey
         Richner, Thomas
         ...

     $3 > "A" {print $0} prints nothing. Why?

     String Matching
     - 3 forms:
       - /string/ matches if string occurs anywhere in the record
       - ~ and !~ can deal with a more specific scope
       - e.g. $1 ~ /ttt*/ matches

             Liggett, Michael 27 DYNA 47.25 48.12
             Kittredge, Brad 25 TOC 45.05 46.22
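The conversion rules above can be probed with one record from the slides. This is a small sketch, not a full account of awk's comparison rules: the constants `9` and `"9"` and the variable names are chosen for illustration only.

```shell
# One record copied from the slides' swimresults listing.
line='Liggett, Michael 27 DYNA 47.25 48.12'

# Numeric constant: the field 27 is compared as a number, and 27 > 9 holds.
numeric_cmp=$(printf '%s\n' "$line" | awk '$3 > 9 { print $3 }')

# String constant: both sides are compared as strings, and "27" sorts before "9".
string_cmp=$(printf '%s\n' "$line" | awk '$3 > "9" { print $3 }')

# ~ restricts the regular expression match to a single field.
regex_match=$(printf '%s\n' "$line" | awk '$1 ~ /tt/ { print $1 }')

printf 'numeric: [%s]\nstring: [%s]\nregex: [%s]\n' "$numeric_cmp" "$string_cmp" "$regex_match"
```

The string comparison prints nothing, which is the same effect at work in the slide's `$3 > "A"` question.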

  6. Range Patterns
     - 2 patterns separated by a comma (,)
     - action is performed for all lines between the 1st occurrence of the 1st pattern and the 1st occurrence of the 2nd pattern (both inclusive)
     - if the 2nd pattern is never matched, matches everything from the 1st occurrence on

     Example:

         $ cat awktest3
         $1 ~ /^B.*/, $1~/^M.*/ { print $0}
         $ sort swimresults |awk -f awktest3
         Baker, Chase 29 GMUP 56.28 57.79
         Frohlich, Jon 29 UTAH 49.10 49.20
         Kittredge, Brad 25 TOC 45.05 46.22
         Liggett, Michael 27 DYNA 47.25 48.12
         Linderman, Ross 25 PNA 52.55 51.17
         McCormick, Aaron 27 RMM 49.00 49.30

     Compound patterns
     - can use logical operators to combine patterns
     - !, ||, &&
     - example:

         $ awk '$6 < 47 && $3 >=28 {print $0}' swimresults
         Wanie, Lee 28 TOC 46.00 46.39

     Actions
     - Tells awk what to do when a pattern is found
     - surrounded by {}
     - Includes:
       - variables
       - loops
       - data structures (arrays)
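Range and compound patterns can be tried on the five sorted records from the slides' file1. The sketch below uses made-up thresholds (age at least 27, final time under 50) and hypothetical variable names.

```shell
# Five records copied from the slides' file1 (already in sorted order).
data='Baker, Chase 29 GMUP 56.28 57.79
Frohlich, Jon 29 UTAH 49.10 49.20
Kittredge, Brad 25 TOC 45.05 46.22
Liggett, Michael 27 DYNA 47.25 48.12
Linderman, Ross 25 PNA 52.55 51.17'

# Range pattern: from the first record matching /^Fro/ through the first matching /^Lig/.
range_demo=$(printf '%s\n' "$data" | awk '/^Fro/, /^Lig/ { print $1 }')

# Compound pattern: both conditions must hold for the action to run.
compound_demo=$(printf '%s\n' "$data" | awk '$3 >= 27 && $6 < 50 { print $1 }')

printf '%s\n' "$range_demo" "$compound_demo"
```

The range includes both end records, so it prints three names; the compound pattern keeps only Frohlich and Liggett.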

  7. Variables
     - 3 types: user defined, field variables, predefined
     - no declaration: "you use it, therefore it is"
     - auto-initialized to 0 (or the empty string), but always initialize
     - predefined variables are all uppercase
     - case sensitive
     - no type declaration, conversion is automatic
     - if conversion fails, gives value of 0

     Example:
     - calculate the average final swim time (recall the 6th column was the final time):

         $ cat avgtime.awk
         BEGIN { totalTime=0; }
         { totalTime+=$6}
         END { print "average time:",totalTime/NR;}
         $ awk -f avgtime.awk swimresults
         average time: 50.7953

     Built-in variables (awk)
     - FILENAME - name of current input file
     - FS - field separator, defaults to space
     - OFS - output field separator, default space
     - ORS - output record separator, default newline
     - NR - number of records read thus far
     - NF - number of fields in the current record

     Example:

         $ cat starlight
         Star light, star bright,
         First star I see tonight,
         I wish I may, I wish I might,
         Get to play Halflife2 in the coming nights
         $ awk '{ print "line",NR,NF,"words:",$0}' starlight
         line 1 4 words: Star light, star bright,
         line 2 5 words: First star I see tonight,
         line 3 8 words: I wish I may, I wish I might,
         line 4 8 words: Get to play Halflife2 in the coming nights
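The variable rules above can be checked with a short sketch. The names `n` and `s` are deliberately never assigned, and `total` plays the role of the slide's `totalTime`. The strings `"abc"` and `"3x"` are illustrative inputs; the value given for `"3x"` reflects how common awk implementations convert a string with a leading numeric prefix.

```shell
# Three records copied from the slides' swimresults listing.
data='Stanford, Jeffrey 25 HIMA 47.07 46.32
Liggett, Michael 27 DYNA 47.25 48.12
Baker, Chase 29 GMUP 56.28 57.79'

# Unset variables act as 0 in arithmetic and as "" in string context.
# A string that is not a number converts to 0; a leading numeric prefix is used if present.
conv=$(awk 'BEGIN { print n + 0; print "[" s "]"; print ("abc" + 1); print ("3x" + 1) }')

# Average of the 6th field, using the built-in record counter NR in END.
avg=$(printf '%s\n' "$data" | awk '{ total += $6 } END { printf "%.2f\n", total / NR }')

printf '%s\n' "$conv" "$avg"
```

With only these three records the average is 50.74, which differs from the slide's 50.7953 because that figure was computed over all 15 lines of swimresults.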
