Introawk
Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.
Awk
Works well on record-type data
Reads input file(s) a line at a time
Parses each line into fields
Performs user-defined tests against each line, performs actions on matches
Useful for validating data:
Does every record have the same number of fields?
Do the values make sense (negative time, hourly wage > $100, etc.)?
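The validation questions above could be sketched as a few pattern-action rules. This is only an illustration: it assumes the three-field layout (name, hourly rate, hours) of the emp.data file used in this deck's Example, the faulty sample rows are made up, and the limits are placeholders rather than recommended values.

```shell
# Sample records (made up for illustration): name, hourly rate, hours worked
data=$(mktemp)
cat > "$data" <<'EOF'
Beth 4.00 0
Dan 3.75
Kathy 4.00 -10
Mark 150.00 20
EOF

# Report records with the wrong field count or implausible values.
# "next" skips the remaining rules for a record that is already malformed.
report=$(awk '
    NF != 3   { print "line " NR ": expected 3 fields, got " NF; next }
    $3 < 0    { print "line " NR ": negative hours for " $1 }
    $2 > 100  { print "line " NR ": hourly wage over $100 for " $1 }
' "$data")
printf '%s\n' "$report"
rm -f "$data"
```

Each rule is an independent test, so a single record can trigger more than one message.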
Invocation
Can write little one-liners on the command line (very handy):
Print the 3rd field of every line:
$ awk '{ print $3 }' input.txt
Or, use this sha-bang as the first line, and give your script execute permissions:
#!/bin/awk -f
Special patterns:
BEGIN is true before any records are read
END is true at end of input (after all records have been read)
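A small sketch of how BEGIN and END bracket the per-record work, using the emp.data records shown in this deck's Example. The header text and the variable name total are arbitrary choices made for the illustration.

```shell
# The emp.data records from the Example: name, hourly rate, hours worked
data=$(mktemp)
cat > "$data" <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

summary=$(awk '
    BEGIN { print "NAME HOURS" }          # runs once, before the first record
          { print $1, $3; total += $3 }   # no pattern: runs for every record
    END   { print "TOTAL", total }        # runs once, after the last record
' "$data")
printf '%s\n' "$summary"
rm -f "$data"
```

BEGIN is a natural place for headers and initial settings, END for totals and summaries.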
Awk Features
Patterns can be regular expressions or C-like conditions. Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed. Input lines are parsed and split into fields, which are accessed by $1, ..., $NF, where NF is a variable set to the number of fields. The variable $0 contains the entire line, and by default lines are split on white space (blanks, tabs).
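To make the two kinds of pattern concrete, here is a minimal sketch with one regular-expression rule and one C-like condition. The three input lines are invented for the illustration.

```shell
# Three made-up input lines, fed to awk on standard input
lines=$(printf 'alpha beta gamma\nfoo bar\none two three four\n' | awk '
    /^foo/  { print "regex match: " $0 }                      # regular-expression pattern
    NF >= 3 { print NF " fields, first=" $1 ", last=" $NF }   # C-like condition
')
printf '%s\n' "$lines"
```

Every input line is tested against both rules in order, so a line that matched both would produce two lines of output.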
Variables
Not declared, nor typed
No character type
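A short sketch of what "not declared, nor typed" means in practice. The variable names are arbitrary; the whole program lives in a BEGIN rule so it needs no input.

```shell
out=$(awk 'BEGIN {
    count++                    # never declared: starts out as 0 (or "")
    x = "3" + 4                # the string "3" is converted to a number
    y = 3 " apples"            # the number 3 is converted to a string
    c = substr("awk", 1, 1)    # a "character" is just a one-character string
    print count, x, y, c
}')
printf '%s\n' "$out"
```

Whether a value is treated as a number or a string depends on the context in which it is used.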
Example
Print those employees who actually worked:
$ cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
$ awk '$3 > 0 { print $1, $2*$3 }' emp.data
Kathy 40
Mark 100
Mary 121
Susie 76.5
$ getEmails.awk students.csv
john's email is: js12@school.edu
fred's email is: fj84@school.edu
sue's email is: sb23@school.edu
ralph's email is: rf86@school.edu
jim's email is: jj22@school.edu
nancy's email is: nc54@school.edu
anna's email is: ab67@school.edu
sam's email is: sr77@school.edu
lisa's email is: guitarHottie@schoo
Flow Control
Awk syntax is much like C
Same loops, if statements, etc.
AWK: Aho, Weinberger, Kernighan
Kernighan and Ritchie wrote the book The C Programming Language
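As an illustration of the C-like syntax, this sketch uses a for loop and an if statement to find the largest field on each line. The input numbers are made up.

```shell
out=$(printf '3 9 4\n10 2\n' | awk '{
    max = $1 + 0                    # adding 0 forces a numeric value
    for (i = 2; i <= NF; i++) {     # C-style for loop over the remaining fields
        if ($i > max)
            max = $i
    }
    print "line " NR ": max = " max
}')
printf '%s\n' "$out"
```

while and do-while loops, break, and continue are available with the same syntax as in C.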
Associative Arrays
Awk also supports arrays that can be indexed by arbitrary strings. They are implemented using hash tables.
Total["Sue"] = 100;
It is possible to loop over all indices that have currently been assigned values.
for (name in Total) print name, Total[name];
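Putting the two statements together, a common use is accumulating a total per key. In this sketch the name/amount input pairs are made up; the output is piped through sort because the order in which for (name in Total) visits the indices is not specified.

```shell
totals=$(printf 'Sue 40\nBob 25\nSue 60\nBob 5\n' | awk '
    { Total[$1] += $2 }             # index the array by the name in field 1
    END {
        for (name in Total)         # visits every index that has been assigned
            print name, Total[name]
    }
' | sort)
printf '%s\n' "$totals"
```

An element springs into existence the first time it is referenced, so no initialization is needed before using +=.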
Useful one-liners
Line count:
awk 'END {print NR}'
grep:
awk '/pat/'
head:
awk 'NR<=10'
Many more. See the resources tab on the course webpage for links to more examples.