Thursday, February 7, 2008

AWK Programming

AWK

The AWK utility, with its own self-contained language, is one of the most powerful data-processing tools available, on Linux or anywhere else. It is named for the last initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. It allows you to create short programs that read input files, sort data, process it, perform arithmetic on the input, and generate reports, among many other functions.

What Is AWK?

Put simply, AWK is a programming-language tool used to manipulate text. The language of the AWK utility resembles the shell-programming language in many areas, although AWK's syntax is very much its own. AWK was originally designed for text processing, and the language is based on executing a series of instructions whenever a pattern is matched in the input data. The utility scans each line of a file, looking for lines that match the patterns given on the command line. If a match is found, it executes the associated action. If no match is found, it proceeds to the next line.

While the operations can get complex, the syntax for the command is always:

awk 'pattern {action}' filenames

where pattern represents what AWK is looking for in the data, and action is a series of commands executed when a match is found. Curly brackets ({}) enclose the action and group a series of instructions for a specific pattern. Either part may be omitted: a pattern with no action prints every matching line, and an action with no pattern runs on every line.
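The three forms a rule can take (pattern with action, pattern alone, action alone) can be sketched as follows. The file demo.txt and its contents are hypothetical, created only for this illustration.

```shell
# demo.txt is a hypothetical three-line sample file used only for this sketch.
printf 'apple 3\nbanana 7\ncherry 5\n' > demo.txt

# Pattern and action: print the first field of lines containing "an".
both=$(awk '/an/ {print $1}' demo.txt)

# Pattern only: the default action prints the whole matching line.
pattern_only=$(awk '/an/' demo.txt)

# Action only: with no pattern, the action runs on every line.
action_only=$(awk '{print $2}' demo.txt)

printf '%s\n' "$both" "$pattern_only" "$action_only"
```

Only the banana line contains "an", so the first two commands each produce one line, while the third prints the second field of all three lines.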

Understanding Fields

The utility separates the input into records and fields. A record is a single line of input, and each record consists of several fields. The default field separator is white space (spaces or tabs), and the default record separator is a newline. A run of consecutive spaces or tabs counts as a single delimiter, and the delimiter can be changed from white space to any other character.

To illustrate, look at the following employee-list file saved as emp_names:

46012   DULANEY     EVAN        MOBILE   AL
46013   DURHAM      JEFF        MOBILE   AL
46015   STEEN       BILL        MOBILE   AL

As AWK reads the input, the entire record is assigned to the variable $0. Each field, as split by the field separator, is assigned to the variables $1, $2, $3, and so on. A record can contain an essentially unlimited number of fields, and each field is accessed by its field number. Thus, the command

$ awk '{print $1,$2,$3,$4,$5}' emp_names

would result in a printout of

46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL
 
Similarly, to print only the last and first names:

$ awk '{print $2,$3}' emp_names
 
DULANEY EVAN
DURHAM JEFF
STEEN BILL
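Two built-in variables are useful alongside the numbered fields: NF holds the number of fields in the current record, so $NF refers to the last field. The sketch below recreates the emp_names sample (overwriting any file of that name in the current directory) and prints the field count, the last field, and the first field of each record.

```shell
# Recreate the whitespace-separated emp_names sample from the text.
cat > emp_names <<'EOF'
46012   DULANEY     EVAN        MOBILE   AL
46013   DURHAM      JEFF        MOBILE   AL
46015   STEEN       BILL        MOBILE   AL
EOF

# NF is the field count of the current record; $NF is its last field.
summary=$(awk '{print NF, $NF, $1}' emp_names)
printf '%s\n' "$summary"
```

Each record has five fields, so every output line starts with 5 followed by AL and the employee number.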

Working with Patterns

You can restrict an action to certain records, rather than all records, by including a pattern that must be matched. The simplest form of pattern matching is a search, in which the item to be matched is enclosed in slashes (/pattern/). For example, to print the first and last names of only those employees whose record contains AL (here, those who live in Alabama):

$ awk '/AL/ {print $3,$2}' emp_names
EVAN DULANEY
JEFF DURHAM
BILL STEEN
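Note that /AL/ matches anywhere in the record, so a name containing "AL" would match as well. A hedged sketch of restricting the test to the state field, assuming the state is always field 5, is shown below; the file emp_sample and its fourth record (WALKER, in GA) are hypothetical and added only to show the difference.

```shell
# emp_sample is a hypothetical file: the three records from the text
# plus one invented record whose last name contains "AL".
cat > emp_sample <<'EOF'
46012   DULANEY     EVAN        MOBILE   AL
46013   DURHAM      JEFF        MOBILE   AL
46015   STEEN       BILL        MOBILE   AL
46020   WALKER      ANN         ATLANTA  GA
EOF

# Whole-record match: also selects WALKER, because "AL" occurs in the name.
whole_line=$(awk '/AL/ {print $3,$2}' emp_sample)

# Field comparison: selects only records whose fifth field is exactly AL.
state_only=$(awk '$5 == "AL" {print $3,$2}' emp_sample)

printf '%s\n---\n%s\n' "$whole_line" "$state_only"
```

The first command prints four names, including ANN WALKER; the second prints only the three Alabama employees.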

Braces and Field Separators

The field separator differentiating one field from another need not always be white space; it can be any discernible character. To illustrate, assume the emp_names file separated the fields with colons instead of tabs:

$ cat emp_names
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL

If you attempted to print the last name by specifying that you wanted the second field with

$ awk '{print $2}' emp_names

you would end up with three blank lines, one for each record. Because there are no spaces in the file, AWK treats each line as a single field, so there are no discernible fields beyond the first one. To solve the problem, AWK must be told that a character other than white space is the delimiter. There are two ways to inform AWK of the new field separator: use the command-line parameter -F, or set the variable FS within the program. Both strategies work equally well, with one exception, as illustrated by the following example:

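$ awk '{FS=":"}{print $2}' emp_names

DURHAM
STEEN

The first line of output is blank. AWK splits each record into fields before it runs the action, so the new value of FS takes effect only from the second record onward. Passing -F: on the command line, or assigning FS in a BEGIN block, sets the separator before any input is read.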
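Both ways of setting the separator before the first record is read can be sketched as follows, using the colon-separated sample (this overwrites any emp_names file in the current directory). Assigning FS in a BEGIN block is the in-program form that avoids the first-record problem.

```shell
# Recreate the colon-separated emp_names sample from the text.
cat > emp_names <<'EOF'
46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL
EOF

# Method 1: the -F command-line option.
with_flag=$(awk -F: '{print $2}' emp_names)

# Method 2: assign FS in a BEGIN block, which runs before any input is read.
with_begin=$(awk 'BEGIN {FS=":"} {print $2}' emp_names)

printf '%s\n---\n%s\n' "$with_flag" "$with_begin"
```

Both commands print all three last names, including DULANEY from the first record.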
 
Modifying the Content
 
Suppose you have a file called file1 that has two columns of numbers, and you want to create a new file called file2 that keeps columns 1 and 2 as they are and adds a third column containing column 1 divided by column 2. Suppose also that file2 should contain only those lines where column 1 is smaller than column 2. Either of the following two commands does what you want:



awk '$1 < $2 {print $0, $1/$2}' file1 > file2



-- or --



cat file1 | awk '$1 < $2 {print $0, $1/$2}' > file2
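To see the effect on concrete data, the sketch below builds a small file1 and uses printf to limit the ratio to three decimal places. The sample numbers and the output formatting are assumptions added for illustration, not part of the commands above.

```shell
# Hypothetical sample data: three rows of two numbers each.
printf '1 4\n5 2\n3 8\n' > file1

# Keep rows where column 1 < column 2 and append the ratio,
# formatted to three decimal places.
awk '$1 < $2 {printf "%s %.3f\n", $0, $1/$2}' file1 > file2

cat file2
```

The row "5 2" is dropped because 5 is not smaller than 2; the other two rows are written to file2 with their ratios appended.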
 
This covers some of the basic functionality. You can find more on AWK through these links:
 
1.  http://sparky.rice.edu/~hartigan/awk.html
2.  http://www.computing.net/unix/wwwboard/forum/6189.html
3.  http://scitsc.wlv.ac.uk/cgi-bin/mansec?1+awk
 
 
Regards,
Pinky
 