Pinky's Linux Learnings: February 2008

AWK

The AWK utility, named for the last-name initials of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan, has its own self-contained language and is one of the most powerful data-processing tools available, not only on Linux but anywhere. It lets you write short programs that read input files, sort data, process it, perform arithmetic on the input, and generate reports, among many other tasks.

What Is AWK?

To put it as simply as possible, AWK is a programming language for manipulating text. Its syntax borrows heavily from C, and its $1-style field references will look familiar to shell programmers, but the language is very much its own. AWK was designed for text processing, and its programs are built around one idea: execute a series of instructions whenever a pattern matches the input. The utility scans each line of its input and tests it against the patterns in the program. If a pattern matches, AWK performs the associated action; if not, it moves on to the next line.

While programs can get complex, the basic form of the command is always:

awk 'pattern {action}' filenames

where pattern is what AWK looks for in the input, and action is a series of commands executed whenever a line matches. Either part may be omitted: with no pattern, the action runs on every line; with no action, each matching line is printed. The curly braces ({}) enclose the action and group a series of instructions under a single pattern, and the whole program is usually wrapped in single quotes so the shell passes it to AWK unchanged.
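A minimal sketch of the three ways a pattern and an action can be combined, run against a two-line sample file created on the spot. The temporary file and the variable names are illustrative only.

```shell
# Sample input in the same layout as emp_names (temporary file for illustration)
f=$(mktemp)
printf '%s\n' '46012 DULANEY EVAN MOBILE AL' \
              '46013 DURHAM JEFF MOBILE AL' > "$f"

# Pattern and action: run the action only on lines that match /DURHAM/
both=$(awk '/DURHAM/ {print $2}' "$f")

# Action only: with no pattern, the action runs on every line
action_only=$(awk '{print $1}' "$f")

# Pattern only: with no action, each matching line is printed unchanged
pattern_only=$(awk '/DULANEY/' "$f")

rm -f "$f"
printf '%s\n' "$both" "$action_only" "$pattern_only"
```

The first command prints DURHAM, the second prints both ID numbers, and the third prints the whole DULANEY line as it appears in the file.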

Understanding Fields

AWK splits its input into records, and each record into fields. By default, a record is a single line of input (the record separator is a newline), and fields are separated by white space: any run of spaces and tabs counts as a single delimiter, and leading blanks are ignored. The field separator can be changed from white space to any other character.

To illustrate, look at the following employee-list file saved as emp_names:

46012   DULANEY     EVAN        MOBILE   AL
46013   DURHAM      JEFF        MOBILE   AL
46015   STEEN       BILL        MOBILE   AL

As AWK reads the input, the entire record is assigned to the variable $0. Each field, as split by the field separator, is assigned to the variables $1, $2, $3, and so on; a line can hold a large number of fields, each accessed by its field number. Thus, the command

$ awk '{print $1,$2,$3,$4,$5}' emp_names

prints the following. The commas in the print statement place a single space between fields, so the original column alignment is not preserved:

46012 DULANEY EVAN MOBILE AL
46013 DURHAM JEFF MOBILE AL
46015 STEEN BILL MOBILE AL

To print only the last and first names, select the second and third fields:

$ awk '{print $2,$3}' emp_names

DULANEY EVAN
DURHAM JEFF
STEEN BILL
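A short sketch of the field variables at work on one line of emp_names. It also uses NF, AWK's built-in count of fields in the current record, and OFS, the output field separator that print inserts between comma-separated items (a single space by default, which is why the column alignment disappears in the output above). The temporary file is only for illustration.

```shell
f=$(mktemp)
# One record from emp_names, columns padded with several spaces
printf '%s\n' '46012   DULANEY     EVAN        MOBILE   AL' > "$f"

# Runs of blanks count as one separator, so NF is 5 and $NF is the last field
count=$(awk '{print NF, $NF}' "$f")

# $0 is the whole record exactly as read, padding included
whole=$(awk '{print $0}' "$f")

# Setting OFS to a tab keeps the selected fields tab-separated on output
tabbed=$(awk 'BEGIN {OFS="\t"} {print $2, $3}' "$f")

rm -f "$f"
printf '%s\n' "$count" "$whole" "$tabbed"
```

The first line of output is "5 AL", the second is the record unchanged, and the third is DULANEY and EVAN separated by a tab.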

Working with Patterns

You can restrict an action to certain records, rather than all of them, by including a pattern that each record must match. The simplest pattern is a search, in which the text to be matched is enclosed in slashes (/pattern/). For example, to print the first and last names of only those employees who live in Alabama:

$ awk '/AL/ {print $3,$2}' emp_names

EVAN DULANEY
JEFF DURHAM
BILL STEEN
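Because /AL/ matches the letters AL anywhere on the line, it can also select a record whose state is elsewhere but whose name happens to contain AL. The sketch below compares the regular-expression search with an exact test on the fifth (state) field; the WALTERS record is hypothetical, added only to show the difference.

```shell
f=$(mktemp)
# The second record is hypothetical: a Georgia employee whose name contains "AL"
printf '%s\n' '46012 DULANEY EVAN MOBILE AL' \
              '46020 WALTERS ANN MACON GA' > "$f"

# /AL/ searches the whole line, so the WALTERS record matches too
loose=$(awk '/AL/ {print $3, $2}' "$f")

# An exact comparison on the state field selects only Alabama records
strict=$(awk '$5 == "AL" {print $3, $2}' "$f")

rm -f "$f"
echo "$loose"
echo "$strict"
```

A field-anchored regular expression such as $5 ~ /^AL$/ gives the same result as the string comparison here.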

Braces and Field Separators

The field separator that divides one field from another need not be white space; it can be any distinct character. To illustrate, assume the emp_names file separated its fields with colons instead of white space:

$ cat emp_names

46012:DULANEY:EVAN:MOBILE:AL
46013:DURHAM:JEFF:MOBILE:AL
46015:STEEN:BILL:MOBILE:AL

If you attempted to print the last name by asking for the second field with

$ awk '{print $2}' emp_names

you would end up with three blank lines. Because the file contains no white space, each line is a single field and $2 is empty. To solve the problem, AWK must be told that the delimiter is a character other than white space. There are two ways to do this: use the -F command-line option, or set the FS variable within the program. Both work equally well, with one exception, illustrated by the following example:

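$ awk '{FS=":"}{print $2}' emp_names

(blank line)
DURHAM
STEEN

The first line of output is empty. An FS assignment inside the main block takes effect only from the next record on: by the time it runs, the first line has already been split on white space, so its $2 is empty. Specifying the separator with -F, or assigning FS in a BEGIN block, avoids this because the new separator is in place before any input is read.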
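A sketch comparing the two ways of setting the separator, plus the late assignment described in this section, run on the colon-separated version of emp_names (recreated in a temporary file for illustration):

```shell
f=$(mktemp)
printf '%s\n' '46012:DULANEY:EVAN:MOBILE:AL' \
              '46013:DURHAM:JEFF:MOBILE:AL' \
              '46015:STEEN:BILL:MOBILE:AL' > "$f"

# -F sets the field separator before any input is read
with_opt=$(awk -F: '{print $2}' "$f")

# A BEGIN block also runs before the first record is read
with_begin=$(awk 'BEGIN {FS=":"} {print $2}' "$f")

# Assigning FS in the main block only affects the NEXT record (POSIX behavior),
# so the first line is still split on white space and its $2 is empty
late=$(awk '{FS=":"} {print $2}' "$f")

rm -f "$f"
echo "$with_opt"
echo "$with_begin"
echo "$late"
```

The -F and BEGIN versions both print DULANEY, DURHAM, and STEEN; only the late assignment loses the first name.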

Modifying File Contents

Suppose you have a file called file1 that has two columns of numbers, and you want to make a new file called file2 that keeps columns 1 and 2 but adds a third column containing the ratio of column 1 to column 2. Suppose also that file2 should contain only those lines where column 1 is smaller than column 2. Either of the following commands does what you want:

awk '$1 < $2 {print $0, $1/$2}' file1 > file2

-- or --

cat file1 | awk '$1 < $2 {print $0, $1/$2}' > file2

The first form is slightly more efficient, since AWK reads file1 directly instead of through a pipe.
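A variant of the same filter, sketched with a hypothetical four-line file1. It adds a $2 != 0 guard, because a row such as "-1 0" passes the $1 < $2 test and would then divide by zero, and it formats the ratio with printf; plain print would use AWK's default number format (OFMT, "%.6g").

```shell
# Hypothetical two-column input standing in for file1
f=$(mktemp)
printf '%s\n' '1 4' '6 3' '2 3' '-1 0' > "$f"

# Keep rows where column 1 < column 2, skip a zero divisor,
# and append the ratio rounded to two decimal places
out=$(awk '$1 < $2 && $2 != 0 {printf "%s %.2f\n", $0, $1/$2}' "$f")

rm -f "$f"
echo "$out"
```

Only "1 4 0.25" and "2 3 0.67" are printed: "6 3" fails the comparison and "-1 0" is stopped by the guard. Redirecting the awk command to file2 instead of capturing it gives the file described above.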

These are just some of AWK's basic features. For more on AWK, see the following links:

1.  http://sparky.rice.edu/~hartigan/awk.html

2.  http://www.computing.net/unix/wwwboard/forum/6189.html

3.  http://scitsc.wlv.ac.uk/cgi-bin/mansec?1+awk

Regards,

Pinky


Thursday, February 7, 2008

AWK Programming
