awk

Overview

Like sed, the awk utility can apply a set of commands to one or more data files. Run from the command line or from a shell script, an awk program looks something like this:

awk 'BEGIN {
    statement
    statement
    statement
}
/regular expression/ {
    statement
    statement
    statement
}
END {
    statement
    statement
    statement
}' inputfile.xyz | command(s) > outputfile.xyz

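The BEGIN block runs once, before any input is read, and the END block runs once, after all input has been read. The block guarded by /regular expression/ is tested against every input record in turn, and its statements are executed only for the records that match the expression.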
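Here is a minimal sketch of that flow. The helper name count_errors, the pattern /error/, and the input lines are invented for illustration. The point is only the order in which the three kinds of block run.

```shell
#!/bin/sh
# Hypothetical helper: report lines containing "error" and count them.
count_errors() {
    awk 'BEGIN {
        # Runs once, before any input is read
        n = 0
        print "Scanning..."
    }
    /error/ {
        # Runs for every input line that matches /error/
        n++
        print "match: " $0
    }
    END {
        # Runs once, after the last line has been read
        print "Total matches: " n
    }' "$@"
}

printf 'ok\nerror one\nok\nerror two\n' | count_errors
```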

The following script parses the name, listen address, listen port, and machine of each server defined in a WebLogic config.xml file:

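###
### Get WebLogic configuration information from
### /usr/local/username/config/domain/config.xml
###

awk 'BEGIN {
    # Set the separators before the first record is read: each ">"
    # ends a record (one XML tag per record) and "<" splits off the
    # tag body as its own field
    FS="<"
    RS=">"
}
{
    for ( i=1; i<=NF; i++ )
    {
        if ($i ~ /^Server /)
        {
            #print $0
            # Clear values left over from the previous <Server> tag
            Machine = Name = ListenPort = ListenAddress = ""
            num = split ($0,word," ")
            # Walk the words from last to first so every attribute is
            # collected before "<Server" (word 1) triggers the print
            for (x=num; x >= 1; x--)
            {
               #print word[x]
               if(word[x] ~ /Machine/)
               {
                  Machine=word[x]
               }
               if(word[x] ~ /Name/)
               {
                  Name=word[x]
               }
               if(word[x] ~ /ListenPort/)
               {
                  ListenPort=word[x]
               }
               if(word[x] ~ /ListenAddress/)
               {
                  ListenAddress=word[x]
               }

               if(word[x] ~ /<Server/)
                   print Name ":" ListenAddress ":" ListenPort ":" Machine
            }
        }
    }
}' "$BASE/config.xml" | sort > "$LOGDIR/domaininfo.out"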
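To try the parser without a real domain, the sketch below wraps a condensed copy of it (with FS and RS set in BEGIN) in a hypothetical parse_servers function and feeds it an invented excerpt. It assumes the layout in which each server is a single <Server ...> tag carrying Name, ListenAddress, ListenPort, and Machine attributes. If a config.xml stores these as nested child elements instead, this attribute-based approach will not find them.

```shell
#!/bin/sh
# Hypothetical wrapper around a condensed copy of the parser; it reads
# a file argument or, with no arguments, standard input.
parse_servers() {
    awk 'BEGIN { FS = "<"; RS = ">" }
    {
        for (i = 1; i <= NF; i++) {
            if ($i !~ /^Server /)
                continue
            Machine = Name = ListenPort = ListenAddress = ""
            num = split($0, word, " ")
            for (x = num; x >= 1; x--) {
                if (word[x] ~ /Machine/)       Machine = word[x]
                if (word[x] ~ /Name/)          Name = word[x]
                if (word[x] ~ /ListenPort/)    ListenPort = word[x]
                if (word[x] ~ /ListenAddress/) ListenAddress = word[x]
                if (word[x] ~ /<Server/)
                    print Name ":" ListenAddress ":" ListenPort ":" Machine
            }
        }
    }' "$@" | sort
}

# Invented excerpt in the attribute style the parser expects
parse_servers <<'EOF'
<Domain Name="mydomain">
  <Server Name="ms1" ListenAddress="host1" ListenPort="7003" Machine="m1">
  </Server>
  <Server Name="admin" ListenAddress="host0" ListenPort="7001" Machine="m0">
  </Server>
</Domain>
EOF
```

Because the Name attribute comes first in each output line, sort orders the servers by name.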

Hello World

###
### hello.sh
###
### Hello world
###

awk 'BEGIN {
    print "hello world"
}'

Print and sort selected fields

###
### print1.sh
###
### Print 2nd and 4th fields
###
### Usage:
###    print1.sh filename
###

xfile=$1

awk '{
    print $2, $4
}' "$xfile" | sort
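The comma in print $2, $4 is what puts a space between the two fields. At that point, print outputs the value of OFS, which defaults to a single space. The sketch below uses a hypothetical fields_2_4 helper with made-up input and passes a different OFS with -v to show the effect.

```shell
#!/bin/sh
# Hypothetical helper: print fields 2 and 4 of each line, joined by the
# separator given as the first argument (assigned to OFS), then sort.
fields_2_4() {
    sep=$1
    shift
    # The comma in "print $2, $4" is replaced by OFS in the output
    awk -v OFS="$sep" '{ print $2, $4 }' "$@" | sort
}

printf 'a zeta b 3\nc alpha d 1\n' | fields_2_4 ' '
printf 'a zeta b 3\nc alpha d 1\n' | fields_2_4 ','
```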

Derive number of records in a file

###
### recnt.sh
###
### Usage:
###    recnt.sh filename
###
### Print fields 2 and 1 of each record, then the number of records
###

xfile=$1

awk 'BEGIN {
    print
    print "Number of records"
    print
}
{
    print $2, $1
}
END {
    print
    print "Number of records: "  NR
}' "$xfile"
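Reduced to the count alone, the same idea is a one-line END block. This sketch uses a hypothetical count_records name and relies on NR keeping its last value when END runs.

```shell
#!/bin/sh
# Hypothetical helper: print only the record count.
count_records() {
    # In the END block NR still holds the number of the last record,
    # i.e. the total number of records read (0 for empty input)
    awk 'END { print NR }' "$@"
}

printf 'one\ntwo\nthree\n' | count_records
printf 'one\ntwo' | count_records
```

One difference from wc -l, which counts newline characters: awk also counts a final line that has no trailing newline, so the second call prints 2 where wc -l would report 1.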

Derive ratios between columns

###
### ratio.sh
###
### Usage:
###
###    ratio.sh filename
###
### filename has 2 columns of numbers.
### For each row where column 1 is less than
### column 2, print both columns plus a third
### column: the ratio of column 1 to column 2.
###

xfile=$1

awk '$1 < $2 {
    print $0, $1/$2
}' "$xfile"
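The $1 < $2 pattern selects rows but does not protect the division. A row such as -1 0 passes the test, and in gawk, for example, dividing by zero is a fatal error. Below is a sketch of a variant that prints every row and guards the division explicitly. The ratio name and the n/a placeholder are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: append column1/column2 to every row, or the
# placeholder "n/a" when the second column is zero.
ratio() {
    awk '{
        if ($2 != 0)
            print $0, $1 / $2
        else
            print $0, "n/a"
    }' "$@"
}

printf '1 4\n3 2\n5 0\n' | ratio
```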

Print lines that do not match a pattern

###
### notmatch.sh
###
### Usage:
###
###    notmatch.sh filename
###
### Print every line that does not contain "a href"
###
file=$1

awk '$0 !~ /a href/ {
        print $0
    }' "$file"
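A bare /regular expression/ pattern is matched against $0, and a pattern with no action prints the whole record, so the same filter can be written more compactly. Here is a sketch with a hypothetical no_links helper and invented input:

```shell
#!/bin/sh
# Hypothetical helper: "!/a href/" is shorthand for "$0 !~ /a href/",
# and the missing action defaults to printing $0.
no_links() {
    awk '!/a href/' "$@"
}

printf '<p>text</p>\n<a href="x.html">x</a>\n<p>more</p>\n' | no_links
```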

Print hyperlinks in an HTML file

###
### htmlpattern.sh
###
### Usage:
###
###    htmlpattern.sh filename.html
###
### Print hyperlinks in a file
###
file=$1

awk 'BEGIN {
        RS="<"
        FS=">"
    } $1 ~ /a href/ {
        print "Found " $1
    }' "$file"
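With RS set to "<", every tag starts a new record. With FS set to ">", the text inside the tag (for example a href="one.html") becomes $1. Below is a sketch of the same program applied to an invented one-line page, wrapped in a hypothetical find_links function:

```shell
#!/bin/sh
# Hypothetical wrapper: each "<" starts a record and ">" ends the tag,
# so $1 holds the tag body.
find_links() {
    awk 'BEGIN {
        RS = "<"
        FS = ">"
    }
    $1 ~ /a href/ {
        print "Found " $1
    }' "$@"
}

printf '<p>See <a href="one.html">one</a> and <a href="two.html">two</a></p>\n' | find_links
```

Only tags written with exactly "a href" are found. <A HREF=...> or <a class="x" href=...> would be missed, so this approach suits quick checks rather than general HTML parsing.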

Increment

###
### relational.sh
###
### Usage:
###
###    relational.sh filename
###
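### Count the lines that contain "a href" and, for each one,
### print the count so far and a running total of those counts
###
file=$1

awk '$0 ~ /a href/ {
        num++              # matching lines seen so far
        xyz += num         # running sum of num: 1, 3, 6, ...
        print num ", " xyz
    }' "$file"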
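In this program, num counts the matching lines and xyz keeps a running sum of num, so the totals grow 1, 3, 6, and so on. Note that num and $num are different things in awk: $num is the field whose number is the current value of num. Here is a sketch using a hypothetical running_totals helper:

```shell
#!/bin/sh
# Hypothetical helper: print the match count and its running sum.
running_totals() {
    awk '$0 ~ /a href/ {
        num++
        xyz += num
        print num ", " xyz
    }' "$@"
}

printf '<a href="a">\nplain\n<a href="b">\n<a href="c">\n' | running_totals
```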

Logical

###
### logical.sh
###
### Usage:
###
###    logical.sh /etc/passwd
###
###  If the third field is both greater than or equal to 100
###  and less than 200, print the record
###
file=$1

awk 'BEGIN { 
         FS=":" 
     }
     $3 >= 100 && $3 < 200 {
         print $0
     }' "$file"
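Here is a sketch of the same test on invented passwd-style lines, using a hypothetical uid_range name. Because the third field looks numeric, awk compares it as a number. As a result, 1000 is excluded, whereas a string comparison would have placed "1000" before "200".

```shell
#!/bin/sh
# Hypothetical helper: print passwd-style lines whose UID is 100-199.
uid_range() {
    # awk -F ':' would set FS the same way as this BEGIN block
    awk 'BEGIN { FS = ":" }
    $3 >= 100 && $3 < 200 { print $0 }' "$@"
}

printf 'root:x:0:0::/root:/bin/sh\nsvc:x:150:150::/srv:/bin/sh\nuser:x:1000:1000::/home/user:/bin/sh\n' | uid_range
```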

Print Arguments

###
### printargs.sh
###
### Usage:
###
###    printargs.sh arg1 arg2 ...
###
###  Print the command-line arguments on a single line
###
awk 'BEGIN {
     for (i = 1; i < ARGC; i++)
        printf "%s ", ARGV[i]
     printf "\n"
     }' "$@"
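ARGV[0] holds the name of the awk program itself, which is why the loop starts at 1. Because the program has only a BEGIN block, awk exits without trying to read the arguments as input files. Here is a sketch with a hypothetical print_args helper that joins the arguments without leaving a trailing space after the last one:

```shell
#!/bin/sh
# Hypothetical helper: print the arguments separated by single spaces.
print_args() {
    awk 'BEGIN {
        # Skip ARGV[0]; end with a newline instead of a trailing space
        for (i = 1; i < ARGC; i++)
            printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "\n")
    }' "$@"
}

print_args one two three
```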

Variables

###
### variables.sh
###
### Usage:
###
###    variables.sh filename keyword1 keyword2
###
### Print value of various variables
###

awk -v key1="$2" -v key2="$3" '

     $0 ~ /key phrase/ {

     printf "\n\n---------------------------------------------\n\n"

     printf "Current input record: 			\n\n\t\"%s\"", $0

     printf "\n\n"

     print "Key phrase: " key1 ", " key2 

     printf "\n\nRecords read so far (NR):			\t\t\"%s\"", 	NR
     printf "\nFields in the current record (NF):		\t\t\"%s\"", 	NF
     printf "\nValue of 1st field ($1):				\t\"%s\"", 	$1
     printf "\nValue of 2nd field ($2):				\t\"%s\"", 	$2
     printf "\nOutput field separator (OFS):			\t\t\"%s\"", 	OFS
     printf "\nOutput record separator (ORS):			\t\t\"%s\"", 	ORS
     printf "\nFilename of current input file (FILENAME):	\t\t\"%s\"", 	FILENAME
     printf "\nPrint format for floating point (OFMT):		\t\t\"%s\"", 	OFMT
     printf "\nNumber of command-line arguments (ARGC):		\t\"%s\"", 	ARGC
     printf "\nFirst command-line argument (ARGV[1]):		\t\t\"%s\"", 	ARGV[1]
     printf "\nRecord number in current file (FNR):		\t\t\"%s\"", 	FNR
     printf "\nLength of string matched (RLENGTH):		\t\t\"%s\"", 	RLENGTH
     printf "\nStart of string matched (RSTART):		\t\t\"%s\"", 	RSTART
     printf "\nSubscript separator (SUBSEP):			\t\t\"%s\"", 	SUBSEP
     printf "\nHOME Env Variable (ENVIRON[\"HOME\"]): 		\t\t\"%s\"", 	ENVIRON["HOME"]
     printf "\nHOSTNAME Env Variable (ENVIRON[\"HOSTNAME\"]):	\t\t\"%s\"", 	ENVIRON["HOSTNAME"]
     printf "\nInput field separator (FS):			\t\t\"%s\"", 	FS
     printf "\nInput record separator (RS):			\t\t\"%s\"", 	RS

     printf "\n\n"

}' "$1"
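The -v option assigns an awk variable before the BEGIN block runs. This means a shell value can be used anywhere in the program, including as a dynamic regular expression on the right-hand side of ~. Here is a minimal sketch with a hypothetical grep_key helper and made-up input:

```shell
#!/bin/sh
# Hypothetical helper: print the number and text of each line that
# matches the regular expression given as the first argument.
grep_key() {
    key=$1
    shift
    # key is assigned by -v before BEGIN runs; "$0 ~ key" treats its
    # string value as a regular expression
    awk -v key="$key" '$0 ~ key { print NR ": " $0 }' "$@"
}

printf 'alpha\nbeta\ngamma\n' | grep_key 'et'
```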

Built-in variables

NR Number of records read so far

NF Number of fields in the current record

FS Input field separator. Default is a single space, which splits fields on runs of blanks and tabs and ignores leading and trailing blanks

RS Input record separator. Default is newline

$0 Current input record

$n Value of nth field of current input record

$1 Value of 1st field of current input record

$2 Value of 2nd field of current input record

OFS Output field separator. Default is a space.

ORS Output record separator. Default is a newline.

FILENAME Filename of current input file

OFMT Output print format for floating point. Default is %.6g, which prints up to six significant digits (for example, 3.14159265 prints as 3.14159).

ARGC Number of command-line arguments

ARGV Array of command-line arguments

FNR Record number in current file

RLENGTH Length of string matched by match function

RSTART Start of string matched by match function

SUBSEP Subscript separator
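Two of these are easy to confuse. NR keeps counting across all input files, while FNR restarts at 1 for each file. The default OFMT of %.6g prints numbers with six significant digits. The sketch below uses hypothetical helper names and temporary files.

```shell
#!/bin/sh
# Hypothetical helpers illustrating NR/FNR and OFMT.
nr_fnr() {
    # NR counts records across all files; FNR restarts at 1 per file
    awk '{ print NR, FNR, $0 }' "$@"
}

show_ofmt() {
    # print converts non-integer numbers with OFMT, default "%.6g"
    awk 'BEGIN { print 3.14159265, 1234.5678 }'
}

dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\n' > "$dir/second.txt"
nr_fnr "$dir/first.txt" "$dir/second.txt"   # 1 1 a / 2 2 b / 3 1 c
rm -r "$dir"

show_ofmt                                    # 3.14159 1234.57
```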

Operators

+ Add

- Subtract

* Multiply

/ Divide

% Modulo

== Equality

!= Not equal

> Greater than

< Less than

>= Greater than or equal

<= Less than or equal

++ Increase by one

-- Decrease by one

+= Add and assign

-= Subtract and assign

*= Multiply and assign

/= Divide and assign

%= Modulo and assign

&& Both expressions must be true

|| Either expression can be true
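Here is a short sketch that exercises most of these operators in a BEGIN block. The ops_demo name and the values are arbitrary, and the comments give the output of each print. Comparisons and logical operators yield 1 for true and 0 for false.

```shell
#!/bin/sh
# Hypothetical demo of the arithmetic, assignment, comparison and
# logical operators; each comment shows what that print outputs.
ops_demo() {
    awk 'BEGIN {
        a = 7; b = 2
        print a + b, a - b, a * b, a / b, a % b    # 9 5 14 3.5 1
        a += 3; print a                            # 10
        a %= 4; print a                            # 2
        a--;    print a                            # 1
        both   = (a == 1 && b != 1)                # true, so 1
        either = (a > 5 || b <= 2)                 # true, so 1
        print both, either                         # 1 1
    }'
}

ops_demo
```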
