sed & awk, 2nd Edition (O'Reilly)
By Dale Dougherty & Arnold Robbins; ISBN 1-56592-225-5, 432 pages. Second Edition, March 1997.
All Rights Reserved.
Preface
This book is about a set of oddly named UNIX utilities, sed and awk. These utilities have many things in common, including the use of regular expressions for pattern matching. Since pattern matching is such an important part of their use, this book explains UNIX regular expression syntax very thoroughly.
Because there is a natural progression in learning from grep to sed to awk, we will be covering all three programs, although the focus is on sed and awk.
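Because all three tools share regular expressions, one pattern can be carried from grep to sed to awk. The sketch below was not taken from the book. It assumes a POSIX shell with grep, sed and awk on the PATH. The sample text and the function names find_lines, mark_lines and first_words are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Use the C locale so that [a-z] means only the 26 lowercase ASCII letters.
LC_ALL=C
export LC_ALL

# One regular expression (line begins with a lowercase letter) drives all three tools.
RE='^[a-z]'

# grep: print the lines that match.
find_lines() { grep "$RE"; }

# sed: edit the matching lines (bracket their first character) and print only those.
mark_lines() { sed -n "/$RE/s/^./[&]/p"; }

# awk: select the matching lines, then work with a field (the first word).
first_words() { awk "/$RE/ { print \$1 }"; }

# Hypothetical sample text.
sample='The sed editor
awk reports
grep finds lines'

printf '%s\n' "$sample" | find_lines
printf '%s\n' "$sample" | mark_lines
printf '%s\n' "$sample" | first_words
```

The script prints the two lines that begin with a lowercase letter. Next it prints the same lines with their first character in brackets. Last it prints the words awk and grep. Each step goes one stage further along the grep, sed, awk progression.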
Sed and awk are tools used by users, programmers, and system administrators: anyone working with text files. Sed, so called because it is a stream editor, is perfect for applying a series of edits to a number of files. Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming language that permits easy manipulation of structured data and the generation of formatted reports. This book emphasizes the POSIX definition of awk. In addition, the book briefly describes the original version of awk, before discussing three freely available versions of awk and two commercial ones, all of which implement POSIX awk.
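The sketch below shows the two roles side by side. First, one sed script holds a series of edits and is applied to several files. Second, awk turns lines of structured data into a small formatted report. It is an illustration, not one of the book's scripts. The file names and data are hypothetical. It assumes mktemp -d is available, which is common but not required by POSIX. POSIX sed has no in-place option, so each result goes to a new file.

```shell
#!/bin/sh
# Scratch directory so the sketch cleans up after itself.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical input files; "#" lines stand in for draft comments.
printf '# draft\nsed is a UNIX stream editor.\n' > one.txt
printf 'awk is a UNIX tool.\n# todo\n' > two.txt

# sed: a series of edits kept in one script file ...
cat > edits.sed <<'EOF'
/^#/d
s/UNIX/Unix/g
EOF

# ... applied to a number of files, each result written to a new file.
for f in one.txt two.txt; do
    sed -f edits.sed "$f" > "$f.out"
done

# awk: structured data (a name and an amount per line) turned into a report.
printf 'alice 10\nbob 5\nalice 7\n' > sales.txt
awk '{ total[$1] += $2 }
     END { for (name in total) printf "%-6s %4d\n", name, total[name] }' \
    sales.txt | sort > report.txt

cat one.txt.out two.txt.out report.txt
```

The for (name in total) loop visits array elements in an unspecified order. The report is piped through sort so that its lines always come out in the same order.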
The focus of this book is on writing scripts for sed and awk that quickly solve an assortment of problems for the user. Many of these scripts could be called "quick-fixes." In addition, we'll cover scripts that solve larger problems that require more careful design and development.
Scope of This Handbook
There is a progression in functionality from sed to awk. Both share a similar command-line syntax, accepting user instructions in the form of a script.
One chapter describes UNIX regular expression syntax in full detail. New users are often intimidated by these strange expressions, used for pattern matching. It is important to master regular expression syntax to get the most from sed and awk. The pattern-matching examples in this chapter largely rely on grep and egrep.
Another chapter describes the elements of writing a sed script using only a few sed commands. It also presents a shell script that simplifies invoking sed scripts.
Two chapters divide the sed command set into basic and advanced commands. The basic commands parallel manual editing actions, while the advanced commands introduce simple programming capabilities. Among the advanced commands are those that manipulate the hold space, a set-aside temporary buffer.
The next chapter begins a five-chapter section on awk. It presents the primary features of this scripting language. A number of scripts are explained, including one that modifies the output of the ls command.
A later chapter describes the original version of awk, Bell Labs awk, GNU awk (gawk) from the Free Software Foundation, and mawk, by Michael Brennan. The latter three all have freely available source code. This chapter also describes two commercial implementations, MKS awk and Thomson Automation awk (tawk), as well as VSAwk, which brings awk-like capabilities to the Visual Basic environment.
Another chapter presents two longer, more complex awk scripts. The first is a spelling checker; the second processes and formats the index for a book or a master index for a set of books.
A later part of the book presents the full listings for the spellcheck.awk script and the masterindex shell script described in that chapter.
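The advanced sed commands work with the hold space, a set-aside temporary buffer. The sketch below shows a well-known sed idiom that reverses the order of its input lines by building the result up in the hold space. It was not taken from the book's text. The function name reverse_lines is hypothetical.

```shell
#!/bin/sh
# Reverse the order of input lines using sed's hold space.
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space (all lines so far, newest first) into the hold space
#   $p   on the last line, print the accumulated pattern space
# -n suppresses sed's automatic printing, so only $p produces output.
reverse_lines() { sed -n '1!G;h;$p'; }

printf 'first\nsecond\nthird\n' | reverse_lines
```

The pattern space holds only the current line. The hold space keeps the lines collected so far from one input cycle to the next, and that is why the idiom needs it.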
Availability of sed and awk
Index: Symbols and Numbers
& (ampersand)
&& (logical AND) operator
in replacement text
* (asterisk)
*= (assignment) operator
as metacharacter
multiplication operator
\ (backslash)
\<, \> escape sequences
\`, \' escape sequences
as metacharacter
in replacement text
{} (braces)
\{\} metacharacters
in awk
grouping sed commands in
[] (brackets)
metacharacters
[..] metacharacters
^ (circumflex)
character classes and
as metacharacter
in multiline pattern space
: (colon) for labels
$ (dollar sign)
as end-of-line metacharacter
for last input line
in multiline pattern space
$0, $1, $2, ...
. (dot) metacharacter
= (equal sign)
! (exclamation point)
!= (not equal to) operator
!~ (does not match) operator
branch command versus
> (greater than sign)
>= (greater than or equal to) operator
for redirection
relational operator
- (hyphen)
< (less than sign)
<= (less than or equal to) operator
relational operator
# for comments
#n for suppressing output
#!, invoking awk with
() (parentheses)
with replacing text
% (percent sign)
for format specifications
+ (plus)
+= (assignment) operator
++ (increment) operator
addition operator
as metacharacter
? (question mark)
?: (conditional) operator
as metacharacter
; (semicolon)
' (single quotes)
/ (slash)
/= (assignment) operator
// as delimiter
in ed commands
pattern addressing
~ (match) operator
| (vertical bar)
|| (logical OR) operator
as metacharacter
Index: A
abort statement (tawk)
addition (+) operator
addresses, line
addressing by pattern
adj script (example)
alignment of output fields
ampersand (&)
&& (logical AND) operator
in replacement text
anchors
AND (&&) operator
ARGI variable (tawk)
arithmetic functions
arithmetic operators, awk
arrays
deleting elements of
multidimensional
parsing strings into elements
splitting
system variables that are
assigning input to variables
assignment operators, awk
asterisk (*)
*= (assignment) operator
as metacharacter
multiplication operator
awk
built-in variables
command-line syntax
commands
(see also under specific command)
documentation for
invoking with #!
obtaining
operators
POSIX standards for
programming model
quick reference
with sed
system variables
versions of
writing scripts in
AWKPATH variable (gawk)
Index: B
\B escape sequence
backreferences
backslash (\)
\<, \> escape sequences
\`, \' escape sequences
as metacharacter
in replacement text
bang (!)
BEGINFILE procedure (tawk)
beginning of word
Bell Labs awk
BITFTP
variables as Boolean patterns
braces {}
\{\} metacharacters
in awk
grouping sed commands in
as metacharacters
bracket expressions
brackets []
[..] metacharacters
branch command
branching
breaking lines
built-in functions
built-in variables
Index: C
capitalization, converting
case sensitivity
character classes
characters
matching at word start/end
measured span of
metacharacters
circumflex (^)
character classes and
as metacharacter
in multiline pattern space
close()
closing files/pipes
colon (:) for labels
combine script (example)
"command garbled" message
command-line options, gawk
command-line parameters
array of
passing into script
command-line syntax
commands
(see also under specific command)
grouping
multiple
order of
sed
comments
in awk scripts
comparing
concatenation
conditional statements
constants
constants, hexadecimal (tawk)
converting
case
CONVFMT variable
cos()
cross-referencing scheme
csh shell
curly braces
customizing functions
Index: D
d command (ed)
d command (sed)
D command (sed)
with P and N commands
debugging
defining functions
delete command (sed)
delete statement (awk)
deleting
array elements
lines
delimiters
awk
FIELDWIDTHS variable (gawk)
FS variable
for regular expressions
/dev files
division (/) operator
dollar sign ($)
as metacharacter
for last input line
in multiline pattern space
dot (.) metacharacter
Index: E
e (constant)
-e option (sed)
egrep program
end
ENDFILE procedure (tawk)
"Ending delimiter missing" error
ENVIRON variable
environment variables
equal sign (=)
error messages
"command garbled"
sed
errors
debugging
spelling, finding (example)
escape sequences, awk
escaping
exchange command
exclamation point (!)
!= (not equal to) operator
!~ (does not match) operator
exponentiation
^ operator
expressions
executing as commands
regular
extensions, common awk
extent of matching
Index: F
-f option (awk)
-f option (sed)
-F option (awk)
factorials
fflush()
field separators
fields for awk records
FIELDWIDTHS variable (gawk)
files
closing
editing multiple
extracting contents from
getting information on
multiple
multiple edits to
nextfile statement
reading from
retrieving information from
scripts as
special gawk
writing to
fixed strings
flags
g (global)
numeric
w (write)
flow control
branching
n command
flushing buffers
FNR variable
for loop
formatting awk output
FPAT variable (tawk)
FS variable
functions
arithmetic
built-in
creating library of
scope control (tawk)
string-related
time-related (gawk)
user-defined
Index: G
g command (ed)
G command (sed)
g flag
gawk (GNU awk)
built-in functions
multiple files and
generating random numbers
gensub()
gent script
getline function
global addressing
global command
glossary program (example)
GNU project
GNU sed, error messages
greater than sign (>)
>= (greater than or equal to) operator
for redirection
relational operator
grep utility
grouping sed commands
gsub()
Index: H
h command (sed)
H command
hold command
hold space
hyphen (-)
Index: I
i command (sed)
I/O
in operator
increment (++) operator
index()
index, array
input
insert command
instructions, awk
int()
Interleaf files, converting
Index: K
Korn shell
Index: L
l command (sed)
labels
length()
length, string
less than sign (<)
<= (less than or equal to) operator
relational operator
library of functions
line addresses
line editors
lines
continuing after breaks
deleting
matching over multiple
matching start/end of
list command
logical AND (&&) operator
logical NOT (!) operator
logical OR (||) operator
loops
arrays and
main input loop
lowercase
ls command (UNIX)
Index: M
m1 script (example)
main input loop
match()
match (~) operator
mawk (Michael's awk)
menu-based command generator (example)
metacharacters
* (asterisk)
\ (backslash)
[] (brackets)
^ (circumflex)
\{\}
$ (dollar sign)
. (dot)
+ (plus)
? (question mark)
| (vertical bar)
\> (end of word)
\< (start of word)
awk regular expression
-mf option (awk)
Michael's awk
MKS Toolkit
modularization
-mr option (awk)
multidimensional arrays
multiline matching
multiple
conditional statements
edits to one file
files
Index: N
N command (sed)
with P and D commands
-n option (sed)
names
filenames, special
script files
nested conditional statements
newline characters
. (dot) and
ORS variable for
in replacement strings
RS variable for
newsgroup, awk
next statement (awk)
nextfile statement
NF variable
non-English characters
NR variable
nroff, stripping non-printable characters
numbered replacement strings
numbers
arithmetic functions
converting to strings
factorials
hexadecimal
output precision
random
truncating
Index: O
obtaining
commercial awk versions
examples
sample programs
octal numbers
octothorpe (#)
for comments
#n for suppressing output
OFMT variable
operators
arithmetic
assignment
Boolean
postfix versus prefix
relational
options
(see also under specific option)
combining
sed
OR (||) operator
order of commands in scripts
ORS variable
output
buffer, flushing
directing to files
suppressing
to terminal
Index: P
P command (sed)
with N and D commands
parameters, function
parentheses ()
with replacing text
parsing
referencing fields
strings into array elements
pattern addressing
pattern matching
in awk
extent of match
fixed strings
metacharacters for
over multiple lines
phrases
at start/end of words
pattern space
insert and append commands and
patterns
percent sign (%)
for format specifications
modulo operator
phonebill script (example)
phrases
pipelined edits
pipes
reading input from
plus (+)
+= (assignment) operator
++ (increment) operator
addition operator
as metacharacter
POSIX
character class additions
option combinations, standard for
--posix option (gawk)
precedence, operation
precision modifier
prefix operators
print command (sed)
print statement (awk)
printerr()
printf statement
printing
line addresses with =
procedures
PROCINFO array (gawk)
programming
sed, tips for
prompts
Index: Q
question mark (?)
?: (conditional) operator
as metacharacter
quotation marks (')
Index: R
r command (sed)
--re-interval option (gawk)
read command
reading
from files
from pipe
RECLEN variable (tawk)
records, awk
arrays and
NR variable
RT variable (gawk)
separators for
redirection
referencing fields
regular expressions
delimiters for
dynamic, faking
ed and
examples of
metacharacters for
RS variable as
tawk and
union of
relational expressions
replacement metacharacters
numbered saves
replacing text
return statement
RS variable
regular expression for
RT variable (gawk)
rules, pattern-matching
Index: S
s command (sed)
sample programs
saving output
scope control (tawk)
scripts
debugging
examples of
modularizing
stopping, sed
search path for awk files
search-and-replace
matching extent
sed
with awk
commands
(see also under specific command)
syntax of
obtaining
options
(see also under specific option)
programming tips for
quick reference
semicolon (;)
shells
(see also under specific shell name)
sin()
single quotation marks (')
slash (/)
/= (assignment) operator
// as delimiter
in ed commands
sort program
space characters
as awk delimiters
as string concatenator
span, character
special characters
special filenames
spell checker program (example)
spellcheck.awk program
split()
sprintf()
srand()
start of word
stream editor
strftime()
strings
comparing
concatenating
fixed
functions for
parsing into array elements
substitution functions (awk)
substrings
stripping non-printable characters
sub()
SUBSEP variable
substituting text (sed)
replacement metacharacters
substitution functions (awk)
substr()
substrings
subtraction (-) operator
syntax
command-line
sed commands
system variables
Index: T
t command (sed)
terminal, user
terminator, record
testing
output
text blocks
tilde (~) operator
time management
tolower()
toupper()
trigonometric functions
Index: U
union of regular expressions
uppercase
Index: V
-v option (awk)
variables
as Boolean patterns
built-in
environment
scope control (tawk)
system
versions, awk
vertical bar (|)
as metacharacter
vi editor
Videosoft VSAwk