O'Reilly sed & awk, 2nd Edition, May 1997, ISBN 1-56592-225-5

  By Dale Dougherty & Arnold Robbins; ISBN 1-56592-225-5, 432 pages. Second Edition, March 1997.

   . All Rights Reserved.

  Preface

  This book is about a set of oddly named UNIX utilities, sed and awk. These utilities have many things in common, including the use of regular expressions for pattern matching. Since pattern matching is such an important part of their use, this book explains UNIX regular expression syntax very thoroughly.

  Because there is a natural progression in learning from grep to sed to awk, we will be covering all three programs, although the focus is on sed and awk.

  Sed and awk are tools used by users, programmers, and system administrators - anyone working with text files. Sed, so called because it is a stream editor, is perfect for applying a series of edits to a number of files. Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming language that permits easy manipulation of structured data and the generation of formatted reports. This book emphasizes the POSIX definition of awk. In addition, the book briefly describes the original version of awk, before discussing three freely available versions of awk and two commercial ones, all of which implement POSIX awk.
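  To make the division of labor concrete, here is a minimal sketch of each tool at work. The sample data (names, departments, hours) is invented for this illustration and does not come from the book; only standard sed and awk features are used.

```shell
# Hypothetical sample data: name, department, hours (an assumption for this sketch).
data='alice sales 12
bob support 30
carol sales 8'

# sed: apply the same edit to every line of the input stream.
edited=$(printf '%s\n' "$data" | sed 's/sales/marketing/')
printf '%s\n' "$edited"

# awk: treat each line as a record of whitespace-separated fields,
# accumulate a total, and print a small formatted report.
report=$(printf '%s\n' "$data" | awk '
{ total += $3; printf "%-6s %3d\n", $1, $3 }
END { printf "%-6s %3d\n", "TOTAL", total }')
printf '%s\n' "$report"
```

  The sed command edits text as it streams past, while the awk program treats each line as structured data and reports on it.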

  The focus of this book is on writing scripts for sed and awk that quickly solve an assortment of problems for the user. Many of these scripts could be called "quick-fixes." In addition, we'll cover scripts that solve larger problems that require more careful design and development.

  Scope of This Handbook

  

  a progression in functionality from sed to awk. Both share a similar command-line syntax, accepting user instructions in the form of a script.
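  The shared command-line form can be sketched as follows. The file names and the temporary directory are hypothetical, created only for this example; the point is that both programs accept a script either directly on the command line or from a file named with -f.

```shell
# Hypothetical input file created just for this sketch.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/list"

# Script given directly on the command line.
sed_inline=$(sed 's/two/2/' "$tmpdir/list")
awk_inline=$(awk '/t/ { print NR ": " $0 }' "$tmpdir/list")

# The same instructions stored in script files and named with -f.
printf 's/two/2/\n' > "$tmpdir/edit.sed"
printf '/t/ { print NR ": " $0 }\n' > "$tmpdir/report.awk"
sed_file=$(sed -f "$tmpdir/edit.sed" "$tmpdir/list")
awk_file=$(awk -f "$tmpdir/report.awk" "$tmpdir/list")

printf '%s\n' "$sed_inline" "$awk_inline"
rm -rf "$tmpdir"
```

  Either way of supplying the script should produce the same output for the same input.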

  

  , describes UNIX regular expression syntax in full detail. New users are often intimidated by these strange expressions, used for pattern matching. It is important to master regular expression syntax to get the most from sed and awk. The pattern-matching examples in this chapter largely rely on grep and egrep.
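  As a small taste of what such expressions look like, the sketch below tries two patterns against an invented word list. The list is an assumption made for this example, and grep -E is used here as the POSIX spelling of egrep.

```shell
# Hypothetical word list used only for this illustration.
words='book
boot
bat
look'

# ^ and $ anchor the match to the start and end of the line;
# .* matches any run of characters in between.
anchored=$(printf '%s\n' "$words" | grep '^b.*t$')
printf '%s\n' "$anchored"

# Extended syntax adds alternation with | inside parentheses.
alternation=$(printf '%s\n' "$words" | grep -E 'boo(k|t)')
printf '%s\n' "$alternation"
```

  The first pattern selects lines that begin with b and end with t; the second selects lines containing either book or boot.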

  

  elements of writing a sed script using only a few sed commands. It also presents a shell script that simplifies invoking sed scripts.

   , divide the sed command set into basic and advanced commands. The basic commands are commands that parallel manual editing actions, while the advanced commands introduce simple programming capabilities. Among the advanced commands are those that manipulate the hold space, a set-aside temporary buffer.
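  A brief sketch may help show what the hold space makes possible. The one-liner below is a widely known sed idiom for reversing the order of lines, offered as an illustration rather than as a script from these chapters; the three-line input is invented.

```shell
# Hypothetical three-line input.
input='a
b
c'

# 1!G  on every line but the first, append the hold space to the pattern space
# h    copy the pattern space (the lines so far, newest first) into the hold space
# $p   on the last line, print the accumulated result
reversed=$(printf '%s\n' "$input" | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```

  Each input line is stacked on top of the lines saved so far, so the buffer holds the whole input in reverse by the time the last line is read.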

   , begins a five-chapter section on awk. This chapter presents the primary features of this scripting language. A number of scripts are explained, including one that modifies the output of the ls command.
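  To suggest the flavor of such a script, here is a minimal sketch that reshapes a long directory listing. It is not the script from the chapter. The listing is canned sample text in the usual ls -l layout (an assumption, since column layout can vary between systems), so the result does not depend on the current directory.

```shell
# Hypothetical listing in "ls -l" layout; field 5 is the size, field 9 the name.
listing='-rw-r--r-- 1 dale staff 1024 Mar  1 10:00 ch01
-rw-r--r-- 1 dale staff 2048 Mar  2 11:30 ch02
drwxr-xr-x 2 dale staff  512 Mar  3 09:15 figs'

# Print name and size for regular files only, then a one-line summary.
summary=$(printf '%s\n' "$listing" | awk '
/^-/ { files++; bytes += $5; print $9 "\t" $5 }
END  { print files " files, " bytes " bytes" }')
printf '%s\n' "$summary"
```

  In practice the canned text would be replaced by a pipeline such as ls -l | awk '...', with the field numbers adjusted to the local ls output.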

  

  awk (gawk) from the Free Software Foundation, and mawk, by Michael Brennan. The latter three all have freely available source code. This chapter also describes two commercial implementations, MKS awk and Thomson Automation awk (tawk), as well as VSAwk, which brings awk-like capabilities to the Visual Basic environment.

   , presents two longer, more complex awk scripts that together second script processes and formats the index for a book or a master index for a set of books.

   , presents the full listings for the spellcheck.awk script and the masterindex shell script described in .

  Availability of sed and awk

   Index: Symbols and Numbers

  & (ampersand) && (logical AND) operator :

  

  in replacement text

  

  

  * (asterisk)

  **= (assignment) operator :

  *= (assignment) operator :

  

  as metacharacter

  

  

  multiplication operator :

  

  \ (backslash)

  

   \<, \> escape sequences

  

  

  \`, \' escape sequences :

  

  

  

  as metacharacter

  

  

  in replacement text

  

  

  {} (braces) \{\} metacharacters

  

  in awk

  

  

  

  grouping sed commands in

  

  

  [] (brackets) metacharacters

  

  

  

  

  [..] metacharacters :

  

  

  

  ^ (circumflex)

  

  character classes and

  

  

  

  

  as metacharacter

  

  

  in multiline pattern space :

  

  : (colon) for labels :

  

  $ (dollar sign) as end-of-line metacharacter

  

  

  for last input line :

  

  in multiline pattern space :

   $0, $1, $2, ...

  

  

  . (dot) metacharacter

  

  

  

  = (equal sign)

  

  

  

  ! (exclamation point)

  

  != (not equal to) operator :

  

  !~ (does not match) operator

  

  

  branch command versus :

  

  

  

  

  

  > (greater than sign) >= (greater than or equal to) operator :

  

  for redirection

  

  

  

  relational operator :

  

  - (hyphen)

  

  

  

  

  

  

  < (less than sign) <= (less than or equal to) operator :

  

  relational operator :

  

  # for comments

  

  

  #n for suppressing output :

  

  #!, invoking awk with

  

  () (parentheses)

  

  

  

  with replacing text :

  

  % (percent sign)

  

  for format specifications :

  

  + (plus)

  += (assignment) operator :

  ++ (increment) operator :

  

  addition operator :

  

  

  

  as metacharacter

  

  

  ? (question mark) ?: (conditional) operator

  

  

  as metacharacter

  

  

  ; (semicolon)

  

  

  ' (single quotes)

  

  

  / (slash) /= (assignment) operator :

  

  // as delimiter

  

  

  

  

  in ed commands :

  

  pattern addressing

  

  ~ (match) operator

  

  

  | (vertical bar) || (logical OR) operator :

  

  as metacharacter

  

  

  

  


   Index: A

  

  

  abort statement (tawk) :

  

  

  

  addition (+) operator :

  

  addresses, line

  

  

  

  addressing by pattern

  

  

  

  adj script (example) : alignment of output fields :

  

  ampersand (&) && (logical AND) operator :

  

  in replacement text

  

  

  anchors

  

  

  AND (&&) operator :

  


  

  ARGI variable (tawk) :

  

  

  

  

  

  

  

  

  

  arithmetic functions

  

  

  arithmetic operators, awk :

  

  arrays

  

  

  deleting elements of

  

  

  

  multidimensional

  

  

  parsing strings into elements

  

  

  

  

  splitting :

  

  system variables that are :

  

  

  

  assigning input to variables :

  

  assignment operators, awk :

  

  

  

  asterisk (*)

  

  **= (assignment) operator :

  *= (assignment) operator :

  

  as metacharacter

  

  

  multiplication operator :

  

  

  

  awk

  

  

  

  


  built-in variables

  

  

  command-line syntax

  

  commands

  

  (see also under specific command) documentation for :

  

  

  

   ) invoking with #!

  

  

  

  obtaining : operators

  

  

  

  POSIX standards for :

  

  programming model :

  

  quick reference :

  

   with sed :

  

  system variables : versions of

  

  

  writing scripts in :

  

  AWKPATH variable (gawk) :

  

   Index: B

  

  

  \B escape sequence :

  

  backreferences : (see backslash (\)

  

   \<, \> escape sequences

  

  

  \`, \' escape sequences :

  

  

  

  as metacharacter

  

  

  in replacement text

  

  

  bang (!) : (see

  

  

  

  

  

  

  

  BEGINFILE procedure (tawk) :

  

  beginning ) of word : (see

  Bell Labs awk :

  

  BITFTP :

  

  

  

  variables as Boolean patterns :

  

  

  

  

  breaking lines :

  

  

  

  

  

  built-in functions

  

  

  

  

  

  built-in variables

  

  branch command : (see branching :

  

  

  

  braces {} : \{\} metacharacters

  

  

  in awk

  

  

  

  grouping sed commands in

  

  as metacharacters

  bracket expressions :

  

  brackets []

  

  [..] metacharacters :

  

  

  


   Index: C

  

  

  capitalization, converting

  

  

  

  

  case sensitivity

  

  

  

  

  

  

  

  

   character classes :

  

  characters

  

  matching at word start/end :

  

  measured span of

  

  

  metacharacters : (see )

  

  circumflex (^)

  

  character classes and

  

  

  

  

  as metacharacter

  

  

  in multiline pattern space :

  

  close()

  

  

  closing files/pipes

  

  

  

  

  

  

  colon (:) for labels :

  

  

  

  combine script (example) : "command garbled" message

  

  

  command-line options, gawk :

  

  command-line parameters array of :

  

  passing into script :

  

  command-line syntax

  

  commands

  

  (see also under specific command)

  

  grouping

  

  

  

  

  multiple :

  

  order of :

  

  sed

  

  

  

  case :

  

  

  

  CONVFMT variable

  

  

  

  cos() :

  

  

  

  

  cross-referencing scheme :

  

  csh shell

  

  

  curly braces : (see customizing functions :

  

  converting :

   comments

  

  

  

  in awk scripts :

  

  

  

  comparing

  

  

  concatenation

  

  

  

  conditional statements

  

  

  constants : constants, hexadecimal (tawk) :

  

  

  

  

  

  

  


   Index: D

  

  

  lines

  

  

  deleting array elements

  delete command (sed) : (see ) delete statement (awk)

  

  

  

  

  

  defining functions :

  

  d command (ed) :

  

  

  

   ) debugging :

  

  with P and N commands :

  

  D command (sed)

  

  

  

  

  d command (sed)

  

  delimiters

  

  

  awk

  

  

  

  

  FIELDWIDTHS variable (gawk) :

  

  FS variable

  

  

  

  

  

  for regular expressions

  

  

  

  

  /dev files

  

  

  

  

  division (/) operator :

  

  

  

  

  

  for last input line :

  

  as metacharacter :

  

  in multiline pattern space :

  

   dot (.) metacharacter

  

  

  

  

  


   Index: E

  e (constant) :

  

  -e option (sed)

  

  

  

  

  

  

  egrep program :

  

   ) end

  

  

  ENDFILE procedure (tawk) :

  

  "Ending delimiter missing" error :

  

  ENVIRON variable

  

  

  environment variables :

  

  equal sign (=)

  

  

  

  

  

  

  

  

  

  error messages

  

  

  

  

  "command garbled"

  

  

  sed :

  

  errors debugging :

  

  spelling, finding (example) :

  

  

  

  escape sequences, awk

  

  

  escaping : (see exchange command : (see

  

  != (not equal to) operator :

  

  !~ (does not match) operator

  

  

  

  

  

  

  

  

  

  

  

  

  exponentiation

  

  

  

  

  ^ operator :

  

  expressions

  

  

  

  executing as commands :

  

  regular : (see

  

  extensions common awk :

  

  

  

  

   extent of matching :

  

4.4.3. Extracting Contents of a File

   Index: F

  -f option (awk)
  -f option (sed)
  -F option (awk)

  factorials :

  fflush() :

  field separators : (see

  fields for awk records

  FIELDWIDTHS variable (gawk) :

  files

  closing

  editing multiple :

  extracting contents from :

  

  

  

  

  

  

  

   getting information on :

  

  

  

  multiple :

  

  multiple edits to :

  

  nextfile statement :

  

  reading from

  

  

  retrieving information from :

  

  scripts as : (see

  

  special gawk :

  

  writing to

  

  

  

  fixed strings :

  

  

  

  g (global)

  

  

  numeric :

  

  

  

  w (write) :

  

  flow control

  

  

  

  branching :

  

  

  

  n command :

  

  flushing buffers :

  

  FNR variable

  

  

  for loop

  

  

  formatting awk output :

  

  

  

  

  

  FPAT variable (tawk) :

  

   FS variable

  

  

  

  

  

  

  functions

  

  arithmetic

  

  

  built-in

  

  

  

  creating library of :

  

  scope control (tawk) :

  

  string-related :

  

  time-related (gawk) :

  

  

  

  user-defined :


   Index: G

  g command (ed) :

  

  G command (sed)

  

  

  g flag

  

  

  gawk (GNU awk)

  

  

  built-in functions :

  

  

  

  multiple files and :

  

  generating random numbers :

  

  gensub() :

  

  gent script : )

  

  getline function

  

  

  global addressing :

  

  

  

  global command : (see ) glossary program (example) :

  

   ) GNU project : GNU sed, error messages :

  

  greater than sign (>)

  >= (greater than or equal to) operator :

  for redirection

  

  

  relational operator :

  

  grep utility

  

  

  

   grouping sed commands

  

  gsub()

  

   Index: H

  h command (sed)

  H command :

  hold command

  hold space

  hyphen (-)

   Index: I

  i command (sed) :

  

  I/O : (see

  

  

  

  in operator

  

  

  increment (++) operator :

  

  index()

  

  

  index, array

  

  

  

  

  

  

  input :

  

  

  

  

  

  insert command : (see instructions, awk :

  

  int() :

  

  Interleaf files, converting :


   Index: K

  Korn shell :


   Index: L

  deleting

  

  

  

  line editors :

  

  lines continuing after breaks :

  

  

  

  

  matching over multiple :

  

  matching start/end of

  

  

  

  

  

  line addresses

  l command (sed) :

  

  

  labels :

  

  length()

  

  

  length, string :

  less than sign (<) <= (less than or equal to) operator :

  

  

  relational operator :

  

  library of functions :

  

  

  

  

  

  

  list command : (see

  

  

  

  

  

  logical AND (&&) operator :

  

  logical NOT (!) operator :

  

  logical OR (||) operator :

  

  loops

  

  

  arrays and :

  

  main input loop :

  

  lowercase : (see ) ls command (UNIX) :

  

   Index: M

  m1 script (example) :

  main input loop :

  match (~) operator

  match()

  mawk (Michael's awk)

  menu-based command generator (example) :

  metacharacters

  \ (backslash)

  [] (brackets)

  \{\}

  * (asterisk)

  

  

  ^ (circumflex)

  

  

  $ (dollar sign)

  

  

  . (dot)

  

  

  

  + (plus)

  

  

  

  ? (question mark)

  

  

  | (vertical bar)

  

  

  \> (end of word)

  

  

  \< (start of word)

  

  

  awk regular expression :

  

  

  

  

  -mf option (awk) :

  

  Michael's awk : (see

  

  MKS Toolkit : modularization :

  

  

  

  -mr option (awk) :

  

  multidimensional arrays :

   multiline ) matching :

  

  

  

  

  

  multiple

  

  

  

  conditional statements :

  

  edits to one file :

  

  files

  

  

  


   Index: N

  

  

  

  

  with P and D commands :

  

  

  

  -n option (sed)

  

  

  names filenames, special

  

  

  script files :

  

  

  

  nested conditional statements :

  

  newline characters

  

  

  

  

  . (dot) and :

  

  

  

  ORS variable for

  

  

  in replacement strings :

  

  RS variable for

  

  

  

  newsgroup, awk :

   ) next statement (awk)

  

  

   ) nextfile statement :

  

  NF variable

  

  

  non-English characters :

  

  

  

  NR variable

  

  

  

  nroff, stripping non-printable characters :

  

  numbered replacement strings :

  

  numbers arithmetic functions

  

  

  

  

  converting to strings :

  

  factorials :

  

  hexadecimal

  

  

  

  

  

  

  output precision :

  

  random :

  

  truncating :

  

  


   Index: O

  obtaining commercial awk versions

  

  

  

  examples :

  

  

  

  sample programs : octal numbers :

  

  octothorpe (#) for comments

  

  

  #n for suppressing output :

  

  OFMT variable

  

  

  

  

   arithmetic :

  

  assignment :

  

  Boolean :

  

  postfix versus prefix :

  

  relational :

  

  options

  

  

  (see also under specific option)

  

  combining :

  

  sed :

  

  OR (||) operator :

  

  

  

  commands in scripts :

  

  ORS variable

  

  

  output :

  

  buffer, flushing :

  

  directing to files

  

  

  

  

  

  

  

  

  

  

  suppressing

  

  

  

  

  to terminal :


   Index: P

  

  

  P command (sed) :

  

  with N and D commands :

  

  

  

  parameters, function :

  

  

  

  parentheses ()

  

  

  

  with replacing text :

  

  

  

  referencing fields

  

  

  strings into array elements

  

  

  pattern addressing

  

  pattern matching

  

  

  

   in awk :

  

  

  

  extent of match :

  

  fixed strings :

   metacharacters for : (see over multiple lines :

  

  phrases :

  

  at start/end of words :

  

  pattern space

  

  

   ) insert and append commands and :

  

  

  

  patterns

  

  

   percent sign (%)

  

  for format specifications :

  

  modulo operator :

  

  phonebill script (example) :

  

  phrases :

  

  pipelined edits :

  

  pipes

  

  

  

  

  

  reading input from :

  

   plus (+)

  += (assignment) operator :

  ++ (increment) operator :

  

  addition operator :

  

  

  

  as metacharacter

  

   POSIX character class additions

  

  

  

  option combinations, standard for :

  

  

  

  

  --posix option (gawk) : 11.2.3.1. Command line options

  

  precedence, operation

  

  precision modifier :

  

  prefix operators :

  

  print command (sed) : (see ) print statement (awk)

  

  

  

  printerr()

  printf statement

  

  

  printing

  

  line addresses with = :

  

  procedures

  

  

  PROCINFO array (gawk) :

  programming

  

  sed, tips for :

  

  prompts :

  

  

  


   Index: Q

  

  

  question mark (?) ?: (conditional) operator

  

  

  as metacharacter

  

  

   quotation marks (')

  

   Index: R

  r command (sed) :

  read command : (see

  reading :

  from files

  from pipe :

  RECLEN variable (tawk) :

  records, awk

  arrays and :

  NR variable

  RT variable (gawk) :

  separators for : (see

  redirection

  --re-interval option (gawk) :

  

  

   referencing fields

  

  

  regular expressions

  

  

  

  

  delimiters for

  

  

  dynamic, faking :

  

  ed and :

  

  examples of :

  

  

  

  metacharacters for : (see

  

  RS variable as

  

  

  tawk and :

  

  union of

  

  

  

  

  relational expressions :

  

  replacement metacharacters :

  

  numbered saves :

  

  replacing text

  

  

  

  

  return statement

  

   RS variable :

  regular expression for

  

  

  RT variable (gawk) :

  

  rules, pattern-matching : (see )


   Index: S

  

  

  s command (sed) :

  

  sample programs : saving output

  

  

  

  scope control (tawk) :

  

  scripts

  

  

  

  debugging :

  

  examples of :

  

  

  

   modularizing :

  

  

  

  

  

  

  

  stopping, sed : (see ) search path for awk files :

  

  search-and-replace :

  

  matching extent :

  

  sed

  

  

  with awk :

  

  

  commands

  

  

  

  (see also under specific command) syntax of :

  

  

  

  

  

  obtaining : options

  

  

  (see also under specific option) programming tips for :

  

  quick reference : semicolon (;)

  

  

   shells

  

  

  (see also under specific shell name) sin() :

  

  single quotation marks (')

  

  

  slash (/) /= (assignment) operator :

  

  // as delimiter

  

  

  

  

  in ed commands :

  

  

  

  

  spell checker program (example) :

  

  spellcheck.awk program : split()

  

  

  sprintf() :

  

  

  

  

  srand() :

  

  

  

  start ) of word : (see stream editor

   ) special filenames

  

  

  

  

  sort program :

  

  

  


  

  space characters

  

  special characters

  as awk delimiters

  

  

  as string concatenator :

  

  span, character

  

  

  

  

  strftime() :

  

  strings

  

  

  comparing :

  

  concatenating :

  

  

  

  fixed : (see ) functions for :

  

  

  

  

  

  parsing into array elements

  

  

  substitution functions (awk)

  

  

  substrings :

  

  stripping non-printable characters :

  

  sub()

  

  

  

  SUBSEP variable :

  

   ) substituting text (sed)

  

  

   replacement metacharacters :

  

  substitution functions (awk)

  

  

  substr()

  

  

  

  substrings : (see ) subtraction (-) operator :

  

  syntax command-line

  

  sed commands :

  

  

  

  

  system variables

  

  

  

  


   Index: T

  t command (sed) :

  

  

  

  terminal, user :

  

   terminator, record :

  

   ) testing

  

  output :

  

  text blocks :

  

  

  

  tilde (~) operator

  

  

  time management :

  

  tolower()

  

  

  toupper()

  

  

  

   trigonometric functions :

  

  

  


   Index: U

  

  union of regular expressions

  uppercase : (see

  


   Index: V

  -v option (awk)

  

  

  variables

  

  

  

  as Boolean patterns :

  

  built-in

  

  

  

  

  environment :

  

  

  

  scope control (tawk) :

  

  system : (see versions, awk

  

  

  

  vertical bar (|) as metacharacter

  

  

  

  

  vi editor :

  

  Videosoft VSAwk :