The Tomes of Delphi Algorithm Free ebook download

  TE AM FL Y

   The Tomes of Delphi Algorithms and Data Structures

  Julian Bucknall Wordware Publishing, Inc.

  Library of Congress Cataloging-in-Publication Data Bucknall, Julian Tomes of Delphi: algorithms and data structures / by Julian Bucknall. p. cm. Includes bibliographical references and index.

  ISBN 1-55622-736-1 (pbk. : alk. paper) 1. Computer software—Development.

  2. Delphi (Computer file).

  3. Computer algorithms.

  4. Data structures (Computer science) I. Title. QA76.76.D47 .B825 2001 2001033258 005.1--dc21 CIP

  

© 2001, Wordware Publishing, Inc.

  

Code © 2001, Julian Bucknall

All Rights Reserved

2320 Los Rios Boulevard

Plano, Texas 75074

  

No part of this book may be reproduced in any form or by

any means without permission in writing from

Wordware Publishing, Inc.

  

Printed in the United States of America

  ISBN 1-55622-736-1 10 9 8 7 6 5 4 3 2 1 0105 Delphi is a trademark of Inprise Corporation.

  

Other product names mentioned are used for identification purposes only and may be trademarks of their respective companies.

  

All inquiries for volume purchases of this book should be addressed to Wordware Publishing, Inc., at the

above address. Telephone inquiries may be made by calling:

(972) 423-0090

  For Donna and the Greek cats

  Contents Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

  Chapter 1 What is an Algorithm? . . . . . . . . . . . . . . . . . . . . . . . . 1 What is an Algorithm? . . . . . . . . . . . . . . . . . . . . . . . . . 1 Analysis of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 3 The Big-Oh Notation . . . . . . . . . . . . . . . . . . . . . . . . . 6 Best, Average, and Worst Cases . . . . . . . . . . . . . . . . . 8 Algorithms and the Platform . . . . . . . . . . . . . . . . . . . . . . 8 Virtual Memory and Paging . . . . . . . . . . . . . . . . . . . . . 9 Thrashing . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Locality of Reference . . . . . . . . . . . . . . . . . . . . . 11 The CPU Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Data Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Space Versus Time Tradeoffs . . . . . . . . . . . . . . . . . . . . 14 Long Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 Use const . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Be Wary of Automatic Conversions . . . . . . . . . . . . . . . . . 17 Debugging and Testing . . . . . . . . . . . . . . . . . . . . . . . . 18 Assertions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Coverage Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 23 Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Chapter 2 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Array Types in Delphi . . . . . . . . . . . . . . . . . . . . . . . . . 28 Standard Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . 28 Dynamic Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 New-style Dynamic Arrays . . . . . . . . . . . . . . . . . . . . . 40 TList Class, an Array of Pointers. . . . . . . . . . . . . . . . . . . . 41 Overview of the TList Class . . . . . . . . . . . . . . . . . . . . . 41 TtdObjectList Class . . . . . . . . . . . . . . . . . . . . . . . . . 43

  Arrays on Disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

  Chapter 3 Linked Lists, Stacks, and Queues . . . . . . . . . . . . . . . . . 63 Singly Linked Lists. . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Linked List Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Creating a Singly Linked List . . . . . . . . . . . . . . . . . . . . 65 Inserting into and Deleting from a Singly Linked List . . . . . . . 65 Traversing a Linked List . . . . . . . . . . . . . . . . . . . . . . 68 Efficiency Considerations . . . . . . . . . . . . . . . . . . . . . . 69 Using a Head Node . . . . . . . . . . . . . . . . . . . . . . 69 Using a Node Manager . . . . . . . . . . . . . . . . . . . . 70 The Singly Linked List Class . . . . . . . . . . . . . . . . . . . . 76 Doubly Linked Lists . . . . . . . . . . . . . . . . . . . . . . . . . . 84 Inserting and Deleting from a Doubly Linked List . . . . . . . . . 85 Efficiency Considerations . . . . . . . . . . . . . . . . . . . . . . 88 Using Head and Tail Nodes . . . . . . . . . . . . . . . . . . 88 Using a Node Manager . . . . . . . . . . . . . . . . . . . . 88 The Doubly Linked List Class . . . . . . . . . . . . . . . . . . . . 88 Benefits and Drawbacks of Linked Lists . . . . . . . . . . . . . . . . 96 Stacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Stacks Using Linked Lists . . . . . . . . . . . . . . . . . . . . . . 97 Stacks Using Arrays . . . . . . . . . . . . . . . . . . . . . . . . 100 Example of Using a Stack . . . . . . . . . . . . . . . . . . . . . 103 Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Queues Using Linked Lists . . . . . . . . . . . . . . . . . . . . 106 Queues Using Arrays . . . . . . . . . . . . . . . . . . . . . . . 109 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Chapter 4 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Compare Routines . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Sequential Search . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Linked Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Linked Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Inserting into Sorted Containers . . . . . . . . . . . . . . . . . 129 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Chapter 5 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Sorting Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Shuffling a TList . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Sort Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Slowest Sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Bubble Sort. . . . . . . . . . . . . . . . . . . . . . . . . . 138

  Shaker Sort. . . . . . . . . . . . . . . . . . . . . . . . . . 140 Selection Sort . . . . . . . . . . . . . . . . . . . . . . . . 142 Insertion Sort. . . . . . . . . . . . . . . . . . . . . . . . . 144

  Fast Sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Shell Sort. . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Comb Sort . . . . . . . . . . . . . . . . . . . . . . . . . . 150

  Fastest Sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Merge Sort . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Quicksort . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

  Merge Sort with Linked Lists . . . . . . . . . . . . . . . . . . . 176 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

  Chapter 6 Randomized Algorithms . . . . . . . . . . . . . . . . . . . . . 183 Random Number Generation . . . . . . . . . . . . . . . . . . . . 184 Chi-Squared Tests . . . . . . . . . . . . . . . . . . . . . . . . . 185 Middle-Square Method . . . . . . . . . . . . . . . . . . . . . . 188 Linear Congruential Method . . . . . . . . . . . . . . . . . . . 189 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 The Uniformity Test . . . . . . . . . . . . . . . . . . . . . 195 The Gap Test . . . . . . . . . . . . . . . . . . . . . . . . . 195 The Poker Test . . . . . . . . . . . . . . . . . . . . . . . . 197 The Coupon Collector’s Test . . . . . . . . . . . . . . . . . 198 Results of Applying Tests . . . . . . . . . . . . . . . . . . . . . 200 Combining Generators . . . . . . . . . . . . . . . . . . . . 201 Additive Generators . . . . . . . . . . . . . . . . . . . . . 203 Shuffling Generators . . . . . . . . . . . . . . . . . . . . . 205 Summary of Generator Algorithms . . . . . . . . . . . . . . . . 207 Other Random Number Distributions . . . . . . . . . . . . . . . . 208 Skip Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 Searching through a Skip List . . . . . . . . . . . . . . . . . . . 211 Insertion into a Skip List . . . . . . . . . . . . . . . . . . . . . 215 Deletion from a Skip List . . . . . . . . . . . . . . . . . . . . . 218 Full Skip List Class Implementation. . . . . . . . . . . . . . . . 219 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 Chapter 7 Hashing and Hash Tables . . . . . . . . . . . . . . . . . . . . . 227 Hash Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 Simple Hash Function for Strings . . . . . . . . . . . . . . . . . 230 The PJW Hash Functions . . . . . . . . . . . . . . . . . . . . . 230 Collision Resolution with Linear Probing . . . . . . . . . . . . . . 232 Advantages and Disadvantages of Linear Probing . . . . . . . . 233 Deleting Items from a Linear Probe Hash Table . . . . . . . . . 235 The Linear Probe Hash Table Class . . . . . . . . . . . . . . . . 237 Other Open-Addressing Schemes . . . . . . . . . . . . . . . . . . 245 Quadratic Probing . . . . . . . . . . . . . . . . . . . . . . . . . 246

  Pseudorandom Probing . . . . . . . . . . . . . . . . . . . . . . 246 Double Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . 247

  Collision Resolution through Chaining . . . . . . . . . . . . . . . 247 Advantages and Disadvantages of Chaining . . . . . . . . . . . 248 The Chained Hash Table Class . . . . . . . . . . . . . . . . . . 249

  Collision Resolution through Bucketing . . . . . . . . . . . . . . . 259 Hash Tables on Disk . . . . . . . . . . . . . . . . . . . . . . . . . 260

  Extendible Hashing . . . . . . . . . . . . . . . . . . . . . . . . 261 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

  Chapter 8 Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 Creating a Binary Tree . . . . . . . . . . . . . . . . . . . . . . . . 279 Insertion and Deletion with a Binary Tree . . . . . . . . . . . . . . 279 Navigating through a Binary Tree . . . . . . . . . . . . . . . . . . 281 Pre-order, In-order, and Post-order Traversals . . . . . . . . . . 282 Level-order Traversals . . . . . . . . . . . . . . . . . . . . . . . 288 Class Implementation of a Binary Tree . . . . . . . . . . . . . . . 289 Binary Search Trees . . . . . . . . . . . . . . . . . . . . . . . . . 295 Insertion with a Binary Search Tree. . . . . . . . . . . . . . . . 298 Deletion from a Binary Search Tree . . . . . . . . . . . . . . . . 300 Class Implementation of a Binary Search Tree . . . . . . . . . . 303 Binary Search Tree Rearrangements . . . . . . . . . . . . . . . 304 Splay Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 Class Implementation of a Splay Tree. . . . . . . . . . . . . . . 309 Red-Black Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 Insertion into a Red-Black Tree . . . . . . . . . . . . . . . . . . 314 Deletion from a Red-Black Tree . . . . . . . . . . . . . . . . . . 319 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 Chapter 9 Priority Queues and Heapsort . . . . . . . . . . . . . . . . . . 331 The Priority Queue . . . . . . . . . . . . . . . . . . . . . . . . . . 331 First Simple Implementation . . . . . . . . . . . . . . . . . . . 332 Second Simple Implementation . . . . . . . . . . . . . . . . . . 335 The Heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 Insertion into a Heap . . . . . . . . . . . . . . . . . . . . . . . 338 Deletion from a Heap . . . . . . . . . . . . . . . . . . . . . . . 338 Implementation of a Priority Queue with a Heap. . . . . . . . . 340 Heapsort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 Floyd’s Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 345 Completing Heapsort . . . . . . . . . . . . . . . . . . . . . . . 346 Extending the Priority Queue . . . . . . . . . . . . . . . . . . . . 348 Re-establishing the Heap Property . . . . . . . . . . . . . . . . 349 Finding an Arbitrary Item in the Heap . . . . . . . . . . . . . . 350 Implementation of the Extended Priority Queue . . . . . . . . . 350 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

  Chapter 10 State Machines and Regular Expressions . . . . . . . . . . . . 357 State Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Using State Machines: Parsing . . . . . . . . . . . . . . . . . . 357 Parsing Comma-Delimited Files . . . . . . . . . . . . . . . 363 Deterministic and Non-deterministic State Machines. . . . . . . 366 Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . 378 Using Regular Expressions . . . . . . . . . . . . . . . . . . . . 380 Parsing Regular Expressions . . . . . . . . . . . . . . . . . 380 Compiling Regular Expressions . . . . . . . . . . . . . . . 387 Matching Strings to Regular Expressions. . . . . . . . . . . 399 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 Chapter 11 Data Compression . . . . . . . . . . . . . . . . . . . . . . . . 409 Representations of Data . . . . . . . . . . . . . . . . . . . . . . . 409 Data Compression . . . . . . . . . . . . . . . . . . . . . . . . . . 410 Types of Compression . . . . . . . . . . . . . . . . . . . . . . . 410 Bit Streams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 Minimum Redundancy Compression. . . . . . . . . . . . . . . . . 415 Shannon-Fano Encoding . . . . . . . . . . . . . . . . . . . . . 416 Huffman Encoding . . . . . . . . . . . . . . . . . . . . . . . . 421 Splay Tree Encoding. . . . . . . . . . . . . . . . . . . . . . . . 435 Dictionary Compression . . . . . . . . . . . . . . . . . . . . . . . 445 LZ77 Compression Description . . . . . . . . . . . . . . . . . . 445 Encoding Literals Versus Distance/Length Pairs . . . . . . . 448 LZ77 Decompression . . . . . . . . . . . . . . . . . . . . . 449 LZ77 Compression . . . . . . . . . . . . . . . . . . . . . . 456 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 Chapter 12 Advanced Topics. . . . . . . . . . . . . . . . . . . . . . . . . . 469 Readers-Writers Algorithm . . . . . . . . . . . . . . . . . . . . . . 469 Producers-Consumers Algorithm. . . . . . . . . . . . . . . . . . . 478 Single Producer, Single Consumer Model . . . . . . . . . . . . . 478 Single Producer, Multiple Consumer Model. . . . . . . . . . . . 486 Finding Differences between Two Files . . . . . . . . . . . . . . . 496 Calculating the LCS of Two Strings . . . . . . . . . . . . . . . . 497 Calculating the LCS of Two Text Files. . . . . . . . . . . . . . . 511 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Epilogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518

  Introduction

  You’ve just picked this book up in the bookshop, or you’ve bought it, taken it home and opened it, and now you’re wondering…

  Why a Book on Delphi Algorithms? Y

  Although there are numerous books on algorithms in the bookstores, few of them go beyond the standard Computer Science 101 course to approach algo- rithms from a practical perspective. The code that is shown in the book is to

  FL

  illustrate the algorithm in question, and generally no consideration is given to real-life, drop-in-and-use application of the technique being discussed. Even worse, from the viewpoint of the commercial programmer, many are text-

  AM

  books to be used in a college or university course and hence some of the more interesting topics are left as exercises for the reader, with little or no answers.

  TE

  Of course, the vast majority of them don’t use Delphi, Kylix, or Pascal. Some use pseudocode, some C, some C++, some the language du jour; and the most celebrated and referenced algorithms book uses an assembly language that doesn’t even exist (the MIX assembly language in The Art of Computer

  [11,12,13]—see the references section). Indeed, those books

Programming

  that do have the word “practical” in their titles are for C, C++, or Java. Is that such a problem? After all, an algorithm is an algorithm is an algorithm; surely, it doesn’t matter how it’s demonstrated, right? Why bother buying and reading one based on Delphi? Delphi is, I contend, unique amongst the languages and environments used in application development today. Firstly, like Visual Basic, Delphi is an environ- ment for developing applications rapidly, for either 16-bit or 32-bit Windows, or, using Kylix, for Linux. With dexterous use of the mouse, components rain on forms like rice at a wedding. Many double-clicks later, together with a lit- tle typing of code, the components are wedded together, intricately and intimately, with event handlers, hopefully producing a halfway decent-looking application.

  Secondly, like C++, Delphi can get close to the metal, easily accessing the various operating system APIs. Sometimes, Borland produces units to access APIs and sells them with Delphi itself; sometimes, programmers have to pore over C header files in an effort to translate them into Delphi (witness the Jedi project at http://www.delphi-jedi.org). In either case, Delphi can do the job and manipulate the OS subsystems to its own advantage. Delphi programmers do tend to split themselves into two camps: applications programmers and systems programmers. Sometimes you’ll find programmers who can do both jobs. The link between the two camps that both sets of pro- grammers must come into contact with and be aware of is the world of algorithms. If you program for any length of time, you’ll come to the point where you absolutely need to code a binary search. Of course, before you reach that point, you’ll need a sort routine to get the data in some kind of order for the binary search to work properly. Eventually, you might start using a profiler, identify a problem bottleneck in TStringList, and wonder what other data structure could do the job more efficiently.

  Algorithms are the lifeblood of the work we do as programmers. Beginner programmers are often afraid of formal algorithms; I mean, until you are used to it, even the word itself can seem hard to spell! But consider this: a program can be defined as an algorithm for getting information out of the user and producing some kind of output for her. The standard algorithms have been developed and refined by computer scien- tists for use in the programming trenches by the likes of you and me. Mastering the basic algorithms gives you a handle on your craft and on the language you use. For example, if you know about hash tables, their strengths and weaknesses, what they are used for and why, and have an implementa- tion you could use at a moment’s notice, then you will look at the design of the subsystem or application you’re currently working on in a new light, and identify places where you could profitably use one. If sorts hold no terrors for you, you understand how they work, and you know when to use a selection sort versus a quicksort, then you’ll be more likely to code one in your applica- tion, rather than try and twist a standard Delphi component to your needs (for example, a modern horror story: I remember hearing about someone who used a hidden TListBox component, adding a bunch of strings, and then setting the Sorted property to true to get them in order). “OK,” I hear you say, “writing about algorithms is fine, but why bother with

Delphi or Kylix?”

  By the way, let’s set a convention early on; otherwise I shall be writing the phrase “Delphi or Kylix” an awful lot. When I say “Delphi,” I really mean either Delphi or Kylix. Kylix was, after all, known for much of its pre-release

life as “Delphi” for Linux. In this book, then, “Delphi” means either Delphi for Windows or Kylix for Linux. So, why Delphi? Well, two reasons: the Object Pascal language and the oper- ating system. Delphi’s language has several constructs that are not available in other languages, constructs that make encapsulating efficient algorithms and data structures easier and more natural. Things like properties, for exam- ple. Exceptions for when unforeseen errors occur. Although it is perfectly possible to code standard algorithms in Delphi without using these Delphi- specific language constructs, it is my contention that we miss out on the beauty and efficiency of the language if we do. We miss out on the ability to learn about the ins and outs of the language. In this book, we shall deliber- ately be using the breadth of the Object Pascal language in Delphi—I’m not concerned that Java programmers who pick up this book may have difficulty translating the code. The cover says Delphi, and Delphi it will be.

  And the next thing to consider is that algorithms, as traditionally taught, are generic, at least as far as CPUs and operating systems are concerned. They can certainly be optimized for the Windows environment, or souped up for Linux. They can be made more efficient for the various varieties of Pentium processor we use, with the different types of memory caches we have, with the virtual memory subsystem in the OS, and so on. This book pays particular attention to these efficiency gains. We won’t, however, go as far as coding everything in Assembly language, optimized for the pipelined architecture of modern processors—I have to draw the line somewhere! So, all in all, the Delphi community does have need for an algorithms book, and one geared for their particular language, operating system, and proces- sor. This is such a book. It was not translated from another book for another language; it was written from scratch by an author who works with Delphi every day of his life, someone who writes library software for a living and knows about the intricacies of developing commercial ready-to-run routines, classes, and tools.

What Should I Know?

  This book does not attempt to teach you Delphi programming. You will need to know the basics of programming in Delphi: creating new projects, how to write code, compiling, debugging, and so on. I warn you now: there are no components in this book. You must be familiar with classes, procedure and method references, untyped pointers, the ubiquitous TList, and streams as encapsulated by Delphi’s TStream family. You must have some understanding of object-oriented concepts such as encapsulation, inheritance, polymor- phism, and delegation. The object model in Delphi shouldn’t scare you! Having said that, a lot of the concepts described in this book are simple in the extreme. A beginner programmer should find much in the book to teach him or her the basics of standard algorithms and data structures. Indeed, looking at the code should teach such a programmer many tips and tricks of the advanced programmer. The more advanced structures can be left for a rainy day, or when you think you might need them.

  So, essentially, you need to have been programming in Delphi for a while. Every now and then you need some kind of data structure beyond what TList and its family can give you, but you’re not sure what’s available, or even how to use it if you found one. Or, you want a simple sort routine, but the only reference book you can find has code written in C++, and to be honest you’d rather watch paint dry than translate it. Or, you want to read an algorithms book where performance and efficiency are just as prominent as the descrip- tion of the algorithm. This book is for you.

Which Delphi Do I Need?

  Are you ready for this? Any version. With the exception of the section discuss- ing dynamic arrays using Delphi 4 or above and Kylix in Chapter 2, and parts of Chapter 12, and little pieces here and there, the code will compile and run with any version of Delphi. Apart from the small amount of the version- specific code I have just mentioned, I have tested all code in this book with all versions of Delphi and with Kylix.

  You can therefore assume that all code printed in this book will work with every version of Delphi. Some code listings are version-specific though, and have been so noted.

What Will I Find, and Where? This book is divided into 12 chapters and a reference section

  Chapter 1 lays out some ground rules. It starts off by discussing performance. We’ll look at measurement of the efficiency of algorithms, starting out with the big-Oh notation, continuing with timing of the actual run time of algo- rithms, and finishing with the use of profilers. We shall discuss data representation efficiency in regard to modern processors and operating sys- tems, especially memory caches, paging, and virtual memory. After that, the chapter will talk about testing and debugging, topics that tend to be glossed over in many books, but that are, in fact, essential to all programmers.

  Chapter 2 covers arrays. We’ll look at the standard language support for arrays, including dynamic arrays; we’ll discuss the TList class; and we’ll cre- ate a class that encapsulates an array of records. Another specialized array is the string, so we’ll take a look at that too. Chapter 3 introduces linked lists, both the singly and doubly linked varieties. We’ll see how to create stacks and queues by implementing them with both singly linked lists and arrays. Chapter 4 talks about searching algorithms, especially the sequential and the binary search algorithms. We’ll see how binary search helps us to insert items into a sorted array or linked list.

  Chapter 5 covers sorting algorithms. We will look at various types of sorting methods: bubble, shaker, selection, insertion, Shell sort, quicksort, and merge sort. We’ll also sort arrays and linked lists.

  Chapter 6 discusses algorithms that create or require random numbers. We’ll see pseudorandom number generators (PRNGs) and show a remarkable sorted data structure called a skip list, which uses a PRNG in order to help balance the structure.

  Chapter 7 considers hashing and hash tables, why they’re used, and what benefits and drawbacks they have. Several standard hashing algorithms are introduced. One problem that occurs with hash tables is collisions; we shall see how to resolve this by using a couple of types of probing and also by chaining.

  Chapter 8 presents binary trees, a very important data structure in wide gen- eral use. We’ll look at how to build and maintain a binary tree and how to traverse the nodes in the tree. We’ll also address its unbalanced trees created by inserting data in sorted order. A couple of balancing algorithms will be shown: splay trees and red-black trees.

  Chapter 9 deals with priority queues and, in doing so, shows us the heap structure. We’ll consider the important heap operations, bubble up and trickle down, and look at how the heap structure gives us a sort algorithm for free: the heapsort.

  Chapter 10 provides information about state machines and how they can be used to solve a certain class of problems. After some introductory examples with finite deterministic state machines, the chapter considers regular expres- sions, how to parse them and compile them to a finite non-deterministic state machine, and then apply the state machine to accept or reject strings.

  Chapter 11 squeezes in some data compression techniques. Algorithms such as Shannon-Fano, Huffman, Splay, and LZ77 will be shown. Chapter 12 includes a variety of advanced topics that may whet your appetite for researching algorithms and structures. Of course, they still will be useful to your programming requirements.

  Finally, there is a reference section listing references to help you find out more about the algorithms described in this book; these references not only include other algorithms books but also academic papers and articles.

  What Are the Typographical Conventions?

  Normal text is written in this font, at this size. Normal text is used for discus- sions, descriptions, and diversions.

  Code listings are written in this font, at this size.

  Emphasized words or phrases, new words about to be defined, and variables will appear in italic. Dotted throughout the text are World Wide Web URLs and e-mail addresses which are italicized and underlined, like this: http://www.boyet.com/dads.

  Every now and then there will be a note like this. It’s designed to bring out some important point in the narrative, a warning, or a caution.

  What Are These Bizarre $IFDEFs in the Code?

  The code for this book has been written, with certain noted exceptions, to compile with Delphi 1, 2, 3, 4, 5, and 6, as well as with Kylix 1. (Later com- pilers will be supported as and when they come out; please see

  http://www.boyet.com/dads for the latest information.) Even with my best

  efforts, there are sometimes going to be differences in my code between the different versions of Delphi and Kylix. The answer is, of course, to $IFDEF the code, to have certain blocks compile with certain compilers but not others. Borland supplied us with the official WINDOWS, WIN32, and LINUX compiler defines for the platform, and the VERnnn compiler defined for the compiler version. To solve this problem, every source file for this book has an include at the top:

  {$I TDDefine.inc}

This include file defines human-legible compiler defines for the various com- pilers. Here’s the list:

  

DelphiN define for a particular Delphi version, N = 1,2,3,4,5,6 DelphiNPlus define for a particular Delphi version or later, N = 1,2,3,4,5,6 KylixN define for a particular Kylix version, N = 1 KylixNPlus define for a particular Kylix version or later, N = 1 HasAssert define if compiler supports Assert I also make the assumption that every compiler except Delphi 1 has support for long strings.

What about Bugs?

  This book is a book of human endeavor, written, checked, and edited by human beings. To quote Alexander Pope in An Essay on Criticism, “To err is human, to forgive, divine.” This book will contain misstatements of facts, grammatical errors, spelling mistakes, bugs, whatever, no matter how hard I try going over it with Fowler’s Modern English Usage, a magnifying glass, and a fine-toothed comb. For a technical book like this, which presents hard facts permanently printed on paper, this could be unforgivable. Hence, I shall be maintaining an errata list on my Web site, together with any bug fixes to the code. Also on the site you’ll find other articles that go into greater depth on certain topics than this book. You can always find the latest errata and fixes at http://www.boyet.com/dads. If you do find an error, I would be grateful if you would send me the details by e-mail to . I can then fix it and update the Web site.

  [email protected]

  Acknowledgments

  There are several people without whom this book would never have been completed. I’d like to present them in what might be termed historical order, the order of their influence on me. The first two are a couple of gentlemen I’ve never met or spoken to, and yet who managed to open my eyes to and kindle my enthusiasm for the world of algorithms. If they hadn’t, who knows where I might be now and what I might be doing. I’m speaking of Donald Knuth (http://www-cs-staff.stanford.

  

edu/~knuth/ ) and Robert Sedgewick (http://www.cs.princeton.edu/~rs/). In

  fact, it was the latter’s Algorithms [20] that started me off, it being the first algorithms book I ever bought, back when I was just getting into Turbo Pascal. Donald Knuth needs no real introduction. His masterly The Art of Com-

  puter Programming [11,12,13] remains at the top of the algorithms tree; I

  first used it at Kings College, University of London while working toward my B.Sc. Mathematics degree. Fast forwarding a few years, Kim Kokkonen is the next person I would like to thank. He gave me my job at TurboPower Software (http://www.turbo-

  

power.com ) and gave me the opportunity to learn more computer science than

  I’d ever dreamt of before. A big thank you, of course, to all TurboPower’s employees and those TurboPower customers I’ve gotten to know over the years. I’d also like to thank Robert DelRossi, our president, for encouraging me in this endeavor.

  Next is a small company, now defunct, called Natural Systems. In 1993, they produced a product called Data Structures for Turbo Pascal. I bought it, and, in my opinion, it wasn’t very good. Oh, it worked fine, but I just didn’t agree with its design or implementation and it just wasn’t fast enough. It drove me to write my freeware EZSTRUCS library for Borland Pascal 7, from which I derived EZDSL, my well-known freeware data structures library for Delphi. This effort was the first time I’d really gotten to understand data structures, since sometimes it is only through doing that you get to learn.

  Thanks also to Chris Frizelle, the editor and owner of The Delphi Magazine (http://www.thedelphimagazine.com). He had the foresight to allow me to pontificate on various algorithms in his inestimable magazine, finally succumbing to giving me my own monthly column: Algorithms Alfresco. With- out him and his support, this book might have been written, but it certainly wouldn’t have been as good. I certainly recommend a subscription to The

  Delphi Magazine

  , as it remains, in my view, the most in-depth, intelligent ref- erence for Delphi programmers. Thanks to all my readers, as well, for their suggestions and comments on the column. Next to last, thanks to all the people at Wordware (http://www.word-

  ware.com

  ), including my editors, publisher Jim Hill, and developmental edi- tor Wes Beckwith. Jim was a bit dubious at first when I proposed publishing a book on algorithms, but he soon came round to my way of thinking and has been very supportive during its gestation. I’d also like to give my warmest thanks to my tech editors: Steve Teixeira, the co-author of the tome on how to get the best out of Delphi, Delphi n Developer’s Guide (where, at the time of writing, n = 5), and my friend Anton Parris. Finally, my thanks and my love go to my wife, Donna (she chivvied me to write this book in the first place). Without her love, enthusiasm, and encour- agement, I’d have given up ages ago. Thank you, sweetheart. Here’s to the next one! Julian M. Bucknall Colorado Springs, April 1999 to February 2001 going to be discussing. As we’ll see, one of the main reasons for understand- ing and researching algorithms is to make our applications faster. Oh, I’ll agree that sometimes we need algorithms that are more space efficient rather than speed efficient, but in general, it’s performance we crave.

  Although this book is about algorithms and data structures and how to imple- ment them in code, we should also discuss some of the procedural algorithms as well: how to write our code to help us debug it when it goes wrong, how to test our code, and how to make sure that changes in one place don’t break something elsewhere.

What is an Algorithm? What is an Algorithm?

  As it happens, we use algorithms all the time in our programming careers, but we just don’t tend to think of them as algorithms: “They’re not algorithms, it’s just the way things are done.” An algorithm is a step-by-step recipe for performing some calculation or pro- cess. This is a pretty loose definition, but once you understand that algorithms are nothing to be afraid of per se, you’ll recognize and use them without further thought.

  Go back to your elementary school days, when you were learning addition. The teacher would write on the board a sum like this:

  45 17 + and then ask you to add them up. You had been taught how to do this: start with the units column and add the 5 and the 7 to make 12, put the 2 under the units column, and then carry 1 above the 4.

  1

  45 17 +

  then write underneath the tens column. And, you’d have arrived at the con- centrated answer: 62. Notice that what you had been taught was an algorithm to perform this and

  Y

  any similar addition. You were not taught how to add 45 and 17 specifically but were instead taught a general way of adding two numbers. Indeed, pretty soon, you could add many numbers, with lots of digits, by applying the same FL algorithm. Of course, in those days, you weren’t told that this was an algo- rithm; it was just how you added up numbers.

  AM

  In the programming world we tend to think of algorithms as being complex methods to perform some calculation. For example, if we have an array of customer records and we want to find a particular one (say, John Smith), we

  TE

  might read through the entire array, element by element, until we either found the John Smith one or reached the end of the array. This seems an obvious way of doing it and we don’t think of it being an algorithm, but it is—it’s known as a sequential search.

  There might be other ways of finding “John Smith” in our hypothetical array. For example, if the array were sorted by last name, we could use the binary

  search algorithm to find John Smith. We look at the middle element in the

  array. Is it John Smith? If so, we’re done. If it is less than John Smith (by “less than,” I mean earlier in alphabetic sequence), then we can assume that John Smith is in the first half of the array. If greater than, it’s in the latter half of the array. We can then do the same thing again, that is, look at the middle item and select the portion of the array that should have John Smith, slicing and dicing the array into smaller and smaller parts, until we either find it or the bit of the array we have left is empty. Well, that algorithm certainly seems much more complicated than our origi- nal sequential search. The sequential search could be done with a nice simple For loop with a call to Break at the right moment; the code for the binary search would need a lot more calculations and local variables. So it might seem that sequential search is faster, just because it’s simpler to code.

  Enter the world of algorithm analysis where we do experiments and try and formulate laws about how different algorithms actually work.

  Analysis of Algorithms

  Let’s look at the two possible searches for “John Smith” in an array: the sequential search and the binary search. We’ll implement both algorithms and then play with them in order to ascertain their performance attributes. Listing 1.1 is the simple sequential search.

  Listing 1.1: Sequential search for a name in an array

  function SeqSearch(aStrs : PStringArray; aCount : integer; const aName : string5) : integer; var i : integer; begin for i := 0 to pred(aCount) do if CompareText(aStrs^[i], aName) = 0 then begin

  Result := i; Exit; end ;

  Result := -1; end ;

  Listing 1.2 shows the more complex binary search. (At the present time we won’t go into what is happening in this routine—we discuss the binary search algorithm in detail in Chapter 4.) Listing 1.2: Binary search for a name in an array

  function BinarySearch(aStrs : PStringArray; aCount : integer; const aName : string5) : integer; var

  L, R, M : integer; CompareResult : integer; begin

  L := 0; R := pred(aCount); while (L <= R) do begin

  M := (L + R) div 2; CompareResult := CompareText(aStrs^[M], aName); if (CompareResult = 0) then begin

  Result := M; Exit; end else if (CompareResult < 0) then

  L := M + 1 else R := M - 1; end ; Result := -1; end ;

  Just by looking at both routines it’s very hard to make a judgment about performance. In fact, this is a philosophy that we should embrace whole- heartedly: it can be very hard to tell how speed efficient some code is just by looking at it. The only way we can truly find out how fast code is, is to run it. Nothing else will do. Whenever we have a choice between algorithms, as we do here, we should test and time the code under different environments, with different inputs, in order to ascertain which algorithm is better for our needs. The traditional way to do this timing is with a profiler. The profiler program loads up our test application and then accurately times the various routines we’re interested in. My advice is to use a profiler as a matter of course in all your programming projects. It is only with a profiler that you can truly deter- mine where your application spends most of its time, and hence which routines are worth your spending time on optimization tasks.

  The company I work for, TurboPower Software Company, has a professional profiler in its Sleuth QA Suite product. I’ve tested all of the code in this book under both StopWatch (the name of the profiling program in Sleuth QA Suite) and under CodeWatch (the resource and memory leak debugger in the suite). However, even if you do not have a profiler, you can still experiment and time routines; it’s just a little more awkward, since you have to embed calls to time routines in your code. Any profiler worth buying does not alter your code; it does its magic by modifying the executable in memory at run time.

  For this experiment with searching algorithms, I wrote the test program to do its own timing. Essentially, the code grabs the system time at the start of the code being timed and gets it again at the end. From these two values it can calculate the time taken to perform the task. Actually, with modern faster machines and the low resolution of the PC clock, it’s usually beneficial to time several hundred calls to the routine, from which we can work out an average. (By the way, this program was written for 32-bit Delphi and will not compile with Delphi 1 since it allocates arrays on the heap that are greater than Delphi 1’s 64 KB limit.) I ran the performance experiments in several different forms. First, I timed how long it took to find “Smith” in arrays containing 100, 1,000, 10,000, and 100,000 elements, using both algorithms and making sure that a “Smith” ele- ment was present. For the next series of tests, I timed how long it took to find

  “Smith” in the same set of arrays with both algorithms, but this time I ensured that “Smith” was not present. Table 1.1 shows the results of my tests. Table 1.1: Timing sequential and binary searches

Fail Success Sequential

Binary

  taken to perform a sequential search is proportional to the number of ele- ments in the array. We say that the execution characteristics of sequential search are linear. However, the binary search statistics are somewhat more difficult to charac- terize. Indeed, it even seems as if we’re falling into a timing resolution problem because the algorithm is so fast. The relationship between the time taken and the number of elements in the array is no longer a simple linear one. It seems to be something much less than this, and something that is not brought out by these tests.

  the number of elements tenfold results in a run time that’s increased the time

  

2.41

Here we get a much more impressive set of data. You can see that increasing

  2.50

  

2.06

100,000

  2.06

  

1.46

10,000

  1.47

  

0.57

1,000

  0.89

  100

  I reran the tests and scaled the binary timings by a factor of 100. Table 1.2: Retiming binary searches

  

0.02

As you can see, the timings make for some very interesting reading. The time

  100

  0.03

  

0.02

100,000

  0.02

  

0.01

10,000

  0.01

  

0.01

1,000

  0.01

  100

  10.84 100,000 149.42 106.35

  15.28

  

1.05

10,000

  1.44

  

0.10

1,000

  0.14