Reverse Engineering to Understand Data

30.3.2 Reverse Engineering to Understand Data

Reverse engineering of data occurs at different levels of abstraction. At the program level, internal program data structures must often be reverse engineered as part of an overall reengineering effort. At the system level, global data structures (e.g., files, databases) are often reengineered to accommodate new database management paradigms (e.g., the move from flat file to relational or object-oriented database systems). Reverse engineering of the current global data structures sets the stage for the introduction of a new systemwide database.

Internal data structures. Reverse engineering techniques for internal program data focus on the definition of classes of objects (see footnote 4). This is accomplished by examining the program code with the intent of grouping related program variables. In many cases, the data organization within the code identifies abstract data types. For example, record structures, files, lists, and other data structures often provide an initial indicator of classes.
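As a minimal sketch of grouping related program variables, the following Python fragment clusters variable names by a shared prefix. The prefix heuristic, the function name `group_by_prefix`, and the sample variable names are all assumptions made for illustration; the text does not prescribe a particular grouping rule, and real code would usually call for inspection of how the variables are actually used.

```python
from collections import defaultdict


def group_by_prefix(variable_names):
    """Group variable names by the text before the first underscore.

    Hypothetical heuristic: variables that share a prefix are treated as
    fields of the same candidate class.
    """
    groups = defaultdict(list)
    for name in variable_names:
        prefix, _, field_name = name.partition("_")
        # Names without an underscore have no obvious group; keep them apart.
        key = prefix if field_name else "(ungrouped)"
        groups[key].append(field_name or name)
    return dict(groups)


# Illustrative variable names as they might appear in a legacy program.
legacy_vars = [
    "cust_name", "cust_addr", "cust_balance",
    "inv_number", "inv_total",
    "eof",
]
candidates = group_by_prefix(legacy_vars)
for class_name, fields in candidates.items():
    print(class_name, fields)
```

Each resulting group is only a candidate: an engineer would still confirm that the grouped variables are read and written together before treating the group as a class.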

3 Often, specifications written early in the life history of a program are never updated. As changes are made, the code no longer conforms to the specification.

4 For a complete discussion of these object-oriented concepts, see Part Four of this book.

Relatively insignificant compromises in data structures can lead to potentially catastrophic problems in future years. Consider the Y2K problem as an example.

Breuer and Lano [BRE91] suggest the following approach for reverse engineering of classes:

1. Identify flags and local data structures within the program that record important information about global data structures (e.g., a file or database).

2. Define the relationship between flags and local data structures and the global data structures. For example, a flag may be set when a file is empty; a local data structure may serve as a buffer that contains the last 100 records acquired from a central database.

3. For every variable (within the program) that represents an array or file, list all other variables that have a logical connection to it.

These steps enable a software engineer to identify classes within the program that interact with the global data structures.
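The three steps above can be sketched in Python as a small bookkeeping exercise. This is an illustration under stated assumptions, not the procedure given in [BRE91]: the `Variable` record, its `kind` labels, and the sample variables are hypothetical, and the links between variables are entered by hand rather than derived from code analysis.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    kind: str               # hypothetical labels: "flag", "local", "file", "array", "scalar"
    describes: str = ""     # step 1: the global structure this variable records information about
    relationship: str = ""  # step 2: how the variable relates to that structure


def related_variables(variables):
    """Step 3: for each file or array variable, list the variables connected to it."""
    result = {v.name: [] for v in variables if v.kind in ("file", "array")}
    for v in variables:
        if v.describes in result:
            result[v.describes].append((v.name, v.relationship))
    return result


# Illustrative inventory of variables found in a program.
program_vars = [
    Variable("customer_file", "file"),
    Variable("file_empty", "flag", "customer_file", "set when the file is empty"),
    Variable("recent_records", "local", "customer_file", "buffer of recently read records"),
    Variable("loop_index", "scalar"),
]
candidate_classes = related_variables(program_vars)
for global_name, members in candidate_classes.items():
    print(global_name, members)
```

Here `customer_file` together with its flag and buffer forms one candidate class, while `loop_index` is left out because it has no recorded connection to a global data structure.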

Database structure. Regardless of its logical organization and physical structure, a database allows the definition of data objects and supports some method for establishing relationships among the objects. Therefore, reengineering one database schema into another requires an understanding of existing objects and their relationships.

What steps can be applied to reverse engineer an existing database structure?

The following steps [PRE94] may be used to define the existing data model as a precursor to reengineering a new database model:

1. Build an initial object model. The classes defined as part of the model may be acquired by reviewing records in a flat file database or tables in a relational schema. The items contained in records or tables become attributes of a class.

2. Determine candidate keys. The attributes are examined to determine whether they are used to point to another record or table. Those that serve as pointers become candidate keys.

3. Refine the tentative classes. Determine whether similar classes can be combined into a single class.

4. Define generalizations. Examine classes that have many similar attributes to determine whether a class hierarchy should be constructed with a generalization class at its head.

5. Discover associations. Use techniques that are analogous to the CRC approach (Chapter 21) to establish associations among classes.
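The steps above can be illustrated with a short Python sketch that works on a toy schema. Everything here is an assumption made for the example: the table and column names, the helper function names, and in particular the rule that a column named `<table>_id` points to the table `<table>`. Real schemas need their own rules for spotting pointer attributes, and steps 3 through 5 ultimately depend on human judgment that the code only supports.

```python
def build_object_model(tables):
    """Step 1: each table becomes a tentative class; its columns become attributes."""
    return {name: set(columns) for name, columns in tables.items()}


def find_candidate_keys(model):
    """Step 2: find attributes that appear to point to another table.

    Assumption: a column named '<table>_id' refers to the table '<table>'.
    """
    keys = {}
    for cls, attrs in model.items():
        for attr in sorted(attrs):
            target = attr[:-3] if attr.endswith("_id") else None
            if target in model and target != cls:
                keys.setdefault(cls, []).append((attr, target))
    return keys


def shared_attributes(model, a, b):
    """Steps 3 and 4: attributes two classes have in common, as input to
    deciding whether to merge them or introduce a generalization class."""
    return model[a] & model[b]


def find_associations(keys):
    """Step 5 (simplified): treat each pointer attribute as an association."""
    return sorted((cls, target) for cls, refs in keys.items() for _, target in refs)


# Illustrative schema: table name -> column names.
tables = {
    "customer": ["customer_id", "name", "address"],
    "supplier": ["supplier_id", "name", "address"],
    "order": ["order_id", "customer_id", "total"],
}
model = build_object_model(tables)
keys = find_candidate_keys(model)
common = shared_attributes(model, "customer", "supplier")
links = find_associations(keys)
print(keys, common, links)
```

In this toy schema, `customer` and `supplier` share `name` and `address`, which suggests examining them for a common generalization class, and the `customer_id` column in `order` yields an association from `order` to `customer`.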

Once the information defined in the preceding steps is known, a series of transformations [PRE94] can be applied to map the old database structure into a new database structure.