Maintaining the Data Repository

Consider a large and active Internet database application of the kind used by e-commerce companies, for instance, one used by a company that sells clothing over the Internet. Such a system may involve data from several different databases, dozens of different Web pages, and hundreds, or even thousands, of users.

Suppose that the company using this application decides to expand its product line to include the sale of sporting goods. Senior management of this company might ask the DBA to develop an estimate of the time and other resources required to modify the database application to support this new product line.

Figure 9-19
Summary of the DBA’s Responsibilities for Managing the DBMS

• Generate database application performance reports
• Investigate user performance complaints
• Assess need for changes in database structure or application design
• Modify database structure
• Evaluate and implement new DBMS features
• Tune the DBMS

To respond to this request, the DBA needs accurate metadata about the database, about the database applications and application components, about the users and their rights and privileges, and about other system elements. The database does carry some of this metadata in system tables, but this metadata is inadequate to answer the questions posed by senior management. The DBA needs additional metadata about COM and ActiveX objects, script procedures and functions, Active Server Pages (ASPs), style sheets, document type definitions, and the like. Furthermore, although DBMS security mechanisms document users, groups, and privileges, they do so in a highly structured, and often inconvenient, form.
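As a rough illustration of the kind of metadata a DBMS keeps in its system tables, the sketch below reads table and column names from a catalog. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the company's DBMS, and the product and order_line tables are hypothetical. The catalog describes tables and columns but says nothing about the Web pages or scripts that use them, which is the gap described above.

```python
import sqlite3

# Hypothetical schema for the clothing retailer; all names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        size       TEXT
    );
    CREATE TABLE order_line (
        order_id   INTEGER,
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER
    );
""")


def catalog_metadata(connection):
    """Return {table_name: [column_name, ...]} read from SQLite's catalog."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {
        table: [col[1] for col in connection.execute(f"PRAGMA table_info({table})")]
        for table in tables
    }


metadata = catalog_metadata(conn)
print(metadata)
```

Other DBMS products expose similar information through their own system tables or views, so the queries would differ from the ones shown here.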

For all of these reasons, many organizations develop and maintain data repositories, which are collections of metadata about databases, database applications, Web pages, users, and other application components. The repository may be virtual in that it is composed of metadata from many different sources: the DBMS, version-control software, code libraries, Web page generation and editing tools, and so forth. Or, the data repository may be an integrated product from a CASE tool vendor or from a company such as Microsoft or Oracle.
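A minimal sketch of the virtual repository idea follows, assuming each source can export simple (name, kind, dependencies) records. The class, source names, and component names are all hypothetical and do not reflect any vendor's product. The point is only that once metadata from several tools is merged, a question such as "what is affected if the product table changes?" becomes a straightforward query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str        # e.g. "table", "script", "web_page"
    source: str      # which tool supplied this metadata
    depends_on: frozenset = frozenset()


class VirtualRepository:
    """Merges metadata from several sources into one queryable view."""

    def __init__(self):
        self._components = {}

    def load(self, source, records):
        """Add (name, kind, dependencies) records exported by one source."""
        for name, kind, deps in records:
            self._components[name] = Component(name, kind, source, frozenset(deps))

    def impacted_by(self, target):
        """Names of components that depend, directly or indirectly, on target."""
        impacted = set()
        frontier = [target]
        while frontier:
            current = frontier.pop()
            for comp in self._components.values():
                if current in comp.depends_on and comp.name not in impacted:
                    impacted.add(comp.name)
                    frontier.append(comp.name)
        return sorted(impacted)


repo = VirtualRepository()
repo.load("dbms_catalog", [("product", "table", [])])
repo.load("version_control", [("catalog_query.sql", "script", ["product"])])
repo.load("web_tools", [
    ("catalog.asp", "web_page", ["catalog_query.sql"]),
    ("about.asp", "web_page", []),
])

print(repo.impacted_by("product"))  # ['catalog.asp', 'catalog_query.sql']
```

With this kind of view, a DBA estimating the cost of adding sporting goods could list every script and page that touches the affected tables rather than searching for them by hand.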

Either way, the time for the DBA to think about constructing such a facility is long before senior management asks questions. In fact, the repository should be constructed as the system is developed and should be considered an important part of the system deliverables. If such a facility is not constructed, the DBA will always be playing catch-up—trying to maintain the existing applications, adapting them to new needs, and somehow gathering together the metadata to form a repository.

The best repositories are active repositories—they are part of the systems development process in that metadata is created automatically as the system components are created. Less desirable, but still effective, are passive repositories, which are filled only when someone takes the time to generate the needed metadata and place it in the repository.
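The difference between the two kinds of repository can be sketched in a few lines. In this hypothetical example, a Python decorator stands in for a development tool that records metadata at the moment a component is created (the active case), while the second function is recorded only if someone adds the entry by hand (the passive case). The REPOSITORY dictionary and all function names are assumptions made for illustration.

```python
import inspect

REPOSITORY = {}  # hypothetical in-memory stand-in for a data repository


def registered(kind):
    """Decorator: record metadata about a component when it is defined."""
    def wrap(func):
        REPOSITORY[func.__name__] = {
            "kind": kind,
            "parameters": list(inspect.signature(func).parameters),
            "doc": inspect.getdoc(func),
        }
        return func
    return wrap


# Active: the metadata is captured automatically as the component is created.
@registered("script_function")
def price_with_tax(price, rate):
    """Return the price including tax."""
    return price * (1 + rate)


# Passive: nothing is recorded unless someone remembers to add it later.
def shipping_cost(weight):
    return 5 + 2 * weight


print(sorted(REPOSITORY))  # ['price_with_tax']
```

Here shipping_cost works perfectly well but is invisible to anyone consulting the repository, which is the risk a passive approach carries.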

The Internet has created enormous opportunities for businesses to expand their customer bases and increase their sales and profitability. The databases and database applications that support these companies are an essential element of that success. Unfortunately, the growth of some organizations will be stymied by their inability to grow their applications or adapt them to changing needs. Often, building a new system is easier than adapting an existing one. Building a new system that integrates with an old one while it replaces that old one can be very difficult.

Distributed Database Processing

A distributed database is a database that is stored and processed on more than one computer. Depending on the type of database and the processing that is allowed, distributed databases can present significant problems. Let us consider the types of distributed databases.