10.5 Automated Database Design Tools
The database design activity predominantly spans Phase 2 (conceptual design), Phase 4 (data model mapping, or logical design), and Phase 5 (physical database design) in the design process that we discussed in Section 10.2. Discussion of Phase 5 is deferred to Chapter 20, after we present storage and indexing techniques and query optimization. We discussed Phases 2 and 4 in detail with the use of the UML notation in Section 10.3 and pointed out the features of the tool Rational Rose, which supports these phases, in Section 10.4. As we mentioned, Rational Rose is more than just a database design tool. It is a software development tool that performs database modeling and schema design in the form of class diagrams as part of its overall object-oriented application development methodology. In this section, we summarize the features and shortcomings of the set of commercial tools that focus on automating the process of conceptual, logical, and physical design of databases.
When database technology was first introduced, most database design was carried out manually by expert designers, who used their experience and knowledge in the design process. However, at least two factors indicated that some form of automation had to be utilized if possible:
1. As applications involve increasingly complex data in terms of relationships and constraints, the number of options or different designs to model the same information increases rapidly. It becomes difficult to deal with this complexity and the corresponding design alternatives manually.
2. The sheer size of some databases runs into hundreds of entity types and relationship types, making the task of manually managing these designs almost impossible. The meta information related to the design process we described in Section 10.2 yields another database that must be created, maintained, and queried as a database in its own right.
Figure 10.17 The COMPANY database class diagram (Figure 7.16) drawn in Rational Rose.
The above factors have given rise to many tools that come under the general category of CASE (computer-aided software engineering) tools for database design. Rational Rose is a good example of a modern CASE tool. Typically these tools consist of a combination of the following facilities:
1. Diagramming. This allows the designer to draw a conceptual schema diagram in some tool-specific notation. Most notations include entity types (classes), relationship types (associations) that are shown either as separate boxes or simply as directed or undirected lines, cardinality constraints shown alongside the lines or in terms of the different types of arrowheads or min/max constraints, attributes, keys, and so on. Some tools display inheritance hierarchies and use additional notation for showing the partial-versus-total and disjoint-versus-overlapping nature of the specialization/generalization. The diagrams are internally stored as conceptual designs and are available for modification as well as generation of reports, cross-reference listings, and other uses.
2. Model mapping. This implements mapping algorithms similar to the ones we presented in Sections 9.1 and 9.2. The mapping is system-specific: most tools generate schemas in SQL DDL for Oracle, DB2, Informix, Sybase, and other RDBMSs. This part of the tool is most amenable to automation. The designer can further edit the produced DDL files if needed.
3. Design normalization. This utilizes a set of functional dependencies that are supplied at the conceptual design stage or after the relational schemas are produced during logical design. Then, design decomposition algorithms (see Chapter 16) are applied to decompose existing relations into higher normal-form relations. Generally, many of these tools do not generate alternative 3NF or BCNF designs (described in Chapter 15) and let the designer select among them based on some criteria, such as the minimum number of relations or the least amount of storage.
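To make the model-mapping facility more concrete, the following Python sketch shows the kind of transformation such a tool automates: one relation per regular entity type, plus a foreign key in the N-side relation of each binary 1:N relationship type, emitted as SQL DDL text. It is a minimal illustration under stated assumptions, not the algorithm of any particular tool. The classes `EntityType` and `OneToMany`, the function `map_to_ddl`, the column types, and the `<relationship>_<key>` naming convention for foreign-key columns are all hypothetical. The EMPLOYEE, DEPARTMENT, and WORKS_FOR names follow the COMPANY schema, with the attribute lists abbreviated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EntityType:
    name: str
    attributes: Dict[str, str]  # attribute name -> SQL type (chosen by the designer)
    key: List[str]              # primary key attribute names


@dataclass
class OneToMany:
    """A binary 1:N relationship type; the N-side relation receives a foreign key."""
    name: str
    one_side: str
    many_side: str


def map_to_ddl(entities: List[EntityType], rels: List[OneToMany]) -> List[str]:
    """Emit one CREATE TABLE statement per entity type, in input order."""
    by_name = {e.name: e for e in entities}
    # One relation per regular entity type, starting from a copy of its own attributes.
    columns = {e.name: dict(e.attributes) for e in entities}
    constraints = {e.name: [f"PRIMARY KEY ({', '.join(e.key)})"] for e in entities}

    # For each 1:N relationship, copy the 1-side key into the N-side relation as a
    # foreign key. The "<relationship>_<key>" column name is a hypothetical convention.
    for r in rels:
        parent = by_name[r.one_side]
        fk_cols = [f"{r.name}_{k}" for k in parent.key]
        for k, col in zip(parent.key, fk_cols):
            columns[r.many_side][col] = parent.attributes[k]
        constraints[r.many_side].append(
            f"FOREIGN KEY ({', '.join(fk_cols)}) "
            f"REFERENCES {parent.name} ({', '.join(parent.key)})"
        )

    ddl = []
    for e in entities:
        body = [f"  {col} {typ}" for col, typ in columns[e.name].items()]
        body += [f"  {c}" for c in constraints[e.name]]
        ddl.append(f"CREATE TABLE {e.name} (\n" + ",\n".join(body) + "\n);")
    return ddl


if __name__ == "__main__":
    department = EntityType("DEPARTMENT", {"Dnumber": "INT", "Dname": "VARCHAR(15)"}, ["Dnumber"])
    employee = EntityType("EMPLOYEE", {"Ssn": "CHAR(9)", "Lname": "VARCHAR(15)"}, ["Ssn"])
    works_for = OneToMany("WORKS_FOR", one_side="DEPARTMENT", many_side="EMPLOYEE")
    for stmt in map_to_ddl([department, employee], [works_for]):
        print(stmt)
```

Running the module prints a CREATE TABLE statement for DEPARTMENT followed by one for EMPLOYEE, which gains a WORKS_FOR_Dnumber column referencing DEPARTMENT. Listing DEPARTMENT first keeps the referenced table ahead of the referencing one; the vendor-specific type names and constraint syntax are where a real tool's mapping becomes system-specific.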
Most tools incorporate some form of physical design, including the choice of indexes. A whole range of separate tools exists for performance monitoring and measurement. The problem of tuning a design or the database implementation is still mostly handled as a human decision-making activity. Of the phases of design described in this chapter, one area where there is hardly any commercial tool support is view integration (see Section 10.2.2).
We will not survey database design tools here, but only mention the following characteristics that a good design tool should possess:
1. An easy-to-use interface. This is critical because it enables designers to focus on the task at hand, not on understanding the tool. Graphical and point-and-click interfaces are commonly used. A few tools like the SECSI design tool use natural language input. Different interfaces may be tailored to beginners or to expert designers.
2. Analytical components. Tools should provide analytical components for tasks that are difficult to perform manually, such as evaluating physical design alternatives or detecting conflicting constraints among views. This area is weak in most current tools.
3. Heuristic components. Aspects of the design that cannot be precisely quantified can be automated by entering heuristic rules in the design tool to evaluate design alternatives.
4. Trade-off analysis. A tool should present the designer with adequate comparative analysis whenever it presents multiple alternatives to choose from. Tools should ideally incorporate an analysis of a design change at the conceptual design level down to physical design. Because of the many alternatives possible for physical design in a given system, such trade-off analysis is difficult to carry out and most current tools avoid it.
5. Display of design results. Design results, such as schemas, are often displayed in diagrammatic form. Aesthetically pleasing and well-laid-out diagrams are not easy to generate automatically. Multipage design layouts that are easy to read are another challenge. Other types of results of design may be shown as tables, lists, or reports that should be easy to interpret.
6. Design verification. This is a highly desirable feature. Its purpose is to verify that the resulting design satisfies the initial requirements. Unless the requirements are captured and internally represented in some analyzable form, the verification cannot be attempted.
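As a narrow illustration of design verification, assume the requirements have been captured in one analyzable form, namely as functional dependencies. One check a tool could then automate is whether a proposed set of relation schemas preserves every required dependency. The Python sketch below uses the standard closure-based dependency-preservation test, which repeatedly grows an attribute set using closures restricted to each relation, so it never has to compute the full closure of the dependency set. The functions `closure`, `preserved`, and `verify` are hypothetical names, and this is one possible check under that assumption, not a description of how any commercial tool performs verification.

```python
from typing import FrozenSet, List, Set, Tuple

# A functional dependency X -> Y, stored as (left-hand side, right-hand side).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Compute the attribute closure attrs+ under fds by fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd: FD, relations: List[FrozenSet[str]], fds: List[FD]) -> bool:
    """Test whether fd follows from the FDs that hold within individual relations."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in relations:
            # Attributes of r derivable from the part of z that lies inside r.
            gained = closure(z & r, fds) & r
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def verify(relations: List[FrozenSet[str]], required: List[FD]) -> List[FD]:
    """Return the required FDs that the proposed design fails to preserve."""
    return [fd for fd in required if not preserved(fd, relations, required)]


if __name__ == "__main__":
    # Requirements for a hypothetical TEACH(Student, Course, Instructor) relation.
    fd1 = (frozenset({"Student", "Course"}), frozenset({"Instructor"}))
    fd2 = (frozenset({"Instructor"}), frozenset({"Course"}))
    design = [frozenset({"Instructor", "Course"}), frozenset({"Instructor", "Student"})]
    for lhs, rhs in verify(design, [fd1, fd2]):
        print(sorted(lhs), "->", sorted(rhs), "is not preserved")
```

In this example the decomposition into {Instructor, Course} and {Instructor, Student} is reported as losing {Student, Course} -> Instructor, while Instructor -> Course is preserved. A tool that flags such a loss lets the designer weigh it against the benefit of the higher normal form, which connects verification back to the trade-off analysis described above.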
Currently there is increasing awareness of the value of design tools, and they are becoming a must for dealing with large database design problems. There is also an increasing awareness that schema design and application design should go hand in hand, and the current trend among CASE tools is to address both areas. The popularity of tools such as Rational Rose is due to the fact that they approach the two arms of the design process shown in Figure 10.1 concurrently, treating database design and application design as a unified activity. After IBM acquired Rational in 2003, the Rational suite of tools has been enhanced as XDE (extended development environment) tools. Some vendors, like Platinum (CA), provide a tool for data modeling and schema design (ERwin) and another for process modeling and functional design (BPwin). Other tools (for example, SECSI) use expert system technology to guide the design process by including design expertise in the form of rules. Expert system technology is also useful in the requirements collection and analysis phase, which is typically a laborious and frustrating process. The trend is to use both meta-data repositories and design tools to achieve better designs for complex databases. Without a claim of being exhaustive, Table 10.1 lists some popular database design and application modeling tools. Companies in the table are listed alphabetically.