Dependency-Preserving Decomposition
16.3.1 Dependency-Preserving Decomposition
into 3NF Schemas
Algorithm 16.4 creates a dependency-preserving decomposition D = {R 1 ,R 2 , ..., R m } of a universal relation R based on a set of functional dependencies F, such that each R i in D is in 3NF. It guarantees only the dependency-preserving property; it does not guarantee the nonadditive join property. The first step of Algorithm 16.4 is to find a minimal cover G for F; Algorithm 16.2 can be used for this step. Note that multiple minimal covers may exist for a given set F (as we illustrate later in the example after Algorithm 16.4). In such cases the algorithms can potentially yield multiple alternative designs.
Algorithm 16.4. Relational Synthesis into 3NF with Dependency Preservation
Input:
A universal relation R and a set of functional dependencies F on the attributes of R.
1. Find a minimal cover G for F (use Algorithm 16.2);
2. For each left-hand-side X of a functional dependency that appears in G, cre- ate a relation schema in D with attributes {X ∪ {A 1 } ∪ {A 2 } ... ∪ {A k } },
where X → A 1 ,X→A 2 , ..., X → A k are the only dependencies in G with X as
the left-hand-side (X is the key of this relation);
3. Place any remaining attributes (that have not been placed in any relation) in
a single relation schema to ensure the attribute preservation property. Example of Algorithm 16.4. Consider the following universal relation:
U( Emp_ssn , Pno , Esal , Ephone , Dno , Pname , Plocation ) Emp_ssn , Esal , Ephone refer to the Social Security number, salary, and phone number
of the employee. Pno , Pname , and Plocation refer to the number, name, and location of the project. Dno is department number.
The following dependencies are present:
FD1: Emp_ssn → { Esal , Ephone , Dno } FD2: Pno → { Pname , Plocation } FD3: Emp_ssn , Pno → { Esal , Ephone , Dno , Pname , Plocation }
By virtue of FD3, the attribute set { Emp_ssn , Pno } represents a key of the universal relation. Hence F, the set of given FDs includes { Emp_ssn → Esal , Ephone , Dno ; Pno → Pname , Plocation ; Emp_ssn , Pno → Esal , Ephone , Dno , Pname , Plocation }.
By applying the minimal cover Algorithm 16.2, in step 3 we see that Pno is a redun- dant attribute in Emp_ssn , Pno → Esal , Ephone , Dno . Moreover, Emp_ssn is redun- dant in Emp_ssn , Pno → Pname , Plocation . Hence the minimal cover consists of FD1 and FD2 only (FD3 being completely redundant) as follows (if we group attributes with the same left-hand side into one FD):
Minimal cover G: { Emp_ssn → Esal , Ephone , Dno ; Pno → Pname , Plocation }
16.3 Algorithms for Relational Database Schema Design 559
By applying Algorithm 16.4 to the above Minimal cover G, we get a 3NF design con- sisting of two relations with keys Emp_ssn and Pno as follows:
R 1 ( Emp_ssn , Esal , Ephone , Dno ) R 2 ( Pno , Pname , Plocation )
An observant reader would notice easily that these two relations have lost the original information contained in the key of the universal relation U (namely, that there are certain employees working on certain projects in a many-to-many relationship). Thus, while the algorithm does preserve the original dependencies, it makes no guar- antee of preserving all of the information. Hence, the resulting design is a lossy design.
Claim 3. Every relation schema created by Algorithm 16.4 is in 3NF. (We will not provide a formal proof here; 6 the proof depends on G being a minimal set of dependencies.)
It is obvious that all the dependencies in G are preserved by the algorithm because each dependency appears in one of the relations R i in the decomposition D. Since G is equivalent to F, all the dependencies in F are either preserved directly in the decomposition or are derivable using the inference rules from Section 16.1.1 from those in the resulting relations, thus ensuring the dependency preservation prop- erty. Algorithm 16.4 is called a relational synthesis algorithm, because each rela- tion schema R i in the decomposition is synthesized (constructed) from the set of functional dependencies in G with the same left-hand-side X.
Parts
» Fundamentals_of_Database_Systems,_6th_Edition
» Characteristics of the Database Approach
» Advantages of Using the DBMS Approach
» A Brief History of Database Applications
» Schemas, Instances, and Database State
» The Three-Schema Architecture
» The Database System Environment
» Centralized and Client/Server Architectures for DBMSs
» Classification of Database Management Systems
» Domains, Attributes, Tuples, and Relations
» Key Constraints and Constraints on NULL Values
» Relational Databases and Relational Database Schemas
» Integrity, Referential Integrity, and Foreign Keys
» Update Operations, Transactions, and Dealing with Constraint Violations
» SQL Data Definition and Data Types
» Specifying Constraints in SQL
» The SELECT-FROM-WHERE Structure of Basic SQL Queries
» Ambiguous Attribute Names, Aliasing, Renaming, and Tuple Variables
» Substring Pattern Matching and Arithmetic Operators
» INSERT, DELETE, and UPDATE Statements in SQL
» Comparisons Involving NULL and Three-Valued Logic
» Nested Queries, Tuples, and Set/Multiset Comparisons
» The EXISTS and UNIQUE Functions in SQL
» Joined Tables in SQL and Outer Joins
» Grouping: The GROUP BY and HAVING Clauses
» Discussion and Summary of SQL Queries
» Specifying General Constraints as Assertions in SQL
» Introduction to Triggers in SQL
» Specification of Views in SQL
» View Implementation, View Update, and Inline Views
» Schema Change Statements in SQL
» Sequences of Operations and the RENAME Operation
» The UNION, INTERSECTION, and MINUS Operations
» The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
» Variations of JOIN: The EQUIJOIN and NATURAL JOIN
» Additional Relational Operations
» Examples of Queries in Relational Algebra
» The Tuple Relational Calculus
» The Domain Relational Calculus
» Using High-Level Conceptual Data Models
» Entity Types, Entity Sets, Keys, and Value Sets
» Relationship Types, Relationship Sets, Roles, and Structural Constraints
» ER Diagrams, Naming Conventions, and Design Issues
» Example of Other Notation: UML Class Diagrams
» Relationship Types of Degree Higher than Two
» Subclasses, Superclasses, and Inheritance
» Constraints on Specialization and Generalization
» Specialization and Generalization Hierarchies
» Modeling of UNION Types Using Categories
» A Sample UNIVERSITY EER Schema, Design Choices, and Formal Definitions
» Data Abstraction, Knowledge Representation, and Ontology Concepts
» ER-to-Relational Mapping Algorithm
» Discussion and Summary of Mapping for ER Model Constructs
» Mapping EER Model Constructs
» The Role of Information Systems
» The Database Design and Implementation Process
» Use of UML Diagrams as an Aid to Database Design Specification 6
» Rational Rose: A UML-Based Design Tool
» Automated Database Design Tools
» Introduction to Object-Oriented Concepts and Features
» Object Identity, and Objects versus Literals
» Complex Type Structures for Objects and Literals
» Encapsulation of Operations and Persistence of Objects
» Type Hierarchies and Inheritance
» Other Object-Oriented Concepts
» Object-Relational Features: Object Database Extensions to SQL
» Overview of the Object Model of ODMG
» Built-in Interfaces and Classes in the Object Model
» Atomic (User-Defined) Objects
» Extents, Keys, and Factory Objects
» The Object Definition Language ODL
» Differences between Conceptual Design of ODB and RDB
» Mapping an EER Schema to an ODB Schema
» Query Results and Path Expressions
» Overview of the C++ Language Binding in the ODMG Standard
» Structured, Semistructured, and Unstructured Data
» XML Hierarchical (Tree) Data Model
» Well-Formed and Valid XML Documents and XML DTD
» XPath: Specifying Path Expressions in XML
» XQuery: Specifying Queries in XML
» Extracting XML Documents from
» Database Programming: Techniques
» Retrieving Single Tuples with Embedded SQL
» Retrieving Multiple Tuples with Embedded SQL Using Cursors
» Specifying Queries at Runtime Using Dynamic SQL
» SQLJ: Embedding SQL Commands in Java
» Retrieving Multiple Tuples in SQLJ Using Iterators
» Database Programming with SQL/CLI Using C
» JDBC: SQL Function Calls for Java Programming
» Database Stored Procedures and SQL/PSM
» PHP Variables, Data Types, and Programming Constructs
» Overview of PHP Database Programming
» Imparting Clear Semantics to Attributes in Relations
» Redundant Information in Tuples and Update Anomalies
» Normal Forms Based on Primary Keys
» General Definitions of Second and Third Normal Forms
» Multivalued Dependency and Fourth Normal Form
» Join Dependencies and Fifth Normal Form
» Inference Rules for Functional Dependencies
» Minimal Sets of Functional Dependencies
» Properties of Relational Decompositions
» Dependency-Preserving Decomposition
» Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas
» Problems with NULL Values and Dangling Tuples
» Discussion of Normalization Algorithms and Alternative Relational Designs
» Further Discussion of Multivalued Dependencies and 4NF
» Other Dependencies and Normal Forms
» Memory Hierarchies and Storage Devices
» Hardware Description of Disk Devices
» Magnetic Tape Storage Devices
» Placing File Records on Disk
» Files of Unordered Records (Heap Files)
» Files of Ordered Records (Sorted Files)
» External Hashing for Disk Files
» Hashing Techniques That Allow Dynamic File Expansion
» Other Primary File Organizations
» Parallelizing Disk Access Using RAID Technology
» Types of Single-Level Ordered Indexes
» Some General Issues Concerning Indexing
» Algorithms for External Sorting
» Implementing the SELECT Operation
» Implementing the JOIN Operation
» Algorithms for PROJECT and Set
» Notation for Query Trees and Query Graphs
» Heuristic Optimization of Query Trees
» Catalog Information Used in Cost Functions
» Examples of Cost Functions for SELECT
» Examples of Cost Functions for JOIN
» Example to Illustrate Cost-Based Query Optimization
» Factors That Influence Physical Database Design
» Physical Database Design Decisions
» An Overview of Database Tuning in Relational Systems
» Transactions, Database Items, Read and Write Operations, and DBMS Buffers
» Why Concurrency Control Is Needed
» Transaction and System Concepts
» Desirable Properties of Transactions
» Serial, Nonserial, and Conflict-Serializable Schedules
» Testing for Conflict Serializability of a Schedule
» How Serializability Is Used for Concurrency Control
» View Equivalence and View Serializability
» Types of Locks and System Lock Tables
» Guaranteeing Serializability by Two-Phase Locking
» Dealing with Deadlock and Starvation
» Concurrency Control Based on Timestamp Ordering
» Multiversion Concurrency Control Techniques
» Validation (Optimistic) Concurrency
» Granularity of Data Items and Multiple Granularity Locking
» Using Locks for Concurrency Control in Indexes
» Other Concurrency Control Issues
» Recovery Outline and Categorization of Recovery Algorithms
» Caching (Buffering) of Disk Blocks
» Write-Ahead Logging, Steal/No-Steal, and Force/No-Force
» Transaction Rollback and Cascading Rollback
» NO-UNDO/REDO Recovery Based on Deferred Update
» Recovery Techniques Based on Immediate Update
» The ARIES Recovery Algorithm
» Recovery in Multidatabase Systems
» Introduction to Database Security Issues 1
» Discretionary Access Control Based on Granting and Revoking Privileges
» Mandatory Access Control and Role-Based Access Control for Multilevel Security
» Introduction to Statistical Database Security
» Introduction to Flow Control
» Encryption and Public Key Infrastructures
» Challenges of Database Security
» Distributed Database Concepts 1
» Types of Distributed Database Systems
» Distributed Database Architectures
» Data Replication and Allocation
» Example of Fragmentation, Allocation, and Replication
» Query Processing and Optimization in Distributed Databases
» Overview of Transaction Management in Distributed Databases
» Overview of Concurrency Control and Recovery in Distributed Databases
» Current Trends in Distributed Databases
» Distributed Databases in Oracle 13
» Generalized Model for Active Databases and Oracle Triggers
» Design and Implementation Issues for Active Databases
» Examples of Statement-Level Active Rules
» Time Representation, Calendars, and Time Dimensions
» Incorporating Time in Relational Databases Using Tuple Versioning
» Incorporating Time in Object-Oriented Databases Using Attribute Versioning
» Temporal Querying Constructs and the TSQL2 Language
» Spatial Database Concepts 24
» Multimedia Database Concepts
» Clausal Form and Horn Clauses
» Datalog Programs and Their Safety
» Evaluation of Nonrecursive Datalog Queries
» Introduction to Information Retrieval
» Types of Queries in IR Systems
» Evaluation Measures of Search Relevance
» Web Analysis and Its Relationship to Information Retrieval
» Analyzing the Link Structure of Web Pages
» Approaches to Web Content Analysis
» Trends in Information Retrieval
» Data Mining as a Part of the Knowledge
» Goals of Data Mining and Knowledge Discovery
» Types of Knowledge Discovered during Data Mining
» Market-Basket Model, Support, and Confidence
» Frequent-Pattern (FP) Tree and FP-Growth Algorithm
» Other Types of Association Rules
» Approaches to Other Data Mining Problems
» Commercial Data Mining Tools
» Data Modeling for Data Warehouses
» Difficulties of Implementing Data Warehouses
» Grouping, Aggregation, and Database Modification in QBE
Show more