Variations of JOIN: The EQUIJOIN and NATURAL JOIN
6.3.2 Variations of JOIN: The EQUIJOIN and NATURAL JOIN
The most common use of JOIN involves join conditions with equality comparisons only. Such a JOIN , where the only comparison operator used is =, is called an
EQUIJOIN . Both previous examples were EQUIJOINs . Notice that in the result of an EQUIJOIN we always have one or more pairs of attributes that have identical values in every tuple. For example, in Figure 6.6, the values of the attributes Mgr_ssn and Ssn are identical in every tuple of DEPT_MGR (the EQUIJOIN result) because the
equality join condition specified on these two attributes requires the values to be identical in every tuple in the result. Because one of each pair of attributes with identical values is superfluous, a new operation called NATURAL JOIN —denoted by *—was created to get rid of the second (superfluous) attribute in an EQUIJOIN con-
dition. 6 The standard definition of NATURAL JOIN requires that the two join attrib- utes (or each pair of join attributes) have the same name in both relations. If this is not the case, a renaming operation is applied first.
Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple that controls the project. In the following example, first we rename the Dnumber attribute of DEPARTMENT to Dnum —so that it has the same name as the Dnum attribute in PROJECT —and then we apply NATURAL JOIN :
PROJ_DEPT ← PROJECT * ρ ( Dname , Dnum , Mgr_ssn , Mgr_start_date ) ( DEPARTMENT ) The same query can be done in two steps by creating an intermediate table DEPT as
follows: DEPT ← ρ ( Dname , Dnum , Mgr_ssn , Mgr_start_date ) ( DEPARTMENT )
PROJ_DEPT ← PROJECT * DEPT The attribute Dnum is called the join attribute for the NATURAL JOIN operation,
because it is the only attribute with the same name in both relations. The resulting relation is illustrated in Figure 6.7(a). In the PROJ_DEPT relation, each tuple com- bines a PROJECT tuple with the DEPARTMENT tuple for the department that con- trols the project, but only one join attribute value is kept.
If the attributes on which the natural join is specified already have the same names in both relations, renaming is unnecessary. For example, to apply a natural join on the Dnumber attributes of DEPARTMENT and DEPT_LOCATIONS , it is sufficient to write
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS The resulting relation is shown in Figure 6.7(b), which combines each department
with its locations and has one tuple for each location. In general, the join condition for NATURAL JOIN is constructed by equating each pair of join attributes that have the same name in the two relations and combining these conditions with AND . There can be a list of join attributes from each relation, and each corresponding pair must have the same name.
160 Chapter 6 The Relational Algebra and Relational Calculus
(a)
PROJ_DEPT Pname
Mgr_start_date ProductX
Mgr_ssn
333445555 1988-05-22 ProductY
1 Bellaire
5 Research
333445555 1988-05-22 ProductZ
2 Sugarland
5 Research
333445555 1988-05-22 Computerization
3 Houston
5 Research
987654321 1995-01-01 Reorganization
10 Stafford
4 Administration
888665555 1981-06-19 Newbenefits
987654321 1995-01-01
(b)
DEPT_LOCS Dname
Dnumber
Mgr_ssn
Mgr_start_date
1981-06-19
1995-01-01
1988-05-22
1988-05-22
1988-05-22
Houston
Figure 6.7
Results of two NATURAL JOIN operations. (a) PROJ_DEPT ← PROJECT * DEPT. (b) DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS.
A more general, but nonstandard definition for NATURAL JOIN is Q←R * (< list1 >),(< list2 >) S
In this case, <list1> specifies a list of i attributes from R, and <list2> specifies a list of i attributes from S. The lists are used to form equality comparison conditions between pairs of corresponding attributes, and the conditions are then AND ed together. Only the list corresponding to attributes of the first relation R—<list1>— is kept in the result Q.
Notice that if no combination of tuples satisfies the join condition, the result of a JOIN is an empty relation with zero tuples. In general, if R has n R tuples and S has n S tuples, the result of a JOIN operation R <join condition> S will have between zero and n R *n S tuples. The expected size of the join result divided by the maximum size n R * n S leads to a ratio called join selectivity, which is a property of each join condition. If there is no join condition, all combinations of tuples qualify and the JOIN degen- erates into a CARTESIAN PRODUCT , also called CROSS PRODUCT or CROSS JOIN .
As we can see, a single JOIN operation is used to combine data from two relations so that related information can be presented in a single table. These operations are also
6.3 Binary Relational Operations: JOIN and DIVISION 161
outer joins (see Section 6.4.4). Informally, an inner join is a type of match and com- bine operation defined formally as a combination of CARTESIAN PRODUCT and
SELECTION . Note that sometimes a join may be specified between a relation and itself, as we will illustrate in Section 6.4.3. The NATURAL JOIN or EQUIJOIN opera-
tion can also be specified among multiple tables, leading to an n-way join. For example, consider the following three-way join:
(( PROJECT Dnum = Dnumber DEPARTMENT )
Mgr_ssn = Ssn EMPLOYEE )
This combines each project tuple with its controlling department tuple into a single tuple, and then combines that tuple with an employee tuple that is the department manager. The net result is a consolidated relation in which each tuple contains this project-department-manager combined information.
In SQL, JOIN can be realized in several different ways. The first method is to specify the <join conditions> in the WHERE clause, along with any other selection condi- tions. This is very common, and is illustrated by queries Q1 , Q1A , Q1B , Q2 , and Q8 in Sections 4.3.1 and 4.3.2, as well as by many other query examples in Chapters 4 and 5. The second way is to use a nested relation, as illustrated by queries Q4A and Q16 in Section 5.1.2. Another way is to use the concept of joined tables, as illus-
trated by the queries Q1A , Q1B , Q8B , and Q2A in Section 5.1.6. The construct of joined tables was added to SQL2 to allow the user to specify explicitly all the various types of joins, because the other methods were more limited. It also allows the user to clearly distinguish join conditions from the selection conditions in the WHERE clause.
Parts
» Fundamentals_of_Database_Systems,_6th_Edition
» Characteristics of the Database Approach
» Advantages of Using the DBMS Approach
» A Brief History of Database Applications
» Schemas, Instances, and Database State
» The Three-Schema Architecture
» The Database System Environment
» Centralized and Client/Server Architectures for DBMSs
» Classification of Database Management Systems
» Domains, Attributes, Tuples, and Relations
» Key Constraints and Constraints on NULL Values
» Relational Databases and Relational Database Schemas
» Integrity, Referential Integrity, and Foreign Keys
» Update Operations, Transactions, and Dealing with Constraint Violations
» SQL Data Definition and Data Types
» Specifying Constraints in SQL
» The SELECT-FROM-WHERE Structure of Basic SQL Queries
» Ambiguous Attribute Names, Aliasing, Renaming, and Tuple Variables
» Substring Pattern Matching and Arithmetic Operators
» INSERT, DELETE, and UPDATE Statements in SQL
» Comparisons Involving NULL and Three-Valued Logic
» Nested Queries, Tuples, and Set/Multiset Comparisons
» The EXISTS and UNIQUE Functions in SQL
» Joined Tables in SQL and Outer Joins
» Grouping: The GROUP BY and HAVING Clauses
» Discussion and Summary of SQL Queries
» Specifying General Constraints as Assertions in SQL
» Introduction to Triggers in SQL
» Specification of Views in SQL
» View Implementation, View Update, and Inline Views
» Schema Change Statements in SQL
» Sequences of Operations and the RENAME Operation
» The UNION, INTERSECTION, and MINUS Operations
» The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
» Variations of JOIN: The EQUIJOIN and NATURAL JOIN
» Additional Relational Operations
» Examples of Queries in Relational Algebra
» The Tuple Relational Calculus
» The Domain Relational Calculus
» Using High-Level Conceptual Data Models
» Entity Types, Entity Sets, Keys, and Value Sets
» Relationship Types, Relationship Sets, Roles, and Structural Constraints
» ER Diagrams, Naming Conventions, and Design Issues
» Example of Other Notation: UML Class Diagrams
» Relationship Types of Degree Higher than Two
» Subclasses, Superclasses, and Inheritance
» Constraints on Specialization and Generalization
» Specialization and Generalization Hierarchies
» Modeling of UNION Types Using Categories
» A Sample UNIVERSITY EER Schema, Design Choices, and Formal Definitions
» Data Abstraction, Knowledge Representation, and Ontology Concepts
» ER-to-Relational Mapping Algorithm
» Discussion and Summary of Mapping for ER Model Constructs
» Mapping EER Model Constructs
» The Role of Information Systems
» The Database Design and Implementation Process
» Use of UML Diagrams as an Aid to Database Design Specification 6
» Rational Rose: A UML-Based Design Tool
» Automated Database Design Tools
» Introduction to Object-Oriented Concepts and Features
» Object Identity, and Objects versus Literals
» Complex Type Structures for Objects and Literals
» Encapsulation of Operations and Persistence of Objects
» Type Hierarchies and Inheritance
» Other Object-Oriented Concepts
» Object-Relational Features: Object Database Extensions to SQL
» Overview of the Object Model of ODMG
» Built-in Interfaces and Classes in the Object Model
» Atomic (User-Defined) Objects
» Extents, Keys, and Factory Objects
» The Object Definition Language ODL
» Differences between Conceptual Design of ODB and RDB
» Mapping an EER Schema to an ODB Schema
» Query Results and Path Expressions
» Overview of the C++ Language Binding in the ODMG Standard
» Structured, Semistructured, and Unstructured Data
» XML Hierarchical (Tree) Data Model
» Well-Formed and Valid XML Documents and XML DTD
» XPath: Specifying Path Expressions in XML
» XQuery: Specifying Queries in XML
» Extracting XML Documents from
» Database Programming: Techniques
» Retrieving Single Tuples with Embedded SQL
» Retrieving Multiple Tuples with Embedded SQL Using Cursors
» Specifying Queries at Runtime Using Dynamic SQL
» SQLJ: Embedding SQL Commands in Java
» Retrieving Multiple Tuples in SQLJ Using Iterators
» Database Programming with SQL/CLI Using C
» JDBC: SQL Function Calls for Java Programming
» Database Stored Procedures and SQL/PSM
» PHP Variables, Data Types, and Programming Constructs
» Overview of PHP Database Programming
» Imparting Clear Semantics to Attributes in Relations
» Redundant Information in Tuples and Update Anomalies
» Normal Forms Based on Primary Keys
» General Definitions of Second and Third Normal Forms
» Multivalued Dependency and Fourth Normal Form
» Join Dependencies and Fifth Normal Form
» Inference Rules for Functional Dependencies
» Minimal Sets of Functional Dependencies
» Properties of Relational Decompositions
» Dependency-Preserving Decomposition
» Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas
» Problems with NULL Values and Dangling Tuples
» Discussion of Normalization Algorithms and Alternative Relational Designs
» Further Discussion of Multivalued Dependencies and 4NF
» Other Dependencies and Normal Forms
» Memory Hierarchies and Storage Devices
» Hardware Description of Disk Devices
» Magnetic Tape Storage Devices
» Placing File Records on Disk
» Files of Unordered Records (Heap Files)
» Files of Ordered Records (Sorted Files)
» External Hashing for Disk Files
» Hashing Techniques That Allow Dynamic File Expansion
» Other Primary File Organizations
» Parallelizing Disk Access Using RAID Technology
» Types of Single-Level Ordered Indexes
» Some General Issues Concerning Indexing
» Algorithms for External Sorting
» Implementing the SELECT Operation
» Implementing the JOIN Operation
» Algorithms for PROJECT and Set
» Notation for Query Trees and Query Graphs
» Heuristic Optimization of Query Trees
» Catalog Information Used in Cost Functions
» Examples of Cost Functions for SELECT
» Examples of Cost Functions for JOIN
» Example to Illustrate Cost-Based Query Optimization
» Factors That Influence Physical Database Design
» Physical Database Design Decisions
» An Overview of Database Tuning in Relational Systems
» Transactions, Database Items, Read and Write Operations, and DBMS Buffers
» Why Concurrency Control Is Needed
» Transaction and System Concepts
» Desirable Properties of Transactions
» Serial, Nonserial, and Conflict-Serializable Schedules
» Testing for Conflict Serializability of a Schedule
» How Serializability Is Used for Concurrency Control
» View Equivalence and View Serializability
» Types of Locks and System Lock Tables
» Guaranteeing Serializability by Two-Phase Locking
» Dealing with Deadlock and Starvation
» Concurrency Control Based on Timestamp Ordering
» Multiversion Concurrency Control Techniques
» Validation (Optimistic) Concurrency
» Granularity of Data Items and Multiple Granularity Locking
» Using Locks for Concurrency Control in Indexes
» Other Concurrency Control Issues
» Recovery Outline and Categorization of Recovery Algorithms
» Caching (Buffering) of Disk Blocks
» Write-Ahead Logging, Steal/No-Steal, and Force/No-Force
» Transaction Rollback and Cascading Rollback
» NO-UNDO/REDO Recovery Based on Deferred Update
» Recovery Techniques Based on Immediate Update
» The ARIES Recovery Algorithm
» Recovery in Multidatabase Systems
» Introduction to Database Security Issues 1
» Discretionary Access Control Based on Granting and Revoking Privileges
» Mandatory Access Control and Role-Based Access Control for Multilevel Security
» Introduction to Statistical Database Security
» Introduction to Flow Control
» Encryption and Public Key Infrastructures
» Challenges of Database Security
» Distributed Database Concepts 1
» Types of Distributed Database Systems
» Distributed Database Architectures
» Data Replication and Allocation
» Example of Fragmentation, Allocation, and Replication
» Query Processing and Optimization in Distributed Databases
» Overview of Transaction Management in Distributed Databases
» Overview of Concurrency Control and Recovery in Distributed Databases
» Current Trends in Distributed Databases
» Distributed Databases in Oracle 13
» Generalized Model for Active Databases and Oracle Triggers
» Design and Implementation Issues for Active Databases
» Examples of Statement-Level Active Rules
» Time Representation, Calendars, and Time Dimensions
» Incorporating Time in Relational Databases Using Tuple Versioning
» Incorporating Time in Object-Oriented Databases Using Attribute Versioning
» Temporal Querying Constructs and the TSQL2 Language
» Spatial Database Concepts 24
» Multimedia Database Concepts
» Clausal Form and Horn Clauses
» Datalog Programs and Their Safety
» Evaluation of Nonrecursive Datalog Queries
» Introduction to Information Retrieval
» Types of Queries in IR Systems
» Evaluation Measures of Search Relevance
» Web Analysis and Its Relationship to Information Retrieval
» Analyzing the Link Structure of Web Pages
» Approaches to Web Content Analysis
» Trends in Information Retrieval
» Data Mining as a Part of the Knowledge
» Goals of Data Mining and Knowledge Discovery
» Types of Knowledge Discovered during Data Mining
» Market-Basket Model, Support, and Confidence
» Frequent-Pattern (FP) Tree and FP-Growth Algorithm
» Other Types of Association Rules
» Approaches to Other Data Mining Problems
» Commercial Data Mining Tools
» Data Modeling for Data Warehouses
» Difficulties of Implementing Data Warehouses
» Grouping, Aggregation, and Database Modification in QBE
Show more