Basic concepts

2.2.5 Schemas

A database schema is a formal description of all the database relations and all the relationships existing between them. In Chapter 3, Conceptual data modeling, and Chapter 4, Relational database design, you will learn more about a relational database schema.

2.2.6 Keys

The relational data model uses keys to define identifiers for a relation’s tuples. Keys are also used to enforce rules and constraints on database data, and those constraints are essential for maintaining data consistency and correctness. A relational DBMS lets you define such keys; once they are defined, the DBMS is responsible for verifying and maintaining the correctness and consistency of the database data. Let’s define each type of key.

2.2.6.1 Candidate keys

A candidate key is a unique identifier for the tuples of a relation. By definition, every relation has at least one candidate key (the first property of a relation). In practice, most relations have multiple candidate keys.

C. J. Date in [2.2] gives the following definition for a candidate key: Let R be a relation with attributes A1, A2, …, An. The set K=(Ai, Aj, …, Ak) of attributes of R is said to be a candidate key of R if and only if it satisfies the following two time-independent properties:

• Uniqueness
At any given time, no two distinct tuples of R have the same value for Ai, the same value for Aj, …, and the same value for Ak.

• Minimality
None of Ai, Aj, …, Ak can be discarded from K without destroying the uniqueness property.

Database Fundamentals

Every relation has at least one candidate key, because at least the combination of all of its attributes has the uniqueness property (the first property of a relation), but usually there is at least one other candidate key made of fewer attributes of the relation. For example, the CARS relation shown earlier in Figure 2.2 has only one candidate key, K=(Type, Producer, Model, FabricationYear, Color, Fuel), considering that several cars in the relation can share the same characteristics, so no smaller set of attributes is guaranteed to be unique. Nevertheless, if we create another relation CARS as in Figure 2.3 by adding two other attributes, SerialNumber (engine serial number) and IdentificationNumber (car identification number), we will have 3 candidate keys for that relation.
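The two properties can be checked mechanically against a single relation instance. The sketch below is only an illustration: the function names and the sample rows are hypothetical (they are not the data of Figure 2.2 or Figure 2.3), and because the definition is time-independent, a check on one instance can rule a key out but cannot prove that it holds for every instance.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Return the minimal attribute sets that are unique in this instance."""
    keys = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue  # not minimal: it contains a smaller key
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical sample rows, used only for illustration.
ATTRS = ("Producer", "Model", "SerialNumber", "IdentificationNumber")
ROWS = [
    {"Producer": "P1", "Model": "M1", "SerialNumber": "S1", "IdentificationNumber": "ID1"},
    {"Producer": "P1", "Model": "M1", "SerialNumber": "S2", "IdentificationNumber": "ID2"},
    {"Producer": "P2", "Model": "M2", "SerialNumber": "S3", "IdentificationNumber": "ID3"},
]

print(candidate_keys(ROWS, ATTRS))
# → [('SerialNumber',), ('IdentificationNumber',)]
```

In this sample, (Producer, Model) fails uniqueness, and any set that contains SerialNumber or IdentificationNumber together with other attributes fails minimality.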

[Table: the new CARS relation, with its candidate keys marked. Legible values: Type LIMOUSINE; Producer MERCEDES, AUDI, BMW; IdentificationNumber SB24MEA, AB08DGF, SB06GHX, SB52MAG, AB02AMR; SerialNumber NF37590, WM19875, MW79580, WQ21998.]

Figure 2.3 – The new CARS Relation and its candidate keys

A candidate key is sometimes called a unique key. A unique key can be specified at the Data Definition Language (DDL) level using the UNIQUE keyword beside the attribute name. If a relation has more than one candidate key, the one that is chosen to represent the relation is called the primary key, and the remaining candidate keys are called alternate keys.
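As a minimal sketch of how a DBMS enforces a unique key, the following uses Python's built-in sqlite3 module with an in-memory database. The table layout, column names, and inserted values are assumptions made for illustration; other DBMSs accept similar but not identical DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cars (
        identification_number TEXT PRIMARY KEY NOT NULL,  -- chosen candidate key
        serial_number         TEXT NOT NULL UNIQUE,       -- alternate key
        producer              TEXT
    )
    """
)
conn.execute("INSERT INTO cars VALUES ('ID1', 'S1', 'P1')")  # hypothetical values

try:
    # Same serial number as the first row: violates the UNIQUE constraint.
    conn.execute("INSERT INTO cars VALUES ('ID2', 'S1', 'P2')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM cars").fetchone()[0]
print(duplicate_rejected, row_count)
# → True 1
```

The second INSERT is refused by the DBMS itself, so the application does not need its own duplicate check.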

Note:

To define candidate keys correctly, you have to consider all possible relation instances, not just the current one, and understand the meaning of the attributes, so that you can determine whether duplicates are possible at any time during the lifetime of the relation.

2.2.6.2 Primary keys

Chapter 2 – The relational data model 43

A primary key is a unique identifier of the relation tuples. As mentioned already, it is a candidate key that is chosen to represent the relation in the database and to provide a way to uniquely identify each tuple of the relation. A database relation always has a primary key.

A relational DBMS allows a primary key to be specified at the moment you create the relation (table). The DDL sublanguage usually has a PRIMARY KEY construct for that. For example, for the CARS relation from Figure 2.3 the primary key will be the candidate key IdentificationNumber. The values of this attribute must be “UNIQUE” and “NOT NULL” for all tuples from all relation instances.

There are situations when the real-world characteristics of the data modeled by a relation do not have unique values. For example, the first CARS relation from Figure 2.2 suffers from this inconvenience. In this case, the primary key must be the combination of all relation attributes. Such a primary key is inconvenient in practice: it requires too much physical space for storage, and it makes maintaining relationships between database relations more difficult. In those cases, the solution adopted is to introduce another attribute, like an ID, with no meaning to real-world data, which will have unique values and will be used as a primary key. This attribute is usually called a surrogate key. In database literature, you will sometimes also find it referred to as an artificial key.

Surrogate keys usually have unique numerical values. Those values are generated automatically, growing or decreasing by a fixed increment (usually 1).
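A surrogate key can be sketched with Python's built-in sqlite3 module, as below. The INTEGER PRIMARY KEY AUTOINCREMENT syntax is specific to SQLite (other DBMSs typically offer identity columns or sequences for the same purpose), and the table, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cars (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        type     TEXT,
        producer TEXT,
        model    TEXT
    )
    """
)

# Two cars with identical real-world characteristics (hypothetical values).
same_car = ("LIMOUSINE", "P1", "M1")
for _ in range(2):
    conn.execute(
        "INSERT INTO cars (type, producer, model) VALUES (?, ?, ?)", same_car
    )

ids = [row[0] for row in conn.execute("SELECT id FROM cars ORDER BY id")]
print(ids)
# → [1, 2]
```

The two rows are indistinguishable by their descriptive attributes, yet each gets its own generated id, which is what makes the short single-column primary key possible.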

2.2.6.3 Foreign keys

A foreign key is an attribute (or attribute combination) in one relation R2 whose values are required to match those of the primary key of some relation R1 (R1 and R2 not necessarily distinct). Note that a foreign key and the corresponding primary key should be defined on the same underlying domain.

For example, in Figure 2.4 we have another relation called OWNERS which contains the data about the owners of the cars from relation CARS.


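OWNERS Relation (primary key: ID; foreign key: IDENTIFICATION NUMBER)

ID  FIRST NAME  LAST NAME  IDENTIFICATION NUMBER
1   JOHN        SMITH      SB24MEA
2   MARY        FORD       AB08DGF
3   ANNE        SHEPARD    SB06GHX
4   WILLIAM     HILL       SB52MAG
5   JOE         PESCI      AB02AMR

[Further address values appear in the figure under column headers that are not legible: ALBA, TEILOR (row 2); SIBIU, SEBASTIAN (row 3); ALBA, MOLDOVA (row 5).]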

Figure 2.4 – The OWNERS relation and its primary and foreign keys

The IdentificationNumber foreign key from the OWNERS relation refers to the IdentificationNumber primary key from the CARS relation. In this way, we know which car belongs to each person.

Foreign-to-primary-key matches represent references from one relation to another. They are the “glue” that holds the database together. Another way of saying this is that foreign-to-primary-key matches represent certain relationships between tuples. Note carefully, however, that not all such relationships are represented by foreign-to-primary-key matches.

The DDL sublanguage usually has a FOREIGN KEY construct for defining foreign keys. For each foreign key, the corresponding primary key and its relation are also specified.
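A FOREIGN KEY definition for the OWNERS and CARS relations might look like the sketch below, written with Python's built-in sqlite3 module. The column names and types are assumptions, the second owner row is made up for the demonstration, and the PRAGMA line is SQLite-specific: SQLite checks foreign keys only after they are enabled on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE cars (identification_number TEXT PRIMARY KEY NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE owners (
        id                    INTEGER PRIMARY KEY,
        first_name            TEXT,
        last_name             TEXT,
        identification_number TEXT,
        FOREIGN KEY (identification_number)
            REFERENCES cars (identification_number)
    )
    """
)
conn.execute("INSERT INTO cars VALUES ('SB24MEA')")
# The referenced car exists, so this row is accepted.
conn.execute("INSERT INTO owners VALUES (1, 'JOHN', 'SMITH', 'SB24MEA')")

try:
    # Made-up owner whose identification number matches no row in cars.
    conn.execute("INSERT INTO owners VALUES (2, 'JANE', 'DOE', 'XX00XXX')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

owner_count = conn.execute("SELECT COUNT(*) FROM owners").fetchone()[0]
print(orphan_rejected, owner_count)
# → True 1
```

The FOREIGN KEY clause names both the referencing attribute and the referenced relation with its primary key, and the DBMS then rejects any owner row that points at a nonexistent car.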