Selecting the Primary Key

The selection of the primary key is important. The DBMS will use the primary key to facilitate searching and sorting of table rows, and some DBMS products use it to organize table storage. DBMS products almost always create indexes and other data structures using the values of the primary key.

Figure 6-1 Steps for Transforming a Data Model into a Database Design

1. Create a table for each entity:
   – Specify primary key (consider surrogate keys, as appropriate)
   – Specify candidate keys
   – Specify properties for each column:
      • Null status
      • Data type
      • Default value (if any)
      • Specify data constraints (if any)
   – Verify normalization
2. Create relationships by placing foreign keys:
   – Relationships between strong entities (1:1, 1:N, N:M)
   – Identifying relationships with ID-dependent entities (intersection tables, association patterns, multivalued attributes, archetype/instance patterns)
   – Relationships between a strong entity and a weak but non-ID-dependent entity (1:1, 1:N, N:M)
   – Mixed relationships
   – Relationships between supertype/subtype entities
   – Recursive relationships (1:1, 1:N, N:M)
3. Specify logic for enforcing minimum cardinality:
   – M-O relationships
   – O-M relationships
   – M-M relationships

Figure 6-2 Transforming an Entity to a Table: (a) EMPLOYEE Entity, (b) EMPLOYEE Table (identifier: EmployeeNumber)

Chapter 6 Transforming Data Models into Database Designs
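Python's built-in sqlite3 module gives a quick way to see this in practice. SQLite is used here only because it ships with Python. It is not one of the products discussed in this chapter, and its storage details differ from theirs. In SQLite, a column declared INTEGER PRIMARY KEY becomes the key of the table's own B-tree. Any other primary key gets a separate unique index that SQLite builds automatically. The DEPARTMENT and EMPLOYEE layouts below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A character primary key: SQLite builds a separate unique index for it.
conn.execute("CREATE TABLE DEPARTMENT (DepartmentName TEXT PRIMARY KEY, Phone TEXT)")

# An INTEGER PRIMARY KEY is an alias for the rowid, so the table itself is
# stored in EmployeeNumber order and no extra index is needed.
conn.execute("CREATE TABLE EMPLOYEE (EmployeeNumber INTEGER PRIMARY KEY, EmployeeName TEXT)")


def indexes_for(table):
    """Return the names of the indexes SQLite has built for a table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return [name for (name,) in rows]


print(indexes_for("DEPARTMENT"))  # ['sqlite_autoindex_DEPARTMENT_1']
print(indexes_for("EMPLOYEE"))    # []
```

Other products make different choices. For example, some cluster the table on the primary key and others keep a separate index. In nearly all of them, however, the primary key drives a physical lookup structure.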

The ideal primary key is short, numeric, and fixed. EmployeeNumber in Figure 6-2 meets all of these conditions and is acceptable. Beware of primary keys such as EmployeeName, Email, (AreaCode, PhoneNumber), (Street, City, State, Zip), and other long character columns. In cases like these, when the identifier is not short, numeric, or fixed, consider using another candidate key as the primary key. If there are no additional candidate keys, or if none of them is any better, consider using a surrogate key.
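The sketch below applies step 1 of Figure 6-1 to EMPLOYEE in SQLite through Python's sqlite3 module. The short, numeric EmployeeNumber serves as the primary key. Email is a long character column that also identifies an employee, so it is kept as a candidate key by declaring it UNIQUE rather than being promoted to primary key. The column list, the DEFAULT, the CHECK rule, and the sample rows are illustrative assumptions, not the book's EMPLOYEE design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMPLOYEE (
        EmployeeNumber INTEGER PRIMARY KEY,          -- short, numeric, fixed
        EmployeeName   TEXT    NOT NULL,
        Email          TEXT    NOT NULL UNIQUE,      -- candidate (alternate) key
        Phone          TEXT    NULL,
        HireDate       TEXT    NOT NULL DEFAULT CURRENT_DATE,
        CHECK (EmployeeNumber > 0)                   -- sample data constraint
    )
    """
)

conn.execute(
    "INSERT INTO EMPLOYEE (EmployeeNumber, EmployeeName, Email) VALUES (?, ?, ?)",
    (1, "Mary Jacobs", "mary@example.com"),
)

# A second row with the same Email violates the candidate-key constraint.
try:
    conn.execute(
        "INSERT INTO EMPLOYEE (EmployeeNumber, EmployeeName, Email) VALUES (?, ?, ?)",
        (2, "Mary Jones", "mary@example.com"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Declaring Email UNIQUE preserves its role as an identifier. Other tables can still reference the compact EmployeeNumber instead of a long string.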

A surrogate key is a DBMS-supplied identifier of each row of a table. Surrogate key values are unique within the table, and they never change. They are assigned when the row is created, and they are destroyed when the row is deleted. Surrogate key values are the best possible primary keys because they are designed to be short, numeric, and fixed. Because of these advantages, some organizations have gone so far as to require that surrogates be used for the primary key of every table.
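Here is a minimal sketch of a DBMS-supplied surrogate key, again using SQLite through Python. The DEPARTMENT table and the add_department() helper are hypothetical. SQLite's AUTOINCREMENT keyword assigns each new row a value larger than any it has issued before. That behavior matches the rule that a value is created with its row, is gone when the row is deleted, and is not handed out again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite issue strictly increasing IDs and never reuse one,
# even after the row that held it has been deleted.
conn.execute(
    "CREATE TABLE DEPARTMENT ("
    " DepartmentID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " DepartmentName TEXT NOT NULL UNIQUE)"
)


def add_department(name):
    """Insert a row and return the surrogate key value the DBMS assigned."""
    cur = conn.execute("INSERT INTO DEPARTMENT (DepartmentName) VALUES (?)", (name,))
    return cur.lastrowid


accounting = add_department("Accounting")
finance = add_department("Finance")

# Deleting a row destroys its surrogate key value...
conn.execute("DELETE FROM DEPARTMENT WHERE DepartmentID = ?", (finance,))

# ...and the next row gets a fresh value instead of reusing the deleted one.
marketing = add_department("Marketing")
print(accounting, finance, marketing)  # 1 2 3
```

Without AUTOINCREMENT, SQLite can reissue the largest deleted value, so the keyword matters when values must never repeat. SQLite also does not prevent an UPDATE from changing DepartmentID. Keeping surrogate values fixed is a rule the application has to honor.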

Before endorsing such a policy, however, consider two disadvantages of surrogate keys. First, their values have no meaning to a user. Suppose you want to determine the department to which an employee is assigned. If DepartmentName is a foreign key in EMPLOYEE, then when you retrieve an employee row, you obtain a value such as ‘Accounting’ or ‘Finance’. That value may be all that you need to know about the department.

Alternatively, if you define the surrogate key DepartmentID as the primary key of DEPARTMENT, then DepartmentID will also be the foreign key in EMPLOYEE. When you retrieve a row of EMPLOYEE, you will get back a number such as 123499788 for the DepartmentID, a value that has no meaning to you at all. You have to perform a second query on DEPARTMENT to obtain DepartmentName.
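This small SQLite sketch (run through Python's sqlite3 module) shows the extra lookup. The table layouts and data are hypothetical. Reading EMPLOYEE by itself returns only the DepartmentID value. To get DepartmentName you must go back to DEPARTMENT, which SQL usually expresses as a join rather than as a separate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (
        DepartmentID   INTEGER PRIMARY KEY AUTOINCREMENT,
        DepartmentName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE EMPLOYEE (
        EmployeeNumber INTEGER PRIMARY KEY,
        EmployeeName   TEXT NOT NULL,
        DepartmentID   INTEGER REFERENCES DEPARTMENT (DepartmentID)
    );
    INSERT INTO DEPARTMENT (DepartmentName) VALUES ('Accounting'), ('Finance');
    INSERT INTO EMPLOYEE VALUES (100, 'Mary Jacobs', 2);
    """
)

# Reading EMPLOYEE alone yields only the surrogate value, which means nothing to a user.
raw = conn.execute(
    "SELECT EmployeeName, DepartmentID FROM EMPLOYEE WHERE EmployeeNumber = 100"
).fetchone()
print(raw)  # ('Mary Jacobs', 2)

# Recovering DepartmentName requires consulting DEPARTMENT, written here as a join.
joined = conn.execute(
    """
    SELECT E.EmployeeName, D.DepartmentName
    FROM EMPLOYEE AS E JOIN DEPARTMENT AS D ON E.DepartmentID = D.DepartmentID
    WHERE E.EmployeeNumber = 100
    """
).fetchone()
print(joined)  # ('Mary Jacobs', 'Finance')
```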

The second disadvantage of surrogate keys arises when data are shared among different databases. Suppose, for example, that a company maintains three different SALES databases, one for each of three different product lines. Assume that each of these databases has a table called SALES_ORDER that has a surrogate key called ID. The DBMS assigns values to IDs so that they are unique within a particular database. It does not, however, assign ID values so that they are unique across the three different databases. Thus, it is possible for two different SALES_ORDER rows, in two different databases, to have the same ID value.

This duplication is not a problem until data from the different databases are merged. When that happens, to prevent duplicates, ID values will need to be changed. However, if ID values are changed, then foreign key values may need to be changed as well, and the result is a mess, or at least much work to prevent a mess.
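To reproduce the merge problem, this sketch builds two in-memory SQLite databases that stand in for two of the product-line databases. The ORDER_ITEM child table, its columns, and the product_line_db() and merge() helpers are hypothetical. Each copied order must receive a new ID. Every foreign key that pointed at the old ID must then be rewritten through a mapping from old values to new ones.

```python
import sqlite3

SCHEMA = """
CREATE TABLE SALES_ORDER (
    ID       INTEGER PRIMARY KEY AUTOINCREMENT,
    Customer TEXT NOT NULL
);
CREATE TABLE ORDER_ITEM (
    ItemID  INTEGER PRIMARY KEY AUTOINCREMENT,
    OrderID INTEGER NOT NULL REFERENCES SALES_ORDER (ID),
    SKU     TEXT NOT NULL
);
"""


def product_line_db(orders):
    """Build one product line's database; orders maps customer -> list of SKUs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for customer, skus in orders.items():
        order_id = conn.execute(
            "INSERT INTO SALES_ORDER (Customer) VALUES (?)", (customer,)
        ).lastrowid
        for sku in skus:
            conn.execute("INSERT INTO ORDER_ITEM (OrderID, SKU) VALUES (?, ?)", (order_id, sku))
    return conn


line_a = product_line_db({"Acme": ["A-100", "A-200"]})
line_b = product_line_db({"Zenith": ["B-300"]})

# Each database numbered its orders independently, so both issued ID 1.
ids_a = {i for (i,) in line_a.execute("SELECT ID FROM SALES_ORDER")}
ids_b = {i for (i,) in line_b.execute("SELECT ID FROM SALES_ORDER")}
print(ids_a & ids_b)  # {1}


def merge(target, source):
    """Copy source's orders into target with new IDs, then remap the foreign keys."""
    id_map = {}
    for old_id, customer in source.execute("SELECT ID, Customer FROM SALES_ORDER"):
        new_id = target.execute(
            "INSERT INTO SALES_ORDER (Customer) VALUES (?)", (customer,)
        ).lastrowid
        id_map[old_id] = new_id
    for order_id, sku in source.execute("SELECT OrderID, SKU FROM ORDER_ITEM"):
        target.execute(
            "INSERT INTO ORDER_ITEM (OrderID, SKU) VALUES (?, ?)", (id_map[order_id], sku)
        )
    return id_map


id_map = merge(line_a, line_b)
print(id_map)  # {1: 2}
```

Even this toy example needs a remapping pass for a single child table. A real merge repeats that pass for every table that references SALES_ORDER, which is where the "mess" comes from.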

It is, of course, possible to construct a scheme that uses different starting values for surrogate keys in different databases, so that each database has its own range of surrogate key values. Such a scheme requires careful management and procedures, however. If the starting values are too close to one another, one database's values will eventually grow into the range that begins at the next database's starting value, and duplicate surrogate key values will result.
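SQLite's CREATE TABLE statement has no option for setting the starting value of an AUTOINCREMENT key. You can, however, seed its sqlite_sequence bookkeeping table, and that is enough to sketch the range scheme. The product_line_db() and issue_ids() helpers and the specific starting values are hypothetical.

```python
import sqlite3


def product_line_db(start):
    """Hypothetical setup: a SALES_ORDER table whose surrogate keys begin at `start`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SALES_ORDER (ID INTEGER PRIMARY KEY AUTOINCREMENT, Customer TEXT)"
    )
    # sqlite_sequence holds the last ID issued, so seeding it with start - 1
    # makes the next AUTOINCREMENT value equal to start.
    conn.execute(
        "INSERT INTO sqlite_sequence (name, seq) VALUES ('SALES_ORDER', ?)", (start - 1,)
    )
    return conn


def issue_ids(conn, count):
    """Insert `count` orders and return the set of IDs the DBMS assigned."""
    return {
        conn.execute("INSERT INTO SALES_ORDER (Customer) VALUES ('x')").lastrowid
        for _ in range(count)
    }


# Widely separated starting values: the two ranges stay disjoint.
ids_a = issue_ids(product_line_db(1), 500)
ids_b = issue_ids(product_line_db(1_000_001), 500)
print(ids_a & ids_b)  # set()

# Starting values only 100 apart: after 500 orders each, the ranges collide.
ids_c = issue_ids(product_line_db(1), 500)
ids_d = issue_ids(product_line_db(101), 500)
print(len(ids_c & ids_d))  # 400
```

The gap between starting values must exceed the largest number of rows any one database will ever hold. Choosing that gap is the "careful management" the scheme depends on.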

Some database designers take the position that, for consistency, if one table has a surrogate key, all of the tables in the database should have a surrogate key. Others think that such a policy is too rigid; after all, there are good data keys, such as ProductSKU (which would use SKU codes as discussed in Chapter 2). If such a key exists, it should be used instead of a surrogate key. Your organization may have standards on this issue that you should follow.

Be aware that DBMS products vary in their support for surrogate keys. Microsoft Access, Microsoft SQL Server, and Oracle MySQL provide them. Microsoft SQL Server allows the designer to pick the starting value and increment of the key, and Oracle MySQL allows the designer to pick the starting value. Oracle’s Oracle Database 11g, however, does not provide direct support for surrogate keys, but you can obtain the essence of them in a rather backhanded way, as discussed in Chapter 10A.

We use surrogate keys unless there is some strong reason not to. In addition to the advantages described here, the fact that they are fixed simplifies the enforcement of minimum cardinality, as you will learn in the last section of this chapter.

Part 2 Database Design

Figure 6-3 Representing Candidate (Alternative) Keys (panels (a) and (b))
