The Multivalue, Multicolumn Problem


The table in Figure 4-7 illustrates the first common problem. Notice the columns VendorContact_1 and VendorContact_2. These columns store the names of two contacts at the part vendor. If the company wanted to store the names of three or four contacts using this strategy, it would add columns VendorContact_3, VendorContact_4, and so forth.

Consider another example for an employee parking application. Suppose the EMPLOYEE table includes basic employee data plus columns for license numbers for up to three cars. The following is a typical table structure:

EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, Auto1_LicenseNumber, Auto2_LicenseNumber, Auto3_LicenseNumber)

Other examples of this strategy are to store employees’ children’s names in columns such as Child_1, Child_2, Child_3, and so forth, for as many children as the designer of the table thinks appropriate, or to store pictures of a house in a real estate application in columns labeled Picture_1, Picture_2, Picture_3, and so forth.

Storing multiple values in this way is convenient, but it has two serious disadvantages. The more obvious one is that the number of possible items is fixed. What if there are three contacts at a particular vendor? Where do we put the third name if only columns VendorContact_1 and VendorContact_2 are available? Or, if there are only three columns for child names, where do we put the name of the fourth child? And so forth.

The second disadvantage occurs when querying the data. Suppose we have the following EMPLOYEE table:

EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, Child_1, Child_2, Child_3, . . . {other data})

Figure 4-8 Practical Problems in Designing Databases from Existing Data

• Multivalue, Multicolumn Problem
• Inconsistent Values
• Missing Values
• General-Purpose Remarks Column

Part 2 Database Design

Further, suppose we want to know the names of employees who have a child with the first name Gretchen. If there are three child name columns as shown in our EMPLOYEE table, we must write:

/* *** EXAMPLE CODE-DO NOT RUN *** */
/* *** SQL-Query-CH04-03 *** */
SELECT    *
FROM      EMPLOYEE
WHERE     Child_1 = 'Gretchen'
   OR     Child_2 = 'Gretchen'
   OR     Child_3 = 'Gretchen';

Of course, if there are seven child names . . . well, you get the picture. These problems can be eliminated by using a second table to store the multivalued attribute. For the employee–child case, the tables are:

EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, . . . {other data})

CHILD (EmployeeNumber, ChildFirstName, . . . {other data})

Using this second structure, employees can have an unlimited number of children, and storage space will be saved for employees who have no children at all. Additionally, to find all of the employees who have a child named Gretchen, we can code:

/* *** EXAMPLE CODE-DO NOT RUN *** */
/* *** SQL-Query-CH04-04 *** */
SELECT    *
FROM      EMPLOYEE
WHERE     EmployeeNumber IN
          (SELECT    EmployeeNumber
           FROM      CHILD
           WHERE     ChildFirstName = 'Gretchen');

This second query is easier to write and understand and will work regardless of the number of children that an employee has.
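
The two-table query can be tried out end to end with a small sketch. The version below uses Python's built-in sqlite3 module and an in-memory database; the sample employees, the composite primary key on CHILD, and the helper name employees_with_child are assumptions made for illustration and do not come from the text.

```python
import sqlite3

# In-memory database holding the two-table design (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EmployeeNumber   INTEGER PRIMARY KEY,
        EmployeeLastName TEXT NOT NULL
    );
    -- Assumed key: one row per (employee, child first name).
    CREATE TABLE CHILD (
        EmployeeNumber INTEGER NOT NULL REFERENCES EMPLOYEE (EmployeeNumber),
        ChildFirstName TEXT NOT NULL,
        PRIMARY KEY (EmployeeNumber, ChildFirstName)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Abernathy'), (2, 'Caruthers'), (3, 'Jackson');
    -- Employee 1 has four children, employee 2 has one, employee 3 has none.
    INSERT INTO CHILD VALUES
        (1, 'Anna'), (1, 'Ben'), (1, 'Carl'), (1, 'Gretchen'),
        (2, 'Gretchen');
""")


def employees_with_child(conn, first_name):
    """Return (EmployeeNumber, EmployeeLastName) for employees with such a child."""
    rows = conn.execute(
        """
        SELECT EmployeeNumber, EmployeeLastName
        FROM   EMPLOYEE
        WHERE  EmployeeNumber IN (SELECT EmployeeNumber
                                  FROM   CHILD
                                  WHERE  ChildFirstName = ?)
        ORDER BY EmployeeNumber
        """,
        (first_name,),
    )
    return rows.fetchall()


gretchen_parents = employees_with_child(conn, "Gretchen")
print(gretchen_parents)  # [(1, 'Abernathy'), (2, 'Caruthers')]
```

Employee 1 has four children in this sample, one more than the three Child_n columns could hold, yet the query text is unchanged.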

The alternative design does require the DBMS to process two tables, and if the tables are large and performance is a concern, one can argue that the original design is better. In such cases, storing multivalues in multiple columns may be preferred. Another, less valid objection to the two-table design is as follows: “We only need space for three cars because university policy restricts each employee to registering no more than three cars.” The problem with this statement is that databases often outlive policies. Next year that policy may change, and, if it does, the database will need to be redesigned. As you will learn in Chapter 8, database redesign is tricky, complex, and expensive. It is better to avoid the need for a database redesign.

A few years ago, people argued that only three phone number columns were needed per person: Home, Office, and Fax. Later they said, “Well, OK, maybe we need four: Home, Office, Fax, and Mobile.” Today, who would want to guess the maximum number of phone numbers a person might have? Rather than guess, just store Phone in a separate table; such a design will allow each person to have from none to an unlimited number of phone numbers.
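
As a rough illustration of the "none to unlimited" point, the sketch below stores phone numbers in a separate table and counts them per person. The PERSON and PHONE table layouts, the PhoneType column, and the sample rows are all hypothetical; the text says only that Phone should live in its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one PHONE row per phone number a person has.
    CREATE TABLE PERSON (PersonID INTEGER PRIMARY KEY, LastName TEXT NOT NULL);
    CREATE TABLE PHONE (
        PersonID    INTEGER NOT NULL REFERENCES PERSON (PersonID),
        PhoneType   TEXT NOT NULL,
        PhoneNumber TEXT NOT NULL,
        PRIMARY KEY (PersonID, PhoneNumber)
    );
    INSERT INTO PERSON VALUES (1, 'Abernathy'), (2, 'Caruthers');
    -- Person 1 has five numbers (two of them mobile); person 2 has none.
    INSERT INTO PHONE VALUES
        (1, 'Home', '555-0100'), (1, 'Office', '555-0101'),
        (1, 'Fax', '555-0102'), (1, 'Mobile', '555-0103'),
        (1, 'Mobile', '555-0104');
""")

# LEFT JOIN keeps people who have no PHONE rows; COUNT ignores the NULLs.
phone_counts = conn.execute("""
    SELECT P.LastName, COUNT(PH.PhoneNumber)
    FROM   PERSON AS P LEFT JOIN PHONE AS PH ON PH.PersonID = P.PersonID
    GROUP BY P.PersonID, P.LastName
    ORDER BY P.PersonID
""").fetchall()
print(phone_counts)  # [('Abernathy', 5), ('Caruthers', 0)]
```

A person with no phone simply has no rows in PHONE, and a second mobile number needs no new column.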

You are likely to encounter the multivalue, multicolumn problem when creating databases from nondatabase data. It is particularly common in spreadsheet and text data files. Fortunately, the preferred two-table design is easy to create, and the SQL for moving the data to the new design is easy to write.
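
One possible shape for that data-moving SQL is sketched below, again using Python's sqlite3 module so it can be run as written. The source table name EMPLOYEE_WIDE and the sample rows are assumptions standing in for data imported from a spreadsheet; the idea is one SELECT per repeated column, combined with UNION ALL, with empty slots skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table, as it might arrive from a spreadsheet.
    CREATE TABLE EMPLOYEE_WIDE (
        EmployeeNumber INTEGER PRIMARY KEY,
        Child_1 TEXT, Child_2 TEXT, Child_3 TEXT
    );
    INSERT INTO EMPLOYEE_WIDE VALUES
        (1, 'Anna', 'Gretchen', NULL),
        (2, 'Gretchen', NULL, NULL),
        (3, NULL, NULL, NULL);

    -- Target table for the two-table design.
    CREATE TABLE CHILD (
        EmployeeNumber INTEGER NOT NULL,
        ChildFirstName TEXT NOT NULL
    );

    -- One SELECT per repeated column; NULL (unused) slots are not copied.
    INSERT INTO CHILD (EmployeeNumber, ChildFirstName)
        SELECT EmployeeNumber, Child_1 FROM EMPLOYEE_WIDE WHERE Child_1 IS NOT NULL
        UNION ALL
        SELECT EmployeeNumber, Child_2 FROM EMPLOYEE_WIDE WHERE Child_2 IS NOT NULL
        UNION ALL
        SELECT EmployeeNumber, Child_3 FROM EMPLOYEE_WIDE WHERE Child_3 IS NOT NULL;
""")

child_rows = conn.execute(
    "SELECT EmployeeNumber, ChildFirstName FROM CHILD "
    "ORDER BY EmployeeNumber, ChildFirstName"
).fetchall()
print(child_rows)  # [(1, 'Anna'), (1, 'Gretchen'), (2, 'Gretchen')]
```

Once the rows are copied and checked, the Child_n columns can be dropped from the original table. Real spreadsheet data may also use empty strings rather than NULL for unused slots, in which case the WHERE clauses would need to filter those as well.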

Chapter 4 Database Design Using Normalization

The multivalue, multicolumn problem is just another form of a multivalued dependency. For the parking application, for example, rather than store multiple rows in EMPLOYEE for each auto, multiple named columns are created in the table. The underlying problem is the same, however.