The Double NOT EXISTS Query

The following SQL statement implements the strategy just described:

/* *** SQL-Query-CH08-06 *** */ SELECT

A.FirstName, A.LastName FROM

ARTIST AS A WHERE

NOT EXISTS (SELECT

C.CustomerID

FROM

CUSTOMER AS C

WHERE

NOT EXISTS (SELECT

CAI.CustomerID

FROM

CUSTOMER_ARTIST_INT AS CAI

WHERE

C.CustomerID = CAI.CustomerID

AND

A.ArtistID = CAI.ArtistID));

The result of this query is an empty set, indicating that there is no artist that every customer is interested in:

Let’s see how this works. The bottom SELECT (the third SELECT in the SQL statement) finds all of the customers who are interested in a particular artist. As you read this SELECT (the last SELECT in the query), keep in mind that this is a correlated subquery; this SELECT is nested inside the query on CUSTOMER, which is nested inside the query on ARTIST. C.CustomerID is coming from the SELECT on CUSTOMER in the middle, and A.ArtistID is coming from the SELECT on ARTIST at the top.

Now, the NOT EXISTS in the sixth line of the query will find the customers who are not interested in the given artist. If all customers are interested in the given artist, the result of the

Part 3 Database Implementation

middle SELECT will be null. If the result of the middle SELECT is null, the NOT EXISTS in the third line of the query will be true, and the name of that artist will be produced, just as we want.

Consider what happens for artists who do not qualify in this query. Suppose that every customer except Tiffany Twilight is interested in the artist Joan Miro. (This is not the case for the data in Figure 7-16, but assume that it were true.) Now, for the preceding query, when Miro’s row is considered, the bottom SELECT will retrieve every customer except Tiffany Twilight. In this case, because of the NOT EXISTS in the sixth line of the query, the middle SELECT will produce the CustomerID for Tiffany Twilight (because her row is the only one that does not appear in the bottom SELECT). Now, because there is a result from the middle SELECT, the NOT EXISTS in the top SELECT is false, and the name Joan Miro will not

be included in the output of the query. This is correct because there is a customer who is not interested in Joan Miro.

Again, take some time to study this pattern. It is a famous one, and if you become a database professional, you will certainly see it again in one form or another. In fact, you will not not see it again!

How Do I Analyze an Existing Database?

Before we proceed with a discussion of database redesign, reflect for a moment on what this task means for a real company whose operations are dependent on the database. Suppose, for example, that you work for a company such as Amazon.com. Further suppose that you have been tasked with an important database redesign assignment, say to change the primary key of the vendor table.

To begin, you may wonder, why would Amazon want to do this? It could be that in the early days, when it only sold books, Amazon used company names for vendors. But, as Amazon began to sell more types of products, company name was no longer sufficient. Perhaps there are too many duplicates, and Amazon may have decided to switch to an Amazon-created VendorID.

Now, what does it mean to switch primary keys? Besides adding the new data to the correct rows, what else does it mean? Clearly, if the old primary key has been used as a foreign key, all of the foreign keys need to be changed as well. So we need to know all of the relation- ships in which the old primary key was used. But what about views? Do any views use the old primary key? If so, they will need to be changed. What about triggers and stored procedures? Do any of them use the old primary key? Not to mention any application code that may break when the old key is removed.

Now, to create a nightmare, what happens if you get partway through the change process and something fails? Suppose you encounter unexpected data, and you receive errors from the DBMS while trying to add the new primary key. Amazon cannot change its Web site to display “Sorry, our database is broken—come back tomorrow (we hope)!”

This nightmare brings up many topics, most of which relate to systems analysis and design (see Appendix B for a brief introduction to systems analysis and design). But with regard to database processing, three principles become clear. First, as the carpenters say, “Measure twice and cut once.” Before we attempt any structural changes to a database, we must clearly understand the current structure and contents of the database, and we must know what depends on what. Second, before we make any structural changes to an operational database, we must test those changes on a realistically sized test database that has all of the important test data cases. Finally, if at all possible, we need to create a complete backup of the operational database prior to making any structural changes. If all goes awry, the backup can

be used to restore the database while problems are corrected. We will consider each of these important topics next.

The Double NOT EXISTS Query