Types of Distributed Databases

A database can be distributed by partitioning, which means breaking the database into pieces and storing the pieces on multiple computers; by replication, which means storing copies of the database on multiple computers; or by a combination of replication and partitioning. Figure 9-20 illustrates these alternatives.

Figure 9-20(a) shows a nondistributed database with four pieces labeled W, X, Y, and Z. In Figure 9-20(b), the database has been partitioned but not replicated. Portions W and X are stored and processed on Computer 1, and portions Y and Z are stored and processed on Computer 2. Figure 9-20(c) shows a database that has been replicated but not partitioned. The entire database is stored and processed on Computers 1 and 2. Finally, Figure 9-20(d) shows a database that is partitioned and replicated. Portion Y of the database is stored and processed on Computers 1 and 2.

The portions to be partitioned or replicated can be defined in many different ways.

A database that has five tables (e.g., CUSTOMER, SALESPERSON, INVOICE, LINE_ITEM, and PART) could be partitioned by assigning CUSTOMER to portion W, SALESPERSON to portion X, INVOICE and LINE_ITEM to portion Y, and PART to portion Z. Alternatively, different rows of each of these five tables could be assigned to different computers, or different columns of each of these tables could be assigned to different computers.
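The three ways of defining portions can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular DBMS stores its fragments. The sample CUSTOMER rows, the Region column used to route rows, and the computer names are all hypothetical.

```python
# Sketch of three ways to define the portions of a partitioned database.
# Sample rows, the Region column, and computer names are hypothetical.

customers = [
    {"CustomerID": 1, "Name": "Ames", "Region": "East", "Phone": "555-0101"},
    {"CustomerID": 2, "Name": "Baker", "Region": "West", "Phone": "555-0102"},
    {"CustomerID": 3, "Name": "Cole", "Region": "East", "Phone": "555-0103"},
]

# 1. Whole tables are assigned to portions W..Z, and portions to computers.
table_to_portion = {"CUSTOMER": "W", "SALESPERSON": "X",
                    "INVOICE": "Y", "LINE_ITEM": "Y", "PART": "Z"}
portion_to_computer = {"W": "Computer 1", "X": "Computer 1",
                       "Y": "Computer 2", "Z": "Computer 2"}

def computer_for_table(table):
    return portion_to_computer[table_to_portion[table]]

# 2. Different rows go to different computers (horizontal partitioning),
#    chosen here by the value of one column in each row.
def partition_rows(rows, column, site_for):
    sites = {}
    for row in rows:
        sites.setdefault(site_for(row[column]), []).append(row)
    return sites

# 3. Different columns go to different computers (vertical partitioning);
#    the key is repeated in every fragment so rows can be rejoined later.
def partition_columns(rows, key, column_groups):
    return {site: [{c: row[c] for c in [key, *cols]} for row in rows]
            for site, cols in column_groups.items()}

by_row = partition_rows(customers, "Region",
                        lambda region: "Computer 1" if region == "East" else "Computer 2")
by_col = partition_columns(customers, "CustomerID",
                           {"Computer 1": ["Name", "Region"], "Computer 2": ["Phone"]})
```

In the row-based scheme, each computer holds complete rows. In the column-based scheme, each computer holds every row, but only some of the columns.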

Databases are distributed for two major reasons: performance and control. Having a database on multiple computers can improve throughput, either because multiple computers are sharing the workload or because communications delays can be reduced by placing the computers closer to their users. Distributing the database can improve control by segregating different portions of the database to different computers, each of which can have its own set of authorized users and permissions.

Chapter 9 Managing Multiuser Databases

Figure 9-20 Types of Distributed Databases. Panels: (a) Nonpartitioned, Nonreplicated; (b) Partitioned, Nonreplicated; (c) Replicated, Nonpartitioned; (d) Partitioned, Replicated.

Challenges of Distributed Databases

Significant challenges must be overcome when distributing a database, and those challenges depend on the type of distributed database and the activity that is allowed. In the case of a fully replicated database, if only one computer is allowed to make updates on one of the copies, then the challenges are not too great. All update activity occurs on that single computer, and copies of that database are periodically sent to the replication sites. The challenge is to ensure that only a logically consistent copy of the database is distributed (no partial or uncommitted transactions, for example) and to ensure that the sites understand that they are processing data that might not be current because changes could have been made to the updated database after the local copy was made.
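A minimal Python sketch of the single-updater case follows. It assumes that an in-memory dictionary stands in for the database and that a version counter stands in for whatever change tracking a real DBMS uses. PrimarySite and ReplicaSite are hypothetical names. The sketch shows the two requirements from the paragraph above: only committed data is copied, and a replica can tell that its copy may not be current.

```python
import copy

class PrimarySite:
    """Hypothetical: the only computer allowed to update the database."""
    def __init__(self):
        self.committed = {}   # data visible after COMMIT
        self.pending = None   # changes of the transaction in progress
        self.version = 0      # incremented on every commit

    def begin(self):
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.committed.update(self.pending)
        self.pending = None
        self.version += 1

    def snapshot(self):
        # Copy only committed data, never self.pending, so a replica
        # cannot receive a partial or uncommitted transaction.
        return self.version, copy.deepcopy(self.committed)

class ReplicaSite:
    """Hypothetical: a read-only copy that is refreshed periodically."""
    def __init__(self):
        self.version, self.data = 0, {}

    def refresh(self, primary):
        self.version, self.data = primary.snapshot()

    def is_stale(self, primary):
        # The replica knows its data may be older than the primary's.
        return self.version < primary.version

primary, replica = PrimarySite(), ReplicaSite()
primary.begin()
primary.write("PART:100", 25)
primary.commit()
primary.begin()
primary.write("PART:100", 30)   # not yet committed
replica.refresh(primary)        # receives 25, not 30
primary.commit()                # the replica is now behind
```

A real system takes snapshots while updates continue. This single-threaded sketch only shows which data is eligible to be copied.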

If multiple computers can make updates to a replicated database, then difficult problems arise. Specifically, if two computers are allowed to process the same row at the same time, they can cause three types of error: They can make inconsistent changes, one computer can delete a row that another computer is updating, or the two computers can make changes that violate uniqueness constraints.
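The following Python sketch makes the three error types concrete. It compares the change logs of two sites after the fact and classifies the conflicts it finds. It is a hypothetical helper, and it assumes a single table, rows identified by a key, and Email as the column with a uniqueness constraint. A production system would prevent these conflicts with locking rather than merely detect them.

```python
def overlapping_changes(vals_a, vals_b):
    """True if both sites set the same column to different values."""
    return any(col in vals_b and vals_b[col] != val for col, val in vals_a.items())

def find_conflicts(changes_a, changes_b, unique_column):
    """Classify conflicts between two sites' change logs for one table.

    Each change is (operation, key, {column: new_value}), where operation
    is "insert", "update", or "delete".
    """
    conflicts = []
    for op_a, key_a, vals_a in changes_a:
        for op_b, key_b, vals_b in changes_b:
            if key_a == key_b:
                # Error 1: both sites changed the same row inconsistently.
                if op_a == op_b == "update" and overlapping_changes(vals_a, vals_b):
                    conflicts.append(("inconsistent update", key_a))
                # Error 2: one site deleted a row the other was updating.
                elif {op_a, op_b} == {"update", "delete"}:
                    conflicts.append(("update of deleted row", key_a))
            # Error 3: two different rows now share a value that must be unique.
            elif (op_a != "delete" and op_b != "delete"
                  and vals_a.get(unique_column) is not None
                  and vals_a.get(unique_column) == vals_b.get(unique_column)):
                conflicts.append(("uniqueness violation", (key_a, key_b)))
    return conflicts

site1_changes = [("update", 7, {"Phone": "555-1111"}),
                 ("update", 9, {"Phone": "555-3333"}),
                 ("insert", 20, {"Email": "pat@example.com"})]
site2_changes = [("update", 7, {"Phone": "555-2222"}),
                 ("delete", 9, {}),
                 ("insert", 21, {"Email": "pat@example.com"})]
conflicts = find_conflicts(site1_changes, site2_changes, unique_column="Email")
```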

To prevent these problems, some type of record locking is required. Because multiple computers are involved, standard record locking does not work. Instead, a far more complicated locking scheme, called distributed two-phase locking, must be used. The specifics of the scheme are beyond the scope of this discussion; for now, just know that implementing this algorithm is difficult and expensive. If multiple computers can process multiple replications of a distributed database, then significant problems must be solved.
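The distributed algorithm is beyond our scope, so the sketch below shows only the single-site rule that distributed two-phase locking extends across computers. A transaction acquires locks in a growing phase. Once it releases any lock, it enters the shrinking phase and may not acquire another. The in-memory lock table, the exclusive-only locks, and the class names are simplifying assumptions. A transaction that cannot get a lock is simply told so instead of being made to wait.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction tries to lock after releasing a lock."""

class Transaction:
    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared by all transactions: resource -> owner
        self.held = set()
        self.shrinking = False        # False = growing phase, True = shrinking phase

    def lock(self, resource):
        """Try to get an exclusive lock; return False if another owner holds it."""
        if self.shrinking:
            raise TwoPhaseViolation(
                f"{self.name} may not lock {resource} after releasing a lock")
        owner = self.lock_table.get(resource)
        if owner is not None and owner != self.name:
            return False              # a real DBMS would make the caller wait
        self.lock_table[resource] = self.name
        self.held.add(resource)
        return True

    def unlock(self, resource):
        """Release one lock; the first release ends the growing phase."""
        if resource in self.held:
            self.shrinking = True
            self.held.remove(resource)
            del self.lock_table[resource]

    def finish(self):
        """Common special case: free every lock only when the transaction ends."""
        for resource in list(self.held):
            self.unlock(resource)

locks = {}
t1, t2 = Transaction("T1", locks), Transaction("T2", locks)
got_t1 = t1.lock("CUSTOMER:1")   # growing phase: lock granted
got_t2 = t2.lock("CUSTOMER:1")   # T1 holds the lock, so this fails
t1.unlock("CUSTOMER:1")          # T1 enters its shrinking phase
```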

Part 4 Multiuser Database Processing

If the database is partitioned but not replicated [Figure 9-20(b)], then problems will occur if any transaction updates data that span two or more distributed partitions. For example, suppose the CUSTOMER and SALESPERSON tables are placed on a partition on one computer and that the INVOICE, LINE_ITEM, and PART tables are placed on a second computer. Further suppose that when recording a sale all five tables are updated in an atomic transaction. In this case, a transaction must be started on both computers, and it can be allowed to commit on one computer only if it can be allowed to commit on both computers. Here, too, distributed two-phase locking must be used.
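The rule that a transaction may commit on one computer only if it can commit on both is usually enforced by an atomic commitment protocol called two-phase commit. This protocol works alongside the distributed locking described in the text. The Python sketch below is a minimal, hypothetical illustration of the commit protocol only. It omits the logging, timeouts, and failure recovery a real coordinator needs. The Participant class and its can_commit flag are assumptions used to simulate each computer's vote.

```python
class Participant:
    """Hypothetical: one computer that holds part of the transaction's data."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates the outcome of local checks
        self.state = "active"

    def prepare(self):
        # Phase 1 vote: "yes" is a promise to commit if the coordinator says so.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect votes; all() stops asking at the first "no" vote.
    all_yes = all(p.prepare() for p in participants)
    # Phase 2: commit everywhere only if every participant voted yes;
    # otherwise roll back everywhere, so no computer commits alone.
    for p in participants:
        if all_yes:
            p.commit()
        else:
            p.rollback()
    return "committed" if all_yes else "aborted"

good = [Participant("Computer 1"), Participant("Computer 2")]
bad = [Participant("Computer 1"), Participant("Computer 2", can_commit=False)]
outcome_good = two_phase_commit(good)
outcome_bad = two_phase_commit(bad)
```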

If the data are partitioned in such a way that no transaction requires data from both partitions, then regular locking will work. However, in this case the databases are actually two separate databases, and some would argue that they should not be considered a distributed database.

If the data are partitioned in such a way that no transaction updates data from both partitions but that one or more transactions read data from one partition and update data on a second partition, then problems might or might not result with regular locking. If dirty reads are possible, then some form of distributed locking is required; otherwise, regular locking should work.

If a database is partitioned and at least one of those partitions is replicated, then locking requirements are a combination of those just described. If the replicated portion is updated, if transactions span the partitions, or if dirty reads are possible, then distributed two-phase locking is required; otherwise, regular locking might suffice.
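The rules in this section can be collected into a small decision function. The Python sketch below is a simplification. It assumes each design can be described by a handful of yes/no properties; the DistributionDesign fields are hypothetical names. The function returns the locking scheme the text calls for, together with the reasons that triggered it.

```python
from dataclasses import dataclass

@dataclass
class DistributionDesign:
    partitioned: bool
    replicated: bool
    replica_updaters: int = 1                    # computers allowed to update replicated data
    updates_span_partitions: bool = False        # one transaction updates 2+ partitions
    dirty_reads_across_partitions: bool = False  # reads one partition, updates another

def locking_scheme(d):
    """Return (scheme, reasons) following the rules described in this section."""
    reasons = []
    if d.replicated and d.replica_updaters > 1:
        reasons.append("replicated data is updated on more than one computer")
    if d.partitioned and d.updates_span_partitions:
        reasons.append("a transaction updates more than one partition")
    if d.partitioned and d.dirty_reads_across_partitions:
        reasons.append("dirty reads are possible across partitions")
    if reasons:
        return "distributed two-phase locking", reasons
    # The text says regular locking "might suffice", not that it always will.
    return "regular locking", reasons

# Example: a fully replicated database with a single updating computer.
single_updater = locking_scheme(DistributionDesign(partitioned=False, replicated=True))
```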

Distributed processing is complicated and can create substantial problems. Except in the case of replicated, read-only databases, only experienced teams with a substantial budget and significant time to invest should attempt distributed databases. Such databases also require data communications expertise. Distributed databases are not for the faint of heart.

Object-Relational Databases

Object-oriented programming (OOP) is a technique for designing and writing computer programs. Today, most new program development is done using OOP techniques. Java, C++, C#, and Visual Basic .NET are object-oriented programming languages.

Objects are data structures that have both methods, which are computer programs that perform some task, and properties, which are data items particular to an object. All objects of a given class have the same methods, but each has its own set of property values. When an object-oriented program runs, the properties of its objects are created and stored in main memory, so their values are lost when the program ends unless they are saved elsewhere. Storing the values of an object's properties is called object persistence. Many different techniques have been used for object persistence; one of them is to use some variation of database technology.
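The distinction between methods and properties, and the idea of persisting only property values, can be sketched in Python. Serializing to JSON text is used here purely for illustration; it is not one of the database techniques the text goes on to discuss. The Customer class and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Customer:
    # Properties: data items whose values differ from object to object.
    customer_id: int
    name: str
    balance: float = 0.0

    # Method: program code shared by every Customer object.
    def charge(self, amount):
        self.balance += amount

def to_json(customer):
    """Persist only the property values; the methods stay in the class."""
    return json.dumps(asdict(customer))

def from_json(text):
    """Rebuild an equivalent object from the stored property values."""
    return Customer(**json.loads(text))

ames = Customer(1, "Ames")
ames.charge(125.5)
stored = to_json(ames)        # this text could be written to a file or database
restored = from_json(stored)  # a new object with the same property values
```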

Although relational databases can be used for object persistence, doing so requires substantial work on the part of the programmer. The problem is that, in general, object data structures are more complicated than a row of a table. Typically, several, or even many, rows of several different tables are required to store the data of a single object. This means the OOP programmer must design a mini-database just to store objects. Usually, many objects are involved in an information system, so many different mini-databases need to be designed and processed. This method is so undesirable that it is seldom used.
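The following Python sketch uses the standard sqlite3 module to illustrate the mini-database problem for one simple object. An Invoice object with a list of line items needs two tables, hand-written mapping code in each direction, and a transaction so that the pieces are saved together. The Invoice and LineItem classes and the table layout are hypothetical. Real objects with nested or shared references would need considerably more of this code.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LineItem:
    part: str
    quantity: int

@dataclass
class Invoice:
    number: int
    customer: str
    items: list = field(default_factory=list)

SCHEMA = """
CREATE TABLE INVOICE (InvoiceNumber INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE LINE_ITEM (
    InvoiceNumber INTEGER NOT NULL REFERENCES INVOICE(InvoiceNumber),
    ItemNumber    INTEGER NOT NULL,
    Part          TEXT NOT NULL,
    Quantity      INTEGER NOT NULL,
    PRIMARY KEY (InvoiceNumber, ItemNumber));
"""

def save_invoice(conn, inv):
    # One object becomes one INVOICE row plus one LINE_ITEM row per item.
    # "with conn" commits both tables together or rolls both back.
    with conn:
        conn.execute("INSERT INTO INVOICE VALUES (?, ?)", (inv.number, inv.customer))
        conn.executemany(
            "INSERT INTO LINE_ITEM VALUES (?, ?, ?, ?)",
            [(inv.number, n, item.part, item.quantity)
             for n, item in enumerate(inv.items, start=1)])

def load_invoice(conn, number):
    # Reassemble the object from rows in both tables.
    (customer,) = conn.execute(
        "SELECT Customer FROM INVOICE WHERE InvoiceNumber = ?", (number,)).fetchone()
    rows = conn.execute(
        "SELECT Part, Quantity FROM LINE_ITEM WHERE InvoiceNumber = ? ORDER BY ItemNumber",
        (number,)).fetchall()
    return Invoice(number, customer, [LineItem(p, q) for p, q in rows])

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
inv = Invoice(1001, "Ames", [LineItem("Bolt", 10), LineItem("Nut", 12)])
save_invoice(conn, inv)
copy_back = load_invoice(conn, 1001)
```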

In the early 1990s, several vendors developed special-purpose DBMS products for storing object data. These products, which were called object-oriented DBMSs (OODBMSs), never achieved commercial success. The problem was that by the time they were introduced, billions of bytes of data were already stored in relational DBMS format, and no organization wanted to convert their data to OODBMS format to be able to use an OODBMS. Consequently, such products failed in the marketplace.

However, the need for object persistence did not disappear. Some vendors, most notably Oracle, added features and functions to their relational DBMS products to create object-relational databases. These features and functions are basically add-ons to a relational DBMS that facilitate object persistence. With these features, object data can be stored more readily than with a purely relational database, while the object-relational database can still process relational data at the same time. To learn more about object-relational databases, search for OODBMS and ODBMS on the Web.

Summary

Multiuser databases pose difficult problems for the organizations that create and use them, and most organizations have created an office of database administration to ensure that such problems are solved. In this text, the term database administrator refers to the person or office that is concerned with a single database. The term data administrator is used to describe a management function that is concerned with the organization's data policy and security. Major functions of the database administrator are listed in Figure 9-1.

The database administrator (DBA) participates in the initial development of database structures and in providing configuration control when requests for changes arise. Keeping accurate documentation of the structure and changes to it is an important DBA function.

The goal of concurrency control is to ensure that one user's work does not inappropriately influence another user's work. No single concurrency control technique is ideal for all circumstances. Trade-offs need to be made between the level of protection and throughput. A transaction, or logical unit of work (LUW), is a series of actions taken against the database that occurs as an atomic unit; either all of them occur or none of them do. The activity of concurrent transactions is interleaved on the database server. In some cases, updates can be lost if concurrent transactions are not controlled. Another concurrency problem concerns inconsistent reads.

To avoid concurrency problems, database elements are locked. Implicit locks are placed by the DBMS; explicit locks are issued by the application program. The size of the locked resource is called lock granularity. An exclusive lock prohibits other users from reading the locked resource; a shared lock allows other users to read the locked resource, but they cannot update it.

Two transactions that run concurrently and generate results that are consistent with the results that would have occurred if they had run separately are referred to as serializable transactions. Two-phase locking, in which locks are acquired in a growing phase and released in a shrinking phase, is one scheme for serializability. A special case of two-phase locking is to acquire locks throughout the transaction, but not to free any lock until the transaction is finished.

Deadlock, or the deadly embrace, occurs when two transactions are each waiting on a resource that the other transaction holds. Deadlock can be prevented by requiring transactions to acquire all locks at the same time. Once deadlock occurs, the only way to cure it is to abort one of the transactions (and back out of partially completed work). Optimistic locking assumes that no transaction conflict will occur and deals with the consequences if it does. Pessimistic locking assumes that conflict will occur and so prevents it ahead of time with locks. In general, optimistic locking is preferred for the Internet and for many intranet applications.

Most application programs do not explicitly declare locks. Instead, they mark transaction boundaries with BEGIN, COMMIT, and ROLLBACK transaction statements and declare the concurrent behavior they want. The DBMS then places locks for the application that will result in the desired behavior.

An ACID transaction is one that is atomic, consistent, isolated, and durable. Durable means that database changes are permanent. Consistency can mean either statement-level or transaction-level consistency. With transaction-level consistency, a transaction may not see its own changes. The 1992 SQL standard defines four transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable. The characteristics of each are summarized in Figure 9-11.

A cursor is a pointer into a set of records. Four cursor types are prevalent: forward only, static, keyset, and dynamic. Developers should select isolation levels and cursor types that are appropriate for their application workload and for the DBMS product in use.

The goal of database security is to ensure that only authorized users can perform authorized activities at authorized times. To develop effective database security, the processing rights and responsibilities of all users must be determined.

DBMS products provide security facilities. Most involve the declaration of users, groups, objects to be protected, and permissions or privileges on those objects. Almost all DBMS products use some form of user name and password security. Security guidelines are listed in Figure 9-15. DBMS security can be augmented by application security.

In the event of system failure, the database must be restored to a usable state as soon as possible. Transactions in process at the time of the failure must be reapplied or restarted. Although in some cases recovery can be done by reprocessing, the use of logs and rollback and rollforward is almost always preferred. Checkpoints can be taken to reduce the amount of work that needs to be done after a failure.

In addition to these tasks, the DBA manages the DBMS product itself, measuring database application performance and assessing the need for changes in database structure or DBMS performance tuning. The DBA also ensures that new DBMS features are evaluated and used as appropriate. Finally, the DBA is responsible for maintaining the data repository.

A distributed database is a database that is stored and processed on more than one computer. A replicated database is one in which multiple copies of some or all of the database are stored on different computers. A partitioned database is one in which different pieces of the database are stored on different computers. A distributed database can be both replicated and partitioned.

Distributed databases pose processing challenges. If a database is updated on a single computer, then the challenge is simply to ensure that the copies of the database are logically consistent when they are distributed. However, if updates are to be made on more than one computer, the challenges become significant. If the database is partitioned and not replicated, then challenges occur if transactions span data on more than one computer. If the database is replicated and if updates occur to the replicated portions, then a special locking algorithm called distributed two-phase locking is required. Implementing this algorithm can be difficult and expensive.

Objects consist of methods and properties or data values. All objects of a given class have the same methods, but they have different property values. Object persistence is the process of storing object property values. Relational databases are difficult to use for object persistence. Some specialized products called object-oriented DBMSs were developed in the 1990s but never received commercial acceptance. Oracle and others have extended the capabilities of their relational DBMS products to provide support for object persistence. Such databases are referred to as object-relational databases.
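Optimistic locking and explicit transaction boundaries can be illustrated together with Python's standard sqlite3 module. The sketch assumes a hypothetical PART table with a Version column. Each user reads a row without locking it and later updates it only if the version is unchanged. A row count of zero reveals that another user changed the row first, so the transaction is rolled back and the work is retried. Without the version check, the second user's update would silently overwrite the first, which is the lost update problem.

```python
import sqlite3

# isolation_level=None: the sqlite3 module does not open transactions on its
# own, so the explicit BEGIN/COMMIT/ROLLBACK statements mark the boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE PART (PartID INTEGER PRIMARY KEY, "
             "QtyOnHand INTEGER NOT NULL, Version INTEGER NOT NULL)")
conn.execute("INSERT INTO PART VALUES (100, 10, 1)")

def read_part(part_id):
    """Read without locking; remember the version that was seen."""
    return conn.execute("SELECT QtyOnHand, Version FROM PART WHERE PartID = ?",
                        (part_id,)).fetchone()

def sell(part_id, quantity, seen):
    """Apply the sale only if nobody changed the row since it was read."""
    seen_qty, seen_version = seen
    conn.execute("BEGIN TRANSACTION")
    cur = conn.execute(
        "UPDATE PART SET QtyOnHand = ?, Version = Version + 1 "
        "WHERE PartID = ? AND Version = ?",
        (seen_qty - quantity, part_id, seen_version))
    if cur.rowcount == 0:          # the version changed: a conflict occurred
        conn.execute("ROLLBACK TRANSACTION")
        return False
    conn.execute("COMMIT TRANSACTION")
    return True

# Users A and B both read the row before either one updates it.
a_seen, b_seen = read_part(100), read_part(100)
a_ok = sell(100, 3, a_seen)               # succeeds: 10 -> 7
b_first = sell(100, 4, b_seen)            # rejected: the version is now 2
b_retry = sell(100, 4, read_part(100))    # re-read and retry: 7 -> 3
```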

Key Terms

ACID transaction
active repository
after image
atomic
before image
checkpoint
concurrent transaction
concurrent update problem
consistent
cursor
data administration
data repository
database administration
database administrator
database save
deadly embrace
dirty read
distributed database
distributed two-phase locking
durable
dynamic cursor
exclusive lock
explicit lock
growing phase
implicit lock
inconsistent read problem
isolated
isolation levels
keyset cursor
lock
lock granularity
log
logical unit of work (LUW)
lost update problem
methods
nonrepeatable reads
object persistence
object-oriented DBMSs (OODBMSs)
object-oriented programming (OOP)
object-relational databases
objects
optimistic locking
partitioning
passive repository
pessimistic locking
phantom reads
processing rights and responsibilities
properties
recovery via reprocessing
recovery via rollback/rollforward
replication
resource locking
rollforward
scrollable cursor
serializable
shared lock
shrinking phase
SQL BEGIN TRANSACTION statement
SQL COMMIT TRANSACTION statement
SQL data control language (DCL)
SQL GRANT statement
SQL injection attack
SQL REVOKE statement
SQL ROLLBACK TRANSACTION statement
SQL START TRANSACTION statement
SQL WORK keyword
statement-level consistency
static cursor
strong password
transaction
transaction-level consistency
two-phase locking
user group

Review Questions

9.1 Briefly describe five difficult problems for organizations that create and use multiuser databases.

9.2 Explain the difference between a database administrator and a data administrator.

9.3 List seven important DBA tasks.

9.4 Summarize the DBA’s responsibilities for managing database structure.

9.5 What is configuration control? Why is it necessary?

9.6 Explain the meaning of the word inappropriately in the phrase “one user’s work does not inappropriately influence another user’s work.”

9.7 Explain the trade-off that exists in concurrency control.

9.8 Define an atomic transaction and explain why atomicity is important.

9.9 Explain the difference between concurrent transactions and simultaneous transactions. How many CPUs are required for simultaneous transactions?

9.10 Give an example, other than the one in this text, of the lost update problem.

9.11 Explain the difference between an explicit and an implicit lock.

9.12 What is lock granularity?

9.13 Explain the difference between an exclusive lock and a shared lock.

9.14 Explain two-phase locking.

9.15 How does releasing all locks at the end of the transaction relate to two-phase locking?

9.16 In general, how should the boundaries of a transaction be defined?

9.17 What is deadlock? How can it be avoided? How can it be resolved once it occurs?

9.18 Explain the difference between optimistic and pessimistic locking.

9.19 Explain the benefits of marking transaction boundaries, declaring lock characteristics, and letting the DBMS place locks.

9.20 Explain the use of the SQL BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION statements. Why does MySQL also use the SQL START TRANSACTION statement?

9.21 Explain the meaning of the expression ACID transaction.

9.22 Describe statement-level consistency.

9.23 Describe transaction-level consistency. What disadvantage can exist with it?

9.24 What is the purpose of transaction isolation levels?

9.25 Explain the read-uncommitted isolation level. Give an example of its use.

9.26 Explain the read-committed isolation level. Give an example of its use.

9.27 Explain the repeatable-read isolation level. Give an example of its use.

9.28 Explain the serializable isolation level. Give an example of its use.

9.29 Explain the term cursor.

9.30 Explain why a transaction may have many cursors. Also, how is it possible that a transaction may have more than one cursor on a given table?

9.31 What is the advantage of using different types of cursors?

9.32 Explain forward-only cursors. Give an example of their use.

9.33 Explain static cursors. Give an example of their use.


9.34 Explain keyset cursors. Give an example of their use.

9.35 Explain dynamic cursors. Give an example of their use.

9.36 What happens if you do not declare the transaction isolation level and the cursor type to the DBMS? Is this good or bad?

9.37 Explain the necessity of defining processing rights and responsibilities. How are such responsibilities enforced, and what is the role of SQL DCL in enforcing them?

9.38 Explain the relationships among USER, ROLE, PERMISSION, and OBJECT for a generic database security system.

9.39 Should the DBA assume a firewall when planning security?

9.40 What should be done with unused DBMS features and functions?

9.41 Explain how to protect the computer that runs the DBMS.

9.42 With regard to security, what actions should the DBA take on user accounts and passwords?

9.43 List two elements of a database security plan.

9.44 Describe the advantages and disadvantages of DBMS-provided and application-provided security.

9.45 What is an SQL injection attack and how can it be prevented?

9.46 Explain how a database could be recovered via reprocessing. Why is this generally not feasible?

9.47 Define rollback and rollforward.

9.48 Why is it important to write to the log before changing the database values?

9.49 Describe the rollback process. Under what conditions should it be used?

9.50 Describe the rollforward process. Under what conditions should it be used?

9.51 What is the advantage of taking frequent checkpoints of a database?

9.52 Summarize the DBA’s responsibilities for managing the DBMS.

9.53 What is a data repository? A passive data repository? An active data repository?

9.54 Explain why a data repository is important. What is likely to happen if one is not available?

9.55 Define distributed database.

9.56 Explain one way to partition a database that has three tables: T1, T2, and T3.

9.57 Explain one way to replicate a database that has three tables: T1, T2, and T3.

9.58 Explain what must be done when fully replicating a database but allowing only one computer to process updates.

9.59 If more than one computer can update a replicated database, what three problems can occur?

9.60 What solution is used to prevent the problems in Review Question 9.59?

9.61 Explain what problems can occur in a distributed database that is partitioned but not replicated.

9.62 What organizations should consider using a distributed database?

9.63 Explain the meaning of the term object persistence.

9.64 In general terms, explain why relational databases are difficult to use for object persistence.

9.65 What does OODBMS stand for, and what is its purpose?

9.66 According to this chapter, why were OODBMSs not successful?

9.67 What is an object-relational database?

Project Questions

9.68 Visit www.oracle.com and search for “Oracle Security Guidelines.” Read articles at three of the links that you find and summarize them. How does the information you find compare with that in Figure 9-15?

9.69 Visit www.msdn.microsoft.com and search for “SQL Server Security Guidelines.” Read articles at three of the links that you find and summarize them. How does the information you find compare with that in Figure 9-15?

9.70 Visit www.mysql.com and search for “MySQL Security Guidelines.” Read articles at three of the links that you find and summarize them. How does the information you find compare with that in Figure 9-15?

9.71 Use Google (www.google.com) or another search engine and search the Web for “Database Security Guidelines.” Read articles at three of the links that you find and summarize them. How does the information you find compare with that in Figure 9-15?

9.72 Search the Web for “distributed two-phase locking.” Find a tutorial on that topic and explain, in general terms, how this locking algorithm works.

9.73 Answer the following questions for the View Ridge Gallery database discussed in Chapter 7 with the tables shown in Figure 7-15.

A. Suppose that you are developing a stored procedure to record an artist who has never been in the gallery before, a work for that artist, and a row in the TRANS table to record the date acquired and the acquisition price. How will you declare the boundaries of the transaction? What transaction isolation level will you use?

B. Suppose that you are writing a stored procedure to change values in the CUSTOMER table. What transaction isolation level will you use?

C. Suppose that you are writing a stored procedure to record a customer’s purchase. Assume that the customer’s data are new. How will you declare the boundaries of the transaction? What isolation level will you use?

D. Suppose that you are writing a stored procedure to check the validity of the intersection table. Specifically, for each customer, your procedure should read the customer’s transaction and determine the artist of that work. Given the artist, your procedure should then check to ensure that an interest has been declared for that artist in the intersection table. If there is no such intersection row, your procedure should create one. How will you set the boundaries of your transaction? What isolation level will you use? What cursor types (if any) will you use?

Assume that Marcia has hired you as a database consultant to develop an operational database having the following four tables (the same tables described at the end of Chapter 7):

CUSTOMER (CustomerID, FirstName, LastName, Phone, Email)
INVOICE (InvoiceNumber, CustomerID, DateIn, DateOut, Subtotal, Tax, TotalAmount)
INVOICE_ITEM (InvoiceNumber, ItemNumber, ServiceID, Quantity, UnitPrice, ExtendedPrice)
SERVICE (ServiceID, ServiceDescription, UnitPrice)

A. Assume that Marcia’s has the following personnel: two owners, a shift manager, a part-time seamstress, and two salesclerks. Prepare a two- to three-page memo that addresses the following points:

1. The need for database administration.

2. Your recommendation as to who should serve as database administrator. Assume that Marcia’s is not sufficiently large to need or afford a full-time database administrator.

3. Using Figure 9-1 as a guide, describe the nature of database administration activities at Marcia’s. As an aggressive consultant, keep in mind that you can recommend yourself for performing some of the DBA functions.


B. For the employees described in part A, define users, groups, and permissions on data in these four tables. Use the security scheme shown in Figure 9-13 as an example. Create a table like that in Figure 9-12. Don’t forget to include yourself.

C. Suppose that you are writing a stored procedure to create new records in SERVICE for new services that Marcia's will perform. Suppose that you know that while your procedure is running, another stored procedure that records new customer orders and order line items, or modifies existing ones, can also be running. Additionally, suppose that a third stored procedure that records new customer data also can be running.

1. Give an example of a dirty read, a nonrepeatable read, and a phantom read among this group of stored procedures.

2. What concurrency control measures are appropriate for the stored procedure that you are creating?

3. What concurrency control measures are appropriate for the two other stored procedures?

Assume that Morgan has hired you as a database consultant to develop an operational database having the same tables described at the end of Chapter 7 (note that STORE uses the surrogate key StoreID):

STORE (StoreID, StoreName, City, Country, Phone, Fax, Email, Contact)
PURCHASE_ITEM (PurchaseItemID, StoreID, PurchaseDate, ItemDescription, Category, PriceUSD)
SHIPMENT (ShipmentID, ShipperID, ShipperInvoiceNumber, Origin, Destination, DepartureDate, ArrivalDate)
SHIPMENT_ITEM (ShipmentID, ShipmentItemID, PurchaseItemID, InsuredValue)
SHIPPER (ShipperID, ShipperName, Phone, Fax, Email, Contact)

A. Assume that Morgan personnel are the owner (Morgan), an office administrator, one full-time salesperson, and two part-time salespeople. Morgan and the office administrator want to process data in all tables. Additionally, the full-time salesperson can enter purchase and shipment data. The part-time employees can only read shipment data; they are not allowed to see InsuredValue, however. Prepare a three- to five-page memo for the owner that addresses the following issues:

1. The need for database administration at Morgan.

2. Your recommendation as to who should serve as database administrator. Assume that Morgan is not sufficiently large that it needs or can afford a full-time database administrator.

3. Using Figure 9-1 as a guide, describe the nature of database administration activities at Morgan. As an aggressive consultant, keep in mind that you can recommend yourself for performing some of the DBA functions.

B. For the employees described in part A, define users, groups, and permissions on data in these five tables. Use the security scheme shown in Figure 9-13 as an example. Create a table like that in Figure 9-12. Don’t forget to include yourself.

C. Suppose that you are writing a stored procedure to record new purchases. Suppose that you know that while your procedure is running, another stored procedure that records shipment data can be running, and a third stored procedure that updates shipper data can also be running.

1. Give an example of a dirty read, a nonrepeatable read, and a phantom read among this group of stored procedures.

2. What concurrency control measures are appropriate for the stored procedure that you are creating?

3. What concurrency control measures are appropriate for the two other stored procedures?

Managing Databases with SQL Server 2008 R2

Chapter Objectives

• To install SQL Server 2008 R2
• To use SQL Server 2008 R2’s graphical utilities
• To create a database in SQL Server 2008 R2
• To submit both SQL DDL and DML via the Microsoft SQL Server Management Studio
• To understand the purpose and role of stored procedures and to create simple stored procedures
• To understand the purpose and role of triggers and to create simple triggers
• To understand how SQL Server implements concurrency control
• To understand the fundamental features of SQL Server backup and recovery facilities

This chapter describes the basic features and functions of Microsoft SQL Server 2008 R2. The discussion uses the View Ridge Gallery database from Chapter 7, and it parallels the discussion of the database administration tasks in Chapter 9. The presentation is similar in scope and orientation to that of Oracle Database 11g in Chapter 10A and Oracle MySQL 5.5 in Chapter 10B.

SQL Server 2008 R2 is a large and complicated product. In this one chapter, we will only be able to scratch the surface. Your goal should be to learn sufficient basics so that you can continue learning on your own or in other classes.


The topics and techniques discussed here also apply to the earlier SQL Server 2008 and SQL Server 2005 versions, although the exact functions of the SQL Server 2005 Management Studio vary a bit from those of the SQL Server 2008 and SQL Server 2008 R2 versions.

Installing SQL Server 2008 R2

Microsoft SQL Server is an enterprise-class DBMS that has been around for many years. In 2005, SQL Server 2005 was released, followed by SQL Server 2008 in 2008 and SQL Server 2008 R2 in 2010. As this book goes to press, Microsoft is poised to release the next version of SQL Server, SQL Server 2011. SQL Server 2008 R2 is available in several editions, two of which (SQL Server 2008 R2 Datacenter and SQL Server 2008 R2 Parallel Data Warehouse) are new editions introduced as part of SQL Server 2008 R2. The full set can be reviewed at the Microsoft SQL Server 2008 R2 Web site (www.microsoft.com/sqlserver/en/us/product-info/compare.aspx). For our purposes, there are four editions you need to be aware of:
