Structured Concurrency Control in Object Oriented Databases
STRUCTURED CONCURRENCY CONTROL
IN OBJECT ORIENTED DATABASES
Francisco Mariátegui
A Dissertation Presented to the Graduate Faculty of The School of Engineering and Applied Science of
Southern Methodist University in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy with a Major in Computer Science
By Francisco José Mariátegui
B.Sc., Honors, Naval Academy of Peru, 1974
Systems Engineering Specialization Degree, Honors, University of Lima, 1977
M.Sc. Computer Science, U.S. Naval Postgraduate School, 1979
M.Sc. Computer Systems Management, U.S. Naval Postgraduate School, 1979
May 13, 1989
COPYRIGHT © 1989
Francisco J. Mariategui
All Rights Reserved
Mariategui, Francisco J.
B.Sc. Naval Sciences, Naval Academy of Peru, 1974
System Engineering, University of Lima, 1977
M.Sc. Computer Science, U.S. Naval Postgraduate School, 1979
M.Sc. Computer Systems Management, U.S. Naval Postgraduate School, 1979
STRUCTURED CONCURRENCY CONTROL IN OBJECT ORIENTED DATABASES
Advisor: Dr. Margaret H. Eich
Doctor of Philosophy degree conferred August 12, 1989
Dissertation completed May 13, 1989
In the last few years a number of object-oriented database systems have appeared in the literature, most of which address specific areas such as office information systems (OIS), computer aided design (CAD), computer aided manufacturing (CAM), software engineering (SE), and artificial intelligence (AI). Unfortunately, hardly any of them addresses the problem of concurrency control from the general-purpose database point of view. Due to the extreme differences in the types of transactions supported by these environments, the need for combining different concurrency control approaches has been recognized but never thoroughly investigated. A high level design of a Multi-Group Multi-Layer approach to concurrency control for object-oriented message-passing based databases is presented. The design follows a formal definition of transaction. The concurrency control takes advantage of the structured nature of transactions to manage an on-line serializer. The serializer is specified as a set of filters. These filters are specifications of algorithms that ensure serializable histories. The concurrency control manages these histories by layers. Each layer, along with its corresponding filters, constitutes a different level of abstraction in concurrency control processing. Mutually exclusive groups of transactions being processed in parallel are assumed. The availability of a processor per group is also assumed. The performance is improved when this case of large granularity and limited interaction is applied. The decomposition of the histories into layers allows the problem to be more manageable, the principles of hierarchical design to be applied, and the benefits of hierarchical thought to be utilized. Summarizing, this research has led to the following results:
1) First cut definition of an Object-Oriented Data Model (OODM) which encompasses data structures, operations, and integrity constraints.
2) Transaction processing model for the OODM environment, which not only facilitates the definition of transactions but also allows investigation of concurrency control.
3) Group Concurrency Control technique built on the OODM and transaction models that allows the use of several different concurrency control techniques in parallel in the same environment.
TABLE OF CONTENTS
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER 1 - INTRODUCTION
1.1 The Problem
1.2 The Approach
1.3 Contribution
1.4 Significance
1.5 The Concurrency Control Manager
1.5.1 Purpose
1.5.2 Concepts and Means
1.5.3 Benefits
1.5.4 Interface
1.6 General Overview of the CCMM
1.7 Outline of the Dissertation
CHAPTER 2
2.1 Introduction
2.2 Object-Oriented Databases: An Overview
2.2.1 Background
2.2.2 Definition of Terms
2.2.3 Definition of Properties of OODBs
2.3 Data Models
2.4 An Object-Oriented Data Model
2.4.1 Data Structure
2.4.2 Operators
2.4.3 Integrity Rules
2.4.4 Summary
2.5 Modeling Ability of the OODM
2.6 Summary
CHAPTER 3 - UNIT OF CONSISTENCY
3.1 Introduction
3.2 Preliminaries
3.4 The Pair <GM, DM> as a Modeling Tool
3.5 Expanding the Model to Include Leaves
3.6 The Complete Structure of a Transaction
3.5 Summary
CHAPTER 4 - THEORY OF EXECUTION AND SERIALIZABILITY
4.1 Introduction
4.2 Preliminaries
4.3 Serializability
4.4 Serializability and the Pair <GM, DM>
4.5 Informal Compression of Transaction Trees
4.6 From Transaction Trees to Transaction Histories
4.7 Summary
CHAPTER 5 - MULTI-LAYER CONCURRENCY: RATIONALE
5.1 Introduction
5.2 The Need for Change
5.3 Representative Concurrency Control Techniques
5.4 Structured Concurrency Control
5.5 The Layered Approach
5.6 Expanding the Theory of Execution: Group History
5.7 Summary
CHAPTER 6 - MULTI-LAYER CONCURRENCY ARCHITECTURE
6.1 Introduction
6.2 Background
6.3 Active Histories
6.4 Prefixes
6.5 Contents of Prefixes of Active Histories
6.5.1 Notation for Prefixes
6.5.2 Meaning of the Indexes of Histories
6.5.3 Contents of Histories
6.6 Hierarchy of Histories
6.7 Elevator Functions
6.8 Notation
6.9 Filters
6.9.1 Component Histories Filters
6.9.1.1 Filter F0rsws
6.9.1.2 Filter F0ccg
6.9.2 Transaction Histories Filters
6.9.2.1 Filter F1rsws
6.9.2.2 Filter F1ccg
6.9.3 Forest Histories Filters
6.9.3.1 Filter F2rsws
6.9.3.2 Filter F2tcg
6.9.4 Group History Filters
6.9.4.1 Filter F3m
6.9.4.2 Filter F3gcg
6.9.4.3 Filter F3oeg
6.9.4.4 Filter F3rsws
6.10 Deleting Transactions from Histories
6.10.1 Background
6.10.2 Deletion Filters
6.10.2.1 Filter F3del
6.10.2.2 Filter F2del
6.10.2.3 Filter F1del
6.10.2.4 Filter F0del
6.11 History Hierarchy Cycle
6.11.1 Cycle Algorithm
6.12 Summary
CHAPTER 7 - TIME AND SPACE ANALYSIS
7.1 Introduction
7.2 Concurrency Control Time Complexity
7.3 Notation
7.4 Criteria and Assumptions
7.5 General Behavior
7.6 Group Partition
7.7 Layer Partition
7.8 Time Analysis Summary
7.9 Space Analysis
7.9.1 Potential Problem
7.9.2 Suitable Solutions
7.10 Summary
CHAPTER 8 - CONCLUSIONS AND FURTHER RESEARCH
8.1 Conclusions
8.1.1 Summary of Accomplishments
8.1.2 Results by Stages
8.2 Suggestions for Further Work
8.2.1 Simulation
8.2.2 Committed but Not Deleted Transactions
8.2.3 Libraries of Typed Objects
8.2.4 Conflict Predicates
8.2.5 Early Evaluation of Inter-Group Conflicts
8.2.6 Increase the Number of Processors
8.2.7 Pipeline the Cycle Algorithm
8.2.8 Router Issues
REFERENCES
LIST OF FIGURES
1-1 Concurrency Control Manager Module
1-2 Expansion of CCMM of Figure 1-1
2-1 Object Classes
2-2 Complex Objects
2-3 Simple Objects and Composition
2-4 Messages
2-5 Class Hierarchy for Object Graph of Figures 2-1 to 2-4
3-1 Traditional Transaction
3-2 A Possible Instance of the Pair <GM,DM>
3-3 Transaction with No Nested Messages Calls
3-4 The Pair <GM,DM> or a Transaction Tree
3-5 Transaction Tree with Potential Data Base Accesses
3-6 Complete Pair <GM, DM> or a Transaction Tree
3-7 Two Transaction Trees & One Transaction
4-1 Transaction Tree Before Compression
4-2 Transaction Tree Compressed 1 Level (Level n to Level n-1)
4-3 Transaction Tree Compressed 2 Levels (Level n to Level n-2)
4-4 Transaction Tree Compressed 3 Levels (Level n to Level n-3)
4-5 Transaction Tree Fully Compressed
6-1 Multi Group Approach to Concurrency
6-2 Window or Active History
6-3 Relationship Among Histories
6-4 Hierarchy of Histories
6-5 Upward and Downward Motion of Elevator Functions
6-6 Backward Edge Between Components 1 and 2
6-7 Predecessors and Immediate Predecessors of Transactions
6-8 Removing a Completed Transaction
6-9 Tight Predecessors
6-10 Necessary and Sufficient Condition to Remove a Transaction
6-11 Steps of the History Hierarchy Cycle Algorithm
7-1 Work Done with Traditional Approach
7-2 Work Done with Partitioning & Parallel Execution
ACKNOWLEDGEMENTS
This dissertation is dedicated to my parents, Carmela and Francisco Mariategui, as a small tribute of my admiration and love. Special and most sincere thanks to my advisor, Maggie Eich, for things too numerous to list here. Also, I thank Dennis Frailey, Milan Milenkovic, Marion Sobol, and David Yun, for their careful reading of the dissertation, and their helpful comments. I gratefully acknowledge the Fulbright Commission of Peru, the National Science Foundation, and the Texas Advanced Research Program for their generous support.
CHAPTER 1 - INTRODUCTION
1.1 The Problem
In the last few years, a number of self-named object-oriented database systems have appeared in the literature, most of which address specific areas such as office information systems (OIS), computer aided design (CAD), computer aided manufacturing (CAM), software engineering (SE), and artificial intelligence (AI). Unfortunately, hardly any of them addresses the problem of concurrency control from the general-purpose database point of view. These specialized databases are not general database management systems (DBMS); they are specific applications with their own file systems. One of the reasons for building such specialized databases is the difficulty of undertaking a major effort to attack hard problems such as concurrency control in a general framework.
The concurrency control must provide multiple users with simultaneous access to the database while guaranteeing database consistency at all times (as seen by the users). To preserve consistency, a transaction must see the values of all the objects either before or after other transactions have updated them.
This work is aimed at an encompassing solution to concurrency control for databases in general, and object-oriented databases in particular. An approach to a solution can be accomplished by focusing our efforts on Structured Concurrency Control, which provides flexibility and adaptability. This methodology allows object-oriented databases to achieve an efficient level of concurrency with a tolerable amount of overhead, even in the presence of a variety of transactions, each with its own requirements (short lived, long lived, etc.). The methodology is developed in the framework of object-oriented databases, which, in theory, are capable of handling a variety of environments.
As of today, most commercial database systems, and even most research prototypes, choose a specific concurrency control method in the design phase of the DBMS. It is at this point that the present approach differs from the conventional ones: it does not pick a single concurrency control method, but allows more than one from the start, each on the appropriate occasion and at the appropriate time. Although this work is not concerned with specifying when each different technique should be chosen, it does show how to combine them in the framework of an original design.
1.2 The Approach
The concurrency controller is a key module within any DBMS. It encompasses most of the activities of the other modules in a DBMS, in the sense that they must "obtain permission to continue" in order to perform their own tasks. To cope with the demands of the newer applications (OIS, CAD, CAM, SE, AI, etc.), the concurrency controller should no longer be "single-minded" (i.e., limited to one concurrency control technique). The different types of applications impose different demands on the DBMS, and thus affect the concurrency control. What is needed is the use of concurrency control techniques designed specifically for the active transactions and database accesses in an object-oriented database system. This does not necessarily imply the development of new techniques, but rather the selection and control of the correct method for each transaction, based upon what the transaction is doing and upon the database state. Such a flexible and adaptable concurrency controller should be capable of reducing the overhead, in some cases by eliminating all concurrency control activity, and in others perhaps by combining several techniques. Different transactions may use different techniques. It is also possible that different executions of the same transaction may use different techniques. This approach must be able to ensure correctness across all the different techniques being used. To succeed in this endeavor, the new flexible concurrency controller must be able to keep track of the states of the database as indicated by the types of transactions active at any point in time.
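As a purely illustrative sketch of per-transaction technique selection, the following Python fragment picks a technique from a transaction profile and the current database activity. The dissertation explicitly does not prescribe when each technique should be chosen, so the policy, the `Technique` values, the `TxnProfile` fields, and the `choose_technique` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Technique(Enum):
    """Hypothetical menu of techniques a flexible controller could offer."""
    NONE = auto()        # no concurrency control activity at all
    LOCKING = auto()     # a pessimistic, lock-based technique
    OPTIMISTIC = auto()  # a validate-at-commit technique


@dataclass(frozen=True)
class TxnProfile:
    """Assumed description of what a transaction is doing."""
    read_only: bool
    long_lived: bool


def choose_technique(txn: TxnProfile, other_active_writers: int) -> Technique:
    """Toy selection policy (an assumption, not the dissertation's rule).

    The choice depends on both the transaction and the database state,
    so two executions of the same transaction may get different answers.
    """
    if txn.read_only and other_active_writers == 0:
        return Technique.NONE
    if txn.long_lived:
        return Technique.OPTIMISTIC
    return Technique.LOCKING
```

Calling `choose_technique(TxnProfile(read_only=True, long_lived=False), 0)` and then the same call with `2` active writers yields two different techniques for the same transaction, which is the behavior the paragraph above describes.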
1.3 Contribution
This research has led to the following results:
1) First cut definition of an Object-Oriented Data Model (OODM) which encompasses data structures, operations, and integrity constraints.
2) Transaction processing model for the OODM environment, which not only facilitates the definition of transactions but also allows investigation of concurrency control.
3) Group Concurrency Control technique built on the OODM and transaction-processing model that allows the use of several different concurrency control techniques in parallel in the same environment.
The first two results are considered as supporting result number three. A special section is included in this introductory chapter to introduce the latter.
1.4 Significance
Due to the extreme differences in the types of transactions to be executed in an object-oriented database (long lived and short lived), the need for combining different concurrency control approaches has been recognized but never fully investigated. Not only does this group approach make it possible to combine techniques, it also allows various pieces of the concurrency control process to run in parallel. It has the potential to drastically reduce the overhead of concurrency control processing. This is of general interest to general-purpose object-oriented databases and of particular interest to heterogeneous systems (where differently designed systems must be integrated), real-time systems, and main memory databases, where the relative impact of concurrency control overhead can be quite large. The OODM definition provides the framework for much future research and the potential for the definition of a universally accepted data model applicable to object-oriented databases.
1.5 The Concurrency Control Manager
The goal in this dissertation is to describe and define an effective and flexible mechanism to control concurrency in object-oriented databases. In order to achieve this objective the theory has been created, the rationale has been discussed, the architecture has been specified, and the costs involved in a Concurrency Control Manager Module (CCMM) have been analyzed. The models, algorithms, and specifications used to this effect are the result of original research as well as adaptations of state of the art technology. The resulting CCMM is an algorithmic specification of the proposed approach that could be implemented in hardware (the hardware could take the form of a Concurrency Control Board). This dissertation is concerned with the presentation of the underlying technology to make the software CCMM possible.
1.5.1 Purpose
The purposes of the CCMM module are as follows:
- Reduce the overhead attributable to the concurrency controller.
- Improve throughput (i.e., number of transactions per unit of time).
- Provide multiple concurrency control technique capability in parallel.
- Contribute to the ongoing research in Concurrency Control Management.
1.5.2 Concepts and Means
The concepts and means used to specify the CCMM are as follows:
- Conflict-graph based serializability.
- Current concurrency control techniques.
- Model of transactions in OODBs.
- Multi-layer approach to the treatment of histories.
- Parallel processing technology.
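Of the concepts listed above, conflict-graph based serializability is the easiest to illustrate in isolation. The sketch below shows the textbook test (a history is conflict serializable if its conflict graph is acyclic); it is not one of the CCMM filters specified later in the dissertation. The history encoding as `(transaction, operation, object)` triples and the function names are assumptions made for this example.

```python
from collections import defaultdict


def conflict_graph(history):
    """Build the conflict graph of a flat history.

    `history` is a list of (txn, op, obj) triples with op in {"r", "w"}.
    An edge t1 -> t2 is added when an operation of t1 precedes a
    conflicting operation of t2 (same object, at least one write).
    """
    edges = defaultdict(set)
    for i, (t1, op1, obj1) in enumerate(history):
        for t2, op2, obj2 in history[i + 1:]:
            if t1 != t2 and obj1 == obj2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges


def is_conflict_serializable(history):
    """Return True if the conflict graph of `history` has no cycle."""
    edges = conflict_graph(history)
    nodes = {txn for txn, _, _ in history}
    state = {}  # txn -> "visiting" while on the DFS stack, then "done"

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in edges[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: a cycle exists
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(has_cycle(n) for n in nodes if n not in state)
```

For example, the history `[("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]` produces edges in both directions between T1 and T2, so the test rejects it.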
1.5.3 Benefits
Summarizing, the potential benefits of the CCMM are as follows:
- Speed: throughput.
- Flexibility: several concurrency control techniques.
- Modularity: different environments may use different techniques.
- Research Tool: for high speed transaction processing.
1.5.4 Interface
The CCMM interfaces with transaction managers and data managers as shown in Figure 1-1. Transaction managers send requests to the CCMM, such as BEGIN, END, COMMIT, ABORT, LOCK, and UNLOCK. The CCMM informs the transaction manager about the state of execution of transactions. The CCMM sends requests to the data manager to perform database accesses on its behalf. This document is not concerned with the details of the protocols used to achieve a proper interface among these modules. It is mainly concerned with the internal workings of the CCMM.
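The transaction manager side of this interface can be pictured with a minimal sketch. Only the six request names come from the text; the `ToyCCMM` class, its state names ("active", "ended", "committed", "aborted"), and the rule that LOCK and UNLOCK leave the reported state unchanged are assumptions made for illustration. The data manager side and the actual protocol are deliberately left out, as they are in the dissertation.

```python
from enum import Enum


class Request(Enum):
    """Requests a transaction manager may send to the CCMM (from the text)."""
    BEGIN = "BEGIN"
    END = "END"
    COMMIT = "COMMIT"
    ABORT = "ABORT"
    LOCK = "LOCK"
    UNLOCK = "UNLOCK"


class ToyCCMM:
    """Hypothetical stand-in that only tracks and reports transaction state."""

    # Assumed state reported after each state-changing request.
    _NEXT_STATE = {
        Request.BEGIN: "active",
        Request.END: "ended",
        Request.COMMIT: "committed",
        Request.ABORT: "aborted",
    }

    def __init__(self):
        self.state = {}  # txn_id -> last reported state

    def handle(self, txn_id, request):
        """Process one request and return the transaction's state."""
        if request is not Request.BEGIN and txn_id not in self.state:
            raise ValueError(f"unknown transaction: {txn_id}")
        if request in self._NEXT_STATE:
            self.state[txn_id] = self._NEXT_STATE[request]
        # LOCK and UNLOCK do not change the reported state in this toy.
        return self.state[txn_id]
```

A transaction manager would call `handle("T1", Request.BEGIN)`, issue LOCK and UNLOCK requests while the state stays "active", and finish with COMMIT or ABORT.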
1.6 General Overview of the CCMM
In order to provide an initial insight into the approach (to be developed in detail in the chapters to come), the core constituents of the scheme are presented in this section. In keeping with the saying attributed to Confucius, "A picture is worth a thousand words", Figure 1-2 depicts the structure of the CCMM. Figure 1-2 can be interpreted as a more detailed representation of the dotted box in Figure 1-1.