PPT UEU Basis Data Pertemuan 14
Physical DB Issues, Indexes, Query Optimisation Database Systems Lecture 14 Munawar
In This Lecture
- Physical DB Issues
- RAID arrays for recovery and speed
- Indexes and query effici>Query optimisation
- Query t>For more information
- Connolly and Begg chapter 21 and
Physical Design
- Design so far • Physical de>E/R modelling helps • Concerned with find the requirements storing and accessing of a database the data
- Normalisation helps to • How to deal with refine a design by media failures removing
- How to access redundancy information efficiently
RAID Arrays
- RAID - redundant • RAID techniques
array of independent • Mirroring - multiple
copies of a file are
(inexpensive) disks
stored on separate
- Storing information disks across more than one
- Striping - parts of a physical disk file are stored on >Speed - can access disk more than one disk
- Different levels (
- Robustness - if one
0, RAID 1…)
RAID Level 0
- Files are split across several disks
Data
- For a system with n disks, each file is split into n parts, one part stored on each disk
Data1 Data2 Data3
- Improves speed, but no redundancy
RAID Level 1
- As RAID 0 but with redundancy
Data
- Files are split over multiple disks
- Each disk is mirrored
Data1 Data2
- For n disks, split files into n/2 parts, each stored on 2 disks
- Improves speed, has
Disk 1 Disk 2 Disk 3 Disk 4 Parity Checking
- We can use parity checking to reduce the number of disks
- Parity - for a set of data in binary form we count the number of 1s for each bit across the data
- If this is even the
- If one of our pieces of data is lost we can recover it
- Just compute it as the parity of the remaining data and our original parity information
- Data is striped over disks, and a parity disk for redundancy
- For n disks, we split the data in n-1 parts
- Each part is stored on a disk
- The final disk stores parity information
- Other RAID levels cons>Considerations with RAID sys>How to split data between disks
- Whether to store parity information on one disk, or spread across several
- How to deal >Cost of disks
- Do you need speed or redundancy?
- How reliable are the individual disks?
- ‘Hot swapping’
- Is the disk the weak
- Indexes are to do • Types of indexes with ordering data • Primary or clustered
- The relational model order that the data is says that order stored in a file doesn’t matter
- Secondary ind>From a practical point give a look-up table of view it is very into the file important
- Only one primary index, but many
- A telephone book • Ind>You store people’s • A clustered index can addresses and phone be made on name numbers
- A secondary index>Usually you have a be made on number name and want the number
- Sometimes you have a number and want the name
- You can only have one primary i>Don’t create too many ind>The most frequently looked-up value is often the best choice
- Some DBMSs assume the primary key is the primary index, as it is usually used to refer to >They can speed up queries, but they slow down inserts, updates and deletes
- Whenever the data is changed, the index may need to change
- A product database, which we want to search by keyword
- Each product can have many keywords
- The same keyword can be associated with many products prodID prodName prodID keyID keyID keyWord
- To search the products given a keyWord value
- In SQL we use • Example:
- Once a database is • Three main stages designed and made • Parsing and
- A query language internal form
- Optimisation - to do this
- The query goes efficiency through several
- Evaluation - the stages to be executed optimised query is
- SQL is a good • Given an SQL
language for people statement we want
- It is quite high level to find an equivalent
- It is non-procedural relational algebra
- Relational algebra is
- This expression may better for machines be represented
- It can be reasoned about more easily
- Produ>Selecti
- Product finds all the combinations of one tuple from each of two relations
- R1 R2 is equivalent to
- Selection finds all those rows where some condition is true
- Projection, selection
- Projecti
- Projection chooses a and product are set of attributes from
- Relational Alg
- SQL statement
- Take the product of
- select tuples where the IDs are the same and the Code is DBS
- project over
- There are often many ways to express the same query
- Some of these will be more efficient than others
- Need to find a >Many ways to optimise que>Changing the query tree to an equivalent but more efficient one
- Choosing efficient implementations of each operator
- Exploiting database
- In our query tree before we have the s>This is equivalen
- selecting those
- Take the product of
- Then taking the product of the result of the selection operator with Student
- Then select those entries where the Enrolment.Code
• To see the benefit of • From these statistics
this, consider the we can compute the following statistics sizes of the relat- Nottingham has produced by each around 18,000 full
- Each student is enrolled in at about 10 modules
• The original query • There is much more
tree produces an to optimisation intermediate result • In the example, the- The optimised
- If we have an index on Student.ID we can find a student from their ID with a binary search
- For 18,000 students, this will take at most 15 operat
- For each Enrolment entry with Code ‘DBS’ we find the corresponding Student from the ID
- 200 x 15 = 3,000 operations to do
- Transact>ACID properties
- The transaction man>Reco>System and Media Fail>Concurr>Concurrency prob
- For more information
1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1
Recovery With Parity
1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1
RAID Level 3
Data
Disk 1 Disk 2 Disk 3
Data1 Data2 Disk 4
Data3 Parity
Other RAID Issues
Indexes
indexes affect the
Index Example
Index Example
As a Table As a File Secondary Index
Jane, 9258501 8751209 Name Number
John, 9251229 9251229 John 925 1229
Mark, 8751209 9258501 Mary 925 8923 Jane 925 8501
Mary, 9258923 9258923 Mark 875 1209
Sometimes we Most of the time we
Order does not look up names look up numbers by
Choosing Indexes
Products WordLink Keywords
Index Example
prodID prodName
1. We look up the keyWord in Keywords to find its keyID
2. We look up that keyID in prodID keyID WordLink to find the related prodIDs
3. We look up those prodIDs in Products to find more keyID keyWord
Creating Indexes
CREATE INDEX: CREATE INDEX keyIndex ON Keywords (keyWord) CREATE INDEX CREATE INDEX linkIndex <index name> ON WordLink(keyID) ON <table> CREATE INDEX prodIndex ON Products (prodID) (<columns>)
Query Processing
translation - the
we can query it
query is put into an
(such as SQL) is used
changes are made for
Parsing and Translation
expression
tree - the query tree
SELECT DISTINCT * FROM R1, R2
Some Relational Operators
cond
R is equivalent to SELECT DISTINCT * FROM R WHERE <cond>
Some Relational Operators
enough to express
a relation, removing
queries of the form
any others
SELECT <cols> R is
A1,A2,…
FROM <table> equivalent to
WHERE <cond> SELECT DISTINCT A1, A2, ...
SQL Relational Algebra
SELECT Student.Name FROM Student, Enrolment WHERE Student.ID = Enrolment.ID AND
Student and Enrolment
Student.Name Query Tree
Student.Name
Student.ID = Enrolment.ID
Enrolment.Code = ‘DBS’
Optimisation
Optimisation Example
Student and Enrolment
Enrolment entries with Code = ‘DBS’
Optimised Query Tree
Student.Name
Student.ID = Enrolment.ID
Optimisation Example
operator in our query
time students
trees
Original Query Tree
Student.Name
Student.ID = Enrolment.ID
Enrolment.Code = ‘DBS’ 3,240,000,000 3,600,000
200 200 Optimised Query Tree
Student.Name
Student.ID = Enrolment.ID
18,000200 3,600,000 200 200
Optimisation Example
product and the
with 3,240,000,000
second selection can
entries
be combined and
implemented efficiently to avoid
version at worst has
generating all
3,600,000
Student-Enrolment
both the product and Next Lecture
Optimisation Example