PPT UEU Basis Data Pertemuan 14

Physical DB Issues, Indexes, Query Optimisation Database Systems Lecture 14 Munawar

In This Lecture

Physical DB Issues
RAID arrays for recovery and speed
Indexes and query effici>Query optimisation
Query t>For more information
Connolly and Begg chapter 21 and

Physical Design

Design so far • Physical de>E/R modelling helps • Concerned with find the requirements storing and accessing of a database the data
Normalisation helps to • How to deal with refine a design by media failures removing
How to access redundancy information efficiently

RAID Arrays

RAID - redundant • RAID techniques
array of independent • Mirroring - multiple

copies of a file are

(inexpensive) disks

stored on separate

Storing information disks across more than one
Striping - parts of a physical disk file are stored on >Speed - can access disk more than one disk
Different levels (
Robustness - if one

0, RAID 1…)

RAID Level 0

Files are split across several disks

Data

For a system with n disks, each file is split into n parts, one part stored on each disk

Data1 Data2 Data3

Improves speed, but no redundancy

RAID Level 1

As RAID 0 but with redundancy

Data

Files are split over multiple disks
Each disk is mirrored

Data1 Data2

For n disks, split files into n/2 parts, each stored on 2 disks
Improves speed, has

Disk 1 Disk 2 Disk 3 Disk 4 Parity Checking

We can use parity checking to reduce the number of disks

1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1

Parity - for a set of data in binary form we count the number of 1s for each bit across the data
If this is even the

Recovery With Parity

If one of our pieces of data is lost we can recover it

1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1

Just compute it as the parity of the remaining data and our original parity information

RAID Level 3

Data is striped over disks, and a parity disk for redundancy

Data

For n disks, we split the data in n-1 parts
Each part is stored on a disk
The final disk stores parity information

Disk 1 Disk 2 Disk 3

Data1 Data2 Disk 4

Data3 Parity

Other RAID Issues

Other RAID levels cons>Considerations with RAID sys>How to split data between disks
Whether to store parity information on one disk, or spread across several
How to deal >Cost of disks
Do you need speed or redundancy?
How reliable are the individual disks?
‘Hot swapping’
Is the disk the weak

Indexes

Indexes are to do • Types of indexes with ordering data • Primary or clustered

indexes affect the

The relational model order that the data is says that order stored in a file doesn’t matter
Secondary ind>From a practical point give a look-up table of view it is very into the file important
Only one primary index, but many

Index Example

A telephone book • Ind>You store people’s • A clustered index can addresses and phone be made on name numbers
A secondary index>Usually you have a be made on number name and want the number
Sometimes you have a number and want the name

Index Example

As a Table As a File Secondary Index

Jane, 9258501 8751209 Name Number

John, 9251229 9251229 John 925 1229

Mark, 8751209 9258501 Mary 925 8923 Jane 925 8501

Mary, 9258923 9258923 Mark 875 1209

Sometimes we Most of the time we

Order does not look up names look up numbers by

Choosing Indexes

You can only have one primary i>Don’t create too many ind>The most frequently looked-up value is often the best choice
Some DBMSs assume the primary key is the primary index, as it is usually used to refer to >They can speed up queries, but they slow down inserts, updates and deletes
Whenever the data is changed, the index may need to change

A product database, which we want to search by keyword

Products WordLink Keywords

Each product can have many keywords
The same keyword can be associated with many products prodID prodName prodID keyID keyID keyWord

Index Example

To search the products given a keyWord value

prodID prodName

1. We look up the keyWord in Keywords to find its keyID

2. We look up that keyID in prodID keyID WordLink to find the related prodIDs

3. We look up those prodIDs in Products to find more keyID keyWord

Creating Indexes

In SQL we use • Example:

CREATE INDEX: CREATE INDEX keyIndex ON Keywords (keyWord) CREATE INDEX CREATE INDEX linkIndex <index name> ON WordLink(keyID) ON <table> CREATE INDEX prodIndex ON Products (prodID) (<columns>)

Query Processing

Once a database is • Three main stages designed and made • Parsing and

translation - the

we can query it

query is put into an

A query language internal form

(such as SQL) is used

Optimisation - to do this

changes are made for

The query goes efficiency through several
Evaluation - the stages to be executed optimised query is

Parsing and Translation

SQL is a good • Given an SQL
language for people statement we want
It is quite high level to find an equivalent
It is non-procedural relational algebra

expression

Relational algebra is
This expression may better for machines be represented
It can be reasoned about more easily

tree - the query tree

SELECT DISTINCT * FROM R1, R2

Some Relational Operators

Produ>Selecti
Product finds all the combinations of one tuple from each of two relations
R1  R2 is equivalent to
Selection finds all those rows where some condition is true

  cond

R is equivalent to SELECT DISTINCT * FROM R WHERE <cond>

Some Relational Operators

Projection, selection
Projecti
Projection chooses a and product are set of attributes from

enough to express

a relation, removing

queries of the form

any others

SELECT <cols> R is

A1,A2,…  

FROM <table> equivalent to

WHERE <cond> SELECT DISTINCT A1, A2, ...

SQL  Relational Algebra

Relational Alg
SQL statement
Take the product of

SELECT Student.Name FROM Student, Enrolment WHERE Student.ID = Enrolment.ID AND

Student and Enrolment

select tuples where the IDs are the same and the Code is DBS
project over

Student.Name Query Tree  

Student.Name

Student.ID = Enrolment.ID

Enrolment.Code = ‘DBS’

Optimisation

There are often many ways to express the same query
Some of these will be more efficient than others
Need to find a >Many ways to optimise que>Changing the query tree to an equivalent but more efficient one
Choosing efficient implementations of each operator
Exploiting database

Optimisation Example

In our query tree before we have the s>This is equivalen
selecting those

Take the product of
Then taking the product of the result of the selection operator with Student

Student and Enrolment

Enrolment entries with Code = ‘DBS’

Then select those entries where the Enrolment.Code

Optimised Query Tree  

Student.Name 

Student.ID = Enrolment.ID

Optimisation Example

• To see the benefit of • From these statistics
this, consider the we can compute the following statistics sizes of the relat
Nottingham has produced by each around 18,000 full

operator in our query

time students

trees

Each student is enrolled in at about 10 modules

Original Query Tree  

Student.Name 

Student.ID = Enrolment.ID 

Enrolment.Code = ‘DBS’ 3,240,000,000 3,600,000

200 200 Optimised Query Tree  

Student.Name 

Student.ID = Enrolment.ID

200 3,600,000 200 200

Optimisation Example

• The original query • There is much more
tree produces an to optimisation intermediate result • In the example, the

product and the

with 3,240,000,000

second selection can

entries

be combined and

The optimised

implemented efficiently to avoid

version at worst has

generating all

3,600,000

Student-Enrolment

both the product and Next Lecture

Optimisation Example

If we have an index on Student.ID we can find a student from their ID with a binary search
For 18,000 students, this will take at most 15 operat
For each Enrolment entry with Code ‘DBS’ we find the corresponding Student from the ID
200 x 15 = 3,000 operations to do

Transact>ACID properties
The transaction man>Reco>System and Media Fail>Concurr>Concurrency prob
For more information

PPT UEU Basis Data Pertemuan 14

Data

As a Table As a File Secondary Index

Dokumen yang terkait

RPS CCS 120 Struktur Data S. Genap 2017

PPT UEU Struktur Data Pertemuan 2

PPT UEU Struktur Data Pertemuan 3

RPS CCB 210 Basis Data S. Ganjil 2017

PPT UEU Basis Data Pertemuan 2

PPT UEU Basis Data Pertemuan 3

PPT UEU Basis Data Pertemuan 6

PPT UEU Basis Data Pertemuan 10

PPT UEU Basis Data Pertemuan 11

PPT UEU Basis Data Pertemuan 12

Dukungan

Links

PPT UEU Basis Data Pertemuan 14

Data

As a Table As a File Secondary Index

Dokumen yang terkait

RPS CCS 120 Struktur Data S. Genap 2017

PPT UEU Struktur Data Pertemuan 2

PPT UEU Struktur Data Pertemuan 3

RPS CCB 210 Basis Data S. Ganjil 2017

PPT UEU Basis Data Pertemuan 2

PPT UEU Basis Data Pertemuan 3

PPT UEU Basis Data Pertemuan 6

PPT UEU Basis Data Pertemuan 10

PPT UEU Basis Data Pertemuan 11

PPT UEU Basis Data Pertemuan 12

Dokumen yang Anda mencari sudah siap untuk unduhkan