PPT UEU Basis Data Pertemuan 14

  Physical DB Issues, Indexes, Query Optimisation Database Systems Lecture 14 Munawar

  In This Lecture

  • Physical DB Issues
  • RAID arrays for recovery and speed
  • Indexes and query effici>Query optimisation
  • Query t>For more information
  • Connolly and Begg chapter 21 and

  

Physical Design

  • Design so far • Physical de>E/R modelling helps • Concerned with find the requirements storing and accessing of a database the data
  • Normalisation helps to • How to deal with refine a design by media failures removing
  • How to access redundancy information efficiently

  

RAID Arrays

  • RAID - redundant • RAID techniques

    array of independent • Mirroring - multiple

  copies of a file are

  (inexpensive) disks

  stored on separate

  • Storing information disks across more than one
  • Striping - parts of a physical disk file are stored on >Speed - can access disk more than one disk
  • Different levels (
  • Robustness - if one

  0, RAID 1…)

  

RAID Level 0

  • Files are split across several disks

  Data

  • For a system with n disks, each file is split into n parts, one part stored on each disk

  Data1 Data2 Data3

  • Improves speed, but no redundancy

  

RAID Level 1

  • As RAID 0 but with redundancy

  Data

  • Files are split over multiple disks
  • Each disk is mirrored

  Data1 Data2

  • For n disks, split files into n/2 parts, each stored on 2 disks
  • Improves speed, has

  Disk 1 Disk 2 Disk 3 Disk 4 Parity Checking

  • We can use parity checking to reduce the number of disks
  •   1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1

    • Parity - for a set of data in binary form we count the number of 1s for each bit across the data
    • If this is even the

      Recovery With Parity

    • If one of our pieces of data is lost we can recover it
    •   1 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1

      • Just compute it as the parity of the remaining data and our original parity information

        

      RAID Level 3

      • Data is striped over disks, and a parity disk for redundancy
      • Data

        • For n disks, we split the data in n-1 parts
        • Each part is stored on a disk
        • The final disk stores parity information

          Disk 1 Disk 2 Disk 3

          Data1 Data2 Disk 4

          Data3 Parity

          

        Other RAID Issues

        • Other RAID levels cons>Considerations with RAID sys>How to split data between disks
        • Whether to store parity information on one disk, or spread across several
        • How to deal >Cost of disks
        • Do you need speed or redundancy?
        • How reliable are the individual disks?
        • ‘Hot swapping’
        • Is the disk the weak

          

        Indexes

        • Indexes are to do • Types of indexes with ordering data • Primary or clustered

          indexes affect the

        • The relational model order that the data is says that order stored in a file doesn’t matter
        • Secondary ind>From a practical point give a look-up table of view it is very into the file important
        • Only one primary index, but many

          

        Index Example

        • A telephone book • Ind>You store people’s • A clustered index can addresses and phone be made on name numbers
        • A secondary index>Usually you have a be made on number name and want the number
        • Sometimes you have a number and want the name

          Index Example

        As a Table As a File Secondary Index

          Jane, 9258501 8751209 Name Number

          John, 9251229 9251229 John 925 1229

          Mark, 8751209 9258501 Mary 925 8923 Jane 925 8501

          Mary, 9258923 9258923 Mark 875 1209

          Sometimes we Most of the time we

          Order does not look up names look up numbers by

          

        Choosing Indexes

        • You can only have one primary i>Don’t create too many ind>The most frequently looked-up value is often the best choice
        • Some DBMSs assume the primary key is the primary index, as it is usually used to refer to >They can speed up queries, but they slow down inserts, updates and deletes
        • Whenever the data is changed, the index may need to change
        Index Example

        • A product database, which we want to search by keyword
        •   Products WordLink Keywords

          • Each product can have many keywords
          • The same keyword can be associated with many products prodID prodName prodID keyID keyID keyWord

            Index Example

          • To search the products given a keyWord value

            prodID prodName

            1. We look up the keyWord in Keywords to find its keyID

            2. We look up that keyID in prodID keyID WordLink to find the related prodIDs

            3. We look up those prodIDs in Products to find more keyID keyWord

            Creating Indexes

          • In SQL we use • Example:

            CREATE INDEX: CREATE INDEX keyIndex ON Keywords (keyWord) CREATE INDEX CREATE INDEX linkIndex <index name> ON WordLink(keyID) ON <table> CREATE INDEX prodIndex ON Products (prodID) (<columns>)

            

          Query Processing

          • Once a database is • Three main stages designed and made • Parsing and

            translation - the

            we can query it

            query is put into an

          • A query language internal form

            (such as SQL) is used

          • Optimisation - to do this

            changes are made for

          • The query goes efficiency through several
          • Evaluation - the stages to be executed optimised query is

            

          Parsing and Translation

          • SQL is a good • Given an SQL

            language for people statement we want

          • It is quite high level to find an equivalent
          • It is non-procedural relational algebra

            expression

          • Relational algebra is
          • This expression may better for machines be represented
          • It can be reasoned about more easily

            tree - the query tree

            SELECT DISTINCT * FROM R1, R2

            

          Some Relational Operators

          • Produ>Selecti
          • Product finds all the combinations of one tuple from each of two relations
          • R1  R2 is equivalent to
          • Selection finds all those rows where some condition is true

              cond

            R is equivalent to SELECT DISTINCT * FROM R WHERE <cond>

            

          Some Relational Operators

          • Projection, selection
          • Projecti
          • Projection chooses a and product are set of attributes from

            enough to express

            a relation, removing

            queries of the form

            any others

            SELECT <cols> R is

            A1,A2,…  

             FROM <table> equivalent to

             WHERE <cond> SELECT DISTINCT A1, A2, ...

            

          SQL  Relational Algebra

          • Relational Alg
          • SQL statement
          • Take the product of

            SELECT Student.Name FROM Student, Enrolment WHERE Student.ID = Enrolment.ID AND

            Student and Enrolment

          • select tuples where the IDs are the same and the Code is DBS
          • project over

            Student.Name Query Tree  

            

          Student.Name

            

          Student.ID = Enrolment.ID

            

          Enrolment.Code = ‘DBS’

            

          Optimisation

          • There are often many ways to express the same query
          • Some of these will be more efficient than others
          • Need to find a >Many ways to optimise que>Changing the query tree to an equivalent but more efficient one
          • Choosing efficient implementations of each operator
          • Exploiting database

            

          Optimisation Example

          • In our query tree before we have the s>This is equivalen
          • selecting those

          • Take the product of
          • Then taking the product of the result of the selection operator with Student

            Student and Enrolment

            Enrolment entries with Code = ‘DBS’

          • Then select those entries where the Enrolment.Code
          •   Optimised Query Tree  

              Student.Name 

              Student.ID = Enrolment.ID

              

            Optimisation Example

            • • To see the benefit of • From these statistics

              this, consider the we can compute the following statistics sizes of the relat
            • Nottingham has produced by each around 18,000 full

              operator in our query

              time students

              trees

            • Each student is enrolled in at about 10 modules

              Original Query Tree  

              Student.Name 

              Student.ID = Enrolment.ID 

              Enrolment.Code = ‘DBS’ 3,240,000,000 3,600,000

              200 200 Optimised Query Tree  

              Student.Name 

              

            Student.ID = Enrolment.ID

            18,000

              200 3,600,000 200 200

              

            Optimisation Example

            • • The original query • There is much more

              tree produces an to optimisation intermediate result • In the example, the

              product and the

              with 3,240,000,000

              second selection can

              entries

              be combined and

            • The optimised

              implemented efficiently to avoid

              version at worst has

              generating all

              3,600,000

              Student-Enrolment

              both the product and Next Lecture

              

            Optimisation Example

            • If we have an index on Student.ID we can find a student from their ID with a binary search
            • For 18,000 students, this will take at most 15 operat
            • For each Enrolment entry with Code ‘DBS’ we find the corresponding Student from the ID
            • 200 x 15 = 3,000 operations to do

            • Transact>ACID properties
            • The transaction man>Reco>System and Media Fail>Concurr>Concurrency prob
            • For more information