Design of Parallel Systems

18.8 Design of Parallel Systems

So far this chapter has concentrated on parallelization of data storage and of query processing. Since large-scale parallel database systems are used primarily for storing large volumes of data, and for processing decision-support queries on those data, these topics are the most important in a parallel database system. Parallel loading of data from external sources is also an important requirement if the system is to handle large volumes of incoming data.

A large parallel database system must also address these availability issues:

• Resilience to failure of some processors or disks.

• Online reorganization of data and schema changes.

We consider these issues here.

With a large number of processors and disks, the probability that at least one processor or disk will malfunction is significantly greater than in a single-processor system with one disk. A poorly designed parallel system will stop functioning if any component (processor or disk) fails. Assuming that the probability of failure of a single processor or disk is small, the probability of failure of the system goes up approximately linearly with the number of processors and disks. If a single processor or disk fails once every 5 years, a system with 100 processors would have a failure about every 18 days.
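The arithmetic behind the 18-day figure can be checked with a short sketch. It assumes that components fail independently and that a year has 365 days; the function names are illustrative and do not come from any particular system.

```python
def system_failure_probability(p: float, n: int) -> float:
    """Probability that at least one of n components fails in a time window,
    assuming each fails independently with probability p in that window."""
    return 1.0 - (1.0 - p) ** n


def days_between_failures(component_years: float, n: int) -> float:
    """Expected days between failures in a system of n components,
    assuming each component fails once every component_years years."""
    return component_years * 365.0 / n


# One failure per 5 years, expressed as a per-day failure probability.
p_day = 1.0 / (5 * 365)

exact = system_failure_probability(p_day, 100)   # 1 - (1 - p)^n
linear = 100 * p_day                             # the approximation n * p

print(f"exact per-day failure probability:  {exact:.4f}")
print(f"linear approximation (n * p):       {linear:.4f}")
print(f"days between failures, 100 nodes:   {days_between_failures(5, 100):.2f}")
```

For small p the exact value 1 - (1 - p)^n stays close to n * p, which is why the growth is described as linear; the approximation overestimates slightly as n grows.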

Therefore, large-scale parallel database systems, such as Teradata and IBM Informix XPS, are designed to operate even if a processor or disk fails. Data are replicated across at least two processors. If a processor fails, the data that it stored can still be accessed from the other processors. The system keeps track of failed processors and distributes the work among functioning processors. Requests for data stored at the failed site are automatically routed to the backup sites that store a replica of the data. If all the data of a processor A are replicated at a single processor B, B will have to handle all the requests to A as well as those to itself, and B will become a bottleneck. Therefore, the replicas of the data of a processor are partitioned across multiple other processors.
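The effect of partitioning replicas can be illustrated with a minimal sketch. It assumes each processor's data are split into equal-sized fragments and that backups are assigned round-robin; both the fragment model and the function names are hypothetical, and real systems use their own placement schemes.

```python
from collections import Counter


def place_replicas(num_nodes: int, fragments_per_node: int) -> dict:
    """Map (primary_node, fragment) -> backup_node, spreading each node's
    fragments round-robin over all the other nodes."""
    if num_nodes < 2:
        raise ValueError("replication needs at least two nodes")
    placement = {}
    for primary in range(num_nodes):
        others = [n for n in range(num_nodes) if n != primary]
        for frag in range(fragments_per_node):
            placement[(primary, frag)] = others[frag % len(others)]
    return placement


def extra_load_after_failure(placement: dict, failed: int) -> Counter:
    """Count the fragments of the failed node that each survivor must serve."""
    return Counter(
        backup for (primary, _), backup in placement.items() if primary == failed
    )


placement = place_replicas(num_nodes=5, fragments_per_node=8)
load = extra_load_after_failure(placement, failed=0)
print(dict(load))
```

With 5 nodes and 8 fragments per node, the failure of node 0 adds 2 fragments of work to each of the 4 survivors, instead of 8 fragments to a single backup.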

When we are dealing with large volumes of data (in the terabyte range), simple operations, such as creating indices, and changes to schema, such as adding a column to a relation, can take a long time, perhaps hours or even days. Therefore, it is unacceptable for the database system to be unavailable while such operations are in progress. Most database systems allow such operations to be performed online, that is, while the system is executing other transactions.

Consider, for instance, online index construction. A system that supports this feature allows insertions, deletions, and updates on a relation even as an index is being built on the relation. The index-building operation therefore cannot lock the entire relation in shared mode, as it would have done otherwise. Instead, the process keeps track of updates that occur while it is active and incorporates the changes into the index being constructed. (Most database systems today support online index construction, since this feature is very important even for non-parallel database systems.)
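The idea of tracking updates during the build can be sketched as a single-threaded simulation. This is only an illustration under simplifying assumptions: the relation is an in-memory dictionary from record identifier to key value, the build works from a snapshot, and concurrent changes are kept in a side log that is replayed at the end. The class and method names are hypothetical, and real systems handle concurrency, logging, and recovery in far more detail.

```python
class OnlineIndexBuild:
    """Build an index on a relation while still accepting updates."""

    def __init__(self, relation: dict):
        self.relation = relation   # rid -> key value (the base relation)
        self.index = {}            # key value -> set of rids
        self.pending = []          # side log of changes made during the build
        self.snapshot = None       # not None while a build is in progress

    def start(self):
        # Copy the current contents instead of locking the relation.
        self.snapshot = list(self.relation.items())

    def insert(self, rid, key):
        self.relation[rid] = key
        if self.snapshot is not None:
            self.pending.append(("ins", rid, key))

    def delete(self, rid):
        key = self.relation.pop(rid)
        if self.snapshot is not None:
            self.pending.append(("del", rid, key))

    def finish(self):
        # Phase 1: index every record seen in the snapshot.
        for rid, key in self.snapshot:
            self.index.setdefault(key, set()).add(rid)
        # Phase 2: replay, in order, the changes logged during the build.
        for op, rid, key in self.pending:
            if op == "ins":
                self.index.setdefault(key, set()).add(rid)
            else:
                bucket = self.index.get(key, set())
                bucket.discard(rid)
                if not bucket:
                    self.index.pop(key, None)
        self.pending.clear()
        self.snapshot = None


build = OnlineIndexBuild({1: "a", 2: "b", 3: "a"})
build.start()
build.insert(4, "b")   # arrives while the index is being built
build.delete(1)        # also arrives during the build
build.finish()
print(build.index)
```

An update to a key value can be modeled here as a delete followed by an insert. After `finish`, the index matches what a fresh build over the current relation would produce.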

In recent years, a number of companies have developed new parallel database products, including Netezza, DATAllegro (which was acquired by Microsoft), Greenplum, and Aster Data. Each of these products runs on systems containing tens to thousands of nodes, with each node running an instance of an underlying database. Each product manages the partitioning of data, as well as parallel processing of queries, across the database instances.

Netezza, Greenplum, and Aster Data use PostgreSQL as the underlying database; DATAllegro originally used Ingres as the underlying database system, but moved to SQL Server subsequent to its acquisition by Microsoft. By building on top of an existing database system, these systems are able to leverage the data storage, query processing, and transaction management features of the underlying database, leaving them free to focus on data partitioning (including replication for fault tolerance), fast interprocessor communication, parallel query processing, and parallel-query optimization. Another benefit of using an open-source database such as PostgreSQL is that the software cost per node is very low; in contrast, commercial databases have a significant per-processor cost.

It is also worth mentioning that Netezza and DATAllegro actually sell data warehouse “appliances”, which include hardware and software, allowing customers to build parallel databases with minimal effort.
