Data Modeling, Data Structures, Databases, and the Data Warehouse
14.2.1 Data Modeling, Data Structures, Databases, and the Data Warehouse
The data objects defined during software requirements analysis are modeled using entity/relationship diagrams and the data dictionary (Chapter 12). The data design
“Data quality is the activity translates these elements of the requirements model into data structures at difference between a
the software component level and, when necessary, a database architecture at the data warehouse and
a data garbage application level. dump.”
In years past, data architecture was generally limited to data structures at the pro-
Jarrett Rosenberg
gram level and databases at the application level. But today, businesses large and small are awash in data. It is not unusual for even a moderately sized business to have dozens of databases serving many applications encompassing hundreds of giga- bytes of data. The challenge for a business has been to extract useful information from this data environment, particularly when the information desired is cross- functional (e.g., information that can be obtained only if specific marketing data are cross-correlated with product engineering data).
To solve this challenge, the business IT community has developed data mining
WebRef
techniques, also called knowledge discovery in databases (KDD), that navigate through Up-to-date information on
data warehouse existing databases in an attempt to extract appropriate business-level information. technologies can be
However, the existence of multiple databases, their different structures, the degree obtained at
of detail contained with the databases, and many other factors make data mining dif-
www.data warehouse.com
ficult within an existing database environment. An alternative solution, called a data warehouse, adds an additional layer to the data architecture.
CHAPTER 14
ARCHITECTURAL DESIGN
A data warehouse is a separate data environment that is not directly integrated with day-to-day applications but encompasses all data used by a business [MAT96]. In a sense, a data warehouse is a large, independent database that encompasses some, but not all, of the data that are stored in databases that serve the set of appli- cations required by a business. But many characteristics differentiate a data ware- house from the typical database [INM95]:
Subject orientation. A data warehouse is organized by major business subjects, rather than by business process or function. This leads to the exclu- sion of data that may be necessary for a particular business function but is generally not necessary for data mining. Integration. Regardless of the source, the data exhibit consistent naming conventions, units and measures, encoding structures, and physical attri-
A data warehouse butes, even when inconsistency exists across different application-oriented encompasses all data
that are used by a databases. business. The intent is
Time variancy. For a transaction-oriented application environment, data to enable access to
are accurate at the moment of access and for a relatively short time span “knowledge” that
(typically 60 to 90 days) before access. For a data warehouse, however, data might not be otherwise
available. can be accessed at a specific moment in time (e.g., customers contacted on the date that a new product was announced to the trade press). The typical time horizon for a data warehouse is five to ten years. Nonvolatility. Unlike typical business application databases that undergo a continuing stream of changes (inserts, deletes, updates), data are loaded into the warehouse, but after the original transfer, the data do not change.
These characteristics present a unique set of design challenges for a data architect.
A detailed discussion of the design of data structures, databases, and the data warehouse is best left to books dedicated to these subjects (e.g., [PRE98], [DAT95], [KIM98]). The interested reader should see the Further Readings and Information Sources section of this chapter for additional references.
Parts
» The Concurrent Development Model
» SUMMARY Software engineering is a discipline that integrates process, methods, and tools for
» PEOPLE In a study published by the IEEE [CUR88], the engineering vice presidents of three
» THE PROCESS The generic phases that characterize the software process—definition, development,
» THE PROJECT In order to manage a successful software project, we must understand what can go
» METRICS IN THE PROCESS AND PROJECT DOMAINS
» Extended Function Point Metrics
» METRICS FOR SOFTWARE QUALITY
» INTEGRATING METRICS WITHIN THE SOFTWARE PROCESS
» METRICS FOR SMALL ORGANIZATIONS
» ESTABLISHING A SOFTWARE METRICS PROGRAM
» Obtaining Information Necessary for Scope
» An Example of LOC-Based Estimation
» QUALITY CONCEPTS 1 It has been said that no two snowflakes are alike. Certainly when we watch snow
» SUMMARY Software quality assurance is an umbrella activity that is applied at each step in the
» R diagram 1.4 <part-of> data model; data model <part-of> design specification;
» SYSTEM MODELING Every computer-based system can be modeled as an information transform using an
» Facilitated Application Specification Techniques
» Data Objects, Attributes, and Relationships
» Entity/Relationship Diagrams
» Hatley and Pirbhai Extensions
» Creating an Entity/Relationship Diagram
» SUMMARY Design is the technical kernel of software engineering. During design, progressive
» Data Modeling, Data Structures, Databases, and the Data Warehouse
» Data Design at the Component Level
» A Brief Taxonomy of Styles and Patterns
» Quantitative Guidance for Architectural Design
» Isolate the transform center by specifying incoming and outgoing
» SUMMARY Software architecture provides a holistic view of the system to be built. It depicts the
» The User Interface Design Process
» Defining Interface Objects and Actions
» D E S I G N E VA L U AT I O N
» Testing for Real-Time Systems
» Organizing for Software Testing
» Criteria for Completion of Testing
» The Transition to a Quantitative View
» The Attributes of Effective Software Metrics
» Architectural Design Metrics
» Component-Level Design Metrics
» SUMMARY Software metrics provide a quantitative way to assess the quality of internal product
» Encapsulation, Inheritance, and Polymorphism
» Identifying Classes and Objects
» The Common Process Framework for OO
» OO Project Metrics and Estimation
» Event Identification with Use-Cases
» SUMMARY Object-oriented analysis methods enable a software engineer to model a problem by
» Partitioning the Analysis Model
» Designing Algorithms and Data Structures
» Program Components and Interfaces
» SUMMARY Object-oriented design translates the OOA model of the real world into an
» Testing Surface Structure and Deep Structure
» Deficiencies of Less Formal Approaches 1
» What Makes Cleanroom Different?
» Design Refinement and Verification
» SUMMARY Cleanroom software engineering is a formal approach to software development that
» Structural Modeling and Structure Points
» Describing Reusable Components
» SUMMARY Component-based software engineering offers inherent benefits in software quality,
» Guidelines for Distributing Application Subsystems
» Middleware and Object Request Broker Architectures
» An Overview of a Design Approach
» Consider expert Web developer will create a complete design, but time and cost can be appropriate
» A Software Reengineering Process Model
» Reverse Engineering to Understand Data
» Forward Engineering for Client/Server Architectures
» SUMMARY Reengineering occurs at two different levels of abstraction. At the business level,
Show more