The Star Schema
Rather than the normalized designs used in operational databases, a dimensional database uses a star schema. A star schema is so named because, as shown in Figure 13-8, it visually resembles a star, with a fact table at the center of the star and dimension tables radiating out from the center. The fact table is always fully normalized, but dimension tables may be non-normalized.
Figure 13-7 Characteristics of Operational and Dimensional Databases
Operational Database | Dimensional Database
Used for structured transaction data processing | Used for unstructured analytical data processing
Current data are used | Current and historical data are used
Data are inserted, updated, and deleted by users | Data are loaded and updated systematically, not by users
Part 5 Database Access Standards
Figure 13-8 The Star Schema (PRODUCT_SALES fact table at the center; PRODUCT, CUSTOMER, and TIME dimension tables around it)
There is a more complex version of the star schema called the snowflake schema. In the snowflake schema, each dimension table is normalized, which may create additional tables attached to the dimension tables.
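The cost of snowflaking shows up as an extra join. The sketch below uses Python's built-in sqlite3 module. It assumes a hypothetical PRODUCT_TYPE table, split out of the PRODUCT dimension; the table, its columns, and all data values are invented for illustration and are not part of HSD-DW, which is a plain star. A query that groups by product type now has to pass through PRODUCT to reach the fact table.

```python
import sqlite3

# Hypothetical snowflaked PRODUCT dimension: the product-type name is moved
# into its own PRODUCT_TYPE table (an illustration, not the HSD-DW design).
SNOWFLAKE_DDL = """
CREATE TABLE PRODUCT_TYPE (
    ProductTypeID   INTEGER PRIMARY KEY,
    ProductTypeName TEXT NOT NULL
);
CREATE TABLE PRODUCT (
    ProductNumber TEXT PRIMARY KEY,
    ProductName   TEXT NOT NULL,
    ProductTypeID INTEGER NOT NULL REFERENCES PRODUCT_TYPE (ProductTypeID)
);
CREATE TABLE PRODUCT_SALES (
    TimeID        INTEGER NOT NULL,
    CustomerID    INTEGER NOT NULL,
    ProductNumber TEXT NOT NULL REFERENCES PRODUCT (ProductNumber),
    Quantity      INTEGER NOT NULL,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber)
);
"""

def units_by_product_type(conn):
    """Sum Quantity per product type; the snowflake needs one more join."""
    sql = """
        SELECT PT.ProductTypeName, SUM(PS.Quantity) AS TotalQuantity
        FROM PRODUCT_SALES AS PS
        JOIN PRODUCT AS P       ON P.ProductNumber = PS.ProductNumber
        JOIN PRODUCT_TYPE AS PT ON PT.ProductTypeID = P.ProductTypeID
        GROUP BY PT.ProductTypeName
        ORDER BY PT.ProductTypeName;
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SNOWFLAKE_DDL)
# Made-up rows, used only to exercise the query.
conn.executemany("INSERT INTO PRODUCT_TYPE VALUES (?, ?)", [(1, "Book"), (2, "Video")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [("B1", "Book One", 1), ("B2", "Book Two", 1), ("V1", "Video One", 2)])
conn.executemany("INSERT INTO PRODUCT_SALES VALUES (?, ?, ?, ?)",
                 [(40699, 3, "B1", 2), (40699, 3, "V1", 1), (40700, 5, "B2", 4)])
print(units_by_product_type(conn))  # [('Book', 6), ('Video', 1)]
```

In a star schema, the type name would sit directly in PRODUCT and the query would need one join fewer.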
To illustrate a star schema for a dimensional database, we will build a small (very small) data warehouse for Heather Sweeney Designs (HSD), a Texas company specializing in products for kitchen-remodeling services. HSD puts on seminars to attract customers and sell books and videos in addition to doing actual design work. A database design for HSD is shown in Figure 13-9, and an SQL Server database diagram for the HSD database is shown in Figure 13-10. The actual dimensional database for BI use at HSD is named HSD-DW, and it is shown in Figure 13-11. The SQL statements needed to create the tables in the HSD-DW database are shown in Figure 13-12, and the data for the HSD-DW database are shown in Figure 13-13. Compare the HSD-DW dimensional database model in Figure 13-11 to the HSD database diagram shown in Figure 13-10 and note how data in the HSD database have been used in the HSD-DW schema.
Figure 13-9 The HSD Database (tables SEMINAR, SEMINAR_CUSTOMER, CUSTOMER, CONTACT, INVOICE, LINE_ITEM, and PRODUCT; CUSTOMER is keyed by EmailAddress, which appears as a foreign key in SEMINAR_CUSTOMER, CONTACT, and INVOICE)
Chapter 13 Database Processing for Business Intelligence Systems
Figure 13-10 The HSD Database Diagram
Note that in the HSD-DW database the CUSTOMER table has a surrogate primary key named CustomerID, which has an integer value, whereas the HSD database uses EmailAddress as the primary key. There are two reasons for this. First, the EmailAddress primary key used in the HSD database is simply too cumbersome for a data warehouse, so we switched to a small numeric surrogate key, which is preferable. Second, we do not use individual EmailAddress values in the HSD-DW database, only values of EmailDomain, which is not unique and cannot be used as a primary key.
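The text does not say how the load process produces CustomerID and EmailDomain. The following is only a minimal sketch of one plausible approach. It makes three assumptions: EmailDomain means the text after the last "@", lowercased; surrogate keys are handed out sequentially; and the ETL step keeps its own EmailAddress-to-CustomerID map outside the warehouse. The function names and sample addresses are hypothetical.

```python
# Hypothetical ETL helpers; the source text does not describe this step.

def email_domain(email_address: str) -> str:
    """Return the text after the last '@', lowercased (assumed meaning of EmailDomain)."""
    return email_address.rsplit("@", 1)[1].lower()

def assign_surrogate_keys(source_keys):
    """Give each distinct source key (here, an EmailAddress) a sequential integer ID."""
    mapping = {}
    for key in source_keys:
        if key not in mapping:
            mapping[key] = len(mapping) + 1
    return mapping

# Made-up source values standing in for EmailAddress in the HSD CUSTOMER table.
source_emails = ["ann@example.com", "bob@example.com", "cy@example.org"]
key_map = assign_surrogate_keys(source_emails)

# Warehouse rows keep only (CustomerID, EmailDomain). The address itself stays
# in the ETL key map (an assumption) so later loads can reuse the same CustomerID.
customer_rows = [(key_map[e], email_domain(e)) for e in source_emails]
print(customer_rows)  # [(1, 'example.com'), (2, 'example.com'), (3, 'example.org')]
```

The repeated "example.com" shows why EmailDomain cannot serve as the primary key and a surrogate is needed.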
Figure 13-11 The HSD-DW Star Schema (PRODUCT_SALES fact table with TIMELINE, CUSTOMER, and PRODUCT dimension tables)
Figure 13-12 The HSD-DW SQL Statements
A fact table is used to store measures of business activity, which are quantitative or factual data about the entity represented by the fact table. For example, in the HSD-DW database, the fact table is PRODUCT_SALES:
PRODUCT_SALES (TimeID, CustomerID, ProductNumber, Quantity, UnitPrice, Total)
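A simplified star-schema DDL shows how the fact table points at its dimensions. This is a sketch under stated assumptions, not the Figure 13-12 statements. Only the PRODUCT_SALES columns come from the text. The dimension columns (CalendarDate, MonthText, QuarterText, Year, CustomerName, EmailDomain, ProductName) and the type choices are guesses made for illustration. The code uses SQLite through Python's sqlite3 module and reads the fact table's foreign keys back with PRAGMA foreign_key_list.

```python
import sqlite3

# Assumed, simplified HSD-DW DDL. PRODUCT_SALES columns follow the text;
# dimension columns are illustrative guesses, not the book's Figure 13-12.
HSD_DW_DDL = """
CREATE TABLE TIMELINE (
    TimeID       INTEGER PRIMARY KEY,   -- Excel-style date serial, e.g. 40699
    CalendarDate TEXT NOT NULL,
    MonthText    TEXT NOT NULL,
    QuarterText  TEXT NOT NULL,
    Year         INTEGER NOT NULL
);
CREATE TABLE CUSTOMER (
    CustomerID   INTEGER PRIMARY KEY,   -- surrogate key
    CustomerName TEXT NOT NULL,
    EmailDomain  TEXT NOT NULL
);
CREATE TABLE PRODUCT (
    ProductNumber TEXT PRIMARY KEY,
    ProductName   TEXT NOT NULL
);
CREATE TABLE PRODUCT_SALES (
    TimeID        INTEGER NOT NULL REFERENCES TIMELINE (TimeID),
    CustomerID    INTEGER NOT NULL REFERENCES CUSTOMER (CustomerID),
    ProductNumber TEXT    NOT NULL REFERENCES PRODUCT (ProductNumber),
    Quantity      INTEGER NOT NULL,
    UnitPrice     NUMERIC NOT NULL,
    Total         NUMERIC NOT NULL,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber)
);
"""

def build_hsd_dw():
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(HSD_DW_DDL)
    return conn

def dimensions_of(conn, fact_table):
    """Return the tables a fact table references, i.e. the points of the star."""
    rows = conn.execute(f"PRAGMA foreign_key_list({fact_table})").fetchall()
    return sorted({row[2] for row in rows})  # column 2 is the referenced table

conn = build_hsd_dw()
print(dimensions_of(conn, "PRODUCT_SALES"))  # ['CUSTOMER', 'PRODUCT', 'TIMELINE']
```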
Figure 13-13 The HSD-DW Table Data: (a) TIMELINE Dimension Table, (b) CUSTOMER Dimension Table, (c) PRODUCT Dimension Table, (d) PRODUCT_SALES Fact Table
In this table:
• Quantity is quantitative data that record how many of the item were sold.
• UnitPrice is quantitative data that record the dollar price of each item sold.
• Total (= Quantity * UnitPrice) is quantitative data that record the total dollar value of the sale of this item.
The measures in the PRODUCT_SALES table are for units of product per customer per day. We do not use individual sale data (which would be based on InvoiceNumber), but rather data summed for each customer for each day. For example, if you examine the HSD database INVOICE data for Ralph Able for 6/5/11, you will see that Ralph made two purchases on that date (InvoiceNumber 35013 and InvoiceNumber 35016). In the HSD-DW database, however, these two purchases are summed into the PRODUCT_SALES data for Ralph (CustomerID = 3) for 6/5/11 (TimeID = 40699).
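The summing step described above can be sketched in plain Python. All invoice numbers, products, and prices below are made up. The sketch gives customer 3 two invoices on TimeID 40699, which mirrors the shape of the Ralph Able example without reproducing the real HSD data. It assumes each product has a single UnitPrice per customer per day and checks that assumption instead of silently averaging prices.

```python
from collections import defaultdict

# Made-up invoice line items shaped like rows extracted from an operational
# database: (InvoiceNumber, CustomerID, TimeID, ProductNumber, Quantity, UnitPrice).
LINE_ITEMS = [
    (1001, 3, 40699, "B1", 1, 25.0),
    (1002, 3, 40699, "B1", 2, 25.0),
    (1002, 3, 40699, "V1", 1, 40.0),
    (1003, 5, 40700, "B1", 1, 25.0),
]

def to_fact_rows(line_items):
    """Collapse line items to one PRODUCT_SALES row per (TimeID, CustomerID, ProductNumber).

    InvoiceNumber is dropped. Assumes one UnitPrice per product per customer
    per day and raises ValueError if the data disagree.
    """
    quantity = defaultdict(int)
    unit_price = {}
    for _invoice, customer_id, time_id, product, qty, price in line_items:
        key = (time_id, customer_id, product)
        if unit_price.setdefault(key, price) != price:
            raise ValueError(f"conflicting UnitPrice for {key}")
        quantity[key] += qty
    # Rows follow PRODUCT_SALES column order; Total = Quantity * UnitPrice.
    return sorted(
        (t, c, p, quantity[(t, c, p)], unit_price[(t, c, p)],
         quantity[(t, c, p)] * unit_price[(t, c, p)])
        for (t, c, p) in quantity
    )

for row in to_fact_rows(LINE_ITEMS):
    print(row)
```

Invoices 1001 and 1002 both contribute to the single ('B1', Quantity 3) row for customer 3 on day 40699, in the same way that Ralph's two invoices collapse into one day's facts.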
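The TimeID values are the sequential serial values used in Microsoft Excel to represent dates. Starting with 01-JAN-1900 as date value 1, the date value is increased by 1 for each calendar day. (Excel also treats 1900 as a leap year and counts a nonexistent 29-FEB-1900, so serial values for later dates are one greater than a strict day count would give.) Thus, 05-JUN-2011 = 40699. For more information, search “Date formats” in the Excel help system.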
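You can reproduce the TimeID arithmetic with Python's datetime module. Because Excel counts the nonexistent 29-FEB-1900, a convenient shortcut is to count days from 30-DEC-1899. This agrees with Excel for every date from 01-MAR-1900 (serial 61) onward. The sketch refuses earlier dates rather than reproducing Excel's quirk, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Excel's 1900 date system numbers 01-JAN-1900 as 1 but also counts a
# nonexistent 29-FEB-1900 (serial 60). Counting days from 30-DEC-1899
# therefore matches Excel for every date from 01-MAR-1900 (serial 61) on.
EXCEL_EPOCH = date(1899, 12, 30)
FIRST_SAFE_DATE = date(1900, 3, 1)

def excel_serial(d: date) -> int:
    """Return the Excel serial number (the TimeID value) for a calendar date."""
    if d < FIRST_SAFE_DATE:
        raise ValueError("dates before 01-MAR-1900 are not handled in this sketch")
    return (d - EXCEL_EPOCH).days

def from_excel_serial(serial: int) -> date:
    """Inverse of excel_serial for serial numbers 61 and up."""
    if serial < 61:
        raise ValueError("serials below 61 are not handled in this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial(date(2011, 6, 5)))  # 40699
print(from_excel_serial(40699))        # 2011-06-05
```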
A dimension table is used to record values of attributes that describe the fact measures in the fact table, and these attributes are used in queries to select and group the measures in the fact table. Thus, CUSTOMER records data about the customers referenced by CustomerID in
the PRODUCT_SALES table, TIMELINE provides data that can be used to place each sale in time (which month? which quarter?), and so on. A query to summarize product units sold by customer (CustomerName) and product (ProductName) would be:
/* *** SQL-Query-CH13-01 *** */
SELECT   C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName,
         SUM(PS.Quantity) AS TotalQuantity
FROM     CUSTOMER AS C, PRODUCT_SALES AS PS, PRODUCT AS P
WHERE    C.CustomerID = PS.CustomerID
  AND    P.ProductNumber = PS.ProductNumber
GROUP BY C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName
ORDER BY C.CustomerID, P.ProductNumber;
The results of this query are shown in Figure 13-14.
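Running SQL-Query-CH13-01 on a toy database shows what it computes. The sketch below uses an in-memory SQLite database and a trimmed-down HSD-DW-style schema. The customer names, products, and sales rows are made up and are not the Figure 13-13 data, so the output will not match Figure 13-14. The query text is the one given above.

```python
import sqlite3

# Trimmed-down schema and made-up rows (not the book's Figure 13-13 data).
SETUP = """
CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE PRODUCT (ProductNumber TEXT PRIMARY KEY, ProductName TEXT);
CREATE TABLE PRODUCT_SALES (
    TimeID INTEGER, CustomerID INTEGER, ProductNumber TEXT,
    Quantity INTEGER, UnitPrice NUMERIC, Total NUMERIC,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber));
INSERT INTO CUSTOMER VALUES (3, 'Customer A'), (5, 'Customer B');
INSERT INTO PRODUCT VALUES ('B1', 'Book One'), ('V1', 'Video One');
INSERT INTO PRODUCT_SALES VALUES
    (40699, 3, 'B1', 3, 25, 75),
    (40700, 3, 'B1', 1, 25, 25),
    (40699, 3, 'V1', 1, 40, 40),
    (40700, 5, 'B1', 2, 25, 50);
"""

# SQL-Query-CH13-01, unchanged.
QUERY_CH13_01 = """
SELECT   C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName,
         SUM(PS.Quantity) AS TotalQuantity
FROM     CUSTOMER AS C, PRODUCT_SALES AS PS, PRODUCT AS P
WHERE    C.CustomerID = PS.CustomerID
  AND    P.ProductNumber = PS.ProductNumber
GROUP BY C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName
ORDER BY C.CustomerID, P.ProductNumber;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY_CH13_01).fetchall()
for row in rows:
    print(row)
```

Customer 3's two days of sales for product B1 (Quantity 3 and 1) collapse into one row with TotalQuantity 4, because TIMELINE plays no part in the grouping.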
In Chapter 6, we discussed how an N:M relationship is created in a database as two 1:N relationships by use of an intersection table. We also discussed how additional attributes can be added to the intersection table in an association relationship.
In a star schema, the fact table is often an association table—it is an intersection table for the relationships between the dimension tables with additional measures also stored in it. And, as with all other intersection and association tables, the key of the fact table is a composite key made up of all the foreign keys to the dimension tables.
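The effect of the composite key can be checked directly. The sketch below uses SQLite through Python's sqlite3 module and leaves out the dimension tables to stay short. It defines PRODUCT_SALES with the key (TimeID, CustomerID, ProductNumber), reads the key columns back with PRAGMA table_info, and shows that a second row for the same day, customer, and product is rejected. The helper names are hypothetical.

```python
import sqlite3

# Fact table whose primary key is the composite of its dimension foreign keys.
# Dimension tables are omitted here to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE PRODUCT_SALES (
    TimeID INTEGER NOT NULL, CustomerID INTEGER NOT NULL, ProductNumber TEXT NOT NULL,
    Quantity INTEGER NOT NULL, UnitPrice NUMERIC NOT NULL, Total NUMERIC NOT NULL,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber))
""")

def primary_key_columns(conn, table):
    """Return the table's primary-key columns in key order."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # table_info rows are (cid, name, type, notnull, dflt_value, pk); pk is the
    # 1-based position within the primary key, or 0 if not part of it.
    return [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]

def try_insert(conn, row):
    """Insert one fact row; return False if the composite key already exists."""
    try:
        conn.execute("INSERT INTO PRODUCT_SALES VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(primary_key_columns(conn, "PRODUCT_SALES"))  # ['TimeID', 'CustomerID', 'ProductNumber']
print(try_insert(conn, (40699, 3, "B1", 3, 25, 75)))  # True
print(try_insert(conn, (40699, 3, "B1", 1, 25, 25)))  # False: same day, customer, product
print(try_insert(conn, (40700, 3, "B1", 1, 25, 25)))  # True: different TimeID
```

The rejected insert is the reason daily sales have to be summed before loading, rather than loaded one invoice line at a time.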