The Star Schema

Rather than the normalized designs used in operational databases, a dimensional database uses a star schema. A star schema is so named because, as shown in Figure 13-8, it visually resembles a star, with a fact table at the center of the star and dimension tables radiating out from the center. The fact table is always fully normalized, but dimension tables may be non-normalized.

Figure 13-7 Characteristics of Operational and Dimensional Databases

Operational Database | Dimensional Database
Used for structured transaction data processing | Used for unstructured analytical data processing
Current data are used | Current and historical data are used
Data are inserted, updated, and deleted by users | Data are loaded and updated systematically, not by users

Part 5 Database Access Standards

Figure 13-8 The Star Schema (PRODUCT_SALES fact table at the center; PRODUCT, CUSTOMER, and TIME dimension tables around it)

There is a more complex version of the star schema called the snowflake schema. In the snowflake schema, each dimension table is normalized, which may create additional tables attached to the dimension tables.
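To make the difference concrete, here is a small sketch in Python using the standard sqlite3 module. It assumes a hypothetical snowflaked PRODUCT dimension whose category names have been normalized into a separate PRODUCT_CATEGORY table; that table, the CategoryID and CategoryName columns, and all sample rows are illustrative assumptions, not part of the HSD-DW design. What it shows is the extra join a snowflake requires.

```python
import sqlite3

# Hypothetical snowflake: the PRODUCT dimension is normalized so that
# category names live in their own PRODUCT_CATEGORY table.
SCHEMA = """
CREATE TABLE PRODUCT_CATEGORY (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName TEXT NOT NULL
);
CREATE TABLE PRODUCT (
    ProductNumber TEXT PRIMARY KEY,
    ProductName   TEXT NOT NULL,
    CategoryID    INTEGER NOT NULL REFERENCES PRODUCT_CATEGORY (CategoryID)
);
CREATE TABLE PRODUCT_SALES (
    ProductNumber TEXT NOT NULL REFERENCES PRODUCT (ProductNumber),
    Quantity      INTEGER NOT NULL
);
"""

# Reaching the category takes two joins: fact -> PRODUCT -> PRODUCT_CATEGORY.
CATEGORY_QUERY = """
SELECT   PC.CategoryName, SUM(PS.Quantity) AS TotalQuantity
FROM     PRODUCT_SALES AS PS
JOIN     PRODUCT AS P           ON P.ProductNumber = PS.ProductNumber
JOIN     PRODUCT_CATEGORY AS PC ON PC.CategoryID = P.CategoryID
GROUP BY PC.CategoryName
ORDER BY PC.CategoryName;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO PRODUCT_CATEGORY VALUES (?, ?)",
                 [(1, "Book"), (2, "Video")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [("PROD-1", "Sample Book", 1),
                  ("PROD-2", "Sample Video", 2),
                  ("PROD-3", "Another Video", 2)])
conn.executemany("INSERT INTO PRODUCT_SALES VALUES (?, ?)",
                 [("PROD-1", 2), ("PROD-2", 1), ("PROD-3", 3)])

rows = conn.execute(CATEGORY_QUERY).fetchall()
print(rows)  # [('Book', 2), ('Video', 4)]
```

In a star schema, CategoryName would simply be a column of PRODUCT, saving one join at the cost of repeating each category name on every product row.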

To illustrate a star schema for a dimensional database, we will build a small (very small) data warehouse for Heather Sweeney Designs (HSD), a Texas company specializing in products for kitchen-remodeling services. HSD puts on seminars to attract customers and sell books and videos in addition to doing actual design work. A database design for HSD is shown in Figure 13-9, and an SQL Server database diagram for the HSD database is shown in Figure 13-10. The actual dimensional database for BI use at HSD is named HSD-DW, and it is shown in Figure 13-11. The SQL statements needed to create the tables in the HSD-DW database are shown in Figure 13-12, and the data for the HSD-DW database are shown in Figure 13-13. Compare the HSD-DW dimensional database model in Figure 13-11 to the HSD database diagram shown in Figure 13-10 and note how data in the HSD database have been used in the HSD-DW schema.

Figure 13-9 The HSD Database (tables SEMINAR, SEMINAR_CUSTOMER, CUSTOMER, CONTACT, INVOICE, LINE_ITEM, and PRODUCT)

Chapter 13 Database Processing for Business Intelligence Systems

Figure 13-10 The HSD Database Diagram

Note that in the HSD-DW database the CUSTOMER table has a surrogate primary key named CustomerID, which has an integer value, whereas in the HSD database the primary key EmailAddress was used. There are two reasons for this. First, the primary key EmailAddress used in the HSD database is simply too cumbersome for a data warehouse, so we switched to the preferable small and numeric surrogate key. Second, we do not use individual EmailAddress values in the HSD-DW database, only values of EmailDomain, which is not unique and cannot be used as a primary key.
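The key substitution described above can be sketched as a tiny load step. This is an assumed illustration, not the book's procedure: the function names build_customer_dimension and email_domain are hypothetical, and the sample addresses are made up. It assigns integer CustomerID values in order of first appearance and keeps only EmailDomain, which repeats across customers and so could not serve as a key.

```python
def email_domain(email: str) -> str:
    """Return the part after the last '@', lower-cased (e.g. 'example.com')."""
    return email.rsplit("@", 1)[1].lower()


def build_customer_dimension(emails):
    """Assign surrogate CustomerID values and keep only EmailDomain.

    Returns (key_map, rows): key_map maps each EmailAddress to its CustomerID
    (needed later to translate fact rows), and rows holds the
    (CustomerID, EmailDomain) values for the warehouse CUSTOMER table.
    """
    key_map = {}
    rows = []
    for email in emails:
        if email in key_map:          # already loaded: reuse its surrogate key
            continue
        customer_id = len(key_map) + 1
        key_map[email] = customer_id
        rows.append((customer_id, email_domain(email)))
    return key_map, rows


emails = ["ann@example.com", "bob@example.com", "ann@example.com", "cy@Example.org"]
key_map, rows = build_customer_dimension(emails)
print(rows)  # [(1, 'example.com'), (2, 'example.com'), (3, 'example.org')]
```

Note that two different customers end up with the same EmailDomain value, which is exactly why that column cannot be the primary key.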

Figure 13-11 The HSD-DW Star Schema (PRODUCT_SALES fact table; PRODUCT, TIMELINE, and CUSTOMER dimension tables)


Figure 13-12 The HSD-DW SQL Statements

A fact table is used to store measures of business activity, which are quantitative or factual data about the entity represented by the fact table. For example, in the HSD-DW database, the fact table is PRODUCT_SALES:

PRODUCT_SALES (TimeID, CustomerID, ProductNumber, Quantity, UnitPrice, Total)


Figure 13-13 The HSD-DW Table Data: (a) TIMELINE Dimension Table, (b) CUSTOMER Dimension Table, (c) PRODUCT Dimension Table, (d) PRODUCT_SALES Fact Table

In this table:

• Quantity is quantitative data that record how many of the item were sold.
• UnitPrice is quantitative data that record the dollar price of each item sold.
• Total (= Quantity * UnitPrice) is quantitative data that record the total dollar value of the sale of this item.

The measures in the PRODUCT_SALES table are for units of product per day. We do not use individual sale data (which would be based on InvoiceNumber), but rather data summed for each product, for each customer, for each day. For example, if you compare the HSD database INVOICE data for Ralph Able for 6/5/11, you will see that Ralph made two purchases on that date (InvoiceNumber 35013 and InvoiceNumber 35016). In the HSD-DW database, however, these two purchases are summed into the PRODUCT_SALES data for Ralph (CustomerID = 3) for 6/5/11 (TimeID = 40699).
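The summing step can be sketched as follows. This is a hypothetical illustration: the input tuple layout, the function name to_daily_facts, the ProductNumber values, the quantities, and the prices are invented; only the invoice numbers 35013 and 35016, CustomerID = 3, and TimeID = 40699 come from the text. It also assumes a product's UnitPrice does not change within a day, since each PRODUCT_SALES row carries a single UnitPrice.

```python
from decimal import Decimal


def to_daily_facts(line_items):
    """Collapse invoice line items to one row per (TimeID, CustomerID, ProductNumber).

    Each input item is (InvoiceNumber, TimeID, CustomerID, ProductNumber,
    Quantity, UnitPrice). InvoiceNumber is deliberately dropped. Assumes a
    product's UnitPrice is constant within a day; otherwise ValueError.
    """
    totals = {}  # key -> [summed quantity, unit price]
    for _invoice, time_id, customer_id, product_number, quantity, unit_price in line_items:
        key = (time_id, customer_id, product_number)
        if key not in totals:
            totals[key] = [0, unit_price]
        elif totals[key][1] != unit_price:
            raise ValueError(f"UnitPrice changed within a day for {key}")
        totals[key][0] += quantity
    # Total = Quantity * UnitPrice, as defined for PRODUCT_SALES.
    return [(t, c, p, qty, price, qty * price)
            for (t, c, p), (qty, price) in sorted(totals.items())]


# Two invoices for CustomerID 3 on TimeID 40699 (product data are hypothetical).
items = [
    (35013, 40699, 3, "PROD-1", 1, Decimal("7.99")),
    (35016, 40699, 3, "PROD-1", 2, Decimal("7.99")),
    (35016, 40699, 3, "PROD-2", 1, Decimal("24.95")),
]
facts = to_daily_facts(items)
for row in facts:
    print(row)
```

Decimal is used for prices so that Total comes out exact (3 × 7.99 = 23.97) rather than picking up binary floating-point error.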

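The TimeID values are the sequential serial values used in Microsoft Excel to represent dates. Starting with 01-JAN-1900 as date value 1, the date value increases by 1 for each calendar day. Thus, 05-JUN-2011 = 40699. (Excel's 1900 date system also counts a nonexistent 29-FEB-1900, so for dates after February 1900 the serial value is one greater than a strict day count from 01-JAN-1900.) For more information, search “Date formats” in the Excel help system.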
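Converting between calendar dates and these TimeID values takes only a fixed offset. A minimal sketch, assuming the Excel 1900 date system and dates on or after 01-MAR-1900 (the first date after the nonexistent 29-FEB-1900); the function names to_time_id and from_time_id are hypothetical:

```python
from datetime import date, timedelta

# Excel numbers 01-JAN-1900 as 1 but also counts a nonexistent 29-FEB-1900,
# so treating 30-DEC-1899 as day 0 reproduces Excel's serial numbers for
# every date from 01-MAR-1900 (serial 61) onward.
EXCEL_EPOCH = date(1899, 12, 30)
FIRST_SAFE_DATE = date(1900, 3, 1)


def to_time_id(d: date) -> int:
    """Convert a calendar date to its Excel-style serial (TimeID)."""
    if d < FIRST_SAFE_DATE:
        raise ValueError("dates before 01-MAR-1900 are not handled")
    return (d - EXCEL_EPOCH).days


def from_time_id(time_id: int) -> date:
    """Convert an Excel-style serial (TimeID) back to a calendar date."""
    if time_id < 61:
        raise ValueError("serials below 61 are not handled")
    return EXCEL_EPOCH + timedelta(days=time_id)


print(to_time_id(date(2011, 6, 5)))   # 40699
print(from_time_id(40699))            # 2011-06-05
```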

A dimension table is used to record values of attributes that describe the fact measures in the fact table, and these attributes are used in queries to select and group the measures in the fact table. Thus, CUSTOMER records data about the customers referenced by CustomerID in the PRODUCT_SALES table, TIMELINE provides data that can be used to interpret the PRODUCT_SALES event in time (which month? which quarter?), and so on. A query to summarize product units sold by customer (CustomerName) and product (ProductName) would be:

/* *** SQL-Query-CH13-01 *** */
SELECT   C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName,
         SUM(PS.Quantity) AS TotalQuantity
FROM     CUSTOMER AS C, PRODUCT_SALES AS PS, PRODUCT AS P
WHERE    C.CustomerID = PS.CustomerID
   AND   P.ProductNumber = PS.ProductNumber
GROUP BY C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName
ORDER BY C.CustomerID, P.ProductNumber;

The results of this query are shown in Figure 13-14.
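To try SQL-Query-CH13-01 without SQL Server, the sketch below runs it unchanged against an in-memory SQLite database from Python. The column types and all sample rows are hypothetical stand-ins for the real HSD-DW tables and data, so the output will not match Figure 13-14; the point is to watch the daily fact rows for each customer and product collapse into a single total.

```python
import sqlite3

# SQL-Query-CH13-01, unchanged apart from layout.
QUERY = """
SELECT   C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName,
         SUM(PS.Quantity) AS TotalQuantity
FROM     CUSTOMER AS C, PRODUCT_SALES AS PS, PRODUCT AS P
WHERE    C.CustomerID = PS.CustomerID
   AND   P.ProductNumber = PS.ProductNumber
GROUP BY C.CustomerID, C.CustomerName, P.ProductNumber, P.ProductName
ORDER BY C.CustomerID, P.ProductNumber;
"""

conn = sqlite3.connect(":memory:")
# Hypothetical minimal versions of the HSD-DW tables.
conn.executescript("""
CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);
CREATE TABLE PRODUCT  (ProductNumber TEXT PRIMARY KEY, ProductName TEXT NOT NULL);
CREATE TABLE PRODUCT_SALES (
    TimeID INTEGER, CustomerID INTEGER, ProductNumber TEXT,
    Quantity INTEGER, UnitPrice NUMERIC, Total NUMERIC,
    PRIMARY KEY (TimeID, CustomerID, ProductNumber)
);
""")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)",
                 [(1, "Customer A"), (3, "Customer C")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?)",
                 [("PROD-1", "Product One"), ("PROD-2", "Product Two")])
conn.executemany("INSERT INTO PRODUCT_SALES VALUES (?, ?, ?, ?, ?, ?)",
                 [(40699, 3, "PROD-1", 3, 10, 30),
                  (40700, 3, "PROD-1", 1, 10, 10),
                  (40699, 1, "PROD-2", 2, 25, 50),
                  (40700, 1, "PROD-1", 1, 10, 10)])

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Customer C's two days of PROD-1 sales (3 units and 1 unit) come back as one row with TotalQuantity 4.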

In Chapter 6, we discussed how an N:M relationship is created in a database as two 1:N relationships by use of an intersection table. We also discussed how additional attributes can be added to the intersection table in an association relationship.

In a star schema, the fact table is often an association table—it is an intersection table for the relationships between the dimension tables with additional measures also stored in it. And, as with all other intersection and association tables, the key of the fact table is a composite key made up of all the foreign keys to the dimension tables.
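The composite key can be seen in a minimal DDL sketch for PRODUCT_SALES, again run through Python's sqlite3 module. The column types and the reduced one-column dimension tables are assumptions for illustration, not the actual HSD-DW statements. Because the primary key is the combination of the three foreign keys, a second row for the same day, customer, and product is rejected, as is a row whose TimeID has no matching TIMELINE row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE TIMELINE (TimeID INTEGER PRIMARY KEY);
CREATE TABLE CUSTOMER (CustomerID INTEGER PRIMARY KEY);
CREATE TABLE PRODUCT  (ProductNumber TEXT PRIMARY KEY);
CREATE TABLE PRODUCT_SALES (
    TimeID        INTEGER NOT NULL REFERENCES TIMELINE (TimeID),
    CustomerID    INTEGER NOT NULL REFERENCES CUSTOMER (CustomerID),
    ProductNumber TEXT    NOT NULL REFERENCES PRODUCT (ProductNumber),
    Quantity      INTEGER NOT NULL,
    UnitPrice     NUMERIC NOT NULL,
    Total         NUMERIC NOT NULL,
    -- Composite key made up of all the foreign keys to the dimension tables.
    PRIMARY KEY (TimeID, CustomerID, ProductNumber)
);
""")
conn.execute("INSERT INTO TIMELINE VALUES (40699)")
conn.execute("INSERT INTO CUSTOMER VALUES (3)")
conn.execute("INSERT INTO PRODUCT VALUES ('PROD-1')")
conn.execute("INSERT INTO PRODUCT_SALES VALUES (40699, 3, 'PROD-1', 3, 10, 30)")


def try_insert(row):
    """Attempt one fact row and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO PRODUCT_SALES VALUES (?, ?, ?, ?, ?, ?)", row)
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert((40699, 3, "PROD-1", 1, 10, 10)))  # same composite key
print(try_insert((40700, 3, "PROD-1", 1, 10, 10)))  # TimeID not in TIMELINE
```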