U SING D ATABASES TO I MPROVE B USINESS P ERFORMANCE AND D ECISION M AKING

6.3 U SING D ATABASES TO I MPROVE B USINESS P ERFORMANCE AND D ECISION M AKING

Businesses use their databases to keep track of basic transactions, such as paying suppliers, processing orders, keeping track of customers, and paying employees. But they also need databases to provide information that will help the company

FIGURE 6-11 AN ENTITY-RELATIONSHIP DIAGRAM

This diagram shows the relationships between the entities SUPPLIER, PART, LINE_ITEM, and ORDER that might be used to model the database in Figure 6-10.

222 Part Two Information Technology Infrastructure run the business more efficiently, and help managers and employees make bet-

ter decisions. If a company wants to know which product is the most popular or who is its most profitable customer, the answer lies in the data.

For example, by analyzing data from customer credit card purchases, Louise’s Trattoria, a Los Angeles restaurant chain, learned that quality was more important than price for most of its customers, who were college- educated and liked fine wine. Acting on this information, the chain introduced vegetarian dishes, more seafood selections, and more expensive wines, raising sales by more than 10 percent.

In a large company, with large databases or large systems for separate functions, such as manufacturing, sales, and accounting, special capabilities and tools are required for analyzing vast quantities of data and for accessing data from multiple systems. These capabilities include data warehousing, data mining, and tools for accessing internal databases through the Web.

DATA WAREHOUSES Suppose you want concise, reliable information about current operations,

trends, and changes across the entire company If you worked in a large company, obtaining this might be difficult because data are often maintained in separate systems, such as sales, manufacturing, or accounting. Some of the data you need might be found in the sales system, and other pieces in the manufacturing system. Many of these systems are older legacy systems that use outdated data management technologies or file systems where information is difficult for users to access.

You might have to spend an inordinate amount of time locating and gathering the data you need, or you would be forced to make your decision based on incomplete knowledge. If you want information about trends, you might also have trouble finding data about past events because most firms only make their current data immediately available. Data warehousing addresses these problems.

What Is a Data Warehouse?

A data warehouse is a database that stores current and historical data of potential interest to decision makers throughout the company. The data originate in many core operational transaction systems, such as systems for sales, customer accounts, and manufacturing, and may include data from Web site transactions. The data warehouse consolidates and standardizes information from different operational databases so that the information can be used across the enterprise for management analysis and decision making.

Figure 6-12 illustrates how a data warehouse works. The data warehouse makes the data available for anyone to access as needed, but it cannot be altered. A data warehouse system also provides a range of ad hoc and standardized query tools, analytical tools, and graphical reporting facilities. Many firms use intranet portals to make the data warehouse information widely available throughout the firm.

Catalina Marketing, a global marketing firm for major consumer packaged goods companies and retailers, operates a gigantic data warehouse that includes three years of purchase history for 195 million U.S. customer loyalty program members at supermarkets, pharmacies, and other retailers. It is the largest loyalty database in the world. Catalina’s retail store customers analyze this database of customer purchase histories to determine individual customers’ buying preferences. When a shopper checks out at the cash register of one of

Chapter 6 Foundations of Business Intelligence: Databases and Information Management 223 FIGURE 6-12

COMPONENTS OF A DATA WAREHOUSE

The data warehouse extracts current and historical data from multiple operational systems inside the organization. These data are combined with data from external sources and reorganized into a central database designed for management reporting and analysis. The information directory provides users with information about the data available in the warehouse.

Catalina’s retail customers, the purchase is instantly analyzed along with that customer’s buying history in the data warehouse to determine what coupons that customer will receive at checkout along with a receipt.

The U.S. Internal Revenue Service (IRS) maintains a Compliance Data Warehouse that consolidates taxpayer data that had been fragmented among many different legacy systems, including personal information about taxpay- ers and archived tax returns. These systems had been designed to process tax return forms efficiently but their data were very difficult to query and analyze. The Compliance Data Warehouse integrates taxpayer data from many disparate sources into a relational structure, which makes querying and analysis much easier. With a complete and comprehensive picture of taxpayers, the warehouse helps IRS analysts and staff identify people who are most likely to cheat on their income tax payments and respond rapidly to taxpayer queries.

Data Marts Companies often build enterprise-wide data warehouses, where a central data warehouse serves the entire organization, or they create smaller, decentral- ized warehouses called data marts. A data mart is a subset of a data ware- house in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users. For example, a company might develop marketing and sales data marts to deal with customer information. Before implementing an enterprise-wide data warehouse, bookseller Barnes & Noble maintained a series of data marts—one

for point-of-sale data in retail stores, another for college bookstore sales, and a third for online sales. A data mart typically focuses on a single subject area or line of business, so it usually can be constructed more rapidly and at lower cost than an enterprise-wide data warehouse.

224 Part Two Information Technology Infrastructure

TOOLS FOR BUSINESS INTELLIGENCE: MULTIDIMENSIONAL DATA ANALYSIS AND DATA MINING Once data have been captured and organized in data warehouses and data marts,

they are available for further analysis using tools for business intelligence, which we introduced briefly in Chapter 2. Business intelligence tools enable users to analyze data to see new patterns, relationships, and insights that are useful for guiding decision making.

Principal tools for business intelligence include software for database querying and reporting, tools for multidimensional data analysis (online analytical processing), and tools for data mining. This section will introduce you to these tools, with more detail about business intelligence analytics and applications in the Chapter 12 discussion of decision making.

Online Analytical Processing (OLAP) Suppose your company sells four different products—nuts, bolts, washers, and screws—in the East, West, and Central regions. If you wanted to ask a fairly straightforward question, such as how many washers were sold during the past quarter, you could easily find the answer by querying your sales database. But what if you wanted to know how many washers sold in each of your sales regions and compare actual results with projected sales?

To obtain the answer, you would need online analytical processing (OLAP) . OLAP supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions. Each aspect of information—product, pricing, cost, region, or time period—represents a different dimension. So, a product manager could use a multidimensional data analysis tool to learn how many washers were sold in the East in June, how that compares with the previous month and the previous June, and how it compares with the sales forecast. OLAP enables users to obtain online answers to ad hoc questions such as these in a fairly rapid amount of time, even when the data are stored in very large databases, such as sales figures for multiple years.

Figure 6-13 shows a multidimensional model that could be created to represent products, regions, actual sales, and projected sales. A matrix of actual sales can be stacked on top of a matrix of projected sales to form a cube with six faces. If you rotate the cube 90 degrees one way, the face showing will

be product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. If you rotate 180 degrees from the original view, you will see projected sales and product versus region. Cubes can be nested within cubes to build complex views of data. A company would use either a specialized multidimensional database or a tool that creates multidimensional views of data in relational databases.

Data Mining Traditional database queries answer such questions as, “How many units of product number 403 were shipped in February 2010?” OLAP, or multidimen- sional analysis, supports much more complex requests for information, such as

“Compare sales of product 403 relative to plan by quarter and sales region for the past two years.” With OLAP and query-oriented data analysis, users need to have a good idea about the information for which they are looking.

Data mining is more discovery-driven. Data mining provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and

Chapter 6 Foundations of Business Intelligence: Databases and Information Management 225 FIGURE 6-13

MULTIDIMENSIONAL DATA MODEL

The view that is showing is product versus region. If you rotate the cube 90 degrees, the face will show product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.

relationships in large databases and inferring rules from them to predict future behavior. The patterns and rules are used to guide decision making and forecast the effect of those decisions. The types of information obtainable from data mining include associations, sequences, classifications, clusters, and forecasts.

• Associations are occurrences linked to a single event. For instance, a study of supermarket purchasing patterns might reveal that, when corn chips are purchased, a cola drink is purchased 65 percent of the time, but when there is a promotion, cola is purchased 85 percent of the time. This information helps managers make better decisions because they have learned the profitability of a promotion.

• In sequences, events are linked over time. We might find, for example, that if

a house is purchased, a new refrigerator will be purchased within two weeks

65 percent of the time, and an oven will be bought within one month of the home purchase 45 percent of the time.

• Classification recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules. For example, businesses such as credit card or telephone companies worry about the loss of steady customers. Classification helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers.

• Clustering works in a manner similar to classification when no groups have yet been defined. A data mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.

• Although these applications involve predictions, forecasting uses predictions in

a different way. It uses a series of existing values to forecast what other values will be. For example, forecasting might find patterns in data to help managers estimate the future value of continuous variables, such as sales figures.

These systems perform high-level analyses of patterns or trends, but they can also drill down to provide more detail when needed. There are data mining

226 Part Two Information Technology Infrastructure applications for all the functional areas of business, and for government and

scientific work. One popular use for data mining is to provide detailed analyses of patterns in customer data for one-to-one marketing campaigns or for identifying profitable customers.

For example, Harrah’s Entertainment, the second-largest gambling company in its industry, uses data mining to identify its most profitable customers and generate more revenue from them. The company continually analyzes data about its customers gathered when people play its slot machines or use Harrah’s casinos and hotels. Harrah’s marketing department uses this informa- tion to build a detailed gambling profile, based on a particular customer’s ongoing value to the company. For instance, data mining lets Harrah’s know the favorite gaming experience of a regular customer at one of its Midwest riverboat casinos, along with that person’s preferences for room accomodations, restaurants, and entertainment. This information guides management deci- sions about how to cultivate the most profitable customers, encourage those customers to spend more, and attract more customers with high revenue-gen- erating potential. Business intelligence has improved Harrah’s profits so much that it has become the centerpiece of the firm’s business strategy.

Predictive analytics use data mining techniques, historical data, and assump- tions about future conditions to predict outcomes of events, such as the probabil- ity a customer will respond to an offer or purchase a specific product. For exam-

ple, the U.S. division of The Body Shop International plc used predictive analytics with its database of catalog, Web, and retail store customers to identify customers who were more likely to make catalog purchases. That information helped the company build a more precise and targeted mailing list for its catalogs, improving the response rate for catalog mailings and catalog revenues.

Te x t M i n i n g a n d W e b M i n i n g Business intelligence tools deal primarily with data that have been structured in databases and files. However, unstructured data, most in the form of text files, is believed to account for over 80 percent of an organization’s useful informa- tion. E-mail, memos, call center transcripts, survey responses, legal cases, patent descriptions, and service reports are all valuable for finding patterns and trends that will help employees make better business decisions. Text mining tools are now available to help businesses analyze these data. These tools are able to extract key elements from large unstructured data sets, discover

patterns and relationships, and summarize the information. Businesses might turn to text mining to analyze transcripts of calls to customer service centers to identify major service and repair issues.

Text mining is a relatively new technology, but what’s really new are the myriad ways in which unstructured data are being generated by consumers and the business uses for these data. The Interactive Session on Technology explores some of these business applications of text mining.

The Web is another rich source of valuable information, some of which can now be mined for patterns, trends, and insights into customer behavior. The discovery and analysis of useful patterns and information from the World Wide Web is called Web mining. Businesses might turn to Web mining to help them understand customer behavior, evaluate the effectiveness of a particular Web site, or quantify the success of a marketing campaign. For instance, marketers use Google Trends and Google Insights for Search services, which track the popularity of various words and phrases used in Google search queries, to learn what people are interested in and what they are interested in buying.

Chapter 6 Foundations of Business Intelligence: Databases and Information Management 227

INTERACTIVE SESSION: TECHNOLOGY

Text mining is the discovery of patterns and relation- ships from large sets of unstructured data—the kind of data we generate in e-mails, phone conversations, blog postings, online customer surveys, and tweets. The mobile digital platform has amplified the explosion in digital information, with hundreds of millions of people calling, texting, searching, “apping” (using applications), buying goods, and writing billions of e-mails on the go.

Consumers today are more than just consumers: they have more ways to collaborate, share informa- tion, and influence the opinions of their friends and peers, and the data they create in doing so have significant value to businesses. Unlike structured data, which are generated from events such as completing a purchase transaction, unstructured data have no distinct form. Nevertheless, managers believe such data may offer unique insights into cus- tomer behavior and attitudes that were much more difficult to determine years ago.

For example, in 2007, JetBlue experienced unprecedented levels of customer discontent in the wake of a February ice storm that resulted in widespread flight cancellations and planes stranded on Kennedy Airport runways. The airline received 15,000 e-mails per day from customers during the storm and immediately afterwards, up from its usual daily volume of 400. The volume was so much larger than usual that JetBlue had no simple way to read everything its customers were saying.

Fortunately, the company had recently contracted with Attensity, a leading vendor of text analytics software, and was able to use the software to analyze all of the e-mail it had received within two days. According to JetBlue research analyst Bryan Jeppsen, Attensity Analyze for Voice of the Customer (VoC) enabled JetBlue to rapidly extract customer sentiments, preferences, and requests it couldn’t find any other way. This tool uses a proprietary technol- ogy to automatically identify facts, opinions, requests, trends, and trouble spots from the unstruc- tured text of survey responses, service notes, e-mail messages, Web forums, blog entries, news articles, and other customer communications. The technol- ogy is able to accurately and automatically identify the many different “voices” customers use to express their feedback (such as a negative voice, positive voice, or conditional voice), which helps organiza-

WHAT CAN BUSINESSES LEARN FROM TEXT MINING?

tions pinpoint key events and relationships, such as intent to buy, intent to leave, or customer “wish” events. It can reveal specific product and service issues, reactions to marketing and public relations efforts, and even buying signals.

Attensity’s software integrated with JetBlue’s other customer analysis tools, such as Satmetrix’s Net Promoter metrics, which classifies customers into groups that are generating positive, negative, or no feedback about the company. Using Attensity’s text analytics in tandem with these tools, JetBlue devel- oped a customer bill of rights that addressed the major issues customers had with the company.

Hotel chains like Gaylord Hotels and Choice Hotels are using text mining software to glean insights from thousands of customer satisfaction surveys provided by their guests. Gaylord Hotels is using Clarabridge’s text analytics solution delivered via the Internet as a hosted software service to gather and analyze customer feedback from surveys, e-mail, chat messaging, staffed call centers, and online forums associated with guests’ and meeting planners’ experiences at the company’s convention resorts. The Clarabridge software sorts through the hotel chain’s customer surveys and gathers positive and negative comments, organizing them into a variety of categories to reveal less obvious insights. For example, guests complained about many things more frequently than noisy rooms, but complaints of noisy rooms were most frequently correlated with surveys indicating an unwillingness to return to the hotel for another stay.

Analyzing customer surveys used to take weeks, but now takes only days, thanks to the Clarabridge software. Location managers and corporate execu- tives have also used findings from text mining to influence decisions on building improvements.

Wendy’s International adopted Clarabridge software to analyze nearly 500,000 messages it collects each year from its Web-based feedback forum, call center notes, e-mail messages, receipt-based surveys, and social media. The chain’s customer satisfaction team had previously used spreadsheets and keyword searches to review customer comments, a very slow manual approach. Wendy’s management was looking for a better tool to speed analysis, detect emerging issues, and pinpoint troubled areas of the business at the store, regional, or corporate level.

228 Part Two Information Technology Infrastructure The Clarabridge technology enables Wendy’s to

determine how consumers feel about their brand and track customer experiences down to the store level

take steps to respond to negative sentiment. within minutes. This timely information helps store,

Structured data analysis won’t be rendered obso- regional, and corporate managers spot and address

lete by text analytics, but companies that are able to problems related to meal quality, cleanliness, and

use both methods to develop a clearer picture of speed of service.

their customers’ attitudes will have an easier time Text analytics software caught on first with

establishing and building their brand and gleaning government agencies and larger companies with

insights that will enhance profitability. information systems departments that had the means to properly use the complicated software, but

Sources: Doug Henschen, “Wendy’s Taps Text Analytics to Mine

Clarabridge is now offering a version of its product

Customer Feedback,” Information Week, March 23, 2010; David Stodder,” How Text Analytics Drive Customer Insight” Information

geared towards small businesses. The technology has

Week , February 1, 2010; Nancy David Kho, “Customer Experience

already caught on with law enforcement, search tool

and Sentiment Analysis,” KMWorld, February 1, 2010; Siobhan

interfaces, and “listening platforms” like Nielsen

Gorman, “Details of Einstein Cyber-Shield Disclosed by White

Online. Listening platforms are text mining tools that House,” The Wall Street Journal, March 2, 2010; www.attensity.com,

accessed June 16, 2010; and www.clarabridge.com, accessed

focus on brand management, allowing companies to

June 17, 2010.

CASE STUDY QUESTIONS MIS IN ACTION

1. What challenges does the increase in unstruc- Visit a Web site such as QVC.com or TripAdvisor.com tured data present for businesses?

detailing products or services that have customer

2. How does text-mining improve decision-making? reviews. Pick a product, hotel, or other service with

3. What kinds of companies are most likely to benefit at least 15 customer reviews and read those reviews, from text mining software? Explain your answer.

both positive and negative. How could Web content

4. In what ways could text mining potentially lead to mining help the offering company improve or better the erosion of personal information privacy?

market this product or service? What pieces of Explain.

information should highlighted?

Web mining looks for patterns in data through content mining, structure mining, and usage mining. Web content mining is the process of extracting knowledge from the content of Web pages, which may include text, image, audio, and video data. Web structure mining extracts useful information from the links embedded in Web documents. For example, links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document. Web usage mining examines user interaction data recorded by a Web server whenever requests for a Web site’s resources are received. The usage data records the user’s behavior when the user browses or makes transactions on the Web site and col- lects the data in a server log. Analyzing such data can help companies determine the value of particular customers, cross marketing strategies across products, and the effectiveness of promotional campaigns.

DATABASES AND THE WEB Have you ever tried to use the Web to place an order or view a product catalog?

If so, you were probably using a Web site linked to an internal corporate database. Many companies now use the Web to make some of the information in their internal databases available to customers and business partners.

Chapter 6 Foundations of Business Intelligence: Databases and Information Management 229 Suppose, for example, a customer with a Web browser wants to search an online

retailer’s database for pricing information. Figure 6-14 illustrates how that cus- tomer might access the retailer’s internal database over the Web. The user accesses the retailer’s Web site over the Internet using Web browser software on his or her client PC. The user’s Web browser software requests data from the orga- nization’s database, using HTML commands to communicate with the Web server.

Because many back-end databases cannot interpret commands written in HTML, the Web server passes these requests for data to software that translates HTML commands into SQL so that they can be processed by the DBMS working with the database. In a client/server environment, the DBMS resides on a dedicated computer called a database server. The DBMS receives the SQL requests and provides the required data. The middleware transfers information from the organization’s internal database back to the Web server for delivery in the form of a Web page to the user.

Figure 6-14 shows that the middleware working between the Web server and the DBMS is an application server running on its own dedicated computer (see Chapter 5). The application server software handles all application operations, including transaction processing and data access, between browser-based computers and a company’s back-end business applications or databases. The application server takes requests from the Web server, runs the business logic to process transactions based on those requests, and provides connectivity to the organization’s back-end systems or databases. Alternatively, the software for handling these operations could be a custom program or a CGI script. A CGI script is a compact program using the Common Gateway Interface (CGI) specifi- cation for processing data on a Web server.

There are a number of advantages to using the Web to access an organiza- tion’s internal databases. First, Web browser software is much easier to use than proprietary query tools. Second, the Web interface requires few or no changes to the internal database. It costs much less to add a Web interface in front of a legacy system than to redesign and rebuild the system to improve user access.

Accessing corporate databases through the Web is creating new efficiencies, opportunities, and business models. ThomasNet.com provides an up-to-date online directory of more than 600,000 suppliers of industrial products, such as chemicals, metals, plastics, rubber, and automotive equipment. Formerly called Thomas Register, the company used to send out huge paper catalogs with this information. Now it provides this information to users online via its Web site and has become a smaller, leaner company.

Other companies have created entirely new businesses based on access to large databases through the Web. One is the social networking site MySpace, which helps users stay connected with each other or meet new people.

FIGURE 6-14 LINKING INTERNAL DATABASES TO THE WEB

Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.

230 Part Two Information Technology Infrastructure MySpace features music, comedy, videos, and “profiles” with information

supplied by 122 million users about their age, hometown, dating preferences, marital status, and interests. It maintains a massive database to house and manage all of this content. Facebook uses a similar database.