Web Service Invocation in Integration Flows

16 Working with Oracle Data Quality Products

This chapter describes how to work with Data Quality Products in Oracle Data Integrator.

This chapter includes the following sections:
■ Section 16.1, Introduction to Oracle Data Quality Products
■ Section 16.2, The Data Quality Process

16.1 Introduction to Oracle Data Quality Products

Oracle Data Profiling and Oracle Data Quality for Data Integrator (also referred to as Oracle Data Quality Products) extend the inline Data Quality features of Oracle Data Integrator to provide more advanced data governance capabilities.

A complete Data Quality system includes data profiling, integrity and quality:
■ Profiling makes data investigation and quality assessment possible. It allows business users to get a clear picture of their data quality challenges and to monitor and track the quality of their data over time. Profiling is handled by Oracle Data Profiling. It allows business users to assess the quality of their data through metrics, to discover or infer rules based on this data, and finally to monitor the evolution of data quality over time.
■ Integrity control is essential in ensuring the overall consistency of the data in your information system's applications. Application data does not always satisfy the constraints and declarative rules imposed by the information system. You may, for instance, find orders with no customer, or order lines with no product, and so forth. Oracle Data Integrator provides a built-in working environment to detect these constraint violations and store them for recycling or reporting purposes. Static and Flow checks in Oracle Data Integrator are integrity checks.
■ Quality includes integrity and extends to more complex quality processing. A rule-based engine applies data quality standards as part of an integration process to cleanse, standardize, enrich, match and de-duplicate any type of data, including names and addresses. Oracle Data Quality for Data Integrator places data quality as well as name and address cleansing at the heart of the enterprise integration strategy.
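To make the integrity example above concrete (orders with no customer), the following Python sketch separates valid orders from rejected ones, in the spirit of a static check that stores violations for recycling or reporting. It is only an illustration: the record layout, the field names `order_id` and `customer_id`, and the `find_orphan_orders` helper are assumptions made for this example and are not part of Oracle Data Integrator.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record layout, assumed for illustration only.
Order = Dict[str, Optional[int]]


def find_orphan_orders(
    orders: List[Order], customer_ids: Set[int]
) -> Tuple[List[Order], List[Order]]:
    """Split orders into (valid, rejected) by checking the customer reference."""
    valid: List[Order] = []
    rejected: List[Order] = []
    for order in orders:
        # An order is valid only if it points to a known customer.
        if order.get("customer_id") in customer_ids:
            valid.append(order)
        else:
            rejected.append(order)
    return valid, rejected


customers = {1, 2}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": None},  # order with no customer
    {"order_id": 102, "customer_id": 7},     # customer does not exist
]

valid, rejected = find_orphan_orders(orders, customers)
print("rejected orders:", [o["order_id"] for o in rejected])
```

Keeping the rejected rows in a separate list mirrors the idea of storing violations rather than silently dropping them.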

16.2 The Data Quality Process

The data quality process described in this section uses Oracle Data Quality products to profile and cleanse data extracted from systems using Oracle Data Integrator. The cleansed data is also re-integrated into the original system using Oracle Data Integrator.

The Quality Process has the following steps:
1. Create a Quality Input File from Oracle Data Integrator, containing the data to cleanse.
2. Create an Entity in Oracle Data Quality, based on this file.
3. Create a Profiling Project to determine quality issues.
4. Create an Oracle Data Quality Project cleansing this Entity.
5. Export the Data Quality Project for run-time.
6. Reverse-engineer the Entities using the RKM Oracle Data Quality.
7. Use Oracle Data Quality Input and Output Files in Interfaces.
8. Run this Quality Project from Oracle Data Integrator using the OdiDataQuality tool.
9. Sequence the Process in a Package.

16.2.1 Create a Quality Input File

Oracle Data Quality uses a flat file containing the data to cleanse as the source for the Quality project. This Quality input file can be created from Data Integrator and loaded from any source datastore using interfaces. This file should be a FILE datastore with the following parameters defined on the Files tab:

Parameter                  Value
File Format                Delimited
Heading (Number of Lines)  1
Record Separator           MS-DOS
Field Separator            Other
[Field Separator] Other    , (comma sign - Hexadecimal 2C)
Text Delimiter             " (double quotation marks)
Decimal Separator          empty, not specified

For more information on creating a FILE datastore, refer to Chapter 5, Creating and Reverse-Engineering a Model. For more information on loading flat files, see Files in the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator.

16.2.2 Create an Entity

Importing a data source into Oracle Data Quality for Data Integrator means creating an entity based on a delimited source file.

16.2.2.1 Step 1: Validate Loader Connections

Your administrator must set up at least one Loader Connection when installing Oracle Data Quality for Data Integrator. This Loader Connection is used to access the Oracle Data Quality input file. As the input file is a delimited file, this Loader Connection should be a Delimited Loader Connection. Step 1 requires you to validate that this Delimited Loader Connection is set up. Also verify that all the data and schema files you need are copied to the directory defined by the Loader Connection.

If you do not have access to the Metabase Manager, ask your Metabase administrator to verify the Loader Connection for you. If you are a Metabase User and have access to the Metabase Manager, follow this procedure:

To validate a Loader Connection:

1. Open the Metabase Manager (Start > All Programs > Oracle > Oracle Data Profiling and Quality > Metabase Manager).

2. Verify you are in Admin Mode.

3. Expand the Control Admin node.

4. Double-click Loader Connections.

5. On the right, the Loader Connections list view displays each Loader Connection, showing its name, type, data file, and parameters. Review the information to verify that the Loader Connection created by your administrator is a Delimited Loader Connection and that the data and schema directories are pointing to the correct location.

16.2.2.2 Step 2: Create Entity and Import Data

Use the Create Entity wizard to create an Entity. The Wizard takes you through each step, helps you to select data to load, and provides an interface for specifying connection and schema settings. It also gives you options for customizing how the data appears in an Entity. To import a delimited source file into Oracle Data Quality for Data Integrator:

1. Copy the flat file that you want to import into Oracle Data Quality for Data Integrator into the data directory that you specified when you defined the Loader Connection.

2. Click the Windows Start menu and select All Programs > Oracle > Oracle Data Profiling and Quality > Oracle Data Profiling and Quality.

3. Log in to the user interface with your metabase user. The Oracle Data Profiling and Quality user interface opens.

4. From the Main menu, select Analysis > Create Entity…

5. The Create Entity wizard opens in the upper right pane.

6. On the Connection Page of the Create Entity wizard, select the Loader Connection provided by your administrator, which you validated in Step 1.

7. Leave the default settings for the filter and the connection and click Next.

8. Oracle Data Quality connects to the data source using the Loader Connection you selected in step 6. If the connection fails, contact your Metabase Administrator.

9. In the Entity Selection dialog, select the data source file name you want to import in the list and click Next.

Note: If you are a Metabase User with full Metabase privileges, you can create a new Loader Connection.

10. Select the schema settings for the selected data file corresponding to the parameters of the file described in Section 16.2.1, Create a Quality Input File:
■ Delimiter: , (comma)
■ Quote: " (double quotation marks)
■ Attribute information: Names on first line
■ Select Records are CRLF terminated.
■ Character encoding: ascii

For more information on configuring Entities for delimited files, see the Online Help for Oracle Data Profiling and Oracle Data Quality.

11. After you select the schema settings, click Preview. The Preview mode shows how the data will appear in the Entity, based on your selected schema settings. The data displays below in a list view. Use the Preview mode to customize how the data will appear in the new Entity.

12. When you are ready to continue, click Close.

13. Click Next. The Load Parameters dialog opens. Specify the parameters as follows:
■ Select All Rows.
■ Leave the default Job name.

14. Click Next to continue.

15. In Confirm Settings, review the list of settings and click Finish to schedule the Entity creation job. The Schedule Job window opens.

16. Click Run Now.
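The schema settings chosen in the wizard (comma delimiter, double quotation marks, attribute names on the first line, CRLF-terminated records, ASCII encoding) have to match the file that is actually imported. As a rough illustration of that format, the following Python sketch writes such a file with the standard `csv` module. The column names, the sample row and the `write_quality_input` helper are hypothetical, and quoting every field is a choice made for this sketch, not a documented requirement. In practice the file would normally be produced by an Oracle Data Integrator interface.

```python
import csv
import io


def write_quality_input(rows, header, stream):
    """Write a delimited file: comma separator, double-quote text delimiter,
    one heading line, MS-DOS (CRLF) record separator."""
    writer = csv.writer(
        stream,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_ALL,   # assumption: quote every field
        lineterminator="\r\n",   # MS-DOS record separator
    )
    writer.writerow(header)      # attribute names on the first line
    writer.writerows(rows)


# Hypothetical sample data, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_quality_input(
    rows=[[1, "Smith, John", "Paris"]],
    header=["CUST_ID", "NAME", "CITY"],
    stream=buffer,
)
content = buffer.getvalue()
print(repr(content))
```

To write to disk instead, the stream could be opened with `open(path, "w", newline="", encoding="ascii")`, so that Python does not alter the CRLF terminators and rejects non-ASCII characters.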

16.2.2.3 Step 3: Verify Entity

During the data import process, Oracle Data Quality for Data Integrator translates your data files into three basic components (Metabase objects): Entities, Attributes, and Rows. Perform the following verification tasks to ensure that the data you expected has been successfully imported to a Metabase and is correctly represented in the Metabase Explorer.

1. Make sure that for every data file imported you have one corresponding Entity.

2. Make sure that the column names do not contain any special characters, with the exception of underscore (_) or minus sign (-) characters. Minus signs and underscores will be translated into spaces during the data load process.

3. Make sure that for every field imported you have one corresponding Attribute.

4. Make sure that you have one Entity Row for every data row imported.

Note: If the file is generated using Oracle Data Integrator, the file format parameters selected when creating the Entity should correspond to the file format specified in the Files tab of the datastore definition.
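Before comparing against the Metabase Explorer, it can help to know what counts to expect. The following Python sketch reads a delimited input file and reports the number of attributes (columns), the number of data rows, and any column names containing characters other than letters, digits, underscores and minus signs. The `summarize_input` helper, the sample content and the exact definition of "special character" used here are assumptions made for this example. The product's own rules may differ.

```python
import csv
import io
import re

# Assumption: letters, digits, underscore and minus sign are the only
# characters considered safe in a column name.
ALLOWED_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def summarize_input(text):
    """Return expected attribute/row counts and suspicious column names."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    header = next(reader)                         # names on first line
    data_rows = [row for row in reader if row]    # skip empty records
    bad_names = [name for name in header if not ALLOWED_NAME.match(name)]
    return {
        "attributes": len(header),
        "rows": len(data_rows),
        "bad_names": bad_names,
    }


# Hypothetical file content, for illustration only.
sample = (
    '"CUST_ID","FIRST-NAME","CITY#"\r\n'
    '"1","Ann","Paris"\r\n'
    '"2","Bob","Rome"\r\n'
)
summary = summarize_input(sample)
print(summary)
```

The `attributes` and `rows` values can then be compared with the Attribute and Entity Row counts shown for the Entity.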

16.2.3 Create a Profiling Project

You can now run a Data Profiling Project with Oracle Data Profiling to find quality problems. Profiling discovers and analyzes the quality of your enterprise data. It analyzes data at the most detailed levels to identify data anomalies, broken filters and data rules, misaligned data relationships, and other concerns that allow data to undermine your business objectives. For more information on Data Profiling see Working with Oracle Data Profiling in the Online Help for Oracle Data Profiling and Oracle Data Quality.

16.2.4 Create an Oracle Data Quality Project

You can now create an Oracle Data Quality Project to validate and transform your data, and resolve data issues such as mismatching and redundancy. Oracle Data Quality for Data Integrator is a powerful tool for repairing and correcting fields, values and records across multiple business contexts and applications, including data with country-specific origins. Oracle Data Quality for Data Integrator enables data processing for standardization, cleansing and enrichment, tuning capabilities for customization, and the ability to view your results in real-time.

A Quality Project cleanses input files and loads cleansed data into output files. At the end of your Oracle Data Quality project, the input file may be split into several output files, depending on the Data Quality project.

Important Note: A Data Quality project contains many temporary entities, some of them not useful in the integration process. To limit the Entities reverse-engineered for use by Oracle Data Integrator, a filter based on entity names can be used. To use this filter efficiently, it is recommended that you consistently rename, in your quality project, the entities that you want to use in Oracle Data Integrator. For example, rename the input entities ODI_IN_XXX and the output and no-match files ODI_OUT_XXX, where XXX is the name of the entity.

For more information on Data Quality projects, see Working with Oracle Data Quality in the Online Help for Oracle Data Profiling and Oracle Data Quality.
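The benefit of the naming convention can be seen with a small Python sketch: once the useful entities share the ODI_IN_ and ODI_OUT_ prefixes, a simple name mask separates them from temporary entities. The entity names, the `select_entities` helper and the wildcard syntax used here are assumptions for this example. They do not reproduce the filter syntax of the RKM Oracle Data Quality.

```python
from fnmatch import fnmatchcase


def select_entities(entity_names, patterns=("ODI_IN_*", "ODI_OUT_*")):
    """Keep only the entities whose name matches one of the masks."""
    return [
        name
        for name in entity_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical entity names from a quality project, for illustration only.
names = [
    "ODI_IN_CUSTOMER",
    "tmp_parsed_01",
    "ODI_OUT_CUSTOMER",
    "ODI_OUT_CUSTOMER_NOMATCH",
    "srt_window",
]
selected = select_entities(names)
print(selected)
```

Without consistent names, the same selection would require listing each entity individually.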

16.2.5 Export the Data Quality Project

Oracle Data Integrator is able to run projects exported from Oracle Data Quality. Once the Data Quality project is complete, you need to export it for Oracle Data Integrator. The exported project contains the data files, Data Dictionary Language (DDL) files, settings files, output and statistics files, user-defined tables and scripts for each process module in the project. An exported project can be run on UNIX or Windows platforms without the user interface, and only requires the Oracle Data Quality Server.

To create a batch script:

1. In the Explorer or Project Workflow, right-click the Oracle Data Quality project and select Export... > ODQ Batch Project > No data.

2. In Browse for Folder, select or make a folder where you want the project to be exported.

3. Click OK. A message window appears indicating that the files are being copied.

This export process creates a folder named after the metabase (metabase_name) at the location that you specified. This folder contains a projectN sub-folder, where N is the project identifier in Oracle Data Quality. This project folder contains the following folders among others: