
7 Working with Changed Data Capture

This chapter describes how to use Oracle Data Integrator’s Changed Data Capture feature to detect changes occurring on the data and only process these changes in the integration flows.

This chapter includes the following sections:
■ Section 7.1, Introduction to Changed Data Capture
■ Section 7.2, Setting up Journalizing
■ Section 7.3, Using Changed Data

7.1 Introduction to Changed Data Capture

Changed Data Capture (CDC) allows Oracle Data Integrator to track changes in source data caused by other applications. When running integration interfaces, CDC lets Oracle Data Integrator avoid processing unchanged data in the flow.

Reducing the source data flow to only changed data is useful in many contexts, such as data synchronization and replication. It is essential when setting up an event-oriented architecture for integration. In such an architecture, applications make changes in the data (Customer Deletion, New Purchase Order) during a business process. These changes are captured by Oracle Data Integrator and transformed into events that are propagated throughout the information system.

Changed Data Capture is performed by journalizing models. Journalizing a model consists of setting up the infrastructure to capture the changes (inserts, updates and deletes) made to the records of this model’s datastores.

Oracle Data Integrator supports two journalizing modes:
■ Simple Journalizing tracks changes in individual datastores in a model.
■ Consistent Set Journalizing tracks changes to a group of the model’s datastores, taking into account the referential integrity between these datastores. The group of datastores journalized in this mode is called a Consistent Set.

7.1.1 The Journalizing Components

The journalizing components are:
■ Journals: Where changes are recorded. Journals only contain references to the changed records along with the type of change (insert/update, delete).
■ Capture processes: Journalizing captures the changes in the source datastores either by creating triggers on the data tables, or by using database-specific programs to retrieve log data from data server log files. See the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator for more information on the capture processes available for the technology you are using.
■ Subscribers: CDC uses a publish/subscribe model. Subscribers are entities (applications, integration processes, etc.) that use the changes tracked on a datastore or on a consistent set. They subscribe to a model’s CDC to have the changes tracked for them. Changes are captured only if there is at least one subscriber to the changes. When all subscribers have consumed the captured changes, these changes are discarded from the journals.
■ Journalizing views: Provide access to the changes and the changed data captured. They are used by the user to view the changes captured, and by integration processes to retrieve the changed data.

These components are implemented in the journalizing infrastructure.
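The publish/subscribe behavior described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how Oracle Data Integrator implements journals: the `Journal` class, the subscriber names and the change-type codes `"I"` (insert/update) and `"D"` (delete) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy in-memory journal for one datastore (illustrative, not ODI's schema)."""
    subscribers: set = field(default_factory=set)
    # Each entry: (primary_key, change_type, subscribers that have not consumed it yet)
    entries: list = field(default_factory=list)

    def subscribe(self, name):
        self.subscribers.add(name)

    def capture(self, primary_key, change_type):
        # Changes are captured only if there is at least one subscriber.
        if not self.subscribers:
            return
        self.entries.append((primary_key, change_type, set(self.subscribers)))

    def consume(self, name):
        """Return the changes pending for `name`, then mark them as consumed."""
        pending = [(pk, ct) for pk, ct, waiting in self.entries if name in waiting]
        for _, _, waiting in self.entries:
            waiting.discard(name)
        # Discard entries once every subscriber has consumed them.
        self.entries = [entry for entry in self.entries if entry[2]]
        return pending


journal = Journal()
journal.capture(1, "I")            # not recorded: no subscriber yet
journal.subscribe("SALES_SYNC")    # hypothetical subscriber names
journal.subscribe("AUDIT")
journal.capture(42, "I")
journal.capture(42, "D")

first = journal.consume("SALES_SYNC")
remaining_after_first = len(journal.entries)    # AUDIT has not consumed yet
second = journal.consume("AUDIT")
remaining_after_second = len(journal.entries)   # all subscribers done
```

Note that the journal stores only the key of the changed record and the type of change, not the changed data itself. In the real infrastructure, the changed data is obtained through the journalizing views.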

7.1.2 Simple vs. Consistent Set Journalizing

Simple Journalizing enables you to journalize one or more datastores. Each journalized datastore is treated separately when capturing the changes.

This approach has a limitation, illustrated in the following example: You want to process changes in the ORDER and ORDER_LINE datastores, which are linked by a referential integrity constraint requiring every ORDER_LINE record to have an associated ORDER record. If you have captured insertions into ORDER_LINE, you have no guarantee that the associated new records in ORDER have also been captured. Processing ORDER_LINE records with no associated ORDER records may cause referential constraint violations in the integration process.

Consistent Set Journalizing provides the guarantee that when you have an ORDER_LINE change captured, the associated ORDER change has also been captured, and vice versa. Note that consistent set journalizing guarantees the consistency of the captured changes. The set of available changes for which consistency is guaranteed is called the Consistency Window. Changes in this window should be processed in the correct sequence (ORDER followed by ORDER_LINE) by designing and sequencing integration interfaces into packages.

Although consistent set journalizing is more powerful, it is also more difficult to set up. It should be used when referential integrity constraints need to be ensured when capturing the data changes. For performance reasons, consistent set journalizing is also recommended when a large number of subscribers are required.

It is not possible to journalize a model (or datastores within a model) using both consistent set and simple journalizing.
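The ORDER and ORDER_LINE limitation can be made concrete with a short check. This is a minimal sketch, assuming the captured changes are available as plain Python collections. The function `orphaned_lines` and the sample identifiers are hypothetical and are not part of Oracle Data Integrator.

```python
def orphaned_lines(captured_lines, captured_orders, target_orders):
    """Return ORDER_LINE changes whose parent ORDER is not yet available.

    captured_lines:  list of (line_id, order_id) inserted into ORDER_LINE
    captured_orders: set of order_id values captured from ORDER
    target_orders:   set of order_id values already loaded in the target
    """
    known = captured_orders | target_orders
    return [(line_id, order_id)
            for line_id, order_id in captured_lines
            if order_id not in known]


lines = [(1, 100), (2, 101)]

# Simple journalizing: each datastore is captured independently, so the
# insert of order 101 may be missing when its lines are processed.
simple_result = orphaned_lines(lines, captured_orders={100}, target_orders=set())

# Consistent set journalizing: the window holds both parent and child changes.
consistent_result = orphaned_lines(lines, captured_orders={100, 101}, target_orders=set())
```

In the first call, line 2 would violate the foreign key on load. In the second call, no line is orphaned, provided the ORDER changes are applied before the ORDER_LINE changes.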

7.2 Setting up Journalizing

This section explains how to set up and start the journalizing infrastructure, and check that this infrastructure is running correctly. It also details the components of this infrastructure.

7.2.1 Setting up and Starting Journalizing

The basic process for setting up CDC on an Oracle Data Integrator data model is as follows:
■ Set the CDC parameters in the data model
■ Add the datastores to the CDC
■ For consistent set journalizing, set the datastores order
■ Add subscribers
■ Start the journals

Set the CDC parameters

Setting up the CDC parameters is performed on a data model. This consists of selecting or changing the journalizing mode and journalizing Knowledge Module used for the model.

To set up the CDC parameters:

1. In the Models tree in the Designer Navigator, select the model that you want to journalize.
2. Double-click this model to edit it.
3. In the Journalizing tab, select the journalizing mode you want to use: Consistent Set or Simple.
4. Select the Journalizing Knowledge Module (JKM) you want to use for this model. Only Knowledge Modules that are suitable for the data model’s technology and journalizing mode, and that have been previously imported into at least one of your projects, will appear in the list.
5. Set the Options for this KM. See the Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator for more information about this KM and its options.
6. From the File menu, select Save All.

Add or remove datastores for the CDC:

You must flag the datastores that you want to journalize within the journalized model. A change in the datastore flag is taken into account the next time the journals are restarted. When flagging a model or a sub-model, all of the datastores contained in the model or sub-model are flagged.

To add or remove datastores for the CDC:

1. In the Models tree in the Designer Navigator, right-click the model, sub-model or datastore that you want to add to or remove from the CDC.
2. Select Changed Data Capture > Add to CDC or Changed Data Capture > Remove from CDC to add to the CDC or remove from the CDC the selected datastore, or all datastores in the selected model or sub-model.

The datastores added to CDC should now have a marker icon. The journal icon represents a small clock. It should be yellow, indicating that the journal infrastructure is not yet in place.

Note: If the model is already being journalized, it is recommended that you stop journalizing with the existing configuration before modifying the data model journalizing parameters.

Set the datastores order (consistent set journalizing only):

You only need to arrange the datastores in order when using consistent set journalizing. You should arrange the datastores in the consistent set in an order which preserves referential integrity when using their changed data. For example, if an ORDER table has references imported from an ORDER_LINE datastore (i.e. ORDER_LINE has a foreign key constraint that references ORDER), and both are added to the CDC, the ORDER datastore should come before ORDER_LINE. If the PRODUCT datastore has references imported from both ORDER and ORDER_LINE (i.e. both ORDER and ORDER_LINE have foreign key constraints to the PRODUCT table), its order should be lower still, so that PRODUCT comes before both.

To set the datastores order:

1. In the Models tree in the Designer Navigator, select the model journalized in consistent set mode.
2. Double-click this model to edit it.
3. Go to the Journalized Tables tab.
4. If the datastores are not currently in any particular order, click the Reorganize button. This feature suggests an order for the journalized datastores based on the foreign keys defined in the model. Review the order suggested and edit the datastores order if needed.
5. Select a datastore from the list, then use the Up and Down buttons to move it within the list. You can also directly edit the Order value for this datastore.
6. Repeat the previous step until the datastores are ordered correctly.
7. From the File menu, select Save All.
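An order that preserves referential integrity is a topological order of the foreign key graph: every referenced datastore comes before the datastores that reference it. The sketch below shows one way to derive such an order for the ORDER, ORDER_LINE and PRODUCT example, using Python's standard `graphlib` module (Python 3.9 or later). It is an illustration of the idea only. How the Reorganize button actually computes its suggestion is not documented here, and the `foreign_keys` mapping is a hypothetical representation of the model.

```python
from graphlib import TopologicalSorter

# Map each datastore to the datastores its foreign keys reference (its parents).
foreign_keys = {
    "ORDER_LINE": {"ORDER", "PRODUCT"},
    "ORDER": {"PRODUCT"},
    "PRODUCT": set(),
}

# static_order() yields referenced datastores before the ones that reference them.
suggested_order = list(TopologicalSorter(foreign_keys).static_order())

# Assign increasing positions, lowest first (illustrative values only).
order_values = {name: position for position, name in enumerate(suggested_order, start=1)}
```

With these foreign keys the only valid order is PRODUCT, then ORDER, then ORDER_LINE. If the foreign keys formed a cycle, `static_order()` would raise `graphlib.CycleError`, which corresponds to a case where the order has to be chosen and reviewed manually.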

Note: It is possible to add datastores to the CDC after the journal creation phase. In this case, the journals should be restarted. If a datastore with journals running is removed from the CDC in simple mode, the journals should be stopped for this individual datastore. If a datastore is removed from CDC in Consistent Set mode, the journals should be restarted for the model (journalizing information is preserved for the other datastores).

Note: Changes to the order of datastores are taken into account the next time the journals are restarted. If existing scenarios consume changes from this CDC set, you should regenerate them to take into account the new organization of the CDC set.

Add or remove subscribers:

Each subscriber consumes, in a separate thread, the changes that occur on individual datastores (for Simple Journalizing) or on a model (for Consistent Set Journalizing). Adding a subscriber registers it with the CDC infrastructure so that changes are trapped for it; removing a subscriber unregisters it.

To add subscribers: