Data Collection and Processing

8-8 Application Administrators Guide for Content Server
■ Data Reduction on page 8-11
■ Data Output on page 8-15
■ Tracking Limitations on page 8-19

8.3.1 Data Collection and Processing

Depending on how Content Tracker is configured, it can collect event information such as dynamic and static content accesses and service calls. Several mechanisms are used to collect the data.

■ Service Handler Filter: Examines Content Server service requests and writes certain details from them directly to the SctAccessLog table in real time. Only services listed in the SctServiceFilter.hda file are logged.
■ Web Server Filter: Collects data values from static URLs and logs them in raw data files.
■ Content Tracker Logging Service: Used to log event information generated by a suitably configured application.

This section covers the following topics:

■ Standard Data Reduction Process on page 8-8
■ Data Reduction Process with Activity Metrics on page 8-9
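The filtering behavior of the Service Handler Filter can be pictured with a short sketch. This is an illustration only, not Content Tracker's implementation: the `ServiceRequest` type, the `LOGGED_SERVICES` set (standing in for the entries of SctServiceFilter.hda), and the row fields are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the service names listed in SctServiceFilter.hda.
LOGGED_SERVICES = {"GET_FILE", "DOC_INFO"}


@dataclass
class ServiceRequest:
    """Hypothetical, simplified view of a Content Server service request."""
    service: str
    user: str
    doc_name: str


def filter_service_requests(requests, logged_services=LOGGED_SERVICES):
    """Return one log row per request whose service is in the filter list.

    Requests for services that are not listed are skipped, mirroring the
    rule that only listed services are logged.
    """
    return [
        {"service": r.service, "user": r.user, "doc_name": r.doc_name}
        for r in requests
        if r.service in logged_services
    ]
```

In this sketch, a request for a service that is absent from `LOGGED_SERVICES` produces no row at all, which is the point of the filter file: the set of logged services is controlled by configuration rather than by code.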

8.3.1.1 Standard Data Reduction Process

During the data reduction process, the static URL information is extracted from the raw data files (see Content Tracker Event Logs on page 8-11) and combined with the service information already stored in the SctAccessLog table (see Combined Output Table on page 8-12). Depending on how Content Tracker is configured, this reduction process can:

■ Combine access information for static URL content access with service details.
■ Summarize information about user accounts that were active during the reporting period. This information is rolled up and written to Content Tracker's user metadata database tables. See Data Output on page 8-15 for details.

Note: By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them. See Performance Optimization Functions on page 8-1.

Managing Content Tracker 8-9
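As a rough picture of the reduction step described above, the following sketch merges static URL events with service rows and derives the set of users active in the period. It is a simplified model under stated assumptions: the function name, the dictionary fields (`source`, `user`, `url`), and the in-memory lists are hypothetical and do not reflect the actual raw file format or the SctAccessLog schema.

```python
def reduce_events(raw_static_events, access_log_rows):
    """Combine static URL events with existing service rows.

    raw_static_events: hypothetical dicts with "user" and "url" keys,
        standing in for entries parsed from the raw data files.
    access_log_rows: hypothetical dicts with at least a "user" key,
        standing in for service rows already in the access log.

    Returns the combined rows and a sorted list of active users.
    """
    combined = list(access_log_rows)  # copy so the input is not modified
    for event in raw_static_events:
        combined.append(
            {"source": "static", "user": event["user"], "url": event["url"]}
        )
    # Users with at least one access of either kind in the period.
    active_users = sorted({row["user"] for row in combined})
    return combined, active_users
```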

8.3.1.2 Data Reduction Process with Activity Metrics

Content Tracker provides the option to selectively generate search relevancy data and store it in custom metadata fields. The snapshot function enables you to choose which activity metrics to activate. The logged data provides content item usage information that indicates the popularity of content items.

If you activate the snapshot function and activity metrics, the values in the custom metadata fields are updated following the reduction processing phase. When users access content items, the values of the applicable search relevance metadata fields change accordingly. Then, during the subsequent post-reduction step, Content Tracker uses applicable SQL queries to determine which content items were accessed during the reporting period. Content Tracker updates the applicable database table metadata fields with the new values and initiates a re-indexing cycle. However, only the content items whose access count metadata values have changed are re-indexed.

For more information about the snapshot function, the user interface screen, and activating the activity metrics, see Snapshot Tab on page A-158. For more information about the activity metrics SQL queries and how to customize them, see Activity Metrics SQL Queries on page 8-87.

The post-reduction processing step is necessary to:

■ Process and tabulate the activity metrics for each affected content item and load the data into the assigned custom metadata fields.
■ Initiate a re-indexing cycle on the content items with changed activity metrics values. This ensures that the data is part of the search index and, consequently, accessible for selecting and ordering search results.

Note: By default, Content Tracker collects and records data only for the SctAccessLog table. Although the user data output tables exist, Content Tracker does not populate them unless the Snapshot function is activated. However, using the snapshot function will affect Content Tracker’s performance.
See Performance Optimization Functions on page 8-1 for more information.
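The post-reduction step can be sketched as follows. This is a minimal model, not the product's logic: the real step uses SQL queries against the database, whereas this example uses an in-memory `Counter`, and the names `post_reduction`, `doc_id`, and `access_counts` are hypothetical.

```python
from collections import Counter


def post_reduction(access_rows, access_counts):
    """Update per-item access counts and report which items changed.

    access_rows: hypothetical dicts with a "doc_id" key, one per access
        recorded during the reporting period.
    access_counts: dict mapping doc_id to its stored access count; it
        stands in for a custom metadata field and is updated in place.

    Returns the sorted IDs of items whose count changed, that is, the
    only items that would need re-indexing.
    """
    period_hits = Counter(row["doc_id"] for row in access_rows)
    for doc_id, hits in period_hits.items():
        access_counts[doc_id] = access_counts.get(doc_id, 0) + hits
    # Every item accessed in the period has a changed count; items with
    # no accesses are left untouched and are not re-indexed.
    return sorted(period_hits)
```

Returning only the changed IDs reflects the behavior described above: items that were not accessed during the reporting period keep their existing values and are left out of the re-indexing cycle.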

8.3.2 Data Collection