XML Conversion About Content Categorizer
9.1.2 XML Conversion
For Content Categorizer to recognize structural properties, the content must be converted to XML eXtensible Markup Language. The conversion method is a user-defined runtime configuration setting. The variable name is sccXMLConversion and its possible values include Flexiondoc, SearchML and None. The None value is used for files that are already XML. The CC_Sample directory that was installed with Content Categorizer includes a sample source file named Wellington_WordStyle.doc that is artificially rich with document properties and styles. The directory also contains sample XML files Wellington_WordStyle_flexion.xml and Wellington_WordStyle_searchml.xml that demonstrate the XML that results when the source document is converted by each of the available XML converters. Regardless of which XML converter is specified, the XML intermediate files are used only by Content Categorizer, so they are discarded after use, and documents are Important: It is important to instruct contributors to leave blank any fields they want to have filled by search rules. Important: There is a problem with the XSLT transformation used to post-process PDF content converted using the Flexiondoc schema. When Flexiondoc schema are used, single words are assigned to individual XML elements, making the final XML unusable. It is therefore necessary to use SearchML for categorizing PDF content. Managing Content Categorizer 9-3 checked into Content Server in their original source form. The only exception is content that is already in XML format, which is not subjected to the translation process. This section covers the following topics: ■ Flexiondoc XML Converter on page 9-3 ■ SearchML XML Converter on page 9-39.1.2.1 Flexiondoc XML Converter
The OutsideIn XML Export technology is used in combination with a custom XSLT style sheet flexiondoc_to_scc.xsl to produce XML in a two-stage process. In the first stage, the native document is converted to Flexiondoc-formatted XML. In the second stage, the style sheet is used to further refine the XML so that it is searchable by Content Categorizer. Native document properties and text segments are isolated in XML elements, which are named after the corresponding document property, paragraph style, or character style.9.1.2.2 SearchML XML Converter
The OutsideIn technology is used in combination with a custom XSLT style sheet searchml_to_scc.xsl to produce XML in a two-stage process. In the first stage, the native document is converted to SearchML-formatted XML. In the second stage, the style sheet is used to further refine the XML so that it is searchable by Content Categorizer. Native document properties and text segments are isolated in XML elements, which are named after the corresponding document property or paragraph style. Character styles are not supported by SearchML.9.1.3 Operating Requirements
Parts
» Oracle Fusion Middleware Online Documentation Library
» About Native File Conversion Identifying MIME Types
» About Custom Fields Managing Content Fields
» Changing the Default Sort Order of the Opening Query
» About Managed Links Managing Linked Content with Link Manager
» Configuring Link Manager Managing Linked Content with Link Manager
» Managing Links Managing Linked Content with Link Manager
» Link Manager Database Tables
» Link Manager Filters Managing Linked Content with Link Manager
» Site Studio Integration Managing Linked Content with Link Manager
» Click Go located next to the Abort refresh activity option. The refresh activity
» About DCLs and Metadata Schemas
» About Content Profiles Using Profiles to Customize Content Screens
» Content Profile Rules Using Profiles to Customize Content Screens
» Click OK. Oracle Fusion Middleware Online Documentation Library
» Select the Is global rule with priority check box. You can optionally change the
» On the Conditions tab, click Add. Click OK. Click OK. Click OK. Click OK.
» Workflow Overview Introduction to Workflows
» Workflow Steps Introduction to Workflows
» Workflow Step Evaluation Process
» Pre-design Questions Planning a Workflow
» Designing a Workflow Modifying Workflows
» Criteria Workflow Process Creating a Criteria Workflow
» Basic Workflow Process Creating a Basic Workflow
» Idoc Script Functions and Variables
» About Jumps Customizing Workflows
» Jump Variables and Steps Setting Up Jumps
» Jump Examples Customizing Workflows
» Scenario 1: Criteria Workflow Workflow Scenarios
» Scenario 2: Tokens Scenario 3: Jump Based on Metadata Scenario 4: Time-Dependent Jump
» Acquiring a Digital Signature Setting Up Parallel Workflows
» Adding Ad Hoc Step Users Customizing Criteria Workflow Emails
» Paste the following code into the entry event of your workflow step. Note that
» Other Customizations Workflow Tips and Tricks
» Searching Within a Workflow Step Suppressing Workflow Notifications
» Workflow Item Stuck in EDIT or GENWWW Status
» Workflow Item Entered in Wrong Workflow
» About PDF Watermark How PDF Watermark Works
» Folders Overview Usage Scenarios
» Folders Structure Naming Folders Folders Component Security
» Folder Metadata Inheritance Trash Bin Metadata Propagation
» Folder Content Item Revisions
» Folder Archiving Folder Searching URL-Mapped Folders
» About WebDAV What is WebDAV?
» WebDAV Clients WebDAV Architecture
» Preventing Folder Static Inheritance for Specific Metadata Fields
» Optimizing System Performance Click OK.
» Virtual Folders Configuring WebDAV
» Other Issues Click OK twice to save the settings and close the Internet Options screen.
» Adding Custom Viewers and Renderers
» Content Tracker Summary About Content Tracker Components and Functions
» Content Tracker Reports Summary Data Recording Overview
» Data Reduction Overview Data Reporting Overview Content Tracker Terminology
» General Limitations General Considerations
» Data Collection and Processing
» Data Collection Operational Overview
» Data Reduction Operational Overview
» Data Output Operational Overview
» Tracking Limitations Operational Overview
» Data Reduction Features Data Tracking Functions
» Activity Snapshots Data Tracking Functions
» Service Calls Data Tracking Functions
» Web Beacon Objects Data Tracking Functions
» Click the Snapshot tab. Click OK. Open the Content Tracker Administration page: Click OK.
» Oracle and DB2 Case Sensitivity Access Control Lists and Content Tracker Reports Secure Mode
» Pre-Defined Reports Report Generation
» Custom Reports External Report Generator
» User AuthenticationAuthorization and Auditing Site Studio Web Site Activity Reporting
» Security Checks and Query Results
» Accessing Drill Down Reports Accessing Reports from the Information Page
» Creating Custom Report Queries
» Creating Secure Report Queries Using an External Report Generator
» About the Service Call Configuration File
» About the Content Tracker Logging Service
» Managing Service Call Information
» Configuration Variables Configuration and Customization
» Activity Metrics SQL Queries
» Web Server Filter Debugging Support Java Code Debugging Support DataBinder Dump Facility
» Search Rules About Content Categorizer
» XML Conversion About Content Categorizer
» Configuration Variable Setting Up Content Categorizer
» Understanding Search Rules Search Rules
» Pattern Matching Search Rules
» Abstract Search Rules Search Rules
» Categorization Engine Search Rule Filetype Search Rule
» Translation Transformation Using XSLT Stylesheets
» SearchML Transformation XSLT Transformation
» Flexiondoc Transformation XSLT Transformation
Show more