
Application Administrators Guide for Content Server

9.5.3 SearchML Transformation

When the OutsideIn XML Export filter translates content into SearchML XML format, it identifies the properties of the content item, such as title, subject, and author, and tags each one as a doc_property element, distinguishing the properties by a type attribute. It also identifies document text and tags each paragraph as a p element, distinguishing styles within the text by an s attribute.

Document Properties and Text Style Examples

For example, using the Wellington_WordStyle.doc example found in the IntradocDir/custom/ContentCategorizer/CC_Sample directory, the file's author property, Duke of Wellington, is tagged in the SearchML XML output as:

    <doc_property type="author">Duke of Wellington</doc_property>

The first paragraph of the item, listing the date, would be tagged as:

    <p>Date: August 24, 1812</p>

Note that no style attribute is defined.

Applying the searchml_to_scc.xsl stylesheet to the translated XML file searches the XML for all doc_property tags and uses the value of the type attribute as the suffix of the transformed output tag, which is then used as a key in a Content Categorizer rule. For example, the following code in the searchml_to_scc.xsl stylesheet would take the tag:

    <doc_property type="author">Duke of Wellington</doc_property>

and output <scc_author>Duke of Wellington</scc_author>:

    <xsl:template match="sml:doc_property[@type]">
      <xsl:variable name="typeValue">
        <xsl:value-of select="@type"/>
      </xsl:variable>
      <xsl:element name="scc_{translate($typeValue, $translateFrom, $translateTo)}">
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:template>

Similarly, the searchml_to_scc.xsl stylesheet searches the XML file for all p tags and uses the value of the s attribute as the suffix of the transformed output tag, which is used as a key in a Content Categorizer rule. Where no style attribute is defined, the transformation passes the p tag through unchanged.

Important: There is a problem with the XSLT transformation used to post-process PDF content that is output in Flexiondoc format. When Flexiondoc is used, single words are assigned to individual XML elements, making the final XML unsuitable for most Categorizer search rules. It is therefore recommended that you use SearchML for categorizing PDF content.
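The paragraph handling described in this section (p elements keyed on the s attribute, with unstyled paragraphs passed through) can be sketched along the same lines as the doc_property template. The stylesheet below is an illustrative sketch only, not the contents of the shipped searchml_to_scc.xsl: the sml namespace URI is a placeholder, and the values given to translateFrom and translateTo are assumptions chosen so that a style name containing spaces or hyphens still yields a valid element name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch; not the shipped searchml_to_scc.xsl. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sml="urn:example:searchml">
  <!-- The sml namespace URI above is a placeholder (assumption);
       use the namespace declared in your SearchML output. -->

  <!-- Assumed character maps: spaces and hyphens become underscores. -->
  <xsl:variable name="translateFrom" select="' -'"/>
  <xsl:variable name="translateTo" select="'__'"/>

  <!-- Styled paragraph: the s attribute value becomes the tag suffix.
       With the assumed maps, s="Heading 1" yields scc_Heading_1. -->
  <xsl:template match="sml:p[@s]">
    <xsl:element name="scc_{translate(@s, $translateFrom, $translateTo)}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <!-- Unstyled paragraph: copied to the output as-is. -->
  <xsl:template match="sml:p[not(@s)]">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Under these assumptions, a paragraph tagged with a style would produce an scc_ key that a Content Categorizer rule can reference, while a paragraph such as the unstyled date line in the Wellington example would remain a p element.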

9.5.4 Flexiondoc Transformation