Extracts data from a web page into an XML record.

The Record extraction rule helps you extract a set of data from HTML text parts with identical and recurring presentation in a web page.
Extracted data are organized into a simple XML structure made of:

• a parent base element “corresponding to” the base recurring HTML elements containing the data to extract, e.g. a <RECORD>
• child elements “corresponding to” the HTML text parts containing data, e.g. <DATAT1>, <DATAT2>, etc.

The rule is applied only if evaluating the record XPath expression on the HTML page DOM returns at least one node.
The resulting record elements are appended to the HTML transaction output DOM as follows:
<record_tagname referer="referer_url">
    <data1_tagname>extracted text from data1 xpath</data1_tagname>
    <data2_tagname>extracted text from data2 xpath</data2_tagname>
</record_tagname>
<record_tagname referer="referer_url">
    <data1_tagname>extracted text from data1 xpath</data1_tagname>
    <data2_tagname>extracted text from data2 xpath</data2_tagname>
</record_tagname>
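
The output structure above can be sketched outside the product with Python's standard xml.etree.ElementTree module. This is only an illustration of the idea, not the rule's actual implementation: the function name extract_records, the sample page, the column names, and the referer URL are all hypothetical, and the page is assumed to be well-formed markup so that it can be parsed as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content; assumed well-formed so it can be parsed as XML.
PAGE = """<html><body><table>
  <tr><td>Alice</td><td>42</td></tr>
  <tr><td>Bob</td><td>37</td></tr>
</table></body></html>"""


def extract_records(page_root, record_xpath, columns,
                    tag_name="XMLRecord", referer=None):
    """Build one record element per node matched by record_xpath.

    columns is a list of (child tag name, XPath relative to the record node).
    """
    records = []
    for node in page_root.findall(record_xpath):
        record = ET.Element(tag_name)
        if referer is not None:
            # Mirrors "Display referer" being set to true.
            record.set("referer", referer)
        for name, xpath in columns:
            child = ET.SubElement(record, name)
            match = node.find(xpath)
            # A column whose XPath matches nothing yields an empty element.
            child.text = (match.text or "") if match is not None else ""
        records.append(record)
    return records


root = ET.fromstring(PAGE)
records = extract_records(
    root,
    record_xpath=".//tr",
    columns=[("name", "./td[1]"), ("age", "./td[2]")],
    referer="http://example.com/list",
)
for record in records:
    print(ET.tostring(record, encoding="unicode"))
```

If the record XPath matches nothing, the loop body never runs and no record is produced, which corresponds to the rule not being applied.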

Property | Type | Category | Description
Comment | String | configuration | Defines the object comment to include in the documentation report. This property generally contains an explanation of the object.
Display referer | boolean | configuration | Defines whether the referer URL is displayed in the output XML element. If this property is set to true, the referer URL is added as an attribute named referer to the XML element created by the extraction rule.
Is active | boolean | configuration | Defines whether the extraction rule is active.
Tag name | String | configuration | Defines the record tag name in the resulting DOM (the default tag name is XMLRecord).
Description | XMLVector | selection | Describes how to extract data into the record's child text elements. The record is a recurring element containing data, and this property defines that data as a list of child element descriptions, also called column descriptions. The fields of a column description are listed after this table.
XPath | String | selection | Defines the XPath expression selecting the nodes on which the extraction rule applies. Depending on the extraction rule, executing this XPath on the web page DOM can return a single Node or a NodeList.

Each column description in the Description property is composed of the following fields:

• Name: Tag name of the child element (the default name is data).
• Extract children: Indicates whether text extraction should recurse into the child elements of the elements found by the XPath (false by default). Because recursion uses more CPU, it is recommended to leave this set to false and customize your XPath instead (using //text(), for example).
• XPath: XPath expression selecting the child element data. It is often defined relative to the parent Record extraction rule's XPath expression using the following syntax: ./.
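
The effect of Extract children and of a relative column XPath can be illustrated with a minimal Python sketch. It makes several assumptions: the helper extract_text and the sample row are hypothetical, the standard xml.etree.ElementTree module stands in for the product's DOM, and ElementTree's text attribute only holds the text that precedes the first child element, so the non-recursive case here is an approximation of the rule's behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical record node with nested markup inside the first cell.
ROW = ET.fromstring("<tr><td>Total: <b>12</b> items</td><td>ok</td></tr>")


def extract_text(node, extract_children=False):
    """Return the node's leading text, or all nested text when recursing."""
    if node is None:
        return ""
    if extract_children:
        # Walks the node and every descendant, concatenating their text.
        return "".join(node.itertext())
    return node.text or ""


# Column XPath expressed relative to the record node.
cell = ROW.find("./td[1]")
shallow = extract_text(cell)
deep = extract_text(cell, extract_children=True)

# find() returns a single node, findall() a list of nodes,
# loosely comparable to the Node versus NodeList distinction.
all_cells = ROW.findall("./td")

print(repr(shallow))
print(repr(deep))
print(len(all_cells))
```

In this sketch the recursive call visits every descendant of the cell, which is the extra work the Extract children field warns about.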