Table | Convertigo Documentation

Extracts data from a web page into an XML table.

The Table extraction rule helps you extract data into a table structure.
Extracted data are organized into an XML table structure made of:

• a parent base element corresponding to the base HTML element that contains the data to extract. In most cases, this element is an HTML table element.
• child elements corresponding to the recurring HTML row parts. In most cases, if the base element is an HTML table element, each tr element is treated as a row of data.
• row child elements corresponding to the recurring HTML column parts. In most cases, within a tr row, each td element is treated as a cell.

The rule is applied if evaluating the table XPath expression returns a node that exists in the HTML page DOM.
Based on this root, the child elements are also defined by XPath expressions. Each XPath expression may be relative to its parent element's XPath expression, using the “./” syntax.
By default, a row XPath expression is .//TR. You can add restrictions in the XPath expression, for example .//TR[position() > 1], meaning that each tr element within the table is a row except the first one.
Columns are defined relative to their parent row. By default, a column XPath expression is .//TD.
The resulting table element is appended to the HTML transaction DOM as follows:
<table_tagname referer="referer_url">
    <row_tagname>
        <column1_tagname>extracted text from xpath</column1_tagname>
        <column2_tagname>extracted text from xpath</column2_tagname>
    </row_tagname>
    <row_tagname>
        <column1_tagname>extracted text from xpath</column1_tagname>
        <column2_tagname>extracted text from xpath</column2_tagname>
    </row_tagname>
</table_tagname>
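
The row and column logic described above can be approximated outside Convertigo with a short Python sketch. This is only an illustration, not Convertigo's implementation: the sample page, the `extract_table` function and its parameters are hypothetical names chosen for the example. It relies on the standard library's `xml.etree.ElementTree`, which needs well-formed markup, matches tag names case-sensitively (hence lowercase `tr` and `td`), and supports only a subset of XPath. Because that subset does not include `position() > 1`, the header row is skipped with a `skip_rows` argument instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for this illustration.
PAGE = """
<html><body>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Apple</td><td>1.20</td></tr>
    <tr><td>Pear</td><td>0.90</td></tr>
  </table>
</body></html>
"""

def extract_table(page, table_xpath, row_xpath, columns,
                  table_tag="XMLTable", row_tag="row", skip_rows=0):
    """Build an XML table from a page, or return None if the base is missing.

    columns is a list of (column tag name, column XPath) pairs, each XPath
    being evaluated relative to the current row.
    """
    dom = ET.fromstring(page)
    base = dom.find(table_xpath)
    if base is None:
        return None  # the table XPath matched nothing: the rule does not apply
    table = ET.Element(table_tag)
    # Row XPath is evaluated relative to the base element.
    for source_row in base.findall(row_xpath)[skip_rows:]:
        row = ET.SubElement(table, row_tag)
        for column_tag, column_xpath in columns:
            cell = source_row.find(column_xpath)
            # Concatenate the cell's text, including text of child elements.
            text = "".join(cell.itertext()).strip() if cell is not None else ""
            ET.SubElement(row, column_tag).text = text
    return table

result = extract_table(
    PAGE,
    table_xpath=".//table[@id='prices']",
    row_xpath=".//tr",
    columns=[("item", "./td[1]"), ("price", "./td[2]")],
    skip_rows=1,  # stands in for .//TR[position() > 1]
)
print(ET.tostring(result, encoding="unicode"))
```

The printed element has one `row` child per data row, each containing an `item` and a `price` element, which mirrors the structure shown above (without the `referer` attribute).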

Property	Type	Category	Description
Accumulate data in same table	boolean	configuration	Accumulates the data extracted from several screens in the same table XML element.
Comment	String	configuration	Describes the object. This comment is included in the documentation report and generally contains an explanation of the object.
Display referer	boolean	configuration	Defines whether the referer URL is displayed in the output XML element. If this property is set to true, the referer URL is added as an attribute, named referer, to the XML element added by the extraction rule.
Flip table orientation	boolean	configuration	Flips the table orientation, turning rows into columns and columns into rows.
Is active	boolean	configuration	Defines whether the extraction rule is active.
Tag name	String	configuration	Defines the table tag name in the resulting XML (default tag name is XMLTable).
Description	XMLVector	selection	Describes the table structure into which data is extracted. The table is structured as a root element containing row and column child elements, which are defined through the Description property. This property is a list of child element descriptions, also called row descriptions. Each row description is composed of two properties: • Row tag name: Tag name of the row element in the resulting DOM (the default tag name is row). All nodes selected by the row XPath below are tagged with this name. • Row XPath: XPath expression selecting the row elements. It is often defined relative to the parent Table extraction rule XPath expression. The XPath can match several nodes (e.g. .//TR), in which case several rows are extracted. Each row description contains a list of child element descriptions, also called column descriptions. Each column description is composed of the following fields: • Column tag name: Tag name of the column element in the resulting DOM (the default tag name is column). • Column XPath: XPath expression selecting the column elements (the data to extract). It is often defined relative to the parent row XPath expression, using the “./” syntax. • Extract children: Indicates whether text extraction should recurse into the child elements of the elements matched by the XPath (set to true by default). Because recursion uses more CPU, it is recommended to make the XPath more specific (for example by ending it with the //text() node test) and to set this property to false.
XPath	String	selection	Defines the XPath expression of the nodes on which the extraction rule applies. Depending on the extraction rule, evaluating this XPath on the web page DOM can result in a single Node or a NodeList.
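
Two of the properties above, Flip table orientation and Extract children, can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than Convertigo's actual behavior: `flip` and `cell_text` are hypothetical helpers, the flip is shown on plain lists of cell values rather than on tagged XML elements, and the non-recursive case is assumed to keep only the text nodes that are direct children of the matched element.

```python
import xml.etree.ElementTree as ET

def flip(rows):
    """Transpose a list of rows: rows become columns and columns become rows."""
    return [list(column) for column in zip(*rows)]

def cell_text(cell, extract_children=True):
    """Return the text of a cell, with or without recursing into child elements."""
    if extract_children:
        # Recursive case: concatenate every text node below the cell.
        return "".join(cell.itertext())
    # Non-recursive case (assumed): keep only text nodes directly under the cell.
    return (cell.text or "") + "".join(child.tail or "" for child in cell)

rows = [["Apple", "1.20"], ["Pear", "0.90"]]
flipped = flip(rows)  # [["Apple", "Pear"], ["1.20", "0.90"]]

cell = ET.fromstring("<td>Price: <b>42</b></td>")
deep = cell_text(cell, extract_children=True)      # "Price: 42"
shallow = cell_text(cell, extract_children=False)  # "Price: "
```

The recursive variant walks the whole subtree of every cell, which is the extra cost that the Extract children description refers to.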