Format locator, Concept – Kofax Getting Started with Ascent Xtrata Pro User Manual


Chapter 4


Ascent Xtrata Pro User's Guide

Select appropriate fields: In the project’s database settings, select only the
fields that are present on the documents. For example, your internal customer
ID will usually not be used on the customer’s correspondence.

Load database to memory: Use the “Load database to memory” option if
enough memory is available. By default, the database is loaded to memory.

Format Locator

The following sections describe the concept of the Format Locator and show how to
add and set up the locator.

Concept

The Format Locator is a rules-based locator that works with regular expressions and
keywords. Regular expressions describe the format of structured data in a general
way. For example, “\d” matches any single digit, and “\d{4,8}” matches any
number from 4 to 8 digits in length.
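
The two expressions can be tried out in any regular expression engine. The
following is a minimal sketch in Python, used here only as a stand-in; the
locator's own regular expression dialect may differ in details. Note that a
length range is written with a comma, as in “\d{4,8}”. The sample text and
variable names are made up for illustration.

```python
import re

# "\d" matches any single digit.
single_digit = re.compile(r"\d")

# "\d{4,8}" matches a run of 4 to 8 digits. The "\b" word boundaries are an
# addition for this sketch so that only complete numbers are accepted.
four_to_eight_digits = re.compile(r"\b\d{4,8}\b")

text = "Order 12345 shipped in 3 boxes, ref 123456789"

digits = single_digit.findall(text)           # every digit, one by one
numbers = four_to_eight_digits.findall(text)  # only numbers of 4 to 8 digits

print(numbers)  # ['12345']
```

The single digit “3” is too short and the nine-digit reference is too long, so
only “12345” is returned.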

You can specify several formats for one locator to make it more flexible. The locator
searches the document and collects all items that match any of the format definitions.
A first evaluation is made depending on whether a format matches an entire word or
only part of it. The keywords are then taken into account to calculate the final
confidence.
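
The collection step can be pictured with a short sketch. It is an illustration
in Python under stated assumptions, not the locator's actual implementation:
the FormatMatch class, the find_format_matches function, and the rule that a
"whole word" is a match surrounded by whitespace are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FormatMatch:
    text: str         # the matched characters
    start: int        # position of the match in the document text
    whole_word: bool  # True if the match covers an entire word


def find_format_matches(document, formats):
    """Collect every item that matches at least one of the formats."""
    matches = []
    for pattern in formats:
        for m in re.finditer(pattern, document):
            # Assumption: a word is delimited by whitespace or the text edges.
            before = document[m.start() - 1] if m.start() > 0 else " "
            after = document[m.end()] if m.end() < len(document) else " "
            whole = before.isspace() and after.isspace()
            matches.append(FormatMatch(m.group(), m.start(), whole))
    return matches


# Two hypothetical date formats defined for the same locator.
date_formats = [r"\d{2}/\d{2}/\d{4}", r"\d{4}-\d{2}-\d{2}"]
doc = "Invoice Date 03/15/2024 ref X2024-03-15Z"

found = find_format_matches(doc, date_formats)
for match in found:
    print(match.text, "whole word" if match.whole_word else "part of a word")
```

Here “03/15/2024” is an entire word, while “2024-03-15” is only part of the
word “X2024-03-15Z”, so the two items would start with different evaluations.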

For example, you can specify a few formats to match a variety of dates and define a
few keywords for them, such as “Date”, “Invoice Date” and “Billing Date”. The
locator will then evaluate the relationships between the keywords and the format
matches and associate a confidence with each match. The evaluated matches are
stored as alternatives in the locator. The alternative with the highest confidence will
be assigned to the field if the field settings permit that. The other alternatives are still
available in the locator and can be further used in the script.
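
The relationship between keywords and format matches can be sketched as
follows. The manual does not describe how the confidence is calculated, so the
scoring below is purely an assumption for illustration: the base values, the
keyword bonus, the search window, and the score_alternatives function are all
hypothetical.

```python
def score_alternatives(document, candidates, keywords, window=20):
    """Return (confidence, text) alternatives, highest confidence first.

    candidates is a list of (text, start, whole_word) tuples.
    """
    lowered = document.lower()
    alternatives = []
    for text, start, whole_word in candidates:
        # Assumed first evaluation: whole-word matches start higher.
        confidence = 0.5 if whole_word else 0.3
        # Assumed keyword rule: a keyword shortly before the match adds a bonus.
        context = lowered[max(0, start - window):start]
        if any(keyword.lower() in context for keyword in keywords):
            confidence += 0.4
        alternatives.append((round(confidence, 2), text))
    alternatives.sort(key=lambda alt: alt[0], reverse=True)
    return alternatives


doc = "Invoice Date: 03/15/2024   Due: 04/14/2024"
candidates = [(d, doc.index(d), True) for d in ("03/15/2024", "04/14/2024")]
keywords = ["Invoice Date", "Billing Date"]

alternatives = score_alternatives(doc, candidates, keywords)
best_confidence, best_text = alternatives[0]
print(best_text)  # 03/15/2024
```

Both dates remain in the list of alternatives; only the one preceded by a
keyword is ranked first and would be the candidate for the field.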

You can reuse the result of a Format Locator. For example, when you search for an
invoice date and an order date on the same document, you can define one Format
Locator that searches for dates and then further Format Locators that reuse its
result with different keywords: one with “Order Date” as a keyword and another
with “Invoice Date”.

You can also use OCR Substitution for formats. Typical OCR errors like O instead of
0 or B instead of 8 can be defined in the project settings and assigned to a Format
Locator.
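
The idea behind OCR Substitution can be shown with a small sketch. It assumes
a simple strategy: if a word does not match the format as read, the typical
misread characters are replaced and the match is tried again. The substitution
table, the function name, and this retry strategy are hypothetical and do not
reflect how the product applies the project settings.

```python
import re

# Assumed substitution table: character as read by OCR -> intended character.
OCR_SUBSTITUTIONS = {"O": "0", "B": "8"}


def match_with_ocr_substitution(word, pattern, substitutions=OCR_SUBSTITUTIONS):
    """Return the word (possibly corrected) if it matches, otherwise None."""
    if re.fullmatch(pattern, word):
        return word  # matches as read, no correction needed
    corrected = word.translate(str.maketrans(substitutions))
    if re.fullmatch(pattern, corrected):
        return corrected  # matches only after substitution
    return None


number_format = r"\d{4,8}"

print(match_with_ocr_substitution("1O0B5", number_format))  # 10085
print(match_with_ocr_substitution("12345", number_format))  # 12345
print(match_with_ocr_substitution("ABCDE", number_format))  # None
```

A word such as “ABCDE” is still rejected, because substituting the B alone
does not turn it into a number.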
