Kofax INDICIUS 6.0 User Manual


Configuration

Getting Started Guide (Classification and Separation)


5. Click Add.

The configuration will be added to the Configurations list on the Project
Explorer panel.

Step 2: Configure Text Classification

Build Page Text Classifier

The classifier is created on the Build Page Text Classifier tab, where training options
are selected before the build process is started. Typically, the text classifier is trained
on the documents in the Sample Documents set (after the set has been cleaned during
document set management).

You can specify whether training is restricted to pages within confirmed documents,
whether extra pages are included in training, and whether to further limit which page
types are trained within each document type. Any page type that is not used to train
the page text classifier must be classified by an alternative page classification
method (for example, image or templated classification).
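The rule above means every page type must end up covered by some classification method. The sketch below is a hypothetical illustration of that coverage check, not an INDICIUS API: the function name `uncovered_page_types`, the method labels `"image"` and `"templated"`, and the example page type names are assumptions chosen for the sketch.

```python
from typing import Iterable, Mapping

# Hypothetical labels for the alternative methods; not INDICIUS settings.
ALTERNATIVE_METHODS = {"image", "templated"}


def uncovered_page_types(page_types: Iterable[str],
                         text_trained: Iterable[str],
                         alternative_method: Mapping[str, str]) -> list[str]:
    """Return page types that are neither trained for page text
    classification nor assigned an alternative classification method."""
    trained = set(text_trained)
    missing = []
    for page_type in page_types:
        if page_type in trained:
            continue  # handled by the page text classifier
        if alternative_method.get(page_type) in ALTERNATIVE_METHODS:
            continue  # handled by image or templated classification
        missing.append(page_type)
    return missing


# Example with hypothetical page types: Header is excluded from text training
# but covered by templated classification, so nothing is left uncovered.
print(uncovered_page_types(["Header", "Invoice"], {"Invoice"}, {"Header": "templated"}))
```

An empty result means every page type has a method. Any page type the function returns would need either text training or an alternative method.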

Note

For a page type to be trained successfully, at least 50 examples of that page type
are required. A warning is displayed in the table if the document set contains fewer
than 50 examples of a page type.
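The 50-example threshold can be checked before building by counting labeled pages per page type. The following is a minimal sketch, assuming each training page is represented only by its page type label; the function name `low_example_page_types` and the sample labels are hypothetical and do not reflect how INDICIUS computes its warnings internally.

```python
from collections import Counter
from typing import Iterable

MIN_EXAMPLES = 50  # minimum examples per page type, from the note in the manual


def low_example_page_types(page_labels: Iterable[str],
                           expected_types: Iterable[str] = (),
                           min_examples: int = MIN_EXAMPLES) -> dict[str, int]:
    """Map each page type with fewer than `min_examples` pages to its count.

    `expected_types` lets page types with zero examples be reported too,
    since they would otherwise never appear in the counts."""
    counts = Counter(page_labels)
    for page_type in expected_types:
        counts.setdefault(page_type, 0)
    return {pt: n for pt, n in sorted(counts.items()) if n < min_examples}


# Hypothetical document set: 60 Invoice pages, 12 Remittance pages, no Credit Note pages.
labels = ["Invoice"] * 60 + ["Remittance"] * 12
print(low_example_page_types(labels, expected_types=["Invoice", "Remittance", "Credit Note"]))
```

Each page type returned here corresponds to one that would need more examples before it can be trained reliably.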


To build a page text classifier

1. Select Configuration | Build Page Text Classifier into | Configuration “Page
Classification and Separation” to display the Build Page Text Classifier tab.

Sample Documents will be selected by default in the “Training Document
Set” list and the document types within the set will be listed in the table.

Note

Some warning triangles may be displayed. Hover the mouse over a triangle to display
its warning. The warnings in the tutorial are caused by having too few examples of
some page types. If you see these warnings on a real project, ask the customer for
more examples of those page types.

2. Within the table, select “None” in the “Train Using” column for the Header
document type, because this document type will be classified by templated
classification (in this case, using a barcode).
