Kofax INDICIUS 6.0 User Manual


Chapter 4

Getting Started Guide (Classification and Separation)

Figure 4-21. Project Explorer after Select Sample Documents

Step 5: Read Page Content

At this point you need to read (OCR) each page of the documents in your sample set.
Using these reads, Transformation Studio can help you analyze the documents with
the aim of finding any that are misclassified or poor quality. These reads will also be
used when you build text classifiers and configure additional classification methods.

Although all the documents in a project could be read, this is time-consuming and
often unnecessary. Reading just the documents in the Sample Documents set is
sufficient (as these documents will be used for configuration and testing).

During the read, the status bar will display the number of the page being read and
the estimated time remaining. The read can be stopped at any point; no data will be
lost, but you will need to read the remaining pages before you can continue to the
next step.
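
The stop-and-resume behavior described above can be pictured with a short Python
sketch. This is only an illustration, not how Transformation Studio is
implemented: the function read_pages, the ocr callable, and the should_stop
callback are all hypothetical names. Results for pages already read are kept, and
a later call reads only the pages that are still missing.

```python
def read_pages(pages, results, ocr, should_stop=lambda: False):
    """Read (OCR) every page that has no result yet.

    pages       -- ordered list of page identifiers (hypothetical)
    results     -- dict of page -> text; kept between calls so nothing is lost
    ocr         -- callable that turns a page into text (assumed, not a Kofax API)
    should_stop -- callable polled before each page; True interrupts the read
    Returns the list of pages that still need to be read.
    """
    for page in pages:
        if page in results:
            continue          # already read in an earlier run
        if should_stop():
            break             # interrupted: earlier results stay in place
        results[page] = ocr(page)
    return [p for p in pages if p not in results]


if __name__ == "__main__":
    pages = ["page-1", "page-2", "page-3"]
    results = {}
    # Simulate stopping the read after two pages.
    left = read_pages(pages, results, str.upper, lambda: len(results) >= 2)
    print("still to read:", left)
    # Resume: only the remaining page is read.
    left = read_pages(pages, results, str.upper)
    print("still to read:", left)
```

The read is complete, and the next step can begin, only when the returned list is
empty.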

The read parameters used in the production configuration should match the
parameters used when reading the page content in Transformation Studio. When
you create a new configuration, the default parameters will automatically match (the
parameters in the configuration resources folder are the same as those used by
default on the Read Page Content tab). However, if you are updating a production
configuration in which you have customized the full page read, you should use these
customized full page read parameters when reading page content in Transformation
Studio.
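
As a rough illustration of the matching requirement above, the following Python
sketch compares two sets of read parameters and reports any that differ. The
parameter names (language, deskew, despeckle) and the function
diff_read_parameters are invented for this example; they are not taken from the
product, and the sketch does not read any real configuration file.

```python
def diff_read_parameters(production, studio):
    """Return {name: (production_value, studio_value)} for every mismatch.

    Both arguments are plain dicts of parameter name -> value. A parameter
    that is missing from one side is reported with None on that side.
    """
    names = sorted(set(production) | set(studio))
    return {
        name: (production.get(name), studio.get(name))
        for name in names
        if production.get(name) != studio.get(name)
    }


if __name__ == "__main__":
    # Hypothetical values: the production full page read was customized.
    production = {"language": "German", "deskew": True}
    studio = {"language": "English", "deskew": True, "despeckle": False}
    for name, (prod, stud) in diff_read_parameters(production, studio).items():
        print(f"{name}: production={prod!r}, studio={stud!r}")
```

An empty result means the two sets of parameters agree; any entry in the result
marks a setting to bring into line before reading page content.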

In addition, you should use custom read parameters if:

• You have non-English language documents.
