What is separation, Recognition – Kofax INDICIUS 6.0 User Manual

Classification Methods

Classification can be done using one or more of the following methods:

• Image Classification: Classification based on the overall layout and structure
  of a page, including lines, boxes, logos and placement of text.

• Text Classification: Classification based on detailed analysis of the text
  content of a page or document.

• Rules-Based Classification: Classification performed by searching for specific
  data or keywords, independent of layout.

• Templated Classification: Classification determined by the presence of one
  or more marks, barcodes or items of text in pre-defined locations.
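
As an illustration of the rules-based idea only, the following Python sketch assigns a page type by searching the page text for keywords, without reference to layout. It is not the product's implementation; the rule table, the page type names and the function name classify_by_keywords are all hypothetical.

```python
import re

# Hypothetical keyword rules: page type -> keywords that identify it.
# These names are illustrative and do not come from the product.
KEYWORD_RULES = {
    "invoice": ["invoice number", "amount due"],
    "purchase_order": ["purchase order", "po number"],
}


def classify_by_keywords(page_text, rules=KEYWORD_RULES):
    """Return the first page type with a keyword in the text, else None."""
    # Lower-case and collapse whitespace so matching ignores case and layout.
    normalized = re.sub(r"\s+", " ", page_text.lower())
    for page_type, keywords in rules.items():
        if any(keyword in normalized for keyword in keywords):
            return page_type
    return None
```

A page whose text matches no keyword returns None here, which a real solution would presumably route to manual review.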

What is Separation?

Document separation methods provide an automated approach to identifying the
boundaries between multiple documents in a single batch.

Separation Methods

Document separation is determined from the page classification results using either
of the following methods:

• Rules-based document separation: One or more rules specify when new
  documents are created; for example, if a page of type A is seen, create a
  document of type X.

• Advanced document separation: A probabilistic method that ascertains the
  most likely document structure from the page classifications and their
  confidence scores. Because it is probabilistic, this method is robust to
  variation in documents and to misclassifications.
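
The rules-based method can be pictured with a short Python sketch: each rule says that a page of a given type starts a new document of a given type, and every other page is appended to the current document. This is a minimal illustration under stated assumptions, not the product's implementation; the rule table, the type names and the function separate are hypothetical.

```python
# Hypothetical rules: a page of this type starts a new document of that type,
# for example "if a page of type A is seen, create a document of type X".
SEPARATION_RULES = {"A": "X", "B": "Y"}


def separate(page_types, rules=SEPARATION_RULES):
    """Group a batch of classified pages into documents.

    page_types is the list of page classification results, in batch order.
    Returns a list of documents, each with a type and its page indexes.
    """
    documents = []
    for index, page_type in enumerate(page_types):
        if page_type in rules:
            # A rule fires: this page is the first page of a new document.
            documents.append({"type": rules[page_type], "pages": [index]})
        elif documents:
            # No rule fires: the page continues the current document.
            documents[-1]["pages"].append(index)
        else:
            # Pages seen before any rule fires go into an untyped document.
            documents.append({"type": None, "pages": [index]})
    return documents
```

For a batch classified as A, C, C, B, C this sketch yields a document of type X holding the first three pages and a document of type Y holding the last two. Unlike the advanced method, it takes each page classification at face value and ignores confidence scores.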

Classification and Separation of Documents in Production

The Recognition and Document Review modules (along with Kofax Capture Scan)
are used to classify and separate documents.

Recognition

Classification and separation are done in the same processing step, in an instance of
the Recognition module. A single solution would do one of the following:

• Document Classification