Classification, Classification hierarchy, Layout classification – Kofax Getting Started with Ascent Xtrata Pro User Manual



Chapter 1


Ascent Xtrata Pro User's Guide

Classification

Classification is the process of determining the category (class) of a document by
identifying its relevant characteristics. The features used for classifying a document
can be geometrical or textual. The Ascent Xtrata Pro classification engine can use
either kind of feature to make the best determination.

Classification Hierarchy

In most organizations, the manual classification of documents follows a hierarchical
scheme. First, the main category of a document is determined and then classification
is refined and performed in greater detail over several steps until the final result (the
type of document) is obtained.

With Ascent Xtrata Pro you can replicate your legacy classification hierarchy when
using automatic classification, thereby ensuring familiar results. This type of
hierarchical evaluation is designed to traverse the full extent of the classification tree
defined for a project. Different classification methods can be used at each level of the
hierarchy. Extraction can be defined for any class in the tree and is inherited by all
subnodes of that class.
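
The hierarchical evaluation and the inheritance of extraction described above can be illustrated with a minimal Python sketch. It is not the product's implementation or API: the ClassNode type, the keyword_router helper, and all class and field names are hypothetical, and a trivial keyword test stands in for whatever classification method a real project would configure at each level.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ClassNode:
    """One node of a hypothetical classification tree."""
    name: str
    # Extraction fields defined directly on this class.
    fields: list[str] = field(default_factory=list)
    # Per-level classifier: returns the name of a child class, or None.
    classify: Optional[Callable[[str], Optional[str]]] = None
    children: dict[str, "ClassNode"] = field(default_factory=dict)

    def add(self, child: "ClassNode") -> "ClassNode":
        self.children[child.name] = child
        return child


def classify_document(root: ClassNode, text: str) -> tuple[list[str], list[str]]:
    """Walk from the root to the most specific matching class.

    Returns the path of class names and the extraction fields
    accumulated (inherited) along that path.
    """
    node = root
    path = [node.name]
    inherited = list(node.fields)
    while node.classify is not None:
        choice = node.classify(text)
        if choice is None or choice not in node.children:
            break  # no finer class matched; stop at the current level
        node = node.children[choice]
        path.append(node.name)
        inherited.extend(node.fields)
    return path, inherited


def keyword_router(rules: dict[str, str]) -> Callable[[str], Optional[str]]:
    """Build a trivial stand-in classifier: the first keyword found wins."""
    def route(text: str) -> Optional[str]:
        lowered = text.lower()
        for keyword, child in rules.items():
            if keyword in lowered:
                return child
        return None
    return route


# Hypothetical project tree: Project -> Invoices -> Acme Invoice, Project -> Letters
root = ClassNode("Project", classify=keyword_router({"invoice": "Invoices", "dear": "Letters"}))
invoices = root.add(ClassNode("Invoices", fields=["invoice_number", "total"],
                              classify=keyword_router({"acme": "Acme Invoice"})))
invoices.add(ClassNode("Acme Invoice", fields=["acme_customer_id"]))
root.add(ClassNode("Letters", fields=["sender"]))

path, fields = classify_document(root, "ACME Corp. Invoice No. 1234")
print(path, fields)
```

Because each node carries its own classify function, a different method can be plugged in at each level, and a document that matches no finer class simply stays at the last class it reached.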

Layout Classification

Layout classification uses the geometric structure of a document to classify it. This
structure is learned automatically from a single sample page that serves as a
prototype for the geometric analysis. If the class contains documents of several
distinct layouts, layout classification can be used to match new documents with the
appropriate class.

Typically, layout classification is used to identify forms in a batch. However, it can
also be used to recognize the sender of a letter if the sender’s document layout is
unique. For example, this might be the case for formal letters and invoices.
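
To make the idea of matching a page against a single prototype concrete, here is a rough Python sketch. It assumes a page has already been reduced to a list of bounding boxes with coordinates normalized to the range 0 to 1, and it compares pages by the grid cells those boxes cover. The grid size, the Jaccard similarity measure, the threshold, and all names are assumptions chosen for illustration; the manual does not describe how the product performs its geometric analysis.

```python
from typing import Optional

# (left, top, right, bottom), each normalized to the range 0..1
Box = tuple[float, float, float, float]


def occupancy_grid(boxes: list[Box], rows: int = 8, cols: int = 8) -> frozenset:
    """Mark every cell of a coarse grid that is touched by some box."""
    cells = set()
    for left, top, right, bottom in boxes:
        c0 = min(int(left * cols), cols - 1)
        c1 = min(int(right * cols), cols - 1)
        r0 = min(int(top * rows), rows - 1)
        r1 = min(int(bottom * rows), rows - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                cells.add((r, c))
    return frozenset(cells)


def layout_similarity(a: list[Box], b: list[Box]) -> float:
    """Jaccard similarity of the two occupancy grids (1.0 means identical)."""
    grid_a, grid_b = occupancy_grid(a), occupancy_grid(b)
    union = grid_a | grid_b
    if not union:
        return 1.0
    return len(grid_a & grid_b) / len(union)


def classify_layout(page: list[Box], prototypes: dict[str, list[Box]],
                    threshold: float = 0.6) -> Optional[str]:
    """Return the best-matching prototype name, or None if nothing is close enough."""
    best_name, best_score = None, 0.0
    for name, prototype in prototypes.items():
        score = layout_similarity(page, prototype)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# One hypothetical prototype page per class.
prototypes = {
    "Form A": [(0.05, 0.05, 0.95, 0.15), (0.05, 0.50, 0.95, 0.90)],
    "Letter": [(0.10, 0.10, 0.40, 0.25), (0.10, 0.35, 0.90, 0.85)],
}

# A scan of Form A whose boxes are slightly shifted.
scanned = [(0.06, 0.05, 0.94, 0.16), (0.05, 0.51, 0.95, 0.90)]
result = classify_layout(scanned, prototypes)
print(result)
```

The coarse grid is what gives the sketch some tolerance to small shifts between scans; a finer grid would discriminate more layouts but tolerate less displacement.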

Content Classification

Content classification uses the textual content of a document to classify it. This type
of classification is trained with several dozen sample documents per class. The
Adaptive Feature Classifier (AFC) automatically determines the features that are
relevant for a class. Because the AFC is fault-tolerant and evaluates words as well as
other features, even information with OCR or typing errors can be used to correctly
classify a document. The sample documents are analyzed and a classification pattern
is automatically created for use during production.
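
The following Python sketch shows one simple way a text classifier can tolerate OCR or typing errors: it builds a character-trigram profile per class from sample documents and assigns a new document to the class with the most similar profile. This is a stand-in technique chosen for illustration, not the Adaptive Feature Classifier, whose internals the manual does not describe. The sample texts and function names are hypothetical, and two samples per class are used only to keep the example short.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams of the normalized text."""
    cleaned = " ".join(text.lower().split())
    padded = f"  {cleaned} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """Sum the trigram counts of all sample documents of each class."""
    model = {}
    for label, docs in samples.items():
        profile = Counter()
        for doc in docs:
            profile.update(trigrams(doc))
        model[label] = profile
    return model


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(model: dict[str, Counter], text: str) -> str:
    """Return the class whose profile is most similar to the text."""
    grams = trigrams(text)
    return max(model, key=lambda label: cosine(grams, model[label]))


# Hypothetical training samples.
samples = {
    "invoice": [
        "Invoice number 1001 total amount due",
        "Invoice date and payment terms, amount payable",
    ],
    "complaint": [
        "I am writing to complain about the delivery",
        "This complaint concerns a damaged product",
    ],
}
model = train(samples)

# Text with typical OCR confusions (I -> l, m -> rn, l -> 1).
label = classify(model, "lnvoice nurnber 2040 tota1 arnount due")
print(label)
```

Because a misread character only disturbs the few trigrams that contain it, most of a damaged word still contributes evidence for the right class, which is the property the paragraph above attributes to evaluating features beyond whole words.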
