Chapter 10 - statistics viewer, Introduction, Statistics viewer – Kofax Getting Started with Ascent Xtrata Pro User Manual

Ascent Xtrata Pro User's Guide

Chapter 10

Statistics Viewer

Introduction

Document classification and extraction is not a deterministic process with fixed
input; it deals with varying input. The results of this process therefore depend on
the input data and, by definition, are not predetermined.

The quality of the process is defined by how accurately the document class is
assigned and items on the document are recognized. This quality is measured by two
values called Recall and Precision.

Precision is the percentage of correctly classified documents among all classified
documents. Recall is the percentage of correctly classified documents among all
documents that should have been classified. Incorrectly recognized items are called
substitutions. Items that are not recognized are called rejects. Substitutions lower
precision, and both substitutions and rejects lower recall. The goal of the system is
to optimize the process so that recall and precision are both as high as possible.
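
The two definitions above can be illustrated with a small calculation. This is only a sketch: the function name and the counts are hypothetical and are not part of Ascent Xtrata Pro. It also assumes that every document in the batch should have been classified, so that the documents that should have been classified are the correct ones plus the substitutions plus the rejects.

```python
def precision_recall(correct, substitutions, rejects):
    """Return (precision, recall) as percentages.

    correct       -- documents assigned the right class
    substitutions -- documents assigned a wrong class
    rejects       -- documents left unclassified
    Assumption: every document should have been classified.
    """
    classified = correct + substitutions          # everything that got a class
    expected = correct + substitutions + rejects  # everything that should have
    precision = 100.0 * correct / classified if classified else 0.0
    recall = 100.0 * correct / expected if expected else 0.0
    return precision, recall


# Hypothetical batch of 100 documents: 90 correct, 5 substitutions, 5 rejects.
precision, recall = precision_recall(correct=90, substitutions=5, rejects=5)
print(f"precision = {precision:.1f}%, recall = {recall:.1f}%")
# prints: precision = 94.7%, recall = 90.0%
```

In this example, rejecting the 5 doubtful documents instead of guessing keeps precision high, but recall still drops because those documents were not classified.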

In the document recognition process, the rules and patterns that have been defined
and learned in the Ascent Xtrata Pro project are applied to unknown documents. The
quality of these rules and patterns determines the quality of the recognition results.
For example, if additional samples are trained for classification and extraction, the
recognition quality can be expected to increase as well.

The Project Builder allows you to carry out a multitude of tests to evaluate the
recognition quality when configuring the project. In contrast, the statistics viewer
provides an overview of the recognition quality for the current production run.
This allows the system administrator to monitor the quality and refine definitions
and trained patterns, especially in areas where recognition quality is below
expectations.

The statistical reports rely on three values that are automatically recorded for each
field during runtime (for statistical purposes, the classification result is also treated as
a field value):
