Thresholds, precision, and recall – Kofax Getting Started with Ascent Xtrata Pro User Manual


Classification

Ascent Xtrata Pro User's Guide


Thresholds, Precision, and Recall

The overall quality of the classification process can be expressed by precision and
recall. The classification of a document, when compared with a reference set, can lead
to one of three results:

• Correct classification
• Incorrect classification (also known as a false positive or substitution)
• No classification (also known as a reject)

A threshold allows for the suppression of all classification results below a certain
confidence level. The confidence is the degree of concurrence between the document
and any chosen class.

Two types of thresholds can be defined:

Absolute threshold: An absolute value (expressed as a percentage) indicating
the minimum necessary concurrence of a document with a class for the result
to be accepted. For example, a classification process might return a
confidence of 73% as its best result. That result is accepted as the final
result if the threshold setting is 73% or lower; otherwise, the result is
rejected and the document is left unclassified, unless there is a default
class.

Relative distance: If more than one class is defined, it might be desirable to
specify a minimum difference between the best result and the next best result
that must be satisfied in order to get a unique classification result. If multiple
results are acceptable, a relative distance is not needed. However, Ascent
Xtrata Pro is designed with the goal of determining a unique class as the
classification result.

The relative distance defines the minimum required difference between the
confidences of the best result and the second best result for the best class to
be accepted as the classification result. For example, a classification process
might return confidences of 73% for the best class and 62% for the second best
class, a difference of 11%. If the required relative distance is set to 11% or
smaller, the result is accepted (provided the absolute threshold criterion is
also fulfilled); otherwise, it is rejected and the document is left
unclassified, unless there is a default class.
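
The way the two thresholds combine can be illustrated with a short sketch. This is not Ascent Xtrata Pro code: the function name `classify`, its parameters, and the class names in the example are hypothetical, and the logic follows only the description above (confidences in percent, rejection falling back to a default class if one is given).

```python
from typing import Dict, Optional


def classify(confidences: Dict[str, float],
             absolute_threshold: float,
             relative_distance: float = 0.0,
             default_class: Optional[str] = None) -> Optional[str]:
    """Return the accepted class, or the default class (None if there is none).

    Hypothetical helper: confidences maps class name -> confidence in percent.
    """
    if not confidences:
        return default_class

    # Rank classes from highest to lowest confidence.
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    best_class, best_conf = ranked[0]

    # Absolute threshold: the best confidence must reach the threshold.
    if best_conf < absolute_threshold:
        return default_class

    # Relative distance: the best result must lead the second best result
    # by at least the required distance (only relevant with 2+ classes).
    if len(ranked) > 1:
        second_conf = ranked[1][1]
        if best_conf - second_conf < relative_distance:
            return default_class

    return best_class


# Values from the examples above: 73% best, 62% second best.
results = {"Invoice": 73, "Order": 62}
print(classify(results, absolute_threshold=73, relative_distance=11))
print(classify(results, absolute_threshold=74, relative_distance=11))
print(classify(results, absolute_threshold=73, relative_distance=12))
```

With a threshold of 73 and a distance of 11 the best class is accepted; raising either setting by one point causes a reject, which returns `None` here because no default class was passed.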

Precision is the percentage of correctly classified documents among all classified
documents. Recall is the percentage of correctly classified documents among all
documents that should have been classified.
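
As a worked illustration of these two definitions, the sketch below computes both values from counts of the three possible results. The function name `precision_recall` and the sample counts are made up for illustration, and it assumes that every document in the reference set should have been classified, so the recall denominator is the sum of correct, incorrect, and rejected documents.

```python
from typing import Tuple


def precision_recall(correct: int, incorrect: int, rejected: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages.

    Assumption: every document counted here should have been classified,
    so rejected documents lower recall but not precision.
    """
    classified = correct + incorrect      # documents that received a class
    expected = classified + rejected      # documents that should have been classified

    precision = 100.0 * correct / classified if classified else 0.0
    recall = 100.0 * correct / expected if expected else 0.0
    return precision, recall


# Illustrative counts: 90 correct, 10 incorrect, 20 rejected.
precision, recall = precision_recall(correct=90, incorrect=10, rejected=20)
print(f"precision = {precision:.1f}%, recall = {recall:.1f}%")
```

Raising a threshold typically moves documents from the incorrect count into the rejected count, which improves precision, but it can also reject documents that would have been classified correctly, which lowers recall.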

Figure --20 shows the relationship between precision and recall for a single class.
