Kofax Getting Started with Ascent Xtrata Pro User Manual


Project Builder User Interface

Min. word length
All words that are shorter than this value are ignored during text filtering.
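
As an illustration only, the effect of this setting can be sketched in Python. The function name filter_words and the whitespace-based word splitting are assumptions made for this example; they are not part of the product.

```python
def filter_words(text, min_word_length):
    """Keep only words that have at least min_word_length characters.

    Assumption: words are separated by whitespace. The product's own
    tokenizer may split text differently.
    """
    return [word for word in text.split() if len(word) >= min_word_length]


# Words shorter than 3 characters are dropped.
print(filter_words("an invoice is due on 1 May", 3))
```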

Training

Max. number of features
Limits the maximum number of internally generated features per class.

Min. feature length
Specifies the minimum number of characters in a feature. This value cannot be
smaller than “Min. word length.” If it is greater than the minimum word length,
some words that pass the text filter are not used for training.

Max. feature length
Specifies the maximum number of characters in a feature. This value should not
exceed 64 characters.

Min. feature frequency
Specifies how often a substring must appear inside the training set of a class for it
to be used as a feature for content classification.
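
A minimal sketch of a frequency threshold of this kind follows. It assumes that every occurrence of a substring in every training document of the class is counted and that all candidate substrings have one fixed length. The manual does not state how the product counts occurrences, and the name frequent_substrings is hypothetical.

```python
from collections import Counter


def frequent_substrings(class_documents, length, min_frequency):
    """Return substrings of the given length that occur at least
    min_frequency times across the training documents of one class.

    Assumption: every occurrence counts, including repeats inside
    the same document.
    """
    counts = Counter()
    for doc in class_documents:
        for start in range(len(doc) - length + 1):
            counts[doc[start:start + length]] += 1
    return {sub for sub, n in counts.items() if n >= min_frequency}


docs = ["invoice total", "invoice date"]
print(sorted(frequent_substrings(docs, 7, 2)))
```

In this example only the substrings shared by both documents reach the threshold of 2.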

Start features at beginning of word
If selected, a feature must always start at the beginning of a word. If not selected,
a feature can start anywhere.
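
The difference between the two settings can be sketched as a choice of allowed start positions. The helper feature_start_positions is hypothetical, and treating whitespace as the only word separator is an assumption.

```python
def feature_start_positions(text, start_at_word_beginning):
    """Return the character positions at which a feature may start.

    Assumption: a word begins at position 0 (if it is not whitespace)
    or directly after a whitespace character.
    """
    if not start_at_word_beginning:
        # Not selected: a feature can start anywhere.
        return list(range(len(text)))
    return [
        i for i, ch in enumerate(text)
        if not ch.isspace() and (i == 0 or text[i - 1].isspace())
    ]


print(feature_start_positions("due date", True))   # word starts only
print(feature_start_positions("due date", False))  # every position
```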

Max. words per feature
Limits the number of words per feature. A value of zero means unlimited words,
although the total number of characters in the feature cannot exceed the “Max.
feature length” property.
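
The length and word-count limits described above can be combined into one check, sketched here under stated assumptions. The function feature_allowed and its parameter names are invented for illustration, and words are assumed to be separated by whitespace.

```python
def feature_allowed(feature, min_length, max_length, max_words):
    """Check a candidate feature against the length and word limits.

    max_words == 0 means the number of words is unlimited, but the
    character limit max_length still applies.
    """
    if not (min_length <= len(feature) <= max_length):
        return False
    if max_words != 0 and len(feature.split()) > max_words:
        return False
    return True


print(feature_allowed("invoice number", 3, 64, 2))    # within both limits
print(feature_allowed("invoice number", 3, 64, 1))    # too many words
print(feature_allowed("total amount due", 3, 10, 0))  # too many characters
```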

Use fuzzy string match
Enables fuzzy string matching, at the cost of slower classification
performance.

Fuzzy length
Specifies the acceptable degree of fuzziness when “Use fuzzy string match” is
selected.
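
The manual does not define how fuzziness is measured. One common interpretation, used here purely as an assumption, is a maximum edit distance (Levenshtein distance) between two strings. The names edit_distance, fuzzy_equal, and max_edits are hypothetical, and the product may use a different measure.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a, b, max_edits):
    """Assumed matching rule: accept if at most max_edits edits are needed."""
    return edit_distance(a, b) <= max_edits


print(fuzzy_equal("invoice", "lnvoice", 1))  # one OCR-style substitution
print(fuzzy_equal("invoice", "involve", 1))  # two substitutions
```

Comparing every candidate by edit distance costs more than an exact string comparison, which is consistent with the slower classification performance noted for fuzzy matching.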

Min. class entropy
This value controls the importance of a feature, depending on the number of
classes in which it appears. A value of 1.0 requires that a feature appear only
in the sample documents of a single class; otherwise, it is not used for
classification. The lower the value, the more a feature may also appear in other
classes within the training set.
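
The manual does not give the formula behind this value. The following sketch shows one measure that behaves as described: it equals 1.0 when a feature occurs in a single class and falls toward 0.0 as the feature spreads evenly over all classes. The formula (one minus the normalized Shannon entropy) and the name class_purity are assumptions, not the product's documented method.

```python
import math


def class_purity(counts_per_class):
    """Return 1 - H/log(K), where H is the Shannon entropy of the
    feature's distribution over the K classes.

    counts_per_class maps every class name to the number of times the
    feature occurs in that class (zero for classes without it).
    """
    total = sum(counts_per_class.values())
    if total == 0:
        raise ValueError("feature does not occur in any class")
    class_count = len(counts_per_class)
    if class_count < 2:
        return 1.0
    entropy = -sum(
        (n / total) * math.log(n / total)
        for n in counts_per_class.values() if n > 0
    )
    return 1.0 - entropy / math.log(class_count)


# Feature seen in one class only: would pass a threshold of 1.0.
print(class_purity({"invoice": 5, "order": 0, "letter": 0}))
# Feature spread evenly over two classes: lowest possible score.
print(class_purity({"invoice": 3, "order": 3}))
```

Under this assumed measure, a feature would be kept only if its score is at least the configured threshold.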
