Regular expression vs. wildcard match pattern, Word boundary, Case sensitivity – Fortinet 100A User Manual

Page 336


01-28007-0068-20041203

Fortinet Inc.

Configuring the banned word list

Spam filter

Regular expression vs. wildcard match pattern

In Perl regular expressions, the ‘.’ character matches any single character. It is similar to
the ‘?’ character in a wildcard match pattern. As a result:

• fortinet.com matches not only fortinet.com but also fortinetacom, fortinetbcom, fortinetccom, and so on.

To match a special character such as ‘.’ or ‘*’ literally, precede it with the escape
character ‘\’. For example:

• To match fortinet.com, the regular expression should be: fortinet\.com
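
The difference between an unescaped and an escaped dot can be checked with a short sketch. It uses Python's standard re module as a stand-in, on the assumption that it behaves like Perl for these basic constructs; the candidate strings are illustrative only, and whole-string matching (fullmatch) is a choice made for this sketch.

```python
import re

# Unescaped '.' matches any single character, like '?' in a wildcard pattern.
loose = re.compile(r"fortinet.com")
# Escaped '\.' matches only a literal dot.
strict = re.compile(r"fortinet\.com")

# Illustrative candidate strings (not taken from a real banned word list).
candidates = ["fortinet.com", "fortinetacom", "fortinetbcom"]

# fullmatch requires the whole string to match the pattern.
loose_hits = [s for s in candidates if loose.fullmatch(s)]
strict_hits = [s for s in candidates if strict.fullmatch(s)]

print(loose_hits)   # every candidate matches the unescaped pattern
print(strict_hits)  # only the string with a real dot matches
```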

In Perl regular expressions, ‘*’ matches the character before it 0 or more times,
not any character 0 or more times. For example:

• forti*\.com matches fortiiii.com but does not match fortinet.com

To match any character 0 or more times, use ‘.*’ where ‘.’ means any character and
‘*’ means 0 or more times. For example, the wildcard match pattern forti*.com
should therefore be written as the regular expression forti.*\.com.
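
As a rough illustration of the two meanings of ‘*’, the sketch below compares a regular expression that repeats only the preceding character, a regular expression that keeps the literal prefix forti and repeats any character, and the wildcard pattern itself. It assumes Python's re and fnmatch modules as stand-ins for the Perl and wildcard matching described here; the sample names are hypothetical.

```python
import fnmatch
import re

# Hypothetical sample names for comparison.
names = ["fortiiii.com", "fortinet.com", "fort.com"]

# 'i*' repeats only the preceding 'i' (zero or more times).
star_only = [n for n in names if re.fullmatch(r"forti*\.com", n)]

# '.*' repeats any character, which is what the wildcard '*' means.
dot_star = [n for n in names if re.fullmatch(r"forti.*\.com", n)]

# The wildcard match pattern, evaluated case-sensitively.
wildcard = [n for n in names if fnmatch.fnmatchcase(n, "forti*.com")]

print(star_only)
print(dot_star)
print(wildcard)
```

In this sketch the ‘.*’ form selects the same names as the wildcard pattern, while the ‘i*’ form accepts fort.com (zero occurrences of i) and rejects fortinet.com.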

Word boundary

In Perl regular expressions, a pattern does not have an implicit word boundary. For
example, the regular expression “test” matches not only the word “test” but also
any word that contains “test”, such as “atest”, “mytest”, “testimony”, and
“atestb”. The notation “\b” specifies a word boundary. To match exactly the word
“test”, the expression should be \btest\b.
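
The effect of “\b” can be seen in a minimal sketch, again assuming Python's re module as a stand-in for Perl regular expressions. The word list repeats the examples above; the sentence is made up for illustration.

```python
import re

words = ["test", "atest", "mytest", "testimony", "atestb"]

# Without a boundary, the pattern matches anywhere inside a word.
unbounded = [w for w in words if re.search(r"test", w)]

# With \b on both sides, only the standalone word matches.
bounded = [w for w in words if re.search(r"\btest\b", w)]

# The standalone word is still found inside a longer sentence.
in_sentence = re.search(r"\btest\b", "run a test now") is not None

print(unbounded)
print(bounded)
print(in_sentence)
```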

Case sensitivity

Regular expression pattern matching is case sensitive in the Web and Spam filters. To
make a word or phrase case insensitive, use the /i modifier. For example,
/bad language/i will block all instances of “bad language” regardless of case.
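
A small sketch of the same behaviour follows. Python has no /pattern/i syntax, so the re.IGNORECASE flag is used here as an assumed equivalent of the /i modifier; the sample text is invented for illustration.

```python
import re

# Invented sample text with mixed case.
text = "This message contains BAD Language."

# Default matching is case sensitive, so the lowercase pattern finds nothing.
sensitive = re.search(r"bad language", text)

# re.IGNORECASE plays the role of the /i modifier in this sketch.
insensitive = re.search(r"bad language", text, re.IGNORECASE)

print(sensitive)             # no match object
print(insensitive.group(0))  # the phrase as it appears in the text
```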

Table 30: Perl regular expression formats

Expression    Matches

abc           abc (that exact character sequence, but anywhere in the string)
^abc          abc at the beginning of the string
abc$          abc at the end of the string
a|b           either of a and b
^abc|abc$     the string abc at the beginning or at the end of the string
ab{2,4}c      an a followed by two, three or four b's followed by a c
ab{2,}c       an a followed by at least two b's followed by a c
ab*c          an a followed by any number (zero or more) of b's followed by a c
ab+c          an a followed by one or more b's followed by a c
ab?c          an a followed by an optional b followed by a c; that is, either abc or ac
a.c           an a followed by any single character (not newline) followed by a c
a\.c          a.c exactly
[abc]         any one of a, b and c
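
Several of the formats listed in Table 30 can be spot-checked with a short sketch. It assumes Python's re module behaves like Perl for these constructs; the sample strings are hypothetical, each pattern being paired with one string it should match and one it should not.

```python
import re

# (pattern, string expected to match, string expected not to match)
cases = [
    (r"^abc", "abcdef", "xabc"),
    (r"abc$", "xyzabc", "abcx"),
    (r"ab{2,4}c", "abbbc", "abc"),
    (r"ab{2,}c", "abbbbbc", "abc"),
    (r"ab*c", "ac", "adc"),
    (r"ab+c", "abbc", "ac"),
    (r"ab?c", "ac", "abbc"),
    (r"a.c", "axc", "ac"),
    (r"a\.c", "a.c", "axc"),
    (r"[abc]", "b", "xyz"),
]

# search looks for the pattern anywhere in the string, as the table describes.
results = [
    (pattern, bool(re.search(pattern, yes)), bool(re.search(pattern, no)))
    for pattern, yes, no in cases
]

for pattern, hit, miss in results:
    print(f"{pattern:10} matches expected string: {hit}, matches other string: {miss}")
```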
