Character class: [..] and, Matching line boundaries, Repetition: @ and – Crunch CRiSP File Editor 6 User Manual

Page 49


For example, the following expression:

cat*dog

matches any line which contains the word cat followed, somewhere later on the line, by the word dog.
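For readers more used to Unix-style syntax, the same match can be sketched in Python. This is only an illustration, assuming that the CRiSP '*' wildcard corresponds to '.*' in Python's re module; the name line_has_cat_then_dog is hypothetical and is not part of CRiSP.

```python
import re

# Assumption: CRiSP's '*' (any run of characters) is written '.*' in Python.
CAT_THEN_DOG = re.compile(r"cat.*dog")


def line_has_cat_then_dog(line: str) -> bool:
    """Return True if 'cat' appears and 'dog' appears later on the same line."""
    return CAT_THEN_DOG.search(line) is not None
```

Because '.' in Python does not match a newline by default, the two words must be on the same line, as in the description above.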

Character Class: [..] and ..

The square bracket operators are used to match a character against a class of characters. If the
expression is of the form '[..]' then a match is successful if the character being matched is any of the
characters within the square brackets. If the first character after the '[' is either a '^' or '~', then the match is
successful if the character is NOT equal to any of the characters in the class.

The characters within the square brackets form either an enumeration or a range of characters. '[ABC]' is an
example of an enumeration. It matches the single character 'A', or 'B', or 'C'.

'[a-z]' is an example of a range. It matches any lower case alphabetic character.
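As a rough illustration, enumerations, ranges and negated classes behave the same way in Python's re module, with one difference: Python accepts only '^' for negation, not '~'. The helper first_match below is a hypothetical name used for this sketch.

```python
import re

ENUMERATION = re.compile(r"[ABC]")   # one of 'A', 'B' or 'C'
LOWER_RANGE = re.compile(r"[a-z]")   # any lower case alphabetic character
NOT_LOWER = re.compile(r"[^a-z]")    # any character NOT in the range a-z


def first_match(pattern, text):
    """Return the first character in text that the class matches, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None
```

For instance, first_match(NOT_LOWER, "abc7d") picks out the digit, since it is the first character outside the range a-z.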

Ranges and enumerations may be combined, for example the following may be used to match a C symbol:

[_A-Za-z][_A-Za-z0-9]@

which defines a regular expression consisting of a single character that is either '_' or an upper or lower case
alphabetic character, followed by zero or more characters from the class '_', A-Z, a-z or 0-9.
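The C symbol pattern can be tried out in Python as a minimal sketch, assuming that the CRiSP '@' operator (zero or more) corresponds to '*' in Python's re module. The function name find_c_symbols is hypothetical.

```python
import re

# Assumption: CRiSP '@' (zero or more repetitions) is written '*' in Python.
C_SYMBOL = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")


def find_c_symbols(text: str) -> list:
    """Return every substring that looks like a C symbol, in order of appearance."""
    return C_SYMBOL.findall(text)
```

Note that the pattern has no notion of C keywords, so a word such as int is reported like any other symbol.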

Special characters may be enclosed in the character class construct using the \ syntax. For example, \n
matches a new-line; \t matches a tab.

The characters - and ] may be included in the class by preceding them with a backslash (e.g. \- or \]).

The regular expression characters \< and \> can be used as word delimiters. The \< sequence matches
either a beginning of line or any non-word character. The \> sequence matches either the end of line or any
non-word character. A word character is defined as any of: [A-Za-z0-9_]. These two sequences can
be used as a shorthand way of finding a word without matching the same word embedded in a larger word, e.g.
\<begin\> matches the word begin but will not match the begin inside beginning.
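A comparable whole-word search can be sketched in Python. Python's re module has no \< and \> sequences; the sketch below assumes that the zero-width \b word boundary is a close enough substitute, which uses the same word character set [A-Za-z0-9_] for ASCII text. The name contains_word_begin is hypothetical.

```python
import re

# Assumption: Python's \b stands in for CRiSP's \< and \> word delimiters.
WHOLE_WORD_BEGIN = re.compile(r"\bbegin\b")


def contains_word_begin(text: str) -> bool:
    """Return True if 'begin' occurs as a whole word, not inside a larger word."""
    return WHOLE_WORD_BEGIN.search(text) is not None
```

Since '_' counts as a word character, an identifier such as begin_x is treated as one larger word and is not matched.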

Matching Line boundaries

CRiSP allows regular expressions to match text which spans line boundaries. Normally this is not the case.
For example, a Unix regular expression of the form: 'a.*b' means match an 'a' followed by any number of
characters followed by a 'b'. In this example, the letters 'a' and 'b' are constrained to be on the same line, i.e.
the regular expression will not span over multiple lines.

The regular expression sequence '\n' allows a match with the newline at the end of each line to succeed. For
example the regular expression: 'fred\nharry' will match the string 'fred' at the end of a line, the newline after
fred and the string 'harry' at the beginning of the next line.
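The difference between the two cases above can be seen in a short Python sketch. It relies only on standard behaviour of Python's re module, where '.' does not match a newline by default but an explicit \n does; the helper name found is hypothetical.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


two_lines = "hello fred\nharry was here"

# 'a.*b' stays on one line: '.' does not cross the newline.
same_line_only = found(r"a.*b", "a\nb")

# An explicit \n lets the match span the line boundary.
spans_boundary = found(r"fred\nharry", two_lines)
```

Here same_line_only is False and spans_boundary is True.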

The newline matching character can be used inside the character class operator, e.g. [\n], and inside more
complicated regular expressions. For example, all lines inside the body of a C function can be matched
with a regular expression of the following form:

^\{.*\n\(.*\n\)*\}

Repetition: @ and +

The @ and + operators are used to match a previously specified pattern repeatedly. A simple regular
expression (SRE) followed by '@' will be matched zero or more times; an SRE followed by '+' will be matched
one or more times.

For example, the following regular expression can be used to match a sequence of words followed by a
comma (e.g. a sub-phrase of a sentence):

{[A-Za-z]+[ ]+}+,

[A-Za-z]+ matches any word of one or more alphabetic characters; the [ ]+ matches one or more spaces
between each word. The final }+ sequence means repeat the previous expression one or more times.
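A literal translation of this expression into Python might look like the sketch below. It assumes that the CRiSP { } grouping corresponds to a Python non-capturing group (?: ); the name match_phrase is hypothetical.

```python
import re

# Assumption: CRiSP '{...}' grouping is written '(?:...)' in Python.
WORDS_THEN_COMMA = re.compile(r"(?:[A-Za-z]+[ ]+)+,")


def match_phrase(text: str):
    """Return the first run of space-separated words ending in a comma, or None."""
    m = WORDS_THEN_COMMA.search(text)
    return m.group(0) if m else None
```

Translated this literally, every word in the group must be followed by at least one space, so the Python version only matches when a space separates the last word from the comma.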

The following example shows how to match the last word of one sentence and the first word of the following
sentence:

[A-Za-z]+.[ ]@[A-Z]
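A possible Python rendering of this expression is sketched below. It rests on two assumptions that the text above does not confirm: that the '.' stands for a literal full stop (escaped as \. in Python), and that '@' corresponds to Python's '*'. The name sentence_boundary is hypothetical.

```python
import re

# Assumptions: '.' is a literal full stop; CRiSP '@' is written '*' in Python.
SENTENCE_BOUNDARY = re.compile(r"[A-Za-z]+\.[ ]*[A-Z]")


def sentence_boundary(text: str):
    """Return the text from the last word of a sentence up to the first
    capital letter of the next sentence, or None if there is no such place."""
    m = SENTENCE_BOUNDARY.search(text)
    return m.group(0) if m else None
```

As written, the match ends at the capital letter that starts the next sentence and does not take in the rest of that word.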
