What is a Regular Expression?
A regular expression is a set of characters that specifies a pattern. Regular expressions are used when you want to search for specific lines of text containing a particular pattern. Most of the UNIX utilities operate on ASCII files a line at a time. Regular expressions search for patterns on a single line, and not for patterns that start on one line and end on another.
It is simple to search for a specific word or string of characters. Almost every editor on every computer system can do this. Regular expressions are more powerful and flexible. You can search for words of a certain size. You can search for a word with four or more vowels that ends with an "s". Numbers, punctuation characters, you name it, a regular expression can find it. What happens once the program you are using finds it is another matter. Some just search for the pattern. Others print out the line containing the pattern. Editors can replace the string with a new pattern.
Parts of a Regular Expression
There are three important parts to a regular expression.
1) Anchors are used to specify the position of the pattern in relation to a line of text.
2) Character Sets match one or more characters in a single position.
3) Modifiers specify how many times the previous character set is repeated.
A simple example that demonstrates all three parts is the regular expression "^#*".
The caret ("^") is an anchor that indicates the beginning of the line. The character "#" is a simple character set that matches the single character "#". The asterisk is a modifier: it lets the preceding "#" repeat zero or more times.
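As a rough illustration, the pattern can be tried with Python's `re` module. Python is an assumption here (the text itself is about UNIX utilities), and the helper name and sample lines are made up for the sketch.

```python
import re

# "^" anchors at the start of the line, "#" is a one-character set,
# and "*" lets the "#" repeat zero or more times.
pattern = re.compile(r"^#*")

def leading_hashes(line):
    """Return the run of '#' characters at the start of line (possibly empty)."""
    return pattern.match(line).group(0)

for line in ["### heading", "# comment", "plain text"]:
    print(repr(line), "->", repr(leading_hashes(line)))
```

Because the asterisk permits zero repetitions, this pattern matches every line, including lines that contain no "#" at all; on those lines the matched text is simply empty.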
Pattern | Matches |
^A | "A" at the beginning of a line |
A$ | "A" at the end of a line |
A^ | "A^" anywhere on a line |
$A | "$A" anywhere on a line |
^^ | "^" at the beginning of a line |
$$ | "$" at the end of a line |
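The use of "^" and "$" as indicators of the beginning or end of a line is a convention. As the table above shows, "^" acts as an anchor only when it is the first character of the pattern, and "$" only when it is the last; anywhere else they match themselves.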
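The rows of the table can be approximated in Python's `re` module, with one caveat worth stating loudly: Python does not follow the positional convention and treats "^" and "$" as anchors wherever they appear, so the literal cases have to be escaped with a backslash. The sample lines and the `matching` helper below are invented for this sketch.

```python
import re

lines = ["Apple", "BANANA", "A^B", "cost $A", "^start", "end$"]

def matching(pattern):
    """Return the sample lines in which pattern is found."""
    return [s for s in lines if re.search(pattern, s)]

print(matching(r"^A"))    # "A" at the beginning of a line
print(matching(r"A$"))    # "A" at the end of a line
print(matching(r"A\^"))   # literal "A^" anywhere (escaped for Python)
print(matching(r"\$A"))   # literal "$A" anywhere (escaped for Python)
print(matching(r"^\^"))   # "^" at the beginning of a line
print(matching(r"\$$"))   # "$" at the end of a line
```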
Matching a Character with a Character Set
The simplest character set is a single character. The regular expression "may" contains three character sets: "m", "a", and "y". It will match any line with the string "may" inside it. This would also match the word "mayur". To prevent this, put spaces before and after the pattern: " may ".
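A small sketch of the difference, again assuming Python's `re` module and using made-up sample lines:

```python
import re

def contains(pattern, line):
    """True if pattern occurs anywhere in line."""
    return re.search(pattern, line) is not None

print(contains("may", "mayur called"))      # substring of a longer word
print(contains(" may ", "mayur called"))    # surrounding spaces exclude "mayur"
print(contains(" may ", "you may go"))      # the word surrounded by spaces
print(contains(" may ", "may I come in"))   # no leading space at line start
```

The last call shows a limit of the space-padded pattern: it also misses "may" at the very start or end of a line, because no space precedes or follows the word there.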
Match any character with . (DOT)
The character "." is a special meta-character. By itself it will match any character, except the end-of-line character. The pattern that will match a line consisting of a single character is ^.$
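To see the pattern at work, here is a minimal sketch assuming Python's `re` module; the helper name and test strings are illustrative only.

```python
import re

# "^" and "$" pin the match to both ends, so exactly one character fits between.
single_char = re.compile(r"^.$")

def is_single_char_line(line):
    """True if line consists of exactly one character."""
    return single_char.search(line) is not None

for line in ["a", "7", " ", "", "ab"]:
    print(repr(line), is_single_char_line(line))
```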
Specifying a Range of Characters with [...]
If you want to match specific characters, you can use the square brackets to identify the exact characters you are searching for. The pattern that will match any line of text that consists of exactly one digit is
^[0123456789]$
You can also use the hyphen between two characters to specify a range:
^[0-9]$
You can also mix explicit characters with character ranges. This pattern will match a single character that is a letter, digit, or underscore:
[A-Za-z0-9_]
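The bracket patterns above can be exercised as follows. This is a sketch that assumes Python's `re` module, which accepts the same bracket syntax; the function names are hypothetical.

```python
import re

one_digit = re.compile(r"^[0-9]$")        # same set as ^[0123456789]$
word_char = re.compile(r"[A-Za-z0-9_]")   # a letter, digit, or underscore

def is_one_digit_line(line):
    """True if line is exactly one digit and nothing else."""
    return one_digit.search(line) is not None

def word_chars(line):
    """Return every single character in line that the set matches."""
    return word_char.findall(line)

print(is_one_digit_line("7"), is_one_digit_line("42"), is_one_digit_line("x"))
print(word_chars("a_1 !?"))
```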
Rules in Short.
Regular Expression | Class | Type | Meaning |
. | all | Character Set | A single character (except newline) |
^ | all | Anchor | Beginning of line |
$ | all | Anchor | End of line |
[...] | all | Character Set | Range of characters |
* | all | Modifier | zero or more duplicates |
\< | Basic | Anchor | Beginning of word |
\> | Basic | Anchor | End of word |
\(..\) | Basic | Backreference | Remembers pattern |
\1..\9 | Basic | Reference | Recalls pattern |
+ | Extended | Modifier | One or more duplicates |
? | Extended | Modifier | Zero or one duplicate |
{M,N} | Extended | Modifier | M to N duplicates |
(...|...) | Extended | Anchor | Shows alternation |
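Several rows of the table can be tried out in Python's `re` module, with some assumptions to keep in mind: Python writes grouping and intervals without backslashes, and it has no \< or \> anchors, so the sketch uses its `\b` word boundary as the nearest equivalent. The `full` helper and the sample strings are invented for illustration.

```python
import re

def full(pattern, text):
    """True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# "+" : one or more duplicates
print(full(r"ab+c", "abbbc"), full(r"ab+c", "ac"))
# "?" : zero or one duplicate
print(full(r"colou?r", "color"), full(r"colou?r", "colouur"))
# interval : M to N duplicates (here 2 to 4 digits)
print(full(r"[0-9]{2,4}", "123"), full(r"[0-9]{2,4}", "1"))
# alternation inside a group
print(full(r"(cat|dog)s", "dogs"), full(r"(cat|dog)s", "cows"))
# backreference: \1 recalls whatever the first group remembered
print(full(r"(ab)\1", "abab"), full(r"(ab)\1", "abba"))
# word boundaries (Python's \b, standing in for \< and \>)
print(re.search(r"\bmay\b", "you may go") is not None,
      re.search(r"\bmay\b", "mayur") is not None)
```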