- Python Natural Language Processing
- Jalaj Thanaki
- 485 words
- 2021-07-15 17:02:00
Basic flags
The basic flags are I, L, M, S, U, X:
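- re.I: This flag makes matching case-insensitive
- re.M: This flag enables multi-line matching, so ^ and $ match at the start and end of every line rather than only at the start and end of the whole string
- re.L: This flag makes \w, \W, \b, \B, and case-insensitive matching locale-dependent (in Python 3 it can be used only with bytes patterns)
- re.S: This flag makes the dot (.) match any character, including a newline
- re.U: This flag enables Unicode matching (this is already the default for str patterns in Python 3)
- re.X: This flag lets you write a regex in a more readable format, with whitespace and comments inside the pattern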
We have mainly used re.I, re.M, re.L, and re.U flags.
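As a minimal sketch of how some of these flags change matching, the following example applies re.I, re.M, re.S, and re.X to made-up sample strings. The variable names and sample text are illustrative assumptions, not taken from the book's figures.

```python
import re

# Illustrative sample text (an assumption, not from the original figures)
text = "Natural language\nPROCESSING with Python"

# re.I: match regardless of letter case
ignore_case = re.findall(r"processing", text, flags=re.I)

# re.M: ^ also matches at the start of each line
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: the dot also matches the newline between the two lines
without_dotall = re.search(r"language.PROCESSING", text)
with_dotall = re.search(r"language.PROCESSING", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored
verbose_pattern = re.compile(
    r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
    """,
    flags=re.X,
)
date_match = verbose_pattern.search("Published 2021-07")

print(ignore_case)              # ['PROCESSING']
print(line_starts)              # ['Natural', 'PROCESSING']
print(without_dotall)           # None
print(with_dotall is not None)  # True
print(date_match.group())       # 2021-07
```

Flags can be combined with the | operator, for example flags=re.I | re.M.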
We use the re.match() and re.search() functions. Both look for a pattern in the input, and you can then process the result according to the requirements of your application.
Let's look at the differences between re.match() and re.search():
- re.match(): This checks for a match only at the beginning of the string. If it finds the pattern at the beginning of the input string, it returns a match object; otherwise, it returns None.
- re.search(): This checks for a match anywhere in the string. It scans the input and returns a match object for the first occurrence of the pattern, or None if there is no match. To collect every occurrence, use re.findall() or re.finditer().
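The difference described above can be sketched as follows. This is an illustrative example with a made-up sentence, not the snippet from the book's figure.

```python
import re

# Illustrative input (an assumption, not from the original figure)
line = "I like this book about regex."

# re.match() only looks at the beginning of the string
at_start = re.match(r"like", line)

# re.search() scans the whole string and stops at the first hit
anywhere = re.search(r"like", line)
first_only = re.search(r"o", line)

# re.findall() returns every non-overlapping occurrence
all_hits = re.findall(r"o", line)

print(at_start)            # None, because the string starts with "I"
print(anywhere.group())    # like
print(first_only.start())  # 13, the position of the first "o"
print(all_hits)            # ['o', 'o', 'o']
```

Because both functions return None when nothing matches, it is usually safest to check the result before calling methods such as group() on it.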
Refer to the code snippet given in Figure 4.13:
The output of the code snippet of Figure 4.13 is given in Figure 4.14:
The syntax is as follows:
Find a single occurrence of the character a or b:
Regex: [ab]
Find characters except a and b:
Regex: [^ab]
Find the character range of a to z:
Regex: [a-z]
Find any character except those in the range a to z:
Regex: [^a-z]
Find all the characters a to z as well as A to Z:
Regex: [a-zA-Z]
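A short sketch of the character sets listed so far, run against small sample strings that are assumptions chosen for illustration:

```python
import re

# Illustrative sample string (an assumption)
sample = "abc XYZ 123"

a_or_b = re.findall(r"[ab]", sample)             # a single a or b
not_a_or_b = re.findall(r"[^ab]", "abcab")       # anything except a and b
lowercase = re.findall(r"[a-z]", sample)         # range a to z
not_lowercase = re.findall(r"[^a-z]", "ab1-C")   # anything outside a to z
letters = re.findall(r"[a-zA-Z]", sample)        # a to z and A to Z

print(a_or_b)         # ['a', 'b']
print(not_a_or_b)     # ['c']
print(lowercase)      # ['a', 'b', 'c']
print(not_lowercase)  # ['1', '-', 'C']
print(letters)        # ['a', 'b', 'c', 'X', 'Y', 'Z']
```

Note that ^ negates the set only when it is the first character inside the square brackets.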
Any single character:
Regex: .
Any whitespace character:
Regex: \s
Any non-whitespace character:
Regex: \S
Any digit:
Regex: \d
Any non-digit:
Regex: \D
Any non-word character:
Regex: \W
Any word character (letter, digit, or underscore):
Regex: \w
Either match a or b:
Regex: (a|b)
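The shorthand classes and the alternation pattern above can be tried out as in this sketch. The sample strings are assumptions chosen for illustration.

```python
import re

# Illustrative sample string (an assumption)
sample = "Room 42, floor_3!"

any_char = re.findall(r".", "a b")            # any single character
whitespace = re.findall(r"\s", sample)        # whitespace characters
non_whitespace = re.findall(r"\S+", sample)   # runs of non-whitespace
digits = re.findall(r"\d", sample)            # digits
non_digits = re.findall(r"\D+", "a1b22c")     # runs of non-digits
word_chars = re.findall(r"\w+", sample)       # runs of word characters
non_word = re.findall(r"\W", sample)          # non-word characters
either = re.findall(r"(a|b)", "cab")          # either a or b

print(any_char)        # ['a', ' ', 'b']
print(whitespace)      # [' ', ' ']
print(non_whitespace)  # ['Room', '42,', 'floor_3!']
print(digits)          # ['4', '2', '3']
print(non_digits)      # ['a', 'b', 'c']
print(word_chars)      # ['Room', '42', 'floor_3']
print(non_word)        # [' ', ',', ' ', '!']
print(either)          # ['a', 'b']
```

The underscore counts as a word character, which is why floor_3 is returned as a single token by \w+.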
Zero or one occurrence of a:
Regex: a? ; ? matches zero or one occurrence, never more than one
Zero or more occurrences of a:
Regex: a* ; * matches zero or more occurrences
One or more occurrences of a:
Regex: a+ ; + matches one or more occurrences
Match exactly three consecutive occurrences of a:
Regex: a{3}
Match three or more consecutive occurrences of a:
Regex: a{3,}
Match between three and six consecutive occurrences of a:
Regex: a{3,6}
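To see how the quantifiers behave, here is a minimal sketch that prefixes each pattern with the letter b so that zero-length matches do not clutter the output. The test strings are assumptions for illustration.

```python
import re

# Illustrative input (an assumption): b followed by zero, one, and two a's
words = "b ba baa"

zero_or_one = re.findall(r"ba?", words)    # at most one a after b
zero_or_more = re.findall(r"ba*", words)   # any number of a's after b
one_or_more = re.findall(r"ba+", words)    # at least one a after b

# Counted repetition
exactly_three = re.fullmatch(r"a{3}", "aaa")
not_three = re.fullmatch(r"a{3}", "aaaa")
three_or_more = re.findall(r"a{3,}", "aa aaa aaaaa")
three_to_six = re.findall(r"a{3,6}", "aaaaaaaa")

print(zero_or_one)                # ['b', 'ba', 'ba']
print(zero_or_more)               # ['b', 'ba', 'baa']
print(one_or_more)                # ['ba', 'baa']
print(exactly_three is not None)  # True
print(not_three)                  # None
print(three_or_more)              # ['aaa', 'aaaaa']
print(three_to_six)               # ['aaaaaa']
```

Quantifiers are greedy by default, so a{3,6} consumes six of the eight a's in the last example and the remaining two are too few to form another match.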
Start of the string:
Regex: ^
End of the string:
Regex: $
Match word boundary:
Regex: \b
Non-word boundary:
Regex: \B
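The anchors and boundary patterns above can be sketched as follows, using a made-up string in which cat appears as a whole word, at the end of a longer word, and at the start of a longer word:

```python
import re

# Illustrative input (an assumption)
text = "cat concat cats"

starts_with_cat = bool(re.search(r"^cat", text))   # anchored at the start
ends_with_cats = bool(re.search(r"cats$", text))   # anchored at the end
ends_with_cat = bool(re.search(r"cat$", text))     # the string ends in "cats"

# \b requires a word boundary; \B requires the absence of one
whole_word = re.findall(r"\bcat\b", text)
word_initial = [m.start() for m in re.finditer(r"\bcat", text)]
word_internal = [m.start() for m in re.finditer(r"\Bcat", text)]

print(starts_with_cat)  # True
print(ends_with_cats)   # True
print(ends_with_cat)    # False
print(whole_word)       # ['cat']
print(word_initial)     # [0, 11]
print(word_internal)    # [7]
```

The patterns are written as raw strings (r"...") because in an ordinary Python string literal "\b" is interpreted as a backspace character rather than a word boundary.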
The basic code snippet is given in Figure 4.15:
The output of the code snippet of Figure 4.15 is given in Figure 4.16: