Book title: Python Natural Language Processing
Author: Jalaj Thanaki
Chapter word count: 485 words
Last updated: 2025-02-28 13:05:45

Basic flags

The basic flags are I, L, M, S, U, X:

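re.I: This flag performs case-insensitive matching (also available as re.IGNORECASE)
re.M: This flag makes ^ and $ match at the start and end of each line, not only at the start and end of the whole string, which is useful when searching multi-line text (re.MULTILINE)
re.L: This flag makes \w, \W, \b, \B, and case-insensitive matching depend on the current locale (re.LOCALE); in Python 3 it can only be used with bytes patterns
re.S: This flag makes the . special character match any character, including a newline (re.DOTALL)
re.U: This flag enables Unicode matching for \w, \W, \b, \B, \d, \D, \s, and \S (re.UNICODE); in Python 3 this is already the default for str patterns
re.X: This flag allows whitespace and comments inside a pattern, so you can write a regex in a more readable format (re.VERBOSE)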
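The following is a minimal sketch showing how some of these flags change what a pattern matches. The sample string and variable names are illustrative and not taken from the book. It leaves out re.L, because Python 3 only accepts it with bytes patterns, and re.U, because Unicode matching is already the default for str patterns.

```python
import re

text = "Python is fun.\npython is popular."

# re.I: case-insensitive matching, so "python" also matches "Python"
case_insensitive = re.findall(r"python", text, re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: the dot (the second "." in the pattern) is allowed to match the newline
dotall = re.search(r"fun\..python", text, re.S)

# re.X: whitespace and comments inside the pattern are ignored
verbose_pattern = re.compile(r"""
    (\w+)   # a word, captured as group 1
    \s+     # one or more whitespace characters
    is      # the literal text is
""", re.X)

print(case_insensitive)                   # ['Python', 'python']
print(line_starts)                        # ['Python', 'python']
print(dotall is not None)                 # True
print(re.search(r"fun\..python", text))   # None: without re.S, . does not match \n
print(verbose_pattern.findall(text))      # ['Python', 'python']
```

Flags can be combined with the | operator, for example re.I | re.M.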

We mainly use the re.I, re.M, re.L, and re.U flags.

We will use the re.match() and re.search() functions. Both look for a pattern in a string and return a match object when they find one (or None when they do not), which you can then process according to the requirements of your application.

Let's look at the differences between re.match() and re.search():

re.match(): This checks for a match only at the beginning of the string. If the pattern is found at the beginning of the input string, it returns a match object; otherwise, it returns None.
re.search(): This checks for a match anywhere in the string. It scans the input string and returns a match object for the first location where the pattern matches, or None if there is no match. To get all occurrences of a pattern, use re.findall() or re.finditer() instead.
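A minimal sketch of the difference, using a made-up sentence. It also shows re.findall(), the function to use when you need every occurrence rather than only the first one.

```python
import re

sentence = "I love Python and Python loves me"

# re.match() only tries to match at position 0 of the string
match_result = re.match(r"Python", sentence)

# re.search() scans the whole string and returns the FIRST match it finds
search_result = re.search(r"Python", sentence)

# re.findall() returns every non-overlapping match as a list of strings
all_results = re.findall(r"Python", sentence)

print(match_result)            # None, because the string starts with "I"
print(search_result.group())   # Python
print(search_result.start())   # 7
print(all_results)             # ['Python', 'Python']
```

A pattern that does sit at the start of the string, such as r"I love", is found by both re.match() and re.search().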

Refer to the code snippet given in Figure 4.13:

Figure 4.13: Code snippet to see the difference between re.match() versus re.search()

The output of the code snippet of Figure 4.13 is given in Figure 4.14:

Figure 4.14: Output of the re.match() versus re.search()

The basic regex syntax is as follows:

Match a single character that is either a or b:

Regex: [ab]

Match any single character except a and b:

Regex: [^ab]

Match any single character in the range a to z:

Regex: [a-z]

Match any single character outside the range a to z:

Regex: [^a-z]

Match any single character in the range a to z or A to Z:

Regex: [a-zA-Z]
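The character classes above can be tried out with re.findall(), which returns every match. This is an illustrative sketch with a made-up sample string; note that each class matches exactly one character per match.

```python
import re

sample = "abcABC12"

# Each character class matches exactly ONE character per match
class_patterns = [
    r"[ab]",      # a or b           -> ['a', 'b']
    r"[^ab]",     # not a and not b  -> ['c', 'A', 'B', 'C', '1', '2']
    r"[a-z]",     # lowercase a to z -> ['a', 'b', 'c']
    r"[^a-z]",    # not a to z       -> ['A', 'B', 'C', '1', '2']
    r"[a-zA-Z]",  # any ASCII letter -> ['a', 'b', 'c', 'A', 'B', 'C']
]

results = {pattern: re.findall(pattern, sample) for pattern in class_patterns}
for pattern, found in results.items():
    print(pattern, found)
```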

Any single character except a newline (with the re.S flag, a newline is matched as well):

Regex: .

Any whitespace character:

Regex: \s

Any non-whitespace character:

Regex: \S

Any digit:

Regex: \d

Any non-digit:

Regex: \D
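Here is a short sketch of the dot, whitespace, and digit shorthands on an illustrative phone-number-style string. It borrows the + quantifier (one or more) from further down this list to group runs of characters.

```python
import re

text = "Call 555-1234 now\nthanks"

# + means "one or more" and is described later in this list
digits = re.findall(r"\d+", text)          # runs of digits
non_digits = re.findall(r"\D+", text)      # runs of anything that is not a digit
spaces = re.findall(r"\s", text)           # single whitespace characters, including \n
non_spaces = re.findall(r"\S+", text)      # runs of non-whitespace characters

# . matches any character except a newline, unless re.S is set
dot_plain = re.findall(r"w.t", text)
dot_all = re.findall(r"w.t", text, re.S)

print(digits)       # ['555', '1234']
print(non_digits)   # ['Call ', '-', ' now\nthanks']
print(spaces)       # [' ', ' ', '\n']
print(non_spaces)   # ['Call', '555-1234', 'now', 'thanks']
print(dot_plain)    # []
print(dot_all)      # ['w\nt']
```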

Any non-word character:

Regex: \W

Any word character (a letter, digit, or underscore):

Regex: \w
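A small sketch of \w and \W, with made-up sample strings. It also shows why re.U is rarely needed in Python 3: str patterns already use Unicode rules, and the separate re.A (re.ASCII) flag restricts \w to ASCII letters, digits, and underscore.

```python
import re

text = "snake_case, Python3!"

words = re.findall(r"\w+", text)      # letters, digits, and underscore
non_words = re.findall(r"\W+", text)  # everything else

# "na\u00efve caf\u00e9" is "naïve café" written with escapes
accented = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", accented)        # Unicode rules (the default)
ascii_words = re.findall(r"\w+", accented, re.A)    # ASCII-only \w

print(words)          # ['snake_case', 'Python3']
print(non_words)      # [', ', '!']
print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```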

Match either a or b:

Regex: (a|b)
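A brief sketch of alternation on an illustrative sentence. Because the alternatives sit inside parentheses, the group also captures which one matched; the (?:...) form groups the alternatives without capturing them.

```python
import re

text = "I have a cat, a dog and a parrot"

# Alternation: the group matches either "cat" or "dog"
pets = re.findall(r"(cat|dog)", text)

# The parentheses also capture which alternative matched
first = re.search(r"a (cat|dog)", text)

# (?:...) groups the alternatives without capturing, so findall returns whole matches
phrases = re.findall(r"a (?:cat|dog)", text)

print(pets)            # ['cat', 'dog']
print(first.group(1))  # cat
print(phrases)         # ['a cat', 'a dog']
```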

Zero or one occurrence of a:

Regex: a? (the ? quantifier matches zero or one occurrence, never more)

Zero or more occurrences of a:

Regex: a* (the * quantifier matches zero or more occurrences)

One or more occurrences of a:

Regex: a+ (the + quantifier matches one or more occurrences)
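The three quantifiers can be compared side by side on the same illustrative string. In each pattern the quantifier applies only to the b that precedes it; a common practical use of ? is accepting both spellings of a word.

```python
import re

text = "a ab abbb"

zero_or_one = re.findall(r"ab?", text)   # ['a', 'ab', 'ab']
zero_or_more = re.findall(r"ab*", text)  # ['a', 'ab', 'abbb']
one_or_more = re.findall(r"ab+", text)   # ['ab', 'abbb']

# ? makes the u optional, so both spellings match
spellings = re.findall(r"colou?r", "color colour")  # ['color', 'colour']

print(zero_or_one, zero_or_more, one_or_more, spellings)
```

All three quantifiers are greedy: they take as many b characters as they can, which is why ab* returns 'abbb' rather than 'a'.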

Exactly three consecutive occurrences of a:

Regex: a{3}

Three or more consecutive occurrences of a:

Regex: a{3,}

Between three and six consecutive occurrences of a:

Regex: a{3,6}
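A sketch of counted repetition on a hypothetical list of tokens. It uses re.fullmatch() (available since Python 3.4) so that each pattern must cover the whole token; without that, a{3} also matches three a's inside a longer run.

```python
import re

tokens = ["aa", "aaa", "aaaa", "aaaaaa", "aaaaaaa"]

def full_matches(pattern):
    """Return the tokens that the pattern matches from start to end."""
    return [token for token in tokens if re.fullmatch(pattern, token)]

exactly_three = full_matches(r"a{3}")   # ['aaa']
three_or_more = full_matches(r"a{3,}")  # ['aaa', 'aaaa', 'aaaaaa', 'aaaaaaa']
three_to_six = full_matches(r"a{3,6}")  # ['aaa', 'aaaa', 'aaaaaa']

# Without anchoring, a{3} happily matches part of a longer run of a's
partial = re.search(r"a{3}", "baaaaad").group()  # 'aaa'

print(exactly_three, three_or_more, three_to_six, partial)
```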

Start of the string (or of each line when the re.M flag is set):

Regex: ^

End of the string (or of each line when the re.M flag is set):

Regex: $
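A minimal sketch of the two anchors on an illustrative two-line string, with and without the re.M flag described earlier.

```python
import re

text = "first line\nsecond line"

start_of_string = re.findall(r"^\w+", text)       # ['first']
start_of_lines = re.findall(r"^\w+", text, re.M)  # ['first', 'second']
end_of_string = re.findall(r"\w+$", text)         # ['line'] (the last one only)
end_of_lines = re.findall(r"\w+$", text, re.M)    # ['line', 'line']

# "line" appears in the text, but never at the very start of it
not_at_start = re.search(r"^line", text)          # None

print(start_of_string, start_of_lines, end_of_string, end_of_lines, not_at_start)
```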

Match a word boundary:

Regex: \b

Match a non-word boundary:

Regex: \B
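Word boundaries are handy in NLP for matching whole words only. The sketch below uses a made-up string and reports where each pattern matches. Note the raw strings: in a normal Python string literal, "\b" is a backspace character, not a word boundary.

```python
import re

text = "cat concatenate cats"

def match_starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

whole_word = match_starts(r"\bcat\b")   # [0]: only the standalone word
word_prefix = match_starts(r"\bcat")    # [0, 16]: "cat" and the start of "cats"
inside_word = match_starts(r"\Bcat\B")  # [7]: only inside "concatenate"

# Replace the word "cat" without touching "concatenate" or "cats"
replaced = re.sub(r"\bcat\b", "dog", text)

print(whole_word, word_prefix, inside_word)
print(replaced)  # dog concatenate cats
```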

The basic code snippet is given in Figure 4.15:

Figure 4.15: Basic regex functions code snippet

The output of the code snippet of Figure 4.15 is given in Figure 4.16:

Figure 4.16: Output of the basic regex function code snippet