- Python Natural Language Processing
- Jalaj Thanaki
- 2021-07-15 17:01:59
Stop word removal
Stop word removal is an important preprocessing step for some NLP applications, such as sentiment analysis and text summarization.
Removing stop words, along with other commonly occurring words, is a basic but important step. The stop words removed here come from the list that ships with nltk. The code snippet in Figure 4.7 generates this list.
The output of the preceding code is the list of stop words available in nltk, shown in Figure 4.8.
nltk has a readily available list of stop words for the English language. You can also customize which words to remove according to the NLP application that you are developing.
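As a rough sketch of how such a list can be obtained (not the listing from Figure 4.7), the function below reads the English stop words through `nltk.corpus.stopwords`. The name `load_english_stop_words` is a hypothetical helper, and the download-on-first-use step is an assumption about the local setup.

```python
def load_english_stop_words():
    """Return nltk's English stop word list as a list of lowercase strings."""
    # Imported here so the helper can be defined even where nltk is absent.
    import nltk
    from nltk.corpus import stopwords

    try:
        return stopwords.words("english")
    except LookupError:
        # The stopwords corpus is not installed yet: fetch it once and retry.
        nltk.download("stopwords", quiet=True)
        return stopwords.words("english")
```

Calling `print(load_english_stop_words())` displays the full list, which includes words such as `the`, `is`, and `a`.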
You can see the code snippet for removing customized stop words in Figure 4.9:
The output of the code given in Figure 4.9 is as follows:
this is foo.
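One way to arrive at that output, as a minimal sketch: the input string and the custom stop word set below are assumptions chosen so that the result is `this is foo.`, and the helper name `remove_custom_stop_words` is hypothetical. None of them are taken from Figure 4.9.

```python
def remove_custom_stop_words(text, custom_stop_words):
    """Drop every whitespace-separated token found in custom_stop_words."""
    kept = [token for token in text.split() if token not in custom_stop_words]
    return " ".join(kept)


# Assumed example data: an application-specific set of words to discard.
custom_stop_words = {"hi", "bye"}
line = "hi this is foo. bye"

print(remove_custom_stop_words(line, custom_stop_words))  # this is foo.
```

A set is used for the stop words because membership checks on a set stay fast as the custom list grows.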
The code snippet in Figure 4.10 removes stop words from raw English text:
The output of the preceding code snippet is as follows:
Input raw sentence: "this is a test sentence. I am very happy today."
--------Stop word removal from raw text---------
test sentence. happy today.
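A sketch that reproduces this output, under two assumptions: the sentence is lowercased before filtering (which is why `I` disappears), and tokens are split on whitespace only. The helper name `remove_stop_words` and the small fallback set are hypothetical. The fallback is a hand-picked subset of nltk's English list, used only when nltk or its corpus is not installed.

```python
try:
    from nltk.corpus import stopwords
    STOP_WORDS = set(stopwords.words("english"))
except (ImportError, LookupError):
    # Assumption: minimal subset of nltk's English list, enough for this demo.
    STOP_WORDS = {"this", "is", "a", "i", "am", "very"}


def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    tokens = sentence.lower().split()
    return " ".join(token for token in tokens if token not in stop_words)


raw = "this is a test sentence. I am very happy today."
print("Input raw sentence:", raw)
print("--------Stop word removal from raw text---------")
print(remove_stop_words(raw))  # test sentence. happy today.
```

Because the split is on whitespace, punctuation stays attached to words (`sentence.`, `today.`), so a stop word directly followed by punctuation would not be matched. Tokenizing first, for example with `nltk.word_tokenize`, would avoid that at the cost of an extra step.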