Cover
Title Page
Copyright
Python Natural Language Processing
Credits
Foreword
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Introduction
Understanding natural language processing
Understanding basic applications
Understanding advanced applications
Advantages of togetherness - NLP and Python
Environment setup for NLTK
Tips for readers
Summary
Practical Understanding of a Corpus and Dataset
What is a corpus?
Why do we need a corpus?
Understanding corpus analysis
Exercise
Understanding types of data attributes
Categorical or qualitative data attributes
Numeric or quantitative data attributes
Exploring different file formats for corpora
Resources for accessing free corpora
Preparing a dataset for NLP applications
Selecting data
Preprocessing the dataset
Formatting
Cleaning
Sampling
Transforming data
Web scraping
Summary
Understanding the Structure of a Sentence
Understanding components of NLP
Natural language understanding
Natural language generation
Differences between NLU and NLG
Branches of NLP
Defining context-free grammar
Exercise
Morphological analysis
What is morphology?
What are morphemes?
What is a stem?
What is morphological analysis?
What is a word?
Classification of morphemes
Free morphemes
Bound morphemes
Derivational morphemes
Inflectional morphemes
What is the difference between a stem and a root?
Exercise
Lexical analysis
What is a token?
What are part of speech tags?
Process of deriving tokens
Difference between stemming and lemmatization
Applications
Syntactic analysis
What is syntactic analysis?
Semantic analysis
What is semantic analysis?
Lexical semantics
Hyponymy and hyponyms
Homonymy
Polysemy
What is the difference between polysemy and homonymy?
Application of semantic analysis
Handling ambiguity
Lexical ambiguity
Syntactic ambiguity
Approach to handle syntactic ambiguity
Semantic ambiguity
Pragmatic ambiguity
Discourse integration
Applications
Pragmatic analysis
Summary
Preprocessing
Handling corpus-raw text
Getting raw text
Lowercase conversion
Sentence tokenization
Challenges of sentence tokenization
Stemming for raw text
Challenges of stemming for raw text
Lemmatization of raw text
Challenges of lemmatization of raw text
Stop word removal
Exercise
Handling corpus-raw sentences
Word tokenization
Challenges for word tokenization
Word lemmatization
Challenges for word lemmatization
Basic preprocessing
Regular expressions
Basic level regular expression
Basic flags
Advanced level regular expression
Positive lookahead
Positive lookbehind
Negative lookahead
Negative lookbehind
Practical and customized preprocessing
Decide by yourself
Is preprocessing required?
What kind of preprocessing is required?
Understanding case studies of preprocessing
Grammar correction system
Sentiment analysis
Machine translation
Spelling correction
Approach
Summary
Feature Engineering and NLP Algorithms
Understanding feature engineering
What is feature engineering?
What is the purpose of feature engineering?
Challenges
Basic features of NLP
Parsers and parsing
Understanding the basics of parsers
Understanding the concept of parsing
Developing a parser from scratch
Types of grammar
Context-free grammar
Probabilistic context-free grammar
Calculating the probability of a tree
Calculating the probability of a string
Grammar transformation
Developing a parser with the Cocke-Kasami-Younger Algorithm
Developing parsers step-by-step
Existing parser tools
The Stanford parser
The spaCy parser
Extracting and understanding the features
Customizing parser tools
Challenges
POS tagging and POS taggers
Understanding the concept of POS tagging and POS taggers
Developing POS taggers step-by-step
Plug and play with existing POS taggers
A Stanford POS tagger example
Using polyglot to generate POS tagging
Exercise
Using POS tags as features
Challenges
Named entity recognition
Classes of NER
Plug and play with existing NER tools
A Stanford NER example
A spaCy NER example
Extracting and understanding the features
Challenges
n-grams
Understanding n-gram using a practice example
Application
Bag of words
Understanding BOW
Understanding BOW using a practical example
Comparing n-grams and BOW
Applications
Semantic tools and resources
Basic statistical features for NLP
Basic mathematics
Basic concepts of linear algebra for NLP
Basic concepts of probability theory for NLP
Probability
Independent event and dependent event
Conditional probability
TF-IDF
Understanding TF-IDF
Understanding TF-IDF with a practical example
Using textblob
Using scikit-learn
Application
Vectorization
Encoders and decoders
One-hot encoding
Understanding a practical example for one-hot encoding
Application
Normalization
The linguistics aspect of normalization
The statistical aspect of normalization
Probabilistic models
Understanding probabilistic language modeling
Application of LM
Indexing
Application
Ranking
Advantages of feature engineering
Challenges of feature engineering
Summary
Advanced Feature Engineering and NLP Algorithms
Recall word embedding
Understanding the basics of word2vec
Distributional semantics
Defining word2vec
Necessity of an unsupervised distributional semantic model - word2vec
Challenges
Converting the word2vec model from black box to white box
Distributional similarity based representation
Understanding the components of the word2vec model
Input of word2vec
Output of word2vec
Construction components of the word2vec model
Architectural component
Understanding the logic of the word2vec model
Vocabulary builder
Context builder
Neural network with two layers
Structural details of a word2vec neural network
Word2vec neural network layer's details
Softmax function
Main processing algorithms
Continuous bag of words
Skip-gram
Understanding algorithmic techniques and the mathematics behind the word2vec model
Understanding the basic mathematics for the word2vec algorithm
Techniques used at the vocabulary building stage
Lossy counting
Using it at the stage of vocabulary building
Applications
Techniques used at the context building stage
Dynamic window scaling
Understanding dynamic context window techniques
Subsampling
Pruning
Algorithms used by neural networks
Structure of the neurons
Basic neuron structure
Training a simple neuron
Define error function
Understanding gradient descent in word2vec
Single neuron application
Multi-layer neural networks
Backpropagation
Mathematics behind the word2vec model
Techniques used to generate final vectors and probability prediction stage
Hierarchical softmax
Negative sampling
Some of the facts related to word2vec
Applications of word2vec
Implementation of simple examples
Famous example (king - man + woman)
Advantages of word2vec
Challenges of word2vec
How is word2vec used in real-life applications?
When should you use word2vec?
Developing something interesting
Exercise
Extension of the word2vec concept
Para2Vec
Doc2Vec
Applications of Doc2Vec
GloVe
Exercise
Importance of vectorization in deep learning
Summary
Rule-Based System for NLP
Understanding the rule-based system
What does the RB system mean?
Purpose of having the rule-based system
Why do we need the rule-based system?
Which kind of applications can use the RB approach over the other approaches?
Exercise
What kind of resources do you need if you want to develop a rule-based system?
Architecture of the RB system
General architecture of the rule-based system as an expert system
Practical architecture of the rule-based system for NLP applications
Custom architecture - the RB system for NLP applications
Exercise
Apache UIMA - the RB system for NLP applications
Understanding the RB system development life cycle
Applications
NLP applications using the rule-based system
Generalized AI applications using the rule-based system
Developing NLP applications using the RB system
Thinking process for making rules
Start with simple rules
Scraping the text data
Defining the rule for our goal
Coding our rule and generating a prototype and result
Exercise
Python for pattern-matching rules for a proofreading application
Exercise
Grammar correction
Template-based chatbot application
Flow of code
Advantages of template-based chatbot
Disadvantages of template-based chatbot
Exercise
Comparing the rule-based approach with other approaches
Advantages of the rule-based system
Disadvantages of the rule-based system
Challenges for the rule-based system
Understanding word-sense disambiguation basics
Discussing recent trends for the rule-based system
Summary
Machine Learning for NLP Problems
Understanding the basics of machine learning
Types of ML
Supervised learning
Unsupervised learning
Reinforcement learning
Development steps for NLP applications
Development steps for the first iteration
Development steps for the second to nth iteration
Understanding ML algorithms and other concepts
Supervised ML
Regression
Classification
ML algorithms
Exercise
Unsupervised ML
k-means clustering
Document clustering
Advantages of k-means clustering
Disadvantages of k-means clustering
Exercise
Semi-supervised ML
Other important concepts
Bias-variance trade-off
Underfitting
Overfitting
Evaluation metrics
Exercise
Feature selection
Curse of dimensionality
Feature selection techniques
Dimensionality reduction
Hybrid approaches for NLP applications
Post-processing
Summary
Deep Learning for NLU and NLG Problems
An overview of artificial intelligence
The basics of AI
Components of AI
Automation
Intelligence
Stages of AI
Machine learning
Machine intelligence
Machine consciousness
Types of artificial intelligence
Artificial narrow intelligence
Artificial general intelligence
Artificial superintelligence
Goals and applications of AI
AI-enabled applications
Comparing NLU and NLG
Natural language understanding
Natural language generation
A brief overview of deep learning
Basics of neural networks
The first computation model of the neuron
Perceptron
Understanding mathematical concepts for ANN
Gradient descent
Calculating error or loss
Calculating gradient descent
Activation functions
Sigmoid
TanH
ReLU and its variants
Loss functions
Implementation of ANN
Single-layer NN with backpropagation
Backpropagation
Exercise
Deep learning and deep neural networks
Revisiting DL
The basic architecture of DNN
Deep learning in NLP
Difference between classical NLP and deep learning NLP techniques
Deep learning techniques and NLU
Machine translation
Deep learning techniques and NLG
Exercise
Recipe summarizer and title generation
Gradient descent-based optimization
Artificial intelligence versus human intelligence
Summary
Advanced Tools
Apache Hadoop as a storage framework
Apache Spark as a processing framework
Apache Flink as a real-time processing framework
Visualization libraries in Python
Summary
How to Improve Your NLP Skills
Beginning a new career journey with NLP
Cheat sheets
Choose your area
Agile way of working to achieve success
Useful blogs for NLP and data science
Grab public datasets
Mathematics needed for data science
Summary
Installation Guide
Installing Python, pip, and NLTK
Installing the PyCharm IDE
Installing dependencies
Framework installation guides
Drop your queries
Summary
Last updated: 2021-07-15 17:03:09