Basic NLP to End-to-end Pipeline .pptx_removed
Basic NLP to End-to-end Pipeline .pptx_removed
Introduction To NLP
Language modeling
This is the task of predicting what the next word in a sentence will be based on the history of previous words. The goal of this task is to
learn the probability of a sequence of words appearing in a given language. Language modeling is useful for building solutions for a
wide variety of problems, such as speech recognition, optical character recognition, handwriting recognition, machine translation, and
spelling correction.
Text classification
This is the task of bucketing the text into a known set of categories based on its content. Text classification is by
far the most popular task in NLP and is used in a variety of tools, from email spam identification to sentiment
analysis.
Information extraction
As the name indicates, this is the task of extracting relevant information from text, such as calendar events from
emails or the names of people mentioned in a social media post.
Information retrieval
This is the task of finding documents relevant to a user query from a large collection. Applications like
Google Search are well-known use cases of information retrieval.
Conversational agent
This is the task of building dialogue systems that can converse in human languages. Alexa, Siri, etc., are
some common applications of this task.
Text summarization
This task aims to create short summaries of longer documents while retaining the core content and
preserving the overall meaning of the text.
Question answering
This is the task of building a system that can automatically answer questions posed in natural
language.
Machine translation
This is the task of converting a piece of text from one language to another. Tools like Google Translate are
common applications of this task.
Topic modeling
This is the task of uncovering the topical structure of a large collection of documents. Topic modeling is a
common text-mining tool and is used in a wide range of domains, from literature to bioinformatics.
Difficulty in terms of developing comprehensive
solutions.
What Is Language?
Phonemes
Phonemes are the smallest units of sound in a language. They may not have any meaning by themselves but can
induce meanings when uttered in combination with other phonemes.
Morphemes and lexemes
A morpheme is the smallest unit of language that has a meaning. It is formed by a combination of phonemes. Not
all morphemes are words, but all prefixes and suffixes are morphemes.
Syntax
Syntax is a set of rules to construct grammatically correct sentences out of words and phrases in a language. Syntactic structure in
linguistics is represented in many different ways. A common approach to representing sentences is a parse tree. In this representation,
N stands for noun, V for verb, and P for preposition. Noun phrase is denoted by NP and verb phrase by VP.
Context
Context is how various parts in a language come together to convey a particular meaning. Context
includes long-term references, world knowledge, and common sense along with the literal meaning
of words and phrases.
Complex NLP tasks such as sarcasm detection, summarization, and topic modeling are some of
tasks that use context heavily.
Building blocks of language and their applications
Approaches to NLP
The different approaches used to solve NLP problems commonly fall into three
categories: heuristics, machine learning, and deep learning.
Heuristics-Based NLP
Similar to other early AI systems, early attempts at designing NLP systems were based on
building rules for the task at hand.
Examples:
● Regular Expression
● Wordnet
● Open Mind Common Sense
Machine Learning for NLP
Machine learning techniques are applied to textual data just as they’re used on other forms of data, such as
images, speech, and structured data. Supervised machine learning techniques such as classification and
regression methods are heavily used for various NLP tasks.
● Naive Bayes
● Support vector machine
● Hidden Markov Model
Deep Learning for NLP
Huge surge in using neural networks to deal with complex, unstructured data. Language is inherently complex and
unstructured. herefore, we need models with better representation and learning capability to understand and solve language
tasks. Here are a few popular deep neural network architectures that have become the status quo in NLP.
The ambiguity and creativity of human language are just two of the characteristics that
make NLP a demanding area to work in.
Ambiguity
Ambiguity means uncertainty of meaning!
Common knowledge
A key aspect of any human language is “common knowledge.” It is the set of all facts that most
humans are aware of.
Example:
consider two sentences: “man bit dog” and “dog bit man.”
Creativity
Language is not just rule driven; there is also a creative aspect to it. Various styles,
dialects, genres, and variations are used in any language. Poems are a great example
of creativity in language. Making machines understand creativity is a hard problem not
just in NLP, but in AI in general.
● Introduction To NLP
● Why NLP Useful?
● NLP in the Real World/Core applications:
● Various NLP Tasks
● Difficulty in terms of developing comprehensive solutions.
● What Is Language?
● Building Blocks of Language
● Approaches to NLP
● Why Is NLP Challenging?
End-to-end NLP Pipeline
What is NLP Pipeline?
Break the problem down into several sub-problems, then try to develop a step-by-step procedure to solve them. Since
language processing is involved, we would also list all the forms of text processing needed at each step. This step-by-step
processing of text is known as a pipeline.
● Data acquisition
● Text Preparation
● Text Cleanup
● Basic Preprocessing
● Advance Preprocessing
● Feature engineering
● Modeling
● Evaluation
● Deployment
● Monitoring and model updating
Points to Remember
Books: