NLP_PPT

Natural Language Processing (NLP) is a subset of Artificial Intelligence aimed at bridging the communication gap between humans and computers through the analysis and understanding of natural language. The document outlines the history, importance, methods, and applications of NLP, detailing its challenges and the various stages involved in processing language data. Key areas of application include sentiment analysis, chatbots, and machine translation, with ongoing advancements in techniques such as deep learning and machine learning.

NATURAL LANGUAGE PROCESSING (NLP)
Overview of NLP: Definition, Scope, Applications of NLP

Dr. Siron Anita Susan T


Assistant Professor
SRM Institute of Technology Tiruchirappalli
HISTORY OF NLP

1948 First NLP Application – Dictionary look-up system at Birkbeck College

1957 Chomsky's Syntactic Structures – Revolutionized linguistics (influenced BNF and regular expressions)

1966 ALPAC Report – review of machine translation; advances in grammar and semantics

1970s NLP influenced by AI – LUNAR, Prolog

1980s Statistical NLP Emerges – Hidden Markov Models

1982 Project Jabberwacky – early chatbot


HISTORY OF NLP

1997 FrameNet Project – Shallow semantic parsing

2001 Word Embeddings Introduced – NN for language modelling

2003 Latent Dirichlet Allocation (LDA) – topic modelling in ML

2013 Word2Vec – improved word embeddings learned with neural networks

Mar 2016 Microsoft’s Tay Chatbot – highlights ethical challenges in AI

Sep 2016 Google Neural Machine Translation (NMT) – reduces translation errors (deep LSTM)
HOW NLP?

• Natural languages are the languages humans use to communicate.
• Computers have their own programming languages and were not designed to understand natural languages.
• Why not speak to the computer and let it respond in a natural language? This is one of the aims of Natural Language Processing (NLP), alongside tasks such as machine translation.
• NLP is rooted in the theory of linguistics.
• NLP is the process of computer analysis of input provided in a human (natural) language and the conversion of this input into a useful form of representation.
WHAT IS NLP?

• Natural Language Processing is a subset of Artificial Intelligence that is used to narrow the communication gap between computers and humans.
• Techniques from machine learning and deep neural networks have also been
successfully applied to NLP problems.
• While many practical applications of NLP already exist, NLP has many
unsolved problems.
• The field of NLP is primarily concerned with getting computers to perform
useful and interesting tasks with human languages.
• The field of NLP is secondarily concerned with helping us come to a better
understanding of human language.
• E.g., a customer support service for a product.
WHY IS IT IMPORTANT?

• Faster response
• Without Bias
• Manage more volume of data
• To learn more
GOALS
• Scientific: enable computers to understand natural language
• Practical: make use of the available data
WHY DO COMPUTERS HAVE DIFFICULTY WITH NLP?

• Computers are good at dealing with structured data (organized, indexed, and referenced).
• In NLP, we often deal with unstructured data.
• E.g., social media posts, news articles, emails, and product reviews are examples of text-based unstructured data.
• To process such text, NLP has to learn the structure and grammar of the natural language.
• Importantly, a commonly cited estimate is that about 80% of enterprise data is unstructured.
HOW IT IS WORKING: GENERAL SKETCH

[Figure: flow diagram. The USER gives audio or text input to the MACHINE, which converts it to TEXT, PROCESSES it with ML, and outputs a RESPONSE.]
TERMS TO UNDERSTAND

Natural Language Processing is divided into sub-areas, i.e., Natural Language Generation and Natural Language Understanding, which are, as the names suggest, associated with the generation and understanding of text.
SYNTAX

• Definition: The set of rules that governs the structure and order of words in sentences.
• Importance: Helps in understanding sentence structure and grammar.
• Example:
• Correct Syntax: "The cat sits on the mat."
• Incorrect Syntax: "Cat the mat on sits."

• Applications in NLP:
• Part-of-Speech (POS) tagging.
• Syntactic parsing to analyze grammatical structure.
MORPHOLOGY

• Definition: The study of the structure and formation of words, including roots, prefixes, suffixes,
and inflections.
• Types:
• Inflectional Morphology: Changes in word form to express tense, number, gender, etc. (e.g., "walk"
→ "walked").
• Derivational Morphology: Formation of new words by adding affixes (e.g., "happy" → "happiness").

• Importance: Helps in breaking words into meaningful components.


• Applications in NLP:
• Lemmatization and stemming.
• Spell correction and word segmentation.
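The idea of stemming mentioned above can be sketched with a few lines of Python. This is only an illustration: the suffix list `SUFFIXES` and the function `naive_stem` are hypothetical names invented for this example, and real stemmers (such as the Porter stemmer) apply many more rules.

```python
# Hypothetical, deliberately tiny suffix list (an assumption for illustration).
SUFFIXES = ("ness", "ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["walked", "walking", "walks", "happiness"]:
    print(w, "->", naive_stem(w))
# walked -> walk
# walking -> walk
# walks -> walk
# happiness -> happi
```

Note that "happiness" becomes "happi", not "happy": a stemmer only chops affixes, whereas lemmatization would map the word back to a dictionary form.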
SEMANTICS

• Definition: The study of meaning in language, including word meanings and sentence
meanings.
• Importance: Focuses on understanding the meaning of text.
• Example:
• "The bank is by the river" (financial institution vs. riverbank).

• Applications in NLP:
• Word sense disambiguation.
• Named entity recognition (NER).
• Sentiment analysis.
PRAGMATICS

• Definition: The study of how context influences the interpretation of meaning in language.
• Importance: Goes beyond literal meanings to understand implied meanings and situational
context.
• Example:
• Literal: "Can you pass the salt?" (Yes, I can.)
• Pragmatic: (Actually passing the salt).

• Applications in NLP:
• Chatbots and conversational agents.
• Sarcasm and irony detection.
• Context-aware language generation.
PHONOLOGY

• Definition: The study of the sound systems of a language and how sounds
are organized and used.
• Importance: Relevant in speech-related NLP tasks.
• Applications in NLP:
• Speech recognition (converting speech to text).
• Text-to-speech systems (TTS).
• Pronunciation correction tools.
NLU and NLG

• Natural Language Understanding (NLU), i.e., analysis:
• Involves converting speech or text into useful representations on which analysis can be performed.
• Goal: to resolve ambiguities, obtain context, and understand the meaning of what is being said.
• NLU is about semantic relationships and meaning.
• NLU tackles the complexities of language beyond the basic sentence structure.

• Natural Language Generation (NLG), i.e., synthesis:
• Starts from a given internal representation.
• Involves selecting the right words and forming phrases and sentences.
• Sentences need to be ordered so that information is conveyed correctly.
MAIN APPROACHES ADOPTED BY NLP

• Symbolic approach (from the 1950s)
• Rooted in linguistics.
• Given the rules of syntax and grammar, we could obtain the structure of text.
• Using logic, we could obtain the meaning.
• But rules had to be hand-crafted and were often numerous.
• They didn't handle colloquial text well.
• Rules worked well for specific use cases but couldn't be generalized.

• Statistical approach (from the 1980s)
• Rules were learned and they had associated probabilities.
• ML models came in with support vector machines and logistic regression.
• More recently, Deep Learning (DL) models that employ a neural network of many layers have brought better accuracy.
• This success is partly due to the more efficient representations given by word embeddings.
STAGES OF NLP
NLP involves different levels, or scopes, of analysis.
LEXICAL AND MORPHOLOGICAL ANALYSIS
• The lexical phase involves scanning text and breaking it down into smaller units (tokens).
• Tokenization is essential for understanding and processing text at the word level.
• In addition to tokenization, various data cleaning and feature extraction techniques are applied, including lemmatization, stop-word removal, and correcting misspelled words.
• Morphological analysis focuses on identifying morphemes, the smallest meaningful units of a word. Understanding morphemes is vital for grasping the structure of words and their relationships.
• Types of morphemes: free morphemes and bound morphemes.
• Importance of morphological analysis: understanding word structure, predicting word forms, improving accuracy.
SYNTACTIC ANALYSIS (PARSING)

• Essential for understanding the structure of a sentence and assessing its grammatical correctness.
• It involves analyzing the relationships between words and ensuring their logical consistency by comparing their arrangement against standard grammatical rules.
• Role: examines the grammatical structure and relationships within a given text and assigns Part-of-Speech (POS) tags to each word.
• This tagging is crucial for understanding how words relate to each other syntactically and helps in avoiding ambiguity.
Sentence: "John eats an apple."
POS Tags:
• John: Proper Noun (NNP)
• eats: Verb (VBZ)
• an: Determiner (DT)
• apple: Noun (NN)
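As a rough illustration of POS tagging for the example sentence, the sketch below looks each word up in a hand-written dictionary. The `LEXICON` table and the `tag` function are assumptions made for this example only; practical taggers are trained statistically and use context rather than a fixed lookup.

```python
# Hypothetical hand-written lexicon using the tags from the slide example.
LEXICON = {"john": "NNP", "eats": "VBZ", "an": "DT", "apple": "NN"}


def tag(sentence):
    """Return (token, tag) pairs; unknown words get the placeholder tag 'UNK'."""
    tokens = sentence.rstrip(".").split()
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


print(tag("John eats an apple."))
# [('John', 'NNP'), ('eats', 'VBZ'), ('an', 'DT'), ('apple', 'NN')]
```

A lookup table cannot resolve words that take different tags in different contexts, which is why real taggers also consider the neighbouring words.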
SEMANTIC ANALYSIS

• Focuses on extracting the meaning from text.
• Concerned with the literal and contextual meaning of words, phrases, and sentences.
• Determines whether the arrangement of words in a sentence makes logical sense.
• Helps in finding context and logic by ensuring the semantic coherence of sentences.
• Key Tasks:
• Named Entity Recognition (NER): identifies and classifies entities within the text
• Word Sense Disambiguation (WSD): determines the correct meaning of ambiguous words based on context

• Example- “Orange eats a Mary” - grammatically correct but does not make sense semantically.
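Word sense disambiguation can be sketched with a simplified overlap heuristic (in the spirit of the Lesk algorithm): pick the sense whose associated words share the most terms with the sentence. The sense inventory `SENSES` below is a hypothetical, hand-made table for this example; a real system would draw on a lexical resource instead.

```python
# Hypothetical mini sense inventory (an assumption for illustration only).
SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account", "cash"},
        "river bank": {"river", "water", "shore", "fishing", "boat"},
    }
}


def disambiguate(word, sentence):
    """Choose the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("bank", "The bank is by the river."))
# river bank
print(disambiguate("bank", "She opened an account at the bank to deposit money."))
# financial institution
```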
DISCOURSE INTEGRATION

• Comprehending the relationship between the current sentence and earlier sentences or the larger context.
• Contextualizing text and understanding the overall message conveyed.
• Role: examines how words, phrases, and sentences relate to each other within a larger context.
• Assesses the impact of a word or sentence and how the combination of sentences affects the overall meaning.
• Helps in understanding implicit references and the flow of information across sentences.
• Example: "This is unfair!" To resolve "this", we need to examine the preceding or following sentences.
PRAGMATIC ANALYSIS

• Focuses on interpreting the inferred meaning of a text beyond its literal content.
• Role: aims to grasp these deeper meanings in communication, i.e., what the writer or speaker truly intends to convey.
• Importance of understanding intentions: the word "Hello" can have various interpretations depending on the tone and context in which it is spoken.
• Example: "Hello! What time is it?" might be a straightforward request for the current time, but it could also imply concern about being late.
NLP PIPELINE

• An NLP pipeline is a sequence of interconnected steps that systematically transform raw text data into a desired output.
• It’s analogous to a factory assembly line, where each step refines the material until it reaches its final form.
• This pipeline is not universal.
• The pipeline described here is an ML pipeline; deep learning pipelines are slightly different.
• An NLP pipeline is non-linear (that means stages can have more dynamic connections, allowing for branching and iteration).
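The assembly-line picture can be sketched as a list of functions applied one after another. This is a minimal, strictly linear sketch; the step names (`lowercase`, `tokenize`, `drop_short`) and `run_pipeline` are hypothetical and chosen only for illustration. Branching or iteration would need more than a flat list.

```python
from functools import reduce


def lowercase(text):
    return text.lower()


def tokenize(text):
    return text.split()


def drop_short(tokens):
    # Keep only tokens longer than two characters.
    return [t for t in tokens if len(t) > 2]


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda acc, step: step(acc), steps, data)


print(run_pipeline("NLP is a Pipeline of Steps", [lowercase, tokenize, drop_short]))
# ['nlp', 'pipeline', 'steps']
```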
DATA ACQUISITION

Objective: Collect raw text data for creating a robust dataset.
• Data Available:
• On Your Desk: Begin text preprocessing immediately.
• In Databases: Collaborate with data engineers to retrieve data.
• Insufficient Data: Use data augmentation techniques:
• Synonym replacement.
• Bigram flip.
• Back translation.
• Adding noise.
• Data from External Sources:
• Use public datasets (Kaggle, UCI, government repositories).
• Extract data using web scraping (e.g., BeautifulSoup, Scrapy).
• Access APIs (e.g., Twitter, Reddit, news aggregators).
• Extract text from PDFs (e.g., PyPDF2, PDFMiner).
• No Existing Data:
• Collaborate with trusted clients for anonymized data.
• Generate synthetic data through surveys, interviews, or user-generated content.
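Two of the data augmentation techniques named above, synonym replacement and bigram flip, can be sketched as follows. The synonym table `SYNONYMS` is a made-up assumption for this example (in practice it would come from a thesaurus), and the function names are hypothetical.

```python
import random

# Hypothetical synonym table (an assumption for illustration only).
SYNONYMS = {"good": ["great", "fine"], "movie": ["film"]}


def synonym_replace(tokens, rng):
    """Swap every token that has a known synonym for a randomly chosen one."""
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS else t for t in tokens]


def bigram_flip(tokens, rng):
    """Swap one randomly chosen pair of adjacent tokens."""
    if len(tokens) < 2:
        return list(tokens)
    i = rng.randrange(len(tokens) - 1)
    flipped = list(tokens)
    flipped[i], flipped[i + 1] = flipped[i + 1], flipped[i]
    return flipped


rng = random.Random(0)  # fixed seed so runs are repeatable
tokens = "a good movie".split()
print(synonym_replace(tokens, rng))
print(bigram_flip(tokens, rng))
```

Both functions return new lists and leave the input untouched, so one original sentence can be augmented several times.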
TEXT PREPROCESSING

Objective: Clean and standardize text for meaningful analysis.
Steps:
• Basic Cleaning:
• Remove HTML tags and irrelevant formatting.
• Handle emojis (convert or remove).
• Perform spell checks for consistency.
• Basic Preprocessing:
• Tokenize text into words or sentences.
• Remove stop words (e.g., “the,” “is”).
• Apply stemming or lemmatization.
• Convert text to lowercase.
• Detect the text’s language.
• Advanced Preprocessing:
• Perform Part-of-Speech (POS) tagging.
• Conduct parsing for grammatical structure.
• Resolve coreferences for coherent understanding.
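A few of the basic cleaning and preprocessing steps (HTML removal, lowercasing, tokenization, stop-word removal) can be sketched with the Python standard library alone. The stop-word list `STOP_WORDS` is a short hypothetical sample, and the regular expressions are simplifications that would not handle every kind of markup.

```python
import html
import re

STOP_WORDS = {"the", "is", "a", "an", "and", "of"}  # hypothetical short list


def preprocess(raw):
    text = re.sub(r"<[^>]+>", " ", raw)        # remove HTML tags
    text = html.unescape(text)                 # decode entities such as &amp;
    text = text.lower()                        # convert to lowercase
    tokens = re.findall(r"[a-z0-9']+", text)   # simple word tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>The movie is <b>GREAT</b> &amp; fun!</p>"))
# ['movie', 'great', 'fun']
```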
FEATURE ENGINEERING

Objective: Convert text into numerical features for models.


• Techniques:
• Bag of Words (BoW): Frequency-based representation of unique words.
• TF-IDF: Weighs word importance based on frequency and rarity.
• One-Hot Encoding: Binary vectors for words, effective for small vocabularies.
• Word Embeddings: Dense vector representations capturing semantic meaning (e.g., Word2Vec,
GloVe, FastText).
• N-Gram Models: Capture sequences of adjacent words (bigrams, trigrams).
• Dependency Parsing: Capture relationships between words through syntactic dependencies.
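Bag of Words, TF-IDF, and n-grams can be computed by hand on a toy corpus, as in the sketch below. It uses one common unsmoothed variant, tf = count / document length and idf = log(N / document frequency); libraries often apply smoothing, so their numbers will differ. The corpus `docs` and the function names are assumptions for illustration.

```python
import math
from collections import Counter

# Toy corpus of already-tokenized documents (hypothetical data).
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]


def bag_of_words(doc):
    """Map each unique word to its frequency in the document."""
    return Counter(doc)


def tf_idf(doc, corpus):
    """Unsmoothed TF-IDF: (count / len(doc)) * log(N / document frequency)."""
    counts = Counter(doc)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for d in corpus if term in d)
        scores[term] = tf * math.log(len(corpus) / df)
    return scores


def ngrams(tokens, n):
    """Return all runs of n adjacent tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(bag_of_words(docs[0]))
print(tf_idf(docs[0], docs))
print(ngrams(docs[0], 2))
```

The word "the" appears in every document, so its idf is log(1) = 0 and its TF-IDF score is zero, which is exactly the "rarity" weighting described above.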
MODELLING

Objective: Train models to perform NLP tasks.


• Approaches:
• Heuristic Models:
• Rule-based systems for specific patterns (e.g., keyword matching).
• Machine Learning:
• Support Vector Machines (SVM): Effective for text classification.
• Random Forests: Suitable for sentiment analysis or categorization.
• Deep Learning:
• RNNs: Handle sequence-based tasks like language modeling.
• Transformers: Capture long-range dependencies for tasks like translation and summarization.
• Cloud APIs:
• Google Cloud, Microsoft Azure: Provide pre-trained models for rapid prototyping.
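The simplest of the approaches above, a heuristic keyword-matching model, can be sketched in a few lines. The word lists `POSITIVE` and `NEGATIVE` are hypothetical and far too small for real use; the sketch only shows the shape of a rule-based baseline against which ML or deep learning models could be compared.

```python
import re

# Hypothetical keyword lists (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def classify(text):
    """Label text by counting positive versus negative keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this great phone"))        # positive
print(classify("Terrible battery, bad screen."))  # negative
print(classify("It arrived on Monday"))           # neutral
```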
EVALUATION

Objective: Assess model performance.


• Types:
• Intrinsic Evaluation:
• Accuracy, Precision, Recall, F1-Score.
• BLEU for translation, Perplexity for language models.
• Extrinsic Evaluation:
• Business metrics (e.g., customer satisfaction, revenue impact).
• Task-specific metrics (e.g., classification accuracy).
• User-centric evaluation (feedback, surveys).
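The intrinsic metrics precision, recall, and F1-score follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The sketch below computes them for a binary task; the labels are made-up example data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and predictions: TP = 2, FP = 1, FN = 1.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))
```

With two true positives, one false positive, and one false negative, all three metrics come out at 2/3 in this example.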
DEPLOYMENT

Objective: Implement the model in real-world applications.
Steps:
• Deployment:
• Integrate the model into production systems.
• Set up infrastructure for scalability and reliability.
• Validate functionality through testing.
• Monitoring:
• Continuously monitor performance and behavior.
• Implement alerts for deviations or anomalies.
• Updates:
• Adapt to dynamic data and retrain models periodically.
• Maintain version control for transparency.
• Address evolving user needs based on feedback.
CHALLENGES IN NLP

• Diversity in Language and Communication


• Challenges in Sourcing and Preparing Training Data
• Time and Resource Demands for NLP Development
• Dealing with Ambiguity in Phrasing and Meaning
• Correcting Spelling and Grammar Errors
• Addressing Bias and Fairness in NLP Models
• Handling Lexical Ambiguity and Multiple Meanings
• Overcoming Multilingual and Cross-Cultural Barriers
• Minimizing Uncertainty and False Positive Predictions
• Enabling Seamless and Ongoing Conversations
HOW TO OVERCOME NLP CHALLENGES

• Enhance Data Quantity and Quality


• Use high-quality, diverse datasets to train NLP models effectively.
• Apply techniques like data augmentation, data synthesis, and crowdsourcing to address data scarcity.

• Handle Ambiguity in Language


• Train NLP algorithms to disambiguate words and phrases using context and semantic analysis.

• Address Out-of-Vocabulary (OOV) Words


• Implement techniques like tokenization, character-level modeling, and vocabulary expansion to manage
OOV words.

• Tackle Lack of Annotated Data


• Use transfer learning and pre-training to leverage large datasets and apply knowledge to tasks with
limited labeled data.
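The character-level modeling mentioned above for out-of-vocabulary words can be illustrated with character n-grams: a word never seen in training still shares sub-word pieces with known words. The sketch below, loosely in the spirit of FastText-style sub-word units, compares words by the overlap of their character trigrams. The function names and the Jaccard similarity measure are assumptions chosen for this example.

```python
def char_ngrams(word, n=3):
    """Split a word into character n-grams, with < and > marking its boundaries."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def similarity(a, b):
    """Jaccard overlap between the character trigram sets of two words."""
    grams_a, grams_b = set(char_ngrams(a)), set(char_ngrams(b))
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(char_ngrams("cat"))
# ['<ca', 'cat', 'at>']
print(similarity("walking", "walked"))  # shares '<wa', 'wal', 'alk'
print(similarity("walking", "banana"))  # no shared trigrams
```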
SCOPE OF NLP

Text Processing and Analysis


• Sentiment Analysis: Understanding opinions and sentiments from text data (e.g., social media,
reviews).
• Text Summarization: Generating concise summaries of lengthy documents or articles.
• Topic Modeling: Identifying hidden topics within text datasets.
• Text Classification: Categorizing emails, documents, and news articles into predefined groups.
Human-Computer Interaction
• Chatbots: Enhancing customer service through conversational agents.
• Virtual Assistants: Powering voice-based systems like Siri, Alexa, and Google Assistant.
• Speech-to-Text and Text-to-Speech: Enabling accessibility for visually or hearing-impaired users.
SCOPE OF NLP (CONT…)

Healthcare Applications
• Clinical Text Analysis: Extracting insights from electronic health records (EHRs).
• Medical Chatbots: Offering basic medical advice and appointment scheduling.
• Drug Discovery: Analyzing medical literature for drug development.
• Disease Prediction: Detecting early signs of illness from patient records.
Language Translation and Localization
• Machine Translation: Tools like Google Translate for multilingual communication.
• Localization: Adapting content for cultural and regional relevance.
• Cross-Language Information Retrieval: Searching for information across languages.
SCOPE OF NLP (CONT…)

Business and Marketing


• Customer Sentiment Analysis: Understanding customer feedback and improving products/services.
• Personalized Marketing: Crafting targeted campaigns based on user behavior and preferences.
• Automated Report Generation: Summarizing business insights from data analytics.
Education and E-Learning
• Grammar and Spell Checking: Tools like Grammarly for improving written communication.
• Content Recommendation: Tailoring learning materials based on user progress and preferences.
• Language Learning: Interactive tools for acquiring new languages.
SCOPE OF NLP (CONT…)

Media and Entertainment


• Content Moderation: Detecting inappropriate or harmful content.
• Script Analysis: Generating or analyzing scripts for movies or shows.
• Automated Subtitles: Generating real-time captions for videos.
Legal and Compliance
• Document Review: Analyzing contracts and legal documents for compliance.
• Case Law Analysis: Extracting insights from legal precedents.
• Regulatory Monitoring: Keeping track of changes in compliance requirements.
SCOPE OF NLP (CONT…)

Research and Development


• Knowledge Graphs: Building relationships between entities for research purposes.
• Question-Answering Systems: Advanced AI models for research and academic purposes.
• Scientific Literature Analysis: Summarizing and categorizing research papers.
Emerging Areas
• Emotion Detection: Understanding emotions conveyed in text or speech.
• Real-Time Applications: Real-time language translation and sentiment tracking.
• Ethical AI in NLP: Developing models that mitigate bias and ensure fairness.
• Multimodal NLP: Integrating text with images, audio, and video for deeper insights.
APPLICATIONS

Chatbots
• Simulate human-like conversation using Natural Language Processing (NLP) and Machine
Learning (ML).
• Understand complex language and improve over time by learning from interactions.
• Function through two steps: understanding user input and providing appropriate responses.
Autocomplete in Search Engines
• Suggest possible completions for typed queries based on keyword predictions.
• Analyze vast datasets and patterns to provide relevant suggestions.
• NLP identifies relationships between words to predict user intent.
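A very small version of query autocompletion can be sketched as prefix matching ranked by how often each query was seen before. The query log `QUERY_LOG` is invented example data and `suggest` is a hypothetical helper; production systems use far richer signals than raw frequency.

```python
from collections import Counter

# Hypothetical log of past queries (made-up data for illustration).
QUERY_LOG = [
    "natural language processing",
    "natural language generation",
    "natural gas prices",
    "nlp pipeline",
    "natural language processing",
]


def suggest(prefix, log, k=3):
    """Return up to k past queries starting with prefix, most frequent first."""
    counts = Counter(q for q in log if q.startswith(prefix.lower()))
    return [query for query, _ in counts.most_common(k)]


print(suggest("natural l", QUERY_LOG))
# ['natural language processing', 'natural language generation']
```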
APPLICATIONS (CONT…..)

Voice Assistants
• Examples: Siri, Alexa, Google Assistant.
• Perform tasks such as making calls, setting reminders, and surfing the internet using voice
commands.
• Utilize speech recognition, natural language understanding, and NLP for interaction.
Language Translators
• Translate text between languages using Sequence-to-Sequence modeling.
• Transitioned from Statistical Machine Translation (SMT) to advanced NLP models for improved
accuracy.
• Example: Google Translate, which identifies patterns and vocabulary of languages.
APPLICATIONS (CONT…..)

Sentiment Analysis
• Analyze user sentiments on social media, reviews, or feedback.
• Employs NLP, text analysis, and computational linguistics to classify sentiments as positive,
negative, or neutral.
• Helps businesses gauge public opinion, understand brand perception, and improve services.
Grammar Checkers
• Enhance professional and academic writing by correcting grammar and spelling errors.
• Suggest synonyms and improve readability using NLP algorithms trained on large datasets.
• Essential for producing polished and error-free content.
APPLICATIONS (CONT…..)

Email Classification and Filtering


• Categorize emails into sections like Primary, Social, and Promotions using text classification.
• NLP identifies the context and content of emails to automate sorting.
• Improves productivity by decluttering inboxes and organizing communication.
Electronic Health Records (EHR) Analysis
• Extract and organize unstructured data from clinical notes, discharge summaries, and patient
histories.
• Streamline documentation and provide physicians with actionable insights.
• Enable faster and more accurate diagnosis through data-driven decision-making.
SUMMARY OF THE SESSION
