Porter Stemmer Algorithm

Uploaded by

lasyav9550

Natural Language Processing

IV B.Tech VIII Semester (Code:18CSD53)


Dr. T.Satyanarayana Murthy
Associate Professor
Department of CSE
Porter Stemmer algorithm
The Porter stemming algorithm (or 'Porter stemmer') is a process for
removing the commoner morphological and inflexional endings
from words in English. Its main use is as part of a term normalisation
process that is usually done when setting up Information Retrieval
systems.
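As a rough illustration of suffix removal, the sketch below implements only step 1a of the Porter algorithm (the plural-ending rules SSES -> SS, IES -> I, SS -> SS, S -> nothing). The function name porter_step_1a is made up for this example, and the full algorithm has several further steps that are not shown here.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter stemmer (plural endings)."""
    # Rules are tried in order; the first (longest) matching suffix wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that the result ("poni" for "ponies") need not be a dictionary word; for term normalisation it only matters that related forms map to the same stem.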
Morphological Analysis
Word
A word is the smallest element of a sentence that can be isolated and
still carry meaning on its own.
There is a subtle difference between a word and a morpheme: a morpheme is
the smallest meaningful unit of a language, and it may or may not be able
to stand alone as a word.
Difference between morpheme and word
Classification of morphemes
Free morphemes:

Free morphemes can stand alone and act as a word. They are also called
unbound morphemes or free-standing morphemes.
Eg: dog, cat, town, house

Bound Morphemes:
Bound morphemes cannot stand alone as words; they usually occur as affixes
attached to a root. They are of two kinds: derivational and inflectional.
Derivational morphemes: These are identified when an affix combines with
the root and changes either its meaning or its part of speech.
Consider: kind is a word
Eg: unkind: un- acts as a derivational morpheme and changes the meaning of
the word.
Consider: happy is a word
Eg: happiness: -ness acts as a derivational morpheme and changes the part
of speech of the word.
Happy - adjective
Happiness - noun

Inflectional Morphemes:
These are suffixes that are added to a word to assign a particular
grammatical property to that word.

Eg: dogs: the suffix -s marks the plural, changing the number of dog.
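The classification above can be sketched as a tiny morpheme analyzer. Everything here is a hypothetical illustration: the root list, the affix tables, and the function name analyze are hand-picked to cover the examples on these slides, not a general morphological analyzer.

```python
# Hand-picked data for illustration only (an assumption, not a real lexicon).
FREE = {"kind", "happy", "dog", "cat", "town", "house"}
PREFIXES = {"un": "derivational"}
SUFFIXES = {"ness": "derivational", "s": "inflectional"}


def analyze(word):
    """Return a list of (morpheme, label) pairs, or None if no analysis is found."""
    if word in FREE:
        return [(word, "free")]
    for prefix, kind in PREFIXES.items():
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in FREE:
            return [(prefix + "-", kind), (rest, "free")]
    for suffix, kind in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Undo the y -> i spelling change (happi + ness comes from happy).
            candidates = [stem, stem[:-1] + "y"] if stem.endswith("i") else [stem]
            for root in candidates:
                if root in FREE:
                    return [(root, "free"), ("-" + suffix, kind)]
    return None


for w in ["town", "unkind", "happiness", "dogs"]:
    print(w, analyze(w))
```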


Natural Language Generation
1. NLG is considered the second component of NLP.
2. NLG is defined as the process by which a machine generates natural
language as output.
3. The output of the machine should be logical; that is, whatever natural
language the machine generates should make sense.
4. In order to generate logical output, many NL systems use basic facts or
a knowledge-based representation.
5. Let us take an example. Suppose you have a system that writes an essay
on a particular topic. If I instruct my machine to generate 100 words on
the topic of "The Cows", then the 100 words it generates should form
valid sentences, every sentence should be logically correct, and the
context should also make sense.
Branches of NLP:
1. NLP involves two major branches that help us to develop NLP
applications. One is the computational (computer science) branch,
and the other is the linguistics branch.
2. The linguistics branch focuses on how NL can be analyzed using
various scientific techniques. It carries out scientific analysis of
form, meaning, and context.
3. All linguistic analysis can be implemented with the help of CS
techniques. We can take the results of the analysis and feed them into
a machine learning algorithm to build an NLP application. Here, the
machine learning algorithm belongs to computer science and the
analysis of language belongs to linguistics.
Computational linguistics is the field that brings the computer science
and linguistics approaches together.
Here is a list of tools that are based on linguistic concepts and are
implemented with the help of computer science techniques. These
tools are often used for developing NLP applications:
For POS tagging, POS taggers are used. Well-known libraries are nltk and
pycorenlp.
Morphological analyzers are used to perform word-level stemming. For this,
the nltk and polyglot libraries are used.
Parsers are used to identify the structure of sentences. For this,
Stanford CoreNLP and nltk can be used to generate a parse tree. You
can also use the Python package spaCy.
Discipline: Linguists
Typical Problems: How do words form phrases and sentences? What constrains the possible meanings for a sentence?
Tools: Intuitions about well-formedness and meaning; mathematical models of structure and meaning

Discipline: Psycholinguists
Typical Problems: How do people identify the structure of sentences? How are word and text meanings identified?
Tools: Experimental techniques based on measuring human performance; statistical analysis of observations

Discipline: Philosophers
Typical Problems: What is meaning, and how do words and sentences acquire it? How do words identify objects in the world?
Tools: Natural language argumentation using intuition about counter-examples; mathematical models (for example, logic and model theory)

Discipline: Computational Linguists
Typical Problems: How is the structure of sentences identified? How can knowledge and reasoning be modeled? How can language be used to accomplish specific tasks?
Tools: Algorithms, data structures; formal models of representation and reasoning; AI techniques (search and representation methods)
What is Natural Language Processing (NLP)?
Natural Language Processing (NLP) is the manipulation or understanding
of text or speech by software or a machine. An analogy is the way humans
interact, understand each other's views, and respond with an appropriate
answer. In NLP, this interaction, understanding, and response are carried
out by a computer instead of a human.
What is NLTK?
NLTK (Natural Language Toolkit) is a suite of libraries and programs for
statistical language processing. It is one of the most powerful NLP
libraries and contains packages that help machines process human language
and respond to it appropriately.
History of NLP
Here are some important events in the history of Natural Language Processing:
1950: NLP started when Alan Turing published an article called “Computing
Machinery and Intelligence.”
1950s: Attempts to automate translation between Russian and English
1960s: The work of Chomsky and others on formal language theory and
generative syntax
1990s: Probabilistic and data-driven models had become quite standard.
2000s: A large amount of spoken and textual data became available
Components of NLP
The five main components of Natural Language Processing in AI are:
• Morphological and Lexical Analysis
• Syntactic Analysis
• Semantic Analysis
• Discourse Integration
• Pragmatic Analysis
Morphological and Lexical Analysis
The lexicon of a language is its vocabulary, including its words and
expressions. Lexical analysis involves analyzing, identifying, and
describing the structure of words.
It includes dividing a text into paragraphs, sentences, and words.
Individual words are analyzed into their components, and non-word tokens
such as punctuation marks are separated from the words.
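The division of a text into paragraphs, sentences, and words can be sketched with regular expressions from the Python standard library. This is a minimal sketch with deliberately naive rules (blank lines end paragraphs; ".", "!" or "?" followed by whitespace ends a sentence), and the function name lexical_analysis is an assumption made for this example.

```python
import re


def lexical_analysis(text):
    """Split text into paragraphs -> sentences -> tokens (words and punctuation)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        # \w+ matches a word; [^\w\s] matches one punctuation mark as its own token.
        result.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return result


text = "Dogs bark. Cats purr!\n\nThe town is quiet."
for paragraph in lexical_analysis(text):
    print(paragraph)
```

Real tokenizers handle abbreviations, numbers, and contractions, which these two patterns do not.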
Context Free Grammar in NLP
A context-free grammar (CFG) is a list of rules that define the set of
all well-formed sentences in a language. ... CFGs are, in fact, the origin
of the device called BNF (Backus-Naur Form) for describing the syntax of
programming languages. CFGs were invented by the linguist Noam
Chomsky in 1957.
A context-free grammar (CFG) is a formal grammar which is used to
generate all the possible patterns of strings in a given formal
language. A grammar G consists of a set of production rules, which are
used to generate the strings of the language.
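To make the idea of production rules concrete, here is a minimal sketch of a toy grammar and a function that enumerates every string it generates. The grammar, its vocabulary, and the name generate are invented for illustration; the grammar is deliberately non-recursive so that the enumeration terminates.

```python
from itertools import product

# Toy production rules: each left-hand side maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def generate(symbol):
    """Yield every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # terminal symbol: an actual word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Expand every symbol on the right-hand side, then combine the expansions.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in generate("S")]
print(len(sentences))
print(sentences[0])
```

The grammar also produces strings such as "the dog sleeps the cat", which shows that a CFG constrains only structure, not meaning.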
Semantic Analysis
Semantic analysis assigns meanings to the structures created by the
syntactic analyzer. The syntactic analyzer transforms linear sequences of
words into structures that show how the words are associated with each
other, and semantic analysis then interprets those structures.
Semantics focuses only on the literal meaning of words, phrases, and
sentences. It abstracts only the dictionary meaning, or the real meaning,
from the given context. The structures assigned by the syntactic analyzer
must always have a meaning assigned to them.
E.g., “colorless green idea” would be rejected by semantic analysis,
because “colorless green” doesn’t make any sense.
Pragmatic Analysis
Pragmatic analysis deals with the overall communicative and social
content and its effect on interpretation.
It means abstracting or deriving the meaningful use of language in
situations.
In this analysis, the main focus is always on reinterpreting what was
said in terms of what was actually meant.
Pragmatic analysis helps users to discover this intended effect by
applying a set of rules that characterize cooperative dialogues.
E.g., “close the window?” should be interpreted as a request instead of an
order.
Syntax analysis
Words are commonly accepted as being the smallest units of syntax.
Syntax refers to the principles and rules that govern the sentence
structure of any individual language.
Syntax focuses on the proper ordering of words, which can affect the
meaning of a sentence. It involves analyzing the words in a sentence by
following the grammatical structure of the sentence. The words are
transformed into a structure that shows how they are related to each other.
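The transformation of a word sequence into a structure can be sketched with a small top-down parser. The grammar, the lexicon, and all function names below are assumptions chosen for this example; the parser handles only this toy grammar and returns the first complete parse it finds as a nested tuple.

```python
# Toy phrase-structure rules and a toy lexicon (illustrative assumptions).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "chased": "V", "slept": "V"}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # part-of-speech category such as Det, N, V
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])


def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol,) + tuple(children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])


def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("The dog chased the cat"))
print(parse_sentence("Dog the chased"))
```

The second call returns None because the word order violates the rules, which is how wrong ordering is detected at the syntactic level.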
Discourse Integration
Discourse integration means making sense of the context. The meaning of
any single sentence depends upon the sentences that precede it, and it can
also influence the meaning of the sentence that follows.
For example, the word “that” in the sentence “He wanted that” depends
upon the prior discourse context. Next, we will learn about NLP and
writing systems.
NLP and Writing Systems
The kind of writing system used for a language is one of the deciding
factors in determining the best approach for text pre-processing. Writing
systems can be:
Logographic: A large number of individual symbols represent words.
Examples: Japanese, Mandarin
Syllabic: Individual symbols represent syllables
Alphabetic: Individual symbols represent sounds
The majority of writing systems use the syllabic or alphabetic system. Even
English, with its relatively simple writing system based on the Roman
alphabet, utilizes logographic symbols, which include Arabic numerals,
currency symbols ($, £), and other special symbols.
This poses the following challenges:
Extracting meaning (semantics) from a text is a challenge
NLP in AI is dependent on the quality of the corpus. If the domain is vast,
it’s difficult to understand context.
There is a dependence on the character set and language
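The dependence on the character set can be seen by inspecting individual characters with Python's standard unicodedata module. This is only a sketch: the helper name describe is an assumption, and Unicode general categories are a rough hint for pre-processing, not a classification of writing systems.

```python
import unicodedata


def describe(ch):
    """Return the Unicode general category and the official name of a character."""
    return unicodedata.category(ch), unicodedata.name(ch, "UNKNOWN")


# A Latin letter, an Arabic numeral, two currency symbols,
# a logographic character, and a syllabic (hiragana) character.
for ch in ["a", "7", "$", "£", "日", "か"]:
    category, name = describe(ch)
    print(ch, category, name)
```

Categories such as Sc (currency symbol) and Nd (decimal digit) let a pre-processing step treat these symbols differently from ordinary letters.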
How to Implement NLP
Given below are popular methods used for Natural Language Processing:
Machine learning: The learning procedures used in machine learning
automatically focus on the most common cases. When rules are written by
hand, by contrast, it is often not obvious where the effort should be
directed, and the rules are prone to human error.
Statistical inference: NLP can make use of statistical inference algorithms.
These help you to produce models that are robust to unfamiliar input, e.g.,
input containing words or structures that have not been seen before.
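As one possible illustration of statistical inference, the sketch below estimates bigram probabilities from a three-sentence toy corpus and applies add-one (Laplace) smoothing, so that word pairs never seen in the data still receive a small non-zero probability. The corpus and the name bigram_prob are invented for this example.

```python
from collections import Counter

# Toy training corpus (an illustrative assumption).
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]

vocab = {w for sentence in corpus for w in sentence}
# How often each word occurs as the first element of a bigram.
history = Counter(a for s in corpus for a in s[:-1])
bigrams = Counter((a, b) for s in corpus for a, b in zip(s, s[1:]))


def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev) over the known vocabulary."""
    return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))


print(bigram_prob("the", "dog"))    # seen twice in the corpus
print(bigram_prob("the", "barks"))  # never seen, but not zero
```

Without smoothing, the unseen pair would get probability zero and any sentence containing it would be ruled out entirely.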
NLP Examples
Today, Natural Language Processing is a widely used technology.
Here are common Natural Language Processing applications:
Information retrieval & Web Search
Google, Yahoo, Bing, and other search engines base their machine translation technology on NLP deep
learning models. This allows algorithms to read text on a webpage, interpret its meaning, and translate it into
another language.
Grammar Correction
NLP techniques are widely used by word processing software such as MS Word for spelling correction and
grammar checking.
Question Answering
Users type in keywords to ask questions in natural language.
Text Summarization
The process of summarising the important information from a source to produce a shortened version.
Machine Translation
The use of computer applications to translate text or speech from one natural language to another.
Sentiment Analysis
NLP helps companies to analyze a large number of reviews of a product. It also allows their customers to
give a review of a particular product.
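Review analysis of the kind described under sentiment analysis can be sketched with a simple word-list approach. The two word lists and the function name sentiment are made up for this example; practical systems typically use much larger lexicons or trained models.

```python
import re

# Tiny hand-made sentiment lexicons (illustrative assumptions).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def sentiment(review):
    """Label a review by counting positive and negative words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "Great phone, I love it",
    "Terrible battery and poor screen",
    "It arrived on Monday",
]:
    print(review, "->", sentiment(review))
```

This sketch ignores negation ("not good") and context, which is one reason the later analysis stages matter.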
Future of NLP
Human-readable natural language processing is the biggest AI problem. It
is almost the same as solving the central artificial intelligence problem
and making computers as intelligent as people.
With the help of NLP, future computers or machines will be able to learn
from information online and apply it in the real world; however, a lot of
work remains to be done in this regard.
The Natural Language Toolkit (nltk) will become more effective.
Combined with natural language generation, computers will become more
capable of receiving and giving useful and resourceful information or data.
Advantages of NLP
Users can ask questions about any subject and get a direct response within
seconds.
An NLP system provides answers to questions in natural language.
An NLP system offers exact answers to questions, with no unnecessary or
unwanted information.
The accuracy of the answers increases with the amount of relevant
information provided in the question.
NLP helps computers communicate with humans in their own language
and scales other language-related tasks.
It allows you to process more language-based data than a human being
could, without fatigue and in an unbiased and consistent way.
It gives structure to a highly unstructured data source.
Disadvantages of NLP
Complex query language: the system may not be able to provide the
correct answer if the question is poorly worded or ambiguous.
The system is built for a single, specific task only; it is unable to adapt to
new domains and problems because of its limited functions.
An NLP system may lack a user interface with features that allow
users to interact further with the system.
