Natural Language Processing

The document discusses natural language processing (NLP) including common tasks like text classification, language generation, and language interaction. It also covers related concepts like word clouds, n-gram models, and grammars.

Uploaded by

apoorvashetty82

Natural Language Processing

What is NLP?
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI)
and Computer Science that is concerned with the interactions between
computers and humans in natural language. The goal of NLP is to develop
algorithms and models that enable computers to understand, interpret,
generate, and manipulate human languages.

Example (spam filters): Gmail uses natural language processing (NLP) to
discern which emails are legitimate and which are spam. These filters
examine the text of every email you receive and use its content to decide
whether it is spam.
NLP tasks
Text and speech processing: This includes speech recognition,
text-to-speech processing, encoding (i.e., converting speech or text to a
machine-readable representation), etc.
Text classification: This includes sentiment analysis, in which the machine
detects the opinions and emotions (and even sarcasm) expressed in a text
and classifies it accordingly.
Language generation: This includes tasks such as machine translation,
summary writing, essay writing, etc. which aim to produce coherent and
fluent text.
Language interaction: This includes tasks such as dialogue systems, voice
assistants, and chatbots, which aim to enable natural communication
between humans and computers.
Word Clouds
Word clouds are a popular visualization technique in data science used to
represent text data.
They are useful for summarizing large amounts of text by highlighting the
most frequently occurring words: each word is drawn at a size proportional
to its count.
[Figure: example word cloud]
This looks neat but doesn’t really tell us anything. A more interesting
approach might be to scatter the words so that horizontal position
indicates posting popularity and vertical position indicates résumé
popularity, which produces a visualization that conveys a few insights.
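As a rough illustration of sizing words in proportion to their counts, the sketch below counts the words in a list and scales each font size against the most frequent word. The function name font_sizes, the max_size parameter, and the sample words are assumptions made for this example; they are not taken from the original notes.

```python
from collections import Counter

def font_sizes(words, max_size=48):
    """Map each distinct word to a font size proportional to its count.

    The most frequent word gets max_size; every other word is scaled
    linearly by its count. Assumes `words` is non-empty.
    """
    counts = Counter(words)
    largest = max(counts.values())
    return {word: max_size * count / largest for word, count in counts.items()}

# Hypothetical sample data, for illustration only.
sample = ["data", "data", "python", "data", "python", "nlp"]
for word, size in font_sizes(sample).items():
    print(f"{word}: {size:.0f}pt")
```

A plotting library could then draw each word at its computed size; that step is left out here to keep the sketch self-contained.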
N-gram model
N-gram language models are a type of statistical language model used in
natural language processing (NLP).
They estimate the probability of a word occurring given the words that
precede it.
In an N-gram model, "N" is the length of the word sequences the model
counts, so each word is predicted from the N-1 words before it. For example:
-> Unigram (N=1): Considers each word on its own, with no preceding context.
-> Bigram (N=2): Considers pairs of consecutive words, predicting a word
from the one before it.
-> Trigram (N=3): Considers triplets of consecutive words, predicting a
word from the two before it.
-> N-gram (N>3): Considers sequences of N consecutive words.
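To make the bigram case concrete, here is a minimal sketch that estimates the probability of each word given the word before it by counting adjacent pairs. The function name bigram_probabilities and the toy sentence are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def bigram_probabilities(words):
    """Estimate P(next_word | previous_word) from adjacent word pairs."""
    follower_counts = defaultdict(Counter)
    for previous, following in zip(words, words[1:]):
        follower_counts[previous][following] += 1
    # Turn raw counts into conditional probabilities for each previous word.
    return {
        previous: {
            word: count / sum(followers.values())
            for word, count in followers.items()
        }
        for previous, followers in follower_counts.items()
    }

# Hypothetical toy corpus, already split into words and periods.
words = "the cat sat on the mat . the cat ran .".split()
probabilities = bigram_probabilities(words)
print(probabilities["the"])
```

In this toy corpus "the" is followed by "cat" twice and "mat" once, so the model assigns "cat" a probability of 2/3 after "the". A trigram model would work the same way, keyed on the two preceding words instead of one.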
To build an n-gram model we need some text to learn from. We’ll use the
Requests and Beautiful Soup libraries to retrieve the text of a web page,
but there are a couple of issues:

i) The first is that the apostrophes (’) in the text are actually the Unicode
character u"\u2019" (the right single quotation mark). A helper function
can replace them with normal apostrophes (').
For example, applying such a function to the string "It\u2019s a beautiful
day" returns "It's a beautiful day", where the Unicode apostrophe \u2019
has been replaced with a standard apostrophe (').
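The helper described above could look roughly like the following. The name fix_unicode is an assumption for this sketch, since the notes do not name the function.

```python
def fix_unicode(text):
    """Replace the Unicode right single quotation mark with a plain apostrophe."""
    return text.replace("\u2019", "'")

print(fix_unicode("It\u2019s a beautiful day"))
```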
ii) The second issue is that once we get the text of the web page, we’ll
want to split it into a sequence of words and periods (so that we can tell
where sentences end). We can do this using re.findall.
If the content "It\u2019s a beautiful day" is present in the article-body and
we tokenize it this way after replacing the apostrophe, the document list
will contain the individual words (and any periods) from the text, in order.
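A minimal sketch of this tokenization step is shown below. The regular expression is an assumption chosen to match "words and periods" (the notes do not show the pattern), and the input is a plain string rather than text fetched with Requests and parsed with Beautiful Soup, so that the example runs on its own.

```python
import re

# Assumed token pattern (not from the original notes): a run of word
# characters and apostrophes, or a single period.
TOKEN_PATTERN = r"[\w']+|\."

def tokenize(text):
    """Split text into a list of words and periods."""
    # Normalize the Unicode apostrophe first so contractions stay one token.
    normalized = text.replace("\u2019", "'")
    return re.findall(TOKEN_PATTERN, normalized)

document = tokenize("It\u2019s a beautiful day. Let\u2019s go outside.")
print(document)
```

Keeping the periods as separate tokens lets a later step detect where sentences end, for example when generating text from a bigram model.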
Grammar
A different approach to modeling language is with grammars, rules for
generating acceptable sentences.
By convention here, names starting with underscores refer to rules that
need further expanding, and other names are terminals that don’t need
further processing.
"_S" is the “sentence” rule, which produces an "_NP" (“noun phrase”) rule
followed by a "_VP" (“verb phrase”) rule.
The verb phrase rule can produce either the "_V" (“verb”) rule, or the verb
rule followed by the noun phrase rule.
The "_NP" rule contains itself in one of its productions. Grammars can be
recursive, which allows even finite grammars like this to generate infinitely
many different sentences.
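The sketch below shows one way such a grammar could be represented and expanded. The grammar dictionary is a hypothetical example that follows the conventions described above ("_S", "_NP", "_VP", "_V", with a recursive "_NP" production); the remaining rules, the vocabulary, and the function names are assumptions for illustration.

```python
import random

# Hypothetical toy grammar: names starting with "_" are rules to expand,
# everything else is a terminal.
GRAMMAR = {
    "_S": ["_NP _VP"],
    "_NP": ["_N", "_A _NP _P _A _N"],  # the second production is recursive
    "_VP": ["_V", "_V _NP"],
    "_N": ["data science", "Python", "regression"],
    "_A": ["big", "linear", "logistic"],
    "_P": ["about", "near"],
    "_V": ["learns", "trains", "tests", "is"],
}

def is_terminal(token):
    """A token is terminal if it does not start with an underscore."""
    return not token.startswith("_")

def generate_sentence(grammar, rng):
    """Expand "_S" until only terminals remain, choosing productions at random."""
    tokens = ["_S"]
    while True:
        # Find the first token that still needs expanding.
        index = next((i for i, t in enumerate(tokens) if not is_terminal(t)), None)
        if index is None:
            return tokens
        production = rng.choice(grammar[tokens[index]])
        tokens[index:index + 1] = production.split()

print(" ".join(generate_sentence(GRAMMAR, random.Random(0))))
```

Because "_NP" can produce another "_NP", repeated runs with different seeds can yield sentences of very different lengths, which is the recursion described above at work.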
