NLP Question Paper Solution

Q1. Define the Markov property. Write a short note on the Markov model.

Ans. Markov Property: A process has the Markov property if the probability of moving to a future
state depends only on the present state and not on the past.

 A Markov model is a stochastic method for randomly changing systems that possess the
Markov property.
 This means that, at any given time, the next state is only dependent on the current state and
is independent of anything in the past.
 Two commonly applied types of Markov model are used when the system being represented
is autonomous -- that is, when the system isn't influenced by an external agent.
 These are as follows:

1. Markov chains.

 These are the simplest type of Markov model and are used to represent systems
where all states are observable.
 Markov chains show all possible states, and between states, they show the
transition rate, which is the probability of moving from one state to another per unit
of time.
 Applications of this type of model include prediction of market crashes, speech
recognition and search engine algorithms.

2. Hidden Markov models.

 These are used to represent systems with some unobservable states.


 In addition to showing states and transition rates, hidden Markov models also
represent observations and observation likelihoods for each state.
 Hidden Markov models are used for a range of applications, including
thermodynamics, finance and pattern recognition.
 (Not part of the hidden Markov model topic) Another two commonly applied types of Markov model are used when the system being represented is controlled -- that is, when the system is influenced by a decision-making agent.
 These are as follows:

1. Markov decision processes.

 These are used to model decision-making in discrete, stochastic, sequential environments.
 In these processes, an agent makes decisions based on reliable information.
 These models are applied to problems in artificial intelligence (AI), economics and
behavioural sciences.

2. Partially observable Markov decision processes.

 These are used in cases like Markov decision processes but with the assumption
that the agent doesn't always have reliable information.
 Applications of these models include robotics, where it isn't always possible to
know the location.
 Another application is machine maintenance, where reliable information on
machine parts can't be obtained because it's too costly to shut down the
machine to get the information.
(New topic) How is Markov analysis applied?

 Markov analysis is a probabilistic technique that uses Markov models to predict the future behaviour of some variable based on the current state.
 Markov analysis is used in many domains, including the following:
 Markov chains are used for several business applications, including predicting
customer brand switching for marketing, predicting how long people will remain
in their jobs for human resources, predicting time to failure of a machine in
manufacturing, and forecasting the future price of a stock in finance.
 Markov analysis is also used in natural language processing (NLP) and in
machine learning.
 For NLP, a Markov chain can be used to generate a sequence of words that form
a complete sentence, or a hidden Markov model can be used for named-entity
recognition and tagging parts of speech.
 For machine learning, Markov decision processes are used to represent reward
in reinforcement learning.
 A recent example of the use of Markov analysis in healthcare was in Kuwait.
 A continuous-time Markov chain model was used to determine the optimal
timing and duration of a full COVID-19 lockdown in the country, minimizing both
new infections and hospitalizations.
 The model suggested that a 90-day lockdown beginning 10 days before the
epidemic peak was optimal.
 (New topic) How are Markov models represented?
 The simplest Markov model is a Markov chain, which can be expressed in
equations, as a transition matrix or as a graph.
 A transition matrix is used to indicate the probability of moving from each state
to each other state.
 Generally, the current states are listed in rows, and the next states are
represented as columns.
 Each cell then contains the probability of moving from the current state to the
next state.
 For any given row, all the cell values must then add up to one.
 A graph consists of circles, each of which represents a state, and directional
arrows to indicate possible transitions between states.
 The directional arrows are labeled with the transition probability.
 The transition probabilities on the directional arrows coming out of any given
circle must add up to one.
 Other Markov models are based on the chain representations but with added
information, such as observations and observation likelihoods.
 The transition matrix below represents shifting gears in a car with a manual
transmission.
 Six states are possible, and a transition from any given state to any other state
depends only on the current state -- that is, where the car goes from second
gear isn't influenced by where it was before second gear.
 Such a transition matrix might be built from empirical observations that show,
for example, that the most probable transitions from first gear are to second or
neutral.
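
As an illustration of the ideas above, the short Python sketch below builds a hypothetical gear transition matrix (the probabilities are invented for illustration, not taken from the original figure), checks that each row sums to one, and samples a few transitions, where each next state depends only on the current state:

```python
import numpy as np

# Hypothetical gear states and transition probabilities (illustrative values only).
states = ["neutral", "first", "second", "third", "fourth", "fifth"]
P = np.array([
    [0.2, 0.8, 0.0, 0.0, 0.0, 0.0],   # from neutral
    [0.3, 0.1, 0.6, 0.0, 0.0, 0.0],   # from first: most likely second or neutral
    [0.1, 0.2, 0.1, 0.6, 0.0, 0.0],   # from second
    [0.0, 0.0, 0.3, 0.1, 0.6, 0.0],   # from third
    [0.0, 0.0, 0.0, 0.3, 0.2, 0.5],   # from fourth
    [0.0, 0.0, 0.0, 0.0, 0.4, 0.6],   # from fifth
])

# Every row of a valid transition matrix must sum to one.
assert np.allclose(P.sum(axis=1), 1.0)

# Simulate a short run of the chain: the next state depends only on the current one.
rng = np.random.default_rng(0)
state = 1  # start in first gear
for _ in range(5):
    state = rng.choice(len(states), p=P[state])
    print(states[state])
```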

Q2. Define context-free grammar. Explain the role of CFG in NLP.

Ans. A context-free grammar (CFG) is a list of rules that define the set of all well-formed sentences in
a language.

 Each rule has a left-hand side, which identifies a syntactic category, and a right-hand side,
which defines its alternative component parts, reading from left to right.
 Syntax Modelling: -
 CFGs are used to define the syntactic rules of a language.
 In NLP, a CFG can represent the syntax of a natural language, specifying how words and
phrases can be combined to form grammatical sentences.
 This is crucial for understanding the structure of sentences and the relationships between
different linguistic elements.
 Parsing:-
 CFGs are employed in parsing algorithms, which are fundamental for determining the
grammatical structure of a sentence.
 Parsing involves breaking down a sentence into its constituent parts (e.g., phrases and
clauses) and constructing a syntactic tree or structure that represents the grammatical
relationships between these parts.
 CFGs serve as the basis for many parsing algorithms, including top-down and bottom-up
parsing techniques.
 Grammatical Analysis: -
 CFGs allow linguists and NLP researchers to analyse the grammatical properties of a
language.
 By defining CFG rules, linguists can identify and categorize various sentence structures,
including the identification of nouns, verbs, adjectives, and other parts of speech.
 This analysis is vital for developing linguistic theories and understanding the nuances of
language.
 Language Generation: -
 CFGs are used in reverse for language generation tasks.
 Instead of parsing sentences, CFGs can be employed to generate syntactically correct
sentences or text.
 This is crucial in applications like machine translation, text summarization, and chatbot
responses, where coherent and grammatical language output is required.
 Ambiguity Resolution: -
 CFGs help in handling syntactic ambiguity in natural language.
 Many sentences can have multiple valid parse trees due to inherent ambiguities in language.
 CFGs provide a structured way to represent these ambiguities, which is important for
disambiguation and improving the accuracy of NLP systems.
 Grammar Checking:
 CFGs can be used in grammar checking tools to identify and correct grammatical errors in
text.
 By comparing a given sentence against the CFG rules, it's possible to flag and suggest
corrections for violations of the language's syntax.
 Machine Learning Models:
 CFGs are sometimes integrated into machine learning models for NLP tasks.
 They can serve as a source of structured linguistic knowledge, which can be combined with
statistical and neural network-based approaches to enhance the performance of various NLP
applications.
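
As a minimal sketch of how a CFG drives parsing (using NLTK's grammar and chart-parser utilities; the toy grammar and sentence are invented for illustration):

```python
import nltk

# A tiny toy grammar: each rule has a left-hand side category
# and a right-hand side listing its component parts.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N  -> 'dog' | 'ball'
V  -> 'chased' | 'saw'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chased a ball".split()

# Print every parse tree licensed by the grammar for this sentence.
for tree in parser.parse(sentence):
    tree.pretty_print()
```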

Q3. Explain word morphology.

Ans. In linguistics, morphology is the study of the internal structure and functions of words, and how words are formed from smaller meaningful units called morphemes.

 Morphology, the study of the correspondences between grammatical information, meaning, and form inside words, is one of the central linguistic disciplines.
 In simpler terms, morphology is the knowledge of the meaningful components of words.
 It is the study of the following different aspects of natural language;
(Important parts of a morphological processor)
 Lexicon
 It refers to the dictionary of words (stem/root word), their categories (noun, verb, adjective,
etc.), their sub-categories (singular noun, plural noun, etc.) and the affixes that can be
attached to these stems.
 Orthographic rules
 It refers to the spelling rules used in a particular language to model the spelling changes that
occur in a word.
 For example, when a stem ‘cook’ is attached with the morpheme ‘ing’, it becomes a new
valid word with the spelling ‘cooking’.
 What happened here was the concatenation of two strings. This is not true for all words.
 If we attach the morpheme ‘ing’ with the word ‘care’, the spelling changes to ‘caring’ rather
than ‘careing’.
 In this case, it was not just concatenation but it also involved removal of the letter ‘e’.
 Morphotactics
 In a natural language, a word may have many morphemes attached to it as affixes (prefix,
suffix, infix, circumfix).
 Morphotactics is about placing morphemes with stem to form a meaningful word.
 It specifies which classes of morphemes can follow which other classes of morphemes.
 For example, in English the plural morpheme (-s or -es) follows the stem.
 Types of morphology:
 The following are the broad classes of morphology;
1. Inflectional morphology - Inflection creates different forms of the same word
 It is one of the ways to combine morphemes with stems.
 Inflectional morphology conveys grammatical information, such as number, tense,
agreement or case.
 Due to inflectional morphological process, the meaning and categories of the new inflected
words usually do not change.
 That is, a noun can be inflected to a noun while adding affixes, a verb can be inflected to a
verb in different tense in English.
 We would say, the root word (stem) is inflected to form other words of same meaning and
category.
 Inflection creates different forms of the same word.
 In English, only nouns and verbs can be inflected (sometimes adjectives too). English has relatively few inflectional morphemes compared with some other languages.
 Example:
Category   Stem    Affix   Inflected word
Noun       word    -s      words
Noun       box     -es     boxes
Verb       treat   -s      treats
Verb       treat   -ing    treating
Verb       treat   -ed     treated

 In the above example,


 The inflectional morpheme '-s' is combined with the noun stem 'word' to create plural
noun 'words'.
 The inflectional morpheme '-ing' is combined with the verb stem 'treat' to create a
gerund 'treating'.
2. Derivational morphology - Derivation creates different words from the same lemma
 Derivation is the process of creating new words from a stem/base form of a word.
 One of the most common ways to derive new words is to combine derivational affixes
with root words (stems).
 The new words formed through derivational morphology may be a stem for another
affix.
 We would say, new words are derived from the root words in this type of morphology.
 English derivation is relatively complex, for one or more of the following reasons:
 It is less productive. That is, a morpheme that can be added to one set of verbs to make new meaningful words cannot always be added to all verbs.
 For example, the base word 'summarize' can be combined with the grammatical morpheme '-ation' to produce the word 'summarization', whereas this morpheme cannot be added to all verbs to the same effect.
 There are complex meaning differences among nominalizing suffixes.
 For example, the words 'conformation' and 'conformity' are both derived from the stem 'conform', but their meanings are entirely different.
 Derivation creates different words from the same lemma.
 Example:
Category    Stem    Affix   Derived word   Target category
Noun        vapor   -ize    vaporize       Verb
Verb        read    -er     reader         Noun
Adjective   real    -ize    realize        Verb
Noun        mouth   -ful    mouthful       Adjective

3. Compounding - Combination of multiple word stems together.
4. Cliticization - Combination of a word stem with a clitic.
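
The sketch below illustrates the lexicon and orthographic-rule ideas discussed above with a toy suffixation function for '-ing' (a deliberately simplified rule written for illustration, not a full morphological analyzer):

```python
def add_ing(stem: str) -> str:
    """Attach the '-ing' morpheme, applying a simple orthographic rule:
    drop a final silent 'e' (care -> caring) before concatenating."""
    if stem.endswith("e") and not stem.endswith("ee"):
        stem = stem[:-1]          # removal of the letter 'e'
    return stem + "ing"

print(add_ing("cook"))   # cooking  (plain concatenation)
print(add_ing("care"))   # caring   (spelling change: 'e' removed)
```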
Q4. Write a short note on machine translation.

Ans. What is machine translation?

 Machine translation is the process of using artificial intelligence to automatically translate text from one language to another without human involvement.
 Modern machine translation goes beyond simple word-to-word translation to communicate
the full meaning of the original language text in the target language.
 It analyzes all text elements and recognizes how the words influence one another.
 What are the benefits of machine translation?
 Human translators use machine translation services to translate faster and more efficiently.
 We give some benefits of machine translation below:
1. Automated translation assistance
 Machine translation provides a good starting point for professional human
translators.
 Many translation management systems integrate one or more machine translation
models into their workflow.
 They have settings to run translations automatically, then send them to human
translators for post-editing.
2. Speed and volume
 Machine translation works very fast, translating millions of words almost
instantaneously.
 It can translate large amounts of data, such as real-time chat or large-scale legal
cases.
 It can also process documents in a foreign language, search for relevant terms, and
remember those terms for future applications.
3. Large language selection
 Many major machine translation providers offer support for 50-100+ languages.
 Translations also happen simultaneously for multiple languages, which is useful for
global product rollouts and documentation updates.
4. Cost-effective translation
 Machine translation increases productivity and the ability to deliver translations
faster, reducing the time to market.
 There is less human involvement in the process as machine translation provides
basic but valuable translations, reducing both the cost and time of delivery.
 For example, in high-volume projects, you can integrate machine translation with
your content management systems to automatically tag and organize the content
before translating it to different languages.
 What are some use cases of machine translation?
 There are several use cases of machine translation, such as those given below:
1. Internal communication
 For a company operating in different countries across the world,
communication can be difficult to manage.
 Language skills can vary from employee to employee, and some may not
understand the company’s official language well enough.
 Machine translation helps to lower or eliminate the language barrier in
communication. Individuals quickly obtain a translation of the text and
understand the content's core message.
 You can use it to translate presentations, company bulletins, and other
common communication.
2. External communication
 Companies use machine translation to communicate more efficiently with
external stakeholders and customers.
 For instance, you can translate important documents into different languages
for global partners and customers.
 If an online store operates in many different countries, machine translation can
translate product reviews so customers can read them in their own language.
3. Data analysis
 Some types of machine translation can process millions of user-generated
comments and deliver highly accurate results in a short timeframe.
 Companies translate the large amounts of content posted on social media and websites every day and use the translations for analytics.
 For example, they can automatically analyze customer opinions written in various
languages.
4. Online customer service
 With machine translation, brands can interact with customers all over the world, no
matter what language they speak.
 For example, they can use machine translation to:
 Accurately translate requests from customers all over the world

 Increase the scale of live chat and automate customer service emails
 Improve the customer experience without hiring more employees

5. Legal research
 The legal department uses machine translation for preparing legal documents in
different countries.
 With machine translation, a large amount of content becomes available for analysis
that would have been difficult to process in different languages.

What are the different approaches to machine translation?


 In machine translation, the original text or language is called the source language, and the language you want to translate it to is called the target language.
 Machine translation works by following a basic two-step process:

1. Decode the source language meaning of the original text


2. Encode the meaning into the target language
 Below are some common approaches that language translation technology uses to implement this machine translation process.

1. Rule-based machine translation

 Language experts develop built-in linguistic rules and bilingual dictionaries for specific
industries or topics.
 Rule-based machine translation uses these dictionaries to translate specific content
accurately.
 The steps in the process are:

1. The machine translation software parses the input text and creates a transitional
representation
2. It converts the representation into target language using the grammar rules and dictionaries
as a reference
 Pros and cons
 Rule-based machine translation can be customized to a specific industry or topic.
 It is predictable and provides quality translation. However, it produces poor results if the
source text has errors or uses words not present in the built-in dictionaries.
 The only way to improve it is by manually updating dictionaries regularly.

2. Statistical machine translation

 Instead of relying on linguistic rules, statistical machine translation uses machine learning to
translate text.
 The machine learning algorithms analyze large amounts of human translations that already
exist and look for statistical patterns.
 The software then makes an intelligent guess when asked to translate a new source text.
 It makes predictions on the basis of the statistical likelihood that a specific word or phrase
will be with another word or phrase in the target language.

3. Syntax-based machine translation

 Syntax-based machine translation is a sub-category of statistical machine translation.


 It uses grammatical rules to translate syntactic units.
 It analyzes sentences to incorporate syntax rules into statistical translation models.
 Pros and cons
 Statistical methods require training on millions of words for every language pair.
 However, with sufficient data the machine translations are accurate.

4. Neural machine translation

 Neural machine translation uses artificial intelligence to learn languages, and to continuously
improve that knowledge using a specific machine learning method called neural networks.
 It often works in combination with statistical translation methods.
 Neural network
 A neural network is an interconnected set of nodes inspired by the human brain.
 It is an information system where input data passes through several interconnected nodes to
generate an output.
 Neural machine translation software uses neural networks to work with enormous datasets.
 Each node makes one attributed change of source text to target text until the output node
gives the final result.
 Neural machine translation vs other translation methods
 Neural networks consider the whole input sentence at each step when producing the output sentence, whereas other machine translation models break an input sentence into sets of words and phrases, mapping them to a word or sentence in the target language.
 Neural machine translation systems can address many limitations of other methods and
often produce better quality translations.

5. Hybrid machine translation

 Hybrid machine translation tools use two or more machine translation models on one piece
of software.
 You can use the hybrid approach to improve the effectiveness of a single translation model.
 This machine translation process commonly uses rule-based and statistical machine
translation subsystems.
 The final translation output is the combination of the output of all subsystems.
 Pros and cons
 Hybrid machine translation models successfully improve translation quality by overcoming
the issues linked with single translation methods.
 What is a computer-assisted translation tool?
 Computer-assisted translation (CAT) tools work alongside machine translation software to
support text translation.
 CAT tools automate translation-related tasks such as editing, managing, and storing
translations.
 Text is inputted into the CAT software and divided into segments, such as phrases,
sentences, or paragraphs.
 The software saves each segment and its translation in a database, speeding up the
translation process and guaranteeing consistency with previous translations.
 Many global companies use CAT software tools to automate projects that require
translation.
 Automated translation
 Automated translation refers to any automation built into the CAT tool to carry out
repetitive translation-related tasks.
 Automated translation works with triggers embedded in the text that tell the system to use
automation.
 For example, you can use it to insert commonly used text into documents from a database.
 What is the most accurate machine translation technology?
 Neural machine translation is universally accepted as the most accurate, versatile, and fluent
machine translation approach.
 Since its invention in the mid-2010s, neural machine translation has become the most
advanced machine translation technology.
 It is more accurate than statistical machine translation, from fluency to generalization.
 It is now considered the standard in machine translation development.
 The performance of a machine translator depends on several factors, including the:

1. Machine translation engine or technology


2. Language pair
3. Available training data
4. Text types for translation. As the software performs more translations for a specific
language or domain, it will produce higher-quality output. Once trained, neural
machine translation becomes more accurate and faster, and it becomes easier to add new languages.

 Can machine translation replace human translation?

 Machine translation can replace human translation in a few instances where it makes sense
and is required in high volumes.

 For example, many service-related companies use machine translation to help customers via
an instant chat feature or quickly respond to emails.

 However, if you translate more in-depth content, such as web pages or mobile applications,
the translation may be inaccurate.

 It is important to have a human translator edit the content before use.
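
As a brief sketch of neural machine translation in practice, the snippet below uses the Hugging Face transformers translation pipeline (the specific pretrained model name is an assumption used for illustration; any translation model could be substituted):

```python
from transformers import pipeline

# Load a pretrained neural machine translation model (assumed model name, for illustration).
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("Machine translation converts text from a source language to a target language.")
print(result[0]["translation_text"])
```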

Q5. Name two popular applications of natural language processing and explain them briefly.

Ans. 1) Chatbots
 Chatbots are a form of artificial intelligence that are programmed to interact with humans
in such a way that they sound like humans themselves.
 Depending on the complexity of the chatbots, they can either just respond to specific
keywords or they can even hold full conversations that make it tough to distinguish them
from humans.
 Chatbots are created using Natural Language Processing and Machine Learning, which
means that they understand the complexities of the English language and find the actual
meaning of the sentence and they also learn from their conversations with humans and
become better with time.
 Chatbots work in two simple steps.
 First, they identify the meaning of the question asked and collect all the data from the
user that may be required to answer the question.
 Then they answer the question appropriately.
2) Autocomplete in Search Engines
 Have you noticed that search engines tend to guess what you are typing and automatically
complete your sentences?
 For example, on typing “game” in Google, you may get further suggestions for “game of
thrones”, “game of life” or if you are interested in maths then “game theory”.
 All these suggestions are provided using autocomplete that uses Natural Language
Processing to guess what you want to ask.
 Search engines use their enormous data sets to analyze what their customers are probably
typing when they enter particular words and suggest the most common possibilities.
 They use Natural Language Processing to make sense of these words and how they are
interconnected to form different sentences.
3) Voice Assistants
 These days voice assistants are all the rage! Whether it's Siri, Alexa, or Google Assistant, almost everyone uses one of these to make calls, place reminders, schedule meetings, set alarms, surf the internet, etc.
 These voice assistants have made life much easier.
 But how do they work?
 They use a complex combination of speech recognition, natural language understanding,
and natural language processing to understand what humans are saying and then act on it.
 The long term goal of voice assistants is to become a bridge between humans and the
internet and provide all manner of services based on just voice interaction.
 However, they are still a little far from that goal seeing as Siri still can’t understand what
you are saying sometimes!
4) Language Translator
 Want to translate a text from English to Hindi but don’t know Hindi?
 Well, Google Translate is the tool for you!
 While it’s not exactly 100% accurate, it is still a great tool to convert text from one
language to another.
 Google Translate and other translation tools use sequence-to-sequence modeling, a technique in Natural Language Processing.
 It allows the algorithm to convert a sequence of words from one language into a sequence of words in another language, which is translation.
 Earlier, language translators used Statistical machine translation (SMT) which meant they
analyzed millions of documents that were already translated from one language to
another (English to Hindi in this case) and then looked for the common patterns and basic
vocabulary of the language.
 However, this method was not that accurate as compared to Sequence to sequence
modeling.
5) Sentiment Analysis
 Almost all the world is on social media these days! And companies can use sentiment
analysis to understand how a particular type of user feels about a particular topic,
product, etc.
 They can use natural language processing, computational linguistics, text analysis, etc. to
understand the general sentiment of the users for their products and services and find out
if the sentiment is good, bad, or neutral.
 Companies can use sentiment analysis in a lot of ways such as to find out the emotions of
their target audience, to understand product reviews, to gauge their brand sentiment, etc.
 And not just private companies, even governments use sentiment analysis to find popular
opinion and also catch out any threats to the security of the nation.
6) Grammar Checkers
 Grammar and spelling are very important when writing professional reports for your superiors or even assignments for your lecturers.
 After all, having major errors may get you fired or failed! That’s why grammar and spell
checkers are a very important tool for any professional writer.
 They can not only correct grammar and check spellings but also suggest better synonyms
and improve the overall readability of your content.
 And guess what, they utilize natural language processing to provide the best possible piece
of writing!
 The NLP algorithm is trained on millions of sentences to understand the correct format.
 That is why it can suggest the correct verb tense, a better synonym, or a clearer sentence
structure than what you have written.
 Some of the most popular grammar checkers that use NLP include Grammarly,
WhiteSmoke, ProWritingAid, etc.
7) Email Classification and Filtering
 Emails are still the most important method for professional communication.
 However, all of us still get thousands of promotional Emails that we don’t want to read.
 Thankfully, our emails are automatically divided into 3 sections namely, Primary, Social,
and Promotions which means we never have to open the Promotional section!
 But how does this work? Email services use natural language processing to identify the
contents of each Email with text classification so that it can be put in the correct section.
 This method is not perfect since there are still some Promotional newsletters in Primary, but it's better than nothing. In more advanced cases, some companies also use specialty
anti-virus software with natural language processing to scan the Emails and see if there
are any patterns and phrases that may indicate a phishing attempt on the employees.
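
As a toy illustration of the autocomplete idea above, the sketch below builds a simple bigram model from a tiny invented corpus (standing in for a search engine's query logs) and suggests the most likely next word:

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for a search engine's query logs.
corpus = [
    "game of thrones", "game of life", "game theory",
    "game of thrones cast", "game of life rules",
]

# Count which word follows which (a simple bigram model).
following = defaultdict(Counter)
for query in corpus:
    words = query.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def suggest(prefix_word: str, k: int = 2):
    """Return the k most common continuations seen after prefix_word."""
    return [w for w, _ in following[prefix_word].most_common(k)]

print(suggest("game"))  # e.g. ['of', 'theory']
print(suggest("of"))    # e.g. ['thrones', 'life']
```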
Q6. Write a short note on constituency parsing.
Ans. Constituency parsing is the process of analyzing sentences by breaking them down into sub-phrases, also known as constituents.
 These sub-phrases belong to a specific category of grammar like NP (noun phrase) and
VP(verb phrase).
 Constituency parsing is a natural language processing technique that is used to analyze the
grammatical structure of sentences.
 It is a type of syntactic parsing that aims to identify the constituents, or subparts, of a
sentence and the relationships between them.
 The output of a constituency parser is typically a parse tree, which represents the
hierarchical structure of the sentence.
 The process of constituency parsing involves identifying the syntactic structure of a
sentence by analyzing its words and phrases.
 This typically involves identifying the noun phrases, verb phrases, and other constituents,
and then determining the relationships between them.
 The parser uses a set of grammatical rules and a grammar model to analyze the sentence
and construct a parse tree.
 Constituency parsing is an important step in natural language processing and is used in a
wide range of applications, such as natural language understanding, machine translation,
and text summarization.
 Constituency parsing is different from dependency parsing, which aims to identify the
syntactic relations between words in a sentence.
 Constituency parsing focuses on the hierarchical structure of the sentence, while
dependency parsing focuses on the linear structure of the sentence.
 Both techniques have their own advantages and can be used together to better
understand a sentence.
 Some challenges in Constituency Parsing are long-distance dependencies, syntactic
ambiguity, and the handling of idiomatic expressions, which makes the parsing process
more complex.
 Applications of Constituency Parsing
 Constituency parsing is a process of identifying the constituents (noun phrases, verbs,
clauses, etc.) in a sentence and grouping them into a tree-like structure that represents
the grammatical relationships among them.
 The following are some of the applications of constituency parsing:
1. Natural Language Processing (NLP) – It is used in various NLP tasks such as text
summarization, machine translation, question answering, and text classification.
2. Information Retrieval – It is used to extract information from large corpora and to
index it for efficient retrieval.
3. Text-to-Speech – It helps in generating human-like speech by understanding the
grammar and structure of the text.
4. Sentiment Analysis – It helps in determining the sentiment of a text by identifying
positive, negative, or neutral sentiments in the constituents.
5. Text-based Games and Chatbots – It helps in generating more human-like responses in
text-based games and chatbots.
6. Text Summarization – It is used to summarize large texts by identifying the most
important constituents and representing them in a compact form.
7. Text Classification – It is used to classify text into predefined categories by analyzing
the constituent structure and relationships.
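
A minimal sketch of a constituency parse tree using NLTK's Tree class (the bracketed parse below is written by hand for illustration):

```python
from nltk import Tree

# A hand-written constituency parse: phrases (NP, VP, PP) group into a sentence (S).
parse = Tree.fromstring(
    "(S (NP (Det the) (N cat)) (VP (V sat) (PP (P on) (NP (Det the) (N mat)))))"
)

parse.pretty_print()          # draw the hierarchical structure as ASCII art
for subtree in parse.subtrees():
    print(subtree.label())    # list the constituent labels (S, NP, VP, ...)
```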
Q7. Write a note on Word sense disambiguation.
Ans. Word Sense Disambiguation (WSD) is a subfield of Natural Language Processing (NLP) that
deals with determining the intended meaning of a word in a given context.
 It is the process of identifying the correct sense of a word from a set of possible senses,
based on the context in which the word appears.
 WSD is important for natural language understanding and machine translation, as it can
improve the accuracy of these tasks by providing more accurate word meanings.
 Some common approaches to WSD include using WordNet, supervised machine learning,
and unsupervised methods such as clustering.
 The noun ‘star’ has eight different meanings or senses. An idea can be mapped to each
sense of the word. For example,
 “He always wanted to be a Bollywood star.” The word ‘star’ can be described as
“A famous and good singer, performer, sports player, actor, personality, etc.”
 “The Milky Way galaxy contains between 200 and 400 billion stars”. In this, the
word star means “a big ball of burning gas in space that we view as a point of light
in the night sky.”
 Difficulties in Word Sense Disambiguation
 There are some difficulties faced by Word Sense Disambiguation (WSD).
1. Different Text-Corpus or Dictionary:
 One issue with word sense disambiguation is determining what the senses are because
different dictionaries and thesauruses divide words into distinct senses.
 Some academics have proposed employing a specific lexicon and its set of senses to
address this problem.
 In general, however, research findings based on broad sense distinctions have
outperformed those based on limited ones.
 The majority of researchers are still working on fine-grained WSD.
2. PoS Tagging:
 Part-of-speech tagging and sense tagging have been shown to be very tightly coupled
in any real test, with each potentially constraining the other.
 WSD involves both disambiguating words and tagging them, much as in part-of-speech tagging.
 However, algorithms designed for one do not always work well for the other, owing to
the fact that a word’s part of speech is mostly decided by the one to three words
immediately adjacent to it, whereas a word’s sense can be determined by words
further away.
 Sense Inventories for Word Sense Disambiguation
 Sense inventories are collections of words, abbreviations, and acronyms together with their possible senses.
 Some of the examples used in Word Sense Disambiguation are:
1. Princeton WordNet: is a vast lexicographic database of English and other languages that is
manually curated.
 For WSD, this is the de facto standard inventory.
 Its well-organized Synsets, or clusters of contextual synonyms, are nodes in a network.
2. BabelNet: is a multilingual dictionary that covers both lexicographic and encyclopedic
terminology.
 It was created by semi-automatically mapping numerous resources, including
WordNet, multilingual versions of WordNet, and Wikipedia.
3. Wiktionary: a collaborative project aimed at creating a dictionary for each language
separately, is another inventory that has recently gained popularity.
Approaches for Word Sense Disambiguation
 There are many approaches to Word Sense Disambiguation. The three main
approaches are given below:
1. Supervised: The assumption behind supervised approaches is that the context can
supply enough evidence to disambiguate words on its own (hence, world
knowledge and reasoning are deemed unnecessary).
 Supervised methods for Word Sense Disambiguation (WSD) involve training a model
using a labeled dataset of word senses.
 The model is then used to disambiguate the sense of a target word in new text.
 Some common techniques used in supervised WSD include:
1. Decision list: A decision list is a set of rules that are used to assign a sense to a target word
based on the context in which it appears.
2. Neural Network: Neural networks such as feedforward networks, recurrent neural
networks, and transformer networks are used to model the context-sense relationship.
3. Support Vector Machines: SVM is a supervised machine learning algorithm used for
classification and regression analysis.
4. Naive Bayes: Naive Bayes is a probabilistic algorithm that uses Bayes’ theorem to classify
text into predefined categories.
5. Decision Trees: Decision Trees are a flowchart-like structure in which an internal node
represents feature(or attribute), the branch represents a decision rule, and each leaf node
represents the outcome.
 Random Forest: Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees.
 Supervised WSD Exploiting Glosses: Textual definitions are a prominent source of
information in sense inventories (also known as glosses).
 Definitions, which follow the format of traditional dictionaries, are a quick and easy
way to clarify sense distinctions
 Purely Data-Driven WSD: In this case, a token tagger is a popular baseline model that
generates a probability distribution over all senses in the vocabulary for each word in a
context.
 Supervised WSD Exploiting Other Knowledge: Additional sources of knowledge, both
internal and external to the knowledge base, are also beneficial to WSD models.
 Some researchers use BabelNet translations to fine-tune the output of any WSD
system by comparing the output senses’ translations to the target’s translations
provided by an NMT system.

2. Unsupervised:
 The underlying assumption is that similar senses occur in similar contexts, and thus senses
can be induced from the text by clustering word occurrences using some measure of
similarity of context.
 Using fixed-size dense vectors (word embeddings) to represent words in context has
become one of the most fundamental blocks in several NLP systems.
 Traditional word embedding approaches can still be utilized to improve WSD, despite the
fact that they conflate words with many meanings into a single vector representation.
 Lexical databases (e.g., WordNet, ConceptNet, BabelNet) can also help unsupervised
systems map words and their senses as dictionaries, in addition to word embedding
techniques.
3. Knowledge-Based:
 It is built on the idea that words used in a text are related to one another, and that this
relationship can be seen in the definitions of the words and their meanings.
 The pair of dictionary senses having the highest word overlap in their dictionary meanings
are used to disambiguate two (or more) words.
 Lesk Algorithm is the classical algorithm based on Knowledge-Based WSD.
 Lesk algorithm assumes that words in a given “neighborhood” (a portion of text) will have
a similar theme.
 The dictionary definition of an uncertain word is compared to the terms in its
neighborhood in a simplified version of the Lesk algorithm.
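
NLTK ships a simplified Lesk implementation; the sketch below applies it to the two senses of 'star' mentioned earlier (the exact synsets returned depend on WordNet's sense inventory, so the outputs noted in the comments are only indicative):

```python
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

sent1 = word_tokenize("He always wanted to be a Bollywood star")
sent2 = word_tokenize("The Milky Way galaxy contains between 200 and 400 billion stars")

# The simplified Lesk algorithm picks the WordNet sense whose gloss
# overlaps most with the surrounding context words.
print(lesk(sent1, "star", "n"))   # e.g. a celebrity-related synset
print(lesk(sent2, "star", "n"))   # e.g. the astronomical-body synset
```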
Q8. Explain information extraction with an example.
Ans. Information extraction (IE) is the automated retrieval of specific information related to a
selected topic from a body or bodies of text.

 Information extraction tools make it possible to pull information from text documents,
databases, websites or multiple sources.
 IE may extract info from unstructured, semi-structured or structured, machine-readable
text.
 Usually, however, IE is used in natural language processing (NLP) to extract structured information from unstructured text.
 Information extraction depends on named entity recognition (NER), a sub-tool used to find
targeted information to extract.
 NER recognizes entities first as one of several categories such as location (LOC), persons
(PER) or organizations (ORG).
 Once the information category is recognized, an information extraction utility extracts the
named entity’s related information and constructs a machine-readable document from it,
which algorithms can further process to extract meaning.
 IE finds meaning by way of other subtasks including co-reference resolution, relationship
extraction, language and vocabulary analysis and sometimes audio extraction.
 IE dates back to the early days of Natural Language Processing in the 1970s.
 JASPER, an IE system built for Reuters by Carnegie Mellon University, is an early example.
 Current efforts in multimedia document processing, such as automatic annotation and content recognition and extraction from images and video, can be seen as IE as well.
 Because of the complexity of language, high-quality IE is a challenging task for artificial
intelligence (AI) systems.

OR

 Information extraction (IE)


 The task of automatically extracting structured information from unstructured and/or semi-
structured machine-readable documents and other electronically represented sources is
known as information extraction (IE).

 When there are a significant number of customers, manually assessing Customer Feedback,
for example, can be tedious, error-prone, and time-consuming.
 There's a good chance we'll overlook a dissatisfied consumer.
 Fortunately, sentiment analysis can aid in the improvement of customer support
interactions' speed and efficacy.
 By doing sentiment analysis on all the incoming tickets and prioritizing them above the
others, one can quickly identify the most dissatisfied customers or the most important
issues.
 One might also allocate tickets to the appropriate individual or team to handle them.
 As a result, Consumer satisfaction will improve dramatically.
 General Pipeline of the Information Extraction Process
 The following steps are often involved in extracting structured information from unstructured
texts:
1. Initial processing.
2. Proper names identification.
3. Parsing.
4. Extraction of events and relations.
5. Anaphora resolution.
6. Output result generation.

1) Initial processing
 The first step is to break down a text into fragments such as zones, phrases, segments, and
tokens.
 This function can be performed by tokenizers, text zoners, segmenters, and splitters, among
other components.
 In the initial processing stage, part-of-speech tagging, and phrasal unit identification (noun
or verb phrases) are usually the next tasks.
2) Proper names identification
 One of the most important stages in the information extraction chain is the identification of
various classes of proper names, such as names of people or organizations, dates, monetary
amounts, places, addresses, and so on.
 They may be found in practically any sort of text and are widely used in the extraction
process.
 Regular expressions, which are a collection of patterns, are used to recognize these names.
3) Parsing
 The syntactic analysis of the sentences in the texts is done at this step.
 After recognizing the fundamental entities in the previous stage, the sentences are
processed to find the noun groups that surround some of those entities and verb groups.
 At the pattern matching step, the noun and verb groupings are utilized as sections to begin
working on.
4) Extraction of events and relations
 This stage establishes relations between the extracted ideas.
 This is accomplished by developing and implementing extraction rules that describe various
patterns.
 The text is compared to certain patterns, and if a match is discovered, the text element is
labeled and retrieved later.
5) Coreference or Anaphora resolution
 Coreference resolution is used to identify all of the ways the entity is named throughout the
text.
 The step in which it is decided whether noun phrases refer to the same entity or not is called coreference or anaphora resolution.
6) Output results generation
 This stage entails converting the structures collected during the preceding processes into
output templates that follow the format defined by the user.
 It might comprise a variety of normalization processes.
7) Spacy:
 Spacy is a Python library for advanced natural language processing.
 It is designed for production use and aids in the development of applications that process
and understand large amounts of text.
 It can be used to create information extraction or natural language understanding systems,
as well as to preprocess text for deep learning.
 Information Extraction Techniques Using Natural Language Processing
1. Regular Expression.
2. Part-of-speech tagging.
3. Named Entity Recognition.
4. Topic Modeling.
5. Rule-Based Matching.

1. Regular Expression:

 A regular expression (abbreviated regex or regexp, and sometimes known as a rational expression) is a string of characters that defines a search pattern.
 A regular expression, in other words, is a pattern that characterizes a collection of strings.
 Regular expressions are built similarly to arithmetic expressions, by combining smaller
expressions with various operators.
 Although regular expressions precisely identify fine information, circumstances around the
underlying fine information, which might assist in precisely locating information, are
overlooked when using regular expressions alone.
 As a result, regular expressions are commonly considered as a fundamental approach that
must be applied appropriately in order to achieve high extraction performance.
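
A small regular-expression sketch, extracting email addresses and dates from text (the patterns are simple illustrative ones, not production-grade):

```python
import re

text = "Contact alice@example.com before 12/05/2023 or bob@example.org by 01/06/2023."

# Simple illustrative patterns for emails and dd/mm/yyyy dates.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
dates  = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text)

print(emails)  # ['alice@example.com', 'bob@example.org']
print(dates)   # ['12/05/2023', '01/06/2023']
```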

2. Part-of-speech tagging:

 Part-of-speech tagging is a common Natural Language Processing procedure that involves classifying words in a text (corpus) according to their part of speech, based on each word's definition and context.
 The default tagging is the first phase in the part-of-speech tagging process.
 It's done with the help of the DefaultTagger class.
 The single parameter to the DefaultTagger class is 'tag.'
 A single noun is designated by the abbreviation NN.
 A trained component contains binary data that is generated by giving a system enough
instances for it to make language-specific predictions — for example, in English, a word after
“the” is most often a noun.
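
A minimal sketch of the NLTK DefaultTagger described above, which simply assigns the tag 'NN' (singular noun) to every token as a baseline:

```python
from nltk.tag import DefaultTagger

# The single parameter is the tag assigned to every token.
tagger = DefaultTagger("NN")
print(tagger.tag(["the", "quick", "brown", "fox"]))
# [('the', 'NN'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN')]
```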

3. Named Entity Recognition:

 Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, and so on.
 Any word or group of words that consistently refers to the same item is considered
an entity.
 A two-step approach lies at the heart of each NER model:

1. Detect a named entity.

2. Categorize the entity.

 NER is appropriate for any scenario where a high-level overview of a significant amount of
text is required.
 You can rapidly categorize texts based on their relevancy or similarity with NER and
comprehend the subject or topic.

4. Topic Modeling:

 A topic model is a form of the statistical model used in machine learning and natural
language processing to find abstract "topics" that appear in a collection of documents.
 Topic Modeling is an unsupervised learning method for clustering documents and identifying
topics based on their contents.
 It works in the same way as the K-Means algorithm and Expectation-Maximization.
 We will have to evaluate individual words in each document to uncover topics and assign
values to each depending on the distribution of these terms because we are clustering texts.

5. Rule-Based Matching:

 Compared with using regular expressions on raw text, rule-based matcher engines and components allow you to access not only the words and phrases you want but also the tokens and their relationships within the document.
 This enables easy access to and examination of the tokens in the area, as well as the merging
of spans into single tokens and the addition of entries to defined entities.
6. Token-based matching
 Token annotations can be referred to by rules in this case.
 You can also use the rule matcher to pass in a custom callback to act on matches.
 You may also attach patterns to entity IDs to provide basic entity linking and disambiguation.
 You may utilize the PhraseMatcher, which takes Doc objects as match patterns, to match big
terminology lists.
7. Phrase Matching
 If you need to match large terminology lists, instead of token patterns, you may use
the Phrase Matcher to generate Doc objects, which is much more efficient in the
long run.
 Doc patterns may contain one or more tokens.
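
A brief sketch of named entity recognition with spaCy, which was mentioned above (it assumes the small English model en_core_web_sm has been installed separately; the example sentence and labels in the comment are illustrative):

```python
import spacy

# Load a small pretrained English pipeline (assumed to be installed separately).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in Mumbai in September 2023.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, Mumbai GPE, September 2023 DATE
```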
Q9. Write a short note on dependency parsing.

Ans. Dependency parsing is a natural language processing technique that is used to analyze the
grammatical structure of sentences.
 It is a type of syntactic parsing that aims to identify the relationships, or dependencies,
between words in a sentence.
 The output of a dependency parser is typically a dependency tree or a graph, which
represents the relationships between the words in the sentence.
 The process of dependency parsing involves identifying the syntactic relationships
between words in a sentence.
 This typically involves identifying the subject, object, and other grammatical elements, and
then determining the relationships between them.
 The parser uses a set of grammatical rules and a grammar model to analyze the sentence
and construct a dependency tree or graph.
 Dependency parsing is an important step in natural language processing and is used in a
wide range of applications, such as natural language understanding, machine translation,
and text summarization.
 Dependency parsing is different from constituency parsing, which aims to identify the
hierarchical structure of a sentence.
 Dependency parsing focuses on the linear structure of the sentence and the relationships
between words, while constituency parsing focuses on the hierarchical structure of the
sentence.
 Both techniques have their own advantages and can be used together to better
understand a sentence.
 Some challenges in Dependency Parsing are the handling of long-distance dependencies,
syntactic ambiguity, and the handling of idiomatic expressions, which makes the parsing
process more complex.
 Applications of Dependency Parsing
 Dependency parsing is a process of analyzing the grammatical structure of a sentence by
identifying the dependencies between the words in a sentence and representing them as a
directed graph.
 The following are some of the applications of dependency parsing:
1. Named Entity Recognition (NER) – It helps in identifying and classifying named entities in
a text such as people, places, and organizations.
2. Part-of-Speech (POS) Tagging – It helps in identifying the parts of speech of each word in a
sentence and classifying them as nouns, verbs, adjectives, etc.
3. Sentiment Analysis – It helps in determining the sentiment of a sentence by analyzing the
dependencies between the words and the sentiment associated with each word.
4. Machine Translation – It helps in translating sentences from one language to another by
analyzing the dependencies between the words and generating the corresponding
dependencies in the target language.
5. Text Generation – It helps in generating text by analyzing the dependencies between the
words and generating new words that fit into the existing structure.
6. Question Answering – It helps in answering questions by analyzing the dependencies
between the words in a question and finding relevant information in a corpus.
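
A minimal dependency-parsing sketch using spaCy (again assuming the en_core_web_sm model is installed); each token is printed with its dependency label and its syntactic head:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat")

# Each token points to its head word with a labeled dependency relation.
for token in doc:
    print(f"{token.text:6} --{token.dep_:>6}--> {token.head.text}")
```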

Q10. What is the vanishing gradient problem? How does LSTM overcome this problem?

Ans. The vanishing gradient problem is a critical issue in neural network training, particularly in the
field of Natural Language Processing (NLP).
 It occurs when gradients become extremely small as they are back-propagated through the
layers of a deep neural network.
 This problem can hinder the training of deep networks, making it challenging to learn long-
range dependencies in sequences, a common requirement in NLP tasks. Long Short-Term
Memory (LSTM) networks were specifically designed to address the vanishing gradient
problem and have revolutionized the field of NLP by enabling the modeling of sequential
data with long-term dependencies.
 Vanishing Gradient Problem: To understand the vanishing gradient problem, we must delve
into how neural networks learn.
 During training, gradients are computed with respect to the loss function, and these
gradients guide weight updates using gradient descent or its variants.
 In deep networks, when back-propagating gradients through multiple layers, the chain rule
is applied repeatedly.
 If the weights in these layers are small (e.g., close to zero) and the activation functions used
(e.g., sigmoid or hyperbolic tangent) squash their inputs to small ranges, the gradient values
can become vanishingly small as they are propagated backward. Consequently, the early
layers of the network receive negligible updates, impeding learning.
 LSTM: A Solution to the Vanishing Gradient Problem in NLP: Long Short-Term Memory
(LSTM) networks were introduced to address the vanishing gradient problem while modeling
sequential data.
 LSTM is a type of recurrent neural network (RNN) that incorporates specialized mechanisms
to learn and retain information over long sequences. It achieves this through three key
components:

1. Memory Cells: LSTMs are equipped with memory cells that can store information for
extended periods. These cells have self-connections, which allow them to regulate the flow
of information. This enables the network to decide when to update, read, or forget
information, making it resistant to the vanishing gradient problem.
2. Gates: LSTMs employ three gates - the input gate, forget gate, and output gate - each
controlled by a sigmoid activation function. These gates control the flow of information into
and out of the memory cell. The forget gate determines what information should be
discarded from the cell, the input gate decides what information to add, and the output gate
controls what information to output. The use of these gates enables LSTM to learn long-
term dependencies effectively.
3. Gradient Flow: LSTMs use the chain rule during backpropagation, but the introduction of the
gates and memory cells helps in mitigating the vanishing gradient problem. The gating
mechanisms allow gradients to flow more freely, as they can prevent unnecessary updates
when needed and retain crucial information over long sequences. This controlled gradient
flow is pivotal for training deep networks in NLP tasks.
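
As a minimal sketch of how these components are exposed in a deep learning framework (a PyTorch example with arbitrary, illustrative layer sizes, not taken from the original answer), the LSTM layer below maintains both a hidden state and a cell (memory) state across time steps:

```python
import torch
import torch.nn as nn

# A single-layer LSTM: 128-dim token embeddings in, 256-dim hidden state out.
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1, batch_first=True)

x = torch.randn(4, 20, 128)          # batch of 4 sequences, 20 time steps each
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 20, 256]) - hidden state at every time step
print(h_n.shape)     # torch.Size([1, 4, 256])  - final hidden state
print(c_n.shape)     # torch.Size([1, 4, 256])  - final cell (memory) state
```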

 Benefits of LSTM in NLP: LSTMs have significantly improved NLP tasks that require modeling
sequences with long-range dependencies. Here are some key benefits:

1. Language Modeling: LSTMs excel in language modeling tasks, such as predicting the next
word in a sentence or generating coherent text. They can capture the intricate structure of
language and produce contextually relevant outputs.
2. Machine Translation: In tasks like machine translation, where understanding the context of
a sentence is crucial, LSTMs enable the modeling of dependencies that span entire
sentences, making translations more accurate.
3. Speech Recognition: LSTMs have found applications in speech recognition systems, where
they handle the challenges of variable-length audio input and complex phonetic patterns,
thanks to their ability to capture long-range dependencies.
4. Sentiment Analysis: When analyzing sentiment in text, LSTMs can capture nuanced
sentiment shifts that may occur over extended pieces of text, leading to more accurate
sentiment analysis.

OR

 The vanishing gradient problem is a significant challenge in training deep neural networks,
particularly in the field of Natural Language Processing (NLP).
 To understand this problem, let's delve into the basics of neural networks and gradient
descent.
 Neural networks are composed of layers of interconnected neurons, and during training,
they learn by adjusting their internal parameters to minimize a loss function.
 This adjustment is achieved through a process called gradient descent, where gradients
(partial derivatives) of the loss with respect to the network's parameters are computed and
used to update these parameters.
 However, in deep neural networks with many layers, especially recurrent neural networks
(RNNs) and traditional feedforward networks, the gradients can become extremely small as
they are backpropagated through the network during training.
 When the gradients become too small, it becomes difficult for the network to make
meaningful weight updates, resulting in slow or stalled training.
 In NLP, the vanishing gradient problem is particularly pronounced due to the sequential
nature of text data.
 Consider a recurrent neural network (RNN) processing a long sentence.
 During backpropagation, gradients are calculated for each time step, and these gradients are
multiplied at each step as they are backpropagated through time.
 If the model encounters a long sequence of words, the gradients can become vanishingly
small as they propagate backward through the sequence.
 This means that the model struggles to capture long-range dependencies and contextual
information in the text.
 The vanishing gradient problem severely limits the ability of traditional RNNs to capture
information from distant time steps, which is crucial for many NLP tasks like language
modeling, machine translation, and sentiment analysis.
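 In symbols (a standard textbook formulation, using T for the sequence length, h_t for the
hidden state at step t, and L for the loss), back-propagation through time multiplies one
Jacobian per step:

\[
\frac{\partial L}{\partial h_1} \;=\; \frac{\partial L}{\partial h_T}\,\prod_{t=2}^{T}\frac{\partial h_t}{\partial h_{t-1}}
\]

 When each factor in the product has norm below 1, the product shrinks exponentially with T,
which is exactly the vanishing gradient effect described above.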
 LSTM (Long Short-Term Memory) is a type of recurrent neural network architecture
designed to address the vanishing gradient problem.
 It was introduced by Hochreiter and Schmidhuber in 1997.
 LSTM networks achieve this by introducing specialized memory cells and gating mechanisms.
 Here's how LSTM overcomes the vanishing gradient problem:

1. Memory Cells: LSTMs are equipped with memory cells that can store information over long
sequences.
 These memory cells are responsible for capturing and retaining relevant information,
thus mitigating the vanishing gradient issue.
2. Gating Mechanisms: LSTMs have three gating mechanisms: the input gate, forget gate, and
output gate.
 These gates regulate the flow of information into and out of the memory cell.
 The forget gate decides what information to discard from the cell state, the input gate
decides what information to store, and the output gate controls what information to use
for predictions.
 These gates are learnable and adaptively adjust during training.
3. Additive Cell-State Path: LSTMs update the cell state additively (scaled only by the forget
gate) rather than repeatedly squashing it through a nonlinearity, so the cell state acts much like
a skip connection and allows the gradient to flow more easily during backpropagation.
 This helps in capturing long-range dependencies by reducing the vanishing gradient
effect.
 The combination of memory cells, gating mechanisms, and this additive gradient path
enables LSTMs to capture and maintain information over longer sequences, effectively
overcoming the vanishing gradient problem in NLP tasks.
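 To make the gate equations concrete, here is a bare-bones NumPy sketch of a single LSTM
time step (parameter shapes and random values are purely illustrative, not a production
implementation); the additive update of the cell state c_t is the path that keeps gradients from
vanishing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters of the
    input (i), forget (f), output (o) gates and the candidate (g)."""
    z = W @ x_t + U @ h_prev + b          # shape: (4 * hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g              # additive cell-state update
    h_t = o * np.tanh(c_t)                # gated output / new hidden state
    return h_t, c_t

# Hypothetical sizes and random parameters, just to show the shapes involved.
hidden, inp = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inp))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_cell_step(rng.normal(size=inp), h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```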
 The introduction of LSTMs has had a profound impact on the field of NLP, as well as in
other areas of deep learning.
 LSTMs have been instrumental in various NLP applications, such as:

1. Machine Translation: LSTMs have significantly improved the quality of machine translation
models by allowing them to capture the context and nuances of source and target languages
over longer sentences.
2. Language Modeling: LSTMs excel in language modeling tasks by modeling dependencies
between words across long documents, making them capable of generating coherent and
contextually relevant text.
3. Speech Recognition: LSTMs are also used in speech recognition systems to model the
temporal dependencies in audio data, leading to more accurate transcriptions.
4. Sentiment Analysis: In sentiment analysis, LSTMs help capture the nuances of sentiment
expressions by considering the context of entire sentences or paragraphs.

Q11. Compare recurrent neural network and feed forward neural network.

ANS.

The main differences between Recurrent Neural Networks (RNNs) and Feedforward Neural
Networks (FNNs), aspect by aspect, are:

 Architecture: RNNs are designed for sequential data processing; FNNs are designed for non-sequential data processing.
 Input: RNNs take sequential data (e.g., time series, text); FNNs take fixed-size input (e.g., images, tabular data).
 Connections: RNNs have recurrent connections between hidden units; FNNs have no recurrent connections, only feedforward ones.
 Hidden state: RNNs maintain a hidden state that evolves over time; FNNs have no hidden state and process the input directly.
 Information flow: RNNs can capture temporal dependencies and sequences; FNNs do not inherently capture sequential information.
 Time dependency handling: RNNs handle variable-length sequences effectively; FNNs require fixed-size input.
 Training complexity: RNNs are typically more complex to train due to their temporal nature; FNNs are generally simpler and more computationally efficient.
 Vanishing gradient problem: RNNs are vulnerable to the vanishing gradient problem; FNNs are less susceptible to vanishing gradients.
 Data types: RNNs are well suited to time series, text, speech, etc.; FNNs are suited to structured data, images, etc.
 Use cases: RNNs are used for Natural Language Processing (NLP), speech recognition, and time series analysis; FNNs for image classification, regression tasks, and structured data analysis.
 Applications: RNNs are applied to machine translation, sentiment analysis, and speech recognition; FNNs to image classification, object detection, and regression.
 Sequence processing: RNNs process data sequentially with recurrent loops; FNNs process data in parallel across layers.
 Memory mechanism: RNNs can use memory cells (e.g., LSTM, GRU) to remember past information; FNNs have no explicit memory mechanism.
 Training time: RNNs are slower due to sequential processing; FNNs are faster due to parallel processing.
 Parallelization: RNNs allow limited parallelization because of their sequential nature; FNNs are highly parallelizable across layers.
 Temporal information capture: RNNs capture temporal dependencies effectively; FNNs have limited capacity to capture temporal information.
 Output flexibility: RNNs can produce an output at each time step; FNNs typically produce a single final output.
 Overfitting control: RNNs are prone to overfitting on long sequences; FNNs offer better control over overfitting.
 Model size: RNNs have a larger number of parameters due to recurrence; FNNs have a smaller number of parameters.
 Training data requirements: RNNs require more data for effective training; FNNs can work with smaller datasets effectively.
 Prediction time: RNNs are slower for sequential prediction tasks; FNNs are faster for single-shot predictions.
 Interpretability: the complex internal dynamics of RNNs may be less interpretable; the simpler FNN architecture is easier to interpret.
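To make the interface difference concrete, here is a small PyTorch sketch (all sizes are
arbitrary): the feedforward network maps one fixed-size vector to one output, while the RNN
consumes a whole sequence and can emit an output at every time step:

```python
import torch
import torch.nn as nn

# Feedforward net: one fixed-size vector in, one prediction out.
ffn = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
print(ffn(torch.randn(8, 20)).shape)          # (batch=8, classes=2)

# Recurrent net: a sequence in, one hidden state per time step out.
rnn = nn.RNN(input_size=20, hidden_size=64, batch_first=True)
outputs, h_n = rnn(torch.randn(8, 15, 20))    # batch=8, seq_len=15, features=20
print(outputs.shape)                          # (8, 15, 64): an output at every step
print(h_n.shape)                              # (1, 8, 64): final hidden state only
```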
