Q ClassX AI Ch7
1. What is a Chatbot?
A chatbot is a computer program designed to simulate human conversation
through voice commands, text chats or both, e.g. Mitsuku Bot, Jabberwacky, etc.
OR
A chatbot is a computer program that can learn over time how best to interact with
humans. It can answer questions and troubleshoot customer problems, evaluate and
qualify prospects, generate sales leads and increase sales on an e-commerce site.
In lemmatization, the word we get after affix removal (known as the lemma) is a
meaningful one. Lemmatization makes sure that the lemma is a word with meaning,
and hence it takes longer to execute than stemming.
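To see this difference in practice, here is a minimal Python sketch, assuming NLTK is installed and the WordNet data has been downloaded; the example words are chosen only for illustration.

# Comparing stemming and lemmatization with NLTK
# (assumes: pip install nltk, then nltk.download('wordnet')).
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "happiness", "caring"]:
    stem = stemmer.stem(word)                    # may not be a dictionary word (e.g. "studi")
    lemma = lemmatizer.lemmatize(word, pos="v")  # always a meaningful word
    print(word, "| stem:", stem, "| lemma:", lemma)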
Script-bot
● A script bot doesn't carry even a glimpse of AI.
● Script bots are easy to make.
● Script bot functioning is very limited as they are less powerful.
● Script bots work around a script which is programmed in them.
● No or little language processing skills are required.
● Limited functionality.
Smart-bot
● Smart bots are built on NLP and ML.
● Smart bots are comparatively difficult to make.
● Smart bots are flexible and powerful.
● Smart bots work on bigger databases and other resources directly.
● NLP and Machine Learning skills are required.
● Wide functionality.
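To make the contrast concrete, below is a minimal sketch of a script bot in Python. It only matches the user's message against a fixed, pre-programmed script; the keyword-to-reply mapping is purely illustrative and involves no NLP or ML.

# A minimal script-bot sketch: fixed keyword matching, no language understanding.
script = {
    "hello": "Hello! How can I help you?",
    "price": "Our plans start at Rs. 499 per month.",
    "bye": "Thank you for visiting. Goodbye!",
}

def script_bot(message):
    for keyword, reply in script.items():
        if keyword in message.lower():
            return reply
    return "Sorry, I did not understand that."  # anything outside the script fails

print(script_bot("Hello there"))
print(script_bot("What is the price?"))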
Bag of Words just creates a set of vectors containing the count of word occurrences in
each document (for example, customer reviews). Bag of Words vectors are easy to interpret.
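As an illustration, here is a minimal Python sketch of the Bag of Words idea, assuming scikit-learn is available; the two example sentences are made up.

# Bag of Words sketch using scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the phone is good", "the phone is not good"]  # illustrative mini-corpus

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)         # document-term count matrix

print(vectorizer.get_feature_names_out())    # the vocabulary
print(bow.toarray())                         # word counts per document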
Perfect Syntax, no Meaning - Sometimes a statement can have perfectly correct syntax
but still not mean anything; for example, "Chickens feed extravagantly while the moon
drinks tea" is grammatically correct yet meaningless. In human language, a perfect
balance of syntax and semantics is important for better understanding.
These are some of the challenges we might have to face if we try to teach
computers how to understand and interact in human language.
2. Through a step-by-step process, calculate TFIDF for the given corpus and mention
the word(s) having highest value.
Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.
Term Frequency
Term frequency is the frequency of a word in one document. It can easily be found from
the document vector table, as that table lists the frequency of each vocabulary word in
each document.
Inverse Document Frequency
For inverse document frequency, we put the document frequency in the denominator and
the total number of documents in the numerator. Here, the total number of documents is 4,
so the inverse document frequency of a word is 4 divided by the number of documents in
which that word occurs: Mumbai and famous occur in 3 documents (IDF = 4/3); We, are,
going, to, a and place occur in 2 documents (IDF = 4/2); and is, I, am and in occur in only
1 document (IDF = 4/1).
Finally, TFIDF(W) = TF(W) × log(IDF(W)). Since every word occurs at most once in any
document here (TF = 1), the words with the highest TFIDF value are the ones that appear
in a single document: is, I, am and in.
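As a cross-check, here is a minimal Python sketch that computes TFIDF = TF × log(total documents / document frequency) for this corpus using only the standard library; the simple lowercase, punctuation-stripping tokenization is an assumption made for the sketch.

import math

# The four documents from the question.
docs = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]
tokenized = [d.lower().replace(".", "").split() for d in docs]

# Document frequency: number of documents containing each word.
vocab = sorted({w for doc in tokenized for w in doc})
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

# TFIDF(W) = TF(W) * log10(N / DF(W)) for each word in each document.
n = len(tokenized)
for i, doc in enumerate(tokenized, start=1):
    scores = {w: doc.count(w) * math.log10(n / df[w]) for w in set(doc)}
    print("Document", i, ":", {w: round(s, 3) for w, s in sorted(scores.items())})

# Words that appear in only one document (is, I, am, in) get the highest value.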
3. Normalize the given text and comment on the vocabulary before and after the
normalization:
Raj and Vijay are best friends. They play together with other friends. Raj likes to
play football but Vijay prefers to play online games. Raj wants to be a footballer.
Vijay wants to become an online gamer.
Tokenization:
In this step, the text is divided into tokens (words and punctuation marks).
Removal of Stop Words:
In this step, the tokens which are not necessary are removed from the token list.
So, the words and, are, to, an and the punctuation marks will be removed.
Converting Text to a Common Case:
After stop word removal, we convert the whole text into a similar case, preferably
lower case. Here we do not have the same word appearing in different cases, so this
step is not required for the given text.
Stemming:
In this step, the remaining words are reduced to their root words. In other words,
stemming is the process in which the affixes of words are removed and the words are
converted to their base form.
Word       Affixes    Stem
Likes      -s         Like
Prefers    -s         Prefer
Wants      -s         Want
In the given text, lemmatization is not required.
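A quick way to verify the stems in the table above is the following sketch, assuming NLTK is installed.

# Checking the stems with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["likes", "prefers", "wants"]:
    print(word, "->", stemmer.stem(word))   # like, prefer, want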
Given Text
Raj and Vijay are best friends. They play together with other friends. Raj likes to play
football but Vijay prefers to play online games. Raj wants to be a footballer. Vijay wants to
become an online gamer.
Normalized Text
Raj and Vijay best friends They play together with other friends Raj likes to play football
but Vijay prefers to play online games Raj wants to be a footballer Vijay wants to become
an online gamer