AI Assignment
Kajal Goswami (02201022022, ECE)
Submitted to – Ms. Shruty Ahuja
ASSIGNMENT-1
UNIT-1
Q.1 What are the Issues in the Design of Intelligent Search Algorithms?
Intelligent search algorithms are advanced computational methods designed to find solutions or
optimal paths in complex problem spaces. They leverage principles from artificial intelligence and
machine learning to enhance the efficiency and effectiveness of the search process. Here are some
key characteristics and types of intelligent search algorithms:
1. Heuristic Search Algorithms:
o A*: Uses heuristics to find the shortest path to the goal. It combines the actual
cost from the start, g(n), with an estimated cost to the goal, h(n), and expands nodes in
order of f(n) = g(n) + h(n). A code sketch follows this list.
o Greedy Best-First Search: Prioritizes nodes that appear to lead most quickly to the
goal, based on the heuristic alone.
2. Swarm Intelligence Algorithms:
o Ant Colony Optimization (ACO): Mimics the behaviour of ants finding paths to food,
using pheromone trails to guide the search.
o Particle Swarm Optimization (PSO): Models the social behaviour of birds or fish to find
optimal solutions through a population of candidate solutions.
3. Local Search Algorithms:
o Hill Climbing: Iteratively moves to a neighbouring state with a higher value to find the
peak of a function.
4. Learning-Based Search:
o Neural Networks: Can be used to approximate functions and guide search processes
in complex, high-dimensional spaces.
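To make A* concrete, here is a minimal sketch in Python; the toy graph, edge costs, and heuristic values are invented purely for illustration.

```python
import heapq

def a_star(graph, h, start, goal):
    """Minimal A*. graph maps node -> [(neighbour, edge_cost)]; h maps node -> heuristic estimate."""
    # Frontier ordered by f(n) = g(n) + h(n): entries are (f, g, node, path so far)
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost from the start to each node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for neighbour, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(frontier, (new_g + h[neighbour], new_g, neighbour, path + [neighbour]))
    return None, float("inf")  # goal unreachable

# Toy graph with an (assumed admissible) heuristic
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
h = {"A": 3, "B": 2, "C": 1, "D": 0}
print(a_star(graph, h, "A", "D"))  # (['A', 'B', 'C', 'D'], 4)
```

With an admissible heuristic (one that never overestimates the remaining cost), A* is guaranteed to return an optimal path.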
Designing intelligent search algorithms comes with several challenges. Here are some key issues:
1. State Space Complexity
• Challenge: Managing the size of the state space can be daunting, especially for large
problems.
• Impact: Larger state spaces require more computational resources and can slow down the
search process.
2. Search Efficiency
• Challenge: Ensuring the search algorithm is efficient in terms of time and space complexity.
• Impact: Inefficient algorithms can lead to long processing times and high memory usage.
3. Memory Management
• Challenge: Efficiently managing memory to store states and paths during the search process.
• Impact: Poor memory management can lead to memory overflow and system crashes.
4. Heuristic Design
• Challenge: Designing effective heuristics that guide the search towards the goal efficiently.
• Impact: Poor heuristics can result in suboptimal solutions or excessive search times.
5. Dynamic Environments
• Challenge: Coping with environments that change while the search is running.
• Impact: Static algorithms may fail in dynamic environments, requiring adaptive and robust
solutions.
6. Scalability
• Challenge: Ensuring the algorithm scales well with increasing problem size and complexity.
• Impact: Algorithms that do not scale become impractical as problems grow.
7. Goal Ambiguity
• Challenge: Handling situations where the goal is ambiguous or there are multiple goals.
• Impact: Ambiguity can complicate the search process and require additional logic to resolve.
8. Performance vs. Accuracy
• Challenge: Balancing the trade-off between performance (speed) and accuracy (quality of
solution).
• Impact: Focusing too much on speed can lead to less accurate solutions, while prioritizing
accuracy can slow down the process.
9. Data Structuring and Access
• Challenge: Efficiently structuring and accessing data during the search process.
• Impact: Poor data structuring can lead to inefficiencies and increased search times.
• Impact: Slow query performance can hinder the overall efficiency of the search algorithm.
UNIT-2
Q1. What is the need for tokenization? Explain with an example.
Ans. Tokenization is an essential process in natural language processing (NLP) and computational
linguistics. It involves breaking down text into smaller units, called tokens, which can be words,
phrases, or symbols. This process is crucial for several reasons.
Tokenization is the first step in any NLP pipeline, and it has an important effect on the rest of the
pipeline. A tokenizer breaks unstructured natural language text into chunks of information that can
be treated as discrete elements. The token occurrences in a document can be used directly as a
vector representing that document.
This immediately turns an unstructured string (a text document) into a numerical data structure
suitable for machine learning. Tokens can also be used directly by a computer to trigger useful
actions and responses, or serve in a machine learning pipeline as features that trigger more
complex decisions or behaviour.
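As a small illustration of the "token counts as a vector" idea, here is a plain-Python sketch; the sample sentence is made up, and a real pipeline would use a proper tokenizer rather than a whitespace split.

```python
from collections import Counter

text = "the cat sat on the mat"
tokens = text.split()        # naive whitespace tokenization
counts = Counter(tokens)     # token occurrences in the document

# A fixed vocabulary turns the counts into a numeric vector usable by ML models
vocab = sorted(counts)
vector = [counts[w] for w in vocab]
print(vocab)   # ['cat', 'mat', 'on', 'sat', 'the']
print(vector)  # [1, 1, 1, 1, 2]
```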
1. Simplifying Text Analysis:
• Process: By converting text into manageable pieces (tokens), it becomes easier to analyze
and manipulate.
• Example: The sentence "Hello, world!" can be tokenized into ["Hello", ",", "world", "!"],
allowing each word and punctuation mark to be treated individually.
2. Enabling Downstream NLP Tasks:
• Process: Tokenization serves as a preliminary step for various NLP tasks, such as parsing,
sentiment analysis, and machine translation.
• Example: In sentiment analysis, tokenizing a review like "I love this product!" allows the
system to analyze individual words and understand the sentiment expressed.
3. Handling Ambiguity:
• Example: In the sentence "I saw a man with a telescope," tokenizing it helps in
understanding the different interpretations that are possible depending on context.
4. Supporting Normalization:
• Example: The word "running" can be tokenized and stemmed to "run", ensuring uniformity
in text analysis.
5. Handling Contractions and Punctuation:
• Example: "We're going to the park" can be tokenized into ["We", "'re", "going", "to", "the",
"park"], making it easier to handle contractions and punctuation (see the sketch below).
• ["3", "x", "^", "2", "+", "4", "x", "-", "5", "=", "0"]
Input Log Entry: "2024-11-27 19:36:42 INFO User logged in from IP 192.168.1.1" Tokenized Output:
Input Markdown: "# Heading\nThis is a paragraph with bold text and a link." Tokenized Output:
• ["#", "Heading", "\n", "This", "is", "a", "paragraph", "with", "", "bold", "text", "", "and", "[",
"a", "link", "]", "(", "https://example.com", ")"]
• ["<div", "class=\"container\">", "<h1>", "Title", "</h1>", "<p>", "This", "is", "a", "paragraph",
".", "</p>", "</div>"]
Input JSON: {"name": "John", "age": 30, "city": "New York"} Tokenized Output:
• ["{", "\"name\"", ":", "\"John\"", ",", "\"age\"", ":", "30", ",", "\"city\"", ":", "\"New York\"",
"}"]
Input SQL: SELECT * FROM users WHERE age > 25 AND city = 'New York'; Tokenized Output:
• ["SELECT", "*", "FROM", "users", "WHERE", "age", ">", "25", "AND", "city", "=", "'New York'",
";"]
Q2. What is the difference between stemming and lemmatization? Explain with examples.
Ans. Stemming and lemmatization are both text normalization techniques used in natural language
processing (NLP) to reduce words to their base or root forms. However, they differ in their
approaches and outcomes.
Stemming:
Stemming is a rule-based process of removing affixes (prefixes and suffixes) from a word to reduce it
to its base form or root. It does not necessarily produce a meaningful or linguistically correct word
but a stem that can be used for further processing.
Characteristics of Stemming:
• Algorithm Simplicity: Stemming algorithms, like the Porter Stemmer, are relatively simple
and fast.
• Output: The stems produced may not be actual words. For example, the stem of "running" is
"run", but the stem of "relational" might be "relat".
Advantages of Stemming:
• Usage: Suitable for applications where a quick and dirty approximation is sufficient, like
search engines.
Disadvantages of Stemming:
• Accuracy: Can be too aggressive, removing useful parts of words and potentially changing
their meaning.
• Non-Words: Often produces stems that are not real words, which can be problematic in
some contexts.
Lemmatization:
Lemmatization is a more sophisticated process that reduces a word to its base or dictionary form
(lemma) using morphological analysis and linguistic knowledge. It considers the context and the part
of speech of the word.
Characteristics of Lemmatization:
• Output: Produces actual words that are meaningful in the context of the language. For
example, "running" is lemmatized to "run", and "better" is lemmatized to "good".
Advantages of Lemmatization:
• Accuracy: Produces valid words by considering the context, making it more accurate for text
understanding and processing.
Disadvantages of Lemmatization:
• Computationally Intensive: More resource-heavy due to the need for linguistic databases
and rules.
Word: "Studies"
Word: "Wolves"
Word: "Better"
In summary:
• Stemming:
o Faster and simpler, suitable for applications where speed is critical, and perfect
accuracy is not required.
• Lemmatization:
o More accurate and context-aware, ideal for applications where the meaning of
words is important, such as text analysis and understanding.
Both techniques are crucial in NLP for tasks such as text classification, information retrieval, and
sentiment analysis, helping to standardize text data for better processing and analysis.