
AI Assignment

The document discusses intelligent search algorithms, detailing their types such as heuristic, evolutionary, swarm intelligence, local search, and machine learning-based algorithms, along with the challenges in their design. It also covers the importance of tokenization in natural language processing, explaining its role in simplifying text analysis and enabling further processing. Additionally, it contrasts stemming and lemmatization as text normalization techniques, highlighting their characteristics, advantages, and disadvantages.


KAJAL GOSWAMI (02201022022, ECE) Submitted to – Ms. Shruty Ahuja

ASSIGNMENT-1
UNIT-1
Q.1 What are the Issues in design of Intelligent Search Algorithms?

Ans: Intelligent Search Algorithms:

Intelligent search algorithms are advanced computational methods designed to find solutions or
optimal paths in complex problem spaces. They draw on principles from artificial intelligence and
machine learning to improve the efficiency and effectiveness of the search process. The main types
are outlined below.

Types of Intelligent Search Algorithms:

1. Heuristic Search Algorithms:

o A*: Uses heuristics to find the shortest path to the goal. It combines the actual cost from the start with an estimated distance to the goal.

o Greedy Best-First Search: Prioritizes nodes that appear to lead most quickly to the
goal, based on a heuristic.
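A compact sketch of A* on a grid illustrates how the two costs combine. This is a minimal illustration (the grid, start, and goal are invented for the example), not a production implementation:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; 0 = free cell, 1 = wall.

    Each node is ranked by f = g + h: the actual cost g from the start
    plus a Manhattan-distance estimate h to the goal. Manhattan distance
    never overestimates on a grid, so the returned path is optimal.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists
```

Greedy Best-First Search is the same loop with the heap keyed on h alone (dropping g), which is faster but can return a non-optimal path.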

2. Evolutionary Algorithms:

o Genetic Algorithms (GA): Simulates natural evolution processes to optimize solutions, using techniques like selection, crossover, and mutation.

o Genetic Programming (GP): Similar to GA but focuses on evolving programs or algorithms to solve problems.
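A toy genetic algorithm for the classic "one-max" problem (maximise the number of 1s in a bit string) shows selection, crossover, and mutation in miniature; all parameter values here are illustrative choices:

```python
import random

def genetic_algorithm(bits=20, pop_size=30, generations=60):
    """Toy GA for one-max: tournament selection, one-point
    crossover, and per-bit flip mutation."""
    random.seed(1)
    fitness = sum  # fitness of a bit list = number of 1s
    pop = [[random.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament of two: the fitter of two random individuals wins.
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(bits):             # bit-flip mutation
                if random.random() < 1 / bits:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

After a few dozen generations the best individual is at or near the all-ones string.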

3. Swarm Intelligence Algorithms:

o Ant Colony Optimization (ACO): Mimics the behaviour of ants finding paths to food,
using pheromone trails to guide the search.

o Particle Swarm Optimization (PSO): Models social behaviour of birds or fish to find
optimal solutions through a population of candidate solutions.
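The PSO update can be sketched in a few lines. This is a minimal one-dimensional version with conventional (but here arbitrarily chosen) inertia and attraction weights:

```python
import random

def pso(f, lo, hi, n_particles=20, iters=100):
    """Minimal 1-D particle swarm minimising f on [lo, hi]."""
    random.seed(2)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights
    x = [random.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                # each particle's own best position
    gbest = min(x, key=f)       # the swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            # Velocity pulls toward the particle's and the swarm's bests.
            v[i] = (w * v[i]
                    + c1 * random.random() * (pbest[i] - x[i])
                    + c2 * random.random() * (gbest - x[i]))
            x[i] = min(hi, max(lo, x[i] + v[i]))  # clamp to the bounds
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i]
            if f(x[i]) < f(gbest):
                gbest = x[i]
    return gbest
```

For example, `pso(lambda x: (x - 3) ** 2, -10, 10)` converges to a point near 3.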

4. Local Search Algorithms:

o Hill Climbing: Iteratively moves to neighbouring states with higher value to find the
peak of a function.

o Simulated Annealing: Combines hill climbing with a probability mechanism to escape local optima and find a global optimum.
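The acceptance rule that distinguishes simulated annealing from plain hill climbing can be sketched as follows (the step size, temperature, and cooling rate are illustrative):

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=2.0, cooling=0.99, iters=2000):
    """Minimise f starting from x0. Improvements are always accepted;
    worse moves are accepted with probability exp(-delta / t), which
    shrinks as the temperature t cools, so the search starts exploratory
    and ends as pure hill climbing."""
    random.seed(3)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        x2 = x + random.uniform(-step, step)  # random neighbour
        fx2 = f(x2)
        if fx2 < fx or random.random() < math.exp((fx - fx2) / t):
            x, fx = x2, fx2
        if fx < fbest:        # track the best state ever visited
            best, fbest = x, fx
        t *= cooling          # cool the temperature
    return best
```

With `f(x) = x * x` and `x0 = 8.0`, the returned point lies near the global minimum at 0.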

5. Machine Learning-Based Algorithms:

o Reinforcement Learning: An agent learns to make decisions by receiving rewards or penalties for actions taken in an environment.

o Neural Networks: Can be used to approximate functions and guide search processes
in complex, high-dimensional spaces.
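A tabular Q-learning agent in a toy one-dimensional corridor illustrates the reward-driven learning loop; the environment and all parameters are invented for illustration:

```python
import random

def q_learning_corridor(n_states=5, episodes=400, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning: states 0..n-1 in a corridor, actions
    0 = left and 1 = right, reward 1 for reaching the rightmost
    (goal) state and 0 otherwise."""
    random.seed(0)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q(state, action) table
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice (random on ties).
            if random.random() < eps or q[s][0] == q[s][1]:
                a = random.randrange(2)
            else:
                a = 1 if q[s][1] > q[s][0] else 0
            s2 = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
            reward = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: nudge Q(s,a) toward reward + gamma * max Q(s').
            q[s][a] += alpha * (reward + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy (pick the higher-valued action in each state) walks straight to the goal.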

Designing intelligent search algorithms comes with several challenges. Here are some key issues:

1. State Space Complexity

• Challenge: Managing the size of the state space can be daunting, especially for large
problems.

• Impact: Larger state spaces require more computational resources and can slow down the
search process.

2. Search Algorithm Efficiency

• Challenge: Ensuring the search algorithm is efficient in terms of time and space complexity.

• Impact: Inefficient algorithms can lead to long processing times and high memory usage.

3. Memory Management

• Challenge: Efficiently managing memory to store states and paths during the search process.

• Impact: Poor memory management can lead to memory overflow and system crashes.

4. Heuristic Design and Evaluation

• Challenge: Designing effective heuristics that guide the search towards the goal efficiently.

• Impact: Poor heuristics can result in suboptimal solutions or excessive search times.

5. Handling Dynamic and Uncertain Environments

• Challenge: Adapting to changes in the environment and dealing with uncertainty.

• Impact: Static algorithms may fail in dynamic environments, requiring adaptive and robust
solutions.

6. Scalability

• Challenge: Ensuring the algorithm scales well with increasing problem size and complexity.

• Impact: Non-scalable algorithms may become impractical for large-scale problems.

7. Goal Ambiguity and Multiple Goals

• Challenge: Handling situations where the goal is ambiguous or there are multiple goals.

• Impact: Ambiguity can complicate the search process and require additional logic to resolve.

8. Performance vs. Accuracy Trade-offs

• Challenge: Balancing the trade-off between performance (speed) and accuracy (quality of
solution).

• Impact: Focusing too much on speed can lead to less accurate solutions, while prioritizing
accuracy can slow down the process.

9. Data Structuring Issues

• Challenge: Efficiently structuring and accessing data during the search process.

• Impact: Poor data structuring can lead to inefficiencies and increased search times.

10. Query Performance

• Challenge: Optimizing query performance to retrieve relevant information quickly.

• Impact: Slow query performance can hinder the overall efficiency of the search algorithm.

UNIT-2
Q1. What is the need of tokenization? Explain with example

Ans. Tokenization is an essential process in natural language processing (NLP) and computational
linguistics. It involves breaking down text into smaller units, called tokens, which can be words,
phrases, or symbols. This process is crucial for several reasons:

Tokenization is the first step in any NLP pipeline, and it shapes everything that follows. A
tokenizer breaks unstructured natural language text into chunks of information that can be treated
as discrete elements. The token occurrences in a document can be used directly as a vector
representing that document.

This immediately turns an unstructured string (a text document) into a numerical data structure
suitable for machine learning. Tokens can also be used directly by a computer to trigger useful
actions and responses, or serve in a machine learning pipeline as features that drive more complex
decisions or behaviour.

Why Tokenization is Needed:

1. Simplification of Text Analysis:

• Process: By converting text into manageable pieces (tokens), it becomes easier to analyze
and manipulate.

• Example: The sentence "Hello, world!" can be tokenized into ["Hello", ",", "world", "!"],
allowing each word and punctuation mark to be treated individually.

2. Enabling Further Processing:

• Process: Tokenization serves as a preliminary step for various NLP tasks, such as parsing,
sentiment analysis, and machine translation.

• Example: In sentiment analysis, tokenizing a review like "I love this product!" allows the
system to analyze individual words and understand the sentiment expressed.

3. Handling Ambiguity:

• Process: Tokenization helps in resolving ambiguities by isolating words and phrases.

• Example: In the sentence "I saw a man with a telescope," tokenizing it helps in
understanding different interpretations based on the context.

4. Consistency in Text Representation:

• Process: Tokenization standardizes the text, making it consistent for processing by algorithms.

• Example: The word "running" can be tokenized and stemmed to "run", ensuring uniformity
in text analysis.

5. Facilitating Text Normalization:

• Process: Tokenization aids in normalizing text by separating meaningful units.

• Example: "We're going to the park" can be tokenized into ["We", "'re", "going", "to", "the",
"park"], making it easier to handle contractions and punctuation.

Example 1: Tokenizing Mathematical Expressions

Input Expression: "3x^2 + 4x - 5 = 0"
Tokenized Output:

• ["3", "x", "^", "2", "+", "4", "x", "-", "5", "=", "0"]

Example 2: Tokenizing Log Files

Input Log Entry: "2024-11-27 19:36:42 INFO User logged in from IP 192.168.1.1"
Tokenized Output:

• ["2024-11-27", "19:36:42", "INFO", "User", "logged", "in", "from", "IP", "192.168.1.1"]

Example 3: Tokenizing Chemical Formulas

Input Formula: "H2SO4 + NaOH -> Na2SO4 + H2O"
Tokenized Output:

• ["H2SO4", "+", "NaOH", "->", "Na2SO4", "+", "H2O"]

Example 4: Tokenizing Markdown Text

Input Markdown: "# Heading\nThis is a paragraph with **bold** text and [a link](https://example.com)."
Tokenized Output:

• ["#", "Heading", "\n", "This", "is", "a", "paragraph", "with", "**", "bold", "**", "text", "and", "[", "a", "link", "]", "(", "https://example.com", ")"]

Example 5: Tokenizing HTML Content

Input HTML: <div class="container"><h1>Title</h1><p>This is a paragraph.</p></div>
Tokenized Output:

• ["<div", "class=\"container\">", "<h1>", "Title", "</h1>", "<p>", "This", "is", "a", "paragraph",
".", "</p>", "</div>"]

Example 6: Tokenizing JSON Data

Input JSON: {"name": "John", "age": 30, "city": "New York"}
Tokenized Output:

• ["{", "\"name\"", ":", "\"John\"", ",", "\"age\"", ":", "30", ",", "\"city\"", ":", "\"New York\"",
"}"]

Example 7: Tokenizing a Music Chord Progression

Input Chords: "C G Am F"
Tokenized Output:

• ["C", "G", "Am", "F"]

Example 8: Tokenizing a Structured Query Language (SQL) Command

Input SQL: SELECT * FROM users WHERE age > 25 AND city = 'New York';
Tokenized Output:

• ["SELECT", "*", "FROM", "users", "WHERE", "age", ">", "25", "AND", "city", "=", "'New York'",
";"]
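Several of the examples above can be reproduced with a small regular-expression tokenizer. This is a minimal sketch: the pattern keeps single-quoted strings whole and otherwise separates runs of word characters from individual punctuation marks, so it handles the sentence and SQL examples, but domain-specific inputs (mathematical expressions, IP addresses, HTML) would need tailored patterns:

```python
import re

# Alternation order matters: quoted strings are matched first so that
# 'New York' stays a single token; then words; then lone punctuation.
TOKEN_PATTERN = re.compile(r"'[^']*'|\w+|[^\w\s]")

def tokenize(text):
    """Split text into word, quoted-string, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)
```

For instance, `tokenize("Hello, world!")` yields `["Hello", ",", "world", "!"]`, matching the tokenization shown earlier.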

Q2. Difference between Stemming and Lemmatization. Explain with example.

ANS. Stemming and lemmatization are both text normalization techniques used in natural language
processing (NLP) to reduce words to their base or root forms. However, they differ in their
approaches and outcomes.

Stemming:

Stemming is a rule-based process of removing affixes (prefixes and suffixes) from a word to reduce it
to its base form or root. It does not necessarily produce a meaningful or linguistically correct word
but a stem that can be used for further processing.

Characteristics of Stemming:

• Algorithm Simplicity: Stemming algorithms, like the Porter Stemmer, are relatively simple
and fast.

• Language Independence: Stemming can be applied to many languages without in-depth linguistic knowledge.

• Output: The stems produced may not be actual words. For example, the stem of "running" is
"run", but the stem of "relational" might be "relat".

• Common Algorithms: Porter Stemmer, Snowball Stemmer, Lancaster Stemmer.

Advantages of Stemming:

• Speed: Because it involves simple rules, stemming is computationally efficient.

• Usage: Suitable for applications where a quick and dirty approximation is sufficient, like
search engines.

Disadvantages of Stemming:

• Accuracy: Can be too aggressive, removing useful parts of words and potentially changing
their meaning.

• Non-Words: Often produces stems that are not real words, which can be problematic in
some contexts.

Lemmatization:

Lemmatization is a more sophisticated process that reduces a word to its base or dictionary form
(lemma) using morphological analysis and linguistic knowledge. It considers the context and the part
of speech of the word.

Characteristics of Lemmatization:

• Algorithm Complexity: Lemmatization algorithms require more sophisticated, language-specific rules and dictionaries.

• Language Dependence: Requires knowledge of the language's grammar and vocabulary.

• Output: Produces actual words that are meaningful in the context of the language. For
example, "running" is lemmatized to "run", and "better" is lemmatized to "good".

• Common Libraries: WordNet Lemmatizer, spaCy, NLTK's Lemmatizer.



Advantages of Lemmatization:

• Accuracy: Produces valid words by considering the context, making it more accurate for text
understanding and processing.

• Meaning Preservation: Maintains the grammatical structure and meaning of words.

Disadvantages of Lemmatization:

• Computationally Intensive: More resource-heavy due to the need for linguistic databases
and rules.

• Complexity: Implementation is more complex compared to stemming.

Detailed Comparison with Examples:

Word: "Studies"

• Stemming Output: "studi"

• Lemmatization Output: "study" (noun), "study" (verb)

Word: "Wolves"

• Stemming Output: "wolv"

• Lemmatization Output: "wolf"

Word: "Better"

• Stemming Output: "better" (no change)

• Lemmatization Output: "good" (considering it as an adjective)
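A toy implementation makes the contrast concrete. The suffix rules below only mimic what a Porter-style stemmer does to these particular words, and the lookup table stands in for the dictionary and part-of-speech analysis a real lemmatizer (such as NLTK's WordNetLemmatizer) performs:

```python
def toy_stem(word):
    """Crude suffix-stripping stemmer: fast, rule-based, and may
    produce non-words, just like the comparison above shows."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"       # studies -> studi
    if w.endswith("es"):
        return w[:-2]             # wolves -> wolv
    if w.endswith("ing"):
        w = w[:-3]
        if len(w) > 2 and w[-1] == w[-2]:
            w = w[:-1]            # running -> runn -> run
        return w
    if w.endswith("s"):
        return w[:-1]
    return w

# A real lemmatizer consults a vocabulary plus the word's part of
# speech; this tiny lookup table stands in for that machinery.
TOY_LEMMAS = {"studies": "study", "wolves": "wolf", "better": "good", "running": "run"}

def toy_lemmatize(word):
    """Dictionary-backed lemmatization: always returns a real word."""
    return TOY_LEMMAS.get(word.lower(), word.lower())
```

Note how the stemmer mangles "wolves" into the non-word "wolv" while the lemmatizer, at the cost of needing a dictionary, returns "wolf".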

When to Use Which:

• Stemming:

o Faster and simpler, suitable for applications where speed is critical, and perfect
accuracy is not required.

• Lemmatization:

o More accurate and context-aware, ideal for applications where the meaning of
words is important, such as text analysis and understanding.

Both techniques are crucial in NLP for tasks such as text classification, information retrieval, and
sentiment analysis, helping to standardize text data for better processing and analysis.
