
AI Assignment

The document discusses intelligent search algorithms, detailing their types such as heuristic, evolutionary, swarm intelligence, local search, and machine learning-based algorithms, along with the challenges in their design. It also covers the importance of tokenization in natural language processing, explaining its role in simplifying text analysis and enabling further processing. Additionally, it contrasts stemming and lemmatization as text normalization techniques, highlighting their characteristics, advantages, and disadvantages.


KAJAL GOSWAMI (02201022022, ECE) Submitted to – Ms. Shruty Ahuja

ASSIGNMENT-1
UNIT-1
Q.1 What are the Issues in design of Intelligent Search Algorithms?

Ans: Intelligent Search Algorithms:

Intelligent search algorithms are advanced computational methods designed to find solutions or
optimal paths in complex problem spaces. They draw on principles from artificial intelligence and
machine learning to improve the efficiency and effectiveness of the search process. The main types
are outlined below.

Types of Intelligent Search Algorithms:

1. Heuristic Search Algorithms:

o A*: Uses heuristics to find the shortest path to the goal. It combines the actual cost from the start with an estimated distance to the goal.

o Greedy Best-First Search: Prioritizes nodes that appear to lead most quickly to the
goal, based on a heuristic.
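A compact sketch of A* on a grid illustrates how the two costs combine. This is a minimal illustration (the grid, start, and goal are invented for the example), not a production implementation:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; 0 = free cell, 1 = wall.

    Each node is ranked by f = g + h: the actual cost g from the start
    plus a Manhattan-distance estimate h to the goal. Manhattan distance
    never overestimates on a grid, so the returned path is optimal.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                if nxt not in seen:
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists
```

Greedy Best-First Search is the same loop with the heap keyed on h alone (dropping g), which is faster but can return a non-optimal path.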

2. Evolutionary Algorithms:

o Genetic Algorithms (GA): Simulates natural evolution processes to optimize solutions, using techniques like selection, crossover, and mutation.

o Genetic Programming (GP): Similar to GA but focuses on evolving programs or algorithms to solve problems.
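A toy genetic algorithm for the classic "one-max" problem (maximise the number of 1s in a bit string) shows selection, crossover, and mutation in miniature; all parameter values here are illustrative choices:

```python
import random

def genetic_algorithm(bits=20, pop_size=30, generations=60):
    """Toy GA for one-max: tournament selection, one-point
    crossover, and per-bit flip mutation."""
    random.seed(1)
    fitness = sum  # fitness of a bit list = number of 1s
    pop = [[random.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament of two: the fitter of two random individuals wins.
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(bits):             # bit-flip mutation
                if random.random() < 1 / bits:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

After a few dozen generations the best individual is at or near the all-ones string.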

3. Swarm Intelligence Algorithms:

o Ant Colony Optimization (ACO): Mimics the behaviour of ants finding paths to food,
using pheromone trails to guide the search.

o Particle Swarm Optimization (PSO): Models social behaviour of birds or fish to find
optimal solutions through a population of candidate solutions.
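The PSO update can be sketched in a few lines. This is a minimal one-dimensional version with conventional (but here arbitrarily chosen) inertia and attraction weights:

```python
import random

def pso(f, lo, hi, n_particles=20, iters=100):
    """Minimal 1-D particle swarm minimising f on [lo, hi]."""
    random.seed(2)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights
    x = [random.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                # each particle's own best position
    gbest = min(x, key=f)       # the swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            # Velocity pulls toward the particle's and the swarm's bests.
            v[i] = (w * v[i]
                    + c1 * random.random() * (pbest[i] - x[i])
                    + c2 * random.random() * (gbest - x[i]))
            x[i] = min(hi, max(lo, x[i] + v[i]))  # clamp to the bounds
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i]
            if f(x[i]) < f(gbest):
                gbest = x[i]
    return gbest
```

For example, `pso(lambda x: (x - 3) ** 2, -10, 10)` converges to a point near 3.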

4. Local Search Algorithms:

o Hill Climbing: Iteratively moves to neighbouring states with higher value to find the
peak of a function.

o Simulated Annealing: Combines hill climbing with a probability mechanism to escape local optima and find a global optimum.
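The acceptance rule that distinguishes simulated annealing from plain hill climbing can be sketched as follows (the step size, temperature, and cooling rate are illustrative):

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=2.0, cooling=0.99, iters=2000):
    """Minimise f starting from x0. Improvements are always accepted;
    worse moves are accepted with probability exp(-delta / t), which
    shrinks as the temperature t cools, so the search starts exploratory
    and ends as pure hill climbing."""
    random.seed(3)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        x2 = x + random.uniform(-step, step)  # random neighbour
        fx2 = f(x2)
        if fx2 < fx or random.random() < math.exp((fx - fx2) / t):
            x, fx = x2, fx2
        if fx < fbest:        # track the best state ever visited
            best, fbest = x, fx
        t *= cooling          # cool the temperature
    return best
```

With `f(x) = x * x` and `x0 = 8.0`, the returned point lies near the global minimum at 0.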

5. Machine Learning-Based Algorithms:

o Reinforcement Learning: An agent learns to make decisions by receiving rewards or penalties for actions taken in an environment.

o Neural Networks: Can be used to approximate functions and guide search processes
in complex, high-dimensional spaces.
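A tabular Q-learning agent in a toy one-dimensional corridor illustrates the reward-driven learning loop; the environment and all parameters are invented for illustration:

```python
import random

def q_learning_corridor(n_states=5, episodes=400, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning: states 0..n-1 in a corridor, actions
    0 = left and 1 = right, reward 1 for reaching the rightmost
    (goal) state and 0 otherwise."""
    random.seed(0)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q(state, action) table
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action choice (random on ties).
            if random.random() < eps or q[s][0] == q[s][1]:
                a = random.randrange(2)
            else:
                a = 1 if q[s][1] > q[s][0] else 0
            s2 = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
            reward = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: nudge Q(s,a) toward reward + gamma * max Q(s').
            q[s][a] += alpha * (reward + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy (pick the higher-valued action in each state) walks straight to the goal.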

Designing intelligent search algorithms comes with several challenges. Here are some key issues:

1. State Space Complexity

• Challenge: Managing the size of the state space can be daunting, especially for large
problems.

• Impact: Larger state spaces require more computational resources and can slow down the
search process.

2. Search Algorithm Efficiency

• Challenge: Ensuring the search algorithm is efficient in terms of time and space complexity.

• Impact: Inefficient algorithms can lead to long processing times and high memory usage.

3. Memory Management

• Challenge: Efficiently managing memory to store states and paths during the search process.

• Impact: Poor memory management can lead to memory overflow and system crashes.

4. Heuristic Design and Evaluation

• Challenge: Designing effective heuristics that guide the search towards the goal efficiently.

• Impact: Poor heuristics can result in suboptimal solutions or excessive search times.

5. Handling Dynamic and Uncertain Environments

• Challenge: Adapting to changes in the environment and dealing with uncertainty.

• Impact: Static algorithms may fail in dynamic environments, requiring adaptive and robust
solutions.

6. Scalability

• Challenge: Ensuring the algorithm scales well with increasing problem size and complexity.

• Impact: Non-scalable algorithms may become impractical for large-scale problems.

7. Goal Ambiguity and Multiple Goals

• Challenge: Handling situations where the goal is ambiguous or there are multiple goals.

• Impact: Ambiguity can complicate the search process and require additional logic to resolve.

8. Performance vs. Accuracy Trade-offs

• Challenge: Balancing the trade-off between performance (speed) and accuracy (quality of
solution).

• Impact: Focusing too much on speed can lead to less accurate solutions, while prioritizing
accuracy can slow down the process.

9. Data Structuring Issues

• Challenge: Efficiently structuring and accessing data during the search process.

• Impact: Poor data structuring can lead to inefficiencies and increased search times.

10. Query Performance

• Challenge: Optimizing query performance to retrieve relevant information quickly.

• Impact: Slow query performance can hinder the overall efficiency of the search algorithm.

UNIT-2
Q1. What is the need of tokenization? Explain with example

Ans. Tokenization is an essential process in natural language processing (NLP) and computational
linguistics. It involves breaking down text into smaller units, called tokens, which can be words,
phrases, or symbols. This process is crucial for several reasons:

Tokenization is the first step in any NLP pipeline, and it shapes everything that follows. A
tokenizer breaks unstructured natural language text into chunks of information that can be treated
as discrete elements. The token occurrences in a document can be used directly as a vector
representing that document.

This immediately turns an unstructured string (a text document) into a numerical data structure
suitable for machine learning. Tokens can also be used directly by a computer to trigger useful
actions and responses, or serve in a machine learning pipeline as features that drive more complex
decisions or behaviour.

Why Tokenization is Needed:

1. Simplification of Text Analysis:

• Process: By converting text into manageable pieces (tokens), it becomes easier to analyze
and manipulate.

• Example: The sentence "Hello, world!" can be tokenized into ["Hello", ",", "world", "!"],
allowing each word and punctuation mark to be treated individually.

2. Enabling Further Processing:

• Process: Tokenization serves as a preliminary step for various NLP tasks, such as parsing,
sentiment analysis, and machine translation.

• Example: In sentiment analysis, tokenizing a review like "I love this product!" allows the
system to analyze individual words and understand the sentiment expressed.

3. Handling Ambiguity:

• Process: Tokenization helps in resolving ambiguities by isolating words and phrases.

• Example: In the sentence "I saw a man with a telescope," tokenizing it helps in
understanding different interpretations based on the context.

4. Consistency in Text Representation:

• Process: Tokenization standardizes the text, making it consistent for processing by algorithms.

• Example: The word "running" can be tokenized and stemmed to "run", ensuring uniformity
in text analysis.

5. Facilitating Text Normalization:

• Process: Tokenization aids in normalizing text by separating meaningful units.

• Example: "We're going to the park" can be tokenized into ["We", "'re", "going", "to", "the",
"park"], making it easier to handle contractions and punctuation.

Example 1: Tokenizing Mathematical Expressions

Input Expression: "3x^2 + 4x - 5 = 0"
Tokenized Output:

• ["3", "x", "^", "2", "+", "4", "x", "-", "5", "=", "0"]

Example 2: Tokenizing Log Files

Input Log Entry: "2024-11-27 19:36:42 INFO User logged in from IP 192.168.1.1"
Tokenized Output:

• ["2024-11-27", "19:36:42", "INFO", "User", "logged", "in", "from", "IP", "192.168.1.1"]

Example 3: Tokenizing Chemical Formulas

Input Formula: "H2SO4 + NaOH -> Na2SO4 + H2O"
Tokenized Output:

• ["H2SO4", "+", "NaOH", "->", "Na2SO4", "+", "H2O"]

Example 4: Tokenizing Markdown Text

Input Markdown: "# Heading\nThis is a paragraph with **bold** text and [a link](https://example.com)."
Tokenized Output:

• ["#", "Heading", "\n", "This", "is", "a", "paragraph", "with", "**", "bold", "**", "text", "and", "[", "a", "link", "]", "(", "https://example.com", ")"]

Example 5: Tokenizing HTML Content

Input HTML: <div class="container"><h1>Title</h1><p>This is a paragraph.</p></div>
Tokenized Output:

• ["<div", "class=\"container\">", "<h1>", "Title", "</h1>", "<p>", "This", "is", "a", "paragraph",
".", "</p>", "</div>"]

Example 6: Tokenizing JSON Data

Input JSON: {"name": "John", "age": 30, "city": "New York"}
Tokenized Output:

• ["{", "\"name\"", ":", "\"John\"", ",", "\"age\"", ":", "30", ",", "\"city\"", ":", "\"New York\"",
"}"]

Example 7: Tokenizing a Music Chord Progression

Input Chords: "C G Am F"
Tokenized Output:

• ["C", "G", "Am", "F"]

Example 8: Tokenizing a Structured Query Language (SQL) Command

Input SQL: SELECT * FROM users WHERE age > 25 AND city = 'New York';
Tokenized Output:

• ["SELECT", "*", "FROM", "users", "WHERE", "age", ">", "25", "AND", "city", "=", "'New York'",
";"]
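Several of the examples above can be reproduced with a small regular-expression tokenizer. This is a minimal sketch: the pattern keeps single-quoted strings whole and otherwise separates runs of word characters from individual punctuation marks, so it handles the sentence and SQL examples, but domain-specific inputs (mathematical expressions, IP addresses, HTML) would need tailored patterns:

```python
import re

# Alternation order matters: quoted strings are matched first so that
# 'New York' stays a single token; then words; then lone punctuation.
TOKEN_PATTERN = re.compile(r"'[^']*'|\w+|[^\w\s]")

def tokenize(text):
    """Split text into word, quoted-string, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)
```

For instance, `tokenize("Hello, world!")` yields `["Hello", ",", "world", "!"]`, matching the tokenization shown earlier.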

Q2. Difference between Stemming and Lemmatization. Explain with example.

ANS. Stemming and lemmatization are both text normalization techniques used in natural language
processing (NLP) to reduce words to their base or root forms. However, they differ in their
approaches and outcomes.

Stemming:

Stemming is a rule-based process of removing affixes (prefixes and suffixes) from a word to reduce it
to its base form or root. It does not necessarily produce a meaningful or linguistically correct word
but a stem that can be used for further processing.

Characteristics of Stemming:

• Algorithm Simplicity: Stemming algorithms, like the Porter Stemmer, are relatively simple
and fast.

• Language Independence: Stemming can be applied to many languages without in-depth linguistic knowledge.

• Output: The stems produced may not be actual words. For example, the stem of "running" is
"run", but the stem of "relational" might be "relat".

• Common Algorithms: Porter Stemmer, Snowball Stemmer, Lancaster Stemmer.

Advantages of Stemming:

• Speed: Because it involves simple rules, stemming is computationally efficient.

• Usage: Suitable for applications where a quick and dirty approximation is sufficient, like
search engines.

Disadvantages of Stemming:

• Accuracy: Can be too aggressive, removing useful parts of words and potentially changing
their meaning.

• Non-Words: Often produces stems that are not real words, which can be problematic in
some contexts.

Lemmatization:

Lemmatization is a more sophisticated process that reduces a word to its base or dictionary form
(lemma) using morphological analysis and linguistic knowledge. It considers the context and the part
of speech of the word.

Characteristics of Lemmatization:

• Algorithm Complexity: Lemmatization algorithms require more sophisticated, language-specific rules and dictionaries.

• Language Dependence: Requires knowledge of the language's grammar and vocabulary.

• Output: Produces actual words that are meaningful in the context of the language. For
example, "running" is lemmatized to "run", and "better" is lemmatized to "good".

• Common Libraries: WordNet Lemmatizer, spaCy, NLTK's Lemmatizer.



Advantages of Lemmatization:

• Accuracy: Produces valid words by considering the context, making it more accurate for text
understanding and processing.

• Meaning Preservation: Maintains the grammatical structure and meaning of words.

Disadvantages of Lemmatization:

• Computationally Intensive: More resource-heavy due to the need for linguistic databases
and rules.

• Complexity: Implementation is more complex compared to stemming.

Detailed Comparison with Examples:

Word: "Studies"

• Stemming Output: "studi"

• Lemmatization Output: "study" (noun), "study" (verb)

Word: "Wolves"

• Stemming Output: "wolv"

• Lemmatization Output: "wolf"

Word: "Better"

• Stemming Output: "better" (no change)

• Lemmatization Output: "good" (considering it as an adjective)
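A toy implementation makes the contrast concrete. The suffix rules below only mimic what a Porter-style stemmer does to these particular words, and the lookup table stands in for the dictionary and part-of-speech analysis a real lemmatizer (such as NLTK's WordNetLemmatizer) performs:

```python
def toy_stem(word):
    """Crude suffix-stripping stemmer: fast, rule-based, and may
    produce non-words, just like the comparison above shows."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"       # studies -> studi
    if w.endswith("es"):
        return w[:-2]             # wolves -> wolv
    if w.endswith("ing"):
        w = w[:-3]
        if len(w) > 2 and w[-1] == w[-2]:
            w = w[:-1]            # running -> runn -> run
        return w
    if w.endswith("s"):
        return w[:-1]
    return w

# A real lemmatizer consults a vocabulary plus the word's part of
# speech; this tiny lookup table stands in for that machinery.
TOY_LEMMAS = {"studies": "study", "wolves": "wolf", "better": "good", "running": "run"}

def toy_lemmatize(word):
    """Dictionary-backed lemmatization: always returns a real word."""
    return TOY_LEMMAS.get(word.lower(), word.lower())
```

Note how the stemmer mangles "wolves" into the non-word "wolv" while the lemmatizer, at the cost of needing a dictionary, returns "wolf".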

When to Use Which:

• Stemming:

o Faster and simpler, suitable for applications where speed is critical, and perfect
accuracy is not required.

• Lemmatization:

o More accurate and context-aware, ideal for applications where the meaning of
words is important, such as text analysis and understanding.

Both techniques are crucial in NLP for tasks such as text classification, information retrieval, and
sentiment analysis, helping to standardize text data for better processing and analysis.
