SMA02


Vidyavardhini’s College of Engineering & Technology
Department of Computer Engineering

Name: Sachin Yadav

Roll No.: 22
Experiment No. 2
To perform web crawling, scraping and parsing using Instant
Data Scraper, Netlytic and Octoparse
Date of Performance: 14/02/2024
Date of Submission: 14/02/2024

CSDL8023: Social Media Analytics Lab


Aim: To perform web crawling, scraping and parsing using Instant Data Scraper, Netlytic and
Octoparse.

Objective: To apply web crawling, scraping, and parsing techniques to extract data from Google
reviews using Instant Data Scraper, extract data from YouTube comments using Netlytic, and set
up and run web scraping tasks to extract data using Octoparse.

Theory:

Web crawling: Web crawling is the process of automatically browsing the internet and indexing
web pages. It is typically done by search engines to discover new content and update their
indexes. Web crawlers, also known as spiders or bots, follow links from one page to another and
download the content of each page for indexing. While web crawling is not the same as web
scraping, web scraping often involves web crawling to navigate through a website and extract
data from multiple pages.
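
The link-following behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the PAGES dictionary is a hypothetical in-memory website standing in for real HTTP requests, and the names LinkCollector and crawl are made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website": URL -> HTML. A real crawler would fetch
# these pages over HTTP instead.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.com/b": '<a href="/">Home</a>',
    "https://example.com/c": "<p>No links here</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, pages):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # page not available; skip it
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


order = crawl("https://example.com/", PAGES)
print(order)
```

The seen set is what keeps the crawler from looping forever when pages link back to each other, as the home page and page b do here.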

Web scraping: Web scraping is the process of extracting specific information from websites. It
involves using software or scripts to access the HTML of web pages and extract the desired
data, such as text, images, or links. Web scraping can be done manually or automatically, and it
is used for various purposes, including data collection, market research, and price monitoring.

Parsing: Parsing is the process of analyzing the structure of a document or data file to extract
meaningful information. In the context of web scraping, parsing is used to extract specific data
elements from the HTML or other markup languages used to create web pages. This process
involves identifying the patterns and structures of the data and using techniques like regular
expressions or HTML parsers to extract the desired information.
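
The two parsing techniques named above can be compared on a small example. The sketch below assumes a made-up HTML snippet whose class names (review, author, text) and sample reviews are invented for illustration; real pages will use different markup.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup and sample data; the class names are assumptions.
HTML = """
<div class="review"><span class="author">Asha</span><p class="text">Great food</p></div>
<div class="review"><span class="author">Ravi</span><p class="text">Slow service</p></div>
"""

# Approach 1: a regular expression, suitable for simple and predictable markup.
texts_regex = re.findall(r'<p class="text">(.*?)</p>', HTML)


# Approach 2: an HTML parser, which follows the tag structure of the page.
class ReviewParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.field = None   # class of the element currently being read
        self.reviews = []   # list of {"author": ..., "text": ...}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "review":
            self.reviews.append({})   # a new review block starts
        elif cls in ("author", "text"):
            self.field = cls

    def handle_data(self, data):
        if self.field and self.reviews:
            self.reviews[-1][self.field] = data.strip()

    def handle_endtag(self, tag):
        self.field = None


parser = ReviewParser()
parser.feed(HTML)
print(texts_regex)
print(parser.reviews)
```

The regular expression is shorter, but it depends on the exact attribute text; the parser version keeps the author and text of each review together as one record.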

Instant Data Scraper: Instant Data Scraper is a Chrome extension that allows scraping data
from websites directly in your browser. It provides a simple interface for selecting and extracting
data elements, and it can export the data in various formats like CSV or Excel. Instant Data
Scraper is useful for quick and easy web scraping tasks, but it may have limitations compared to
more advanced scraping tools.


Netlytic: Netlytic is a cloud-based text and social network analyzer that allows users to collect,
analyze, and visualize social media data. It can be used to study online communities, track social
media trends, and analyze text data from various sources, including Twitter, Facebook,
YouTube, and web forums. Netlytic offers features for data collection, text analysis, and network
analysis, making it a versatile tool for social media research and analysis.

Octoparse: Octoparse is a web scraping tool that allows you to extract data from websites
without the need for programming. It provides a visual interface for selecting the data to scrape
and offers features like scheduled scraping, cloud extraction, and data export options. It's
commonly used for tasks such as web data collection, price monitoring, and market research.

Implementation and Output:

Scrape Google Reviews
Step 1 : Install the Google Chrome extension Instant Data Scraper to scrape Google reviews for
any local business.
Step 2 : Go to Google Maps and search for a business that interests you.
Step 3 : Open the reviews for that business and launch Instant Data Scraper to crawl them. Wait
until all reviews have been scraped.
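
Once the reviews are exported as CSV, they can be processed with a short script. The sketch below is only an illustration: the column names (reviewer, rating, review) and the three sample rows are assumptions, since the actual headers in an export depend on the page that was scraped. An in-memory string stands in for the exported file.

```python
import csv
import io

# Hypothetical export; real column names depend on the page that was scraped.
EXPORT = """reviewer,rating,review
Reviewer A,5,Excellent service
Reviewer B,3,Average experience
Reviewer C,4,Good value
"""


def load_reviews(csv_file):
    """Read rows into dicts and convert the rating column to int."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["rating"] = int(row["rating"])
        rows.append(row)
    return rows


reviews = load_reviews(io.StringIO(EXPORT))
average = sum(r["rating"] for r in reviews) / len(reviews)
print(len(reviews), average)
```

For a real export, replace io.StringIO(EXPORT) with open("reviews.csv", newline="", encoding="utf-8"), where reviews.csv is a placeholder file name.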


Scrape YouTube Comments using Netlytic


Step 1 : Sign up for Netlytic
Step 2 : Click “New Dataset”
Step 3 : Select “YouTube” as the data source
Step 4 : Copy the ID of the YouTube video whose comments you want to scrape and paste it into
Netlytic. Enter a Dataset Name and click Import
Step 5 : Go to the “My Datasets” tab, where you can find your dataset
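
Step 4 needs the video ID rather than the full link. A small helper such as the one below can pull the ID out of a copied URL. It is a sketch covering only the two common link shapes (a watch link with a v query parameter and a youtu.be short link); the function name extract_video_id and the placeholder ID VIDEO_ID_123 are made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


# VIDEO_ID_123 is a placeholder, not a real video.
watch_id = extract_video_id("https://www.youtube.com/watch?v=VIDEO_ID_123&t=10s")
short_id = extract_video_id("https://youtu.be/VIDEO_ID_123")
print(watch_id, short_id)
```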


Web Scraping using Octoparse


Step 1 : Go to the web page
Step 2 : Create pagination
Step 3 : Build a loop item
Step 4 : Extract the data
Step 5 : Run the task and get the data
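
The five steps above map onto a simple loop, which the following Python sketch mirrors. It does not show how Octoparse works internally; it is an analogy built on assumptions: PAGES is a hypothetical two-page site held in memory, and the markup, the regular expressions, and the name run_task are invented for illustration.

```python
import re

# Hypothetical paginated site: each page lists items and may link to a next page.
PAGES = {
    "/list?page=1": '<li class="item">Item 1</li><li class="item">Item 2</li>'
                    '<a class="next" href="/list?page=2">Next</a>',
    "/list?page=2": '<li class="item">Item 3</li>',
}

ITEM_RE = re.compile(r'<li class="item">(.*?)</li>')
NEXT_RE = re.compile(r'<a class="next" href="(.*?)">')


def run_task(start, pages, max_pages=10):
    """Visit each page in turn, loop over its items, and collect the data."""
    results = []
    url = start
    visited = 0
    while url is not None and visited < max_pages:
        html = pages[url]                      # Step 1: go to the web page
        visited += 1
        for item in ITEM_RE.findall(html):     # Step 3: loop over list items
            results.append({"title": item})    # Step 4: extract the data
        match = NEXT_RE.search(html)           # Step 2: follow the pagination link
        url = match.group(1) if match else None
    return results                             # Step 5: the collected data


data = run_task("/list?page=1", PAGES)
print(data)
```

The max_pages limit is a safety stop so that a page whose next link points back to an earlier page cannot keep the loop running indefinitely.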


Conclusion: This experiment applied web crawling, scraping, and parsing using Instant Data
Scraper, Netlytic, and Octoparse. Instant Data Scraper extracted Google reviews directly in the
browser through a simple interface, Netlytic collected YouTube comments into a dataset ready for
analysis, and Octoparse handled a multi-page scraping task involving pagination, a loop item, and
data extraction. Together, these tools cover a range of web data extraction needs and skill
levels, and they can support research, analysis, and data-driven decision-making.
