
Python Programming

Final Project

Price Comparison
By
Mana Rahimi
202180090100

Zhengzhou University

Contents
1.0 Introduction
1.1 Purpose
1.2 Scope of Project
2.0 Design
2.1 Technologies Used
2.2 Price Comparison Structure
2.3 User Interface Design
2.4 DFD Diagram
2.5 Flowchart Diagram
3.0 Implementation
3.1 Setting Up the Environment
3.2 The Spider Class
3.3 Custom Settings
3.4 Starting the Scraping Process
3.5 Parsing the Data
3.6 Saving the Data
4.0 Testing
5.0 Conclusion

1.0 Introduction
1.1 Purpose
This report explains how we built a web scraper using Scrapy, a Python framework designed for gathering data from websites. The main goal of this project was to create a convenient tool that lets users search for products on eBay and Amazon and collects important details such as product names, prices, and links. The report covers the design, implementation, and testing of the scraper, as well as plans for improving it in the future.

1.2 Scope of Project


In today's digital world, online shopping has changed the way we buy things, and with so many options available at our fingertips, shoppers can easily be overwhelmed. Comparing prices and features across multiple e-commerce platforms is time-consuming and challenging. The Price Comparison System addresses this issue by automating the retrieval and comparison of product prices from major online retailers such as eBay and Amazon.
This system is designed to enhance user convenience by providing accurate and up-to-date
information on product prices, links, and other relevant details. Users only need to input the
name of the desired product, and the system efficiently scrapes data from multiple platforms,
presenting it in a structured format such as JSON or CSV.

2.0 Design
2.1 Technologies Used
The following technologies are utilized to build the scraper:
• Python Language: This serves as the primary programming language.
• Scrapy: An open-source framework that simplifies data gathering from websites.
• XPath/CSS Selectors: These tools aid in navigating and extracting data from HTML
documents.
• Virtual Environment: This is set up to manage dependencies and ensure smooth operation
across different systems.

2.2 Price Comparison Structure
The system is designed to be efficient and easy to understand. Its main components are listed below (a brief code sketch follows the list):
2.2.1 User Input Module: This module collects the product name from the user.
2.2.2 Spider Class: This special class in Scrapy handles requests (asking for data) and responses
(receiving data back).
2.2.3 Request Handling: This section constructs the appropriate requests to eBay and Amazon
based on the product name provided.
2.2.4 Data Extraction Logic: This component utilizes tools called XPath and CSS selectors to
read HTML responses and extract the necessary product information.
2.2.5 Output Management: This part saves the scraped data in various formats as specified by
the user.
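
As an orientation aid, the outline below sketches how these components might map onto a Scrapy spider. The class and method names are assumptions made for illustration, not the project's exact code; the concrete details are covered in section 3.

    import scrapy

    class ProductSearcherSpider(scrapy.Spider):       # 2.2.2 Spider Class
        name = "product_searcher"

        def start_requests(self):                     # 2.2.3 Request Handling
            ...                                       # build the eBay and Amazon search requests

        def parse_ebay(self, response):               # 2.2.4 Data Extraction Logic
            ...                                       # XPath/CSS selectors pull names, prices, links

        def parse_amazon(self, response):
            ...

    # 2.2.1 User Input Module: the product name is collected from the user and passed to the spider.
    # 2.2.5 Output Management: Scrapy's feed exports save the results as JSON or CSV (see section 3.6).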

2.3 User Interface Design


The user interface of the scraper is designed to be simple and clean. It prompts the user for only
one piece of information—the product name—making it accessible for anyone without requiring
detailed instructions or technical skills.

2.3.1 Data Flows


The data flow through the Price Comparison system is as follows:
1. User Input: The user types in a product name.
2. URL Construction: The spider creates search URLs for both eBay and Amazon using the provided product name.
3. Concurrent Requests: The scraper sends requests to both websites simultaneously to achieve faster results.
4. Response Handling: The spider receives HTML responses and processes them using predefined methods.
5. Data Extraction: Relevant product information is extracted and organized into structured data.
6. Data Output: The scraped data is saved in the selected format (JSON, CSV, or JSON Lines) for later use.

2.4 DFD Diagram

Figure 1: Diagram showing the flow of data and the interaction between components

2.5 Flowchart Diagram

Figure 2: The diagram shows a sequential process for scraping and presenting product prices.

1. Start:
• This marks the beginning of the program or process.
2. Enter the Desired Product Name in the Search Bar:
• The user inputs the name of the product they want to search for.
• This input is used to fetch product details from the specified websites.
3. Web Scraping from eBay and Amazon:
• The program sends requests to Amazon and eBay, retrieves the HTML of the product search results, and extracts relevant data such as the product name, price, and links.
• This is where the scraping logic is implemented.
4. Show All the Prices of That Product from Both Websites:
• The extracted data (product details, prices, and links) from both websites is presented to the user in output files.
5. End:
• This marks the completion of the program or workflow.
3.0 Implementation
3.1 Setting Up the Environment
The first step in building the scraper involves setting up the environment. A virtual environment is created using venv to keep all project files organized and separate from other projects. Scrapy is installed using pip, along with other necessary libraries such as requests for handling web requests. The system operates on various platforms, including Windows, macOS, and Linux. It requires an internet connection for web scraping and for updating product information, and it is compatible with most modern web browsers for user interaction.
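
As a rough illustration, the snippet below shows a typical way to set up such an environment; the terminal commands appear as comments since they are not Python, and the last two lines are a quick check that Scrapy imports correctly inside the virtual environment.

    # Typical setup commands (run in a terminal, not inside Python):
    #   python -m venv venv
    #   source venv/bin/activate          # on Windows: venv\Scripts\activate
    #   pip install scrapy requests
    import scrapy
    print(scrapy.__version__)             # confirms the installation inside the virtual environment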

3.2 The Spider Class


The main component of the web scraper is the ProductSearcherSpider class. This class is built with Scrapy and is responsible for sending requests to eBay and Amazon and for parsing the responses it receives.
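
A minimal sketch of how such a class might be declared is shown below; the attribute values, the default product name, and the exact way the product name is passed in are illustrative assumptions rather than the project's exact code.

    import scrapy

    class ProductSearcherSpider(scrapy.Spider):
        # Identifier used to run the spider, e.g. scrapy crawl product_searcher -a product="wireless mouse"
        name = "product_searcher"
        # Restrict crawling to the two retail sites (illustrative values).
        allowed_domains = ["ebay.com", "amazon.com"]

        def __init__(self, product="laptop", *args, **kwargs):
            # The product name entered by the user arrives here as a spider argument.
            super().__init__(*args, **kwargs)
            self.product = product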

3.3 Custom Settings


To ensure that the scraper behaves like a legitimate web browser (which helps avoid getting
blocked), custom settings are configured within the spider class. A user-agent string is added to
identify the scraper as a standard web browser.
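
For example, a custom_settings dictionary inside the spider class might look like the following; the user-agent string and the delay value are illustrative choices, while USER_AGENT and DOWNLOAD_DELAY are standard Scrapy settings.

    # Defined inside the ProductSearcherSpider class body.
    custom_settings = {
        # Present the scraper as a standard desktop browser.
        "USER_AGENT": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
        # A short delay between requests reduces the chance of being blocked.
        "DOWNLOAD_DELAY": 1,
    }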

3.4 Starting the Scraping Process


The start_requests method initiates the scraping process by generating search URLs for eBay and
Amazon based on user input. Python’s string formatting features are utilized to construct these
URLs.
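
A sketch of such a start_requests method is shown below. The search URL patterns are common ones for the two sites but should be treated as assumptions, and urllib.parse.quote_plus is used here to URL-encode the product name.

    from urllib.parse import quote_plus

    import scrapy

    # Shown as it would appear inside the ProductSearcherSpider class.
    def start_requests(self):
        query = quote_plus(self.product)    # URL-encode spaces and special characters
        ebay_url = f"https://www.ebay.com/sch/i.html?_nkw={query}"
        amazon_url = f"https://www.amazon.com/s?k={query}"
        # Yielding both requests up front lets Scrapy fetch the two sites concurrently.
        yield scrapy.Request(ebay_url, callback=self.parse_ebay)
        yield scrapy.Request(amazon_url, callback=self.parse_amazon)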

3.5 Parsing the Data


Two distinct methods are created to handle the data returned from each website (a sketch of both follows the list):
• eBay Parsing: This method employs XPath selectors to extract product details from eBay’s search results page, pulling out the product name, price, and link from each listing.
• Amazon Parsing: Similar to the eBay parsing method, this approach uses XPath selectors to retrieve product information from Amazon’s results page, including product names, prices, and links.
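
The sketch below illustrates what these two methods can look like. The XPath expressions are examples based on common markup for the two sites, not the project's exact selectors, and they need adjusting whenever the page layouts change.

    # Shown as they would appear inside the ProductSearcherSpider class.
    def parse_ebay(self, response):
        for listing in response.xpath('//li[contains(@class, "s-item")]'):
            yield {
                "site": "eBay",
                "name": listing.xpath('.//div[contains(@class, "s-item__title")]//text()').get(),
                "price": listing.xpath('.//span[contains(@class, "s-item__price")]//text()').get(),
                "link": listing.xpath('.//a[contains(@class, "s-item__link")]/@href').get(),
            }

    def parse_amazon(self, response):
        for result in response.xpath('//div[@data-component-type="s-search-result"]'):
            yield {
                "site": "Amazon",
                "name": result.xpath('.//h2//span/text()').get(),
                "price": result.xpath('.//span[@class="a-offscreen"]/text()').get(),
                "link": response.urljoin(result.xpath('.//h2/a/@href').get() or ""),
            }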

3.6 Saving the Data


To save the scraped data, Scrapy’s feed exports feature is utilized. The CrawlerProcess class is
configured with options for data storage and format. Users have the flexibility to save their data
as JSON, CSV, or JSON Lines.
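
A sketch of this configuration is shown below. FEEDS is Scrapy's standard feed-export setting; the module name, output file name, and chosen format are illustrative assumptions.

    from scrapy.crawler import CrawlerProcess

    from product_searcher import ProductSearcherSpider   # hypothetical module name

    product = input("Enter the desired product name: ")

    process = CrawlerProcess(settings={
        "FEEDS": {
            "results.json": {"format": "json"},   # "csv" and "jsonlines" are also supported
        },
    })
    process.crawl(ProductSearcherSpider, product=product)
    process.start()   # blocks until both crawls finish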

4.0 Testing
4.1 Creating Test Cases
During testing, several test cases were created to ensure the scraper operates effectively in various scenarios (a small offline test sketch follows the list):
1. Valid Product Name: When a valid product name is entered, the system should return relevant results from both eBay and Amazon.
2. Non-existent Product: For product names that do not exist, the system should return no results without raising any errors.
3. Special Characters Handling: The scraper's ability to handle product names containing special characters or spaces was tested.
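
As one offline example of how such cases can be checked, the sketch below exercises only the URL-construction step, which needs no network access; the helper function and expected values are illustrative, not the project's actual test suite.

    from urllib.parse import quote_plus

    def build_ebay_url(product):
        # Hypothetical helper mirroring the URL construction used in start_requests.
        return f"https://www.ebay.com/sch/i.html?_nkw={quote_plus(product)}"

    # Case 1: a valid product name produces a well-formed search URL.
    assert build_ebay_url("laptop") == "https://www.ebay.com/sch/i.html?_nkw=laptop"
    # Case 3: spaces and special characters are URL-encoded instead of breaking the request.
    assert build_ebay_url('4K TV 55"') == "https://www.ebay.com/sch/i.html?_nkw=4K+TV+55%22"
    # Case 2 (non-existent products) needs live responses, so it was checked manually.
    print("URL-construction checks passed")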

4.2 Running the Tests


Tests were conducted in a controlled environment by inputting various product names into the
scraper. The results were compared with manual searches conducted on eBay and Amazon's
websites.

4.3 Results of Testing
Testing results indicated that the scraper functioned well under typical conditions:
• Valid product names yielded accurate results, displaying product names and prices correctly.
• Non-existent products returned no results without errors, demonstrating robustness in
challenging scenarios.
• Special characters were processed correctly through URL encoding, allowing for diverse search
queries.

4.4 Performance Check


Performance assessments focused on the scraper's speed and data retrieval efficiency. The
scraper demonstrated efficient operation, with response times comparable to standard web
browsing speeds. However, excessive requests within a short timeframe could lead to temporary
blocks from e-commerce sites.

4.5 Discussion of Challenges

Several challenges were encountered during development. The most prominent was the changing website structures: frequent updates to eBay and Amazon's page layouts affected the data extraction methods built on XPath and CSS selectors, so continuous updates to the scraper are necessary to adapt to these changes. Data quality and legal constraints, discussed in the next two subsections, posed additional challenges.

4.6 Data Quality Assurance

Ensuring high-quality data is essential. Inconsistencies in website structures may lead to variations in HTML formatting, potentially resulting in missing information during data scraping. Extensive testing was conducted to verify that the collected data remains accurate and complete.

4.7 Legal Considerations
Web scraping must be approached with caution regarding the terms of service established by
various websites. Some sites explicitly prohibit scraping, making it crucial to understand and
adhere to these regulations during scraper development.

4.8 Future Improvements


Several enhancements could be implemented to improve the Product Searcher:
1. User-Friendly Interface: Developing a graphical user interface (GUI) would facilitate easier
use for individuals who may not be familiar with command-line operations.
2. Better Filtering Options: Incorporating filters for price range or seller ratings would assist
users in finding specific products more effectively.
3. Database Storage: Transitioning from simple file storage to a database system would enhance
data management and search capabilities.
4. Improved Error Handling: Implementing robust error handling mechanisms would allow the
scraper to navigate issues such as layout changes or network problems more smoothly.
5. Faster Performance: Utilizing multi-threading could significantly accelerate the scraping
process, especially when processing large volumes of data.
6. Scheduled Scraping: Adding a feature for automatic scraping at designated times would
alleviate the need for manual intervention.
7. Data Visualization: Integrating visualization tools to create graphs and charts would enable
users to easily identify trends and comparisons in the data.

5.0 Conclusion
The web scraping system effectively simplifies the price comparison process for users by
automating data collection from e-commerce platforms. By leveraging Python, Scrapy, and
advanced data extraction techniques, the system ensures accurate and efficient performance.
Testing confirms its robustness in handling various scenarios, including special characters and
non-existent products.
The project provides a solid foundation for future enhancements, including the addition of a
graphical user interface (GUI), database integration, and advanced filtering options. While
challenges such as changing website structures and legal considerations were encountered, these
can be mitigated with continuous updates and adherence to ethical scraping practices.
