
Python Programming

Final Project

Price Comparison
By
Mana Rahimi
202180090100

Zhengzhou University

Contents
1.0 Introduction
1.1 Purpose
1.2 Scope of Project
2.0 Design
2.1 Technologies Used
2.2 Price Comparison Structure
2.3 User Interface Design
2.4 DFD Diagram
2.5 Flowchart Diagram
3.0 Implementation
3.1 Setting Up the Environment
3.2 The Spider Class
3.3 Custom Settings
3.4 Starting the Scraping Process
3.5 Parsing the Data
3.6 Saving the Data
4.0 Testing
5.0 Conclusion

1.0 Introduction
1.1 Purpose
This report explains how we built a web scraper using Scrapy, a Python framework designed for gathering data from websites. The main goal of this project was to create a convenient tool that lets users search for products on eBay and Amazon and collects important details such as product names, prices, and links. The report covers the design, implementation, and testing of the scraper, as well as plans for improving it in the future.

1.2 Scope of Project


In today's digital world, online shopping has changed the way we buy things, and with so many options available at our fingertips, shoppers can easily be overwhelmed. Comparing prices and features across multiple e-commerce platforms is time-consuming and challenging. The Price Comparison System addresses this issue by automating the retrieval and comparison of product prices from major online retailers such as eBay and Amazon.
This system is designed to enhance user convenience by providing accurate and up-to-date
information on product prices, links, and other relevant details. Users only need to input the
name of the desired product, and the system efficiently scrapes data from multiple platforms,
presenting it in a structured format such as JSON or CSV.

2.0 Design
2.1 Technologies Used
The following technologies are utilized to build the scraper:
• Python Language: This serves as the primary programming language.
• Scrapy: An open-source framework that simplifies data gathering from websites.
• XPath/CSS Selectors: These tools aid in navigating and extracting data from HTML
documents.
• Virtual Environment: This is set up to manage dependencies and ensure smooth operation
across different systems.

2.2 Price Comparison Structure
The system is designed to be efficient and easy to understand. Its main components are listed below (a brief code sketch follows the list):
2.2.1 User Input Module: This module collects the product name from the user.
2.2.2 Spider Class: This special class in Scrapy handles requests (asking for data) and responses
(receiving data back).
2.2.3 Request Handling: This section constructs the appropriate requests to eBay and Amazon
based on the product name provided.
2.2.4 Data Extraction Logic: This component utilizes tools called XPath and CSS selectors to
read HTML responses and extract the necessary product information.
2.2.5 Output Management: This part saves the scraped data in various formats as specified by
the user.
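
As an orientation aid, the outline below sketches how these components might map onto a Scrapy spider. The class and method names are assumptions made for illustration, not the project's exact code; the concrete details are covered in section 3.

    import scrapy

    class ProductSearcherSpider(scrapy.Spider):       # 2.2.2 Spider Class
        name = "product_searcher"

        def start_requests(self):                     # 2.2.3 Request Handling
            ...                                       # build the eBay and Amazon search requests

        def parse_ebay(self, response):               # 2.2.4 Data Extraction Logic
            ...                                       # XPath/CSS selectors pull names, prices, links

        def parse_amazon(self, response):
            ...

    # 2.2.1 User Input Module: the product name is collected from the user and passed to the spider.
    # 2.2.5 Output Management: Scrapy's feed exports save the results as JSON or CSV (see section 3.6).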

2.3 User Interface Design


The user interface of the scraper is designed to be simple and clean. It prompts the user for only
one piece of information—the product name—making it accessible for anyone without requiring
detailed instructions or technical skills.

2.3.1 Data Flows


The data flow through the Price Comparison system is as follows:
1. User Input: The user types in a product name.
2. URL Construction: The spider creates search URLs for both eBay and Amazon using the provided product name.
3. Concurrent Requests: The scraper sends requests to both websites simultaneously to achieve faster results.
4. Response Handling: The spider receives HTML responses and processes them using predefined methods.
5. Data Extraction: Relevant product information is extracted and organized into structured data.
6. Data Output: The scraped data is saved in the selected format (JSON, CSV, or JSON Lines) for later use.

2.4 DFD Diagram

Figure 1: Diagram showing the flow of data and the interaction between components

2.5 Flowchart Diagram

Figure 2: The diagram shows a sequential process for scraping and presenting product prices.

1. Start:
• This marks the beginning of the program or process.
2. Enter the Desired Product Name in the Search Bar:
• The user inputs the name of the product they want to search for.
• This input is used to fetch product details from the specified websites.
3. Web Scraping from eBay and Amazon:
• The program sends requests to Amazon and eBay, retrieves the HTML of the product search results, and extracts relevant data such as the product name, price, and links.
• This is where the scraping logic is implemented.
4. Show All the Prices of That Product from Both Websites:
• The extracted data (product details, prices, and links) from both websites is presented to the user in output files.
5. End:
• This marks the completion of the program or workflow.
3.0 Implementation
3.1 Setting Up the Environment
The first step in building the scraper involves setting up the environment. A virtual environment is created using venv to keep all project files organized and separate from other projects. Scrapy is installed using pip, along with other necessary libraries such as requests for handling web requests. The system operates on various platforms, including Windows, macOS, and Linux. It requires an internet connection for web scraping and for updating product information, and it is compatible with most modern web browsers for user interaction.
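
As a rough illustration, the snippet below shows a typical way to set up such an environment; the terminal commands appear as comments since they are not Python, and the last two lines are a quick check that Scrapy imports correctly inside the virtual environment.

    # Typical setup commands (run in a terminal, not inside Python):
    #   python -m venv venv
    #   source venv/bin/activate          # on Windows: venv\Scripts\activate
    #   pip install scrapy requests
    import scrapy
    print(scrapy.__version__)             # confirms the installation inside the virtual environment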

3.2 The Spider Class


The main component of the web scraper is the ProductSearcherSpider class. This class is built with Scrapy and is responsible for sending requests to eBay and Amazon and for parsing the responses it receives.
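
A minimal sketch of how such a class might be declared is shown below; the attribute values, the default product name, and the exact way the product name is passed in are illustrative assumptions rather than the project's exact code.

    import scrapy

    class ProductSearcherSpider(scrapy.Spider):
        # Identifier used to run the spider, e.g. scrapy crawl product_searcher -a product="wireless mouse"
        name = "product_searcher"
        # Restrict crawling to the two retail sites (illustrative values).
        allowed_domains = ["ebay.com", "amazon.com"]

        def __init__(self, product="laptop", *args, **kwargs):
            # The product name entered by the user arrives here as a spider argument.
            super().__init__(*args, **kwargs)
            self.product = product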

3.3 Custom Settings


To ensure that the scraper behaves like a legitimate web browser (which helps avoid getting
blocked), custom settings are configured within the spider class. A user-agent string is added to
identify the scraper as a standard web browser.
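
For example, a custom_settings dictionary inside the spider class might look like the following; the user-agent string and the delay value are illustrative choices, while USER_AGENT and DOWNLOAD_DELAY are standard Scrapy settings.

    # Defined inside the ProductSearcherSpider class body.
    custom_settings = {
        # Present the scraper as a standard desktop browser.
        "USER_AGENT": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
        # A short delay between requests reduces the chance of being blocked.
        "DOWNLOAD_DELAY": 1,
    }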

3.4 Starting the Scraping Process


The start_requests method initiates the scraping process by generating search URLs for eBay and
Amazon based on user input. Python’s string formatting features are utilized to construct these
URLs.
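
A sketch of such a start_requests method is shown below. The search URL patterns are common ones for the two sites but should be treated as assumptions, and urllib.parse.quote_plus is used here to URL-encode the product name.

    from urllib.parse import quote_plus

    import scrapy

    # Shown as it would appear inside the ProductSearcherSpider class.
    def start_requests(self):
        query = quote_plus(self.product)    # URL-encode spaces and special characters
        ebay_url = f"https://www.ebay.com/sch/i.html?_nkw={query}"
        amazon_url = f"https://www.amazon.com/s?k={query}"
        # Yielding both requests up front lets Scrapy fetch the two sites concurrently.
        yield scrapy.Request(ebay_url, callback=self.parse_ebay)
        yield scrapy.Request(amazon_url, callback=self.parse_amazon)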

3.5 Parsing the Data


Two distinct methods are created to handle the data returned from each website (a sketch of both follows the list):
• eBay Parsing: This method employs XPath selectors to extract product details from eBay’s search results page, pulling out the product name, price, and link from each listing.
• Amazon Parsing: Similar to the eBay parsing method, this approach uses XPath selectors to retrieve product information from Amazon’s results page, including product names, prices, and links.
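
The sketch below illustrates what these two methods can look like. The XPath expressions are examples based on common markup for the two sites, not the project's exact selectors, and they need adjusting whenever the page layouts change.

    # Shown as they would appear inside the ProductSearcherSpider class.
    def parse_ebay(self, response):
        for listing in response.xpath('//li[contains(@class, "s-item")]'):
            yield {
                "site": "eBay",
                "name": listing.xpath('.//div[contains(@class, "s-item__title")]//text()').get(),
                "price": listing.xpath('.//span[contains(@class, "s-item__price")]//text()').get(),
                "link": listing.xpath('.//a[contains(@class, "s-item__link")]/@href').get(),
            }

    def parse_amazon(self, response):
        for result in response.xpath('//div[@data-component-type="s-search-result"]'):
            yield {
                "site": "Amazon",
                "name": result.xpath('.//h2//span/text()').get(),
                "price": result.xpath('.//span[@class="a-offscreen"]/text()').get(),
                "link": response.urljoin(result.xpath('.//h2/a/@href').get() or ""),
            }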

3.6 Saving the Data


To save the scraped data, Scrapy’s feed exports feature is utilized. The CrawlerProcess class is
configured with options for data storage and format. Users have the flexibility to save their data
as JSON, CSV, or JSON Lines.
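
A sketch of this configuration is shown below. FEEDS is Scrapy's standard feed-export setting; the module name, output file name, and chosen format are illustrative assumptions.

    from scrapy.crawler import CrawlerProcess

    from product_searcher import ProductSearcherSpider   # hypothetical module name

    product = input("Enter the desired product name: ")

    process = CrawlerProcess(settings={
        "FEEDS": {
            "results.json": {"format": "json"},   # "csv" and "jsonlines" are also supported
        },
    })
    process.crawl(ProductSearcherSpider, product=product)
    process.start()   # blocks until both crawls finish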

4.0 Testing
4.1 Creating Test Cases
During testing, several test cases were created to ensure the scraper operates effectively in various scenarios (a small offline test sketch follows the list):
1. Valid Product Name: When a valid product name is entered, the system should return relevant results from both eBay and Amazon.
2. Non-existent Product: For product names that do not exist, the system should return no results without raising any errors.
3. Special Characters Handling: The scraper's ability to handle product names containing special characters or spaces was tested.
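
As one offline example of how such cases can be checked, the sketch below exercises only the URL-construction step, which needs no network access; the helper function and expected values are illustrative, not the project's actual test suite.

    from urllib.parse import quote_plus

    def build_ebay_url(product):
        # Hypothetical helper mirroring the URL construction used in start_requests.
        return f"https://www.ebay.com/sch/i.html?_nkw={quote_plus(product)}"

    # Case 1: a valid product name produces a well-formed search URL.
    assert build_ebay_url("laptop") == "https://www.ebay.com/sch/i.html?_nkw=laptop"
    # Case 3: spaces and special characters are URL-encoded instead of breaking the request.
    assert build_ebay_url('4K TV 55"') == "https://www.ebay.com/sch/i.html?_nkw=4K+TV+55%22"
    # Case 2 (non-existent products) needs live responses, so it was checked manually.
    print("URL-construction checks passed")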

4.2 Running the Tests


Tests were conducted in a controlled environment by inputting various product names into the
scraper. The results were compared with manual searches conducted on eBay and Amazon's
websites.

4.3 Results of Testing
Testing results indicated that the scraper functioned well under typical conditions:
• Valid product names yielded accurate results, displaying product names and prices correctly.
• Non-existent products returned no results without errors, demonstrating robustness in
challenging scenarios.
• Special characters were processed correctly through URL encoding, allowing for diverse search
queries.

4.4 Performance Check


Performance assessments focused on the scraper's speed and data retrieval efficiency. The
scraper demonstrated efficient operation, with response times comparable to standard web
browsing speeds. However, excessive requests within a short timeframe could lead to temporary
blocks from e-commerce sites.

4.5 Discussion of Challenges

Several challenges were encountered during development. The most prominent was the changing website structures: frequent updates to eBay and Amazon's page layouts affected the data extraction methods built on XPath and CSS selectors, so continuous updates to the scraper are necessary to adapt to these changes. Data quality and legal constraints, discussed in the next two subsections, posed additional challenges.

4.6 Data Quality Assurance

Ensuring high-quality data is essential. Inconsistencies in website structures may lead to variations in HTML formatting, potentially resulting in missing information during data scraping. Extensive testing was conducted to verify that the collected data remains accurate and complete.

4.7 Legal Considerations
Web scraping must be approached with caution regarding the terms of service established by
various websites. Some sites explicitly prohibit scraping, making it crucial to understand and
adhere to these regulations during scraper development.

4.8 Future Improvements


Several enhancements could be implemented to improve the Product Searcher:
1. User-Friendly Interface: Developing a graphical user interface (GUI) would facilitate easier
use for individuals who may not be familiar with command-line operations.
2. Better Filtering Options: Incorporating filters for price range or seller ratings would assist
users in finding specific products more effectively.
3. Database Storage: Transitioning from simple file storage to a database system would enhance
data management and search capabilities.
4. Improved Error Handling: Implementing robust error handling mechanisms would allow the
scraper to navigate issues such as layout changes or network problems more smoothly.
5. Faster Performance: Utilizing multi-threading could significantly accelerate the scraping
process, especially when processing large volumes of data.
6. Scheduled Scraping: Adding a feature for automatic scraping at designated times would
alleviate the need for manual intervention.
7. Data Visualization: Integrating visualization tools to create graphs and charts would enable
users to easily identify trends and comparisons in the data.

5.0 Conclusion
The web scraping system effectively simplifies the price comparison process for users by
automating data collection from e-commerce platforms. By leveraging Python, Scrapy, and
advanced data extraction techniques, the system ensures accurate and efficient performance.
Testing confirms its robustness in handling various scenarios, including special characters and
non-existent products.
The project provides a solid foundation for future enhancements, including the addition of a
graphical user interface (GUI), database integration, and advanced filtering options. While
challenges such as changing website structures and legal considerations were encountered, these
can be mitigated with continuous updates and adherence to ethical scraping practices.
