Python Programming
Final Project
Price Comparison
By
Mana Rahimi
202180090100
Zhengzhou University
Contents
1.0 Introduction
1.1 Purpose
1.2 Scope of Project
2.0 Design
2.1 Technologies Used
2.2 Price Comparison Structure
2.3 User Interface Design
2.4 DFD Diagram
2.5 Flowchart Diagram
3.0 Implementation
3.1 Setting Up the Environment
3.2 The Spider Class
3.3 Custom Settings
3.4 Starting the Scraping Process
3.5 Parsing the Data
3.6 Saving the Data
4.0 Testing
5.0 Conclusion
1.0 Introduction
1.1 Purpose
This report explains how we built a web scraper using Scrapy, an open-source Python framework designed for gathering data from websites. The main goal of this project was to create a practical tool that allows users to search for products on eBay and Amazon, collecting key details such as product names, prices, and links. The report discusses the design, implementation, and testing process, as well as plans to improve the scraper in the future.
2.0 Design
2.1 Technologies Used
The following technologies are utilized to build the scraper:
• Python Language: This serves as the primary programming language.
• Scrapy: An open-source framework that simplifies data gathering from websites.
• XPath/CSS Selectors: These selectors help navigate HTML documents and extract data (a short example follows this list).
• Virtual Environment: This is set up to manage dependencies and ensure smooth operation
across different systems.
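
As a brief illustration of the selector syntax, the following snippet runs equivalent XPath and CSS queries against a small, invented HTML fragment; the markup and class names are placeholders for demonstration and do not come from eBay or Amazon.

from scrapy.selector import Selector

# A tiny, made-up HTML fragment standing in for part of a search-results page.
html = """
<div class="s-item">
    <h3 class="s-item__title">USB-C Cable</h3>
    <span class="s-item__price">$7.99</span>
</div>
"""

sel = Selector(text=html)

# The same data can be reached with either selector language.
title_xpath = sel.xpath('//h3[@class="s-item__title"]/text()').get()
title_css = sel.css("h3.s-item__title::text").get()
price_css = sel.css("span.s-item__price::text").get()

print(title_xpath, title_css, price_css)  # USB-C Cable USB-C Cable $7.99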
2.2 Price Comparison Structure
The system is designed to be efficient and easy to understand. Its main components, illustrated in the sketch after this list, include:
2.2.1 User Input Module: This module collects the product name from the user.
2.2.2 Spider Class: This special class in Scrapy handles requests (asking for data) and responses
(receiving data back).
2.2.3 Request Handling: This section constructs the appropriate requests to eBay and Amazon
based on the product name provided.
2.2.4 Data Extraction Logic: This component utilizes tools called XPath and CSS selectors to
read HTML responses and extract the necessary product information.
2.2.5 Output Management: This part saves the scraped data in various formats as specified by
the user.
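
The sketch below shows one way these components could map onto a single Scrapy spider. It is a minimal illustration rather than the project's actual source code: the spider name, URLs, selectors, and field names are assumptions chosen for readability.

import scrapy
from urllib.parse import quote_plus


class PriceComparisonSpider(scrapy.Spider):
    """Minimal sketch of the spider class (component 2.2.2)."""

    name = "price_comparison"

    def __init__(self, product_name="laptop", *args, **kwargs):
        # 2.2.1 User Input Module: the product name is passed in as an argument.
        super().__init__(*args, **kwargs)
        self.product_name = product_name

    def start_requests(self):
        # 2.2.3 Request Handling: build search URLs for both sites.
        # quote_plus() URL-encodes spaces and special characters.
        query = quote_plus(self.product_name)
        urls = [
            f"https://www.ebay.com/sch/i.html?_nkw={query}",
            f"https://www.amazon.com/s?k={query}",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # 2.2.4 Data Extraction Logic: the selectors below are placeholders;
        # real ones depend on each site's current HTML structure.
        for item in response.css("div.result"):
            # 2.2.5 Output Management: yielded items can be exported to JSON
            # or CSV through Scrapy's feed exports.
            yield {
                "name": item.css("h3.title::text").get(),
                "price": item.css("span.price::text").get(),
                "link": item.css("a::attr(href)").get(),
                "source": response.url,
            }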
2.4 DFD Diagram
Figure 2: Diagram showing the flow of data and the interaction between components.
2.5 Flowchart Diagram
Figure 1: The diagram shows the sequential process for scraping and presenting product prices.
1. Start:
• This marks the beginning of the program or process.
2. Enter the Desired Product Name in the Search Bar:
• The user inputs the name of the product they want to search for.
• This input is used to fetch product details from the specified websites.
3. Web Scraping from eBay and Amazon:
• The program sends requests to Amazon and eBay, retrieves the HTML of the product search results, and extracts relevant data such as the product name, price, and links.
• This is where the scraping logic is implemented (a runner sketch follows this list).
4. Show All the Prices of That Product from Both Websites:
• The extracted data (product details, prices, and links) from both websites is saved to output files and presented to the user.
5. End:
• This marks the completion of the program or workflow.
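
To make this workflow concrete, the runner sketch below asks the user for a product name, starts a crawl, and exports every scraped item to a JSON file through Scrapy's feed exports. The placeholder spider, file name, and feed settings are illustrative assumptions, not the project's exact configuration; in the real project the spider from Section 2.2 would replace the placeholder.

import scrapy
from scrapy.crawler import CrawlerProcess


class PriceComparisonSpider(scrapy.Spider):
    """Placeholder standing in for the spider sketched in Section 2.2."""

    name = "price_comparison"
    start_urls = ["https://example.com/"]

    def __init__(self, product_name="", *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.product_name = product_name

    def parse(self, response):
        # The real extraction logic would go here (see the Section 2.2 sketch).
        yield {"searched_for": self.product_name, "url": response.url}


def main():
    # Step 2 of the flowchart: ask the user for the product name.
    product_name = input("Enter the desired product name: ").strip()

    # Steps 3-4: run the spider and write every scraped item to a JSON file.
    process = CrawlerProcess(settings={
        "FEEDS": {"prices.json": {"format": "json", "overwrite": True}},
    })
    process.crawl(PriceComparisonSpider, product_name=product_name)
    process.start()  # blocks until the crawl finishes


if __name__ == "__main__":
    main()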
3.0 Implementation
3.1 Setting Up the Environment
The first step in building the scraper is setting up the environment. A virtual environment is created using venv to keep all project files and dependencies separate from other projects. Scrapy is installed using pip, along with other necessary libraries such as requests for handling web requests. The system operates on various platforms, including Windows, macOS, and Linux, and requires an internet connection for web scraping and updating product information. The software is compatible with most modern web browsers for user interaction.
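
As a quick sanity check once the environment is ready, a snippet like the one below confirms that the installed packages are importable. The commands in the comments are the usual venv and pip invocations and may differ slightly between platforms.

# Typical setup commands (run in a terminal before using this check):
#   python -m venv venv
#   source venv/bin/activate      (venv\Scripts\activate on Windows)
#   pip install scrapy requests
import scrapy
import requests

print("Scrapy version:", scrapy.__version__)
print("requests version:", requests.__version__)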
4.0 Testing
4.1 Creating Test Cases
During testing, several test cases were created to ensure the scraper operates effectively in various scenarios (an offline testing sketch follows the list):
1. Valid Product Name: When a valid product name is entered, the system should return relevant results from both eBay and Amazon.
2. Non-existent Product Testing: For product names that do not exist, the system should return no
results while avoiding any errors.
3. Special Characters Handling: The scraper's ability to manage product names containing
special characters or spaces was tested.
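
One convenient way to exercise these cases without hitting the live sites is to feed the extraction logic a hand-built response. The sketch below uses Scrapy's HtmlResponse with invented HTML for a results page and an empty page; the helper function, markup, and selectors are assumptions for illustration only.

from scrapy.http import HtmlResponse


def extract_items(response):
    """Stand-in for the spider's parse logic (placeholder selectors)."""
    return [
        {
            "name": item.css("h3.title::text").get(),
            "price": item.css("span.price::text").get(),
        }
        for item in response.css("div.result")
    ]


# Case 1: a valid product name returns results.
results_html = b'<div class="result"><h3 class="title">Wireless Mouse</h3><span class="price">$12.50</span></div>'
page = HtmlResponse(url="https://example.com/search?q=mouse",
                    body=results_html, encoding="utf-8")
assert extract_items(page) == [{"name": "Wireless Mouse", "price": "$12.50"}]

# Case 2: a non-existent product yields no results and raises no errors.
empty = HtmlResponse(url="https://example.com/search?q=doesnotexist",
                     body=b"<html><body></body></html>", encoding="utf-8")
assert extract_items(empty) == []

print("Offline test cases passed.")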
4.3 Results of Testing
Testing results indicated that the scraper functioned well under typical conditions:
• Valid product names yielded accurate results, displaying product names and prices correctly.
• Non-existent products returned no results without errors, demonstrating robustness in
challenging scenarios.
• Special characters were processed correctly through URL encoding, allowing for diverse search
queries.
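
The URL encoding mentioned above can be handled by Python's standard library; for example, quote_plus (the same helper assumed in the spider sketch in Section 2.2) converts spaces and special characters into a form that is safe to embed in a search URL.

from urllib.parse import quote_plus

query = quote_plus("wireless mouse & keyboard 50%")
print(query)  # wireless+mouse+%26+keyboard+50%25
print(f"https://www.ebay.com/sch/i.html?_nkw={query}")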
4.7 Legal Considerations
Web scraping must be approached with caution regarding the terms of service established by
various websites. Some sites explicitly prohibit scraping, making it crucial to understand and
adhere to these regulations during scraper development.
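
One practical safeguard, alongside reading each site's terms of service, is to have Scrapy respect robots.txt, throttle its requests, and identify itself. The entries below are standard Scrapy settings; the delay value and user-agent string are example choices, not the project's actual configuration.

# Example entries for a Scrapy project's settings.py
ROBOTSTXT_OBEY = True          # skip URLs disallowed by the site's robots.txt
DOWNLOAD_DELAY = 1.0           # pause between requests to avoid overloading servers
USER_AGENT = "price-comparison-bot (educational project)"  # identify the scraper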
5.0 Conclusion
The web scraping system effectively simplifies the price comparison process for users by
automating data collection from e-commerce platforms. By leveraging Python, Scrapy, and
advanced data extraction techniques, the system ensures accurate and efficient performance.
Testing confirms its robustness in handling various scenarios, including special characters and
non-existent products.
The project provides a solid foundation for future enhancements, including the addition of a
graphical user interface (GUI), database integration, and advanced filtering options. While
challenges such as changing website structures and legal considerations were encountered, these
can be mitigated with continuous updates and adherence to ethical scraping practices.