32 BDA Exp7&8


Experiment No: 07 & 08

Name: Jess John Roll No.: 32


Batch: B Performance Date: 26/09/2024

Topic: Data Analysis & Data Visualization using Hive/PIG/R/Tableau.

Prerequisite: Basic knowledge of data analysis and data visualization tools such as
Hive/PIG/R/Tableau.
Mapping With COs: CSL702.6
Objectives: To be able to use these tools for data analysis, statistical modeling, and
data visualization tasks.
Outcome: Analyze datasets using data analysis tools and generate reports using the
capabilities of data visualization tools.
Instructions: This experiment is compulsory. All students are required to perform
it individually.
Deliverables: Submission:
1. Snapshots of the installation steps
2. Details about the datasets
3. Data Analysis
4. Data Visualization

Installation

Installation steps, with snapshots, for Windows 10 systems

Step 1: Search for "Download Tableau" in a web browser; the download page
should appear as the first result. Alternatively, use the following link:
- https://www.tableau.com/products/desktop/download

You will be redirected to the Tableau Desktop download page.

Step 2: Click the "Start your free trial" button to download the installer
for your Windows system.

The Tableau installer will be downloaded to your local system.

Step 3: Run the installer to install the software on the local machine.

Step 4: Once Tableau is installed on the system, you will be redirected
to the page below.

Datasets

The Global Superstore dataset is a comprehensive collection of
transactional data from a fictional global online retailer, primarily used
for data analysis and visualization. This dataset contains around 50,000
records of orders placed between 2011 and 2015, encompassing various
aspects of sales, customer demographics, and product categories.

Key Features of the Dataset

• Order Information: Each record includes details about the
order such as order ID, product details (name, category,
sub-category), quantity, sales amount, and shipping information.
• Customer Data: The dataset captures customer-specific
information including customer ID, name, address, and segment
(Consumer, Corporate, Home Office).
• Sales Metrics: It provides metrics like sales revenue, profit
margins, shipping costs, and discounts applied.
• Temporal Data: The dataset includes order dates and ship dates,
which are essential for time-series analysis.

Orders Dataset

Column Name      Description
Row ID           A unique identifier for each row in the dataset.
Order ID         A unique identifier for each order placed by customers.
Order Date       The date when the order was placed. This is crucial for time-based analyses and trends.
Ship Date        The date when the order was shipped to the customer. Important for evaluating shipping efficiency.
Ship Mode        The method used for shipping (e.g., Standard Class, Second Class). Influences delivery time and cost.
Customer ID      A unique identifier for each customer, useful for tracking customer behavior over time.
Customer Name    The name of the customer who placed the order.
Segment          The market segment to which the customer belongs (e.g., Consumer, Corporate, Home Office). Helps in segmentation analysis.
City             The city where the customer resides, useful for geographic sales analysis.
State            The state where the customer resides, giving a finer geographic view than country.
Country          The country of the customer, essential for international sales analysis.
Postal Code      The postal code of the customer's address, useful for detailed geographic segmentation.
Market           The market to which the order belongs (e.g., APAC, EU, LATAM, US), which can impact sales strategies.
Region           The broader region (e.g., East, West) where the customer is located, helpful for regional performance analysis.
Product ID       A unique identifier for each product sold in the store, essential for inventory management and sales tracking.
Category         The main category of products (e.g., Technology, Furniture, Office Supplies) to facilitate category-level analysis.
Sub-Category     A more specific classification within a category (e.g., Chairs under Furniture), useful for detailed product analysis.
Product Name     The name of the product sold, important for identifying specific items in sales reports.
Sales            The total sales amount generated from each order line item, critical for revenue analysis.
Quantity         The number of units sold in each order line item, important for inventory and sales volume analysis.
Discount         The discount applied to the order line item, which affects overall sales revenue and profit margins.
Profit           The profit earned from each order line item after accounting for costs, crucial for profitability analysis.
Shipping Cost    The cost incurred to ship the order to the customer; impacts overall profitability and pricing strategies.
Order Priority   Indicates the priority level of the order (e.g., High, Medium, Low), useful for managing fulfillment processes.
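
As a rough illustration of how the Orders columns combine into derived measures, the sketch below models a few of them in Python and computes the days taken to ship and the profit margin. The class name `OrderLine`, its methods, and the sample row are hypothetical; they are not part of the dataset or of Tableau.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OrderLine:
    """A subset of the Orders columns, chosen for illustration only."""
    order_id: str
    order_date: date
    ship_date: date
    sales: float
    profit: float

    def days_to_ship(self) -> int:
        # Ship Date minus Order Date: a simple shipping-efficiency measure.
        return (self.ship_date - self.order_date).days

    def profit_margin(self) -> float:
        # Profit as a fraction of Sales; 0.0 when Sales is zero.
        return self.profit / self.sales if self.sales else 0.0


# Made-up example row, not taken from the real dataset.
line = OrderLine("ORD-0001", date(2014, 3, 1), date(2014, 3, 5), 200.0, 50.0)
print(line.days_to_ship())   # 4
print(line.profit_margin())  # 0.25
```

In Tableau, the same measures would typically be defined as calculated fields rather than in code.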

Returns Dataset

Column Name      Description
Returned         Indicates whether an order was returned (Yes) or not (No).
Order ID         The unique identifier for the order that was returned.
Market           The market classification (e.g., LATAM, US) of the returned order.
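
The Returns table links back to Orders through Order ID. A minimal sketch of that relationship, assuming plain Python dictionaries in place of the real tables: each order is flagged as returned or not, and a return rate is derived. The function name `flag_returns` and all sample rows are made up for illustration.

```python
# Made-up rows; real Order IDs in the dataset look different.
orders = [
    {"Order ID": "ORD-0001", "Market": "US", "Sales": 120.0},
    {"Order ID": "ORD-0002", "Market": "LATAM", "Sales": 80.0},
    {"Order ID": "ORD-0003", "Market": "US", "Sales": 200.0},
]
returns = [
    {"Returned": "Yes", "Order ID": "ORD-0002", "Market": "LATAM"},
]


def flag_returns(order_rows, return_rows):
    """Left-join style lookup: copy each order and add a 'Returned' flag."""
    returned_ids = {r["Order ID"] for r in return_rows if r["Returned"] == "Yes"}
    return [
        {**row, "Returned": "Yes" if row["Order ID"] in returned_ids else "No"}
        for row in order_rows
    ]


flagged = flag_returns(orders, returns)
return_rate = sum(row["Returned"] == "Yes" for row in flagged) / len(flagged)
print([row["Returned"] for row in flagged])  # ['No', 'Yes', 'No']
```

Orders with no match in Returns are kept and marked "No", so totals over all orders remain unchanged by the lookup.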

People Dataset

Column Name      Description
People           Contains information about individual employees, such as names and identifiers.
Region           Refers to the broader geographical location or area where the employee is located, useful for organizational planning and management.

Data Analysis

The following analyses can be performed on the dataset, using all of the
above-mentioned tables, to gain significant insights:
1. Total Sales of Each Product Category: This analysis quantifies
the total sales generated by each product category within the
Global Superstore dataset.
2. Revenue by Markets: This analysis examines the total revenue
generated from different geographical markets.
3. Sales of Quantity of Each Product Type by Markets: This
analysis tracks the quantity sold for each product type across
various markets.
4. Profits per Country: This analysis evaluates the total profits
earned from each country represented in the dataset.
5. Revenue for Markets using Maps: This analysis visualizes
revenue data geographically, highlighting performance across
different regions.
6. Revenue Per Month: This analysis assesses the total revenue
generated on a monthly basis over a specified time period.
7. Revenue Forecasting: This analysis predicts future revenue
trends based on historical sales data.
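
Most of the analyses listed above are group-and-sum aggregations. The sketch below shows that idea in plain Python for items 1, 2, 4, and 6; it is only an illustration of what Tableau computes when a dimension and a measure are dropped onto a sheet. The helper `total_by` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Made-up rows using a subset of the Orders columns.
rows = [
    {"Order Date": date(2014, 1, 5), "Category": "Technology",
     "Market": "US", "Country": "United States", "Sales": 300.0, "Profit": 60.0},
    {"Order Date": date(2014, 1, 20), "Category": "Furniture",
     "Market": "EU", "Country": "France", "Sales": 150.0, "Profit": -10.0},
    {"Order Date": date(2014, 2, 3), "Category": "Technology",
     "Market": "EU", "Country": "France", "Sales": 200.0, "Profit": 40.0},
    {"Order Date": date(2014, 3, 9), "Category": "Office Supplies",
     "Market": "US", "Country": "United States", "Sales": 50.0, "Profit": 5.0},
]


def total_by(records, key, measure):
    """Sum `measure` per group; `key` is a column name or a function of a row."""
    totals = defaultdict(float)
    for record in records:
        group = key(record) if callable(key) else record[key]
        totals[group] += record[measure]
    return dict(totals)


sales_by_category = total_by(rows, "Category", "Sales")    # item 1
revenue_by_market = total_by(rows, "Market", "Sales")      # item 2
profit_by_country = total_by(rows, "Country", "Profit")    # item 4
revenue_by_month = total_by(                                # item 6
    rows, lambda r: r["Order Date"].strftime("%Y-%m"), "Sales"
)

print(sales_by_category)
print(revenue_by_month)
```

Item 3 follows the same pattern with a two-part key, for example a function returning the pair of Market and Sub-Category, and Quantity as the measure.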

Loading the dataset


Data Visualization

Total Sales of Each Product Category

Revenue by Markets

Sales of Quantity of Each Product Type by Markets

Profits per Country

Revenue for Markets using Maps

Revenue Per Month

Revenue Forecasting
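
This view relies on Tableau's built-in forecasting, which is based on exponential smoothing. As a rough illustration of the idea only, the sketch below applies simple exponential smoothing (no trend or seasonality) to a short monthly revenue series. It is not a reproduction of Tableau's model; the function name `smooth_forecast`, the `alpha` value, and the revenue figures are assumptions made for the example.

```python
def smooth_forecast(history, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing.

    alpha lies in (0, 1]; higher values weight recent months more heavily.
    """
    if not history:
        raise ValueError("history must contain at least one value")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for value in history[1:]:
        # Blend the newest observation with the running smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up monthly revenue figures, oldest first.
monthly_revenue = [100.0, 120.0, 140.0]
print(smooth_forecast(monthly_revenue, alpha=0.5))  # 125.0
```

Because this variant ignores trend, the forecast lags behind a steadily rising series, as the example output shows.
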
Creating a Dashboard

Conclusion: Students will be able to successfully perform Data Analysis & Data
Visualization using Tableau.
References:
1. Moodle Notes
2. https://www.tableau.com/
3. https://help.tableau.com/current/pro/desktop/en-us/getstarted_buildmanual_ex1basic.htm
4. https://www.analyticsvidhya.com/blog/2021/10/step-by-step-guide-data-visualization-tableau/
5. https://community.tableau.com/s/question/0D54T00000C5vSDSAZ/global-superstore-data-file
6. https://www.kaggle.com/datasets/apoorvaappz/global-super-store-dataset

Don Bosco Institute of Technology
Department of Computer Engineering

Assessment Rubric for Experiment No.: 07 & 08

Title of the Experiment: Data Analysis & Data Visualization using Hive/PIG/R/Tableau
Year and Semester: IVth Year and VIIth Semester

Faculty In-charge: Ms. Sana Shaikh
