KPMG Task1

1. The data analyst identified multiple quality issues in the raw datasets, including redundant outliers, missing values, inconsistent entries across tables, multiple data types in single columns, and duplicate values.
2. Recommendations were provided to address each issue, such as removing outliers and redundant data, dropping records with missing values, only analyzing synced data across customer tables, converting all values in problematic columns to the same data type, and standardizing abbreviations and values in columns with duplicates.
3. The client is asked to review the identified issues and recommendations to ensure consistent data quality before proceeding with further analysis.

Uploaded by

2021519199.shafaque

Dear [Name of the Client],

We have received the three raw datasets from Sprocket Central Pty Limited. As a preliminary task,
we assessed the quality of the raw data and found multiple quality issues that need to be
addressed. The issues are listed below, each with a recommendation to mitigate it and make the
data more usable for analysis.

1. Redundant Outliers.
Issue: Some data values are outliers that can distort the dataset. For example, in the Customer
Demographic table, customer ID “34” (Jephthah Bachmann) has a birth year of 1843, which would
make him 175 years old. This is clearly a data error.

Recommendation: Remove such records, as they may skew the distribution of the dataset.
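
As an illustration only, the check behind this recommendation could be sketched as follows. The field name birth_year, the 120-year cut-off, and the second sample row are assumptions made for this sketch; the reference year 2018 is the one implied by the age of 175 quoted above.

```python
# Minimal sketch: flag customers whose implied age is implausible.
# MAX_PLAUSIBLE_AGE and the field name "birth_year" are assumptions,
# not values taken from the raw datasets.
MAX_PLAUSIBLE_AGE = 120


def split_by_plausible_age(records, reference_year, max_age=MAX_PLAUSIBLE_AGE):
    """Return (kept, flagged) lists based on age in reference_year."""
    kept, flagged = [], []
    for rec in records:
        age = reference_year - rec["birth_year"]
        if 0 <= age <= max_age:
            kept.append(rec)
        else:
            flagged.append(rec)
    return kept, flagged


# Customer 34 and the year 1843 come from the letter; customer 35 is hypothetical.
customers = [
    {"customer_id": 34, "birth_year": 1843},
    {"customer_id": 35, "birth_year": 1979},
]
kept, flagged = split_by_plausible_age(customers, reference_year=2018)
print([rec["customer_id"] for rec in flagged])
```

Flagged records are returned separately rather than deleted silently, so they can be reviewed before removal.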

2. Missing Values.
Issue: Several attributes in the Transactions table, including “Online Order”, “Brand Name”,
“Product Line”, “Product Class”, “Product Size”, “Standard Cost”, and “product_first_sold_date”,
have blank values. In the Customer Demographic table, “Job Title”, “Job Category”, and “Tenure”
are also missing for some records.

Recommendation: As the share of missing values is low relative to the whole dataset, we can
proceed by removing the affected records.
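
A minimal sketch of how the share of missing values could be measured before any rows are dropped is shown below. The dictionary keys and the sample rows are hypothetical stand-ins for the Transactions columns, and treating both None and the empty string as missing is an assumption.

```python
# Minimal sketch: measure missing values per column, then drop incomplete rows.
# None and "" are both treated as missing (an assumption of this sketch).
MISSING = (None, "")


def missing_share(records, columns):
    """Return {column: fraction of records where the column is missing}."""
    total = len(records)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for rec in records if rec.get(col) in MISSING) / total
        for col in columns
    }


def drop_incomplete(records, columns):
    """Keep only records where every listed column has a value."""
    return [
        rec for rec in records
        if all(rec.get(col) not in MISSING for col in columns)
    ]


# Hypothetical rows; the key names only mirror "Brand Name" and "Standard Cost".
transactions = [
    {"transaction_id": 1, "brand": "A", "standard_cost": 10.0},
    {"transaction_id": 2, "brand": "", "standard_cost": 12.5},
    {"transaction_id": 3, "brand": "B", "standard_cost": 8.0},
    {"transaction_id": 4, "brand": "C", "standard_cost": 9.0},
]
columns = ["brand", "standard_cost"]
shares = missing_share(transactions, columns)
complete = drop_incomplete(transactions, columns)
print(shares, len(complete))
```

Checking the share first makes the "low percentage" condition explicit; if a column turned out to be largely empty, dropping rows would no longer be a safe default.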

3. Inconsistent Entries across the datasets.

Issue: The Transactions table contains more customer_id entries than the Customer
Demographic and Customer Address tables, so some customers cannot be matched across
all three tables. Records with no match would skew the analysis and cannot be used.

Recommendation: We will perform the analysis only on customer_id values that are
present in all three customer tables.
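
One way to express "synced" data is as the intersection of the customer_id sets of the three tables. The sketch below assumes each table is a list of rows keyed by customer_id; the sample IDs are hypothetical.

```python
# Minimal sketch: keep only customer_ids that appear in every table.
def synced_customer_ids(*tables):
    """Return the set of customer_id values present in all given tables."""
    id_sets = [{row["customer_id"] for row in table} for table in tables]
    return set.intersection(*id_sets)


def keep_synced(table, synced_ids):
    """Filter a table down to rows whose customer_id is in synced_ids."""
    return [row for row in table if row["customer_id"] in synced_ids]


# Hypothetical IDs for illustration only.
demographic = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
address = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 4}]
transactions = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 5}]

synced = synced_customer_ids(demographic, address, transactions)
synced_transactions = keep_synced(transactions, synced)
print(sorted(synced))
```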

4. Multiple DataTypes for a Single Column.

Issue: In the Transactions table, some records of the “Standard Cost” attribute contain special
string characters, so the column mixes text and numeric values.

Recommendation: Remove the special characters from these records and convert all values to
numeric data so that the column has a single, consistent data type.
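
The conversion could be sketched as below. The letter does not say which special characters occur, so the currency symbol and thousands separator in the examples are assumptions, as is the choice to return None for values that cannot be parsed.

```python
import re


def to_number(value):
    """Strip non-numeric characters and convert to float; None if not parseable."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # Keep only digits, the decimal point, and a minus sign.
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None


# Hypothetical raw values for illustration.
raw_costs = ["$1,234.50", 53.62, "n/a", None]
clean_costs = [to_number(value) for value in raw_costs]
print(clean_costs)
```

Values that come back as None can then be handled under the missing-values recommendation rather than being silently coerced to zero.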

5. Duplicate values for the same column.

Issue: The “State” column of the Customer Address table uses multiple values for the same
state, such as “VIC” and “Victoria”, or “NSW” and “New South Wales”. The “Gender” column of
the Customer Demographic table has the same issue.

Recommendation: Use state abbreviations instead of full names for all records to ensure
consistency across addresses. In the “Gender” column, records marked “U” can be imputed in
proportion to the distribution of the known values.
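
Both steps could be sketched as follows. Only the two state pairs named above are mapped; the gender codes “F” and “M”, the fixed random seed, and the reading of "imputed with the distribution" as sampling in proportion to the known values are assumptions of this sketch.

```python
import random
from collections import Counter

# Only the two mappings mentioned in the letter; extend as needed.
STATE_ABBREVIATIONS = {"victoria": "VIC", "new south wales": "NSW"}


def normalise_state(value):
    """Map full state names to abbreviations; upper-case anything else."""
    key = value.strip().lower()
    return STATE_ABBREVIATIONS.get(key, value.strip().upper())


def impute_unknown(values, unknown="U", seed=0):
    """Replace `unknown` entries by sampling from the known values' distribution."""
    known = Counter(v for v in values if v != unknown)
    if not known:
        return list(values)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    categories = list(known)
    weights = [known[c] for c in categories]
    return [
        v if v != unknown else rng.choices(categories, weights=weights, k=1)[0]
        for v in values
    ]


states = [normalise_state(s) for s in ["VIC", "Victoria", "New South Wales", "NSW"]]
genders = impute_unknown(["F", "M", "U", "F"])  # hypothetical codes apart from "U"
print(states, genders)
```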

Please review the quality issues above along with the recommended changes, which are intended
to ensure consistent data quality across all the tables. If you agree with the suggestions, we
can proceed with further analysis of the data to find useful insights for the company.

Regards,
Vinit Shetty.
