KPMG Task1
We have received the three raw datasets from Sprocket Central Pty Limited. As a preliminary task, we
have assessed the quality of the raw data and found multiple quality issues that need to be
addressed. These are listed below, together with our recommendations to mitigate each issue and
improve the usability of the data.
1. Redundant Outliers.
Issue: Some data values are outliers and can distort the whole dataset. For example, in the
Customer Demographic table, customer ID “34” (Jephthah Bachmann) has a birth year of 1843, which
would make him 175 years old. This is clearly a data error.
Recommendation: Remove such outlier records, as they may skew the distribution of the dataset.
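A check of this kind could be sketched as follows. This is a minimal illustration, not the procedure actually used: the column names (`customer_id`, `DOB`), the 120-year cut-off, the reference date and the sample rows are all assumptions made for the example.

```python
from datetime import date

# Assumed cut-off for a plausible customer age; not taken from the source data.
MAX_PLAUSIBLE_AGE = 120


def flag_implausible_ages(customers, as_of, max_age=MAX_PLAUSIBLE_AGE):
    """Return the rows whose age at `as_of` is negative or above `max_age`."""
    flagged = []
    for row in customers:
        dob = row.get("DOB")
        if dob is None:
            continue  # a missing date of birth is a separate (missing value) issue
        # Subtract one year if the birthday has not yet occurred in the `as_of` year.
        age = as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))
        if age < 0 or age > max_age:
            flagged.append({**row, "age": age})
    return flagged


# Illustrative rows only; the exact dates are hypothetical.
sample_customers = [
    {"customer_id": 1, "DOB": date(1980, 5, 17)},
    {"customer_id": 34, "DOB": date(1843, 1, 1)},
    {"customer_id": 50, "DOB": None},
]

flagged = flag_implausible_ages(sample_customers, as_of=date(2018, 6, 30))
for row in flagged:
    print(row["customer_id"], row["age"])
```

Flagging the rows first, rather than deleting them straight away, leaves room to confirm with the data owner whether the value is a typo that can be corrected.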
2. Missing Values.
Issue: Multiple attributes in the Transactions table, namely “Online Order”, “Brand Name”,
“Product Line”, “Product Class”, “Product Size”, “Standard Cost”, and “product_first_sold_date”,
have blank values. In the Customer Demographic table, “Job Title”, “Job Category” and “Tenure”
are also missing for some records.
Recommendation: We will perform the analysis only on the synced data, that is, records whose
customer_ID matches across all three customer tables.
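The two checks described in this item, counting blanks per attribute and keeping only the customer IDs shared by every table, could look roughly like the sketch below. It assumes each table has been loaded as a list of dictionaries (for example with `csv.DictReader`); the key name `customer_id`, the column names and the sample rows are hypothetical.

```python
def count_blanks(rows, columns):
    """Count, per column, how many rows have a missing or whitespace-only value."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return counts


def common_customer_ids(*tables, key="customer_id"):
    """Return the set of IDs that appear in every one of the given tables."""
    id_sets = [{row[key] for row in table} for table in tables]
    return set.intersection(*id_sets)


# Illustrative rows only; the values are made up for the example.
transactions = [
    {"customer_id": 1, "brand": "A", "online_order": "True"},
    {"customer_id": 2, "brand": "", "online_order": None},
    {"customer_id": 5, "brand": "B", "online_order": " "},
]
demographic = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
third_table = [{"customer_id": 2}, {"customer_id": 1}, {"customer_id": 4}]

blank_counts = count_blanks(transactions, ["brand", "online_order"])
synced_ids = common_customer_ids(transactions, demographic, third_table)
synced_transactions = [r for r in transactions if r["customer_id"] in synced_ids]

print(blank_counts)
print(sorted(synced_ids))
```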
Recommendation: Remove the special characters from the records and convert the values to
numeric data to ensure consistent data types.
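One way to apply a clean-up of this kind is sketched below. The affected column is not named here, so the helper `to_number` and the sample strings are purely hypothetical, and the rule of keeping only digits, the decimal point and the minus sign is an assumption.

```python
import re

# Anything that is not a digit, a decimal point or a minus sign is treated
# as a special character (assumed rule for this sketch).
_NON_NUMERIC = re.compile(r"[^0-9.\-]")


def to_number(raw):
    """Strip special characters and parse as float; return None if nothing numeric remains."""
    if raw is None:
        return None
    cleaned = _NON_NUMERIC.sub("", str(raw))
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. empty string or a malformed value such as "1-2"


# Illustrative inputs only.
raw_values = ["$1,234.50", " 71.49 ", "n/a", None]
numeric_values = [to_number(v) for v in raw_values]
print(numeric_values)
```

Values that cannot be parsed come back as `None`, so they can be counted and reviewed alongside the other missing values instead of being silently dropped.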
Please review the quality issues above, along with the recommended changes, to ensure consistent
quality of the dataset across all the tables. If the suggestions are accepted, we can proceed
with further analysis of the data to find suitable insights for the company.
Regards,
Vinit Shetty.