Assignment Question Oct 2024

The assignment for DSC651 involves analyzing a dataset from The Movie Database (TMDB) through various sections including an introduction, data examination, data treatment, a data dictionary, and data visualization. Students are required to identify and address missing values and data errors, create new attributes, and visualize data patterns. The final submission must be an MS-Excel file containing specific worksheets and is due by 5 PM on November 11, 2024.

Uploaded by 2024767081
Copyright © All Rights Reserved

ASSIGNMENT (30%)

DSC651 - DATA REPRESENTATION AND REPORTING TECHNIQUES

You are given a dataset from The Movie Database (TMDB). Follow these instructions to complete the
assignment:
• Section A: Write an introduction based on your literature review.
• Section B: Conduct data examination by identifying and addressing missing values and data
errors.
• Section C: Perform data treatment as needed to prepare the dataset for analysis.
• Section D: Develop a data dictionary to define and describe each variable in the dataset.
• Section E: Develop a dashboard.

Complete each section thoroughly to ensure a comprehensive analysis.

INSTRUCTIONS

A: Introduction (50 marks)


a) Write a brief introduction about The Movie Database (TMDB) that comprises the following
sections:
i. Project Background
ii. Project Objectives
iii. Data Context
iv. Target Audience
The introduction must be between 150 and 250 words long.

b) The following screenshot shows an example of the layout for the INTRODUCTION worksheet.

Prepared on Semester Oct 2024 [29102024]


B: Data Examination
You have been provided with a dataset from The Movie Database (TMDB). Using pivot tables, carry
out the following data examination tasks:
a) Identify Missing Values:
a. Use pivot tables to examine and locate any missing values in the dataset.
b. Take a screenshot of the output showing missing values.
c. Replace missing values with appropriate data, based on the context of the dataset.
b) Identify Data Errors:
a. Use pivot tables to detect data errors such as typos, incorrect formatting, or
unwanted symbols/characters.
b. Take a screenshot of the output showing these errors.
c. Correct the identified errors by fixing typos, reformatting numbers, and removing
extra symbols or characters.
c) Documentation:
a. Record your findings and solutions from both tasks in the DATA EXAMINATION
worksheet.
b. Summarize these findings in the DE FEEDBACK worksheet, referring to Tutorial 3B
for the DE FEEDBACK table format.
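The brief asks for these checks to be done with pivot tables in MS-Excel. Purely to illustrate the same logic, here is a minimal Python sketch, assuming the dataset has been loaded as a list of row dictionaries. The sample rows are invented; the column names (Title, Popularity, Release_date) come from the brief, but their value formats are assumptions. The helper names are hypothetical: `count_missing` plays the role of a pivot table that counts blank cells per column, `flag_symbol_errors` lists rows whose numeric field contains stray symbols, and `clean_number` shows one way to strip those symbols before converting.

```python
import re

# Invented sample rows for illustration only; column names follow the brief,
# but the value formats (e.g. ISO dates) are assumptions.
rows = [
    {"Title": "Movie A (English)", "Popularity": "512.3", "Release_date": "2019-05-01"},
    {"Title": "", "Popularity": "87.1", "Release_date": "2021-11-12"},
    {"Title": "Movie C", "Popularity": "1,204.9#", "Release_date": ""},
]

# A plain decimal number with an optional leading minus sign.
NUMBER_PATTERN = re.compile(r"^-?\d+(\.\d+)?$")


def count_missing(rows):
    """Count blank cells per column, similar to a pivot table of blanks."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


def flag_symbol_errors(rows, column):
    """Return indices of non-blank values that are not plain numbers."""
    return [
        i for i, row in enumerate(rows)
        if row[column].strip() and not NUMBER_PATTERN.match(row[column].strip())
    ]


def clean_number(text):
    """Remove everything except digits, '.' and '-', then parse as float.

    Returns None when no valid number remains.
    """
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None


print(count_missing(rows))                     # {'Title': 1, 'Release_date': 1}
print(flag_symbol_errors(rows, "Popularity"))  # [2]
print(clean_number("1,204.9#"))                # 1204.9
```

Note that `clean_number` treats a comma as a thousands separator; if the file uses commas as decimal marks, that rule would need to change.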

C: Data Treatment
Perform the following tasks:
a) Create a New Attribute for Popularity: Use a grouping method to categorize the Popularity
variable into five distinct categories.
b) Create a Year Range Attribute: Use tokenization or coding to derive a new attribute
representing a range of years based on the Release_date variable.
c) Extract Movie Language: Apply chunking to separate the movie language from the Title
attribute and derive a new attribute for language.
d) Separate Main Genre: Use chunking to split combined genres in the Main genre attribute
where two genres appear together.
e) Identify Top Genres: Identify the top five genres and visualize the pattern of movie ratings
across these genres.
f) Analyze Popularity Patterns: Identify any significant patterns in movie popularity over the
production timeline.
g) Explore Popularity and Rating Relationships: Examine the relationship between Popularity
and Average_rating, grouping these attributes using a pivot table.
h) Visualize Unpopular Movies: Define "unpopular" movies as those with popularity between 0
and 100. Visualize the rating and genre pattern for these movies, grouping Popularity in ranges
of 100 up to 1000.
i) Genre-based Rating Visualization: Visualize movie rating patterns based on genre.
j) Top Genres in English Movies: Identify the top 10 production genres for English-language
movies.
k) Visualize Ratings and Views: Create a visualization showing Average_rating in relation to
Average view per day.
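Items a) to e) above describe grouping, tokenization/coding and chunking steps that the brief expects to be carried out in MS-Excel. The Python sketch below only illustrates the underlying logic under stated assumptions: the five Popularity cut points are hypothetical, Release_date is assumed to be an ISO `YYYY-MM-DD` string, the language is assumed to appear in brackets at the end of the Title, and combined genres are assumed to be comma-separated. None of these formats are confirmed by the brief, and all function names are illustrative, so check the actual TMDB file before relying on any of them.

```python
import re
from collections import Counter

# Hypothetical cut points: the brief asks for five Popularity categories but
# does not fix the boundaries, so adjust these to the real distribution.
POPULARITY_BINS = [
    (0, 100, "Very Low"),
    (100, 500, "Low"),
    (500, 1000, "Medium"),
    (1000, 2000, "High"),
    (2000, float("inf"), "Very High"),
]


def popularity_category(value):
    """Grouping: map a Popularity score to one of five labels."""
    for low, high, label in POPULARITY_BINS:
        if low <= value < high:
            return label
    return "Unknown"  # e.g. negative scores


def year_range(release_date, width=10):
    """Tokenization/coding: '2019-05-01' -> '2010-2019' (assumes ISO dates)."""
    year = int(release_date.split("-")[0])  # first token is the year
    start = year - year % width
    return f"{start}-{start + width - 1}"


def split_language(title):
    """Chunking: split an assumed 'Title (Language)' value into two parts."""
    match = re.match(r"^(.*?)\s*\(([^()]+)\)\s*$", title)
    if match:
        return match.group(1), match.group(2)
    return title, None  # no language chunk found


def split_genres(main_genre, separator=","):
    """Chunking: split a combined value such as 'Action, Drama'."""
    return [part.strip() for part in main_genre.split(separator) if part.strip()]


def top_genres(genre_lists, n=5):
    """Count genres across all movies and return the n most frequent."""
    counts = Counter(genre for genres in genre_lists for genre in genres)
    return counts.most_common(n)


print(popularity_category(87.1))            # Very Low
print(year_range("2019-05-01"))             # 2010-2019
print(split_language("Movie A (English)"))  # ('Movie A', 'English')
print(split_genres("Action, Drama"))        # ['Action', 'Drama']
```

Fixed-width decade ranges are used here only for simplicity; passing a different `width` gives other year bands, and the same bin-list approach could produce the 100-wide Popularity ranges mentioned in item h).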

Tips
1. Missing values: at least 30 data entries
2. Data errors: at least 30 data entries
3. Formatting: at least 30 data entries
4. Tokenization or coding: 1 derived attribute

D: Data Dictionary

Using the following template, create a data dictionary and record your answers in the DATA
DICTIONARY worksheet:

No | Variable Name | Variable Description | Data Types | Level of Measurement
1 | Hobby | Activity done regularly in leisure time for pleasure. | Categorical | Nominal
2 | | | |

E: Data Visualization
Create and plot seven charts and arrange them in the DASHBOARD worksheet.

F: Submission
1. Submit your work as an MS-Excel file named <TEAM LEADER NAME>.xls
2. The file must contain the following worksheets:

3. Submit the assignment by 5 PM on 11 November 2024 in the designated folder.
