Assignment Question Oct 2024
You are given a dataset from The Movie Database (TMDB). Follow these instructions to complete the
assignment:
• Section A: Write an introduction based on your literature review.
• Section B: Conduct data examination by identifying and addressing missing values and data
errors.
• Section C: Perform data treatment as needed to prepare the dataset for analysis.
• Section D: Develop a data dictionary to define and describe each variable in the dataset.
• Section E: Develop a dashboard.
INSTRUCTIONS
b) The following screenshot shows an example of the layout for the INTRODUCTION worksheet.
C: Data Treatment
Perform the following tasks:
a) Create a New Attribute for Popularity: Use a grouping method to categorize the Popularity
variable into five distinct categories.
b) Create a Year Range Attribute: Use tokenization or coding to derive a new attribute
representing a range of years based on the Release_date variable.
c) Extract Movie Language: Apply chunking to separate the movie language from the Title
attribute and derive a new attribute for language.
d) Separate Main Genre: Use chunking to split combined genres in the Main genre attribute
where two genres appear together.
e) Identify Top Genres: Identify the top five genres and visualize the pattern of movie ratings
across these genres.
f) Analyze Popularity Patterns: Identify any significant patterns in movie popularity across the
production timeline.
g) Explore Popularity and Rating Relationships: Examine the relationship between Popularity
and Average_rating, grouping these attributes using a pivot table.
h) Visualize Unpopular Movies: Define "unpopular" movies as those with popularity between 0
and 100. Visualize the rating and genre pattern for these movies, grouping Popularity in ranges
of 100 up to 1000.
i) Genre-based Rating Visualization: Visualize movie rating patterns based on genre.
j) Top Genres in English Movies: Identify the top 10 production genres for English-language
movies.
k) Visualize Ratings and Views: Create a visualization showing Average_rating in relation to
Average view per day.
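As a rough illustration of the logic behind tasks (a), (b) and (d), the sketch below shows one possible way to group popularity into five categories, derive a year range from a release date, and split a combined genre. It is written in Python purely to make the steps concrete; the assignment itself is to be completed in MS-Excel. The bin edges, category labels, the YYYY-MM-DD date format, the ten-year range width, and the "/" genre separator are all assumptions made for this example and are not specified in the brief.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels (assumptions); pick edges that suit the real data.
# Bins are [0, 100), [100, 200), [200, 500), [500, 1000), and 1000 or more.
POPULARITY_EDGES = [100, 200, 500, 1000]
POPULARITY_LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def popularity_category(popularity: float) -> str:
    """Task (a): map a popularity score to one of five categories."""
    return POPULARITY_LABELS[bisect_right(POPULARITY_EDGES, popularity)]


def year_range(release_date: str, width: int = 10) -> str:
    """Task (b): derive a range such as '2010-2019' from a 'YYYY-MM-DD' date."""
    year = int(release_date.split("-")[0])  # tokenize on '-' and keep the year
    start = year - year % width             # first year of the enclosing range
    return f"{start}-{start + width - 1}"


def split_genres(main_genre: str, sep: str = "/") -> list[str]:
    """Task (d): split a combined genre such as 'Action/Adventure' into parts."""
    return [part.strip() for part in main_genre.split(sep) if part.strip()]


if __name__ == "__main__":
    print(popularity_category(150))          # falls in the [100, 200) bin
    print(year_range("2014-11-05"))
    print(split_genres("Action / Adventure"))
```

In Excel, comparable results can usually be obtained with a lookup table of bin edges for the grouping, a year calculation on the date column for the range, and a text-splitting formula or Text to Columns for the genres.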
D: Data Dictionary
Using the following template, create a data dictionary and record your answers in the DATA
DICTIONARY worksheet:
E: Data Visualization
Create and plot seven charts and arrange them in the DASHBOARD worksheet.
F: Submission
1. Submit your work as an MS-Excel file named <TEAM LEADER NAME>.xls
2. The file must contain the following worksheets: