IMDB Dataframe Insights

The document outlines a comprehensive analysis plan for IMDB data, detailing various steps including data inspection, transformation, and visualization. Key tasks involve calculating profits, extracting top movies, analyzing ratings, and demographic insights across genres. The final goal is to compile a complete report based on the findings and visualizations using Power BI.

Uploaded by

mythris107


Insights to explore on IMDB Data

1. Inspect the data frame for its dimensions, null values, and a summary of the different
columns
2. Get the summary of numeric columns
3. Convert the unit of the budget and gross columns from $ to million $
4. Create a new column profit and sort the data frame by it. Then extract the ten most
profitable movies in descending order of profit and store them in a new data frame,
top10
5. Plot a scatter plot for budget and profit features and write a few words on what you
observed
6. Extract the movies with a negative profit and store them in a new data frame –
negative_profit
7. Create a new column Avg_rating (the average of the MetaCritic and Rating columns) in
the data frame and arrange the movies in descending order of Avg_rating
8. Find the actor trios with the most Facebook likes combined (i.e. the sum of
actor_1_facebook_likes, actor_2_facebook_likes, and actor_3_facebook_likes should be
maximum). Find the top 5 most popular trios, output their names in a list, and write
a few words on what you observed
9. Check how the Runtime variable is distributed by plotting a histogram (or a seaborn
distplot) to find the Runtime range that most of the movies fall into.
10. R-rated movies are restricted for the under-18 age group, yet there are still vote
counts from that age group. Filter the R-rated movies and sort them by ‘CVotesU18’ in
descending order. Get the top 5 R-rated movies by under-18 vote count.
11. Display the Titles of Movies Having a Runtime Greater Than or Equal to 180 Minutes
12. In Which Year Was the Average Voting Highest?
13. In Which Year Was the Average Revenue Highest?
14. Find the Average Rating for Each Director
15. Display the Titles and Runtimes of the Top 10 Longest Movies
16. Display the Number of Movies Released per Year
17. Find the Most Popular Movie Title (Highest Revenue)
18. Display the Top 10 Highest-Rated Movies and Their Directors
19. Display the Top 10 Highest-Revenue Movies
20. Find the Average Rating of Movies Year-Wise
21. Does Rating Affect Revenue?
22. Classify Movies Based on Ratings [Excellent, Good, and Average]
23. Count the Number of Action Movies
24. How Many Films of Each Genre Were Made?
25. Demographic analysis: create a new data frame genre_top10 as described below
a. Create a new dataframe df_by_genre that contains genre_1, genre_2,
and genre_3 and all the columns related to CVotes/Votes from the movies data
frame. There are 47 columns to be extracted in total. Add a column called cnt to the
dataframe df_by_genre and initialize it to one.
b. Group the dataframe df_by_genre by genre_1, find the sum of all the numeric
columns (cnt and the CVotes- and Votes-related columns), and store the result in
a dataframe df_by_g1. Perform the same operation for genre_2 and genre_3 and
store the results in dataframes df_by_g2 and df_by_g3 respectively.
c. Now that we have 3 dataframes obtained by grouping over genre_1, genre_2,
and genre_3 separately, it's time to combine them. Add the three
dataframes and store the result in a new dataframe df_add, so that the corresponding
values of Votes/CVotes get added for each genre (use the function add()).
d. After aggregation, the column cnt holds the number of
occurrences of each genre. Subset the genres that have at least 10 movies into a
new dataframe genre_top10 based on the cnt column value.
e. Now take the mean of all the numeric columns by dividing them by the value of the
column cnt and store it back in the same dataframe.
f. Since the number of votes can’t be a fraction, typecast all the CVotes-related
columns to integers. Also, round off all the Votes-related columns to two digits
after the decimal point.
g. Now the final data frame genre_top10 should have the complete information about
all the demographic (Votes- and CVotes-related) columns across the top 10 genres.
26. Using the genre_top10 data frame (created in the step above), draw some
insights as below
a. Plot a bar chart of the different genres vs cnt using seaborn.
b. Plot a heatmap to see how the average number of votes of males varies across
the genres. Use a seaborn heatmap for this analysis. The X-axis should contain the
four age groups for males, i.e., CVotesU18M, CVotes1829M, CVotes3044M,
and CVotes45AM. The Y-axis will have the genres, and the annotations in the
heatmap give the average number of votes for that male age group. Draw
inferences from this plot.
c. Plot a heatmap to see how the average number of votes of females varies across the
genres. Use a seaborn heatmap for this analysis. The X-axis should contain the four
age groups for females, i.e., CVotesU18F, CVotes1829F, CVotes3044F,
and CVotes45AF. The Y-axis will have the genres, and the annotations in the heatmap
give the average number of votes for that female age group. Draw inferences
from this plot.
d. Plot a heatmap to see how the average number of votes of females varies across
the genres. Use a seaborn heatmap for this analysis. The X-axis should contain the
four age groups for females, i.e., VotesU18F, Votes1829F, Votes3044F,
and Votes45AF. The Y-axis will have the genres, and the annotations in the heatmap
give the average number of votes for that female age group. Draw inferences
from this plot.
e. Sort the dataframe genre_top10 by the value of CVotes1000 in descending
order.
27. USA vs non-USA cross-analysis. Use the movies data frame for this analysis.
- Create a column IFUS in the dataframe movies. The column IFUS should contain
the value "USA" if the Country of the movie is "USA". For all
other countries, IFUS should contain the value "non-USA".
- Make a boxplot that shows how the number of votes from US voters,
i.e. CVotesUS, varies for the US and non-US movies. Make use of the
column IFUS to make this plot. Similarly, make another subplot that shows how
non-US voters have voted for the US and non-US movies by
plotting CVotesnUS for both the US and non-US movies.
- Draw inferences from this analysis.
28. Write a complete report on the IMDB data based on the analysis
done in all the above steps, and also show the visualizations using Power BI
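
Steps 3, 4 and 6 can be sketched in pandas as follows. This is a minimal illustration on a made-up four-row frame, not the real dataset. The Title column name, the sample figures, and the definition profit = gross - budget are assumptions, since the task list does not spell them out.

```python
import pandas as pd

# Toy stand-in for the movies data frame. Titles and dollar figures are
# invented, and the "Title" column name is an assumption.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "budget": [100_000_000, 20_000_000, 50_000_000, 5_000_000],
    "gross": [250_000_000, 15_000_000, 50_000_000, 45_000_000],
})

# Step 3: convert both money columns from dollars to millions of dollars.
for col in ["budget", "gross"]:
    movies[col] = movies[col] / 1_000_000

# Step 4: profit is assumed here to mean gross minus budget.
movies["profit"] = movies["gross"] - movies["budget"]
movies = movies.sort_values("profit", ascending=False)

# head(10) returns at most ten rows, so it also works on this small frame.
top10 = movies.head(10).reset_index(drop=True)

# Step 6: movies that lost money.
negative_profit = movies[movies["profit"] < 0]

print(top10)
print(negative_profit)
```

Sorting once and then calling head(10) keeps top10 in descending order of profit without a second sort.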

*******************************HAPPY ANALYSIS****************************************
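
The genre aggregation in step 25 (sub-steps a to f) is the most involved part of the list, so here is a hedged sketch of one way to do it. The helper name build_genre_table, the min_movies parameter, and the toy data are hypothetical. Only two vote columns (CVotesU18M and VotesU18F) stand in for the 40-plus columns of the real data. Leaving cnt undivided in step e is a choice made here so the count stays usable for the bar chart in step 26.

```python
import pandas as pd

GENRE_COLS = ["genre_1", "genre_2", "genre_3"]


def build_genre_table(movies, min_movies=10):
    """Hypothetical helper following steps 25a-25f."""
    # a. Keep the genre columns plus every CVotes*/Votes* column, add cnt = 1.
    vote_cols = [c for c in movies.columns if c.startswith(("CVotes", "Votes"))]
    df_by_genre = movies[GENRE_COLS + vote_cols].copy()
    df_by_genre["cnt"] = 1
    num_cols = vote_cols + ["cnt"]

    # b. One grouped sum per genre column (rows with a missing genre are dropped).
    df_by_g1, df_by_g2, df_by_g3 = (
        df_by_genre.groupby(g)[num_cols].sum() for g in GENRE_COLS
    )

    # c. Add the three frames; fill_value=0 keeps genres missing from one frame.
    df_add = df_by_g1.add(df_by_g2, fill_value=0).add(df_by_g3, fill_value=0)

    # d. Keep genres with at least min_movies occurrences.
    genre_top10 = df_add[df_add["cnt"] >= min_movies].copy()

    # e. Turn the summed vote columns into per-movie means.
    for col in vote_cols:
        genre_top10[col] = genre_top10[col] / genre_top10["cnt"]

    # f. Vote counts become integers (truncated), Votes columns get two decimals.
    for col in vote_cols:
        if col.startswith("CVotes"):
            genre_top10[col] = genre_top10[col].astype(int)
        else:
            genre_top10[col] = genre_top10[col].round(2)
    genre_top10["cnt"] = genre_top10["cnt"].astype(int)
    return genre_top10


# Toy data, invented for illustration only.
movies = pd.DataFrame({
    "genre_1": ["Action", "Drama", "Action", "Drama"],
    "genre_2": ["Drama", "Comedy", "Comedy", "Action"],
    "genre_3": ["Thriller", None, None, None],
    "CVotesU18M": [100, 200, 300, 400],
    "VotesU18F": [8.0, 7.0, 6.0, 9.0],
})

# The task asks for at least 10 movies; 3 is used only because the toy frame is tiny.
genre_top10 = build_genre_table(movies, min_movies=3)
print(genre_top10)
```

Passing fill_value=0 to add() matters: without it, a genre that never appears in one of the three genre columns would come out as NaN instead of keeping its partial sum.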
