
project 5

The document walks through a structured analysis of a movie dataset using Pandas, covering data reading, cleaning, and analysis. Key findings include the most profitable movies, the top IMDb-rated films, and the critic-favorite and audience-favorite actors based on review counts. The analysis is intended to be useful to movie enthusiasts and industry professionals.

Uploaded by

Aisha Sheikh


Task 1: Reading and Inspection

Subtask 1.1: Import and read the movie database

We begin by importing the necessary libraries and reading the movie dataset into a Pandas DataFrame.

import numpy as np
import pandas as pd

# Read the movie dataset
movies = pd.read_csv("Movies.csv")

Subtask 1.2: Inspect the dataframe

We inspect the dataset to understand its structure and contents.

# Check the number of rows and columns
print("Number of rows and columns:", movies.shape)

# Count the columns that contain at least one null value
print("Columns with null values:", (movies.isnull().sum() > 0).sum())

Answers to Questions:

1. There are 3821 rows and 26 columns in the dataframe.

2. Three columns have null values.
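The inspection step above reports only how many columns contain nulls. A minimal sketch of how one might also see which columns those are is shown here; the `sample` frame and its values are invented for illustration and are not taken from Movies.csv.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movie data; the values are invented for illustration.
sample = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "language": ["English", np.nan, "French", np.nan],
    "gross": [100.0, np.nan, 50.0, 75.0],
    "budget": [80.0, 20.0, 60.0, 10.0],
})

n_rows, n_cols = sample.shape

# Number of nulls in each column
null_counts = sample.isnull().sum()

# Keep only the columns that have at least one null
cols_with_nulls = null_counts[null_counts > 0]

print(f"{n_rows} rows, {n_cols} columns")
print(cols_with_nulls)
print("Columns with null values:", len(cols_with_nulls))
```

Listing the affected columns by name makes it easier to decide later which ones to drop and which ones to fill.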

Task 2: Cleaning the Data

Subtask 2.1: Drop unnecessary columns

We drop columns that are not required for our analysis.

columns_to_drop = [
    'color', 'director_facebook_likes', 'actor_1_facebook_likes',
    'actor_2_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
    'cast_total_facebook_likes', 'actor_3_name', 'duration',
    'facenumber_in_poster', 'content_rating', 'country',
    'movie_imdb_link', 'aspect_ratio', 'plot_keywords'
]

# Drop the listed columns in place
movies.drop(columns=columns_to_drop, inplace=True)

Answers to Questions: 3. After dropping unnecessary columns, the dataframe contains 10 columns.
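One way to double-check a column count after a drop is to compare the shape before and after. The sketch below does this on a small invented frame; `sample`, `trimmed`, and the two-column drop list are hypothetical stand-ins, not the project's data.

```python
import pandas as pd

# Toy frame with invented values; only the column names resemble the dataset.
sample = pd.DataFrame({
    "movie_title": ["A", "B"],
    "color": ["Color", "Color"],
    "duration": [120, 95],
    "imdb_score": [7.1, 8.3],
})

columns_to_drop = ["color", "duration"]

before = sample.shape[1]
trimmed = sample.drop(columns=columns_to_drop)  # returns a new frame
after = trimmed.shape[1]

# Sanity check: the column count should fall by exactly the number dropped.
assert before - after == len(columns_to_drop)

print(list(trimmed.columns))
```

Running the same comparison on the real dataframe confirms that the reported column count matches the original count minus the length of `columns_to_drop`.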

Subtask 2.2: Inspect Null values

We find the percentage of null values in each column.

# Percentage of null values in each column
null_percentage = (movies.isnull().sum() / len(movies)) * 100

Answers to Questions: 4. The column with the highest percentage of null values is “language”.
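To read off the column with the highest null percentage directly, the percentages can be sorted or passed to `idxmax`. This is a sketch on invented data; `sample`, `ranked`, and `worst_column` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values.
sample = pd.DataFrame({
    "language": ["English", np.nan, np.nan, "French"],
    "gross": [1.0, np.nan, 3.0, 4.0],
    "budget": [1.0, 2.0, 3.0, 4.0],
})

# The mean of a boolean mask is the fraction of True values, so this is
# equivalent to isnull().sum() / len(sample).
null_percentage = sample.isnull().mean() * 100

# Highest percentage first
ranked = null_percentage.sort_values(ascending=False).round(2)

# Label of the column with the largest null percentage
worst_column = null_percentage.idxmax()

print(ranked)
print("Most nulls:", worst_column)
```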

Subtask 2.3: Fill NaN values

We fill NaN values in the “language” column with “English”.

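# Assign the result back instead of calling fillna(inplace=True) on the
# column, which newer pandas versions warn about
movies["language"] = movies["language"].fillna("English")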

Answers to Questions: 5. After filling NaN values, there are 3670 movies made in the English language.
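The count of English-language movies can be obtained by comparing the filled column against "English" and summing the matches. A minimal sketch on invented data, with `sample` and `english_count` as hypothetical names:

```python
import numpy as np
import pandas as pd

# Toy column with invented values; two entries are missing.
sample = pd.DataFrame(
    {"language": ["English", np.nan, "French", np.nan, "English"]}
)

# Fill the missing entries, then assign the result back to the column.
sample["language"] = sample["language"].fillna("English")

# A boolean comparison sums to the number of matching rows.
english_count = int((sample["language"] == "English").sum())

print("English-language movies:", english_count)
print(sample["language"].value_counts())
```

`value_counts()` gives the same number alongside the counts for every other language, which is a convenient cross-check.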

Task 3: Data Analysis

Subtask 3.1: Change the unit of columns

We convert the “budget” and “gross” columns from dollars to millions of dollars.

# Express gross and budget in millions of dollars
movies["gross"] = movies["gross"] / 1000000
movies["budget"] = movies["budget"] / 1000000

Subtask 3.2: Find the movies with the highest profit

We calculate the “Profit” for each movie as gross minus budget and find the ten most profitable movies.

# Profit in millions of dollars
movies["Profit"] = movies["gross"] - movies["budget"]

# Ten movies with the highest profit
top10 = movies.sort_values("Profit", ascending=False).head(10)

Answers to Questions: 6. The movie ranked 5th from the top in the
list is “The Avengers”.
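Reading off the movie at a given rank is easier once the sorted result has a fresh 0-based index. The sketch below uses `nlargest` as a swapped-in alternative to `sort_values(...).head(...)`; the titles and amounts are invented, and `sample`, `top`, and `second` are hypothetical names.

```python
import pandas as pd

# Toy data in raw dollars; the figures are invented for illustration.
sample = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "gross": [300e6, 150e6, 90e6, 500e6],
    "budget": [100e6, 200e6, 10e6, 250e6],
})

# Convert to millions of dollars, then compute profit.
sample["gross"] = sample["gross"] / 1_000_000
sample["budget"] = sample["budget"] / 1_000_000
sample["Profit"] = sample["gross"] - sample["budget"]

# Three most profitable rows, re-indexed 0, 1, 2 in rank order.
top = sample.nlargest(3, "Profit").reset_index(drop=True)

# The movie ranked k-th sits at index k - 1; here k = 2.
second = top.loc[1, "movie_title"]

print(top[["movie_title", "Profit"]])
print("Ranked 2nd:", second)
```

On the real dataframe, the movie ranked 5th would be at index 4 of the re-indexed top ten.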

Subtask 3.3: Find IMDb Top 250

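We create a dataframe IMDb_Top_250 containing up to 250 of the highest-rated movies by IMDb score, restricted to movies with an imdb_score above 8.0 and with num_voted_users greater than 25,000. Each movie is then given a rank, starting from 1.

# Keep highly rated movies that have enough votes
IMDb_Top_250 = movies[(movies['imdb_score'] > 8.0) &
                      (movies['num_voted_users'] > 25000)]

# Sort by score and keep at most 250 rows; .copy() avoids a
# SettingWithCopyWarning when the Rank column is added
IMDb_Top_250 = IMDb_Top_250.sort_values(by='imdb_score',
                                        ascending=False).head(250).copy()
IMDb_Top_250['Rank'] = range(1, IMDb_Top_250.shape[0] + 1)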

Answers to Questions: 7. The bucket holding the maximum number of movies from IMDb_Top_250 is "8 to 8.5".
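The source does not show how the score buckets were formed. One possible way, sketched here under the assumption of half-point buckets, is `pd.cut`; the bin edges, labels, and scores below are all invented for illustration.

```python
import pandas as pd

# Invented IMDb scores standing in for IMDb_Top_250['imdb_score'].
scores = pd.Series([8.1, 8.3, 8.5, 8.7, 9.0, 9.3, 8.2], name="imdb_score")

# Assumed half-point buckets; with the default right=True each bucket
# includes its upper edge, so 8.5 falls in "8 to 8.5".
bins = [7.5, 8.0, 8.5, 9.0, 9.5]
labels = ["7.5 to 8", "8 to 8.5", "8.5 to 9", "9 to 9.5"]

buckets = pd.cut(scores, bins=bins, labels=labels)

# Number of movies per bucket, and the bucket with the most movies
counts = buckets.value_counts()
largest_bucket = counts.idxmax()

print(counts)
print("Largest bucket:", largest_bucket)
```

Whether a score exactly on an edge belongs to the lower or upper bucket depends on the `right` argument, so the choice should be stated alongside the answer.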

Subtask 3.4: Find the critic-favorite and audience-favorite actors

We create dataframes for three actors, namely Meryl_Streep, Leo_Caprio, and Brad_Pitt, containing the movies in which each is the lead actor. Then we combine these dataframes, group by actor, and find the mean number of critic reviews and user reviews.

# One dataframe per lead actor
Meryl_Streep = movies[movies["actor_1_name"] == "Meryl Streep"]
Leo_Caprio = movies[movies["actor_1_name"] == "Leonardo DiCaprio"]
Brad_Pitt = movies[movies["actor_1_name"] == "Brad Pitt"]

# Stack the three dataframes row-wise
Combined = pd.concat([Meryl_Streep, Leo_Caprio, Brad_Pitt], axis=0)

# Mean critic and user review counts per actor
actor_reviews = Combined.groupby(by="actor_1_name")[
    ["num_critic_for_reviews", "num_user_for_reviews"]
].mean()

Answers to Questions: 8 and 9

8. By mean number of user reviews per movie, “Leonardo DiCaprio” ranks highest among the three actors, making him the audience favorite.
9. By mean number of critic reviews per movie, “Leonardo DiCaprio” also ranks highest among the three actors, making him the critic favorite as well.
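The same comparison can be sketched more compactly by filtering with `isin` instead of building and concatenating three dataframes. This is an alternative formulation, not the approach used above; the review counts are invented, and `sample`, `actors`, `critic_favorite`, and `audience_favorite` are hypothetical names.

```python
import pandas as pd

# Toy data with invented review counts.
sample = pd.DataFrame({
    "actor_1_name": [
        "Meryl Streep", "Meryl Streep", "Leonardo DiCaprio",
        "Leonardo DiCaprio", "Brad Pitt", "Someone Else",
    ],
    "num_critic_for_reviews": [100, 200, 400, 300, 250, 999],
    "num_user_for_reviews": [300, 500, 900, 700, 600, 999],
})

actors = ["Meryl Streep", "Leonardo DiCaprio", "Brad Pitt"]

# Keep only rows whose lead actor is one of the three.
combined = sample[sample["actor_1_name"].isin(actors)]

# Mean review counts per actor
actor_reviews = combined.groupby("actor_1_name")[
    ["num_critic_for_reviews", "num_user_for_reviews"]
].mean()

# Actor with the highest mean in each column
critic_favorite = actor_reviews["num_critic_for_reviews"].idxmax()
audience_favorite = actor_reviews["num_user_for_reviews"].idxmax()

print(actor_reviews)
print("Critic favorite:", critic_favorite)
print("Audience favorite:", audience_favorite)
```

Note that both columns hold review counts, so "favorite" here means the actor whose movies attract the most reviews on average, not the highest scores.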

Conclusion
In this analysis, we explored a movie dataset, cleaned the data, and ran several analyses of movies, actors, and ratings. We identified the most profitable movies, the top-rated movies on IMDb, and the favorite actors among critics and audiences as measured by mean review counts. These results may be useful to movie enthusiasts and industry professionals.
