project 5
import numpy as np
import pandas as pd
# Check the number of rows and columns
print("Number of rows and columns:", movies.shape)
Answers to Questions:
columns_to_drop = [
'color', 'director_facebook_likes', 'actor_1_facebook_likes',
'actor_2_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
'cast_total_facebook_likes', 'actor_3_name', 'duration',
'facenumber_in_poster', 'content_rating', 'country',
'movie_imdb_link', 'aspect_ratio', 'plot_keywords'
]
movies.drop(columns=columns_to_drop, inplace=True)
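The drop step above raises a KeyError if any listed column is missing from the frame. A minimal sketch of a more forgiving variant follows, assuming a small made-up frame named `toy` in place of the real `movies` data; the names `wanted`, `missing`, and `trimmed` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `movies` frame.
toy = pd.DataFrame({
    "movie_title": ["A", "B"],
    "color": ["Color", "Black and White"],
    "duration": [120, 95],
})

# 'aspect_ratio' is deliberately absent from the toy frame.
wanted = ["color", "duration", "aspect_ratio"]

# Report which requested columns do not exist before dropping.
missing = [c for c in wanted if c not in toy.columns]
print("Not found:", missing)

# errors="ignore" skips labels that are not present instead of raising.
# Returning a new frame (no inplace) leaves `toy` unchanged.
trimmed = toy.drop(columns=wanted, errors="ignore")
print(trimmed.columns.tolist())
```

Returning a new frame rather than dropping in place keeps the original data available if the column list needs to be revised.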
# Percentage of missing values in each column
null_percentage = (movies.isnull().sum() / len(movies)) * 100
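To illustrate what the missing-value calculation produces, here is a small sketch on made-up data. The frame `toy` and its values are assumptions for illustration, not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `movies` frame.
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "language": ["English", None, "French", None],
    "gross": [10.0, np.nan, 30.0, 40.0],
})

# isnull().mean() is the fraction of missing values per column,
# which is the same as isnull().sum() / len(toy).
toy_null_percentage = (toy.isnull().mean() * 100).round(2)

# Show the columns with the most missing data first.
print(toy_null_percentage.sort_values(ascending=False))
```

Sorting the result makes it easier to see which columns need imputation or removal.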
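# Assign the result back; inplace=True on a selected column may not update movies in newer pandas
movies["language"] = movies["language"].fillna("English")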
# Express gross and budget in millions
movies["gross"] = movies["gross"] / 1_000_000
movies["budget"] = movies["budget"] / 1_000_000
We calculate the profit for each movie (gross minus budget) and find the
ten most profitable movies.
movies["Profit"] = movies.gross - movies.budget
top10 = movies.sort_values("Profit", ascending=False).head(10)
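The profit ranking can be sketched on made-up numbers as follows. The frame `toy`, its figures, and the names `top3` and `second` are hypothetical; `nlargest` is used here as one possible alternative to `sort_values(...).head(...)`, not as the method the analysis itself used.

```python
import pandas as pd

# Hypothetical figures in millions, not taken from the real dataset.
toy = pd.DataFrame({
    "movie_title": ["A", "B", "C", "D"],
    "gross": [500.0, 120.0, 300.0, 80.0],
    "budget": [200.0, 100.0, 50.0, 90.0],
})

# Profit is gross minus budget, in the same units as both inputs.
toy["Profit"] = toy["gross"] - toy["budget"]

# nlargest(n, column) returns the n rows with the largest values,
# already sorted in descending order.
top3 = toy.nlargest(3, "Profit").reset_index(drop=True)

# Positions are zero-based, so the movie ranked k-th sits at iloc[k - 1].
second = top3.iloc[1]["movie_title"]

print(top3[["movie_title", "Profit"]])
print("Ranked 2nd:", second)
```

Resetting the index after ranking means a position such as "5th from the top" maps directly to `iloc[4]`.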
Answers to Questions:
6. The movie ranked 5th from the top in the list is “The Avengers”.
Conclusion
In this analysis, we explored a movie dataset, cleaned the data, and
ran several analyses of movies, actors, and ratings. We identified
the highest-grossing movies, IMDb’s top 250 movies, and the actors
most favored by critics and audiences. These results may be useful
to movie enthusiasts and industry professionals.