Imdb_Movie_Analysis

The document presents an analysis of an IMDB movie dataset aimed at identifying key factors contributing to a movie's success based on IMDB ratings. It details the approach taken, including data cleaning, genre extraction, and statistical analysis, while providing insights on genre distribution, movie duration, language impact, director influence, and budget correlation with earnings. The project resulted in enhanced data analysis skills and actionable insights for stakeholders in the film industry.

Uploaded by

Sarthak Bajaj

IMDB MOVIE ANALYSIS
PREPARED BY: DHANUSH HEGDE
SUBMITTED TO: TRAINITY

TABLE OF CONTENTS
PROJECT DESCRIPTION
APPROACH USED
TECH-STACK USED
TASKS PERFORMED & INSIGHTS
SKILLS GAINED & RESULT
LINK TO THE EXCEL SHEET

PROJECT DESCRIPTION
THE DATASET PROVIDED PERTAINS TO
IMDB MOVIES AND OFFERS A VALUABLE
OPPORTUNITY TO EXPLORE A SIGNIFICANT
AND RELEVANT RESEARCH QUESTION:
"WHAT ARE THE KEY FACTORS THAT
CONTRIBUTE TO A MOVIE'S SUCCESS ON
IMDB?" IN THIS CONTEXT, SUCCESS IS
PRIMARILY DEFINED BY HIGH IMDB
RATINGS, WHICH SERVE AS A WIDELY
ACCEPTED MEASURE OF AUDIENCE
SATISFACTION AND OVERALL MOVIE
QUALITY.

THIS PROBLEM IS OF CONSIDERABLE
IMPORTANCE TO VARIOUS STAKEHOLDERS
IN THE FILM INDUSTRY, INCLUDING
PRODUCERS, DIRECTORS,
SCREENWRITERS, AND INVESTORS. BY
IDENTIFYING AND ANALYZING THE
ATTRIBUTES THAT POSITIVELY INFLUENCE
IMDB RATINGS—SUCH AS GENRE, CAST,
BUDGET, DIRECTOR REPUTATION,
RUNTIME, RELEASE YEAR, AND MORE—
INDUSTRY PROFESSIONALS CAN GAIN
ACTIONABLE INSIGHTS. THESE INSIGHTS
CAN GUIDE THEM IN MAKING MORE
INFORMED DECISIONS WHEN PLANNING,
PRODUCING, AND MARKETING MOVIES,
THEREBY INCREASING THE LIKELIHOOD OF
BOTH CRITICAL AND COMMERCIAL
SUCCESS. UNDERSTANDING WHAT
RESONATES WITH AUDIENCES ALSO HELPS
IN MINIMIZING FINANCIAL RISKS AND
OPTIMIZING RESOURCE ALLOCATION IN
THE COMPETITIVE WORLD OF FILMMAKING.
APPROACH
Dataset Understanding
Reviewed the IMDB dataset structure to understand the
types of information provided for each movie.
Data Cleaning
Removed rows with missing critical values,
replaced numeric blanks with averages, and
ensured consistent formatting.
Genre Extraction and Transformation
Split multi-genre strings into separate columns
using Excel text functions for easier analysis.
Formula-Based Data Handling
Applied logical and error-handling formulas to
manage irregular data and extract specific
values.
Tool Used
Used Microsoft Excel for all preprocessing tasks
using formulas and built-in features.
Execution Summary
Transformed the raw dataset into a clean and
structured format suitable for analysis and
visualization.
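
The cleaning and genre-splitting steps above were carried out with Excel formulas. As a rough illustration only, the same logic might look like this in Python. The column names (`title`, `duration`, `genres`), the `|` genre delimiter, and the sample rows are assumptions made for the sketch, not details taken from the workbook.

```python
from statistics import mean

# Hypothetical rows; None marks a blank cell.
rows = [
    {"title": "Movie A", "duration": 120, "genres": "Action|Adventure"},
    {"title": "Movie B", "duration": None, "genres": "Drama"},
    {"title": None, "duration": 95, "genres": "Comedy|Romance|Drama"},
    {"title": "Movie C", "duration": 100, "genres": "Thriller"},
]


def clean(rows, critical=("title",), numeric=("duration",)):
    """Drop rows missing a critical value, then fill numeric blanks with the column mean."""
    kept = [dict(r) for r in rows if all(r[c] is not None for c in critical)]
    for col in numeric:
        known = [r[col] for r in kept if r[col] is not None]
        fill = mean(known)
        for r in kept:
            if r[col] is None:
                r[col] = fill
    return kept


def split_genres(rows, delimiter="|"):
    """Spread a multi-genre string over genre_1, genre_2, ... columns."""
    width = max(len(r["genres"].split(delimiter)) for r in rows)
    out = []
    for r in rows:
        parts = r["genres"].split(delimiter)
        parts += [""] * (width - len(parts))  # pad so every row has the same columns
        new = {k: v for k, v in r.items() if k != "genres"}
        new.update({f"genre_{i + 1}": g for i, g in enumerate(parts)})
        out.append(new)
    return out


cleaned = split_genres(clean(rows))
for row in cleaned:
    print(row)
```

Filling blanks with the mean keeps every row usable but pulls the filled values towards the centre, so it is worth noting how many cells were filled before interpreting the spread of a column.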

TECH-STACK USED
📊 MICROSOFT EXCEL
CORE TOOL FOR DATA HANDLING AND PROCESSING
USED FOR:
CLEANING MISSING VALUES
SPLITTING GENRE COLUMNS USING FORMULAS
HANDLING ERRORS AND APPLYING LOGIC-BASED FUNCTIONS
ENABLED QUICK MANIPULATION AND TRANSFORMATION OF THE DATASET

🎨 CANVA
USED TO DESIGN THE PROJECT REPORT
HELPED IN:
CREATING WELL-STRUCTURED LAYOUTS
ADDING VISUAL APPEAL TO THE DOCUMENTATION
MAINTAINING CONSISTENCY IN FONTS, COLORS, AND FORMATTING

🤖 AI ASSISTANT
ACTED AS A REAL-TIME GUIDE FOR THE PROJECT
PROVIDED:
STEP-BY-STEP SOLUTIONS TO EXCEL FORMULA ISSUES
EXPLANATION OF FUNCTIONS AND FORMULAS
INSIGHTFUL SUGGESTIONS TO IMPROVE CLARITY AND EXECUTION

TASKS PERFORMED & INSIGHTS
MOVIE GENRE ANALYSIS: ANALYZE THE DISTRIBUTION OF MOVIE GENRES AND THEIR
IMPACT ON THE IMDB SCORE.
TASK: DETERMINE THE MOST COMMON GENRES OF MOVIES IN THE DATASET. THEN,
FOR EACH GENRE, CALCULATE DESCRIPTIVE STATISTICS (MEAN, MEDIAN, MODE,
RANGE, VARIANCE, STANDARD DEVIATION) OF THE IMDB SCORES.

DESCRIPTIVE STATISTICS:

INSIGHTS
Genre Distribution Dominance: Drama accounts for approximately 17.9% of all genre entries (2594 / 14500). The
total of 14500 exceeds the 5043 movies in the dataset, which suggests that a movie with several genres is counted
once per genre.
Comedy represents around 12.9% of all genre entries (1872 / 14500).
Thriller constitutes roughly 9.7% of all genre entries (1408 / 14500).
These top three genres collectively make up approximately 40.5% of all genre entries.

Low Representation Genres:


Film-Noir has a count of only 6 movies, representing less than 0.05% of the dataset.
Game-Show is the least frequent genre with only 1 movie (< 0.01%).
Short and Reality-TV also have very low counts of 5 and 2 respectively (< 0.05% each).
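
The percentages above can be re-derived directly from the reported counts. The short check below uses only the counts and the total of 14500 quoted in this section; the function and variable names are illustrative.

```python
# Counts and total as reported in the insights above.
TOTAL = 14500
counts = {
    "Drama": 2594,
    "Comedy": 1872,
    "Thriller": 1408,
    "Film-Noir": 6,
    "Short": 5,
    "Reality-TV": 2,
    "Game-Show": 1,
}


def share(count, total=TOTAL):
    """Percentage share of one genre count in the total."""
    return count / total * 100


shares = {genre: share(n) for genre, n in counts.items()}
top3_share = share(sum(counts[g] for g in ("Drama", "Comedy", "Thriller")))

for genre, pct in shares.items():
    print(f"{genre}: {pct:.2f}%")
print(f"Top three combined: {top3_share:.1f}%")
```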

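Overall Statistics of Genre Counts (movies per genre, not IMDB scores):

Mean Count: 557.69
Median Count: 267.5
Mode: 1
Standard Deviation: 641.75
Range: 2593 (Maximum of 2594 - Minimum of 1)
These values describe the number of movies per genre: the maximum of 2594 is the Drama count and the minimum of 1
is the Game-Show count, whereas IMDB scores only run from 1 to 10.
The standard deviation (641.75) is larger than the mean (557.69), indicating high dispersion in the genre counts.
The positive skewness of 1.71 confirms that the distribution of counts has a longer tail towards higher values:
a few genres are very large while most are small.
The kurtosis of 3.08 points to a more peaked distribution with heavier tails than a normal distribution. If the
figure comes from Excel's KURT function, it is excess kurtosis (a normal distribution scores 0), so the departure
is substantial rather than slight.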
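
The descriptive statistics named in the task (mean, median, mode, range, variance, standard deviation) plus skewness can be sketched in a few lines of Python. The input list is hypothetical, and the skewness formula is the adjusted Fisher-Pearson sample form, which is assumed here to correspond to what the spreadsheet reported.

```python
import statistics as st


def describe(values):
    """Return the descriptive statistics listed in the task for one list of numbers."""
    n = len(values)
    mean = st.mean(values)
    sd = st.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    # Adjusted Fisher-Pearson sample skewness; needs at least 3 values.
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "mean": mean,
        "median": st.median(values),
        "mode": st.mode(values),
        "range": max(values) - min(values),
        "variance": st.variance(values),
        "stdev": sd,
        "skewness": skew,
    }


# Hypothetical per-genre counts: many small values and one very large one.
sample_counts = [1, 1, 2, 5, 40, 120, 900]
summary = describe(sample_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean well above the median together with a positive skewness, as in this sample, is the same pattern the report describes for the genre counts.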

Key Quantitative Takeaways:


1. The dataset exhibits a clear concentration of movies within the Drama, Comedy, and Thriller genres,
highlighting their relative prevalence.
2. A significant number of genres have minimal representation, potentially limiting the scope of any
genre-specific IMDB score analysis.
3. The overall statistics point towards a highly variable distribution of genre counts, with most genres
containing few movies and a few genres containing exceptionally many.

MOVIE DURATION ANALYSIS: ANALYZE THE DISTRIBUTION OF MOVIE DURATIONS AND
ITS IMPACT ON THE IMDB SCORE.
TASK: ANALYZE THE DISTRIBUTION OF MOVIE DURATIONS AND IDENTIFY THE
RELATIONSHIP BETWEEN MOVIE DURATION AND IMDB SCORE.

HISTOGRAM

SCATTER PLOT

INSIGHTS
Movie Duration Distribution:
Right-Skewed Distribution: The histogram visually demonstrates a higher frequency of shorter movie durations
with a tail extending towards longer durations.
Average Duration (Mean): 107.19 minutes.
Median Duration: 103 minutes. (The mean being higher than the median is consistent with the right skew.)
Standard Deviation: 25.16 minutes. (This indicates a moderate level of variability around the average duration.)

Relationship Between Movie Duration and IMDB Score:


Weak Linear Correlation: The scatter plot shows a wide dispersion of IMDB scores across different movie
durations, indicating no strong linear relationship.
Slightly Positive Trendline: The linear trendline exhibits a marginally positive slope, suggesting a very weak
tendency for slightly longer movies to have slightly higher IMDB scores. However, the data points are
significantly scattered around this line.
No Strong Predictive Power: Duration alone does not appear to be a reliable predictor of a movie's IMDB
score, as both high and low scores are observed for movies of similar lengths.

Quantitative Summary:
The average movie duration is approximately 107 minutes.
The typical movie duration (median) is around 103 minutes.
The standard deviation of about 25 minutes shows how far durations typically fall from the mean.
The visual and trendline analysis suggests a negligible to very weak positive linear correlation between movie
duration and IMDB score.
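
The report judges the duration-score relationship visually from the scatter plot and trendline. One way to put a number on it is the Pearson correlation coefficient together with the least-squares slope, sketched below. The duration and score values are hypothetical placeholders, not rows from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trendline(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical durations (minutes) and IMDB scores.
durations = [90, 95, 103, 110, 125, 150]
scores = [6.1, 7.0, 5.8, 6.9, 6.4, 7.2]

r = pearson_r(durations, scores)
slope, intercept = trendline(durations, scores)
print(f"r = {r:.3f}, slope = {slope:.4f} points per minute")
```

A value of r near 0 with a small positive slope would match the description of a marginally positive trendline with widely scattered points.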

LANGUAGE ANALYSIS: EXAMINE THE DISTRIBUTION OF MOVIES BASED ON
THEIR LANGUAGE.
TASK: DETERMINE THE MOST COMMON LANGUAGES USED IN MOVIES AND ANALYZE
THEIR IMPACT ON THE IMDB SCORE USING DESCRIPTIVE STATISTICS.

COUNT OF LANGUAGES:

MEAN, MEDIAN AND STD DEVIATION:

BOX AND WHISKERS PLOT:

INSIGHTS
English Language Share: English constitutes 93.5% (4716/5043) of the movies in the dataset.
Mean IMDB Score (English): 6.39.
Mean IMDB Score (Japanese): 7.39 (based on 18 movies). This is 15.6% higher than the English mean.
Mean IMDB Score (Italian): 7.22 (based on 15 movies). This is 13% higher than the English mean.
Mean IMDB Score (French): 7.03 (based on 73 movies). This is 10% higher than the English mean.
Languages with < 1% representation have varying mean scores, but their small sample sizes make these means
statistically unreliable. For example, Maori (1 movie) has a mean of 8.7, but this single data point is not representative.
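
The comparisons with the English mean, and the caution about small samples, can be expressed as a small check. The means and movie counts are the ones reported above; the cut-off of 30 movies for flagging a small sample is an arbitrary assumption for illustration.

```python
ENGLISH_MEAN = 6.39

# (mean IMDB score, number of movies) as reported in the insights above.
languages = {
    "Japanese": (7.39, 18),
    "Italian": (7.22, 15),
    "French": (7.03, 73),
    "Maori": (8.7, 1),
}

MIN_COUNT = 30  # hypothetical threshold below which a mean is flagged as unreliable


def pct_vs_baseline(mean, baseline=ENGLISH_MEAN):
    """Percentage difference of a mean relative to the baseline mean."""
    return (mean - baseline) / baseline * 100


report = {
    lang: {
        "pct_vs_english": round(pct_vs_baseline(mean), 1),
        "small_sample": count < MIN_COUNT,
    }
    for lang, (mean, count) in languages.items()
}

for lang, row in report.items():
    print(lang, row)
```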

Key Quantitative Takeaways:


The dataset is overwhelmingly dominated by English-language films.
Japanese and Italian films in this dataset exhibit notably higher average IMDB scores compared to the English
average.
Quantitative comparisons for languages with very low movie counts are statistically unreliable.

DIRECTOR ANALYSIS: INFLUENCE OF DIRECTORS ON MOVIE RATINGS.
TASK: IDENTIFY THE TOP DIRECTORS BASED ON THEIR AVERAGE IMDB SCORE AND
ANALYZE THEIR CONTRIBUTION TO THE SUCCESS OF MOVIES USING PERCENTILE
CALCULATIONS.

TOP 10 DIRECTORS

STATISTICS:

INSIGHTS

Analysis of Director Influence on Movie Ratings:


The table lists directors sorted by their average IMDB score, along with some overall statistics.

Top Directors Based on Average IMDB Score (Top 10):

The top 10 directors and their corresponding average IMDB scores are:
1. John Blanchard: 9.5
2. Mitchell Altieri: 8.7
3. Sadyk Sher-Niyaz: 8.7
4. Cary Bell: 8.7
5. Mike Mayhall: 8.6
6. Charles Chaplin: 8.6
7. Raja Menon: 8.5
8. Ron Fricke: 8.5
9. Damien Chazelle: 8.5
10. Majid Majidi: 8.5

These directors appear to have the highest average IMDB scores in the dataset. It suggests that movies directed by
them, on average, tend to receive very high ratings.

Overall Statistics:
Maximum IMDB Score: 9.5 (This matches John Blanchard's average at the top of the list.)
Minimum IMDB Score: 1.7
Percent Rank: 100% (The report does not state which value is being ranked. If it is the maximum score, then 9.5 sits
at a 100% rank, meaning no score in the dataset is higher.)
Percentile: 9.5 (If this is the score at the 100th percentile, it again indicates that 9.5 is the highest IMDB score
in this dataset.)

Insights on Director Contribution and Success:


Identification of High-Rated Directors: The list directly identifies directors whose filmographies (within this
dataset) are associated with high average IMDB scores. This could indicate a consistent ability to create
well-received movies.
Potential Indicators of Success: While average IMDB score is one metric of success, it's important to
consider the number of movies each director has in the dataset. A high average based on only one or two
movies might not be as indicative of consistent success as a high average across a larger body of work. The
per-director movie counts are not reported here.
Benchmarking Excellence: The top scores (around 8.5 and above) can serve as a benchmark for directorial
success within this dataset.
Percentile Context: The percentile information (if related to the scores themselves) highlights the rarity of
scores at the higher end of the spectrum. A score of 9.5 represents the absolute highest rating achieved in this
dataset.
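
To make the percent-rank idea and the caveat about movie counts concrete, the sketch below averages scores per director, keeps the number of movies next to each average, and computes a simple percent rank. The film list is hypothetical, and the percent-rank definition used (share of the other values lying below a given value) is one common convention rather than a confirmed description of the spreadsheet formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (director, IMDB score) pairs.
films = [
    ("Director A", 8.5),
    ("Director A", 7.5),
    ("Director B", 9.5),
    ("Director C", 6.0),
    ("Director C", 6.4),
    ("Director C", 5.6),
    ("Director D", 1.7),
]


def director_averages(films):
    """Map each director to (average score, number of movies)."""
    by_director = defaultdict(list)
    for director, score in films:
        by_director[director].append(score)
    return {d: (mean(s), len(s)) for d, s in by_director.items()}


def percent_rank(values, x):
    """Share of the other values below x: 0.0 for the minimum, 1.0 for a unique maximum."""
    below = sum(1 for v in values if v < x)
    return below / (len(values) - 1)


averages = director_averages(films)
all_avgs = [avg for avg, _ in averages.values()]
ranking = sorted(averages.items(), key=lambda kv: kv[1][0], reverse=True)

for director, (avg, n_movies) in ranking:
    print(f"{director}: avg={avg:.2f}, movies={n_movies}, rank={percent_rank(all_avgs, avg):.0%}")
```

In this toy data the top-ranked director has a single movie, which is exactly the situation where a high average says little about consistency.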

BUDGET ANALYSIS: EXPLORE THE RELATIONSHIP BETWEEN MOVIE BUDGETS AND
THEIR FINANCIAL SUCCESS.
TASK: ANALYZE THE CORRELATION BETWEEN MOVIE BUDGETS AND GROSS
EARNINGS, AND IDENTIFY THE MOVIES WITH THE HIGHEST PROFIT MARGIN.

TOP 10 MOVIES(HIGHEST PROFIT MARGIN)

STATISTICS:

INSIGHTS

Correlation Analysis:
The calculated correlation between Gross earnings and Budget is 0.1124821. This indicates a very weak
positive correlation between a movie's budget and its gross earnings within this dataset. This suggests that a
higher budget does not necessarily guarantee significantly higher gross revenue.

Top 10 Movies by Profit Margin:

The movies with the highest profit margins are listed below. The figures are absolute amounts, consistent with
gross earnings minus budget, rather than ratios:

1. Avatar (523505847)
2. Jurassic World (502177271)
3. Titanic (458672302)
4. Star Wars: Episode IV - A New Hope (449935665)
5. E.T. the Extra-Terrestrial (424449459)
6. The Avengers (403279547)
7. The Lion King (377783777)
8. Star Wars: Episode I - The Phantom Menace (359446777)
9. The Dark Knight (348316061)
10. The Hunger Games (329999255)

Key Insights:
Weak Budget-Gross Correlation: The near-zero correlation suggests that, in this dataset, the amount of money
invested in a movie's budget has a minimal linear relationship with how much it earns at the box office. Other
factors like genre, star power, marketing, and critical reception likely play a more significant role in determining
gross earnings.

High Profitability of Certain Films: The list of top movies by profit margin showcases films whose gross
earnings far exceeded their production budgets.

"Avatar" Leads in Profit: "Avatar" has the highest figure in the top 10, about 21 million ahead of
"Jurassic World" in second place.

Franchise Success: Several movies from well-established franchises (Star Wars, Avengers, Jurassic Park, The
Hunger Games) appear in the top profit margin list, indicating the potential for high returns on investment for
successful franchises.

Classic and Modern Hits: The list includes both older classics ("Star Wars: Episode IV - A New Hope", "E.T.") and
more recent blockbusters ("Avatar", "Jurassic World", "The Avengers"), suggesting that high profitability isn't limited
to a specific era.

"Max_ProfitMargin" Identification: The maximum value in the profit margin column is 523505847, which belongs
to "Avatar".
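
Because the figures in the top 10 list look like absolute profit rather than a ratio, it may help to see how the two measures rank movies differently. The sketch below uses hypothetical titles, budgets, and gross values; it takes profit as gross minus budget and margin as profit divided by gross, which are assumed definitions rather than the workbook's confirmed formulas.

```python
# Hypothetical (title, budget, gross) rows.
movies = [
    ("Movie A", 200_000_000, 700_000_000),
    ("Movie B", 15_000, 3_000_000),
    ("Movie C", 100_000_000, 90_000_000),
    ("Movie D", 50_000_000, 250_000_000),
]


def profit(budget, gross):
    """Absolute profit: gross earnings minus budget."""
    return gross - budget


def margin(budget, gross):
    """Profit as a fraction of gross earnings."""
    return (gross - budget) / gross


by_profit = sorted(movies, key=lambda m: profit(m[1], m[2]), reverse=True)
by_margin = sorted(movies, key=lambda m: margin(m[1], m[2]), reverse=True)

print("By absolute profit:", [title for title, _, _ in by_profit])
print("By margin ratio:   ", [title for title, _, _ in by_margin])
```

Ranking by absolute profit favours big-budget blockbusters, while ranking by the ratio tends to surface low-budget films with outsized returns, so the choice of measure changes which movies count as the most profitable.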

SKILLS GAINED AND RESULT
This project enhanced my data analysis skills, leading to:
Improved Data Quality: Expertise in identifying and handling missing/inconsistent data for accurate analysis.
Pivot Table Proficiency: Skillful creation and interpretation of pivot tables to uncover key trends.
Statistical Understanding: Enhanced grasp of mean, median, mode for data-driven decisions.
Effective Data Visualization: Improved ability to create clear graphs/charts for communicating findings.
Stronger Problem-Solving: Learned to resolve common data issues (missing values, inconsistent formats).
Advanced MS Excel Skills: Strengthened abilities in data manipulation, graph creation, and table generation.
Overall Project Outcome: Resulted in a more rigorous and insightful data analysis, identifying key patterns and
providing more accurate, data-driven insights for informed decision-making.

LINK TO THE EXCEL SHEET