
Predictive Analysis 1 Assignment

The document analyzes a CSV file of Bollywood movie data. It loads the data into a pandas DataFrame, then computes genre counts, release-month counts, and mean return on investment (ROI) by release time, and examines correlations among budget, box office collection, and YouTube views, likes, and dislikes. It also plots the budget distribution, the ROI distributions for the Comedy and Drama genres, and a heatmap of feature correlations.

9/20/23, 10:55 PM Untitled19.ipynb - Colaboratory

import pandas as pd

boll = pd.read_csv('/content/bollywood.csv')

boll.head(2)

   SlNo  Release Date  MovieName     ReleaseTime  Genre     Budget  BoxOfficeCollection  YoutubeViews  YoutubeLi
0  1     18-Apr-14     2 States      LW           Romance   36      104.0                8576361       26
1  2     4-Jan-13      Table No. 21  N            Thriller  10      12.0                 1087320       1

boll.Genre.value_counts()

Comedy 36
Drama 35
Thriller 26
Romance 25
Action 21
Thriller 3
Action 3
Name: Genre, dtype: int64
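
The counts above list Thriller and Action twice. One plausible cause, which is an assumption here because the raw CSV is not shown, is stray whitespace around some genre labels. A minimal sketch on a toy Series (the padded values are hypothetical, made up for illustration) of how stripping whitespace would merge such duplicates:

```python
import pandas as pd

# Toy genre labels; the padded variants are hypothetical stand-ins for
# whatever distinguishes the duplicate labels in the real file.
genre = pd.Series(["Thriller", "Thriller ", " Action", "Action", "Drama"])

# Before cleaning, padded labels are counted as separate categories.
raw_counts = genre.value_counts()

# str.strip() removes leading and trailing whitespace from each label,
# so the padded variants collapse into their clean counterparts.
clean_counts = genre.str.strip().value_counts()

print(raw_counts)
print(clean_counts)
```

If the duplicates in the real data come from something else (for example different capitalization), a different normalization step would be needed.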

boll[['Genre','ReleaseTime']].value_counts()

Genre     ReleaseTime
Drama     N              24
Comedy    N              23
Thriller  N              20
Romance   N              15
Action    N              12
Drama     HS              6
Comedy    LW              5
          HS              5
Thriller  FS              4
Romance   LW              4
Drama     FS              4
Comedy    FS              3
Action    N               3
Romance   FS              3
          HS              3
Action    LW              3
          HS              3
          FS              3
Thriller  N               2
Thriller  HS              1
          LW              1
Drama     LW              1
Thriller  LW              1
dtype: int64

boll['month'] = pd.DatetimeIndex(boll['Release Date']).month
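
The month extraction relies on pandas inferring the date format. A hedged alternative sketch, using the two release dates visible in the `head()` output and assuming every date in the column follows the same day-abbreviated month-two digit year pattern, passes an explicit format to `pd.to_datetime`:

```python
import pandas as pd

# Two release dates copied from the head() output of the notebook.
dates = pd.Series(["18-Apr-14", "4-Jan-13"])

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
# Month-name parsing depends on the locale; an English locale is assumed.
parsed = pd.to_datetime(dates, format="%d-%b-%y")

# The .dt accessor exposes datetime components of a Series.
months = parsed.dt.month
print(months.tolist())
```

An explicit format makes parsing fail loudly on a malformed date instead of guessing, which can be useful when the column is not uniformly formatted.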

boll.head(1)

https://colab.research.google.com/drive/1yp6bmte8FJDIV36MybTtJcHeugjFQKF6#scrollTo=1hbjGmTcn40T&printMode=true 1/5

   SlNo  Release Date  MovieName     ReleaseTime  Genre     Budget  BoxOfficeCollection  YoutubeViews  YoutubeLi
0  1     18-Apr-14     2 States      LW           Romance   36      104.0                8576361       26

boll['month'].value_counts()

1 20
3 19
5 18
7 16
2 16
4 11
9 10
6 10
11 10
10 9
8 8
12 2
Name: month, dtype: int64

boll[boll['Budget']>25][['MovieName','month']].value_counts()

MovieName month
2 States 4 1
Raja Natwarlal 8 1
Kill Dil 11 1
Kochadaiiyaan 5 1
Krrish 3 11 1
..
Highway 2 1
Himmatwala 3 1
Holiday 6 1
Humshakals 6 1
Zilla Ghaziabad 2 1
Length: 62, dtype: int64
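
In the rows shown, every (MovieName, month) pair has a count of 1, so `value_counts()` here mostly serves as a way to list the films. A minimal sketch of a more direct approach, selecting rows with a boolean mask and then counting films per month, on a five-row frame whose values are copied from the notebook's outputs (the variable names `movies`, `big` and `per_month` are illustrative):

```python
import pandas as pd

# Five films with budgets and release months taken from the notebook outputs.
movies = pd.DataFrame({
    "MovieName": ["2 States", "Table No. 21", "PK", "Fukrey", "Chennai Express"],
    "Budget": [36, 10, 85, 5, 75],
    "month": [4, 1, 12, 6, 8],
})

# Boolean mask plus .loc keeps only films with a budget above 25
# and only the two columns of interest.
big = movies.loc[movies["Budget"] > 25, ["MovieName", "month"]]

# Counting on the month column alone shows how many such films
# were released in each month.
per_month = big["month"].value_counts().sort_index()

print(big)
print(per_month)
```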

boll['ROI']=(boll['BoxOfficeCollection']-boll['Budget'])/boll['Budget']
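
The ROI column is the profit divided by the budget, ROI = (BoxOfficeCollection - Budget) / Budget. As a small worked sketch, the same formula applied to four films whose budget and collection figures appear in the notebook's outputs (the frame name `films` is illustrative):

```python
import pandas as pd

# Budget and box office figures copied from the notebook outputs.
films = pd.DataFrame({
    "MovieName": ["2 States", "Table No. 21", "Aashiqui 2", "PK"],
    "Budget": [36, 10, 12, 85],
    "BoxOfficeCollection": [104.0, 12.0, 110.0, 735.0],
})

# ROI = profit relative to budget; 0.2 means a 20% return on the budget.
films["ROI"] = (films["BoxOfficeCollection"] - films["Budget"]) / films["Budget"]

# nlargest returns the rows with the highest ROI, in descending order.
top = films.nlargest(2, "ROI")
print(top[["MovieName", "ROI"]])
```

For example, Table No. 21 collected 12.0 on a budget of 10, giving (12 - 10) / 10 = 0.2.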

boll.nlargest(10,['ROI'])


     SlNo  Release Date  MovieName                  ReleaseTime  Genre    Budget  BoxOfficeCollection  YoutubeViews  Youtube
64   65    26-Apr-13     Aashiqui 2                 N            Romance  12      110.0                2926673
89   90    19-Dec-14     PK                         HS           Drama    85      735.0                13270623
132  133   13-Sep-13     Grand Masti                LW           Comedy   35      298.0                1795640
135  136   20-Sep-13     The Lunchbox               N            Drama    10      85.0                 1064854
87   88    14-Jun-13     Fukrey                     N            Comedy   5       36.2                 227912
58   59    5-Sep-14      Mary Kom                   N            Drama    15      104.0                6086811
128  129   18-Oct-13     Shahid                     FS           Drama    6       40.0                 1148516
37   38    11-Jul-14     Humpty Sharma Ki Dulhania  N            Romance  20      130.0                6604595
101  102   12-Jul-13     Bhaag Milkha Bhaag         N            Drama    30      164.0                2635390
115  116   9-Aug-13      Chennai Express            FS           Comedy   75      395.0                1882346

boll.groupby('ReleaseTime')['ROI'].mean()

ReleaseTime
FS    0.973853
HS    0.850867
LW    1.127205
N     0.657722
Name: ROI, dtype: float64

import matplotlib.pyplot as plt
import seaborn as sn
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

plt.hist(boll['Budget'])

(array([64., 40., 19., 11.,  4.,  4.,  2.,  2.,  1.,  2.]),
 array([  2. ,  16.8,  31.6,  46.4,  61.2,  76. ,  90.8, 105.6, 120.4,
        135.2, 150. ]),
 <BarContainer object of 10 artists>)
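
The second array returned by `plt.hist` holds the bin edges. With the default of 10 equal-width bins, the edges run from the smallest budget (2) to the largest (150) in steps of (150 - 2) / 10 = 14.8. A minimal sketch that reproduces those edges with NumPy (the names `low`, `high` and `bins` are illustrative):

```python
import numpy as np

# Smallest and largest budget, read off the bin edges reported by plt.hist.
low, high, bins = 2.0, 150.0, 10

# Equal-width bins: 10 bins need 11 edges, evenly spaced from low to high.
edges = np.linspace(low, high, bins + 1)

# Width of each bin.
width = (high - low) / bins

print(width)
print(edges)
```

Each count in the first array is the number of films whose budget falls between two consecutive edges.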

sn.distplot(boll['Budget'])


<Axes: xlabel='Budget', ylabel='Density'>

sn.distplot(boll[boll['Genre']=='Comedy']['ROI'],color='g',label='comedy')
sn.distplot(boll[boll['Genre']=='Drama']['ROI'],color='r',label='drama')
plt.legend()

<matplotlib.legend.Legend at 0x7bea32b63880>

feature=['BoxOfficeCollection','YoutubeLikes']
boll[feature].corr()

                     BoxOfficeCollection  YoutubeLikes
BoxOfficeCollection             1.000000      0.682517
YoutubeLikes                    0.682517      1.000000
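
By default `DataFrame.corr()` computes the Pearson coefficient: the sum of products of deviations from the mean, divided by the square root of the product of the summed squared deviations. A minimal sketch that computes it by hand and compares the result with pandas. The YoutubeLikes values are cut off in the printed tables, so this sketch uses Budget and BoxOfficeCollection for the ten films in the top-ROI table instead; because those films were selected for high ROI, the coefficient is not representative of the full dataset.

```python
import numpy as np
import pandas as pd

# Budget and box office figures for the ten films in the top-ROI table.
films = pd.DataFrame({
    "Budget": [12, 85, 35, 10, 5, 15, 6, 20, 30, 75],
    "BoxOfficeCollection": [110.0, 735.0, 298.0, 85.0, 36.2,
                            104.0, 40.0, 130.0, 164.0, 395.0],
})

x = films["Budget"].to_numpy(dtype=float)
y = films["BoxOfficeCollection"].to_numpy(dtype=float)

# Deviations of each value from its column mean.
dx, dy = x - x.mean(), y - y.mean()

# Pearson r: covariance term over the product of the spread terms.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses the Pearson method by default.
r_pandas = films["Budget"].corr(films["BoxOfficeCollection"])

print(r_manual, r_pandas)
```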

heatfeature=['Budget', 'BoxOfficeCollection','YoutubeViews','YoutubeLikes','YoutubeDislikes']
sn.heatmap(boll[heatfeature].corr(),annot= True)


<Axes: >
