EDA On FIFA Dataset: Importing Essential Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as mp
Reading Data
In [2]: data = pd.read_csv("./Datasets/data-1.csv")
data.columns
'Height', 'Weight', 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW',
'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM',
dtype='object')
In [4]: data.head()
Out[4]:
   Unnamed: 0      ID               Name  Age                                           Photo  Nationality  ...
1           1   20801  Cristiano Ronaldo   33   https://cdn.sofifa.org/players/4/19/20801.png     Portugal  ...
2           2  190871          Neymar Jr   26  https://cdn.sofifa.org/players/4/19/190871.png       Brazil  ...
4           4  192985       K. De Bruyne   27  https://cdn.sofifa.org/players/4/19/192985.png      Belgium  ...
5 rows × 89 columns
Data Cleaning
In [5]: #checking if there are any missing values in rows
data.isnull().any(axis=1)
Out[5]: 0 True
1 True
2 True
3 True
4 True
...
18202 True
18203 True
18204 True
18205 True
18206 True
In [6]: #checking if there is any row having all the values missing
data.isnull().all(axis=1).sum()
Out[6]: 0
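The two checks above differ only in how each row is reduced: `any(axis=1)` flags a row with at least one missing value, while `all(axis=1)` flags a row only when every value is missing. A minimal sketch on a small made-up frame (the `toy` DataFrame and its values are illustrative assumptions, not rows from the FIFA data):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, not taken from the FIFA dataset
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
})

# True where the row has at least one missing value
has_any_missing = toy.isnull().any(axis=1)

# True only where every value in the row is missing
all_missing = toy.isnull().all(axis=1)

print(has_any_missing.tolist())
print(int(all_missing.sum()))
```

This is why Out[5] can be True for every row while Out[6] is 0: each player is missing at least one field, but no player is missing all of them.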
In [7]: data.isnull().sum()
Out[7]: Unnamed: 0 0
ID 0
Name 0
Age 0
Photo 0
...
GKHandling 48
GKKicking 48
GKPositioning 48
GKReflexes 48
In [8]: data.isnull().sum(axis=1).sort_values(ascending=False)
Out[8]: 13244 75
13267 75
13240 75
13265 75
13264 75
..
11377 1
11376 1
11375 1
11374 1
0 1
In [9]: # rows with more than 50 missing values
data[data.isnull().sum(axis=1)>50]
Out[9]:
       Unnamed: 0      ID            Name  Age                                           Photo       Nationality  ...
13241       13241  219702    K. Dankowski   21  https://cdn.sofifa.org/players/4/19/219702.png            Poland  ...
13248       13248  211511   L. McCullough   24  https://cdn.sofifa.org/players/4/19/211511.png  Northern Ireland  ...
13251       13251  134968    R. Haemhouts   34  https://cdn.sofifa.org/players/4/19/134968.png           Belgium  ...
13258       13258  230713     A. Al Malki   23  https://cdn.sofifa.org/players/4/19/230713.png      Saudi Arabia  ...
13261       13261  221498   H. Al Mansour   25  https://cdn.sofifa.org/players/4/19/221498.png      Saudi Arabia  ...
13266       13266  237371    L. Bengtsson   20  https://cdn.sofifa.org/players/4/19/237371.png            Sweden  ...
13271       13271  213821    F. Sepúlveda   26  https://cdn.sofifa.org/players/4/19/213821.png             Chile  ...
13276       13276  211006      M. Al Amri   26  https://cdn.sofifa.org/players/4/19/211006.png      Saudi Arabia  ...
13279       13279  239679    P. Mazzocchi   22  https://cdn.sofifa.org/players/4/19/239679.png             Italy  ...
48 rows × 89 columns
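Rows like the 48 above carry very little usable information. A minimal sketch of one way to discard them, assuming the goal is to keep only rows with at most a given number of missing values. The `toy` frame, `threshold` and `cleaned` names are illustrative, and this is not necessarily the step the notebook itself ran:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row is almost entirely missing
toy = pd.DataFrame({
    "Name": ["p1", "p2", "p3"],
    "x": [1.0, np.nan, 3.0],
    "y": [1.0, np.nan, np.nan],
    "z": [1.0, np.nan, 3.0],
})

threshold = 2  # assumption for the toy data; the notebook filters on 50

# Count missing values per row, then keep rows at or below the threshold
missing_per_row = toy.isnull().sum(axis=1)
cleaned = toy[missing_per_row <= threshold].reset_index(drop=True)

print(cleaned.shape)
```

Comparing `shape` before and after such a filter is a quick way to confirm how many rows were removed.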
In [10]: data.shape
In [12]: data.shape
In [13]: pd.set_option("display.max_rows", 89)
data.isnull().sum()
Out[13]: Unnamed: 0 0
ID 0
Name 0
Age 0
Photo 0
Nationality 0
Flag 0
Overall 0
Potential 0
Club 241
Club Logo 0
Value 0
Wage 0
Special 0
Preferred Foot 0
International Reputation 0
Weak Foot 0
Skill Moves 0
Work Rate 0
Body Type 0
Real Face 0
Position 12
Jersey Number 12
Joined 1505
Height 0
Weight 0
LS 2037
ST 2037
RS 2037
LW 2037
LF 2037
CF 2037
RF 2037
RW 2037
LAM 2037
CAM 2037
RAM 2037
LM 2037
LCM 2037
CM 2037
RCM 2037
RM 2037
LWB 2037
LDM 2037
CDM 2037
RDM 2037
RWB 2037
LB 2037
LCB 2037
CB 2037
RCB 2037
RB 2037
Crossing 0
Finishing 0
HeadingAccuracy 0
ShortPassing 0
Volleys 0
Dribbling 0
Curve 0
FKAccuracy 0
LongPassing 0
BallControl 0
Acceleration 0
SprintSpeed 0
Agility 0
Reactions 0
Balance 0
ShotPower 0
Jumping 0
Stamina 0
Strength 0
LongShots 0
Aggression 0
Interceptions 0
Positioning 0
Vision 0
Penalties 0
Composure 0
Marking 0
StandingTackle 0
SlidingTackle 0
GKDiving 0
GKHandling 0
GKKicking 0
GKPositioning 0
GKReflexes 0
dtype: int64
In [14]: x=data.isnull().sum()
y=(x/data.shape[0])*100
RS 2037 11.217578
LW 2037 11.217578
LF 2037 11.217578
CF 2037 11.217578
RF 2037 11.217578
RW 2037 11.217578
LM 2037 11.217578
CM 2037 11.217578
LS 2037 11.217578
RM 2037 11.217578
LB 2037 11.217578
CB 2037 11.217578
RB 2037 11.217578
ST 2037 11.217578
Position 12 0.066083
ShotPower 0 0.000000
Aggression 0 0.000000
LongShots 0 0.000000
Strength 0 0.000000
Stamina 0 0.000000
Jumping 0 0.000000
Weight 0 0.000000
Balance 0 0.000000
Reactions 0 0.000000
Interceptions 0 0.000000
SprintSpeed 0 0.000000
Acceleration 0 0.000000
BallControl 0 0.000000
Agility 0 0.000000
Vision 0 0.000000
Positioning 0 0.000000
FKAccuracy 0 0.000000
Penalties 0 0.000000
Composure 0 0.000000
Marking 0 0.000000
StandingTackle 0 0.000000
SlidingTackle 0 0.000000
GKDiving 0 0.000000
GKHandling 0 0.000000
GKKicking 0 0.000000
GKPositioning 0 0.000000
GKReflexes 0 0.000000
LongPassing 0 0.000000
Volleys 0 0.000000
Curve 0 0.000000
Special 0 0.000000
Wage 0 0.000000
Value 0 0.000000
ID 0 0.000000
Potential 0 0.000000
Dribbling 0 0.000000
Overall 0 0.000000
Flag 0 0.000000
Nationality 0 0.000000
Photo 0 0.000000
Age 0 0.000000
Name 0 0.000000
Crossing 0 0.000000
Finishing 0 0.000000
HeadingAccuracy 0 0.000000
ShortPassing 0 0.000000
Height 0 0.000000
Unnamed: 0 0 0.000000
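The listing above pairs each column's missing count with its percentage of all rows. A minimal sketch of how such a summary can be assembled from the `x` and `y` style of calculation in In [14]. The `toy` frame and the `Total` / `Percent` labels are assumptions for illustration:

```python
import pandas as pd

# Hypothetical frame with a few gaps
toy = pd.DataFrame({
    "Club": ["A", None, "C", "D"],
    "Position": ["ST", "GK", None, None],
    "Age": [20, 21, 22, 23],
})

missing_count = toy.isnull().sum()                # missing values per column
missing_pct = missing_count / toy.shape[0] * 100  # share of all rows, in percent

# Put both Series side by side and sort by the number of missing values
summary = (
    pd.concat([missing_count, missing_pct], axis=1, keys=["Total", "Percent"])
    .sort_values("Total", ascending=False)
)
print(summary)
```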
In [16]: data.columns
'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM',
'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',
'Release Clause'],
dtype='object')
In [17]: data.dtypes[data.isnull().any()]
Position object
Joined object
LS object
ST object
RS object
LW object
LF object
CF object
RF object
RW object
LAM object
CAM object
RAM object
LM object
LCM object
CM object
RCM object
RM object
LWB object
LDM object
CDM object
RDM object
RWB object
LB object
LCB object
CB object
RCB object
RB object
dtype: object
In [18]: # Players with a missing jersey number do not have one, so imputing with the
# mean, median or mode would be misleading. Impute the missing values as 'NA' instead.
data['Jersey Number'] = data['Jersey Number'].fillna('NA')
data['Position']=data['Position'].fillna(data['Position'].mode()[0])
data['Joined']=data['Joined'].fillna(data['Joined'].mode()[0])
In [20]: # Fill the missing position ratings with 0 in a single assignment
# (assigning the result back avoids calling fillna(inplace=True) on a column selection)
position_cols = ['RB', 'RCB', 'CB', 'LCB', 'LB', 'RWB', 'RDM', 'CDM', 'LDM',
                 'LWB', 'RM', 'RCM', 'CM', 'LCM', 'LM', 'RAM', 'CAM', 'LAM',
                 'RW', 'RF', 'CF', 'LF', 'LW', 'RS', 'ST', 'LS']
data[position_cols] = data[position_cols].fillna(0)
In [21]: data.isnull().sum().sum()
Out[21]: 0
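The imputation steps above follow two patterns: a categorical column is filled with its most frequent value, and a group of rating columns is filled with a constant. A minimal sketch of both on made-up data (the `toy` frame is an assumption, and its ratings are plain numbers rather than the object-typed values in the real columns):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in a categorical and two rating columns
toy = pd.DataFrame({
    "Position": ["ST", "ST", None, "GK"],
    "LS": [88.0, np.nan, 70.0, np.nan],
    "ST": [88.0, np.nan, 70.0, np.nan],
})

# Categorical column: fill with the mode (most frequent value)
toy["Position"] = toy["Position"].fillna(toy["Position"].mode()[0])

# Rating columns: fill every listed column with 0 in one assignment
rating_cols = ["LS", "ST"]
toy[rating_cols] = toy[rating_cols].fillna(0)

# Total number of missing values left in the frame
total_missing = int(toy.isnull().sum().sum())
print(total_missing)
```

Assigning the result of `fillna` back to the column, rather than using `inplace=True` on a column selection, keeps the update explicit and works the same way across pandas versions.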
In [22]: data.head()
Out[22]:
   Unnamed: 0      ID               Name  Age                                           Photo  Nationality  ...
1           1   20801  Cristiano Ronaldo   33   https://cdn.sofifa.org/players/4/19/20801.png     Portugal  ...
2           2  190871          Neymar Jr   26  https://cdn.sofifa.org/players/4/19/190871.png       Brazil  ...
4           4  192985       K. De Bruyne   27  https://cdn.sofifa.org/players/4/19/192985.png      Belgium  ...
5 rows × 88 columns
In [23]: data.isnull().sum()
Out[23]: Unnamed: 0 0
ID 0
Name 0
Age 0
Photo 0
Nationality 0
Flag 0
Overall 0
Potential 0
Club 0
Club Logo 0
Value 0
Wage 0
Special 0
Preferred Foot 0
International Reputation 0
Weak Foot 0
Skill Moves 0
Work Rate 0
Body Type 0
Real Face 0
Position 0
Jersey Number 0
Joined 0
Height 0
Weight 0
LS 0
ST 0
RS 0
LW 0
LF 0
CF 0
RF 0
RW 0
LAM 0
CAM 0
RAM 0
LM 0
LCM 0
CM 0
RCM 0
RM 0
LWB 0
LDM 0
CDM 0
RDM 0
RWB 0
LB 0
LCB 0
CB 0
RCB 0
RB 0
Crossing 0
Finishing 0
HeadingAccuracy 0
ShortPassing 0
Volleys 0
Dribbling 0
Curve 0
FKAccuracy 0
LongPassing 0
BallControl 0
Acceleration 0
SprintSpeed 0
Agility 0
Reactions 0
Balance 0
ShotPower 0
Jumping 0
Stamina 0
Strength 0
LongShots 0
Aggression 0
Interceptions 0
Positioning 0
Vision 0
Penalties 0
Composure 0
Marking 0
StandingTackle 0
SlidingTackle 0
GKDiving 0
GKHandling 0
GKKicking 0
GKPositioning 0
GKReflexes 0
Release Clause 0
dtype: int64
def defending(data):
'SlidingTackle']].mean()).mean()))
def general(data):
'BallControl']].mean()).mean()))
def mental(data):
'Vision','Composure']].mean()).mean()))
def passing(data):
'LongPassing']].mean()).mean()))
def mobility(data):
'Agility','Reactions']].mean()).mean()))
def power(data):
'Strength']].mean()).mean()))
def rating(data):
def shooting(data):
'ShotPower','LongShots', 'Penalties']].mean()).mean()
# let's check the column names in the data after adding new features
data.columns
'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM',
'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',
dtype='object')
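The feature functions above each average a group of related attribute columns into one rounded score per player. A minimal vectorised sketch of the same idea, using only column names that are visible in this notebook. The `group_score` helper, the exact column groupings and the `toy` values are assumptions for illustration, not the notebook's own definitions:

```python
import pandas as pd

def group_score(frame, columns):
    """Rounded mean of the given attribute columns, one value per player."""
    return frame[columns].mean(axis=1).round().astype(int)

# Assumed groupings, limited to columns that appear in the notebook
SHOOTING_COLS = ["ShotPower", "LongShots", "Penalties"]
MOBILITY_COLS = ["Agility", "Reactions"]

# Hypothetical players
toy = pd.DataFrame({
    "Name": ["p1", "p2"],
    "ShotPower": [80, 60],
    "LongShots": [70, 50],
    "Penalties": [90, 40],
    "Agility": [85, 66],
    "Reactions": [75, 70],
})

toy["Shooting"] = group_score(toy, SHOOTING_COLS)
toy["Mobility"] = group_score(toy, MOBILITY_COLS)
print(toy[["Name", "Shooting", "Mobility"]])
```

Working on whole columns with `mean(axis=1)` avoids a per-row `apply` call, which tends to matter on a frame with roughly eighteen thousand rows.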
Data Visualization
In [26]: # let's check the distribution of scores of different skills
mp.subplot(2, 4, 1)
mp.grid()
mp.subplot(2, 4, 2)
mp.grid()
mp.subplot(2, 4, 3)
mp.grid()
mp.subplot(2, 4, 4)
mp.grid()
mp.subplot(2, 4, 5)
mp.grid()
mp.subplot(2, 4, 6)
mp.grid()
mp.subplot(2, 4, 7)
mp.grid()
mp.subplot(2, 4, 8)
mp.grid()
mp.show()
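The cell above lays out eight panels on a 2 x 4 grid, one per skill score. A minimal sketch of that layout with a histogram in each panel. The capitalised skill column names (taken from the eight feature function names), the random `toy` scores and the bin count are all assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as mp
import numpy as np
import pandas as pd

# Assumed column names, one per feature function defined earlier
skills = ["Defending", "General", "Mental", "Passing",
          "Mobility", "Power", "Rating", "Shooting"]

# Hypothetical scores; the real values come from the engineered features
rng = np.random.default_rng(0)
toy = pd.DataFrame({name: rng.integers(30, 95, size=200) for name in skills})

# One histogram per skill on a 2 x 4 grid
fig, axes = mp.subplots(2, 4, figsize=(16, 6))
for ax, name in zip(axes.ravel(), skills):
    ax.hist(toy[name], bins=20)
    ax.set_title(name)
    ax.grid(True)
fig.tight_layout()
```

In a notebook, dropping the `matplotlib.use("Agg")` line and ending the cell with `mp.show()` displays the figure inline.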
mp.rcParams['figure.figsize'] = (8, 3)
mp.show()
'Overall']].sort_values(by = 'Overall',
ascending = False).style.background_gradient
0 L. Messi Argentina 94
2 Neymar Jr Brazil 92
7 L. Suárez Uruguay 91
22 M. Neuer Germany 89
mp.show()
Skewness of Age
Run the cell below to visualise the skewness of the players' ages.
Age = figure(
Age.quad(
bottom = 0,
top = hist,
left = edges[:-1],
right = edges[1:],
line_color = 'white'
show(Age)
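The `hist` and `edges` values drawn by `quad` above are the kind of pair that `np.histogram` returns, and the skewness itself can be read off numerically with `Series.skew()`. A minimal sketch on made-up ages (the `ages` values and the bin count are assumptions, not the dataset's ages):

```python
import numpy as np
import pandas as pd

# Hypothetical ages with a long right tail, like a few veteran players
ages = pd.Series([16, 17, 18, 19, 20, 21, 22, 25, 30, 45], name="Age")

# Counts per bin and the bin boundaries (one more edge than bins)
hist, edges = np.histogram(ages, bins=5)

# Positive skewness means the tail extends towards older ages
skewness = ages.skew()

print(hist, edges)
print(skewness)
```

Each bar then spans `edges[:-1]` to `edges[1:]` horizontally and 0 to `hist` vertically, which matches the `left`, `right`, `bottom` and `top` arguments in the cell above.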
Out[32]:
       Unnamed: 0      ID             Name  Age                                           Photo       Nationality  ...
13567       13567  246419   Y. Verschaeren   16  https://cdn.sofifa.org/players/4/19/246419.png           Belgiu…  ...
17115       17115  245341       C. Bassett   16  https://cdn.sofifa.org/players/4/19/245341.png      Unite… Stat…  ...
17263       17263  245485  P. Samiec-Talar   16  https://cdn.sofifa.org/players/4/19/245485.png            Polan…  ...
17753       17753  245539         B. Waine   16  https://cdn.sofifa.org/players/4/19/245539.png       Ne… Zealan…  ...
17757       17757  246315         L. Smyth   16  https://cdn.sofifa.org/players/4/19/246315.png   Northe… Irelan…  ...
17776       17776  244553   M. Köstenbauer   16  https://cdn.sofifa.org/players/4/19/244553.png            Austr…  ...
17841       17841  244925   F. Tauchhammer   16  https://cdn.sofifa.org/players/4/19/244925.png            Austr…  ...
18018       18018  243722      B. O'Gorman   16  https://cdn.sofifa.org/players/4/19/243722.png  Republic… Irelan…  ...
18106       18106  243685        J. Cleary   16  https://cdn.sofifa.org/players/4/19/243685.png  Republic… Irelan…  ...
18124       18124  245808     G. Hollywood   16  https://cdn.sofifa.org/players/4/19/245808.png  Republic… Irelan…  ...
42 rows × 96 columns
Out[33]:
      Unnamed: 0      ID      Name  Age                                           Photo  Nationality  ...
4741        4741  140029  O. Pérez   45  https://cdn.sofifa.org/players/4/19/140029.png       Mexico  ...
1 rows × 96 columns
0 L. Messi 31 94
1 Cristiano Ronaldo 33 94
Seaborn - Heatmap
Relationship between columns (correlation coefficient)
 1 --> Direct proportion
 0 --> No relationship
-1 --> Inverse proportion
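The values a correlation heatmap colours can be checked on a small example first. A minimal sketch using pandas' Pearson correlation on made-up columns built to land on the extremes (the `toy` frame and its column names are assumptions):

```python
import pandas as pd

# Hypothetical columns chosen to produce clear correlation values
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],     # rises in step with x
    "minus_x": [-1, -2, -3, -4, -5],  # falls as x rises
    "unrelated": [1, 2, 3, 2, 1],     # symmetric around the middle of x
})

# Pairwise Pearson correlation between all numeric columns
corr = toy.corr()
print(corr.round(2))
```

The resulting `corr` matrix is the kind of input seaborn's `heatmap` accepts, for example `sns.heatmap(corr, annot=True)`.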
In [36]: some_clubs = ('CD Leganés', 'Southampton', 'RC Celta', 'Empoli', 'Fortuna Düsseldorf
'Tottenham Hotspur', 'FC Barcelona', 'Valencia CF', 'Chelsea', 'Real Ma
mp.rcParams['figure.figsize'] = (15, 8)
mp.grid()
mp.show()
mp.rcParams['figure.figsize'] = (16, 8)
mp.grid()
mp.show()