EDA On FIFA Dataset: Importing Essential Libraries


9/7/2021 FIFA_DATASET(GRP - 3)

EDA on FIFA Dataset


Importing Essential Libraries
Problem Statement: Perform a complete exploratory data analysis (EDA) of the FIFA 19 dataset and report accurate insights.

In [1]: # importing the required libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as mp
from bokeh.plotting import figure
from bokeh.io import show

Reading Data
In [2]: data = pd.read_csv("./Datasets/data-1.csv")

In [3]: #printing all the columns of the dataset

data.columns

Out[3]: Index(['Unnamed: 0', 'ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag',

'Overall', 'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',

'Preferred Foot', 'International Reputation', 'Weak Foot',

'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',

'Jersey Number', 'Joined', 'Loaned From', 'Contract Valid Until',

'Height', 'Weight', 'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW',

'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM',

'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB', 'Crossing',

'Finishing', 'HeadingAccuracy', 'ShortPassing', 'Volleys', 'Dribbling',

'Curve', 'FKAccuracy', 'LongPassing', 'BallControl', 'Acceleration',

'SprintSpeed', 'Agility', 'Reactions', 'Balance', 'ShotPower',

'Jumping', 'Stamina', 'Strength', 'LongShots', 'Aggression',

'Interceptions', 'Positioning', 'Vision', 'Penalties', 'Composure',

'Marking', 'StandingTackle', 'SlidingTackle', 'GKDiving', 'GKHandling',

'GKKicking', 'GKPositioning', 'GKReflexes', 'Release Clause'],

dtype='object')

In [4]: data.head()

Out[4]:
   Unnamed: 0  ID      Name               Age  Photo                                           Nationality
0  0           158023  L. Messi           31   https://cdn.sofifa.org/players/4/19/158023.png  Argentina
1  1           20801   Cristiano Ronaldo  33   https://cdn.sofifa.org/players/4/19/20801.png   Portugal
2  2           190871  Neymar Jr          26   https://cdn.sofifa.org/players/4/19/190871.png  Brazil
3  3           193080  De Gea             27   https://cdn.sofifa.org/players/4/19/193080.png  Spain
4  4           192985  K. De Bruyne       27   https://cdn.sofifa.org/players/4/19/192985.png  Belgium

5 rows × 89 columns
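The `Unnamed: 0` column repeats the row labels shown in the output above, which suggests (this is an assumption, not something the notebook states) that it is an index that was written into the CSV when the file was saved. If that is the case, passing `index_col=0` to `pd.read_csv` would read it back as the index instead of as a data column. A minimal sketch on an in-memory CSV, so it runs without the data file:

```python
import io

import pandas as pd

# Two rows copied from the output above; the first header cell is empty,
# which is how a saved index appears in a CSV file.
csv_text = ",ID,Name,Age\n0,158023,L. Messi,31\n1,20801,Cristiano Ronaldo,33\n"

# Default read: the unnamed first column becomes a regular column
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the row index instead
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())
print(indexed_read.columns.tolist())
```

The rest of the notebook keeps the `Unnamed: 0` column, so this is only an option to consider when reloading the file.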

localhost:8888/nbconvert/html/FIFA_DATASET(GRP - 3).ipynb?download=false 1/21



Data Cleaning
In [5]: #checking if there are any missing values in rows

data.isnull().any(axis=1)

Out[5]: 0 True

1 True

2 True

3 True

4 True

...

18202 True

18203 True

18204 True

18205 True

18206 True

Length: 18207, dtype: bool

In [6]: #checking if there is any row having all the values missing

data.isnull().all(axis=1).sum()

Out[6]: 0

In [7]: data.isnull().sum()

Out[7]: Unnamed: 0 0

ID 0

Name 0

Age 0

Photo 0

...

GKHandling 48

GKKicking 48

GKPositioning 48

GKReflexes 48

Release Clause 1564

Length: 89, dtype: int64

In [8]: #sorting the missing values in rows in descending order

data.isnull().sum(axis=1).sort_values(ascending=False)

Out[8]: 13244 75

13267 75

13240 75

13265 75

13264 75

..

11377 1

11376 1

11375 1

11374 1

0 1

Length: 18207, dtype: int64
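Cells 5 to 8 all start from the same Boolean mask, `data.isnull()`, and differ only in how that mask is reduced. The sketch below shows the four reductions side by side on a made-up three-row frame (the column values are illustrative, not taken from the FIFA data):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): one complete row, one partial row, one empty row
toy = pd.DataFrame({
    "Name": ["A", "B", np.nan],
    "Age": [31, np.nan, np.nan],
    "Club": ["X", np.nan, np.nan],
})

mask = toy.isnull()                        # True where a cell is missing

rows_with_any_missing = mask.any(axis=1)   # at least one missing cell in the row
rows_all_missing = mask.all(axis=1)        # every cell in the row is missing
missing_per_row = mask.sum(axis=1)         # number of missing cells per row
missing_per_column = mask.sum()            # number of missing cells per column

print(rows_with_any_missing.tolist())
print(int(rows_all_missing.sum()))
print(missing_per_row.tolist())
print(missing_per_column.to_dict())
```

On the FIFA frame every row has at least one missing cell, which is why `any(axis=1)` is `True` everywhere, while no row is completely empty.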

In [9]: #checking for the rows which have missing values greater than 50

data[data.isnull().sum(axis=1)>50]

Out[9]:
       Unnamed: 0  ID      Name            Age  Photo                                           Nationality
13236  13236       177971  J. McNulty      33   https://cdn.sofifa.org/players/4/19/177971.png  Scotland
13237  13237       195380  J. Barrera      29   https://cdn.sofifa.org/players/4/19/195380.png  Nicaragua
13238  13238       139317  J. Stead        35   https://cdn.sofifa.org/players/4/19/139317.png  England
13239  13239       240437  A. Semprini     20   https://cdn.sofifa.org/players/4/19/240437.png  Italy
13240  13240       209462  R. Bingham      24   https://cdn.sofifa.org/players/4/19/209462.png  England
13241  13241       219702  K. Dankowski    21   https://cdn.sofifa.org/players/4/19/219702.png  Poland
13242  13242       225590  I. Colman       23   https://cdn.sofifa.org/players/4/19/225590.png  Argentina
13243  13243       233782  M. Feeney       19   https://cdn.sofifa.org/players/4/19/233782.png  England
13244  13244       239158  R. Minor        30   https://cdn.sofifa.org/players/4/19/239158.png  Denmark
13245  13245       242998  Klauss          21   https://cdn.sofifa.org/players/4/19/242998.png  Brazil
13246  13246       244022  I. Sissoko      22   https://cdn.sofifa.org/players/4/19/244022.png  France
13247  13247       189238  F. Hart         28   https://cdn.sofifa.org/players/4/19/189238.png  Austria
13248  13248       211511  L. McCullough   24   https://cdn.sofifa.org/players/4/19/211511.png  Northern Ireland
13249  13249       224055  Li Yunqiu       27   https://cdn.sofifa.org/players/4/19/224055.png  China PR
13250  13250       244535  F. Garcia       29   https://cdn.sofifa.org/players/4/19/244535.png  Paraguay
13251  13251       134968  R. Haemhouts    34   https://cdn.sofifa.org/players/4/19/134968.png  Belgium
13252  13252       225336  E. Binaku       22   https://cdn.sofifa.org/players/4/19/225336.png  Albania
13253  13253       171320  G. Miller       31   https://cdn.sofifa.org/players/4/19/171320.png  Scotland
13254  13254       246328  A. Aidonis      17   https://cdn.sofifa.org/players/4/19/246328.png  Germany
13255  13255       196921  L. Sowah        25   https://cdn.sofifa.org/players/4/19/196921.png  Germany
13256  13256       202809  R. Deacon       26   https://cdn.sofifa.org/players/4/19/202809.png  England
13257  13257       226617  Jang Hyun Soo   25   https://cdn.sofifa.org/players/4/19/226617.png  Korea Republic
13258  13258       230713  A. Al Malki     23   https://cdn.sofifa.org/players/4/19/230713.png  Saudi Arabia
13259  13259       234809  E. Guerrero     27   https://cdn.sofifa.org/players/4/19/234809.png  Chile
13260  13260       246073  Hernáiz         20   https://cdn.sofifa.org/players/4/19/246073.png  Spain
13261  13261       221498  H. Al Mansour   25   https://cdn.sofifa.org/players/4/19/221498.png  Saudi Arabia
13262  13262       244026  H. Paul         24   https://cdn.sofifa.org/players/4/19/244026.png  Germany
13263  13263       244538  S. Bauer        25   https://cdn.sofifa.org/players/4/19/244538.png  Austria
13264  13264       201019  M. Chergui      29   https://cdn.sofifa.org/players/4/19/201019.png  France
13265  13265       221499  D. Gardner      28   https://cdn.sofifa.org/players/4/19/221499.png  England
13266  13266       237371  L. Bengtsson    20   https://cdn.sofifa.org/players/4/19/237371.png  Sweden
13267  13267       242491  F. Jaramillo    22   https://cdn.sofifa.org/players/4/19/242491.png  Colombia
13268  13268       153148  L. Garguła      37   https://cdn.sofifa.org/players/4/19/153148.png  Poland
13269  13269       244540  S. Rivera       26   https://cdn.sofifa.org/players/4/19/244540.png  Colombia
13270  13270       245564  Vinicius        19   https://cdn.sofifa.org/players/4/19/245564.png  Brazil
13271  13271       213821  F. Sepúlveda    26   https://cdn.sofifa.org/players/4/19/213821.png  Chile
13272  13272       240701  L. Spence       22   https://cdn.sofifa.org/players/4/19/240701.png  Scotland
13273  13273       242237  B. Lepistu      25   https://cdn.sofifa.org/players/4/19/242237.png  Estonia
13274  13274       244029  A. Abruscia     27   https://cdn.sofifa.org/players/4/19/244029.png  Italy
13275  13275       244541  E. González     23   https://cdn.sofifa.org/players/4/19/244541.png  Venezuela
13276  13276       211006  M. Al Amri      26   https://cdn.sofifa.org/players/4/19/211006.png  Saudi Arabia
13277  13277       215102  J. Rebolledo    26   https://cdn.sofifa.org/players/4/19/215102.png  Chile
13278  13278       246078  C. Mamengi      17   https://cdn.sofifa.org/players/4/19/246078.png  Netherlands
13279  13279       239679  P. Mazzocchi    22   https://cdn.sofifa.org/players/4/19/239679.png  Italy
13280  13280       244543  Y. Ammour       19   https://cdn.sofifa.org/players/4/19/244543.png  France
13281  13281       212800  Jwa Joon Hyeop  27   https://cdn.sofifa.org/players/4/19/212800.png  Korea Republic
13282  13282       231232  O. Marrufo      25   https://cdn.sofifa.org/players/4/19/231232.png  Mexico
13283  13283       232256  Han Pengfei     25   https://cdn.sofifa.org/players/4/19/232256.png  China PR

48 rows × 89 columns

In [10]: data.shape

Out[10]: (18207, 89)

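In [11]: # keep only rows with at most 50 missing values; .copy() makes the result an independent frame
data = data[data.isnull().sum(axis=1) <= 50].copy()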

In [12]: data.shape

Out[12]: (18159, 89)
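Cell 11 keeps the rows that have at most 50 missing values, which removes the 48 rows listed in Out[9] (18207 - 48 = 18159). The same filter can also be expressed with `DataFrame.dropna(thresh=...)`, which keeps rows that have at least `thresh` non-missing cells. The sketch below compares both forms on a made-up frame, using a cutoff of 1 instead of 50:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): the rows have 0, 1 and 3 missing cells
toy = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [1.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan],
})

max_missing = 1  # the notebook uses 50 on the 89-column FIFA frame

# Boolean-mask form, as in the notebook
kept_by_mask = toy[toy.isnull().sum(axis=1) <= max_missing].copy()

# dropna form: "at most max_missing missing" is the same as
# "at least n_columns - max_missing present"
kept_by_thresh = toy.dropna(thresh=toy.shape[1] - max_missing)

print(kept_by_mask.index.tolist())
print(kept_by_mask.equals(kept_by_thresh))
```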

In [13]: #checking for the missing values in columns

pd.set_option("display.max_rows", 89)

data.isnull().sum()

Out[13]: Unnamed: 0 0

ID 0

Name 0

Age 0

Photo 0

Nationality 0

Flag 0

Overall 0

Potential 0

Club 241

Club Logo 0

Value 0

Wage 0

Special 0

Preferred Foot 0

International Reputation 0

Weak Foot 0

Skill Moves 0

Work Rate 0

Body Type 0

Real Face 0

Position 12

Jersey Number 12

Joined 1505

Loaned From 16895

Contract Valid Until 241

Height 0

Weight 0

LS 2037

ST 2037

RS 2037

LW 2037

LF 2037

CF 2037

RF 2037

RW 2037


LAM 2037

CAM 2037

RAM 2037

LM 2037

LCM 2037

CM 2037

RCM 2037

RM 2037

LWB 2037

LDM 2037

CDM 2037

RDM 2037

RWB 2037

LB 2037

LCB 2037

CB 2037

RCB 2037

RB 2037

Crossing 0

Finishing 0

HeadingAccuracy 0

ShortPassing 0

Volleys 0

Dribbling 0

Curve 0

FKAccuracy 0

LongPassing 0

BallControl 0

Acceleration 0

SprintSpeed 0

Agility 0

Reactions 0

Balance 0

ShotPower 0

Jumping 0

Stamina 0

Strength 0

LongShots 0

Aggression 0

Interceptions 0

Positioning 0

Vision 0

Penalties 0

Composure 0

Marking 0

StandingTackle 0

SlidingTackle 0

GKDiving 0

GKHandling 0

GKKicking 0

GKPositioning 0

GKReflexes 0

Release Clause 1516

dtype: int64

In [14]: # number and percentage of missing values per column
x = data.isnull().sum()
y = (x / data.shape[0]) * 100
z = {'Number of missing values': x, 'Percentage of missing values': y}
df = pd.DataFrame(z, columns=['Number of missing values', 'Percentage of missing values'])
df.sort_values(by='Percentage of missing values', ascending=False)

Out[14]: Number of missing values Percentage of missing values

Loaned From 16895 93.039264

LWB 2037 11.217578

LCM 2037 11.217578


RS 2037 11.217578

LW 2037 11.217578

LF 2037 11.217578

CF 2037 11.217578

RF 2037 11.217578

RW 2037 11.217578

LAM 2037 11.217578

CAM 2037 11.217578

RAM 2037 11.217578

LM 2037 11.217578

CM 2037 11.217578

LS 2037 11.217578

RCM 2037 11.217578

RM 2037 11.217578

LDM 2037 11.217578

CDM 2037 11.217578

RDM 2037 11.217578

RWB 2037 11.217578

LB 2037 11.217578

LCB 2037 11.217578

CB 2037 11.217578

RCB 2037 11.217578

RB 2037 11.217578

ST 2037 11.217578

Release Clause 1516 8.348477

Joined 1505 8.287901

Club 241 1.327166

Contract Valid Until 241 1.327166

Jersey Number 12 0.066083

Position 12 0.066083

ShotPower 0 0.000000

Aggression 0 0.000000

LongShots 0 0.000000

Strength 0 0.000000

Stamina 0 0.000000

Jumping 0 0.000000


Weight 0 0.000000

Balance 0 0.000000

Reactions 0 0.000000

Interceptions 0 0.000000

SprintSpeed 0 0.000000

Acceleration 0 0.000000

BallControl 0 0.000000

Agility 0 0.000000

Vision 0 0.000000

Positioning 0 0.000000

FKAccuracy 0 0.000000

Penalties 0 0.000000

Composure 0 0.000000

Marking 0 0.000000

StandingTackle 0 0.000000

SlidingTackle 0 0.000000

GKDiving 0 0.000000

GKHandling 0 0.000000

GKKicking 0 0.000000

GKPositioning 0 0.000000

GKReflexes 0 0.000000

LongPassing 0 0.000000

Volleys 0 0.000000

Curve 0 0.000000

Club Logo 0 0.000000

Real Face 0 0.000000

Body Type 0 0.000000

Work Rate 0 0.000000

Skill Moves 0 0.000000

Weak Foot 0 0.000000

International Reputation 0 0.000000

Preferred Foot 0 0.000000

Special 0 0.000000

Wage 0 0.000000

Value 0 0.000000

ID 0 0.000000


Potential 0 0.000000

Dribbling 0 0.000000

Overall 0 0.000000

Flag 0 0.000000

Nationality 0 0.000000

Photo 0 0.000000

Age 0 0.000000

Name 0 0.000000

Crossing 0 0.000000

Finishing 0 0.000000

HeadingAccuracy 0 0.000000

ShortPassing 0 0.000000

Height 0 0.000000

Unnamed: 0 0 0.000000
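The percentage column in the table above is the missing count divided by the number of rows. Because the mean of a Boolean column is the fraction of `True` values, the same table can be built from `isnull().mean()`. A minimal sketch on a made-up four-row frame (values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values)
toy = pd.DataFrame({
    "Club": ["X", np.nan, "Y", "Z"],
    "Joined": [np.nan, np.nan, "2018", "2017"],
    "Age": [31, 33, 26, 27],
})

summary = pd.DataFrame({
    "Number of missing values": toy.isnull().sum(),
    # mean of a Boolean mask = share of missing cells in the column
    "Percentage of missing values": toy.isnull().mean() * 100,
}).sort_values(by="Percentage of missing values", ascending=False)

print(summary)
```

Applied to the cleaned FIFA frame, this gives, for example, 16895 / 18159 ≈ 93.04 % for `Loaned From`, matching the first row of the table.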

In [15]: data=data.drop(['Loaned From'],axis=1)

In [16]: data.columns

Out[16]: Index(['Unnamed: 0', 'ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag',

'Overall', 'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',

'Preferred Foot', 'International Reputation', 'Weak Foot',

'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',

'Jersey Number', 'Joined', 'Contract Valid Until', 'Height', 'Weight',

'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM',

'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',

'LCB', 'CB', 'RCB', 'RB', 'Crossing', 'Finishing', 'HeadingAccuracy',

'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',

'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',

'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',

'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision',

'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle',

'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes',

'Release Clause'],

dtype='object')

In [17]: data.dtypes[data.isnull().any()]

Out[17]: Club object

Position object

Jersey Number float64

Joined object

Contract Valid Until object

LS object

ST object

RS object

LW object

LF object

CF object

RF object

RW object

LAM object

CAM object


RAM object

LM object

LCM object

CM object

RCM object

RM object

LWB object

LDM object

CDM object

RDM object

RWB object

LB object

LCB object

CB object

RCB object

RB object

Release Clause object

dtype: object

In [18]: # A missing jersey number means the player does not have a jersey number, so it makes
# no sense to impute it with the mean, median or mode. Mark these entries as 'NA' instead.
data['Jersey Number'] = data['Jersey Number'].fillna('NA')

Imputing Missing Value Data


In [19]: # fill the remaining gaps in these columns with the most frequent value (mode) of each column
data['Club'] = data['Club'].fillna(data['Club'].mode()[0])
data['Position'] = data['Position'].fillna(data['Position'].mode()[0])
data['Joined'] = data['Joined'].fillna(data['Joined'].mode()[0])
data['Contract Valid Until'] = data['Contract Valid Until'].fillna(data['Contract Valid Until'].mode()[0])
data['Release Clause'] = data['Release Clause'].fillna(data['Release Clause'].mode()[0])
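Mode imputation replaces each missing entry with the most frequent value of its column. `Series.mode()` returns a Series because several values can tie for most frequent, so `[0]` selects the first of them. The sketch below shows the pattern on a made-up frame (club and position values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one gap in each column
toy = pd.DataFrame({
    "Club": ["FC A", "FC B", "FC A", np.nan],
    "Position": ["ST", np.nan, "GK", "ST"],
})

for col in ["Club", "Position"]:
    most_frequent = toy[col].mode()[0]         # mode() ignores missing values by default
    toy[col] = toy[col].fillna(most_frequent)  # assign back instead of using inplace=True

print(toy)
```

Note that this makes the most common value even more common, so counts per club or position computed afterwards are slightly inflated for that value.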

In [20]: # fill the missing position ratings with 0
position_cols = ['LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW',
                 'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM',
                 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB']
data[position_cols] = data[position_cols].fillna(0)

In [21]: data.isnull().sum().sum()

Out[21]: 0

In [22]: data.head()


Out[22]:
   Unnamed: 0  ID      Name               Age  Photo                                           Nationality
0  0           158023  L. Messi           31   https://cdn.sofifa.org/players/4/19/158023.png  Argentina
1  1           20801   Cristiano Ronaldo  33   https://cdn.sofifa.org/players/4/19/20801.png   Portugal
2  2           190871  Neymar Jr          26   https://cdn.sofifa.org/players/4/19/190871.png  Brazil
3  3           193080  De Gea             27   https://cdn.sofifa.org/players/4/19/193080.png  Spain
4  4           192985  K. De Bruyne       27   https://cdn.sofifa.org/players/4/19/192985.png  Belgium

5 rows × 88 columns

In [23]: pd.set_option('display.max_rows', data.shape[0])

data.isnull().sum()

Out[23]: Unnamed: 0 0

ID 0

Name 0

Age 0

Photo 0

Nationality 0

Flag 0

Overall 0

Potential 0

Club 0

Club Logo 0

Value 0

Wage 0

Special 0

Preferred Foot 0

International Reputation 0

Weak Foot 0

Skill Moves 0

Work Rate 0

Body Type 0

Real Face 0

Position 0

Jersey Number 0

Joined 0

Contract Valid Until 0

Height 0

Weight 0

LS 0

ST 0

RS 0

LW 0

LF 0

CF 0

RF 0

RW 0

LAM 0

CAM 0

RAM 0

LM 0

LCM 0

CM 0

RCM 0


RM 0

LWB 0

LDM 0

CDM 0

RDM 0

RWB 0

LB 0

LCB 0

CB 0

RCB 0

RB 0

Crossing 0

Finishing 0

HeadingAccuracy 0

ShortPassing 0

Volleys 0

Dribbling 0

Curve 0

FKAccuracy 0

LongPassing 0

BallControl 0

Acceleration 0

SprintSpeed 0

Agility 0

Reactions 0

Balance 0

ShotPower 0

Jumping 0

Stamina 0

Strength 0

LongShots 0

Aggression 0

Interceptions 0

Positioning 0

Vision 0

Penalties 0

Composure 0

Marking 0

StandingTackle 0

SlidingTackle 0

GKDiving 0

GKHandling 0

GKKicking 0

GKPositioning 0

GKReflexes 0

Release Clause 0

dtype: int64

In [24]: # creating new features by averaging groups of related attributes
# each function receives one row of the frame and returns the rounded group mean

def defending(row):
    return int(round(row[['Marking', 'StandingTackle', 'SlidingTackle']].mean()))

def general(row):
    return int(round(row[['HeadingAccuracy', 'Dribbling', 'Curve', 'BallControl']].mean()))

def mental(row):
    return int(round(row[['Aggression', 'Interceptions', 'Positioning',
                          'Vision', 'Composure']].mean()))

def passing(row):
    return int(round(row[['Crossing', 'ShortPassing', 'LongPassing']].mean()))

def mobility(row):
    return int(round(row[['Acceleration', 'SprintSpeed', 'Agility', 'Reactions']].mean()))

def power(row):
    return int(round(row[['Balance', 'Jumping', 'Stamina', 'Strength']].mean()))

def rating(row):
    return int(round(row[['Potential', 'Overall']].mean()))

def shooting(row):
    return int(round(row[['Finishing', 'Volleys', 'FKAccuracy',
                          'ShotPower', 'LongShots', 'Penalties']].mean()))

In [25]: # adding these categories to the data

data['Defending'] = data.apply(defending, axis = 1)

data['General'] = data.apply(general, axis = 1)

data['Mental'] = data.apply(mental, axis = 1)

data['Passing'] = data.apply(passing, axis = 1)

data['Mobility'] = data.apply(mobility, axis = 1)

data['Power'] = data.apply(power, axis = 1)

data['Rating'] = data.apply(rating, axis = 1)

data['Shooting'] = data.apply(shooting, axis = 1)

# lets check the column names in the data after adding new features

data.columns

Out[25]: Index(['Unnamed: 0', 'ID', 'Name', 'Age', 'Photo', 'Nationality', 'Flag',

'Overall', 'Potential', 'Club', 'Club Logo', 'Value', 'Wage', 'Special',

'Preferred Foot', 'International Reputation', 'Weak Foot',

'Skill Moves', 'Work Rate', 'Body Type', 'Real Face', 'Position',

'Jersey Number', 'Joined', 'Contract Valid Until', 'Height', 'Weight',

'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM',

'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',

'LCB', 'CB', 'RCB', 'RB', 'Crossing', 'Finishing', 'HeadingAccuracy',

'ShortPassing', 'Volleys', 'Dribbling', 'Curve', 'FKAccuracy',

'LongPassing', 'BallControl', 'Acceleration', 'SprintSpeed', 'Agility',

'Reactions', 'Balance', 'ShotPower', 'Jumping', 'Stamina', 'Strength',

'LongShots', 'Aggression', 'Interceptions', 'Positioning', 'Vision',

'Penalties', 'Composure', 'Marking', 'StandingTackle', 'SlidingTackle',

'GKDiving', 'GKHandling', 'GKKicking', 'GKPositioning', 'GKReflexes',

'Release Clause', 'Defending', 'General', 'Mental', 'Passing',

'Mobility', 'Power', 'Rating', 'Shooting'],

dtype='object')
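`data.apply(..., axis=1)` calls a Python function once per row, which can be slow on roughly 18,000 rows. Assuming the attribute columns are numeric, the same aggregate can be computed for all rows at once with a column-wise mean. The sketch below compares both approaches for the `Defending` group on a made-up frame (the ratings are hypothetical, not real player values):

```python
import pandas as pd

# Toy frame (hypothetical ratings)
toy = pd.DataFrame({
    "Marking": [33, 28, 27],
    "StandingTackle": [28, 31, 24],
    "SlidingTackle": [26, 23, 33],
})

defending_cols = ["Marking", "StandingTackle", "SlidingTackle"]

# Row-wise version, equivalent to the notebook's defending() helper
by_apply = toy.apply(lambda row: int(round(row[defending_cols].mean())), axis=1)

# Vectorised version: mean across the columns, then round and cast to int
vectorised = toy[defending_cols].mean(axis=1).round().astype(int)

print(vectorised.tolist())
print(by_apply.tolist() == vectorised.tolist())
```

Both Python's `round` and the pandas `round` method round exact halves to the nearest even integer, so the two versions are expected to agree.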

Data Visualization
In [26]: # distribution of scores for the different skill groups
mp.rcParams['figure.figsize'] = (18, 12)
skills = ['Defending', 'General', 'Mental', 'Passing',
          'Mobility', 'Power', 'Shooting', 'Rating']
for i, skill in enumerate(skills, start=1):
    mp.subplot(2, 4, i)
    # alternate the bar colour between red and black, panel by panel
    sns.histplot(data[skill], color='red' if i % 2 else 'black')
    mp.grid()
mp.suptitle('Score Distributions for Different Abilities')
mp.show()

In [27]: # comparison of preferred foot over the different players

mp.rcParams['figure.figsize'] = (8, 3)

sns.countplot(x = data['Preferred Foot'], palette = 'pink')

mp.title('Most Preferred Foot of the Players', fontsize = 20)

mp.show()


In [28]: data[data['International Reputation'] == 5][['Name','Nationality',

'Overall']].sort_values(by = 'Overall',

ascending = False).style.background_gradient

Out[28]: Name Nationality Overall

0 L. Messi Argentina 94

1 Cristiano Ronaldo Portugal 94

2 Neymar Jr Brazil 92

7 L. Suárez Uruguay 91

22 M. Neuer Germany 89

109 Z. Ibrahimović Sweden 85

In [29]: # different positions acquired by the players

mp.figure(figsize = (13, 15))

ax = sns.countplot(x = 'Position', data = data, palette = 'mako')

ax.set_xlabel(xlabel = 'Different Positions in Football', fontsize = 16)

ax.set_ylabel(ylabel = 'Count of Players', fontsize = 16)

ax.set_title(label = 'Comparison of Positions and Players', fontsize = 20)

mp.show()


Skewness of Age
Run the cell below to visualise the distribution of the players' ages and print its skewness.

In [30]: hist, edges = np.histogram(data['Age'], density=True, bins=20)
Age = figure(
    x_axis_label='Age of Players',
    title='Distribution of Age of Players'
)
Age.quad(
    bottom=0,
    top=hist,
    left=edges[:-1],
    right=edges[1:],
    line_color='white'
)
show(Age)
print("Skewness of age is", data['Age'].skew())

Skewness of age is 0.39140133751189277
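A positive skewness of about 0.39 indicates a mildly right-skewed distribution, that is, a longer tail towards older players. `Series.skew()` returns the bias-corrected sample skewness; the sketch below re-implements that formula in plain Python to show what is being measured. The function name and the age lists are made up for illustration.

```python
import math


def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # uncorrected skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample correction


symmetric_ages = [20, 25, 30, 35, 40]             # mirror image around 30
right_skewed_ages = [18, 19, 20, 21, 22, 23, 40]  # one much older player

print(sample_skewness(symmetric_ages))
print(sample_skewness(right_skewed_ages) > 0)
```

A symmetric sample gives a skewness of zero, while a single much older player pulls the value above zero. The sketch does not handle the case where all values are equal (the second moment is then zero).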

In [31]: print("The age of the youngest player is", data['Age'].min())

print("The age of the oldest player is", data['Age'].max())

The age of the youngest player is 16

The age of the oldest player is 45

In [32]: data.loc[data['Age'] == data['Age'].min()]

Out[32]: Unnamed:
ID Name Age Photo Nationali
0

11457 11457 241266 W. Geubbels 16 https://cdn.sofifa.org/players/4/19/241266.png Franc

11732 11732 244403 A. Taoui 16 https://cdn.sofifa.org/players/4/19/244403.png Franc

12496 12496 245616 Pelayo Morilla 16 https://cdn.sofifa.org/players/4/19/245616.png Spa

12828 12828 246465 Guerrero 16 https://cdn.sofifa.org/players/4/19/246465.png Spa

13293 13293 246594 H. Massengo 16 https://cdn.sofifa.org/players/4/19/246594.png Franc

Y.
13567 13567 246419 16 https://cdn.sofifa.org/players/4/19/246419.png Belgiu
Verschaeren

15363 15363 245015 Y. Roemer 16 https://cdn.sofifa.org/players/4/19/245015.png Netherland

15746 15746 243169 Y. Begraoui 16 https://cdn.sofifa.org/players/4/19/243169.png Franc

15793 15793 241650 J. Lahne 16 https://cdn.sofifa.org/players/4/19/241650.png Swede

16081 16081 241552 J. Italiano 16 https://cdn.sofifa.org/players/4/19/241552.png Austra

16254 16254 244728 S. Steijn 16 https://cdn.sofifa.org/players/4/19/244728.png Netherland

16418 16418 242531 J. Kitolano 16 https://cdn.sofifa.org/players/4/19/242531.png Norwa

16544 16544 240562 D. Adshead 16 https://cdn.sofifa.org/players/4/19/240562.png Englan

16927 16927 243646 B. Nygren 16 https://cdn.sofifa.org/players/4/19/243646.png Swede

17091 17091 243269 A. Doğan 16 https://cdn.sofifa.org/players/4/19/243269.png Turke

Unite
17115 17115 245341 C. Bassett 16 https://cdn.sofifa.org/players/4/19/245341.png
Stat

17175 17175 243353 B. Mumba 16 https://cdn.sofifa.org/players/4/19/243353.png Englan

17177 17177 242074 R. Gómez 16 https://cdn.sofifa.org/players/4/19/242074.png Argentin

17200 17200 245171 H. Andersson 16 https://cdn.sofifa.org/players/4/19/245171.png Swede

P. Samiec-
17263 17263 245485 16 https://cdn.sofifa.org/players/4/19/245485.png Polan
Talar

17354 17354 242240 L. D'Arrigo 16 https://cdn.sofifa.org/players/4/19/242240.png Austra

17712 17712 245470 K. Broda 16 https://cdn.sofifa.org/players/4/19/245470.png Polan

17743 17743 244752 J. Olstad 16 https://cdn.sofifa.org/players/4/19/244752.png Norwa

17751 17751 245537 E. Ceide 16 https://cdn.sofifa.org/players/4/19/245537.png Norwa


Ne
17753 17753 245539 B. Waine 16 https://cdn.sofifa.org/players/4/19/245539.png
Zealan

Northe
17757 17757 246315 L. Smyth 16 https://cdn.sofifa.org/players/4/19/246315.png
Irelan

M.
17776 17776 244553 16 https://cdn.sofifa.org/players/4/19/244553.png Austr
Köstenbauer

17808 17808 245123 A. Mahlonoko 16 https://cdn.sofifa.org/players/4/19/245123.png South Afric

F.
17841 17841 244925 16 https://cdn.sofifa.org/players/4/19/244925.png Austr
Tauchhammer

17881 17881 242165 R. Hauge 16 https://cdn.sofifa.org/players/4/19/242165.png Norwa

17887 17887 246009 M. Tilio 16 https://cdn.sofifa.org/players/4/19/246009.png Austra

17921 17921 246601 J. Rowland 16 https://cdn.sofifa.org/players/4/19/246601.png Englan

17976 17976 243891 M. Larsen 16 https://cdn.sofifa.org/players/4/19/243891.png Denma

18003 18003 239594 J. Imbrechts 16 https://cdn.sofifa.org/players/4/19/239594.png Swede

Republic
18018 18018 243722 B. O'Gorman 16 https://cdn.sofifa.org/players/4/19/243722.png
Irelan

18044 18044 246109 K. Lara 16 https://cdn.sofifa.org/players/4/19/246109.png Colomb

Republic
18106 18106 243685 J. Cleary 16 https://cdn.sofifa.org/players/4/19/243685.png
Irelan

Republic
18124 18124 245808 G. Hollywood 16 https://cdn.sofifa.org/players/4/19/245808.png
Irelan

18162 18162 243866 T. Gundelund 16 https://cdn.sofifa.org/players/4/19/243866.png Denma

18166 18166 243621 N. Ayéva 16 https://cdn.sofifa.org/players/4/19/243621.png Swede

18204 18204 241638 B. Worman 16 https://cdn.sofifa.org/players/4/19/241638.png Englan

18206 18206 246269 G. Nugent 16 https://cdn.sofifa.org/players/4/19/246269.png Englan

42 rows × 96 columns

In [33]: data.loc[data['Age'] == data['Age'].max()]

Out[33]:
      Unnamed: 0  ID      Name      Age  Photo                                           Nationality
4741  4741        140029  O. Pérez  45   https://cdn.sofifa.org/players/4/19/140029.png  Mexico

1 rows × 96 columns

Ages of the highest-rated players


In [34]: data.loc[data['Overall'] == data['Overall'].max()][['Name','Age', 'Overall']]


Out[34]: Name Age Overall

0 L. Messi 31 94

1 Cristiano Ronaldo 33 94

Seaborn - Heatmap
Relationship between columns (correlation coefficient)
+1 --> perfect positive (direct) linear relationship

 0 --> no linear relationship

-1 --> perfect negative (inverse) linear relationship
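`DataFrame.corr()` computes the Pearson correlation coefficient by default, and each cell of the heatmap shows that coefficient for one pair of columns. The sketch below computes it by hand in plain Python to illustrate the three reference values. The function name and the numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


overall = [60, 70, 80, 90]
rising = [2 * v + 5 for v in overall]  # increases linearly with overall
falling = [100 - v for v in overall]   # decreases linearly as overall grows
unrelated = [1, -1, -1, 1]             # no linear trend against overall

print(round(pearson_r(overall, rising), 6))
print(round(pearson_r(overall, falling), 6))
print(round(pearson_r(overall, unrelated), 6))
```

A coefficient near zero only rules out a linear relationship; two columns can still be related in a non-linear way.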

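In [35]: f, ax = mp.subplots(figsize=(25, 15))
# numeric_only=True (pandas 1.5 or later) restricts the correlation matrix to numeric columns
sns.heatmap(data.corr(numeric_only=True), annot=True, linewidths=0.5, linecolor="red", fmt='.1f', ax=ax)
mp.show()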

In [36]: some_clubs = ('CD Leganés', 'Southampton', 'RC Celta', 'Empoli', 'Fortuna Düsseldorf
'Tottenham Hotspur', 'FC Barcelona', 'Valencia CF', 'Chelsea', 'Real Ma

data_clubs = data.loc[data['Club'].isin(some_clubs)]

mp.rcParams['figure.figsize'] = (15, 8)

ax = sns.boxplot(x = data_clubs['Club'], y = data_clubs['Overall'], palette = 'infer


ax.set_xlabel(xlabel = 'Some Popular Clubs', fontsize = 9)

ax.set_ylabel(ylabel = 'Overall Score', fontsize = 9)

ax.set_title(label = 'Distribution of Overall Score in Different popular Clubs', fon


mp.xticks(rotation = 90)

mp.grid()

mp.show()


In [37]: # Distribution of International Reputation in some popular clubs

some_clubs = ('CD Leganés', 'Southampton', 'RC Celta', 'Empoli', 'Fortuna Düsseldorf


'Tottenham Hotspur', 'FC Barcelona', 'Valencia CF', 'Chelsea', 'Real Ma

data_club = data.loc[data['Club'].isin(some_clubs) & data['International Reputation'


mp.rcParams['figure.figsize'] = (16, 8)

ax = sns.boxenplot(x = 'Club', y = 'International Reputation', data = data_club, pal


ax.set_xlabel(xlabel = 'Names of some popular Clubs', fontsize = 10)

ax.set_ylabel(ylabel = 'Distribution of Reputation', fontsize = 10)

ax.set_title(label = 'Distribution of International Reputation in some Popular Clubs


mp.xticks(rotation = 90)

mp.grid()

mp.show()
