Data Science Fundamentals
LAB EXERCISE
PROGRAMS AND OUTPUTS
EX:1 PACKAGES FOR DATA SCIENCE IN PYTHON
AIM : WORKING WITH NUMPY ARRAYS
ALGORITHM :
STEP 1 :
STEP 2 :
STEP 3 :
STEP 4 :
STEP 5 :
PROGRAM :
import numpy as np
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
sample_array = np.array([list_1,
                         list_2,
                         list_3])
print("Numpy multi dimensional array in python\n",
      sample_array)
OUTPUT :
Numpy multi dimensional array in python
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
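Once the array is built, its shape, rows and columns can be inspected directly. The following is a minimal sketch that reuses the same values as the program above; the variable names shape, second_row, third_column and column_sums are chosen here for illustration only.

```python
import numpy as np

# Same 3x4 array as in the program above.
sample_array = np.array([[1, 2, 3, 4],
                         [5, 6, 7, 8],
                         [9, 10, 11, 12]])

shape = sample_array.shape              # (number of rows, number of columns)
second_row = sample_array[1]            # row at index 1
third_column = sample_array[:, 2]       # element at index 2 of every row
column_sums = sample_array.sum(axis=0)  # sum down each column

print(shape)
print(second_row)
print(third_column)
print(column_sums)
```

Indexing with a single integer selects a row, while the slice `[:, 2]` keeps all rows and picks one column. With `axis=0` the sum runs down the rows, giving one total per column.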
AIM : WORKING WITH PANDAS DATA FRAMES
ALGORITHM :
STEP 1 :
STEP 2 :
STEP 3 :
STEP 4 :
STEP 5 :
PROGRAM :
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)
print(df[['Name', 'Qualification']])
OUTPUT :
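The program above selects two columns from the data frame by passing a list of column names. The sketch below repeats that selection and, as an additional illustration that is not part of the original exercise, filters rows with a boolean condition. The names subset and older_than_25 are hypothetical.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

# Double brackets return a DataFrame holding only the listed columns.
subset = df[['Name', 'Qualification']]

# A boolean mask keeps only the rows where the condition is True.
older_than_25 = df[df['Age'] > 25]

print(subset.shape)
print(older_than_25['Name'].tolist())
```

The column selection keeps all four rows but only two columns, and the mask keeps the rows whose Age is greater than 25.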
EX.NO 5. USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES DATA SET FOR PERFORMING THE FOLLOWING:
DATE:
AIM:
To explore various commands for performing univariate analysis on the UCI and Pima Indians Diabetes data set.
ALGORITHM:
STEP 1: Start the program.
STEP 2: Download the UCI and Pima Indians Diabetes data set from Kaggle.
STEP 3: Read the data from the downloaded data set.
STEP 4: Find the mean, median, mode, variance, standard deviation, skewness and kurtosis of the variables in the given data set.
STEP 5: Display the output.
STEP 6: Stop the program.
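The statistics named in STEP 4 can be tried out on a small series before running them on the full data set. This is a minimal sketch; the six values are hypothetical and are not taken from the diabetes data.

```python
import pandas as pd

# Hypothetical values, used only for illustration (not from the data set).
values = pd.Series([1, 2, 2, 3, 4, 6])

mean = values.mean()
median = values.median()
mode = values.mode().tolist()   # mode() returns a Series; there may be ties
variance = values.var()         # sample variance (divides by n - 1)
std_dev = values.std()          # square root of the sample variance
skewness = values.skew()        # positive when the right tail is longer
kurt = values.kurtosis()        # excess kurtosis (0 for a normal distribution)

print(mean, median, mode, variance)
```

For these values the mean is 3.0, the median is 2.5, the mode is 2 and the sample variance is 3.2. pandas uses n - 1 in the denominator for var() and std() by default.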
PROGRAM:
# import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline
from matplotlib.ticker import FormatStrFormatter
import warnings
warnings.filterwarnings('ignore')
# read the data set
df = pd.read_csv('C:/Users/kirub/Documents/Learning/Untitled Folder/diabetes.csv')
df.head()
df.shape
df.dtypes
df['Outcome'] = df['Outcome'].astype('bool')
df.dtypes['Outcome']
df.info()
# summary statistics, transposed so that each variable is a row
df1 = df.describe().T
# displaying df1
print(df1)
# mean
df.mean()
# median
df.median()
# mode
df.mode()
# variance
df.var()
# standard deviation
df.std()
# kurtosis
df.kurtosis(axis=0, skipna=True)
df['Outcome'].kurtosis(axis=0, skipna=True)
# skewness along the index axis
df.skew(axis=0, skipna=True)
# Pregnancy variable
preg_proportion = np.array(df['Pregnancies'].value_counts())
preg_month = np.array(df['Pregnancies'].value_counts().index)
preg_proportion_perc = np.array(
    np.round(preg_proportion / sum(preg_proportion), 3) * 100, dtype=int)
preg = pd.DataFrame({'month': preg_month,
                     'count_of_preg_prop': preg_proportion,
                     'percentage_proportion': preg_proportion_perc})
preg.set_index(['month'], inplace=True)
preg.head(10)
# count of each Outcome class
sns.countplot(x='Outcome', data=df)
# distribution of Pregnancies (histplot replaces the deprecated distplot)
sns.histplot(df['Pregnancies'], kde=True)
# box plot of Pregnancies
sns.boxplot(x=df['Pregnancies'])
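The count and percentage table for the Pregnancies variable can also be built directly from value_counts. The sketch below is one possible alternative, not the method used in the program above. The series pregnancies is a hypothetical stand-in for df['Pregnancies'], and the name preg_table is chosen here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df['Pregnancies']; the real column comes from diabetes.csv.
pregnancies = pd.Series([0, 1, 1, 2, 1, 0, 3, 1], name='Pregnancies')

# Absolute frequency of each distinct value.
counts = pregnancies.value_counts()

# normalize=True returns proportions that sum to 1; scale them to percentages.
percent = (pregnancies.value_counts(normalize=True) * 100).round(1)

# Both Series share the same index, so they align row by row.
preg_table = pd.DataFrame({'count': counts, 'percentage': percent})
preg_table.index.name = 'pregnancies'

print(preg_table)
```

Here the percentages are kept as rounded floats rather than cast to integers, so a value such as 12.5 is not truncated to 12.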
OUTPUT:
AIM :
ALGORITHM :
STEP : 1
STEP : 2
STEP : 3
STEP : 4
STEP : 5
OUTPUT :