0% found this document useful (0 votes)
5 views

PDA_Week_1-8 & 10-12.Ipynb - Colab

The document outlines the first three weeks of a Data Analysis with Python course, covering topics such as creating and manipulating arrays using NumPy, performing statistical operations, and applying logical comparisons. It includes practical examples of working with datasets, including heights and student marks, and demonstrates the use of various NumPy functions for data analysis. Additionally, it introduces basic data visualization techniques using Matplotlib.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

PDA_Week_1-8 & 10-12.Ipynb - Colab

The document outlines the first three weeks of a Data Analysis with Python course, covering topics such as creating and manipulating arrays using NumPy, performing statistical operations, and applying logical comparisons. It includes practical examples of working with datasets, including heights and student marks, and demonstrates the use of various NumPy functions for data analysis. Additionally, it introduces basic data visualization techniques using Matplotlib.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

10/20/24, 4:18 PM PDA_Week_1

Data Analysis with Python Week 1 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
# Create an array
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
a= np.array([1,2,3,4])
a

array([1, 2, 3, 4])
Out[ ]:

In [ ]: # 2-D array
# arrayname=aliasname.array([[d1][d2]])
b= np.array([[1,2,3,4,5,6,],[1,2,3,4,5,6,]])
b

array([[1, 2, 3, 4, 5, 6],
Out[ ]:
[1, 2, 3, 4, 5, 6]])

In [ ]: # 3-D array
# arrayname=aliasname.array([[d1][d2][d3]])
c= np.array([[1,2,3],[1,2,3,],[4,5,6],[8,9,10]])
c

array([[ 1, 2, 3],
Out[ ]:
[ 1, 2, 3],
[ 4, 5, 6],
[ 8, 9, 10]])

In [ ]: # Attributes --> dimensions,shape,size


# Dimensions- Number of dimensions--> ndim--> arrayname.ndim
a.ndim

1
Out[ ]:

In [ ]: b.ndim

2
Out[ ]:

In [ ]: c.ndim

2
Out[ ]:

In [ ]: # Size --> Finding total number of elements in an array --> arrayname.size


a.size

4
Out[ ]:

In [ ]: b.size

12
Out[ ]:

In [ ]: c.size

12
Out[ ]:

file:///C:/Users/Rayyan Khan/Downloads/Python Analysis/PDA HTML Files/PDA_Week_1.html 1/3


10/20/24, 4:18 PM PDA_Week_1

In [ ]: # Minimum --> min(arrayname) or np.min(a)


min(a)

1
Out[ ]:

In [ ]: np.min(a)

1
Out[ ]:

In [ ]: np.min(b)

1
Out[ ]:

In [ ]: np.min(c)

1
Out[ ]:

In [ ]: # Maximum --> max(arrayname) or np.max(a)


max(a)

4
Out[ ]:

In [ ]: np.max(a)

4
Out[ ]:

In [ ]: np.max(b)

6
Out[ ]:

In [ ]: np.max(c)

10
Out[ ]:

In [ ]: # Index position of min & max --> argmin() & argmax()


# argmin()--> Index pos of min
# argmax()--> Index pos of max
# Syntax --> aliasname.argmin(arrayname),aliasname.argmin(arrayname)
# Syntax --> aliasname.argmin(arrayname),aliasname.argmax(arrayname)
np.argmin(a)

0
Out[ ]:

In [ ]: np.argmax(a)

3
Out[ ]:

In [ ]: # Sum of array
# Syntax --> sum(arrayname) or np.sum(arrayname)
sum(a)

10
Out[ ]:

In [ ]: np.sum(a)

10
Out[ ]:

file:///C:/Users/Rayyan Khan/Downloads/Python Analysis/PDA HTML Files/PDA_Week_1.html 2/3


10/20/24, 4:18 PM PDA_Week_1

In [ ]: # Product of array
# Syntax --> np.prod(arrayname)
np.prod(a)

24
Out[ ]:

In [ ]: # Mean of array
# Syntax --> np.mean(arrayname)
np.mean(a)

2.5
Out[ ]:

In [ ]: # Median of array
# Syntax --> np.median(arrayname)
np.median(a)

2.5
Out[ ]:

In [ ]: # Variance of array
# Syntax --> np.var(arrayname)
np.var(a)

1.25
Out[ ]:

In [ ]: # Height -- Dataset -- Application


import numpy as np
import pandas as pd
data = pd.read_csv(r"/content/heights.csv")
heights = np.array(data['height'])
print(heights)
# Statistical metrics of the Heights Dataset
print("Minimum Height:", heights.min())
print("Sum:",np.sum(heights))
print("Min Index:",np.argmin(heights))
print("Max Index:",np.argmax(heights))
print("Variance:", np.var(heights))
print("Average Height:",heights.mean())
print("Maximum Height:", heights.max())
print("25th Percentile:",np.percentile(heights,25))
print("Median:",np.median(heights))
print("75th Percentile:",np.percentile(heights,75))

Out[ ]:
[74.42443878 65.53754283 63.62919774 ... 63.66416353 71.9258358
68.36848621]
Minimum Height: 57.5032186105382
Sum: 79762.86328320228
Min Index: 428
Max Index: 576
Variance: 14.8406074828533
Average Height: 66.91515376107574
Maximum Height: 77.0512818135321
25th Percentile: 64.0097456309595
Median: 66.4512652109843
75th Percentile: 69.84810005291368

file:///C:/Users/Rayyan Khan/Downloads/Python Analysis/PDA HTML Files/PDA_Week_1.html 3/3


10/20/24, 4:29 PM PDA_Week_2

Data Analysis with Python Week 2 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
# Create an array
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
a= np.array([1,2,3,4,5])
print(a)
# Comparisions --> <,<=,>,>=,==,!=
# aliasname.comparision(array_1,array_2)
# Less Than Operator
print(np.less(a,3))
print(a<3)
# Less Than and Equal to Operator
print(np.less_equal(a,3))
print(a<=3)
# Greater Than Operator
print(np.greater(a,4))
print(a>4)
# Greater Than and Equal to Operator
print(np.greater_equal(a,4))
print(a>=4)
# Equal To Operator
print(np.equal(a,3))
print(a==3)
# Not Equal To Operator
print(np.not_equal(a,4))
print(a!=4)

[1 2 3 4 5]
[ True True False False False]
[ True True False False False]
[ True True True False False]
[ True True True False False]
[False False False False True]
[False False False False True]
[False False False True True]
[False False False True True]
[False False True False False]
[False False True False False]
[ True True True False True]
[ True True True False True]

In [ ]: # Boolean Logic Operators --> and(&),or(|),not(~)


b= np.array([1,2,3,4,5])
print(b)
# Logical AND Operator
print(np.logical_and(a,b))
print(a & b)
c= (a<b)&(a>b)
c
# Logical OR Operator
print(np.logical_or(a,b))
print(a | b)
d= (a<b)|(a>b)
d
# Logical NOT Operator
print(np.logical_not(a))
print(~a)

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_2.html 1/3


10/20/24, 4:29 PM PDA_Week_2
[1 2 3 4 5]
[ True True True True True]
[1 2 3 4 5]
[ True True True True True]
[1 2 3 4 5]
[False False False False False]
[-2 -3 -4 -5 -6]

In [1]: # Importing a package


# Import package_name as alias name
import numpy as np
# Create an array
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
a= np.array([1,2,3,4,5])
# Masking --> Manipulation of the array to remove the unwanted array elements
print(a[a<5])
print(a+3)
print(a*3)

[1 2 3 4]
[4 5 6 7 8]
[ 3 6 9 12 15]

In [ ]: import numpy as np
import pandas as pd
data = pd.read_csv(r"/content/Student_Marks.csv")
Marks = np.array(data['Marks'])
print(Marks)
# Comparisons
print("\nMarks less than 50:", np.less(marks, 50))
print("Marks less than or equal to 50:", np.less_equal(marks, 50))
print("Marks greater than 70:", np.greater(marks, 70))
print("Marks greater than or equal to 75:", np.greater_equal(marks, 75))
print("Marks equal to 72:", np.equal(marks, 72))
print("Marks not equal to 51:", np.not_equal(marks, 51))

# Boolean Logic Operations


pass_marks = np.array([50] * len(marks))
print("\nLogical AND (Marks and pass marks):", np.logical_and(marks, pass_marks))
print("Logical OR (Marks or pass marks):", np.logical_or(marks, pass_marks))
print("Logical NOT (Marks):", np.logical_not(marks))

# Masking
print("\nMarks less than 50 (masked):", marks[marks < 50])
print("Marks greater than 70 (masked):", marks[marks > 70])

# Operations on the dataset


print("\nMarks + 5:", marks + 5)
print("Marks * 2:", marks * 2)

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_2.html 2/3


10/20/24, 4:29 PM PDA_Week_2
[19.202 7.734 13.811 53.018 55.299 17.822 29.889 17.264 20.348 30.862
42.036 12.132 24.318 17.672 11.397 19.466 30.548 38.49 50.986 25.133
22.073 35.939 12.209 28.043 16.517 6.623 12.647 26.532 9.333 8.837
24.172 8.1 15.038 39.965 17.171 43.978 13.119 46.453 41.358 51.142
7.336 15.725 19.771 10.429 9.742 8.924 16.703 22.701 26.882 19.106
40.602 22.184 7.892 36.653 53.158 18.238 53.359 51.583 31.236 51.343
10.522 10.844 19.59 21.379 12.591 13.562 27.569 6.185 8.92 21.4
16.606 13.416 20.398 7.014 39.952 6.217 36.746 38.278 49.544 6.349
54.321 17.705 44.099 16.106 16.461 39.957 23.149 6.053 11.253 40.024
24.394 19.564 23.916 42.426 24.451 19.128 5.609 41.444 12.027 32.357]

Marks less than 50: [ True False False False True False False True False False]
Marks less than or equal to 50: [ True False False False True False False True F
alse False]
Marks greater than 70: [False True True False False False True False True Tru
e]
Marks greater than or equal to 75: [False False True False False False True Fals
e True True]
Marks equal to 72: [False True False False False False False False False False]
Marks not equal to 51: [ True True True False True True True True True Tru
e]

Logical AND (Marks and pass marks): [ True True True True True True True Tr
ue True True]
Logical OR (Marks or pass marks): [ True True True True True True True True
True True]
Logical NOT (Marks): [False False False False False False False False False False]

Marks less than 50 (masked): [45 33 48]


Marks greater than 70 (masked): [72 88 90 75 84]

Marks + 5: [50 77 93 56 38 72 95 53 80 89]


Marks * 2: [ 90 144 176 102 66 134 180 96 150 168]

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_2.html 3/3


10/20/24, 4:29 PM PDA_Week_3

Data Analysis with Python Week 3 Session

In [6]: # Importing a package


# Import package_name as alias name
import numpy as np
import matplotlib.pyplot as plt
# Creating a random number generator
# Syntax --> np.random.randomstate(seed_number)
rand = np.random.RandomState(42)
# Creating a random array of size 10 consisting of numbers between 0 and 100
x = rand.randint(100, size=10)
print(x)

[51 92 14 71 60 20 82 86 74 74]

In [9]: # Entering the index numbers to print the elements of that particular index
# Syntax --> ind = [i1,i2,i3,..in]
ind = [3, 7, 2]
# Printing the elements of the entered index numbers
# Syntax 1 --> arrayname[ind]
x[ind]

[71, 86, 14]

In [10]: # Syntax 2 --> print([arrayname[i1],arrayname[i2], ..arrayname[in]])


print([x[3], x[7], x[2]])

[71, 86, 14]

In [13]: # Entering the index numbers in 2-D array format


# Syntax --> ind = np.array([[i1, i2], [i3, i4], ..[in-1,in]])
ind = np.array([[3, 7], [4, 5]])
x[ind]

array([[71, 86],
Out[13]:
[60, 20]])

In [19]: '''Creating an array with n random elements between 0 and n-1, and then reshape
it into a matrix with r rows and c columns where n= r*c'''
# Syntax --> x = np.arange(n).reshape((r, c))
x = np.arange(12).reshape((3, 4))
x

array([[ 0, 1, 2, 3],
Out[19]:
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])

In [21]: ''' Creating a array using fancy indexing i.e first create a row array and a
column array, then use them to index into another array such that each
element corresponds to the matching indices in the row and column array'''
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

array([ 2, 5, 11])
Out[21]:

In [22]: '''Creating an array using advanced indexing i.e reshapes the row indices into a
column and pairs them with the column indices to access specific elements
from the array, creating a grid-like retrieval pattern'''
# Syntax --> arrayname[row_arrayname[:, np.newaxis], coloumn_arrayname]
X[row[:, np.newaxis], col]

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_3.html 1/3


10/20/24, 4:29 PM PDA_Week_3
array([[ 2, 1, 3],
Out[22]:
[ 6, 5, 7],
[10, 9, 11]])

In [23]: '''Creating an array by indexing one specific row number with entered
coloumn numbers'''
# Syntax --> arrayname[row_number, [c1, c2, ...cn]]
X[2, [2, 0, 1]]

array([10, 8, 9])
Out[23]:

In [26]: '''Creating an array by indexing from specific row number till the last row
number with entered coloumn numbers'''
# Syntax --> arrayname[row_number:, [c1, c2, ...cn]]
X[1:, [2, 0, 1]]

array([[ 6, 4, 5],
Out[26]:
[10, 8, 9]])

In [30]: # Importing a package


# Import package_name as alias name
import numpy as np
# Creating a random number generator
# Syntax --> np.random.randomstate(seed_number)
rand = np.random.RandomState(42)
print(rand)
# Setting the mean vector of the distribution
mean = [0, 0]
# Defining a covariance 2-D array
cov = [[1, 2],
[2, 5]]
'''Generating n number of samples from a multivariate normal distribution with
the specified mean and covariance'''
# Syntax --> X = rand.multivariate_normal(mean, covariance, n)
X = rand.multivariate_normal(mean, cov, 100)
# Printing the shape of the multivariate normal distribution array
print(X.shape)

RandomState(MT19937)
(100, 2)

In [35]: '''Randomly selecting n number of unique indices from the range of X's first
dimension without repetition'''
# Syntax --> indices = np.random.choice(X.shape[0], n, replace=False)
indices = np.random.choice(X.shape[0], 20, replace=False)
print(indices)
'''Creating a array using fancy indexing i.e selecting the rows of X
corresponding to the chosen indices'''
selection = X[indices]
# Printing the shape of the resulting selection array
print(selection.shape)

[39 34 4 81 77 5 69 41 93 15 71 96 36 49 98 3 87 31 67 59]
(20, 2)

In [45]: '''From the multivariate normal distribution array scatter plotting where all
data points are plotted with semi-transparency'''
# Syntax --> plt.scatter(x, y, alpha=0.5)
'''Where:
x: Data for the x-axis. X[:, 0] means "all rows, first column" of array X
y: Data for the y-axis. X[:, 1] means "all rows, second column" of array X
alpha=0.5: Sets the transparency of the points to 50%'''
plt.scatter(X[:, 0], X[:, 1], alpha=0.5)
'''From the multivariate normal distribution array scatter plotting the selected

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_3.html 2/3


10/20/24, 4:29 PM PDA_Week_3
20 points are highlighted in larger, red markers'''
# Syntax --> plt.scatter(x, y, facecolor='r', s=150);
'''Where:
x: Data for the x-axis
selection[:, 0] means "all rows, first column" of selection
y: Data for the y-axis
selection[:, 1] means "all rows, second column" of selection
facecolor='r': Sets the face color of the points to red
s=150: Sets the size of the points'''
plt.scatter(selection[:, 0], selection[:, 1], facecolor='r', s=150);

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_3.html 3/3


10/20/24, 6:26 PM PDA_Week_4

Data Analysis with Python Week 4 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
# arrayname = alaisname.eye(value)
# eye() --> Creating an Identity Matrix
# Creating an Identiy Matrix Order 3
a= np.eye(3)
a

array([[1., 0., 0.],


Out[ ]:
[0., 1., 0.],
[0., 0., 1.]])

In [ ]: # Creating an Identiy Matrix Order 10


b= np.eye(10)
b

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
Out[ ]:
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [ ]: # identity() --> Creating an Identity Matrix


# arrayname = alaisname.identity(value)
# Creating an Identiy Matrix Order 5
c= np.identity(5)
c

array([[1., 0., 0., 0., 0.],


Out[ ]:
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])

In [ ]: # Creating an Identiy Matrix Order 8


d= np.identity(8)
d

array([[1., 0., 0., 0., 0., 0., 0., 0.],


Out[ ]:
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]])

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_4.html 1/1


10/20/24, 6:22 PM PDA_Week_5

Data Analysis with Python Week 5 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
#arrayname = aliasname.linespace(start,stop,num=value)
a = np.linspace(1,10,num=5)
a

array([ 1. , 3.25, 5.5 , 7.75, 10. ])


Out[ ]:

In [ ]: # Creating a vector of length 10 with values evenly distributed between 5 and 50


b = np.linspace(5,50,num=10)
b

array([ 5., 10., 15., 20., 25., 30., 35., 40., 45., 50.])
Out[ ]:

file:///C:/Users/Rayyan Khan/Downloads/Python Analysis/PDA HTML Files/PDA_Week_5.html 1/1


11/9/24, 4:24 PM PDA_Week_6

Data Analysis with Python Week 6 Session

In [17]: # Importing a package


# Import package_name as alias name
import pandas as pd
# Creating a Data Frame
''' Syntax --> df_name = pd.DataFrame({'C1': ['R1.1', 'R2.1', 'R3.1',..'Rn.1'],
'C2': ['R1.2', 'R2.2', 'R3.2',..'Rn.2']},
columns=['C1', 'C2']).set_index("C1")'''
df1 = pd.DataFrame({'Name': ['Peter', 'Paul', 'Mary'],
'Food': ['Fish', 'Beans', 'Bread']},
columns=['Name', 'Food']).set_index("Name")
df2 = pd.DataFrame({'Name': ['Mary', 'Joseph'],
'Drink': ['Wine', 'Beer']},
columns=['Name', 'Drink']).set_index("Name")
# Printing the Data Frame & adding a separation between the Data Frames
# Syntax --> print(df_name,end="\n1\n2..\nN")
print(df1,end="\n\n")
print(df2)

Food
Name
Peter Fish
Paul Beans
Mary Bread

Drink
Name
Mary Wine
Joseph Beer

In [18]: # Joining tow Data Frames considering Data Frame 1 as Main Data Frame
# Syntax --> print(df_name_1.join(df_name_2))
print(df1.join(df2))
# Alternative Method:
# Syntax --> df_name_1.join(df_name_2,how="left")
df1.join(df2,how="left")

Food Drink
Name
Peter Fish NaN
Paul Beans NaN
Mary Bread Wine
Out[18]: Food Drink

Name

Peter Fish NaN

Paul Beans NaN

Mary Bread Wine

In [19]: # Joining tow Data Frames considering Data Frame 2 as Main Data Frame
# Syntax --> df_name_1.join(df_name_2,how="right")
df1.join(df2,how="right")

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_6.html 1/2


11/9/24, 4:24 PM PDA_Week_6

Out[19]: Food Drink

Name

Mary Bread Wine

Joseph NaN Beer

In [21]: # Joining tow Data Frames considering common elements in both the Data Frames
# Syntax --> df_name_1.join(df_name_2,how="inner")
df1.join(df2,how="inner")

Out[21]: Food Drink

Name

Mary Bread Wine

In [22]: df3 = pd.DataFrame({'Name': ['Bob', 'Jake', 'Lisa', 'Sue'],


'Rank': [1, 2, 3, 4]}).set_index("Name")
df4 = pd.DataFrame({'Name': ['Bob', 'Jake', 'Lisa', 'Sue'],
'Rank': [3, 1, 4, 2]}).set_index("Name")
print(df3)
print(df4)
# Joining tow Data Frames considering C2 of both the Data Frames
# Syntax --> df_name_1.join(df_name_2,lsuffix="_L",rsuffix="_R")
df3.join(df4,lsuffix="_L",rsuffix="_R")

Rank
Name
Bob 1
Jake 2
Lisa 3
Sue 4
Rank
Name
Bob 3
Jake 1
Lisa 4
Sue 2
Out[22]: Rank_L Rank_R

Name

Bob 1 3

Jake 2 1

Lisa 3 4

Sue 4 2

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_6.html 2/2


11/9/24, 11:53 PM PDA_Week_7

Data Analysis with Python Week 7 Session

In [5]: # Importing a package


# Import package_name as alias name
import numpy as np
import pandas as pd
import seaborn as sns
# Loading a Data Frame
# Syntax --> variable_name = sns.load_dataset('dataset_name')
titanic = sns.load_dataset('titanic')
titanic

Out[5]: survived pclass sex age sibsp parch fare embarked class who

0 0 3 male 22.0 1 0 7.2500 S Third man

1 1 1 female 38.0 1 0 71.2833 C First woman

2 1 3 female 26.0 0 0 7.9250 S Third woman

3 1 1 female 35.0 1 0 53.1000 S First woman

4 0 3 male 35.0 0 0 8.0500 S Third man

... ... ... ... ... ... ... ... ... ... ...

886 0 2 male 27.0 0 0 13.0000 S Second man

887 1 1 female 19.0 0 0 30.0000 S First woman

888 0 3 female NaN 1 2 23.4500 S Third woman

889 1 1 male 26.0 0 0 30.0000 C First man

890 0 3 male 32.0 0 0 7.7500 Q Third man

891 rows × 15 columns

In [17]: '''Calculating the average survival rate for each combination of passenger `sex`
and `class` on the Titanic and displays it in a table format'''
# Syntax --> titanic.groupby(['C1', 'C2'])['C3'].aggregate('mean').unstack()
titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()

Out[17]: class First Second Third

sex

female 0.968085 0.921053 0.500000

male 0.368852 0.157407 0.135447

In [16]: # Using Pivot Table


# Syntax --> titanic.pivot_table(values='C3', index='C1', columns='C2')
titanic.pivot_table(values= 'survived', index='sex', columns='class')

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_7.html 1/3


11/9/24, 11:53 PM PDA_Week_7

Out[16]: class First Second Third

sex

female 0.968085 0.921053 0.500000

male 0.368852 0.157407 0.135447

In [20]: # Using Multi-level Pivot Table


'''Calculating the average survival rate for each combination of passenger `sex`
and `class` on the Titanic of 3 particular age groups of 0, 18 and 80 and
displays it in a table format'''
# Syntax --> V1 = pd.cut(titanic['C1'], [A1, A2, A3,..An])
# Using Multi-level Pivot Table
'''Calculating the average survival rate for each combination of passenger `sex`
and `class` on the Titanic of 3 particular age groups of 0, 18 and 80 and
displays it in a table format'''
# Syntax --> V1 = pd.cut(titanic['C1'], [A1, A2, A3,..An])
age1 = pd.cut(titanic['age'], [0, 18, 80])
# Syntax --> titanic.pivot_table(values='C3', index=['C1', 'V1'], columns='C2')
titanic.pivot_table(values='survived', index=['sex', age1], columns='class')

Out[20]: class First Second Third

sex age

female (0, 18] 0.909091 1.000000 0.511628

(18, 80] 0.972973 0.900000 0.423729

male (0, 18] 0.800000 0.600000 0.215686

(18, 80] 0.375000 0.071429 0.133663

In [22]: '''Spliting the Titanic dataset's `fare` column into two equal-sized fare
categories: low and high'''
# Syntax --> V2 = pd.qcut(titanic['C4'], 2)
fare1 = pd.qcut(titanic['fare'], 2)
'''Createing a pivot table showing the average survival rate on the Titanic,
grouped by passenger `sex` and age category (`age1`) as rows, and by fare
category (`fare1`) and passenger `class` as columns'''
# Syntax --> titanic.pivot_table('C3', ['C1', 'V1'], ['V2', 'C2'])
titanic.pivot_table('survived', ['sex', age1], [fare1, 'class'])

Out[22]: fare (-0.001, 14.454] (14.454, 512.329]

class First Second Third First Second Third

sex age

female (0, 18] NaN 1.000000 0.714286 0.909091 1.000000 0.318182

(18, 80] NaN 0.880000 0.444444 0.972973 0.914286 0.391304

male (0, 18] NaN 0.000000 0.260870 0.800000 0.818182 0.178571

(18, 80] 0.0 0.098039 0.125000 0.391304 0.030303 0.192308

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_7.html 2/3


11/9/24, 11:53 PM PDA_Week_7

In [23]: '''Createing a pivot table that shows the total number of survivors (`survived`)
and the average fare (`fare`), grouped by `sex` as rows and `class` as
columns'''
''' Syntax --> titanic.pivot_table(index='C1', columns='C2',
aggfunc={'C3':sum, 'fare':'C4'})'''
titanic.pivot_table(index='sex', columns='class',
aggfunc={'survived':sum, 'fare':'mean'})

Out[23]: fare survived

class First Second Third First Second Third

sex

female 106.125798 21.970121 16.118810 91 70 72

male 67.226127 19.741782 12.661633 45 17 47

In [24]: '''Creating a pivot table displaying the average survival rate on the Titanic,
grouped by `sex` as rows and `class` as columns, with an extra row and column
labeled "All" (from `margins=True`) to show overall averages for each row and
column'''
# Syntax --> titanic.pivot_table('C3', index='C1', columns='C2', margins=True
titanic.pivot_table('survived', index='sex', columns='class', margins=True)

Out[24]: class First Second Third All

sex

female 0.968085 0.921053 0.500000 0.742038

male 0.368852 0.157407 0.135447 0.188908

All 0.629630 0.472826 0.242363 0.383838

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_7.html 3/3


11/10/24, 12:31 AM PDA_Week_8

Data Analysis with Python Week 8 Session

In [ ]: # Importing a package
# Import package_name as alias name
import pandas as pd
# Creating a Series Data Frame
# Syntax --> V1 = pd.Series(['R1, R2, R3,... Rn'])
monte = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam',
'Eric Idle', 'Terry Jones', 'Michael Palin'])
print(monte)

0 Graham Chapman
1 John Cleese
2 Terry Gilliam
3 Eric Idle
4 Terry Jones
5 Michael Palin
dtype: object

In [ ]: # Implementing methods similar to Python String Methods


# Lowering the case of all the strings of the entered Series Data Frame
# Syntax --> V1.str.lower()
print(monte.str.lower())

0 graham chapman
1 john cleese
2 terry gilliam
3 eric idle
4 terry jones
5 michael palin
dtype: object

In [ ]: # Checking if any string of the Series Data Frame starts with the letter 'T'
# Syntax --> V1.str.startswith('T')
print(monte.str.startswith('T'))

0 False
1 False
2 True
3 False
4 True
5 False
dtype: bool

In [ ]: # Spliting the Series Data Frame into 2 column


# Syntax --> V1.str.split()
print(monte.str.split())

0 [Graham, Chapman]
1 [John, Cleese]
2 [Terry, Gilliam]
3 [Eric, Idle]
4 [Terry, Jones]
5 [Michael, Palin]
dtype: object

In [ ]: # Implementing methods using regular expressions


'''Extracting the first name from each by asking for a contiguous group of
characters at the beginning of each element'''

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_8.html 1/3


11/10/24, 12:31 AM PDA_Week_8

# Syntax --> V1.str.extract('([A-Za-z]+)', expand=False)


print(monte.str.extract('([A-Za-z]+)', expand=False))

0 Graham
1 John
2 Terry
3 Eric
4 Terry
5 Michael
dtype: object

In [ ]: '''Finding all names that start and end with a consonant, making use of the
start-of-string (^) and end-of-string ($) regular expression characters'''
# Syntax --> V1.str.findall(r'^[^AEIOU].*[^aeiou]$')
print(monte.str.findall(r'^[^AEIOU].*[^aeiou]$'))

0 [Graham Chapman]
1 []
2 [Terry Gilliam]
3 []
4 [Terry Jones]
5 [Michael Palin]
dtype: object

In [ ]: # Implementing miscellaneous methods


'''Vectorizing item access and slicing i.e. get a slice of the first three
characters of each array'''
# Syntax --> V1.str[0:n]
print(monte.str[0:3])

0 Gra
1 Joh
2 Ter
3 Eri
4 Ter
5 Mic
dtype: object

In [ ]: '''Extracting the last name of each entry'''


# Syntax --> V1.str.split().str.get(-1)
print(monte.str.split().str.get(-1))

0 Chapman
1 Cleese
2 Gilliam
3 Idle
4 Jones
5 Palin
dtype: object

In [ ]: # Implementing indicator variables in a Data Frame


# Creating a Data Frame
''' Syntax --> df_name = pd.DataFrame({'C1': ['R1.1', 'R2.1', 'R3.1',..'Rn.1'],
'C2': ['R1.2', 'R2.2', 'R3.2',..'Rn.2']},
columns=['C1', 'C2']).set_index("C1")'''
full_monte = pd.DataFrame({'name': monte,
'info': ['B|C|D', 'B|D', 'A|C',
'B|D', 'B|C', 'B|C|D']})
print(full_monte)

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_8.html 2/3


11/10/24, 12:31 AM PDA_Week_8

name info
0 Graham Chapman B|C|D
1 John Cleese B|D
2 Terry Gilliam A|C
3 Eric Idle B|D
4 Terry Jones B|C
5 Michael Palin B|C|D

In [ ]: '''Spliting the values in the `info` column by the `|` symbol and creating new
columns for each unique value, marking `1` if the value is present and `0` if
it's not.The get_dummies() routine lets you quickly split-out these indicator
variables into a DataFrame'''
# Syntax --> V2['info'].str.get_dummies('|')
full_monte['info'].str.get_dummies('|')

Out[ ]: A B C D

0 0 1 1 1

1 0 1 0 1

2 1 0 1 0

3 0 1 0 1

4 0 1 1 0

5 0 1 1 1

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_8.html 3/3


10/20/24, 6:23 PM PDA_Week_10

Data Analysis with Python Week 10 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
# Create an 2-D array
# arrayname=aliasname.array([[d1][d2]])
a=np.array([[1,2,3],[4,5,6],[6,7,8]])
a

array([[1, 2, 3],
Out[ ]:
[4, 5, 6],
[6, 7, 8]])

In [ ]: # Swapping the rows of a given array in reverse order


# Syntax --> reversed_arrayname = arrayname[::-1,::]
b=a[::-1,::]
b

array([[6, 7, 8],
Out[ ]:
[4, 5, 6],
[1, 2, 3]])

In [ ]: # Swapping the coloumns of a given array in reverse order


# Syntax --> reversed_arrayname = arrayname[:,::-1]
c=a[:,::-1]
c

array([[3, 2, 1],
Out[ ]:
[6, 5, 4],
[8, 7, 6]])

In [ ]: # Swapping both rows and coloumns of a given array in reverse order


# Syntax --> reversed_arrayname = arrayname[::-1,::-1]
d=a[::-1,::-1]
d

array([[8, 7, 6],
Out[ ]:
[6, 5, 4],
[3, 2, 1]])

file:///C:/Users/Rayyan Khan/Downloads/Python Analysis/PDA HTML Files/PDA_Week_10.html 1/1


10/20/24, 6:53 PM PDA_Week_11

Data Analysis with Python Week 11 Session

In [ ]: # Importing a package
# Import package_name as alias name
import numpy as np
# Create an 2-D array
# arrayname=aliasname.array([[d1][d2]]
a=np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(a)
'''Computing the mean, median, standard deviation, and variance of a given array
along the second axis'''
# Syntax --> np.mean(arrayname, axis=1)
# Syntax --> np.median(arrayname, axis=1)
# Syntax --> np.std(arrayname, axis=1)
# Syntax --> np.var(arrayname, axis=1)
mean = np.mean(a, axis=1)
median=np.median(a, axis=1)
std_dev = np.std(a, axis=1)
variance = np.var(a, axis=1)
'''Printing the mean, median, standard deviation, and variance of a given array
along the second axis'''
print("Mean along the second axis:", mean)
print("Median along the second axis:", median)
print("Standard deviation along the second axis:", std_dev)
print("Variance along the second axis:", variance)

array([[1., 0., 0.],


Out[ ]:
[0., 1., 0.],
[0., 0., 1.]])

In [9]: '''Computing the mean, median, standard deviation, and variance of a given array
along the first axis'''
# Syntax --> np.mean(arrayname, axis=0)
# Syntax --> np.median(arrayname, axis=0)
# Syntax --> np.std(arrayname, axis=0)
# Syntax --> np.var(arrayname, axis=0)
mean = np.mean(a, axis=0)
median=np.median(a, axis=0)
std_dev = np.std(a, axis=0)
variance = np.var(a, axis=0)
'''Printing the mean, median, standard deviation, and variance of a given array
along the first axis'''
print("Mean along the first axis:", mean)
print("Median along the first axis:", median)
print("Standard deviation along the first axis:", std_dev)
print("Variance along the first axis:", variance)

Mean along the first axis: [5. 6. 7. 8.]


Median along the first axis: [5. 6. 7. 8.]
Standard deviation along the first axis: [3.26598632 3.26598632 3.26598632 3.26598
632]
Variance along the first axis: [10.66666667 10.66666667 10.66666667 10.66666667]

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_11.html 1/1


10/27/24, 3:39 PM PDA_Week_12

Data Analysis with Python Week 12 Session

In [17]: '''Write a NumPy program to sort the Student ID with increasing Height of the
Students from given Students ID and Height. Print the integer indices that
describes the sort order by multiple columns and the sorted data.'''

# Importing a package
# Import package_name as alias name
import numpy as np

# Creating two array which contains Student IDs and there respective Heights
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
student_ID = np.array([101,102,103,104,105])
heights = np.array([5.5,6.1,5.8,5.7,6.0])

# Print the two array which contains Student IDs and there respective Heights
print("Student IDs:",student_ID)
print("Student's Heights",heights)
# Combine the data into a 2-D array
# Syntax --> data = np.column_stack((array_1, array_2))
data = np.column_stack((student_ID,heights))
print("Combined Data:\n",data)

# Sort the data by Height (Coloumn Index 1)


# Syntax --> np.argsort(data[:,1])
sorted_indices = np.argsort(data[:,1])
sorted_data = data[sorted_indices]

# Extract sorted student IDs and Heights


# Syntax --> sorted_data[:,0]
# Syntax --> sorted_data[:,1]
sorted_student_ID = sorted_data[:,0]
sorted_heights = sorted_data[:,1]

# Print the results


print("Sorted Indices (Based on Height):",sorted_indices)
print("Sorted Student IDs:",sorted_student_ID)
print("Sorted Student's Heights:",sorted_heights)

Student IDs: [101 102 103 104 105]


Student's Heights [5.5 6.1 5.8 5.7 6. ]
Combined Data:
[[101. 5.5]
[102. 6.1]
[103. 5.8]
[104. 5.7]
[105. 6. ]]
Sorted Indices (Based on Height): [0 3 2 4 1]
Sorted Student IDs: [101. 104. 103. 105. 102.]
Sorted Student's Heights: [5.5 5.7 5.8 6. 6.1]

In [18]: '''Write a NumPy program to sort the Employee ID with increasing Wages of the
Employees from given Employee ID and Wages. Print the integer indices that
describes the sort order by multiple columns and the sorted data.'''

# Importing a package
# Import package_name as alias name
import numpy as np

# Creating two array which contains Employee IDs and there respective Wages
# arrayname = alias.name.array([v1,v2,v3,..vn])

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_12.html 1/4


10/27/24, 3:39 PM PDA_Week_12
# 1-D array
employee_ID = np.array([201, 202, 203, 204, 205])
wages = np.array([50000, 60000, 55000, 52000, 58000])

# Printing the arrays which contains Employee IDs and there respective Wages
print("Employee IDs:", employee_ID)
print("Employee Wages:", wages)

# Combine the data into a 2-D array


data = np.column_stack((employee_ID, wages))
print("Combined Data:\n",data)

# Sort the data by Wages (Coloumn Index 1)


sorted_indices = np.argsort(data[:,1])
sorted_data = data[sorted_indices]

# Extract sorted Employee IDs and Wages


sorted_employee_ID = sorted_data[:,0]
sorted_wages = sorted_data[:,1]

# Print the results


print("Sorted Indices (Based on Wages):", sorted_indices)
print("Sorted Employee IDs:", sorted_employee_ID)
print("Sorted Employee Wages:", sorted_wages)

Employee IDs: [201 202 203 204 205]


Employee Wages: [50000 60000 55000 52000 58000]
Combined Data:
[[ 201 50000]
[ 202 60000]
[ 203 55000]
[ 204 52000]
[ 205 58000]]
Sorted Indices (Based on Wages): [0 3 2 4 1]
Sorted Employee IDs: [201 204 203 205 202]
Sorted Employee Wages: [50000 52000 55000 58000 60000]

In [19]: '''Write a NumPy program to sort the Book Names with increasing Prices of the
Books from given Book Names and Prices. Print the integer indices that
describes the sort order by multiple columns and the sorted data.'''

# Importing a package
# Import package_name as alias name
import numpy as np

# Creating two array which contains Book names and there respective Prices
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
book_names = np.array(["Book A", "Book B", "Book C", "Book D", "Book E"])
prices = np.array([250, 150, 300, 200, 180])

# Printing the arrays which contains Book names and there respective Prices
print("Book Names:", book_names)
print("Book Prices:", prices)

# Combine the data into a 2-D array


data = np.column_stack((book_names, prices))
print("Combined Data:\n", data)

# Sort the data by Prices (Coloumn Index 1)


sorted_indices = np.argsort(data[:, 1])
sorted_data = data[sorted_indices]

# Extract sorted Book Names and Prices

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_12.html 2/4


10/27/24, 3:39 PM PDA_Week_12
sorted_book_names = sorted_data[:, 0]
sorted_prices = sorted_data[:, 1]

# Print the results


print("Sorted Indices (Based on Prices):", sorted_indices)
print("Sorted Book Names:", sorted_book_names)
print("Sorted Book Prices:", sorted_prices)

Book Names: ['Book A' 'Book B' 'Book C' 'Book D' 'Book E']
Book Prices: [250 150 300 200 180]
Combined Data:
[['Book A' '250']
['Book B' '150']
['Book C' '300']
['Book D' '200']
['Book E' '180']]
Sorted Indices (Based on Prices): [1 4 3 0 2]
Sorted Book Names: ['Book B' 'Book E' 'Book D' 'Book A' 'Book C']
Sorted Book Prices: ['150' '180' '200' '250' '300']

In [15]: '''Write a NumPy program to sort the Product Names with increasing Prices of the
Products from given Product Names and Prices. Print the integer indices that
describes the sort order by multiple columns and the sorted data.'''

# Importing a package
# Import package_name as alias name
import numpy as np

# Creating two array which contains Product names and there respective Prices
# arrayname = alias.name.array([v1,v2,v3,..vn])
# 1-D array
product_names = np.array(["Product A", "Product B", "Product C", "Product D"])
prices = np.array([250, 150, 300, 200])

# Printing the arrays which contains Product names and there respective Prices
print("Product Names:", product_names)
print("Product Prices:", prices)

# Combine the data into a 2-D array


data = np.column_stack((product_names, prices))
print("Combined Data:\n", data)

# Sort the data by Prices (Coloumn Index 1)


sorted_indices = np.argsort(data[:, 1]) # Convert prices to float if needed
sorted_data = data[sorted_indices]

# Extract sorted Product Names and Prices


sorted_product_names = sorted_data[:, 0]
sorted_prices = sorted_data[:, 1]

# Print the results


print("Sorted Indices (Based on Prices):", sorted_indices)
print("Sorted Product Names:", sorted_product_names)
print("Sorted Product Prices:", sorted_prices)

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_12.html 3/4


10/27/24, 3:39 PM PDA_Week_12
Product Names: ['Product A' 'Product B' 'Product C' 'Product D']
Product Prices: [250 150 300 200]
Combined Data:
[['Product A' '250']
['Product B' '150']
['Product C' '300']
['Product D' '200']]
Sorted Indices (Based on Prices): [1 3 0 2]
Sorted Product Names: ['Product B' 'Product D' 'Product A' 'Product C']
Sorted Product Prices: ['150' '200' '250' '300']

file:///C:/Users/Rayyan Khan/Downloads/PDA_Week_12.html 4/4

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy