
NumPy Arrays and Vectorized Computation

0.0.1 NUMPY MODULE:


NumPy, short for Numerical Python, is a fundamental library for numerical computing in
Python. It provides powerful data structures, primarily the ndarray (n-dimensional array), which
enables efficient storage and manipulation of large datasets. With its support for multi-
dimensional arrays, NumPy allows users to perform complex mathematical operations with ease.
One of the key features of NumPy is its ability to perform element-wise operations on arrays,
which is significantly faster than using traditional Python lists. This efficiency stems from its
implementation in C, allowing for lower-level optimizations. NumPy also includes a comprehensive
set of mathematical functions that can operate on arrays, including linear algebra, Fourier
transforms, and random number generation. In addition to its array capabilities, NumPy provides
tools for integrating with other languages, such as C and Fortran, making it a versatile choice for
performance-critical applications. It serves as the backbone for many other scientific computing
libraries, including SciPy, pandas, and Matplotlib, establishing itself as an essential component
of the scientific Python ecosystem. NumPy’s array operations are broadcastable, meaning that
arrays of different shapes can still be used together in calculations, making it easier to handle
data of varying dimensions. This flexibility is particularly useful in data analysis and machine
learning tasks.
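The broadcasting behaviour mentioned above can be sketched briefly (a minimal illustration, not one of the original notebook cells):

```python
import numpy as np

# A (3, 1) column and a (3,) row are broadcast to a common (3, 3) shape:
col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])           # shape (3,)

result = col + row                  # both operands are virtually stretched
print(result)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]
```

Each output element is `col[i] + row[j]`, computed without either array being copied to the larger shape.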

1 1. Numpy Arrays from Python DataStructures, Intrinsic Numpy Objects and Random Functions

1.1 Arrays from python data structures

[1] : # To define a new ndarray using array() method
import numpy as np
a = np.array([1, 2, 3])
a
[1] : array([1, 2, 3])

[2] : #checking the type


type(a)
[2] : numpy.ndarray

[3] : #checking dtype


a.dtype

[3] : dtype('int32')

[4] : #ndim
a.ndim
[4]: 1

[5] : #size
a.size
[5]: 3

[6] : #shape
a.shape
[6]: (3,)

[7] : #numpy 1-D array


import numpy as np
a=[1,2,3,4,5]
b=np.array(a)
print(b)
[1 2 3 4 5]
[8] : #Creation of ndarrays using array() method
#2-D array
import numpy as np
x=[1,2,3]
y=[3,4,5]
z=np.array((x,y))
print(z)
[[1 2 3]
[3 4 5]]
[9] :
#Tuple 1-D
m=(1,2,3)
c=np.array(m)
print(c)
[1 2 3]
[10] : #Tuple 2-D array
import numpy as np
a=(1,2,3,4,5)
b=(6,7,8,9,1)
c=np.array((a,b))

print(c)

[[1 2 3 4 5]
[6 7 8 9 1]]
[11] : #set (np.array on a set gives a 0-d object array; use list(c) for a true 1-D array)
a=[1,2,3,4,5]
c=set(a)
np.array(c)
[11] : array({1, 2, 3, 4, 5}, dtype=object)

[12] : #Set
l={1,2,3,4}
print(np.array(l))
{1, 2, 3, 4}
[13] : #dictionary
import numpy as np
d={'a':1,'b':2,'c':3}
z=np.array(list(d.items()))
print(z)
a=np.array(list(d.keys()))
print(a)
[['a' '1']
 ['b' '2']
 ['c' '3']]
['a' 'b' 'c']

3 1.2 Intrinsic Numpy Objects


Intrinsic NumPy objects are fundamental data structures provided by the NumPy library, which
are optimized for numerical computations and provide efficient operations on large datasets.

[14] : #using arange() method
a=np.array(np.arange(9))
print(a)
[0 1 2 3 4 5 6 7 8]
[15] : #Zeros() method
a=np.zeros(3)
print(a)
[0. 0. 0.]

[16] : b=np.zeros([3,3])
print(b)

[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[17] : #zeros_like()
x=np.array([[1,3,7],[2,5,9]])
x
[17] : array([[1, 3, 7],
[2, 5, 9]])

[18] : d=np.zeros_like(x)
d

[18] : array([[0, 0, 0],
       [0, 0, 0]])

[19] : #using ones() method


a=np.ones(4)
print(a)
[1. 1. 1. 1.]
[20] : b=np.ones([3,3])
print(b)

[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[21] :
#using eye() method
a=np.eye(3)
print(a)
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[22] :
c=np.eye(3,k=1)
print(c)
[[0. 1. 0.]
[0. 0. 1.]
[0. 0. 0.]]

[23] : #using identity() method
a=np.identity(3)
print(a)

[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]]
[24] :
#using full() method
d=np.full((2,2),7)
print(d)
[[7 7]
[7 7]]
[25] : np.arange(15)

[25] : array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

[26] : x=np.arange(6,dtype=int)
np.full_like(x,1)

[26] : array([1, 1, 1, 1, 1, 1])

[27] : c=np.full_like(x,0.1)   #0.1 is truncated to 0: full_like keeps x's int dtype
c

[27] : array([0, 0, 0, 0, 0, 0])

[28] : d=np.full_like(x,0.1,dtype=np.double)
d

[28] : array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

[29] : #using empty() method


a=np.empty((2,3))
print(a)

[[0.1 0.1 0.1]


[0.1 0.1 0.1]]
[30] : np.empty((2, 3, 2))

[30]: array([[[1.05337787e-311, 2.86558075e-322],
        [0.00000000e+000, 0.00000000e+000],
        [1.10343781e-312, 1.31370903e-076]],

       [[5.20093491e-090, 5.69847262e-066],
        [5.51292779e+169, 4.85649086e-033],
        [6.48224659e+170, 5.82471487e+257]]])

[31]: #empty_like()
a=([1,2,3],[4,5,6])
np.empty_like(a)

[31]: array([[1730487296,        496,          0],
       [         0,     131074,  168442489]])

[32]: #using diag() method

np.diag([1,2,3,4])

[32]: array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

[33]: #creating meshgrid()


x=np.array([1,2,3])
y=np.array([4,5,6])
x,y=np.meshgrid(x,y)
print(x)
print(y)
[[1 2 3]
[1 2 3]
[1 2 3]]
[[4 4 4]
[5 5 5]
[6 6 6]]

3.1 1.3 Random Functions


The random functions in NumPy are essential for simulations, statistical sampling, and generating
synthetic data. They help facilitate various operations in scientific computing, machine learning,
and data analysis.

[34] : #randint() method
from numpy import random
x = random.randint(100)
print(x)
15

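The draw above changes on every run. For reproducible experiments the generator can be seeded first; a small sketch using the legacy `np.random.seed` API (the notebook itself never sets a seed):

```python
import numpy as np

np.random.seed(42)                     # fix the seed so the sequence repeats
a = np.random.randint(0, 100, size=5)
np.random.seed(42)                     # same seed -> identical draws
b = np.random.randint(0, 100, size=5)
print(np.array_equal(a, b))
```

The same idea applies to `rand()`, `randn()`, `choice()` and the other functions used below.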
[35]: #bytes() and choice() methods
y=np.random.bytes(7)
print(y)
a=np.random.choice(['true','false'],size=(2,3))
print(a)
b"t'\n\x16\x14QB"
[['true' 'false' 'false']
 ['true' 'false' 'true']]
[36] : #complex number
x = random.rand(1) + random.rand(1)*1j
print(x)
print(x.real)
print(x.imag)
[0.08421058+0.69654499j]
[0.08421058]
[0.69654499]
[37] : #complex number using the rand() method
x = random.rand(1,5) + random.rand(1,5)*1j
print(x)

[[0.29653563+0.94629414j 0.56539718+0.58965768j 0.83340819+0.82456817j


0.16209606+0.15309722j 0.92519953+0.01018444j]]
[38] : #random() method
np.random.random(size=(2,2))+1j*np.random.random(size=(2,2))

[38] : array([[0.90898124+0.87349692j, 0.64895681+0.87327894j],


[0.7544518 +0.122983j, 0.4716534 +0.77610277j]])

[39] : #permutation()
np.random.permutation(5)
[39] : array([0, 3, 4, 1, 2])

[40] : a=np.array(5)
b=np.random.choice(a,size=5,p=[0.1,0.2,0.3,0.2,0.2])
print(b)

[4 3 0 3 4]
[41] : #randint()
np.random.randint(1,5)
[41]: 3

[42] : #randn()
a=np.random.randn(1,10)
print(a)

[[ 0.08009351 1.04758386 -0.15977457 0.60779634 0.12686552 -2.29032851


-0.53667358 -0.69266066 1.42867051 -0.34056088]]
[43] :
#choice()
a=np.array(['apple','bananaa','cherry'])
b=np.random.choice(a)
print(b)
bananaa
[44] : #shuffle()
np.random.shuffle(a)
print(a)
['cherry' 'apple' 'bananaa']

4 2.Manipulation Of Numpy Arrays

5 2.1 Indexing
Indexing in NumPy refers to accessing individual elements or groups of elements within an array.

[45] : #integer indexing
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
y = x[[0,1,2], [0,1,0]]
print(x[0,1])
2
[46] : a=[3,4,5,6,7]
print(a[0])
3
[47] : arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d[0]
[47] : array([[1, 2, 3],
[4, 5, 6]])

[48] : #copy()
old_values = arr3d[0].copy()
arr3d[0] = 42

print(arr3d)

[[[42 42 42]
[42 42 42]]

 [[ 7  8  9]
  [10 11 12]]]
[49] : import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])
7
[50] : import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print( arr[0, 1])


2
[51] : import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print( arr[1, 4])


10
[52] : import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[0, 1, 2])
6
[53] : import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print( arr[1, -1])


10

6 2.2 Slicing

Slicing in NumPy refers to the process of selecting a specific subset of elements from an array. It
allows you to create a new view of the original data without copying it, which can be very efficient
in terms of memory usage.

[54] : #slicing
import numpy as np
arr=np.array([5,6,7,8,9])
print(arr[1:3])
[6 7]
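Since a basic slice is a view of the original data, assignments through the slice are visible in the parent array; a brief illustrative sketch:

```python
import numpy as np

arr = np.array([5, 6, 7, 8, 9])
view = arr[1:3]      # basic slicing returns a view, not a copy
view[0] = 99         # writing through the view...
print(arr)           # ...changes the original array: [ 5 99  7  8  9]
```

To keep the original untouched, take an explicit copy with `arr[1:3].copy()`.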
[55] : import numpy as np
arr=np.array([5,6,7,3,6,8,9])
print(arr[1:])
[6 7 3 6 8 9]
[56] : import numpy as np
arr=np.array([1,2,3,4,5,8,9])
print(arr[1:])
[2 3 4 5 8 9]
[57] : arr=np.array([5,6,7,8,9])
print(arr[:3])

[5 6 7]
[58] : arr=np.array([5,6,7,8,9])
print(arr[-3:-1])
[7 8]
[59] : arr=np.array([5,6,7,8,9])
print(arr[:3])

[5 6 7]
[60] : arr=np.array([5,6,7,8,9])
print(arr[:3])

[5 6 7]
[61] : arr=np.array([5,6,7,8,9])
print(arr[-3:-1])
[7 8]

[62] : #slicing parameters separated by a
#colon : (start:stop:step) directly to the ndarray object
arr=np.array([5,6,7,8,4,5,6,7,9])
print(arr[1:5:2])
[6 8]
[63] : arr=np.array([5,6,7,8,4,5,6,7,9])
print(arr[-1:-5:-1])

[9 7 6 5]

[64]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])

[7 8 9]

[65]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 2])

[3 8]

[66]: import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[0:2, 1:4])

[[2 3 4]
[7 8 9]]
[67] : #string indexing
b = "Hello, World!"
print(b[2:5])

llo
[68]: b = "Hello, World!"
print(b[:5])

Hello

[69]: b = "Hello, World!"
print(b[2:])

llo, World!

7 2.3 Re-Shaping

Reshaping in NumPy is the process of changing the shape (i.e., dimensions) of an existing array
without altering the data. This is particularly useful when you need to transform an array to fit a
certain shape for further operations, such as machine learning or data processing tasks.

[70] : import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)
(2, 4)
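One reshape dimension can be left as -1 and NumPy infers it from the total size; a short sketch of this convenience (not shown in the cells below):

```python
import numpy as np

arr = np.arange(12)
a = arr.reshape(3, -1)    # -1 is inferred as 4, since 12 / 3 = 4
b = arr.reshape(-1, 6)    # -1 is inferred as 2, since 12 / 6 = 2
print(a.shape)            # (3, 4)
print(b.shape)            # (2, 6)
```

Only one dimension may be -1, and the remaining dimensions must divide the array's size evenly.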
[71]: import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)
[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)
[72] : import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

arr1 = arr.reshape(4, 3)
print(arr1)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[73]: import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

arr1 = arr.reshape(2, 2, 3)

print(arr1)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
[74]: import numpy as np
a=np.arange(8)
print(a.reshape(4,2))

[[0 1]
 [2 3]
 [4 5]
 [6 7]]
[75] : a=np.arange(12).reshape(4,3)
print(a)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]

8 2.4 Joining Arrays


Joining arrays in NumPy is a way of combining two or more arrays into a single array. There are
several ways to join arrays, depending on the desired result and the shape of the input arrays.

[76] : #concatenation
a1=np.arange(6).reshape(3,2)
a2=np.arange(6).reshape(3,2)
print(np.concatenate((a1,a2),axis=1))

[[0 1 0 1]
[2 3 2 3]
[4 5 4 5]]
[77] : #stack(); hstack() and vstack() appear below
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
print(np.stack((a,b)))

[[[1 2]
[3 4]]

[[5 6]
[7 8]]]
[78] : #stack()
print(np.stack((a,b),axis=0))

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
[79]: print(np.stack((a,b),axis=1))

[[[1 2]
  [5 6]]

 [[3 4]
  [7 8]]]
[80]: #hstack()
ch = np.hstack((a,b))
print(ch)

[[1 2 5 6]
[3 4 7 8]]
[81] : #vstack()
ch = np.vstack((a,b))
print(ch)

[[1 2]
[3 4]
[5 6]
[7 8]]

9 2.5 Splitting

Splitting in NumPy involves dividing an array into multiple sub-arrays. This can be useful when
you need to partition data for different processing purposes or when dealing with chunks of data
in a structured way.

[82] : import numpy as np
a = np.arange(9)
print(a)

[0 1 2 3 4 5 6 7 8]

[83]: #split()
b = np.split(a,3)
print(b)
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
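np.split() requires that the array divide evenly; when it does not, np.array_split() can be used instead. A brief sketch of the difference:

```python
import numpy as np

a = np.arange(8)
# np.split(a, 3) would raise ValueError: 8 elements cannot be split 3 equal ways.
parts = np.array_split(a, 3)   # allows unequal sub-arrays
print([p.tolist() for p in parts])
# [[0, 1, 2], [3, 4, 5], [6, 7]]
```

array_split() distributes the remainder over the leading sub-arrays, so the first groups are one element longer.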
[84]: #horizontal split-hsplit()
a = np.arange(12).reshape(4,3)
b=np.hsplit(a,3)
print(b)
[array([[0],
       [3],
       [6],
       [9]]), array([[ 1],
       [ 4],
       [ 7],
       [10]]), array([[ 2],
       [ 5],
       [ 8],
       [11]])]
[85] : #vertical split-vsplit()
b=np.vsplit(a,2)
print(b)
[array([[0, 1, 2],
       [3, 4, 5]]), array([[ 6,  7,  8],
       [ 9, 10, 11]])]

10 3.Computation On Numpy Arrays Using Universal Functions

11 3.1 Unary Universal Functions


Unary Universal Functions (also known as unary ufuncs) in NumPy are mathematical functions
that operate on a single input array element-wise. These functions apply a specific mathematical
operation to each element of an array independently, resulting in an output array of the same shape.

[86] : arr = np.arange(10)
print(arr)
[0 1 2 3 4 5 6 7 8 9]
[87]: #sqrt() function
np.sqrt(arr)

[87] : array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])

[88] : #exp()
np.exp(arr)
[88] : array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

[89] : #min()
np.min(arr)
[89]: 0

[90] : #max()
np.max(arr)
[90]: 9

[91] : #average()
np.average(arr)
[91]: 4.5

[92] : #abs()
print(np.abs(arr))
[0 1 2 3 4 5 6 7 8 9]
[93] : #fabs()
arr=np.arange(0,-5,-0.5)
print(np.fabs(arr))
[0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]

12 3.2 Binary Universal Functions


Binary Universal Functions (also known as binary ufuncs) operate on two input arrays element-wise.
These functions require two arrays (or one array and one scalar) and perform a mathematical
operation between corresponding elements.

[94] : x = np.random.randn(8)
y = np.random.randn(8)
print(x)
[ 1.11097262 -0.26995231  0.0060993   1.04398907 -1.82141342  0.00998652
  0.08274781  0.82046885]
[95] :
print(y)

[-0.05342373 0.10817525 -0.4610533 0.5755554 -0.66695438 0.25344274


1.40395846 -0.87447163]
[96] :
np.maximum(x, y)
[96] : array([ 1.11097262, 0.10817525, 0.0060993 , 1.04398907, -0.66695438,
0.25344274, 1.40395846, 0.82046885])

[97] : arr = np.random.randn(7) * 5

remainder, whole_part = np.modf(arr)
print(remainder)

[-0.98958028  0.75318997  0.47148313  0.96309562 -0.84443205  0.60019609
  0.41412946]
[98] :
print(whole_part)
[-6. 7. 2. 1. -2. 5. 3.]

[99] : import numpy as np


a = np.arange(9).reshape(3,3)
b = np.array([[10,10,10],[10,10,10],[10,10,10]])
print(np.add(a,b))
[[10 11 12]
[13 14 15]
[16 17 18]]
[100] :
np.subtract(a,b)
[100] : array([[-10, -9, -8],
[ -7, -6, -5],
[ -4, -3, -2]])

[101] : np.multiply(a,b)

[101] : array([[ 0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]])

[102] : np.divide(a,b)

[102] : array([[0. , 0.1, 0.2],
       [0.3, 0.4, 0.5],
       [0.6, 0.7, 0.8]])

[103] : import numpy as np
a = np.array([10,100,1000])
np.power(a,2)
[103] : array([ 100, 10000, 1000000], dtype=int32)

13 4. Compute Statistical and Mathematical Methods and Comparison Operations on rows/columns

13.1 4.1 Mathematical and Statistical methods on Numpy Arrays
NumPy provides a variety of mathematical and statistical methods to perform operations on arrays.

[104] : a = np.array([[3,7,5],[8,4,3],[2,4,9]])
a
[104] : array([[3, 7, 5],
       [8, 4, 3],
       [2, 4, 9]])

[105] : #sum()
a.sum()
[105]: 45

[106] : #percentile()
import numpy as np
a = np.array([[30,40,70],[80,20,10],[50,90,60]])
np.percentile(a,90)
[106]: 82.0

[107] : arr = np.random.randn(5, 4)

[108] : #mean()
arr.mean()
[108]: -0.14756616582071838

[109] : arr.mean(axis=1)

[109] : array([-0.93641711, 0.12758996, -0.44993246, 0.13099294, 0.38993583])

[110] : #median()
np.median(arr)
[110]: -0.28413298907449897

[111] : #standard deviation
np.std(arr)
[111]: 0.9329450218698545

[112] : #variance
np.var(arr)
[112]: 0.8703864138317433

[113] : #sum()
arr.sum(axis=0)
[113] : array([ 0.68253865, -2.88096912, -2.108008 , 1.35511515])

[114] : arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])


print(arr.cumsum())

[ 0 1 3 6 10 15 21 28]
[115] : arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(arr.cumsum(axis=0))
[[ 0  1  2]
 [ 3  5  7]
 [ 9 12 15]]
[116] : print(arr.cumprod(axis=1))

[[ 0 0 0]
[ 3 12 60]
[ 6 42 336]]

13.2 4.2 Comparison Operations


Comparison operations in NumPy allow element-wise comparison between arrays or with scalars.
[117] :
#array_equal()
a=np.array([[1,2],[3,4]])
b=np.array([[1,2],[3,4]])
print(np.array_equal(a,b))
True
[118] : a=np.array([1,15,6,8])
b=np.array([11,12,6,4])

[119] : #greater()
print(np.greater(a,b))

[False True False True]


[120] : print(np.greater(a[0],b[2]))
False
[121] : #greater_equal()
print(np.greater_equal(a,b))

[False  True  True  True]


[122] : #less()
print(np.less(a[0],b[2]))

True
[123] : print(np.less(a,b))
[ True False False False]

[124] : #less_equal()
print(np.less_equal(a,b))
[ True False True False]

14 5. Computation on Numpy Arrays using Sorting, unique and Set Operations

14.1 5.1 Sorting

Sorting helps to arrange elements of an array in a particular order.

[125] : import numpy as np
a = np.array([[3,7],[9,1]])
print(a)
[[3 7]
[9 1]]
[126] : #sort()
np.sort(a)
[126] : array([[3, 7],
[1, 9]])

[127] : np.sort(a,axis=0)

[127] : array([[3, 1],


[9, 7]])

[128] : np.sort(a,axis=1)

[128] : array([[3, 7],


[1, 9]])

[129] : arr = np.random.randn(5, 3)


print(arr)

[[-0.92147727 -0.67857177 -0.04478315]


[-0.30378745 -0.95433394 -1.83418572]
[-0.48103436 -0.55413111 1.28233061]
[ 0.76260305  1.30994277  0.32818117]
 [ 1.87598839 -0.35057108  0.47603584]]
[130] : arr.sort(1)
print(arr)
[[-0.92147727 -0.67857177 -0.04478315]
[-1.83418572 -0.95433394 -0.30378745]
[-0.55413111 -0.48103436 1.28233061]
[ 0.32818117 0.76260305 1.30994277]
[-0.35057108 0.47603584 1.87598839]]

14.2 5.2 Unique Operation


NumPy provides functions that perform set operations on arrays.

[131] : #unique()
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print(np.unique(names))
['Bob' 'Joe' 'Will']
[132] : # Contrast np.unique with the pure Python alternative:
sorted(set(names))
[132]: ['Bob', 'Joe', 'Will']
[133]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
print(np.unique(ints))
[1 2 3 4]

14.3 5.3 Set Operations

[134]: #in1d() method
import numpy as np
values = np.array([6, 0, 0, 3, 2, 5, 6])
print(np.in1d(values, [2, 3, 6]))

[ True False False True True False True]


[135]: arr1=np.array([1,2,3,4])
arr2=np.array([3,4,5,6])
[136]: #union1d()
print(np.union1d(arr1,arr2))
[1 2 3 4 5 6]
[137]: #intersect1d()
print(np.intersect1d(arr1,arr2))
[3 4]
[138]: #setdiff1d()
print(np.setdiff1d(arr1,arr2))
[1 2]
[139]: #setxor1d()
print(np.setxor1d(arr1,arr2))
[1 2 5 6]

15 6. Load an image file and do crop and flip operation using Numpy indexing

To load and manipulate images with NumPy, you can use the Pillow (PIL) library to load an
image and convert it into a NumPy array.

[8] : from PIL import Image
img=Image.open("img.jpg")
img.format
[8] : 'JPEG'

[9] : import numpy as np


a=np.array(img)
print(a)

[[[242 242 242]
[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[242 242 242]


[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[242 242 242]


[242 242 242]
[242 242 242]

[195 195 195]
[195 195 195]
[195 195 195]]

[[208 208 208]


[208 208 208]
[206 206 206]

[163 163 163]
[163 163 163]
[163 163 163]]

[[208 208 208]


[207 207 207]
[206 206 206]

[164 164 164]
[164 164 164]
[164 164 164]]

[[207 207 207]


[207 207 207]
[205 205 205]

[165 165 165]
[165 165 165]
[165 165 165]]]
[10] : from IPython.display import display
# Display the original, cropped and flipped images
display(Image.fromarray(a))

[11] : crop_img=a[100:900,100:900,:]
img_out=Image.fromarray(crop_img)
img_out
[11]:

[12] : flipped_img=np.flipud(a)
display(Image.fromarray(flipped_img))

[ ]:

Data Manipulation with Pandas

1 1.create pandas series from python List ,Numpy


Arrays and Dictionary

2 1.1 Pandas Series From Python List


[1] : import pandas as pd
import numpy as np
data=[4,7,-5,3]
a=pd.Series(data)
print(a)

0 4
1 7
2 -5
3 3
dtype: int64

[2] : # import pandas lib. as pd
import pandas as pd

# create Pandas Series with define indexes


x = pd.Series([10, 20, 30, 40, 50], index =['a', 'b', 'c', 'd', 'e'])

# print the Series


print(x)
a 10
b 20
c 30
d 40
e 50
dtype: int64
[3] : import pandas as pd

ind = [10, 20, 30, 40, 50, 60, 70]

lst = ['G', 'h', 'i', 'j', 'k', 'l', 'm']

# create Pandas Series with define indexes


x = pd.Series(lst, index = ind)

# print the Series


print(x)
10 G
20 h
30 i
40 j
50 k
60 l
70 m
dtype: object
3 1.2 Pandas Series From Numpy arrays

[4] : import pandas as pd
import numpy as np

# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])

# creating series
s = pd.Series(data)
print(s)

0 a
1 b
2 c
3 d
4 e
dtype: object

[5] : # importing Pandas & numpy


import pandas as pd
import numpy as np

# numpy array
data = np.array(['a', 'b', 'c', 'd', 'e'])

# creating series
s = pd.Series(data, index =[1000, 1001, 1002, 1003, 1004])
print(s)

1000 a
1001 b
1002 c
1003 d
1004 e
dtype: object

[6] : numpy_array = np.array([1, 2.8, 3.0, 2, 9, 4.2])


# Convert NumPy array to Series
s = pd.Series(numpy_array, index=list('abcdef'))
print("Output Series:")
print(s)
Output Series:
a 1.0
b 2.8
c 3.0
d 2.0
e 9.0
f 4.2
dtype: float64
4 1.3 Pandas Series From Dictionary

[7] : import pandas as pd
# create a dictionary
dictionary = {'D': 10, 'B': 20, 'C': 30}

# create a series
series = pd.Series(dictionary)

print(series)
D 10
B 20
C 30
dtype: int64
[8] : # import the pandas lib as pd
import pandas as pd
# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}

# create a series
series = pd.Series(dictionary, index=['B','C','A'])

print(series)

B 10
C 80
A 50
dtype: int64
[9] :
import pandas as pd

# create a dictionary
dictionary = {'A': 50, 'B': 10, 'C': 80}

# create a series
series = pd.Series(dictionary, index=['B', 'C', 'D', 'A'])
print(series)
B 10.0
C 80.0
D NaN
A 50.0
dtype: float64

4.1 2. Data Manipulation with Pandas Series

4.2 2.1 Indexing

[10] : import pandas as pd
import numpy as np

# creating simple array


data = np.array(['s','p','a','n','d','a','n','a'])
ser = pd.Series(data,index=[10,11,12,13,14,15,16,17])
print(ser[16])
n
[11] : import pandas as pd

Date = ['1/1/2018', '2/1/2018', '3/1/2018', '4/1/2018']


Index_name = ['Day 1', 'Day 2', 'Day 3', 'Day 4']
sr = pd.Series(data = Date, index = Index_name)
print(sr)
Day 1 1/1/2018
Day 2 2/1/2018

Day 3 3/1/2018
Day 4 4/1/2018
dtype: object
[12] :
print(sr['Day 1'])
1/1/2018

[13] : import numpy as np


import pandas as pd
s=pd.Series(np.arange(5.),index=['a','b','c','d','e'])
print(s)
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
4.3 2.2 Selecting

[24] : import numpy as np
import pandas as pd
s=pd.Series(np.arange(5.),index=['a','b','c','d','e'])
print(s)
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[19] : s['b']

[19]: 1.0

[26]: s[['b','a','d']]

[26] : b 1.0
a 0.0
d 3.0
dtype: float64

[27] : s['b':'e']

[27]: b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[20] : s[1]

[20]: 1.0

[21] : s[2:4]

[21]: c 2.0
d 3.0
dtype: float64

[23]: s[[1,3]]

[23]: b 1.0
d 3.0
dtype: float64

[28] :
print(s[[0, 2, 4]])

a 0.0
c 2.0
e 4.0
dtype: float64
4.4 2.3 Filtering

[4] : import numpy as np
import pandas as pd
s=pd.Series(np.arange(5.),index=['a','b','c','d','e'])
print(s)
a 0.0
b 1.0
c 2.0
d 3.0
e 4.0
dtype: float64
[32]: s[s<2]

[32]: a 0.0
dtype: float64

[36]: s[s>2]

[36]: b 5.0
d 3.0
e 4.0
dtype: float64

[35]: s[s!=2]
[35]: a 0.0
b 5.0
d 3.0
e 4.0
dtype: float64
[38]: s[(s>2)&(s<5) ]

[38]: d 3.0
e 4.0
dtype: float64

[33]: s['b':'c']

[33]: b 5.0
c 2.0
dtype: float64
[7] : print(s[1:2]==5)
b True
dtype: bool

[42] : s[s.isin([2,4])]

[42]: c 2.0
e 4.0
dtype: float64

4.5 2.4 Arithmetic Operations

[8] : import pandas as pd
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([6, 7, 8, 9, 10])

[3]: series3 = series1 + series2


print(series3)

0 7
1 9
2 11
3 13
4 15
dtype: int64
[4] : series3 = series1 - series2
print(series3)
0 -5
1 -5
2 -5
3 -5
4 -5
dtype: int64
[5] : series3 = series1 * series2
print(series3)
0 6
1 14
2 24
3 36
4 50
dtype: int64
[6] : series3 = series1 / series2
print(series3)
0 0.166667
1 0.285714
2 0.375000
3 0.444444
4 0.500000
dtype: float64
[9] : series3 = series1 % series2
print(series3)
0 1
1 2
2 3
3 4
4 5
dtype: int64
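The two series above share the same default index; when indexes differ, pandas aligns on labels and fills unmatched positions with NaN. A brief sketch:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2      # aligned on the union of the two indexes
print(total)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
```

Labels present in only one operand ('a' and 'd') produce NaN, and the result is upcast to float64 to hold them.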

4.6 2.5 Ranking

[10] : import pandas as pd
s=pd.Series([121,211,153,214,115,116,237,118,219,120])
s.rank(ascending=True)

[10]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[49]: s.rank(ascending=False)

[49]: 0 6.0
1 4.0
2 5.0
3 3.0
4 10.0
5 9.0
6 1.0
7 8.0
8 2.0
9 7.0
dtype: float64

[11]: s.rank(method='min')

[11]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[12]: s.rank(method='max')

[12]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64

[50]: s.rank(method='first')

[50]: 0 5.0
1 7.0
2 6.0
3 8.0
4 1.0
5 2.0
6 10.0
7 3.0
8 9.0
9 4.0
dtype: float64
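All values in the series above are distinct, so method='min', 'max' and 'first' return identical ranks; the methods only differ when there are ties. A small sketch with hypothetical duplicate data:

```python
import pandas as pd

t = pd.Series([7, 3, 7, 1])                # the two 7s tie for ranks 3 and 4
print(t.rank().tolist())                   # average (default): [3.5, 2.0, 3.5, 1.0]
print(t.rank(method='min').tolist())       # ties share the lowest rank: [3.0, 2.0, 3.0, 1.0]
print(t.rank(method='max').tolist())       # ties share the highest rank: [4.0, 2.0, 4.0, 1.0]
print(t.rank(method='first').tolist())     # ties broken by position: [3.0, 2.0, 4.0, 1.0]
```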

4.7 2.6 Sorting

[52] : import pandas as pd
sr = pd.Series([19.5, 16.8, 22.78, 20.124, 18.1002])
print(sr)

0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002
dtype: float64

[8] : sr.sort_values(ascending = False)

[8]: 2 22.7800
3 20.1240
0 19.5000
4 18.1002
1 16.8000
dtype: float64
[53]: sr.sort_values(ascending = True)
[53]: 1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64

[55]: sr.sort_index()
[55]: 0 19.5000
1 16.8000
2 22.7800
3 20.1240
4 18.1002

dtype: float64
[58] : print(sr.sort_values(kind='mergesort'))

1 16.8000
4 18.1002
0 19.5000
3 20.1240
2 22.7800
dtype: float64
4.8 2.7 Checking null values

[40] : s=pd.Series({'ohio':35000,'teyas':71000,'oregon':16000,'utah':5000})
print(s)
states=['california','ohio','Texas','oregon']
x=pd.Series(s,index=states)
print(x)

ohio 35000
teyas 71000
oregon 16000
utah 5000
dtype: int64
california NaN
ohio 35000.0
Texas NaN
oregon 16000.0
dtype: float64

[42] : x.isnull()
[42]: california True
ohio False
Texas True
oregon False
dtype: bool

[44]: x.notnull()

[44]: california False
ohio True
Texas False
oregon True
dtype: bool

4.9 2.8 Concatenation

[19] : # creating the Series
series1 = pd.Series([1, 2, 3])
series2 = pd.Series(['A', 'B', 'C'])
[65]: # concatenating
display(pd.concat([series1, series2]))
0 1
1 2
2 3
0 A
1 B
2 C
dtype: object
[66] : display(pd.concat([series1, series2], axis = 1))
0 1
0 1 A
1 2 B
[67] 2 3 C
:
display(pd.concat([series1, series2],
axis = 0))
0 1
1 2
2 3
0 A
1 B

2 C
dtype: object
[21] : print(pd.concat([series1, series2], ignore_index=True))

0 1
1 2
2 3
3 A
4 B
5 C
dtype: object
[22] : print(pd.concat([series1, series2], ignore_index=False))

0 1
1 2
2 3
0 A
1 B
2 C
dtype: object
[69] : print(pd.concat([series1, series2], keys=['series1', 'series2']))

series1 0 1
1 2
2 3
series2 0 A
1 B
2 C
dtype: object

4.10 3. Creating DataFrames from List and Dictionary

4.11 3.1 From List

[16] : data = [1, 2, 3, 4, 5]

# Convert to DataFrame
df = pd.DataFrame(data, columns=['Numbers'])
print(df)
Numbers
0 1
1 2
2 3
3 4
4 5

[70]: import pandas as pd
nme = ["aparna", "pankaj", "sudhir", "Geeku"]
deg = ["MBA", "BCA", "M.Tech", "MBA"]
scr = [90, 40, 80, 98]
d = {'name': nme, 'degree': deg, 'score': scr}
df = pd.DataFrame(d)
print(df)
name degree score
0 aparna MBA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku MBA 98
[38]: import pandas as pd
data = [['G', 10], ['h', 15], ['i', 20]]
# Create the pandas Dataframe
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# print dataframe.
print(df)
Name Age
0 G 10
1 h 15
2 i 20

4.12 3.2 From Dictionary

[39] : df=pd.DataFrame({'a':[4,5,6],'b':[7,8,9],'c':[10,11,12]},index=[1,2,3])
print(df)
a b c
1 4 7 10
2 5 8 11
3 6 9 12
[13] : df=pd.DataFrame({'state':['AP','AP','AP','TS','TS','TS'],'year':[2000,2001,2002,2000,2001,2002],'pop':[1.5,1.7,3.6,2.4,2.9,3.2]})

print(df)

state year pop


0 AP 2000 1.5
1 AP 2001 1.7
2 AP 2002 3.6
3 TS 2000 2.4
4 TS 2001 2.9
5 TS 2002 3.2

[14] : df=pd.DataFrame({'a':[4,5,6],'b':[7,8,9]},index=pd.MultiIndex.from_tuples([('d',1),('d',2),('e',2)],names=['n','v']))
print(df)

a b
n v
d 1 4 7
2 5 8
e 2 6 9
[71]: df=pd.DataFrame({'ap':{'a':0.0,'c':3.0,'d':6.0},'ts':{'a':1.0,'c':4.0,'d':7.0},'tn':{'a':2.0,'c':5.0,'d':8.0}})

[71]: df.reindex(['a','b','c','d'])
ap ts tn
a 0.0 1.0 2.0
b NaN NaN NaN
c 3.0 4.0 5.0
d 6.0 7.0 8.0

4.13 4. Import various file formats to pandas DataFrames and perform the following

4.14 4.1 Importing file

[10] : import pandas as pd
data=pd.read_csv('bird.csv')
data

[10]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW
3 SW

4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[420 rows x 12 columns]
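Only read_csv is demonstrated in this section; pandas offers analogous readers such as read_json (and read_excel, which needs an engine like openpyxl). A self-contained sketch using in-memory text in place of a file like bird.csv:

```python
import io
import pandas as pd

# Hypothetical miniature data in two formats, held in memory instead of on disk.
csv_text = "id,huml\n0,80.78\n1,88.91\n"
json_text = '[{"id": 0, "huml": 80.78}, {"id": 1, "huml": 88.91}]'

df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv.shape)             # (2, 2)
print(df_csv.equals(df_json))   # both readers yield the same DataFrame here
```

For real files, the same calls take a path: pd.read_json('bird.json'), and so on.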

4.15 4.2 Display top and bottom five rows

[15] : data.head(5)
[15] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84 SW
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01 SW
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34 SW
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41 SW
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13 SW

[16] : data.tail(5)

[16]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
415 SO
416 SO
417 SO
418 SO
419 SO

4.16 4.3 Get shape, data type, null values, index and column details

[17] : data.shape
[17]: (420, 12)

[18] : data.dtypes

[18] : id int64
huml float64
humw float64
ulnal float64
ulnaw float64
feml float64
femw float64
tibl float64
tibw float64
tarl float64
tarw float64
type object
dtype: object

[19] : data.isnull().sum()

[19]: id 0
huml 1
humw 1
ulnal 3
ulnaw 2
feml 2
femw 1
tibl 2
tibw 1
tarl 1
tarw 1
type 0
dtype: int64

[20] : data.columns

[20]: Index(['id', 'huml', 'humw', 'ulnal', 'ulnaw', 'feml', 'femw', 'tibl',


'tibw', 'tarl', 'tarw', 'type'],
dtype='object')

[21]: data.index

[21]: RangeIndex(start=0, stop=420, step=1)

4.17 4.4 Select/Delete the records rows/columns based on conditions

[24] : data.loc[data['huml']>4]

[24]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34

3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW
3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[419 rows x 12 columns]

[25] : data.drop([0,3])

[25]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
5 5 61.92 4.78 50.46 3.47 49.52 4.41 56.95 2.73 29.07 2.83
6 6 79.73 5.94 67.39 4.50 42.07 3.41 71.26 3.56 37.22 3.64
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
1 SW
2 SW
4 SW
5 SW
6 SW
.. …

415 SO
416 SO
417 SO
418 SO
419 SO

[418 rows x 12 columns]

[27]: data.drop(data[data['huml']>4.3].index)

[27] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw type
342 342 NaN NaN NaN NaN 32.54 2.65 55.06 2.81 38.94 2.25 SO

[28] : data.loc[6,'ulnal']

[28]: 67.39

[29] : data.loc[11:15][['huml','humw']]

[29] : huml humw


11 186.00 9.83
12 172.00 8.44
13 148.91 6.78
14 149.19 6.98
15 140.59 6.59
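Note that `data.loc[11:15]` returned five rows, not four: label-based slicing with `.loc` includes both endpoints, unlike position-based slicing. A minimal check on a throwaway frame:

```python
import pandas as pd

df = pd.DataFrame({'a': range(20)})
by_label = df.loc[11:15]      # label-based: includes both 11 and 15
by_position = df.iloc[11:15]  # position-based: excludes 15
print(len(by_label), len(by_position))  # 5 4
```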

4.18 4.5 Sorting and Ranking operations in DataFrame

[30]: data

[30]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
.. … … … … … … … … … … …
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05

type
0 SW
1 SW
2 SW

3 SW
4 SW
.. …
415 SO
416 SO
417 SO
418 SO
419 SO

[420 rows x 12 columns]

[31] : data.sort_index(ascending=False)

[31]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
419 419 17.89 1.44 19.26 1.10 17.62 1.34 29.81 1.24 21.69 1.05
418 418 20.38 1.78 22.53 1.50 21.35 1.48 36.09 1.53 25.98 1.24
417 417 18.79 1.63 19.83 1.53 20.96 1.43 34.45 1.41 22.86 1.21
416 416 19.21 1.64 20.76 1.49 19.24 1.45 33.21 1.28 23.60 1.15
415 415 17.96 1.63 19.25 1.33 18.36 1.54 31.25 1.33 21.99 1.15
.. … … … … … … … … … … …
4 4 62.80 4.84 52.09 3.73 33.95 2.72 56.27 2.96 31.88 3.13
3 3 77.65 5.70 65.76 4.77 40.04 3.52 69.17 3.40 35.78 3.41
2 2 79.97 6.37 69.26 5.28 43.07 3.90 75.35 4.04 38.31 3.34
1 1 88.91 6.63 80.53 5.59 47.04 4.30 80.22 4.51 41.50 4.01
0 0 80.78 6.68 72.01 4.88 41.81 3.70 5.50 4.03 38.70 3.84

type
419 SO
418 SO
417 SO
416 SO
415 SO
.. …
4 SW
3 SW
2 SW
1 SW
0 SW

[420 rows x 12 columns]

[32] : data.sort_values(['ulnaw']).head(6)

[32] : id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88

367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69

type
369 SO
413 SO
395 SO
367 SO
414 SO
376 SO

[33] : data.sort_values(by=['ulnaw','ulnal']).head(6)

[33]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
369 369 13.48 1.27 16.00 1.00 12.67 1.10 23.12 0.88 16.34 0.89
413 413 12.95 1.16 14.09 1.03 13.03 1.03 22.13 0.96 15.19 1.02
414 414 13.63 1.16 15.22 1.06 13.75 0.99 23.13 0.96 15.62 1.01
367 367 13.31 1.17 16.47 1.06 12.32 0.93 22.47 0.95 15.97 0.75
395 395 15.62 1.28 18.52 1.06 15.75 1.17 28.63 1.03 21.39 0.88
376 376 13.52 1.28 17.88 1.07 15.10 1.05 25.14 1.23 17.81 0.69

type
369 SO
413 SO
414 SO
367 SO
395 SO
376 SO
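With multiple sort keys, ties on the first key are broken by the second, and `ascending` can take a list so each key sorts in its own direction. A small sketch with made-up measurements:

```python
import pandas as pd

df = pd.DataFrame({'ulnaw': [1.06, 1.06, 1.00],
                   'ulnal': [16.47, 18.52, 16.00]})
# Sort ulnaw ascending; within ties, sort ulnal descending
out = df.sort_values(by=['ulnaw', 'ulnal'], ascending=[True, False])
print(out.index.tolist())  # [2, 1, 0]
```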

[34] : data.rank().head(10)

[34]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0
2 3.0 286.0 336.0 268.0 334.0 295.0 303.5 292.0 303.5 271.0 305.5
3 4.0 284.0 308.0 255.0 313.5 279.0 288.0 270.0 272.5 247.0 310.5
4 5.0 248.0 281.0 227.5 258.0 224.0 225.5 231.0 250.0 211.0 294.0
5 6.0 246.0 275.0 223.0 242.0 326.0 322.0 234.0 234.0 181.0 268.5
6 7.0 285.0 321.0 262.0 304.0 292.0 282.5 279.0 280.5 259.0 320.0
7 8.0 304.0 306.0 278.0 306.0 300.0 299.0 296.0 295.5 266.0 324.0
8 9.0 362.0 370.0 354.0 362.0 365.0 356.5 363.5 359.0 352.0 346.0
9 10.0 387.0 399.0 381.5 383.0 382.0 398.0 382.0 397.0 392.0 377.0

type
0 274.5
1 274.5
2 274.5
3 274.5
4 274.5
5 274.5
6 274.5
7 274.5
8 274.5
9 274.5

[35] : data.rank().head(2)

[35]: id huml humw ulnal ulnaw feml femw tibl tibw tarl tarw \
0 1.0 289.0 344.0 275.0 325.5 289.0 295.0 1.0 302.0 272.0 328.0
1 2.0 308.0 343.0 284.0 343.0 312.0 320.0 308.0 327.5 285.0 333.0

type
0 274.5
1 274.5

[15]: data.rank(ascending=False).head(5)
[15]: PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket \
0 891.0 617.0 246.0 783.0 289.0 497.0 179.0 552.5 220.0
1 890.0 171.5 783.5 701.0 734.5 183.0 179.0 552.5 112.0
2 889.0 171.5 246.0 538.0 734.5 404.5 587.5 552.5 17.0
3 888.0 171.5 783.5 619.0 734.5 226.5 179.0 552.5 824.5
4 887.0 617.0 246.0 876.0 289.0 226.5 587.5 552.5 283.0

Fare Cabin Embarked
0 815.0 NaN 322.5
1 103.0 94.0 805.5
2 659.5 NaN 322.5
3 144.0 134.5 322.5
4 628.0 NaN 322.5
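By default `rank()` gives tied values the average of the ranks they occupy; other tie-breaking rules are available via `method`. A minimal example:

```python
import pandas as pd

s = pd.Series([7, 7, 3])
print(s.rank().tolist())                 # [2.5, 2.5, 1.0] — ties share the average rank
print(s.rank(method='min').tolist())     # [2.0, 2.0, 1.0] — ties take the lowest rank
print(s.rank(ascending=False).tolist())  # [1.5, 1.5, 3.0] — largest value ranked first
```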

4.18.1 4.6 Statistical operations


[23]: import pandas as pd
data = pd.read_csv('gym-track.csv')
data

[23]: Age Gender Weight (kg) Height (m) Max_BPM Avg_BPM Resting_BPM \
0 56 Male 88.3 1.71 180 157 60
1 46 Female 74.9 1.53 179 151 66
2 32 Female 68.1 1.66 167 122 54
3 25 Male 53.2 1.70 190 164 56
4 38 Male 46.1 1.79 188 158 68

.. … … … … … … …
968 24 Male 87.1 1.74 187 158 67
969 25 Male 66.6 1.61 184 166 56
970 59 Female 60.4 1.76 194 120 53
971 32 Male 126.4 1.83 198 146 62
972 46 Male 88.7 1.63 166 146 66

Session_Duration (hours) Calories_Burned Workout_Type Fat_Percentage \
0 1.69 1313.0 Yoga 12.6
1 1.30 883.0 HIIT 33.9
2 1.11 677.0 Cardio 33.4
3 0.59 532.0 Strength 28.8
4 0.64 556.0 Strength 29.2
.. … … … …
968 1.57 1364.0 Strength 10.0
969 1.38 1260.0 Strength 25.0
970 1.72 929.0 Cardio 18.8
971 1.10 883.0 HIIT 28.2
972 0.75 542.0 Strength 28.8

Water_Intake (liters) Workout_Frequency (days/week) Experience_Level \
0 3.5 4 3
1 2.1 4 2
2 2.3 4 2
3 2.1 3 1
4 2.8 3 1
.. … … …
968 3.5 4 3
969 3.0 2 1
970 2.7 5 3
971 2.1 3 2
972 3.5 2 1

BMI
0 30.20
1 32.00
2 24.71
3 18.41
4 14.39
.. …
968 28.77
969 25.69
970 19.50
971 37.74
972 33.38

[973 rows x 15 columns]

[25]: data['Age'].mean()

[25]: 38.68345323741007

[28]: data['Age'].median()

[28]: 40.0

[29]: data['Age'].std()

[29]: 12.180927866987108

[30] : data['Age'].sum()

[30]: 37639

[31] : data['Age'].var()

[31]: 148.37500370074312
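The same statistics can be checked by hand on a tiny Series. Note that pandas' `var()` and `std()` use the sample definition (ddof=1) by default:

```python
import pandas as pd

ages = pd.Series([10, 20, 30, 40])
print(ages.mean())           # 25.0
print(ages.median())         # 25.0
print(ages.sum())            # 100
print(round(ages.var(), 4))  # 166.6667  (= 500 / 3 with ddof=1)
```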

4.18.2 4.7 Count and uniqueness of given categorical values


[35]: data.count()

[35]: Age 973


Gender 973
Weight (kg) 973
Height (m) 973
Max_BPM 973
Avg_BPM 973
Resting_BPM 973
Session_Duration (hours) 973
Calories_Burned 973
Workout_Type 973
Fat_Percentage 973
Water_Intake (liters) 973
Workout_Frequency (days/week) 973
Experience_Level 973
BMI 973
dtype: int64
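`count()` only reports non-null entries; uniqueness of a categorical column is usually inspected with `nunique()`, `unique()`, and `value_counts()`. A sketch on a hypothetical workout-type column:

```python
import pandas as pd

workout = pd.Series(['Yoga', 'HIIT', 'Cardio', 'HIIT', 'Yoga', 'Yoga'])
print(workout.count())            # 6 non-null entries
print(workout.nunique())          # 3 distinct categories
print(workout.unique().tolist())  # ['Yoga', 'HIIT', 'Cardio'] in order of appearance
print(workout.value_counts())     # frequency of each category, most common first
```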

Data cleaning and preparation

a) Handling missing data by detecting, dropping, and replacing/filling missing values
import pandas as pd
import numpy as np

Student performance
Import any CSV file into a pandas DataFrame and perform the following
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('student.csv')
df

Hours_Studied Attendance Parental_Involvement


Access_to_Resources \
0 23 84 Low
High
1 19 64 Low
Medium
2 24 98 Medium
Medium
3 29 89 Low
Medium
4 19 92 Medium
Medium
... ... ... ... ...
6602 25 69 High
Medium
6603 23 76 High
Medium
6604 20 90 Medium
Low
6605 10 86 High
High
6606 15 67 Medium
Low
Extracurricular_Activities Sleep_Hours Previous_Scores \
0 No 7 73
1 No 8 59
2 Yes 7 91
3 Yes 8 98
4 Yes 6 65
... ... ... ...
6602 No 7 76

6603 No 8 81
6604 Yes 6 65
6605 Yes 6 91
6606 Yes 9 94

Motivation_Level Internet_Access Tutoring_Sessions Family_Income


\
0 Low Yes 0 Low

1 Low Yes 2 Medium

2 Medium Yes 2 Medium

3 Medium Yes 1 Medium

4 Medium Yes 3 Medium

... ... ... ... ...

6602 Medium Yes 1 High

6603 Medium Yes 3 Low

6604 Low Yes 3 Low

6605 High Yes 2 Low

6606 Medium Yes 0 Medium

Teacher_Quality School_Type Peer_Influence Physical_Activity \


0 Medium Public Positive 3
1 Medium Public Negative 4
2 Medium Public Neutral 4
3 Medium Public Negative 4
4 High Public Neutral 4
... ... ... ... ...
6602 Medium Public Positive 2
6603 High Public Positive 2
6604 Medium Public Negative 2
6605 Medium Private Positive 3
6606 Medium Public Positive 4

Learning_Disabilities Parental_Education_Level Distance_from_Home


\
0 No High School Near

1 No College Moderate

2 No Postgraduate Near

3 No High School Moderate

4 No College Near

... ... ... ...

6602 No High School Near

6603 No High School Near

6604 No Postgraduate Near

6605 No High School Far

6606 No Postgraduate Near

Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64

[6607 rows x 20 columns]

# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores

Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

# 1. Detect missing data


missing_data = df.isnull()
print("\nMissing Data:")
print(missing_data.head(10))

Missing Data:
Hours_Studied Attendance Parental_Involvement
Access_to_Resources \
0 False False False
False
1 False False False

False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 False False False
False
1 False False False
False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality


\
0 False False False False

1 False False False False

2 False False False False

3 False False False False

4 False False False False

5 False False False False

6 False False False False

7 False False False False

8 False False False False

9 False False False False

School_Type Peer_Influence Physical_Activity


Learning_Disabilities \
0 False False False
False
1 False False False
False
2 False False False
False
3 False False False
False
4 False False False
False
5 False False False
False
6 False False False
False
7 False False False
False
8 False False False
False
9 False False False
False

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False False False
5 False False False False
6 False False False False
7 False False False False
8 False False False False
9 False False False False

# No of null values
n=df.isnull().sum()
n

Hours_Studied 0
Attendance 0
Parental_Involvement 0
Access_to_Resources 0
Extracurricular_Activities 0
Sleep_Hours 0
Previous_Scores 0
Motivation_Level 0
Internet_Access 0
Tutoring_Sessions 0
Family_Income 0
Teacher_Quality 78
School_Type 0
Peer_Influence 0
Physical_Activity 0
Learning_Disabilities 0
Parental_Education_Level 90
Distance_from_Home 67
Gender 0
Exam_Score 0
dtype: int64

# 2. Drop rows with missing values


df_dropna = df.dropna()
print("\nDataFrame after dropping rows with missing values:")
print(df_dropna.head(10))

DataFrame after dropping rows with missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

5 19 88 Medium Medium

6 29 84 Medium Low

7 25 78 Low High

8 17 94 Medium High

9 23 98 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
5 Yes 8 89
Medium
6 Yes 7 68
Low
7 Yes 6 50
Medium
8 No 6 80
High
9 Yes 8 71
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High
5 Yes 3 Medium Medium
6 Yes 1 Low Medium
7 Yes 1 High High
8 Yes 0 Medium Low
9 Yes 0 High High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

5 Public Positive 3 No

6 Private Neutral 2 No

7 Public Negative 2 No

8 Private Neutral 1 No

9 Public Positive 5 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
5 Postgraduate Near Male 71
6 High School Moderate Male 67
7 High School Far Male 66
8 College Near Male 69
9 High School Moderate Male 72

# 3. Fill missing values with a specific value (e.g., mean, median, or a custom value)

# Fill missing values in the 'Attendance' column with that column's mean
mean_attendance = df['Attendance'].mean()
df_fillna = df.fillna({'Attendance': mean_attendance})
print("\nDataFrame after filling missing values:")
print(df_fillna.head(10))

DataFrame after filling missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

5 19 88 Medium Medium

6 29 84 Medium Low

7 25 78 Low High

8 17 94 Medium High

9 23 98 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium
5 Yes 8 89
Medium
6 Yes 7 68
Low
7 Yes 6 50
Medium
8 No 6 80
High
9 Yes 8 71
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High
5 Yes 3 Medium Medium
6 Yes 1 Low Medium
7 Yes 1 High High
8 Yes 0 Medium Low
9 Yes 0 High High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

5 Public Positive 3 No

6 Private Neutral 2 No

7 Public Negative 2 No

8 Private Neutral 1 No

9 Public Positive 5 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
5 Postgraduate Near Male 71
6 High School Moderate Male 67
7 High School Far Male 66
8 College Near Male 69
9 High School Moderate Male 72

# 4. Replace missing values with a custom value

# For example, replace missing values in 'Hours_Studied' with 'Unknown'
df_replace = df.fillna({'Hours_Studied': 'Unknown'})
print("\nDataFrame after replacing missing values:")
print(df_replace.head())

DataFrame after replacing missing values:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \

0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
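The detect/drop/fill steps above can be reproduced end-to-end on a tiny frame with deliberately planted gaps (column names borrowed from the student dataset, values made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Attendance': [84.0, np.nan, 98.0, np.nan],
                   'Teacher_Quality': ['Medium', 'High', None, 'Low']})

print(len(df.dropna()))  # 1 — only row 0 is complete

# Fill each column with its own replacement value
filled = df.fillna({'Attendance': df['Attendance'].mean(),
                    'Teacher_Quality': 'Unknown'})
print(filled['Attendance'].tolist())  # [84.0, 91.0, 98.0, 91.0] — NaNs become the mean
print(filled.isnull().sum().sum())    # 0
```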

b) Transform data using the apply() and map() methods


# Load the CSV file into a Pandas DataFrame
# Replace 'data.csv' with the actual file path if needed
df=pd.read_csv('student.csv')

# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
Hours_Studied Attendance Parental_Involvement Access_to_Resources

\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No 7 73
Low
1 No 8 59
Low
2 Yes 7 91
Medium
3 Yes 8 98
Medium
4 Yes 6 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70

# 1. Transform using the apply() method

# Square the values in 'Previous_Scores' and store the result in 'Sleep_Hours'
df['Sleep_Hours'] = df['Previous_Scores'].apply(lambda x: x ** 2)
df

Hours_Studied Attendance Parental_Involvement


Access_to_Resources \
0 23 84 Low
High
1 19 64 Low
Medium
2 24 98 Medium
Medium
3 29 89 Low
Medium
4 19 92 Medium
Medium
... ... ... ... ...
6602 25 69 High
Medium
6603 23 76 High
Medium
6604 20 90 Medium
Low
6605 10 86 High
High
6606 15 67 Medium
Low

Extracurricular_Activities Sleep_Hours Previous_Scores \


0 No 5329 73
1 No 3481 59
2 Yes 8281 91
3 Yes 9604 98
4 Yes 4225 65
... ... ... ...
6602 No 5776 76
6603 No 6561 81
6604 Yes 4225 65
6605 Yes 8281 91
6606 Yes 8836 94

Motivation_Level Internet_Access Tutoring_Sessions Family_Income


\
0 Low Yes 0 Low

1 Low Yes 2 Medium

2 Medium Yes 2 Medium

3 Medium Yes 1 Medium

4 Medium Yes 3 Medium

... ... ... ... ...

6602 Medium Yes 1 High

6603 Medium Yes 3 Low

6604 Low Yes 3 Low

6605 High Yes 2 Low

6606 Medium Yes 0 Medium

Teacher_Quality School_Type Peer_Influence Physical_Activity \


0 Medium Public Positive 3
1 Medium Public Negative 4
2 Medium Public Neutral 4
3 Medium Public Negative 4
4 High Public Neutral 4
... ... ... ... ...
6602 Medium Public Positive 2
6603 High Public Positive 2
6604 Medium Public Negative 2
6605 Medium Private Positive 3
6606 Medium Public Positive 4

Learning_Disabilities Parental_Education_Level Distance_from_Home


\
0 No High School Near

1 No College Moderate

2 No Postgraduate Near

3 No High School Moderate

4 No College Near

... ... ... ...

6602 No High School Near

6603 No High School Near

6604 No Postgraduate Near

6605 No High School Far

6606 No Postgraduate Near

Gender Exam_Score
0 Male 67
1 Female 61
2 Male 74
3 Male 71
4 Female 70
... ... ...
6602 Female 68
6603 Female 69
6604 Female 68
6605 Female 68
6606 Male 64

[6607 rows x 20 columns]

# 2. Transform using the map() method

# Map 'Previous_Scores' through a category dictionary; scores without a
# matching key in the dictionary are mapped to NaN
category_map = {0: 'Low', 1: 'Medium', 2: 'High'}
df['Sleep_Hours'] = df['Previous_Scores'].map(category_map)

# Display the transformed DataFrame


print("\nDataFrame after transformation:")
print(df.head())

DataFrame after transformation:


Hours_Studied Attendance Parental_Involvement Access_to_Resources
\
0 23 84 Low High

1 19 64 Low Medium

2 24 98 Medium Medium

3 29 89 Low Medium

4 19 92 Medium Medium

Extracurricular_Activities Sleep_Hours Previous_Scores


Motivation_Level \
0 No NaN 73
Low
1 No NaN 59

Low
2 Yes NaN 91
Medium
3 Yes NaN 98
Medium
4 Yes NaN 65
Medium

Internet_Access Tutoring_Sessions Family_Income Teacher_Quality \


0 Yes 0 Low Medium
1 Yes 2 Medium Medium
2 Yes 2 Medium Medium
3 Yes 1 Medium Medium
4 Yes 3 Medium High

School_Type Peer_Influence Physical_Activity Learning_Disabilities


\
0 Public Positive 3 No

1 Public Negative 4 No

2 Public Neutral 4 No

3 Public Negative 4 No

4 Public Neutral 4 No

Parental_Education_Level Distance_from_Home Gender Exam_Score


0 High School Near Male 67
1 College Moderate Female 61
2 Postgraduate Near Male 74
3 High School Moderate Male 71
4 College Near Female 70
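The all-NaN `Sleep_Hours` column above is the expected result: `map()` with a dictionary returns NaN for any value that is not a key, and the actual score values (roughly 50–100) never appear in `{0, 1, 2}`. A minimal demonstration of this behaviour:

```python
import pandas as pd

scores = pd.Series([0, 1, 73])
category_map = {0: 'Low', 1: 'Medium', 2: 'High'}
out = scores.map(category_map)
print(out.tolist()[:2])     # ['Low', 'Medium'] — keys found in the dict
print(out.isna().tolist())  # [False, False, True] — 73 has no entry, so it maps to NaN
```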

c) Detect and filter outliers


# Load the CSV file into a Pandas DataFrame
# Replace 'data.csv' with the actual file path if needed
df = pd.read_csv('titanic.csv')
df

PassengerId Survived Pclass \


0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. ... ... ...

886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0
.. ... ... ...
...
886 Montvila, Rev. Juozas male 27.0
0
887 Graham, Miss. Margaret Edith female 19.0
0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN
1
889 Behr, Mr. Karl Howell male 26.0
0
890 Dooley, Mr. Patrick male 32.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]

Titanic
# Display the first few rows of the DataFrame to understand the data
print("Original DataFrame:")
print(df.head())

Original DataFrame:
PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

# Select the column to analyze for outliers
column_name = 'Fare'
# Calculate the z-scores for the selected column
z_scores = np.abs((df[column_name] - df[column_name].mean()) /
df[column_name].std())
z_scores.head(10)

0 0.502163
1 0.786404
2 0.488580
3 0.420494
4 0.486064
5 0.477848
6 0.395591
7 0.223957

8 0.424018
9 0.042931
Name: Fare, dtype: float64

# Define a threshold for outliers (e.g., z-score greater than 3)


z_score_threshold = 3

# Filter the DataFrame to keep rows without outliers


filtered_df = df[z_scores <= z_score_threshold]

# Display the DataFrame after filtering outliers


print("\nDataFrame after filtering outliers:")
print(filtered_df.head())

DataFrame after filtering outliers:


PassengerId Survived Pclass \
0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S
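An alternative to z-scores (not used in the notebook above, shown here as a sketch) is the interquartile-range rule, which is less sensitive to the outliers themselves because it relies on quantiles rather than the mean and standard deviation:

```python
import pandas as pd

fares = pd.Series([1, 2, 3, 4, 100])
q1, q3 = fares.quantile(0.25), fares.quantile(0.75)
iqr = q3 - q1
# Keep values inside the Tukey fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
mask = (fares >= q1 - 1.5 * iqr) & (fares <= q3 + 1.5 * iqr)
print(fares[mask].tolist())  # [1, 2, 3, 4] — 100 falls outside the fences
```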

d) Perform vectorized string operations on a pandas Series
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('titanic.csv')
df

PassengerId Survived Pclass \


0 1 0 3
1 2 1 1
2 3 1 3
3 4 1 1
4 5 0 3
.. ... ... ...
886 887 0 2
887 888 1 1
888 889 0 3
889 890 1 1
890 891 0 3

Name Sex Age


SibSp \
0 Braund, Mr. Owen Harris male 22.0
1
1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0
1
2 Heikkinen, Miss. Laina female 26.0
0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
1
4 Allen, Mr. William Henry male 35.0
0
.. ... ... ...
...
886 Montvila, Rev. Juozas male 27.0
0
887 Graham, Miss. Margaret Edith female 19.0
0
888 Johnston, Miss. Catherine Helen "Carrie" female NaN
1
889 Behr, Mr. Karl Howell male 26.0
0
890 Dooley, Mr. Patrick male 32.0
0

Parch Ticket Fare Cabin Embarked


0 0 A/5 21171 7.2500 NaN S
1 0 PC 17599 71.2833 C85 C
2 0 STON/O2. 3101282 7.9250 NaN S
3 0 113803 53.1000 C123 S
4 0 373450 8.0500 NaN S

.. ... ... ... ... ...
886 0 211536 13.0000 NaN S
887 0 112053 30.0000 B42 S
888 2 W./C. 6607 23.4500 NaN S
889 0 111369 30.0000 C148 C
890 0 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Vectorized string operations via the .str accessor

# Overwrite 'Name' with the uppercased 'Sex' values
df['Name'] = df['Sex'].str.upper()
df

PassengerId Survived Pclass Name Sex Age SibSp Parch


\
0 1 0 3 MALE male 22.0 1 0

1 2 1 1 FEMALE female 38.0 1 0

2 3 1 3 FEMALE female 26.0 0 0

3 4 1 1 FEMALE female 35.0 1 0

4 5 0 3 MALE male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 MALE male 27.0 0 0

887 888 1 1 FEMALE female 19.0 0 0

888 889 0 3 FEMALE female NaN 1 2

889 890 1 1 MALE male 26.0 0 0

890 891 0 3 MALE male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C

890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Calculate the length of each 'Sex' string
df['Name'] = df['Sex'].str.len()
df

PassengerId Survived Pclass Name Sex Age SibSp


Parch \
0 1 0 3 4 male 22.0 1 0

1 2 1 1 6 female 38.0 1 0

2 3 1 3 6 female 26.0 0 0

3 4 1 1 6 female 35.0 1 0

4 5 0 3 4 male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 4 male 27.0 0 0

887 888 1 1 6 female 19.0 0 0

888 889 0 3 6 female NaN 1 2

889 890 1 1 4 male 26.0 0 0

890 891 0 3 4 male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C
890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Split each 'Sex' string on a space and keep the first part
df['Name'] = df['Sex'].str.split(' ').str[0]
df

PassengerId Survived Pclass Name Sex Age SibSp Parch


\
0 1 0 3 male male 22.0 1 0

1 2 1 1 female female 38.0 1 0

2 3 1 3 female female 26.0 0 0

3 4 1 1 female female 35.0 1 0

4 5 0 3 male male 35.0 0 0

.. ... ... ... ... ... ... ... ...

886 887 0 2 male male 27.0 0 0

887 888 1 1 female female 19.0 0 0

888 889 0 3 female female NaN 1 2

889 890 1 1 male male 26.0 0 0

890 891 0 3 male male 32.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
.. ... ... ... ...
886 211536 13.0000 NaN S
887 112053 30.0000 B42 S
888 W./C. 6607 23.4500 NaN S
889 111369 30.0000 C148 C
890 370376 7.7500 NaN Q

[891 rows x 12 columns]

# Display the transformed DataFrame


print("DataFrame after performing vectorized string operations:")
print(df.head())

DataFrame after performing vectorized string operations:


PassengerId Survived Pclass Name Sex Age SibSp
Parch \
0 1 0 3 male male 22.0 1 0

1 2 1 1 female female 38.0 1 0

2 3 1 3 female female 26.0 0 0

3 4 1 1 female female 35.0 1 0

4 5 0 3 male male 35.0 0 0

Ticket Fare Cabin Embarked


0 A/5 21171 7.2500 NaN S
1 PC 17599 71.2833 C85 C
2 STON/O2. 3101282 7.9250 NaN S
3 113803 53.1000 C123 S
4 373450 8.0500 NaN S
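The same `.str` methods work on any string Series; a self-contained sketch on two made-up names (so it runs without the CSV file):

```python
import pandas as pd

names = pd.Series(['Braund, Mr. Owen', 'Heikkinen, Miss. Laina'])
print(names.str.upper()[0])                   # 'BRAUND, MR. OWEN'
print(names.str.len().tolist())               # [16, 22]
print(names.str.contains('Miss').tolist())    # [False, True]
print(names.str.split(', ').str[0].tolist())  # ['Braund', 'Heikkinen'] — surname only
```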

Data Wrangling

0.0.1 1. Concatenate / Join / Merge / Reshape DataFrames


pd.concat() is used to concatenate two or more DataFrame objects. Setting axis=0 concatenates vertically (stacking rows); setting axis=1 concatenates horizontally (adding columns).
[3] : import pandas as pd
# Create the first DataFrame (df1) with columns 'X' and 'Y' and two rows
df1 = pd.DataFrame({'X': ['X0', 'X1'],
                    'Y': ['Y0', 'Y1']})
# Create the second DataFrame (df2) with columns 'X' and 'Y' and two rows
df2 = pd.DataFrame({'X': ['X2', 'X3'],
                    'Y': ['Y2', 'Y3']})
# Concatenate df1 and df2 vertically (axis=0) to stack rows
# This combines the two DataFrames by adding the rows of df2 below the rows of df1
result = pd.concat([df1, df2], axis=0)
df1

[3] : X Y
0 X0 Y0
1 X1 Y1

[4] : df2

[4] : X Y
0 X2 Y2
1 X3 Y3

[5] : result

[5]: X Y
0 X0 Y0
1 X1 Y1
0 X2 Y2
1 X3 Y3

0.0.2 MERGE
Used to merge two DataFrames based on a key column, similar to SQL joins. Options include how='inner', how='outer', how='left', and how='right' for the different types of joins.
[8] : import pandas as pd
# Create DataFrame 1
df1 = pd.DataFrame({'key': ['x', 'y', 'z'], 'value1': [1, 2, 3]})
# Create DataFrame 2
df2 = pd.DataFrame({'key': ['y', 'z', 'a'], 'value2': [4, 5, 6]})
# Merge DataFrames on the 'key' column using an inner join
result = pd.merge(df1, df2, on='key', how='inner')
df1

[8] : key value1


0 x 1
1 y 2
2 z 3

[9] : df2

[9] : key value2


0 y 4
1 z 5
2 a 6

[10] : result

[10] : key value1 value2


0 y 2 4
1 z 3 5
[11] : import pandas as pd
# Create DataFrame 1
df1 = pd.DataFrame({'key': ['x', 'y', 'z'], 'value1': [1, 2, 3]})
# Create DataFrame 2
df2 = pd.DataFrame({'key': ['y', 'z', 'a'], 'value2': [4, 5, 6]})
# Merge DataFrames on the 'key' column using an outer join
result = pd.merge(df1, df2, on="key", how='outer')
df1

[11] : key value1
0 x 1
1 y 2
2 z 3

[12] : df2

[12] : key value2
0 y 4
1 z 5
2 a 6

[13] : result

[13]: key value1 value2


0 x 1.0 NaN
1 y 2.0 4.0
2 z 3.0 5.0
3 a NaN 6.0
0.0.3 JOIN
A join is a way to combine data from two or more tables (or DataFrames) based on a common
column, known as the join key.
[18] : # Create DataFrame 1
df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
                   index=["j0", "j1", "j2"])
# Create DataFrame 2
df2 = pd.DataFrame({"z": ["z0", "z2", "z3"], "a": ["a0", "a2", "a3"]},
                   index=["K0", "K2", "K3"])
# Print DataFrame 1
print(df1)
# Print DataFrame 2
print(df2)
# Join DataFrames 1 and 2 on index (the default, as a left join)
df3 = df1.join(df2)
print(df3)

x y
j0 x0 y0
j1 x1 y1
j2 x2 y2
z a
K0 z0 a0
K2 z2 a2
K3 z3 a3
x y z a
j0 x0 y0 NaN NaN
j1 x1 y1 NaN NaN
j2 x2 y2 NaN NaN
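The all-NaN z and a columns in the output above are expected: `join()` aligns on index labels by default, and the j0/j1/j2 labels of df1 never match the K0/K2/K3 labels of df2. With one shared label the alignment becomes visible (a sketch with hypothetical labels):

```python
import pandas as pd

left = pd.DataFrame({'x': ['x0', 'x1']}, index=['k0', 'k1'])
right = pd.DataFrame({'z': ['z0', 'z2']}, index=['k0', 'k2'])
joined = left.join(right)  # left join on the index by default
print(joined.loc['k0', 'z'])           # 'z0' — shared index label matched
print(pd.isna(joined.loc['k1', 'z']))  # True — 'k1' has no counterpart in `right`
```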

0.0.4 INNER JOIN


Returns rows with matching keys in both DataFrames.

[21]: # inner join
# Create DataFrame 1
df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]},
                   index=["j0", "j1", "j2"])
# Create DataFrame 2
df2 = pd.DataFrame({"x": ["x0", "x1", "x3"], "z": ["z0", "z2", "z3"],
                    "a": ["a0", "a2", "a3"]},
                   index=["K0", "K2", "K3"])
df4 = df1.merge(df2, on="x", how='inner')
print(df4)
x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
0.0.5 FULL OUTER JOIN

Returns all rows from both DataFrames.

[22] : # full outer join
df5 = df1.merge(df2, on="x", how='outer')
print(df5)
x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x2 y2 NaN NaN
3 x3 NaN z3 a3

0.0.6 LEFT OUTER JOIN


Returns all rows from the left DataFrame and matching rows from the right DataFrame.
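No left-join cell appears in the notebook; a sketch mirroring the frames used in the cells above (self-contained so it runs on its own):

```python
import pandas as pd

df1 = pd.DataFrame({"x": ["x0", "x1", "x2"], "y": ["y0", "y1", "y2"]})
df2 = pd.DataFrame({"x": ["x0", "x1", "x3"], "z": ["z0", "z2", "z3"]})
df6 = df1.merge(df2, on="x", how='left')
print(df6['x'].tolist())         # ['x0', 'x1', 'x2'] — every left row survives
print(df6['z'].isna().tolist())  # [False, False, True] — 'x2' has no match in df2
```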

0.0.7 RIGHT OUTER JOIN


Returns all rows from the right DataFrame and matching rows from the left DataFrame.
[25] : # right outer join
df7 = df1.merge(df2, on="x", how='right')
print(df7)

x y z a
0 x0 y0 z0 a0
1 x1 y1 z2 a2
2 x3 NaN z3 a3

0.0.8 RESHAPE
Reshaping functions like pivot and melt are used to transform the layout of data frames.

[30]: import pandas as pd
# Create Series 1
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
# Create Series 2
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
# Concatenate Series into DataFrame
df = pd.concat([s1, s2], keys=['one', 'two'])
print(df)
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
[31] : print(df.unstack())

       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
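The section mentions pivot and melt but only demonstrates unstack; a small sketch of the wide-to-long round trip (column names are made up):

```python
import pandas as pd

wide = pd.DataFrame({'id': [1, 2], 'math': [90, 80], 'science': [70, 60]})
# melt: wide -> long, one row per (id, subject) pair
long = wide.melt(id_vars='id', var_name='subject', value_name='score')
print(len(long))  # 4

# pivot: long -> wide again
back = long.pivot(index='id', columns='subject', values='score')
print(back.loc[1, 'math'])  # 90
```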

