Unit 3

NumPy is a Python library designed for efficient array manipulation, providing an N-dimensional array object called ndarray that supports fast mathematical operations. It allows for easy creation of arrays, manipulation of data types, and supports various operations like indexing, slicing, and Boolean indexing. The library is optimized for performance, making it essential for data science applications where speed and resource management are critical.


NumPy Libraries

NumPy stands for Numerical Python. It is a Python library for working with arrays, and it
also provides functions for linear algebra, Fourier transforms, and matrices. Python lists
can serve the purpose of arrays, but they are slow to process. NumPy aims to provide an
array object that is much faster than traditional Python lists. The array object in NumPy
is called ndarray, and it comes with many supporting functions that make working with it
easy. Arrays are used very frequently in data science, where speed and resources are very
important. Unlike lists, NumPy arrays are stored in one contiguous block of memory, so
processes can access and manipulate them very efficiently. This behavior is called
locality of reference in computer science, and it is the main reason why NumPy is faster
than lists. NumPy is also optimized for modern CPU architectures through vectorized
processing.
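
As a small illustration of the vectorized style described above, the sketch below computes the same squares once with a plain Python list and once with an ndarray. The variable names are illustrative only, and no timing claim is made here; actual speed differences depend on the machine and the array size.

```python
import numpy as np

values = list(range(1_000))

# Plain Python: an explicit loop (here a list comprehension) visits each element.
squares_list = [v * v for v in values]

# NumPy: one vectorized expression operates on the whole array at once.
arr = np.array(values)
squares_arr = arr * arr

# Both approaches produce the same numbers.
print(squares_arr[:5])
print(squares_list == squares_arr.tolist())
```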

The NumPy ndarray: A Multidimensional Array Object

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is
a fast, flexible container for large data sets in Python. Arrays enable you to perform
mathematical operations on whole blocks of data using similar syntax to the equivalent
operations between scalar elements.

import numpy as np
#creating and displaying array
data=[[1,2,6],[3,5,9]]
data=np.array(data)
print("Array Data")
print(data)

Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an
object describing the data type of the array.

#displaying array shape
print(data.shape)

#displaying datatype of array elements
print(data.dtype)

Edited By: Dipak Dahal Prepared By: Arjun Singh Saud, Asst. Prof. CDCSIT
Creating ndarrays

The easiest way to create an array is to use the array function. It accepts any
sequence-like object (including other arrays) and produces a new NumPy array
containing the passed data.

data=(2,6,9)
data=np.array(data)
print("Array Data")
print(data)

Nested sequences, like a list of equal-length lists, will be converted into a
multidimensional array:

import numpy as np
#creating and displaying array
data=[[1,2,6],[3,5,9]]
data=np.array(data)
print("Array Data")
print(data)

NumPy arrays have many attributes. ndim is the attribute that represents the number of
dimensions (axes) of the ndarray.

#displaying dimension
print(data.ndim)

Unless explicitly specified, np.array tries to infer a good data type for the array that it
creates. The data type is stored in a special dtype object.

data=[2.4,3.9,-1.2]
data=np.array(data)
#print data type of array elements
print(data.dtype)

We can also specify data type of array elements explicitly while creating ndarrays.

data=np.array([1,3,5,8],dtype='int64')
print(data)

In addition to np.array, there are a number of other functions for creating new arrays.
As examples, zeros and ones create arrays of 0’s or 1’s, respectively, with a given
length or shape. empty creates an array without initializing its values to any particular
value. To create a higher dimensional array with these methods, pass a tuple for the
shape.

#create array of 5 zeros
data=np.zeros(5)
print(data)

#create array of 5 ones
data=np.ones(5)
print(data)

#create 3x3 array of zeros
data=np.zeros((3,3))
print(data)

#create 3x3 array of ones
data=np.ones((3,3))
print(data)

#create uninitialized array of length 10 (contents are arbitrary)
data=np.empty(10)
print(data)

#create uninitialized array of shape (2,3)
data=np.empty((2,3))
print(data)

The numpy.arange() function is used to generate an array with evenly spaced values
within a specified interval. The function returns a one-dimensional array of type
numpy.ndarray.

Syntax: numpy.arange([start, ]stop, [step, ]dtype=None)

#create array with elements 0-9
data=np.arange(10)
print(data)

#create array with elements 5-9
data=np.arange(5,10)
print(data)

#create array with odd elements 1-9 (step of 2)
data=np.arange(1,10,2)
print(data)

A list of standard array creation functions is given below.

• array: Convert input data (list, tuple, array, or other sequence type) to an
ndarray either by inferring a dtype or explicitly specifying a dtype. Copies the
input data by default.
• asarray: Convert input to ndarray, but do not copy if the input is already an
ndarray.
• arange: Generate an array with evenly spaced values within a specified interval.
• ones, ones_like: Produce an array of all 1’s with the given shape and dtype.
ones_like takes another array and produces a ones array of the same shape and
dtype.

#create array with elements 0-9


data=np.arange(10)
print(data)

d=np.ones_like(data)
print(d)

• zeros, zeros_like: Like ones and ones_like but producing arrays of 0’s instead.

#create array with elements 0-9


data=np.arange(10)
print(data)

d=np.zeros_like(data)
print(d)
• empty, empty_like: Create new arrays by allocating new memory, but do
not populate them with any values, unlike ones and zeros.
• eye, identity: Create a square N x N identity matrix (1’s on the diagonal and
0’s elsewhere).

#creates identity matrix of 3x3
data=np.eye(3)
print(data)

#creates identity matrix of 4x4
data=np.identity(4)
print(data)
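
The difference between array and asarray noted in the list above can be checked with a short sketch. The names original, copied and same are illustrative only.

```python
import numpy as np

original = np.arange(5)

copied = np.array(original)     # np.array copies its input by default
same = np.asarray(original)     # np.asarray reuses an existing ndarray

original[0] = 99

print(copied[0])         # unaffected by the change to original
print(same[0])           # reflects the change
print(same is original)
```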

Data Types for ndarrays

The data type or dtype is a special object containing the information the ndarray needs
to interpret a chunk of memory as a particular type of data.

data=np.array([1,3,5,8],dtype='int64')
print(data)

The numerical dtypes are named in the format: a type name, like float or int, followed
by a number indicating the number of bits per element. We can explicitly convert or
cast an array from one dtype to another using ndarray’s astype method.

data=np.array([1,3,5,8],dtype='int64')
print(data)

data=data.astype('float64')
print(data)

data=data.astype(np.int32)
print(data)

If we have an array of strings representing numbers, we can use astype to convert them
to numeric form.

data=np.array(['2.5','3.7','9.1'],dtype=np.bytes_) #np.string_ was removed in NumPy 2.0
print(data)
data=data.astype('float64')
print(data)
print(data.dtype)

If casting fails for some reason (like a string that cannot be converted to float64), a
ValueError will be raised.

data=np.array(['2.5','3.7','9.1f'],dtype=np.bytes_)
print(data)
data=data.astype('float64') #raises ValueError because of '9.1f'
print(data)
print(data.dtype)
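
One possible way to handle a failed cast is to catch the exception, as in the sketch below. It uses ordinary Unicode strings rather than a bytes dtype, and the names raw, numbers and clean are illustrative only.

```python
import numpy as np

raw = np.array(['2.5', '3.7', '9.1f'])   # the last entry is not a valid number

try:
    numbers = raw.astype('float64')
except ValueError as err:
    numbers = None
    print("Conversion failed:", err)

clean = np.array(['2.5', '3.7', '9.1']).astype('float64')
print(clean, clean.dtype)
```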

Operations between Arrays and Scalars
Arrays are important because they enable you to express batch operations on data
without writing any for loops. This is usually called vectorization. Any arithmetic
operation between equal-size arrays applies the operation element-wise.

import numpy as np
a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(a)
r=a*a
print("Element-wise multiplication of arrays:")
print(r)
r=a+a
print("Sum of arrays:")
print(r)

Arithmetic operations with scalars propagate the scalar value to each element of the
NumPy array.

import numpy as np
a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(a)
r=a/2
print("Half of array elements:")
print(r)
r=a**0.5
print("Square root of array elements:")
print(r)

Basic Indexing and Slicing


There are many ways to select a subset of data or individual elements stored in NumPy
arrays. One-dimensional arrays are simple; on the surface they act similarly to Python
lists.

import numpy as np
a = np.arange(10)
print("Array Elements:")
print(a)
print("Element at index 3")
print(a[3])
print("Element from index 3-6")
print(a[3:7])

a[2]=10 #modifying element at index 2
a[4:6]=11 #modifying element from index 4 to 5
print("Array Elements:")
print(a)

If we assign a scalar value to a slice, as in a[4:6] = 11, the value is propagated (or
broadcast, as we will call it from here on) to the entire selection. An important first
distinction from lists is that array slices are views on the original array. This means
that the data is not copied, and any modifications to the view will be reflected in the
source array, as demonstrated below.

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
aslice=a[3:7]
aslice[1]=15 #modification will be reflected in original array
print("Array Elements:")
print(a)

a = [1,2,3,4,5,6,7,8,9]
aslice=a[3:7]
aslice[1]=15 #modification will not be reflected in original list
print("List Elements:")
print(a)

As NumPy has been designed with large data use cases in mind, we could imagine
performance and memory problems if NumPy copied data instead of creating views. If we
want a copy of a slice of an ndarray instead of a view, we need to copy the array
explicitly; for example, arr[5:8].copy().
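
A minimal sketch of the difference between a view and an explicit copy follows. The array arr and the names view and snapshot are illustrative only.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]             # a view: shares memory with arr
snapshot = arr[5:8].copy()  # an independent copy

view[0] = 100               # also changes arr[5]
snapshot[1] = -1            # arr is untouched

print(arr)
print(snapshot)
```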

With higher dimensional arrays, we have many more options. In a two-dimensional
array, the elements at each index are no longer scalars but rather one-dimensional arrays.
Thus, individual elements can be accessed recursively. We can also pass a
comma-separated list of indices to select individual elements, so the two forms are
equivalent.

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Array Element at index 2")
print(a[2])
print("Array Element at index 1,2")
print(a[1][2])
print(a[1,2])#Equivalent to a[1][2]

In multidimensional arrays, if you omit later indices, the returned object will be a
lower dimensional ndarray consisting of all the data along the higher dimensions, as
demonstrated below with a 2 × 2 × 3 array.

import numpy as np
a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print("Array Elements")
print(a)
print("Array Element at index 1")
print(a[1])
print("Array Element at index 1,1")
print(a[1,1])
print("Array Element at index 1,1,2")
print(a[1,1,2])

Indexing with slices


Like one-dimensional objects such as Python lists, ndarrays can be sliced using similar
syntax. Higher dimensional objects give you more options, as you can slice one or more
axes and also mix integers with slices. Note that a colon by itself means to take the
entire axis, so we can slice only the higher dimensional axes.

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Array Elements:")
print(a)
print("First Two Rows")
print(a[0:2])# Or print(a[:2])
print("First Two Columns of array")
print(a[:,0:2])
print("2x2 slice in top-left corner")
print(a[0:2,0:2])

As with 1D arrays, we can take an array slice and update it, and the change is reflected
in the original array.

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
aslice=a[0:2,0:2]
aslice[:,:]=0
print("Array Elements")
print(a)

Boolean Indexing
In NumPy, Boolean indexing allows us to filter elements from an array based on a specific
condition. We use Boolean masks to specify the condition. A Boolean mask is a NumPy
array containing truth values (True/False) that correspond to each element in the array.
Suppose we have an array named ‘a’.

a = np.array([12, 24, 16, 21, 32, 29, 7, 15])

We can create a mask that selects all elements of a that are greater than 20.

boolean_mask = a > 20

The above statement creates a Boolean mask that evaluates to True for elements that are
greater than 20, and False for elements that are less than or equal to 20. The resulting
mask is an array stored in the boolean_mask variable as below.

[False, True, False, True, True, True, False, False]

Boolean indexing allows us to create a filtered subset of an array by passing a Boolean
mask as an index. The boolean_mask selects only those elements in the array that have a
True value at the corresponding index position, as demonstrated below.

import numpy as np
a = np.array([12, 24, 16, 21, 32, 29, 7, 15])
boolean_mask = a > 20
print(boolean_mask)
print(a[boolean_mask])
a[boolean_mask]=0#sets all elements greater than 20 to zero
print(a)
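
Masks can also be combined. The sketch below is an addition to the notes and assumes NumPy's element-wise operators & (and), | (or) and ~ (not); the thresholds are arbitrary.

```python
import numpy as np

a = np.array([12, 24, 16, 21, 32, 29, 7, 15])

# Parentheses are required: & and | bind more tightly than comparisons.
between = a[(a > 15) & (a < 30)]    # both conditions hold
extremes = a[(a < 10) | (a > 30)]   # either condition holds
not_large = a[~(a > 20)]            # negated mask

print(between)
print(extremes)
print(not_large)
```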

Fancy Indexing
In NumPy, fancy indexing allows us to use an array of indices to access multiple array
elements at once. Fancy indexing can perform more advanced and efficient array
operations, including conditional filtering, sorting, and so on.

Example

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# select a single element
simple_indexing = a[3]
print("Simple Indexing:",simple_indexing) # 4

# select multiple elements
fancy_indexing = a[[1, 2, 5, 7]]
print("Fancy Indexing:",fancy_indexing) # [2 3 6 8]

# returns the indices that would sort the array in ascending order
print("Indices of Sorted Data:",np.argsort(a))

# sort a using fancy indexing
sorted_array = a[np.argsort(a)]
print("Sorted Data:",sorted_array)

# sorting in descending order
sorted_array = a[np.argsort(-a)]
print("Reverse Sorted Data:",sorted_array)

We can also use fancy indexing on multi-dimensional arrays. The concept is the same as
for one-dimensional arrays.

import numpy as np
a=np.array([[1,3,6],[2,7,1],[1,9,4]])
ind=[0,2]
print(a[ind])#prints row 0 and row 2
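
When two index lists are passed, NumPy pairs them up element by element rather than selecting a rectangular block. The sketch below illustrates both behaviours; the names rows, cols, picked and block are illustrative only.

```python
import numpy as np

a = np.array([[1, 3, 6], [2, 7, 1], [1, 9, 4]])

rows = [0, 2]
cols = [1, 2]

# Index lists are paired up: this picks a[0, 1] and a[2, 2].
picked = a[rows, cols]
print(picked)

# Selecting whole rows first, then columns, gives the 2x2 sub-block instead.
block = a[rows][:, cols]
print(block)
```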

Universal Functions: Fast Element-wise Array Functions
A universal function is a function that performs element-wise operations on data in
ndarray. We can think of them as fast vectorized wrappers for simple functions that take
one or more scalar values and produce one or more scalar results.

Example

import numpy as np
a = np.arange(10)
print("Dataset:",a)
s=np.sqrt(a)#unary universal function
print("Square Roots:",s)
e=np.exp(a)
print("Exp(a):",e)
x=np.random.randn(10)
y=np.random.randn(10)

z=np.maximum(x,y)#binary universal function
print("x=",x)
print("y=",y)
print("z=",z)

m=np.max(x)
print("Maximum=",m)
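
Because the example above uses random data, its output changes on every run. The sketch below uses fixed inputs instead and also shows np.modf, a unary universal function that returns two arrays. The input values are arbitrary.

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
y = np.array([2.0, -3.0, 0.5])

total = np.add(x, y)           # binary ufunc, same as x + y
larger = np.maximum(x, y)      # element-wise maximum of two arrays

# np.modf is a unary ufunc with two outputs:
# the fractional and integral parts of each element.
frac, whole = np.modf(x)

print(total)
print(larger)
print(frac, whole)
```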

List of Unary Universal Functions

List of Binary Universal Functions

Sorting a structured array using a custom data type
import numpy as np

# Define a custom data type with fields 'name', 'age', and 'height'
user_dtype = np.dtype([
('name', 'U10'), # Unicode string of maximum length 10
('age', 'i4'), # 4-byte (32-bit) integer
('height', 'f4') # 4-byte (32-bit) float
])

# Create an array with the custom data type


data = np.array([
('Shyam', 25, 5.5),
('Ram', 30, 6.0),
('Alice', 22, 5.8)
], dtype=user_dtype)

# Sort the array by the 'name' field


sorted_data = np.sort(data, order='name')

print("Original array:\n", data)


print("Sorted array by names:\n", sorted_data)

Data Processing With Arrays


Example

import numpy as np
import matplotlib.pyplot as plt

points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)
print(z.shape)
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar()
plt.title("Image plot of a grid of values")
plt.show()

Array Functions

Note: Write down programs to demonstrate each of the above
methods

File Input and Output with Arrays


NumPy is able to save and load data to and from disk either in text or binary format.

Storing Arrays on Disk in Binary Format

np.save and np.load are the two workhorse functions for efficiently saving and loading
array data on disk. Arrays are saved by default in an uncompressed raw binary format with
file extension .npy. If the file path does not already end in .npy, the extension will be
appended. The array on disk can then be loaded using np.load. We can save multiple arrays
in a zip archive using np.savez and passing the arrays as keyword arguments. When loading
an .npz file, we get back a dictionary-like object which loads the individual arrays.

import numpy as np
a = np.arange(10)
print("a=",a)
np.save('some_array', a)
b=np.load('some_array.npy')
print("b=",b)
c = np.arange(20)
print("c=",c)
np.savez('array_archive.npz', x=a, y=c)
arch = np.load('array_archive.npz')
print("Arrays in Archive:")
for k in arch:
    print(arch[k])
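
The same round trip can be sketched without leaving files behind by writing into a temporary directory. This variant is an assumption-laden illustration rather than part of the original notes: it uses Python's tempfile module and np.savez_compressed, the compressed counterpart of np.savez.

```python
import os
import tempfile

import numpy as np

a = np.arange(10)
c = np.arange(20)

with tempfile.TemporaryDirectory() as tmp:
    # Single array: np.save appends ".npy" when the name lacks it.
    np.save(os.path.join(tmp, "some_array"), a)
    b = np.load(os.path.join(tmp, "some_array.npy"))

    # Several arrays in one compressed archive, keyed by keyword name.
    path = os.path.join(tmp, "array_archive.npz")
    np.savez_compressed(path, x=a, y=c)
    with np.load(path) as arch:
        names = sorted(arch.files)
        y = arch["y"]

print(np.array_equal(a, b))
print(names)
print(y.shape)
```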

Saving and Loading Text Files


Loading text from files is a fairly standard task. We will focus mainly on the read_csv
and read_table functions in pandas, but sometimes it is useful to load data into vanilla
NumPy arrays using np.loadtxt or the more specialized np.genfromtxt. These functions
have many options allowing us to specify different delimiters, converter functions for
certain columns, rows to skip, and other things. Take the simple case of a
comma-separated file, as demonstrated in the example below. np.savetxt performs the
inverse operation: writing an array to a delimited text file.

Example

import numpy as np
a = np.loadtxt('/content/drive/My Drive/test.txt', delimiter=',')
print(a)
np.savetxt('/content/drive/My Drive/test1.txt', a)
print("File is saved")

The genfromtxt() function is used to load data in a program from a text file. It takes
multiple argument values to clean the data of the text file. It also has the ability to deal
with missing or null values through the processes of filtering, removing, and replacing.

import numpy as np
# invoking genfromtxt to read the test.txt file as strings
content = np.genfromtxt("/content/drive/My Drive/test.txt", dtype=str,
encoding = None, delimiter=",")
# print file data on console
print("File data:", content)
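
To illustrate the handling of missing values, the sketch below reads from an in-memory string instead of a file on disk. The sample text is made up for this example, and filling_values is the genfromtxt option that sets the replacement value.

```python
from io import StringIO

import numpy as np

# Hypothetical file contents; the second row has an empty middle field.
text = "1,2,3\n4,,6\n7,8,9"

# By default a missing numeric field is read as nan.
with_nan = np.genfromtxt(StringIO(text), delimiter=",")

# filling_values replaces missing fields with a chosen value instead.
filled = np.genfromtxt(StringIO(text), delimiter=",", filling_values=0)

print(with_nan)
print(filled)
```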

Linear Algebra
Linear algebra operations, like matrix multiplication, decompositions, determinants, and
other square matrix math, are an important part of any array library. Unlike some
languages like MATLAB, multiplying two two-dimensional arrays with * is an element-wise
product instead of a matrix dot product. numpy.linalg has a standard set of matrix
decompositions and things like inverse and determinant. Commonly-used
numpy.linalg functions are listed below.

#Matrix multiplication
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
z=x.dot(y)
print(z)
r=np.dot(x, y)#equivalent to x.dot(y)
print(r)

#solving system of linear equations, finding determinant and inverse
#2x+3y-z=5
#x+3y-z=4
#3x-y+2z=7
import numpy as np
from numpy.linalg import inv, solve,det
a = np.array([[2,3,-1],[1,3,-1],[3,-1,2]])
b=np.array([5,4,7])
s=solve(a,b)
print(s)
d=det(a)
print("determinant of a=",d)
b=inv(a)
print("Inverse of a=",b)
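
The results above can be verified numerically. The sketch below reuses the same system of equations and checks the solution with the @ operator, which performs matrix multiplication; np.allclose is used because floating-point results are rarely exact.

```python
import numpy as np
from numpy.linalg import det, inv, solve

a = np.array([[2, 3, -1], [1, 3, -1], [3, -1, 2]])
b = np.array([5, 4, 7])

s = solve(a, b)

# The @ operator is matrix multiplication, equivalent to np.dot for 2D arrays.
print(np.allclose(a @ s, b))               # the solution satisfies a.s = b
print(np.allclose(a @ inv(a), np.eye(3)))  # a times its inverse is the identity
print(s, det(a))
```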

Array concatenate and stack


Concatenation joins arrays along an existing axis. Stacking is the same as
concatenation; the only difference is that stacking is done along a new axis.

We pass a sequence of arrays that we want to join to the stack() function along with the
axis. If the axis is not explicitly passed, it is taken as 0.

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2))
print(arr)
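
The effect of the axis argument is easiest to see from the resulting shapes. The following sketch reuses arr1 and arr2 from above; the other names are illustrative only.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

joined = np.concatenate((arr1, arr2))      # existing axis: shape (6,)
rows = np.stack((arr1, arr2))              # new axis 0: shape (2, 3)
cols = np.stack((arr1, arr2), axis=1)      # new axis 1: shape (3, 2)

# 2D inputs can be concatenated along either existing axis.
side_by_side = np.concatenate((rows, rows), axis=1)   # shape (2, 6)

print(joined.shape, rows.shape, cols.shape, side_by_side.shape)
```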

Random Number Generation


The numpy.random module supplements the built-in Python random module with functions
for efficiently generating whole arrays of sample values from many kinds of probability
distributions. For example, you can get a 4 by 4 array of samples from the standard
normal distribution using the normal function. See the table given below for a partial
list of functions available in numpy.random.

import numpy as np
np.random.seed(100)
d=np.random.randint(0,10)
print("d=",d)
samples = np.random.normal(size=(4, 4))
print(samples)
d=np.random.permutation([1,2,3])
print("d=",d)

l=[1,2,3,4,5]
np.random.shuffle(l) #shuffles the list in place and returns None
print("Shuffled List=",l)
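
Newer NumPy code often uses the Generator interface instead of the module-level functions shown above. The sketch below is an alternative, not the method used in these notes, and assumes np.random.default_rng is available (NumPy 1.17 or later).

```python
import numpy as np

rng = np.random.default_rng(100)         # seeded generator

d = rng.integers(0, 10)                  # one integer in [0, 10)
samples = rng.standard_normal((4, 4))    # 4x4 standard normal samples
perm = rng.permutation([1, 2, 3])        # shuffled copy

# Re-creating the generator with the same seed reproduces the same stream.
rng2 = np.random.default_rng(100)
same_first_draw = rng2.integers(0, 10) == d

print(d, samples.shape, perm, same_first_draw)
```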

Array Broadcasting
In NumPy, we can perform mathematical operations on arrays of different shapes. An
array with a smaller shape is expanded to match the shape of a larger one. This is
called broadcasting.

import numpy as np
a=np.array([1,10,3]) #shape (3,)
b=np.array([[5],[10],[15]]) #shape (3,1), so compatible for broadcasting
c=a+b
print(c) #output is 3x3 array

output:
[[ 6 15 8]
[11 20 13]
[16 25 18]]
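
A common use of broadcasting is subtracting per-column or per-row means from a 2D array. The sketch below uses made-up data and illustrative names, and shows why a reshape is needed when the smaller array should line up with the rows.

```python
import numpy as np

data = np.array([[1.0, 10.0, 100.0],
                 [3.0, 30.0, 300.0]])

col_means = data.mean(axis=0)          # shape (3,)
centered = data - col_means            # (2, 3) - (3,) broadcasts over rows

row_means = data.mean(axis=1)          # shape (2,)
# A (2,) array does not line up with the trailing axis of (2, 3),
# so it is reshaped to (2, 1) before subtracting.
row_centered = data - row_means[:, np.newaxis]

print(centered)
print(row_centered)
```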

Common NumPy String Functions


Here are some of the string functions provided by NumPy (in the numpy.char module):

Function      Description
add()         concatenates two strings
multiply()    repeats a string a specified number of times
capitalize()  capitalizes the first letter of a string
lower()       converts all uppercase characters in a string to lowercase
upper()       converts all lowercase characters in a string to uppercase
join()        joins a sequence of strings
equal()       checks if two strings are equal or not
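
A brief sketch of these functions, called through np.char on small made-up arrays, is shown below. Each call works element-wise on the strings in the array.

```python
import numpy as np

first = np.array(["data", "num"])
second = np.array(["Frame", "Py"])

joined = np.char.add(first, second)          # element-wise concatenation
repeated = np.char.multiply(first, 2)        # repeat each string twice
capital = np.char.capitalize(first)
upper = np.char.upper(first)
lower = np.char.lower(second)
dashed = np.char.join("-", first)            # separator between characters
same = np.char.equal(first, ["data", "NUM"])

print(joined, repeated, capital, upper, lower, dashed, same)
```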

Introduction to pandas Data Structures


Series and DataFrame are two widely used data structures of Pandas. While they are
not a universal solution for every problem, they provide a solid, easy-to-use basis for
most applications.

Series

A Series is a one-dimensional array-like object containing an array of data and an
associated array of data labels, called its index.

Example

import pandas as pd
import numpy as np
obj = pd.Series([4, 7, -5, 3]) #series data structure
print(obj.values) #displaying values in the data structure

DataFrame

A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered
collection of columns, each of which can be a different value type (numeric, string,
boolean, etc.). The DataFrame has both a row and column index; it can be thought of as a
dict of Series. The data is stored as one or more two-dimensional blocks rather than a list,
dict, or some other collection of one-dimensional arrays.

import pandas as pd
data = {'State': ['Bagmati', 'Koshi', 'Karnali', 'Lumbini', 'Gandaki'],
'Year': [2000, 2001, 2002, 2001, 2002]}
frame1 = pd.DataFrame(data)#creating dataframe
print(frame1)
frame2 = pd.DataFrame(data,columns=["State","Year","Debt"])#creating data frame with an empty Debt column
print(frame2)
print(frame2["State"])#displaying column State
obj=pd.Series([2,5,3,3,4])
frame2["Debt"]=obj
print(frame2)#displaying data frame
print(frame2.values)#displaying in 2D array format

Index Objects

The Index objects in pandas are responsible for holding the axis labels and other
metadata (like the axis names). Any array or other sequence of labels used when
constructing a Series or DataFrame is internally converted to an Index. Index objects are
immutable and thus can’t be modified by the user.

import pandas as pd
s= pd.Series(range(3), index=[1, 2, 3])
print(s)
print(s.index)
print(pd.Index(s)) #pd.Int64Index was removed in pandas 2.0
#s.index[1]='d' #raises TypeError because an Index is immutable
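
The immutability of an Index can be demonstrated by catching the error, as in the sketch below. The flag mutated is an illustrative name; replacing the whole index attribute, unlike changing one label, is permitted.

```python
import pandas as pd

s = pd.Series(range(3), index=[1, 2, 3])

try:
    s.index[1] = 'd'            # item assignment on an Index is not allowed
    mutated = True
except TypeError as err:
    mutated = False
    print("Index is immutable:", err)

# Replacing the whole index with a new sequence is allowed.
s.index = ['a', 'b', 'c']
print(s.index.tolist())
```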

Essential Functionalities
This section discusses fundamental mechanics of interacting with the data contained in a
Series or DataFrame.

Reindexing

A critical method on pandas objects is reindex, which creates a new object with the data
conformed to a new index. Calling reindex on a Series rearranges the data according to
the new index, introducing missing values if any index values were not already present.

import pandas as pd
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
print(obj)
obj1 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
print(obj1)
obj2=obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
print(obj2)

For ordered data like time series, it may be desirable to do some interpolation or filling
of values when reindexing. The method option allows us to do this, using a method
such as ffill which forward fills the values.

import pandas as pd
obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
print(obj)
obj1=obj.reindex(range(6), method='ffill')
print(obj1)
obj2=obj.reindex(range(6), method='bfill')
print(obj2)

Dropping Entries From an Axis

Dropping one or more entries from an axis is easy if you have an index array or list
without those entries. As that can require a bit of munging and set logic, the drop
method will return a new object with the indicated value or values deleted from an
axis.

import pandas as pd
import numpy as np
obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(obj)
obj1 = obj.drop('c')
print(obj1)
obj2 = obj.drop(['b','d'])
print(obj2)

With DataFrame, index values can be deleted from either axis:

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['r1', 'r2', 'r3', 'r4'],
                    columns=['c1', 'c2', 'c3', 'c4'])
print(data)
d=data.drop('r2') #drops row r2
print(d)
d=data.drop('c2',axis=1) #drops column c2
print(d)

Indexing, Selection, and Filtering

Series indexing works analogously to NumPy array indexing, except we can use the
Series’s index values instead of only integers.

import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])
print(obj.iloc[2]) #same as obj['c']; position-based access uses iloc
print(obj['c'])
print(obj[1:3])
print(obj[['b','c','d']])

Slicing with labels behaves differently than normal Python slicing in that the endpoint is
inclusive. Setting values using these methods works just as we would expect.

import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])
print(obj.iloc[2]) #same as obj['c']; position-based access uses iloc
print(obj['c'])
print(obj[1:3])
print(obj['b':'d'])
obj['b':'c'] = 5
print(obj)

Indexing into a DataFrame retrieves one or more columns, either with a single value or
with a sequence:

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),index=['r1', 'r2', 'r3',
'r4'],
columns=['c1', 'c2', 'c3', 'c4'])
print(data['c1'])
print(data[['c1','c3']])
print(data[:2])
print(data[data['c3'] > 5])
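
For selecting rows and columns together, pandas also offers the loc (label-based) and iloc (position-based) indexers. They are not covered in the notes above, so the sketch below is an optional extension that reuses the same DataFrame.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['r1', 'r2', 'r3', 'r4'],
                    columns=['c1', 'c2', 'c3', 'c4'])

by_label = data.loc['r2', ['c1', 'c3']]      # row r2, columns c1 and c3
by_position = data.iloc[1, [0, 2]]           # same cells by integer position
label_slice = data.loc['r1':'r3', 'c2']      # label slices include the endpoint

print(by_label)
print(by_position)
print(label_slice)
```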

Another use case is in indexing with a Boolean DataFrame, such as one produced by a
scalar comparison.

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),index=['r1', 'r2', 'r3',
'r4'],
columns=['c1', 'c2', 'c3', 'c4'])
print(data < 5)
data[data < 5] = 0
print(data)

Arithmetic and Data Alignment

One of the most important pandas features is the behavior of arithmetic between objects
with different indexes. When adding objects together, if any index pairs are not the same,
the respective index in the result will be the union of the index pairs. Labels that
appear in only one of the objects get a missing value (NaN) in the result.

import pandas as pd
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s3=s1+s1
print(s3)
s3=s1+s2
print(s3)

In the case of DataFrame, alignment is performed on both the rows and the columns:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
index=['1', '2', '3'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
index=['1', '2', '3', '4'])
print(df1)
print(df2)
df=df1+df1
print(df)
df=df1+df2
print(df)
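
The effect of alignment on non-overlapping labels can be inspected directly. The sketch below repeats the two Series from the earlier example and lists the labels whose sum is missing; the names total and unmatched are illustrative only.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

total = s1 + s2

# Labels found in only one Series have no partner, so the sum is NaN there.
unmatched = total[total.isna()].index.tolist()

print(total)
print(unmatched)
```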

Arithmetic Methods with Fill Values

In arithmetic operations between differently-indexed objects, we might want to fill
with a special value, like 0, when an axis label is found in one object but not the other.

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'),
index=['1', '2', '3'])
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'),
index=['1', '2', '3', '4'])
df=df1.add(df2, fill_value=0)
print(df)
df=df1.sub(df2, fill_value=0)
print(df)
df=df1.mul(df2, fill_value=0)
print(df)
df=df1.div(df2, fill_value=0)
print(df)

Operations between DataFrame and Series

As with NumPy arrays, arithmetic between a DataFrame and a Series is well-defined. In
such cases, the operation is performed using broadcasting.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(12).reshape((3, 4)))
s=pd.Series([2,4,5,7])
df1=df+s
print(df)
print(s)
print(df1)

By default, arithmetic between DataFrame and Series matches the index of the Series on
the DataFrame’s columns, broadcasting down the rows. If an index value is not found in
either the DataFrame’s columns or the Series’s index, the objects will be reindexed to
form the union.
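
Both behaviours can be sketched with a small made-up DataFrame. The second half uses the sub method with axis='index' to match on the row labels instead of the columns; this option is not discussed in the notes above and is shown as an optional extension.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'))

row = pd.Series([1, 1, 1], index=['a', 'b', 'e'])
# Default: match the Series index on the columns; 'c', 'd' and 'e' become NaN.
by_columns = df + row

col = df['a']
# axis='index' matches on the row labels and broadcasts across the columns.
by_rows = df.sub(col, axis='index')

print(by_columns)
print(by_rows)
```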

Function application and mapping

NumPy universal functions work fine with pandas objects.

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3),
columns=list('bde'),index=['r1', 'r2', 'r3', 'r4'])
print(frame)
frame=np.abs(frame)
print(frame)

Another frequent operation is applying a function on 1D arrays to each column or row.

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3),
columns=list('bde'),index=['r1', 'r2', 'r3', 'r4'])
print(frame)
frame=np.abs(frame)
print(frame)
f= lambda x: x.max() - x.min()
fr=frame.apply(f,axis=0)
print(fr)

The function passed to apply need not return a scalar value; it can also return a Series
with multiple values.

import numpy as np
import pandas as pd

def f(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

frame = pd.DataFrame(np.random.randn(4, 3),
columns=list('bde'),index=['r1', 'r2', 'r3', 'r4'])
print(frame)
frame=np.abs(frame)
print(frame)
fr=frame.apply(f)
print(fr)

Sorting and Ranking

Sorting a data set by some criterion is another important built-in operation. To sort
lexicographically by row or column index, use the sort_index method, which returns a
new, sorted object.

import numpy as np
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])


obj1=obj.sort_index()
print(obj1)

With a DataFrame, we can sort by index on either axis. The data is sorted in ascending
order by default, but can be sorted in descending order too.

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape((2, 4)), index=['three',


'one'],columns=['d', 'a', 'b', 'c'])
fr=frame.sort_index(axis=0)
print(fr)
fr=frame.sort_index(axis=1)
print(fr)
fr=frame.sort_index(axis=1,ascending=False)
print(fr)

To sort a Series by its values, use its sort_values method. Any missing values are sorted
to the end of the Series by default.

import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])


obj1=obj.sort_values()
print(obj1)

With a DataFrame, we may want to sort by the values in one or more columns. To do so,
pass one or more column names to the by option:

import numpy as np

import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})


print(frame)
fr=frame.sort_values(by='b')
print(fr)
fr=frame.sort_values(by=['a','b'])
print(fr)
fr=frame.sort_values(by=['a','b'], ascending=False)
print(fr)

Ranking is closely related to sorting: it assigns ranks from one through the number of
valid data points in an array. Ties are broken according to a rule. By default, rank
breaks ties by assigning each group the mean rank. We can also rank in descending order.

import numpy as np
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])


obj1=obj.rank()
print(obj1)
obj1=obj.rank(method='first')
print(obj1)
obj1=obj.rank(ascending=False, method='min')
print(obj1)
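
The tie-breaking rules can be compared side by side. The following is a minimal sketch that reuses the Series from the example above; the column labels of the comparison table are chosen here only for illustration.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# Compare the tie-breaking methods supported by Series.rank
ranks = pd.DataFrame({
    'value': obj,
    'average': obj.rank(),               # default: tied values share the mean rank
    'min': obj.rank(method='min'),       # tied values share the lowest rank of the group
    'max': obj.rank(method='max'),       # tied values share the highest rank of the group
    'first': obj.rank(method='first'),   # tied values are ranked in order of appearance
    'dense': obj.rank(method='dense'),   # like 'min', but ranks rise by 1 between groups
})
print(ranks)
```

The two 7s occupy positions 6 and 7 in sorted order, so they receive 6.5 under the default rule, 6 under 'min', 7 under 'max', and 6 and 7 under 'first'.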

DataFrame can compute ranks over the rows or the columns:

import numpy as np
import pandas as pd

frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1], 'c': [-2,
5, 8, -2.5]})
print(frame)
fr=frame.rank(axis=1)
print(fr)

Axis indexes with Duplicate Values

A Series may have duplicate index labels. The index’s is_unique property tells you whether
its values are unique or not. Data selection is one of the main things that behaves
differently with duplicates. Indexing a label with multiple entries returns a Series,
while a label with a single entry returns a scalar value.

import numpy as np
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])


print(obj.index.is_unique)
print(obj['a'])
print(obj['c'])

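The same logic extends to a DataFrame. In the example below the column labels are
duplicated, so selecting 'a' returns a DataFrame containing both matching columns,
while selecting 'b' returns a single Series.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4, 3), columns=['a', 'a', 'b'])


print(df.columns.is_unique)
print(df['a'])
print(df['b'])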

Summarizing and Computing Descriptive Statistics


Pandas objects are equipped with a set of common mathematical and statistical methods.
Most of these fall into the category of reductions or summary statistics: methods that
extract a single value (like the sum or mean) from a Series, or a Series of values from the
rows or columns of a DataFrame. Unlike the equivalent methods of plain NumPy arrays,
they are built to exclude missing data. Calling DataFrame’s sum method returns a Series
containing column sums. Passing axis=1 sums over the rows instead. NA values are skipped
by default; when an entire slice (row or column in this case) is NA, mean returns NaN,
while sum returns 0 in current pandas versions.

Some methods, like idxmin and idxmax, return indirect statistics such as the index value
where the minimum or maximum value is attained. Another method, describe, produces
multiple summary statistics in one shot. The summary and descriptive methods of
DataFrame are listed in the table given below.

Example

import numpy as np
import pandas as pd
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
print(df)
print(df.sum())
print(df.sum(axis=1))
print(df.mean())
print(df.describe())
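
The indirect statistics and the NA handling described above can be sketched on the same DataFrame. This is only an illustration; skipna is the standard option of these reductions that controls whether missing values are skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])

# Index label at which each column attains its maximum and minimum
print(df.idxmax())
print(df.idxmin())

# skipna=False lets any NA in a row propagate to the result
row_means = df.mean(axis=1, skipna=False)
print(row_means)

# Cumulative sum down each column; NA entries stay NA
print(df.cumsum())
```

With skipna=False, rows 'a' and 'c' give NaN because each contains at least one missing value.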

Covariance
Covariance is a measure of the relationship between two random variables. It measures
the direction of the relationship between two variables. If the covariance of two
variables is positive, the variables tend to move in the same direction. If the
covariance is negative, the variables tend to move in opposite directions. It can be
calculated as below:

\[ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \]

A square matrix that provides the covariance between each pair of components (or elements)
of a given random vector is called a covariance matrix.

# importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.DataFrame({"A":[5, 3, 6, 4],
"B":[11, 2, 4, 3],
"C":[4, 3, 8, 5],
"D":[5, 4, 2, 8]})
print(df)
print(df.cov())
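
To connect the formula to the output of cov, the sketch below computes the covariance of columns A and C by hand, using the same values as the example above. One point worth noting: pandas divides by n - 1 (the sample covariance) by default, whereas the formula above divides by n, so the two results differ by a factor of (n - 1)/n.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 3, 6, 4], "C": [4, 3, 8, 5]})
n = len(df)

# Sum of the products of deviations from the column means
dev_products = ((df["A"] - df["A"].mean()) * (df["C"] - df["C"].mean())).sum()

cov_population = dev_products / n      # divides by n, as in the formula above
cov_sample = dev_products / (n - 1)    # divides by n - 1, as pandas does by default

print(cov_population)
print(cov_sample)
print(df["A"].cov(df["C"]))            # agrees with cov_sample
```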

Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It’s a common tool for
describing simple relationships without making a statement about cause and effect. The
sample correlation coefficient, r, quantifies the strength of the relationship. A correlation
coefficient close to 0, whether positive or negative, implies little or no linear relationship
between the two variables. A correlation coefficient close to +1 means a positive
relationship between the two variables, with an increase in one variable being
associated with an increase in the other. A correlation coefficient close to -1
indicates a negative relationship between the two variables, with an increase in one
variable being associated with a decrease in the other. The most common
formula is the Pearson correlation coefficient, used for linear dependency between the
data sets, and is given as below.

\[ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum x^2 - \left(\sum x\right)^2\right)\left(n \sum y^2 - \left(\sum y\right)^2\right)}} \]

Example

# importing pandas as pd
import pandas as pd

# Creating the dataframe


df = pd.DataFrame({"A":[5, 3, 6, 4],
"B":[11, 2, 4, 3],

"C":[4, 3, 8, 5],
"D":[5, 4, 2, 8]})
print(df)
print(df.corr())
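
As a check on the Pearson formula, the following sketch evaluates it term by term for columns A and C of the example data and compares the result with the value pandas reports. The variable names are chosen here for illustration.

```python
import math
import pandas as pd

x = pd.Series([5, 3, 6, 4])   # column A
y = pd.Series([4, 3, 8, 5])   # column C
n = len(x)

# Numerator and denominator of the Pearson formula
numerator = n * (x * y).sum() - x.sum() * y.sum()
denominator = math.sqrt((n * (x ** 2).sum() - x.sum() ** 2) *
                        (n * (y ** 2).sum() - y.sum() ** 2))
r = numerator / denominator

print(r)
print(x.corr(y))   # Series.corr uses the Pearson method by default
```

Here the numerator is 4 * 97 - 18 * 20 = 28 and the denominator is the square root of 20 * 56 = 1120, so r is roughly 0.84.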

Unique Values, Value Counts, and Membership


Another class of related methods extracts information about the values contained in a
one-dimensional Series. The unique() method gives us an array of the unique values in a
Series. The unique values are not necessarily returned in sorted order, but the resulting
array can be sorted afterwards if needed using its sort() method. The value_counts()
method computes a Series containing value frequencies. Lastly, isin() performs a vectorized
set membership check and can be very useful in filtering a data set down to a subset of
values in a Series or a column in a DataFrame.

# importing pandas as pd
import pandas as pd
s=pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
uniques = s.unique()
print(uniques)
l=s.value_counts()
print(l)
m = s.isin(['b', 'c'])
print(m)
print(s[m])

Handling Missing Data


Missing data is common in most data analysis applications. One of the goals in designing
pandas was to make working with missing data as easy as possible.

Example

import pandas as pd
import numpy as np
s = pd.Series(['Orange', 'Mango', np.nan, 'Avocado'])
print(s)
print(s.isnull())
print(s.notnull())
s1=s.dropna()
print(s1)
s2=s.fillna(0)
print(s2)

Example 2
import pandas as pd
import numpy as np
data = pd.DataFrame([[np.nan, 6.5, 3.], [np.nan, np.nan, 2.0],[np.nan,
np.nan, np.nan], [np.nan, 6.5, 3.]])
print(data)
d1=data.dropna()
print(d1)
d2=data.fillna(0)
print(d2)
d3=data.dropna(how='all')
print(d3)
d4=data.dropna(how='all',axis=1)
print(d4)

By calling fillna with a dict, you can use a different fill value for each column. fillna
returns a new object by default, but you can modify the existing object in place by passing
inplace=True. The fill methods available for reindexing, such as forward fill, can also be
applied to missing data (the ffill calls in the second example below). With fillna you can
do lots of other things with a little creativity. For example, we might pass the mean or
median value of a Series.

import pandas as pd
import numpy as np
data = pd.DataFrame([[np.nan, 6.5, 3.], [np.nan, np.nan, 2.0],[np.nan,
np.nan, np.nan], [np.nan, 6.5, 3.]])
d=data
d.fillna(0)
print(d)
d.fillna(0,inplace=True)
print(d)

import pandas as pd
import numpy as np
data = pd.DataFrame([[np.nan, 6.5, 3.], [np.nan, np.nan, 2.0],[np.nan,
np.nan, np.nan], [np.nan, 6.5, 3.]])
print(data)
d=data.ffill()
print(d)
d=data.ffill(limit=1)
print(d)
d=data.fillna(data.mean())
print(d)
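
The dict form of fillna mentioned above is not shown in the examples, so here is a minimal sketch on the same data. The fill values 0.5 and -1.0 are arbitrary and used only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[np.nan, 6.5, 3.0], [np.nan, np.nan, 2.0],
                     [np.nan, np.nan, np.nan], [np.nan, 6.5, 3.0]])

# Keys are column labels, values are the fill value for that column.
# Column 0 is not listed, so its missing values are left untouched.
filled = data.fillna({1: 0.5, 2: -1.0})
print(filled)

# Fill each column with its own median instead of a constant
print(data.fillna(data.median()))
```

Because fillna returns a new object here, data itself still contains its original missing values.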

Hierarchical Indexing
Hierarchical indexing, also known as multi-level indexing, is a way of organizing data
in Pandas with multiple levels of row or column labels. This allows you to work with
more complex data structures than a simple table with a single level of row labels and a
single level of column labels. For example, imagine we have a dataset with sales data for
a company, broken down by region and by quarter. We could organize this data with a
hierarchical index that has two levels: one for the region and one for the quarter.

Example

import pandas as pd
index = [('Kathmandu', 'Q1'), ('Kathmandu', 'Q2'), ('Kathmandu', 'Q3'),
         ('Kathmandu', 'Q4'),
         ('Pokhara', 'Q1'), ('Pokhara', 'Q2'), ('Pokhara', 'Q3'),
         ('Pokhara', 'Q4')]
sales = [350, 500, 325, 475, 200, 300, 350, 250]
# each index label is a (region, quarter) tuple
sales_data = pd.Series(sales, index=index)
print(sales_data)
print(sales_data.index)
# print the Q2 sales for every region
for x in index:
    if x[1] == 'Q2':
        print(x, sales_data[x])

Another Example
import pandas as pd
arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
                  index=index)
print(df)
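
The region and quarter sales data can also be given an explicit MultiIndex, which makes selection by level straightforward. The sketch below is one possible way to do this; the level names 'region' and 'quarter' are assumptions introduced for illustration.

```python
import pandas as pd

# Build a two-level index from every (region, quarter) combination
index = pd.MultiIndex.from_product(
    [['Kathmandu', 'Pokhara'], ['Q1', 'Q2', 'Q3', 'Q4']],
    names=['region', 'quarter'])
sales_data = pd.Series([350, 500, 325, 475, 200, 300, 350, 250], index=index)

# Partial indexing: all quarters for one region
print(sales_data['Kathmandu'])

# Cross-section on the inner level: Q2 for every region
q2 = sales_data.xs('Q2', level='quarter')
print(q2)

# Pivot the inner level into columns to get a region-by-quarter table
table = sales_data.unstack()
print(table)

# Aggregate over one level of the index
totals = sales_data.groupby(level='region').sum()
print(totals)
```

Compared with looping over the tuples, xs and unstack express the same selection in a single call.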

Panel Data
The Panel in Pandas was used for working with three-dimensional data. It is a
three-dimensional container with three main axes: items (axis 0), where each item
corresponds to a DataFrame; major_axis (axis 1), the rows of each DataFrame; and
minor_axis (axis 2), the columns of each DataFrame. A panel was created with the
pd.Panel() constructor, for example from a 3D ndarray or a dictionary of DataFrames, and
data could be extracted from it using different methods. Panel was deprecated in pandas
0.20 and is no longer available in current releases; a DataFrame with a hierarchical
index is the usual replacement.
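
Since Panel is not available in current pandas, three-dimensional data is usually held in a DataFrame with a hierarchical index. The following is a rough sketch of that approach, not a Panel equivalent in every respect; the item names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two hypothetical 2x3 tables that a Panel would have stored as "items"
items = {
    'item1': pd.DataFrame(np.arange(6).reshape(2, 3), columns=['x', 'y', 'z']),
    'item2': pd.DataFrame(np.arange(6, 12).reshape(2, 3), columns=['x', 'y', 'z']),
}

# Stack the tables; the dictionary keys become the outer index level
stacked = pd.concat(items, names=['item', 'row'])
print(stacked)

# Select one "item", which is a 2-D slice of the 3-D data
print(stacked.loc['item2'])
```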

Group By
Pandas groupby is used for grouping the data according to categories and
applying a function to each category. It also helps to aggregate data efficiently. The
Pandas groupby() is a very powerful function with a lot of variations. It makes the task
of splitting the DataFrame over some criteria really easy and efficient.
Example:
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/Python Data/employees.csv")
df = df.dropna()
data = df.head(100)
data = data[['Team', 'First Name', 'Gender', 'Salary']]
# group by team and gender, then take the highest salary in each group
gb = data.groupby(["Team", "Gender"])
print(gb['Salary'].max())
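
The example above depends on a CSV file that may not be at hand. The sketch below shows the same split, apply, and combine steps on a small made-up table that stands in for employees.csv; all names and salaries are invented for illustration.

```python
import pandas as pd

# Small made-up data set standing in for employees.csv
data = pd.DataFrame({
    'Team':   ['Sales', 'Sales', 'Sales', 'Finance', 'Finance', 'Finance'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Salary': [50000, 62000, 58000, 71000, 65000, 69000],
})

# Split by Team and Gender, then take the maximum salary in each group
max_salary = data.groupby(['Team', 'Gender'])['Salary'].max()
print(max_salary)

# Several aggregations at once on a single grouping key
summary = data.groupby('Team')['Salary'].agg(['count', 'mean', 'max'])
print(summary)
```

Grouping by two keys produces a result indexed by (Team, Gender) pairs, while agg with a list of function names returns one column per statistic.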

Pandas DataFrame Join Operation


import pandas as pd

set1=['a','c','d','e','y']
address=['ktm','htd','pkh','bkh','ktm']
set2=['g','a','b','d','h']
d1={"set":set1,"address":address}
d2={"set":set2,"address":address}
df1=pd.DataFrame(d1)
df2=pd.DataFrame(d2)

result=pd.merge(df1,df2,on="set",how="outer")

print(result)
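
The how argument decides which keys survive the join. As a sketch, the snippet below reruns the merge on the same two frames with how="inner" and how="left", and adds indicator=True to the outer join to show where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({"set": ['a', 'c', 'd', 'e', 'y'],
                    "address": ['ktm', 'htd', 'pkh', 'bkh', 'ktm']})
df2 = pd.DataFrame({"set": ['g', 'a', 'b', 'd', 'h'],
                    "address": ['ktm', 'htd', 'pkh', 'bkh', 'ktm']})

# inner: keep only keys present in both frames
inner = pd.merge(df1, df2, on="set", how="inner")
print(inner)

# left: keep every row of df1; unmatched right-hand values become NaN
left = pd.merge(df1, df2, on="set", how="left")
print(left)

# outer with indicator=True adds a _merge column naming the source of each row
outer = pd.merge(df1, df2, on="set", how="outer", indicator=True)
print(outer)
```

Because both frames have an address column that is not a join key, pandas keeps both and distinguishes them with the default suffixes _x and _y.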

Matplotlib Library
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations
in Python. Matplotlib makes difficult things possible and simple things easy. matplotlib.pyplot is
a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes
some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines
in a plotting area, decorates the plot with labels, etc. Once we are done, we can save it with
savefig() or display it with show().
Example
from matplotlib import pyplot as plt
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
# add a title
plt.title("Nominal GDP")
# add a label to the x and y-axis
plt.ylabel("Billions")
plt.xlabel("Years")
plt.show()
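
The text mentions savefig() as the alternative to show(). The sketch below saves the same chart to a file; the Agg backend and the temporary output path are assumptions made so that the snippet runs without a display, and neither is required in a notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

plt.figure(figsize=(6, 4))
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
plt.title("Nominal GDP")
plt.xlabel("Years")
plt.ylabel("Billions")

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "nominal_gdp.png")
plt.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close()
print(out_path)
```

The file format is inferred from the extension, so changing the name to end in .pdf or .svg produces a vector image instead.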

Bar Charts
A bar chart, often known as a bar graph, is a diagram that displays categorical data as rectangular
bars with heights or lengths proportional to the values they stand for. You can plot the bars either
vertically or horizontally. A vertical bar chart may also be referred to as a column chart.
Comparisons among distinct categories are displayed in a bar graph. The comparison categories
are shown on one axis of the chart, and a measured value is shown on the other axis.
Example
from matplotlib import pyplot as plt
Country = ["Nepal", "Sri Lanka", "Bangladesh", "India",
           "Bhutan", "Maldives", "Pakistan", "Afghanistan"]
GDP_growth_rate = [6.4, 4.5, 8.3, 7.4, 5.8,8.7,3.2,2.1]
# plot bars with Country as x-coordinate and GDP_growth_rate as height
plt.figure(figsize=(8,4))
plt.bar(Country, GDP_growth_rate)
plt.title("GDP Growth Rates of SAARC Countries") # add a title

plt.ylabel("GDP Growth Rate") # label the y-axis
plt.xlabel("Country") # label the x-axis
plt.show()

Calling plt.barh(y, x) plots a horizontal bar chart: the first argument gives the categories on the
y-axis and the second gives the bar lengths along the x-axis.
Example
from matplotlib import pyplot as plt
Country = ["Nepal", "Sri Lanka", "Bangladesh", "India",
           "Bhutan", "Maldives", "Pakistan", "Afghanistan"]
GDP_growth_rate = [6.4, 4.5, 8.3, 7.4, 5.8, 8.7, 3.2, 2.1]
# plot bars with Country as y-coordinate and GDP_growth_rate as bar length
plt.figure(figsize=(8,4))
plt.barh(Country, GDP_growth_rate)
plt.title("GDP Growth Rates of SAARC Countries") # add a title
plt.xlabel("GDP Growth Rate") # label the x-axis
plt.ylabel("Country") # label the y-axis
plt.show()

Stacked bar charts place the bars of each series on top of one another. Whereas an unstacked bar
chart compares groups as wholes, a stacked chart also shows how the individual parts contribute to
each group's total. A stacked bar plot is therefore used to represent a grouping variable, with
group counts or relative proportions plotted in a stacked manner. Occasionally, it is used to
display relative proportions that sum to 100%.
Example
# importing package
import matplotlib.pyplot as plt
import numpy as np

# create data
x = ['A', 'B', 'C', 'D']
y1 = np.array([10, 20, 10, 30])
y2 = np.array([20, 25, 15, 25])
y3 = np.array([12, 15, 19, 6])
y4 = np.array([10, 29, 13, 19])

# plot bars in stack manner


plt.bar(x, y1, color='r')
plt.bar(x, y2, bottom=y1, color='b')
plt.bar(x, y3, bottom=y1+y2, color='y')
plt.bar(x, y4, bottom=y1+y2+y3, color='g')
plt.xlabel("Teams")
plt.ylabel("Score")

plt.legend(["Round 1", "Round 2", "Round 3", "Round 4"])
plt.title("Scores by Teams in 4 Rounds")
plt.show()
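
For the 100% variant mentioned above, each team's scores can be converted to shares of that team's total before stacking. The following is a minimal sketch using the same data; the Agg backend and the temporary output file are assumptions so that it runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = ['A', 'B', 'C', 'D']
rounds = np.array([[10, 20, 10, 30],
                   [20, 25, 15, 25],
                   [12, 15, 19, 6],
                   [10, 29, 13, 19]])

# Convert each team's scores to percentages of that team's total
percent = rounds / rounds.sum(axis=0) * 100

bottom = np.zeros(len(x))
for i, row in enumerate(percent):
    plt.bar(x, row, bottom=bottom, label=f"Round {i + 1}")
    bottom = bottom + row   # the next round starts where this one ended

plt.xlabel("Teams")
plt.ylabel("Share of total score (%)")
plt.legend()
plt.title("Share of Score by Round")

out_path = os.path.join(tempfile.gettempdir(), "stacked_percent.png")
plt.savefig(out_path)
plt.close()
```

Keeping a running bottom array avoids writing out y1+y2, y1+y2+y3 by hand and works for any number of rounds.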

Line Charts
A line chart is a type of chart that provides a visual representation of data in the form of points
that are connected by straight line segments. Line charts are a good choice for showing trends. These
charts are used to represent the relation between two variables, X and Y, plotted on different axes.
Example
import matplotlib.pyplot as plt

quantity=[1123,1256,1289,1378,1456,1367,1256]
amount=[2246,2512,2588,2702,2912,3214,3250]
Month=["Jan","Feb","Mar","Apr","May","June","July"]

plt.figure(figsize=(8,4))
plt.plot(Month,quantity,marker='x')
plt.plot(Month,amount,marker='o')
plt.title('Sales Trend')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.legend(["Sales Quantity","Sales Amount"],loc="upper left")

# Show the plot


plt.show()

Example
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Load the dataset into a Pandas DataFrame
df = pd.read_csv("/content/drive/My Drive/HistoricalPrices.csv")

# Convert the date column to datetime


df['Date'] = pd.to_datetime(df['Date'])

# Sort the dataset in the ascending order of date


df = df.sort_values(by = 'Date')

plt.figure(figsize=(8,4))

# Plot the open and close prices against the date


plt.plot(df['Date'], df['Open'])
plt.plot(df['Date'], df['Close'])

plt.title('DJIA Open and Close Prices')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend(["Open Price","Close Price"],loc="upper left")

# Show the plot


plt.show()

Scatterplots
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two
different numeric variables. The position of each dot on the horizontal and vertical axis
indicates values for an individual data point. Scatter plots are used to observe
relationships between variables.
Example

import matplotlib.pyplot as plt


numofgames =[3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores =[80, 90, 75, 80, 90, 50, 65, 85, 40, 100]
teams=['A','B','C','D','E','E','F','G','H','I']

plt.scatter(numofgames, scores, c ="blue", marker='o', linewidths=0.25)


plt.title("Game Scores")
plt.xlabel("#Games")
plt.ylabel("Scores")

#Labeling Scatter plot


for i, txt in enumerate(teams):
    plt.annotate(txt, (numofgames[i], scores[i]))

# To show the plot


plt.show()

Histogram and Density Plots


A histogram is a graph that displays the frequency of data in equal intervals or bins. It
consists of a series of bars, where each bar represents a range of values, and the height of
the bar corresponds to the number of data points that fall within that range. Histograms
are commonly used to show the distribution of a single variable, such as age, income, or
test scores.

Example

import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(170, 10, 250)
num_bins = 7
plt.figure(figsize=(4,3))
plt.hist(x, num_bins, color='Blue', alpha=0.5)
plt.show()

Example 2

import matplotlib.pyplot as plt


import pandas as pd
flights = pd.read_csv('/content/drive/My Drive/Data/flights.csv')
print(flights)
plt.figure(figsize=(9,7))
plt.hist(flights['arr_delay'], color = 'blue', edgecolor = 'black', bins =
int(180/5))
plt.show()

A density plot shows the probability density function of a variable. It is a smoothed
version of a histogram, where the bars are replaced by a continuous line. Density plots
are useful for showing the shape of a distribution and identifying its mode, skewness,
and kurtosis.

Example

import matplotlib.pyplot as plt


import pandas as pd
import seaborn as sns
flights = pd.read_csv('/content/drive/My Drive/Data/flights.csv')
plt.figure(figsize=(9,7))
sns.kdeplot(flights['arr_delay'], fill=True, color='blue')
plt.show()

Plotting Maps
Maps have been used for centuries to help people navigate and understand their
surroundings. In the age of big data, maps have become an essential tool for data
visualization. They allow us to visualize data in a way that is intuitive, interactive, and
easy to understand. Maps can help us identify patterns and relationships that might be
difficult to see in other types of visualizations.

Plotly is a powerful data visualization library for Python that allows you to create a
wide range of interactive visualizations, including maps. One of the advantages of
Plotly is that it is designed to work seamlessly with other Python libraries, such as
Pandas and NumPy. This makes it easy to import and manipulate data and to create
visualizations that are customized to your specific needs.

The Scattergeo trace in plotly.graph_objects, and the px.scatter_geo() function from Plotly
Express used in the example below, create a scatter plot on a geographic map. This means
that we can plot points on a map where each point represents a specific
geographic location, like a city or a landmark. For example, if we have a dataset that
contains the latitude and longitude coordinates of different cities around the world, we
can plot each city on a world map.

Example

import plotly.express as px
import pandas as pd

# Import data from USGS


data = pd.read_csv('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv')

# Drop rows with missing or invalid values in the 'mag' column


data = data.dropna(subset=['mag'])
data = data[data.mag >= 4]

# Create scatter map


fig = px.scatter_geo(data, lat='latitude', lon='longitude', color='mag',
hover_name='place', #size='mag',
title='Earthquakes Around the World')
fig.show()

Line Plot
• plt.plot(x, y)
• color property for line color
• marker
• markersize or ms
• markeredgecolor or mec
• markerfacecolor or mfc
• linestyle: '-', '--', '-.', ':', 'None', ' ', '', 'solid', 'dashed', 'dashdot', 'dotted'
• multiple lines
• plt.grid(True) for grid
• xlabel, ylabel
• title
• legend
• subplot: plt.subplot(rows, cols, index)

Relationship of sinx and cosx


import matplotlib.pyplot as plt
import numpy as np
x = np.arange(1, 10)
y = np.sin(x)
y1 = np.cos(x)
plt.plot(x, y, marker='>', ms=10, markeredgecolor='red', mfc='green', linestyle="-.")

plt.plot(x,y1)
plt.legend(["sinx","cosx"])
plt.show()
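
The grid and subplot options from the list above are not used in the sin and cos example, so here is a small sketch that puts the two curves in separate panels. The Agg backend and the temporary output path are assumptions made so that the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)   # 1 row, 2 columns, first panel
plt.plot(x, np.sin(x), color='red', linestyle='--')
plt.title("sin x")
plt.grid(True)

plt.subplot(1, 2, 2)   # 1 row, 2 columns, second panel
plt.plot(x, np.cos(x), color='blue', linestyle=':')
plt.title("cos x")
plt.grid(True)

plt.tight_layout()     # keep titles and tick labels from overlapping
n_panels = len(plt.gcf().axes)

out_path = os.path.join(tempfile.gettempdir(), "sin_cos_subplots.png")
plt.savefig(out_path)
plt.close()
```

Each call to plt.subplot makes that panel current, so the plot, title, and grid calls that follow apply to it alone.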

Scatter Plot
• plt.scatter(x, y)
• color
• s for size of points
import numpy as np
import matplotlib.pyplot as plt
points=[(10,5),(5,6),(7,5)]
x,y=list(zip(*points))
plt.scatter(x,y,color="red")
plt.show()

Bar Graph
• plt.bar(x, y) and plt.barh(y, x)

plot with pandas


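import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# random sample values for three columns over three years
bim = np.random.randint(10, 60, 3)
bba = np.random.randint(20, 60, 3)
bca = np.random.randint(10, 60, 3)
d = {"bim": bim, "bba": bba, "bca": bca}
data = pd.DataFrame(d, index=["2011", "2012", "2013"])
# one group of bars per index label, one bar per column
data.plot(kind='bar')
plt.show()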

pie chart
• plt.pie(values)
• labels=[]
• startangle=number
• explode=[same size as data]
• autopct for percentage labels
• colors=[]