
Attachment 3 Python for Data Analysis Lyst9850 (1)

The document provides an overview of Python libraries NumPy and Pandas for data analysis, covering installation, array creation, basic operations, and data structures. It explains key features such as NumPy's n-dimensional arrays, broadcasting, and Pandas' Series and DataFrames for handling and analyzing data. Additionally, it discusses methods for managing missing data, grouping, merging, and input/output operations with various file formats.

Uploaded by kalpeshboratkar
Copyright © All Rights Reserved

SKILLATHON.CO

PYTHON FOR DATA ANALYSIS
© www.skillathon.co
Content

NumPy
✔ Introduction
✔ Installation
✔ NumPy arrays
✔ How to create ndarrays?
✔ random() methods
✔ Shape of arrays
✔ Reshaping arrays
✔ Operations on arrays
✔ Arithmetic
✔ Broadcasting

Pandas
✔ Introduction
✔ Series
✔ DataFrames
✔ Missing data
✔ GroupBy
✔ Aggregate functions
✔ Merging, joining and concatenating
✔ Operations
✔ Data input and output

Introduction

❑ NumPy stands for Numerical Python.

❑ Fundamental package for scientific computing in Python.

❑ Incredibly fast, since it has bindings to C libraries.

❑ Part of the SciPy stack.

❑ Many other libraries rely on NumPy as one of their building blocks.

Installation

❑ It is highly recommended to install the Anaconda distribution so that all underlying dependencies stay in sync.

❑ If you have Anaconda, install NumPy by opening a terminal or command prompt and typing:

conda install numpy

❑ If you don't have Anaconda, type:

pip install numpy

Numpy arrays

❑ Fast, built-in n-dimensional array object whose elements all have the same type.

❑ Dimensions are called axes.

Note

✔ Indexing starts at 0.
✔ Unlike lists, arrays can be broadcast.

How to create numpy arrays

❑ To start using the NumPy package, we need to import it.

>>> import numpy as np  # the alias np saves typing

❑ NumPy arrays can be created directly with the np.array() function.

>>> arr1 = np.array([1, 2, 3])  # passing a simple list as the argument
>>> arr1  # a 1-D array
array([1, 2, 3])
>>> arr2 = np.array([[1, 2, 3], [2, 3, 4]])  # passing a nested list
>>> arr2  # a 2-D array
array([[1, 2, 3],
       [2, 3, 4]])

❑ NumPy arrays can be generated quickly with the np.arange() function.

np.arange(start, stop, step)

❑ Example:
>>> a = np.arange(0, 5)  # generates the array [0, 1, 2, 3, 4]; stop is exclusive

How to create numpy arrays (continued)

❑ To generate an array of zeros:

>>> np.zeros(shape)

❑ To generate an array of ones:

>>> np.ones(shape)

❑ To create an identity matrix of size n × n:

>>> np.eye(n)

❑ To create an array of evenly spaced points:

>>> np.linspace(start, stop, num)

linspace is similar to arange, but instead of a step size it takes the number of points to generate, and it includes stop by default.
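The creation functions on this slide can be tried side by side. The following is a minimal sketch; the variable names and the chosen shapes are illustrative assumptions, not part of the original slides.

```python
import numpy as np

zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
ones = np.ones(4)                  # 1-D array of four 1.0 values
identity = np.eye(3)               # 3x3 identity matrix

stepped = np.arange(0, 1, 0.25)    # fixed step of 0.25, stop excluded
spaced = np.linspace(0, 1, 5)      # 5 points, stop included

print(stepped)   # four values: 0, 0.25, 0.5, 0.75
print(spaced)    # five values: 0, 0.25, 0.5, 0.75, 1
```

The last two lines show the practical difference: arange is driven by the step size, linspace by the number of points.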

Random Functions

❑ NumPy provides functions to generate arrays with random elements.

np.random.rand(d0, d1, ...) : returns an array of the given dimensions filled with random numbers from a uniform distribution over [0, 1).

np.random.randn(d0, d1, ...) : returns an array of the given dimensions drawn from the standard normal (Gaussian) distribution, centered on zero.

np.random.randint(low, high, size) : returns an array of the given size filled with random integers from the given range.

Note:
✔ In the randint() function, the lower limit is inclusive and the upper limit is exclusive.

Random function (Examples)
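The original example for this slide did not survive extraction. As a stand-in, here is a short sketch of the three random functions; the seed, shapes, and variable names are assumptions chosen only to keep the run repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the sketch is repeatable

uniform = np.random.rand(2, 3)                # 2x3 array, values in [0, 1)
normal = np.random.randn(1000)                # 1000 draws from the standard normal
integers = np.random.randint(1, 7, size=10)   # 10 integers from 1 to 6 (7 is excluded)

print(uniform.shape)
print(normal.mean())                          # close to 0 for a large sample
print(integers.min() >= 1, integers.max() <= 6)
```

Note that rand and randn take the dimensions as separate arguments, while randint takes the shape through its size argument.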

Array Shape

❑ To get the shape of a NumPy array, use the shape attribute.

>>> a = np.array([7, 2, 9, 10])
>>> a.shape
(4,)
>>> b = np.array([[2, 4, 6], [1, 3, 5]])
>>> b.shape
(2, 3)

Note:
✔ No parentheses, since shape is an attribute, not a method.

Reshaping Arrays

❑ The shape of an array can be changed.

❑ Using NumPy's reshape() method, the dimensions of a given array can be changed, as long as the total number of elements stays the same.

❑ Example:
>>> a = np.random.rand(4, 4)
>>> a.reshape(2, 2, 4)
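To see what reshape does to the shape, a sketch with predictable values may help. It uses np.arange instead of random numbers, which is an assumption made here so the contents are easy to follow.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)   # 16 elements arranged as 4x4
b = a.reshape(2, 2, 4)            # the same 16 elements as 2x2x4
c = a.reshape(8, -1)              # -1 lets NumPy infer the remaining dimension

print(a.shape, b.shape, c.shape)

try:
    a.reshape(3, 5)               # 3 * 5 = 15 does not match 16 elements
except ValueError as err:
    print("reshape failed:", err)
```

reshape returns a new array object and leaves the shape of a unchanged.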

Basic Operations :

❑ NumPy arrays provide methods for basic operations.

ndarray.max() : returns the largest element in the array.
>>> a = np.array([2, 4, 12, 83, 1])
>>> a.max()
83
ndarray.min() : returns the smallest element in the array.
>>> a.min()
1
ndarray.argmax() : returns the index of the largest element.
>>> a.argmax()
3
ndarray.argmin() : returns the index of the smallest element.
>>> a.argmin()
4
ndarray.sum() : returns the sum of the elements of the array.
>>> a.sum()
102

Basic Operations: statistics

❑ We can calculate the mean, median, or standard deviation directly with NumPy.

>>> a = np.array([1, 2, 3, 3])
>>> a.mean()  # mean of a
2.25
>>> np.median(a)  # median; arrays have no median() method, so use the function
2.5
>>> a.std()  # standard deviation
0.8291...

Element-wise operations

❑ Many arithmetic operations can be performed on NumPy arrays.

❑ With scalars:
>>> a = np.array([1, 2, 3])
>>> a + 1  # adds 1 to each element of the array
array([2, 3, 4])
>>> a ** 2  # squares each element of the array
array([1, 4, 9])
❑ With another array:
>>> b = np.ones(3)  # generates the array [1., 1., 1.]
>>> a + b
array([2., 3., 4.])
>>> a - b
array([0., 1., 2.])
>>> a * b  # element-wise product, not matrix multiplication; use np.dot(a, b) for that
array([1., 2., 3.])

Note: These operations are much faster than the equivalent loops in pure Python.

Element-wise operations : comparisons and logical operators

❑ Comparisons can be made element-wise between two arrays.

>>> a == b  # returns an array of Booleans (here a = [1, 2, 3] and b = [1., 1., 1.])
array([ True, False, False])
>>> a > b
array([False,  True,  True])
❑ Comparing two arrays as a whole:
>>> np.array_equal(a, b)  # returns a single Boolean
False
❑ Logical operations:
>>> a = np.array([1, 0, 0, 1], dtype=bool)
>>> b = np.array([0, 1, 0, 1], dtype=bool)
>>> np.logical_or(a, b)
array([ True,  True, False,  True])
>>> np.logical_and(a, b)
array([False, False, False,  True])

Broadcasting

❑ Broadcasting is useful when we want to perform element-wise operations on NumPy arrays with different shapes.
❑ Operations on arrays of different sizes are possible if NumPy can stretch them to a common shape; this conversion is called broadcasting.
❑ It does this without making needless copies of data, which usually leads to efficient algorithm implementations.

Note:
✔ If both arrays are two-dimensional, the sizes along each axis must either be equal or one of them must be 1.

Broadcasting: example
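The original figure for this slide was lost in extraction. The sketch below illustrates the two-dimensional rule stated above; the array names and values are assumptions picked for readability.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
row = np.array([[10, 20, 30]])        # shape (1, 3)
column = np.array([[100], [200]])     # shape (2, 1)

print(matrix + row)       # row is stretched down both rows
print(matrix + column)    # column is stretched across all three columns
print(row + column)       # (1, 3) and (2, 1) broadcast to (2, 3)

try:
    matrix + np.ones((2, 2))          # sizes 3 and 2 differ and neither is 1
except ValueError as err:
    print("cannot broadcast:", err)
```

In every successful case the sizes along each axis are either equal or one of them is 1, which is exactly the condition in the note.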

Introduction

❑ Pandas is one of the richest libraries in Python.

❑ Can be used to analyze and visualize data.

❑ Pandas provides two high-performance data structures:

Series : 1-D labeled vector
DataFrame : 2-D spreadsheet-like structure

❑ These data structures are fast because they are built on top of NumPy.

❑ SQL-like functionality: GroupBy, joining/merging, etc.

❑ Missing data handling.

Series

❑ A Series is a one-dimensional object similar to an array, a list, or a column in a table.

❑ Each item in the Series is assigned an index label.
❑ The index can be integer or string.
❑ By default, items receive index labels from 0 to n-1.
❑ Values can be heterogeneous.

Series ( contd.)

❑ Dictionaries can be converted into Series.

❑ To grab a value from a Series, its index label is used.
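The points about Series can be sketched as follows. The names and scores are made-up illustration data, not taken from the slides.

```python
import pandas as pd

# Default integer index: labels run from 0 to n-1
scores = pd.Series([88, 92, 79])

# Custom string index
labeled = pd.Series([88, 92, 79], index=["alice", "bob", "carol"])

# Dictionary keys become the index labels
from_dict = pd.Series({"alice": 88, "bob": 92, "carol": 79})

print(scores[0])            # 88
print(labeled["bob"])       # 92
print(from_dict["carol"])   # 79
```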

DataFrame

❑ A DataFrame is a tabular data structure made up of rows and columns, like a spreadsheet, a database table, or R's data.frame object.
❑ It can be thought of as a group of Series objects that share the same index.

❑ It is the most commonly used pandas object.

DataFrame (contd.) :

❑ To create a DataFrame, pd.DataFrame() is used.

❑ Like Series, DataFrame accepts many different kinds of input:
✔ Dict of 1-D ndarrays, lists, dicts, or Series
✔ 2-D numpy.ndarray
✔ Structured or record ndarray
✔ A Series
✔ Another DataFrame

Note:
✔ Along with the data, you can optionally pass index (row labels) and columns (column labels) arguments.
✔ If axis labels are not passed, they are constructed from the input data using common-sense rules.
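A few of the input kinds listed above can be sketched like this. The column names, labels, and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels, index defaults to 0..n-1
df_dict = pd.DataFrame({"name": ["alice", "bob"], "age": [31, 27]})

# From a 2-D ndarray, with explicit row and column labels
df_array = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["r1", "r2"],
    columns=["a", "b", "c"],
)

# From a dict of Series: rows are aligned on the shared index
df_series = pd.DataFrame({
    "x": pd.Series([1, 2], index=["r1", "r2"]),
    "y": pd.Series([3, 4], index=["r1", "r2"]),
})

print(df_dict, df_array, df_series, sep="\n\n")
```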
DataFrames : Columns and rows

❑ To select a column in a DataFrame, we simply write:

dataframe_name['Column_name']
dataframe_name[['Column_name_1', 'Column_name_2']]  # to select multiple columns
❑ To create a new column:
dataframe_name['New_column_name'] = values  # one value per row
❑ We can also remove any column from the dataset:
dataframe_name.drop('Column_name', axis=1, inplace=True)
Note: axis=1 tells drop to look for the label among the columns (the default, axis=0, drops rows), and inplace=True modifies the DataFrame itself instead of returning a modified copy.
❑ To select rows in a DataFrame by label, we use the loc attribute:
dataframe_name.loc['row_name']
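The column and row operations above can be tried on a small table. This is a sketch with invented data; the column and row names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]},
    index=["apple", "pear", "plum"],
)

one_column = df["price"]                # a single column comes back as a Series
two_columns = df[["price", "qty"]]      # a list of names gives a DataFrame
df["total"] = df["price"] * df["qty"]   # new column computed from existing ones

without_qty = df.drop("qty", axis=1)    # returns a copy; df itself still has qty
df.drop("qty", axis=1, inplace=True)    # modifies df itself

pear_row = df.loc["pear"]               # row selected by its index label
print(pear_row)
```

Without inplace=True, drop leaves the original untouched, which is usually the safer default while exploring data.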

Handling Missing Data

❑ There may be many missing values in your datasets.

❑ Pandas provides some functions to deal with them.

df.dropna() : returns an object with the labels on the given axis omitted where any or all of the data are missing.

df.fillna() : fills NA/NaN values using the specified value or method.
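A small sketch of both functions, using a made-up table with gaps; the column names and fill choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

rows_complete = df.dropna()             # keep only rows with no missing values
rows_not_empty = df.dropna(how="all")   # drop only rows where every value is missing
cols_complete = df.dropna(axis=1)       # keep only columns with no missing values

filled_zero = df.fillna(0)              # replace every NaN with 0
filled_mean = df.fillna(df.mean())      # replace NaN with the mean of its column

print(rows_complete)
print(filled_mean)
```

Neither function changes df; each returns a new DataFrame.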

GroupBy

❑ The groupby() method groups rows together based on the values in a column (or in the index).

❑ After grouping, aggregate functions can be applied to the data for analysis.

❑ Many aggregate functions are available, such as:

sum()
std()
mean()
min()
max()
describe()

Note: describe() is a good one to call before the rest, as it already reports the count, mean, std (standard deviation), min, max, etc. for the numerical columns of the DataFrame.
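Grouping followed by aggregation might look like the sketch below. The sales table and its column names are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "units": [10, 20, 5, 15, 25],
})

grouped = sales.groupby("region")["units"]

totals = grouped.sum()         # east 30, west 45
averages = grouped.mean()      # east 15.0, west 15.0
summary = grouped.describe()   # count, mean, std, min, quartiles, max per region

print(totals)
print(summary)
```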
Merging and Concatenation

❑ Concatenation glues DataFrames together; their dimensions should match along the axis that is not being concatenated.
❑ Pandas provides the function pd.concat() for concatenation.
❑ The merge function lets you combine DataFrames using logic similar to joining SQL tables.
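The difference between the two operations can be sketched with small tables. All table and column names here are assumptions made for the example.

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "amount": [100, 200]})
q2 = pd.DataFrame({"id": [3, 4], "amount": [300, 400]})

# Concatenation: stack the rows; ignore_index renumbers them 0..n-1
stacked = pd.concat([q1, q2], ignore_index=True)

customers = pd.DataFrame({"id": [1, 2, 5], "name": ["alice", "bob", "eve"]})

# Inner join on the shared key column: only ids present in both frames survive
inner = pd.merge(stacked, customers, on="id", how="inner")

# Left join keeps every row of stacked; unmatched names become NaN
left = pd.merge(stacked, customers, on="id", how="left")

print(stacked, inner, left, sep="\n\n")
```

concat only places frames next to each other, while merge matches rows by key values, as a SQL join does.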

Data Input and Output

❑ Using pandas, we can read and write files in various formats, such as:
.csv
.json
.xml
.html
and many more.
❑ Functions to read a file:
pd.read_csv('file_name')
pd.read_json('file_name')
pd.read_excel('file_name')
❑ Methods to write a DataFrame df to a file:
df.to_csv('file_name')
df.to_excel('file_name')
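A round trip through CSV and JSON can be sketched as follows. The file names are hypothetical, and a temporary directory is used here only so the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "age": [31, 27]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "people.csv")     # hypothetical file names
    json_path = os.path.join(tmp, "people.json")

    df.to_csv(csv_path, index=False)   # index=False leaves the row labels out
    df.to_json(json_path)

    from_csv = pd.read_csv(csv_path)
    from_json = pd.read_json(json_path)

print(from_csv)
print(from_json)
```

The writers are methods of the DataFrame being saved, while the readers are functions of the pandas module that return a new DataFrame.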

Thanks!
Does anyone have any questions?
