Attachment 3 Python for Data Analysis Lyst9850 (1)
PYTHON FOR DATA ANALYSIS
© www.skillathon.co
Content
NumPy
✔ Introduction
✔ Installation
✔ NumPy arrays
✔ How to create ndarrays?
✔ random() methods
✔ Shape of arrays
✔ Reshaping arrays
✔ Operations on arrays
✔ Arithmetic
✔ Broadcasting
Pandas
✔ Introduction
✔ Series
✔ DataFrames
✔ Missing data
✔ GroupBy
✔ Aggregate functions
✔ Merging, joining and concatenating
✔ Operations
✔ Data input and output
Introduction
Installation
❑ It’s highly recommended to install the Anaconda distribution, which bundles NumPy and pandas and keeps all underlying dependencies in sync.
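A quick way to confirm that the installation worked is to import both libraries and print their versions. This is a minimal sketch; the versions printed depend on your environment.

```python
import numpy as np
import pandas as pd

# If either import fails, the package is not installed in the active environment
print("NumPy version:", np.__version__)
print("pandas version:", pd.__version__)
```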
Numpy arrays
Note
✔ Indexing starts at 0.
✔ Unlike lists, they support broadcasting.
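Both notes can be checked in a few lines. This sketch uses made-up values and contrasts a NumPy array with a plain list: the array broadcasts the scalar to every element, while the list raises a TypeError.

```python
import numpy as np

values = [10, 20, 30]           # plain Python list
arr = np.array(values)          # NumPy array with the same elements

# Indexing starts at 0: the first element is at position 0
first = arr[0]                  # 10
last = arr[-1]                  # 30 (negative indices count from the end)

# Broadcasting: the scalar 5 is added to every element
shifted = arr + 5               # array([15, 25, 35])

# The same expression on a list raises TypeError, because lists do not broadcast
try:
    values + 5
    list_broadcasts = True
except TypeError:
    list_broadcasts = False

print(first, last, shifted, list_broadcasts)
```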
How to create numpy arrays
>>> import numpy as np   ### import numpy under the conventional alias np to save typing
❑ Example:
>>> a = np.arange(0, 5)   ### generates the array [0, 1, 2, 3, 4] (the stop value 5 is excluded)
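Besides `np.arange()`, arrays are commonly created from Python lists with `np.array()`. The sketch below also shows the optional step argument of `arange()` and `np.linspace()`, which the slide does not mention; both are standard NumPy functions.

```python
import numpy as np

a = np.arange(0, 5)                       # start inclusive, stop exclusive: [0 1 2 3 4]
evens = np.arange(0, 10, 2)               # optional third argument is the step: [0 2 4 6 8]
from_list = np.array([[1, 2], [3, 4]])    # 2-D array built from a nested list
points = np.linspace(0, 1, 5)             # 5 evenly spaced values, endpoint included

print(a)
print(evens)
print(from_list)
print(points)
```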
How to create numpy arrays (continued)
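>>> np.zeros(shape)   ### array of the given shape filled with zeros
>>> np.ones(shape)   ### array of the given shape filled with ones
>>> np.eye(n)   ### n x n identity matrix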
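With concrete arguments, the three functions look like this. Note that a multi-dimensional shape is passed as a tuple, and the results are floating-point arrays by default.

```python
import numpy as np

z = np.zeros((2, 3))    # 2 rows x 3 columns, all 0.0
o = np.ones(4)          # an int shape gives a 1-D array: [1. 1. 1. 1.]
i = np.eye(3)           # 3x3 identity matrix: 1.0 on the diagonal, 0.0 elsewhere

print(z, o, i, sep="\n\n")
```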
Random Functions
np.random.randint(low, high, size): returns an array of random integers within the given range and of the given size.
Note:
✔ In randint(), the lower limit is inclusive and the upper limit is exclusive.
Random function (Examples)
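The following sketch exercises `randint()` as described above, together with `rand()` (uniform floats in [0, 1)) and `randn()` (standard normal), both part of the same legacy `np.random` module. The seed value 42 is arbitrary; it only makes the output repeatable.

```python
import numpy as np

np.random.seed(42)   # fix the seed so results are reproducible between runs

ints = np.random.randint(1, 7, size=10)   # 10 integers from 1 to 6 (7 is excluded)
uniform = np.random.rand(2, 3)            # 2x3 floats drawn uniformly from [0, 1)
normal = np.random.randn(5)               # 5 floats from the standard normal distribution

print(ints)
print(uniform)
print(normal)
```

Newer NumPy code often uses a generator object instead, e.g. `rng = np.random.default_rng(42)` followed by `rng.integers(1, 7, size=10)`, which follows the same inclusive/exclusive convention by default.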
Array Shape
Note:
✔ No parentheses, since shape is an attribute, not a method.
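A short illustration of the `shape` attribute, alongside the related `ndim` and `size` attributes:

```python
import numpy as np

a = np.zeros((3, 4))
print(a.shape)    # (3, 4): 3 rows, 4 columns; accessed without parentheses
print(a.ndim)     # 2: the number of dimensions, i.e. len(a.shape)
print(a.size)     # 12: total number of elements, the product of the shape
```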
Reshaping Arrays
❑ Using NumPy’s reshape() function, the dimensions of a given array can be changed, as long as the total number of elements stays the same.
❑ Example:
>>> a = np.random.rand(4, 4)   ### 4 x 4 = 16 elements
>>> a.reshape(2, 2, 4)   ### 2 x 2 x 4 = 16 elements
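The sketch below shows that `reshape()` only works when the element count is preserved, and that `-1` can be passed for one dimension to let NumPy infer it. `reshape()` returns a new array object (a view of the same data when possible), so the original keeps its shape.

```python
import numpy as np

a = np.arange(16)            # 1-D array with 16 elements: 0..15
b = a.reshape(4, 4)          # 4x4; the element count (16) must stay the same
c = b.reshape(2, 2, 4)       # 3-D: 2 * 2 * 4 == 16
d = b.reshape(-1, 8)         # -1 lets NumPy infer that dimension: here (2, 8)

print(a.shape, b.shape, c.shape, d.shape)

# An incompatible shape raises ValueError
try:
    a.reshape(3, 5)          # 3 * 5 == 15, not 16
    reshape_ok = True
except ValueError as err:
    reshape_ok = False
    print("ValueError:", err)
```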
Basic Operations
❑ We can calculate the mean, median or standard deviation directly with NumPy.
>>> a = np.array([1, 2, 3, 3])
>>> a.mean()   ### returns the mean of a
2.25
>>> np.median(a)   ### returns the median; arrays have no median() method, so use the np.median() function
2.5
>>> a.std()   ### population standard deviation (ddof=0 by default)
0.82915619...
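The same statistics as a runnable script, plus two details the slide does not cover: `ddof=1` gives the sample standard deviation (dividing by n - 1), and the `axis` argument computes a statistic per column or per row of a 2-D array.

```python
import numpy as np

a = np.array([1, 2, 3, 3])

mean = a.mean()             # 2.25
median = np.median(a)       # 2.5, the average of the two middle values 2 and 3
std_pop = a.std()           # population std (ddof=0), about 0.8292
std_sample = a.std(ddof=1)  # sample std (divides by n - 1), about 0.9574

# The same statistics along an axis of a 2-D array
m = np.array([[1, 2], [3, 4]])
col_means = m.mean(axis=0)  # mean of each column: [2. 3.]
row_means = m.mean(axis=1)  # mean of each row: [1.5 3.5]

print(mean, median, std_pop, std_sample, col_means, row_means)
```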
Element-wise operations
❑ With scalars:
>>> a = np.array([1, 2, 3])
>>> a + 1   ### adds 1 to each element of the array
array([2, 3, 4])
>>> a ** 2   ### squares each element of the array
array([1, 4, 9])
❑ With another array:
>>> b = np.ones(3)   ### generates the float array [1., 1., 1.]
>>> a + b
array([2., 3., 4.])
>>> a - b
array([0., 1., 2.])
>>> a * b   ### element-wise product, not matrix multiplication; use np.dot(a, b) for that
array([1., 2., 3.])
Note: These vectorized operations are much faster than equivalent loops written in pure Python.
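The difference between `*` and `np.dot()` is easiest to see side by side. For 1-D arrays `np.dot()` returns the inner product (a single number); for 2-D arrays it is the matrix product, equivalent to the `@` operator. The matrices here are arbitrary examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.ones(3)                     # [1. 1. 1.] (float, so results below are float)

elementwise = a * b                # [1. 2. 3.]: multiplies position by position
dot_1d = np.dot(a, b)              # 6.0: for 1-D arrays np.dot is the inner product

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
hadamard = A * B                   # [[ 5 12] [21 32]]: element-wise
matmul = np.dot(A, B)              # [[19 22] [43 50]]: matrix product, same as A @ B

print(elementwise, dot_1d, hadamard, matmul, sep="\n")
```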
Element-wise operations : comparisons and logical operators
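Comparisons and logical operators also work element by element and return boolean arrays, which can then be used as masks. A minimal sketch with arbitrary values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])

equal = a == b                          # [False  True False  True]
greater = a > 2                         # [False False  True  True]
both = np.logical_and(a > 1, a < 4)     # [False  True  True False]
either = (a == 1) | (b == 2)            # [ True  True  True False]; parentheses are required with | and &

print(a[both])                          # the boolean mask selects [2 3]
print(np.array_equal(a, b))             # False: compares the whole arrays
```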
Broadcasting
❑ Broadcasting is useful when we want to perform element-wise operations on NumPy arrays with different shapes.
❑ Operations on arrays of different shapes are possible if NumPy can transform these arrays so that they all have the same shape; this conversion is called broadcasting.
❑ It does this without making needless copies of the data, which usually leads to efficient algorithm implementations.
Note:
✔ If both your arrays are two-dimensional, then their corresponding dimensions must either be equal or one of them must be 1.
Broadcasting : example
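A small example of the rule above: a (2, 3) array combined with a length-3 row and with a (2, 1) column, followed by a pair of shapes that cannot be broadcast. The values are arbitrary.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)      # shape (2, 3): [[0 1 2] [3 4 5]]
row = np.array([10, 20, 30])        # shape (3,): treated as (1, 3), stretched over both rows
col = np.array([[100], [200]])      # shape (2, 1): stretched over all 3 columns

print(m + row)   # [[10 21 32] [13 24 35]]
print(m + col)   # [[100 101 102] [203 204 205]]

# Shapes (2, 3) and (2,) are incompatible: trailing sizes 3 and 2 differ and neither is 1
try:
    m + np.array([1, 2])
    compatible = True
except ValueError as err:
    compatible = False
    print("ValueError:", err)
```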
Introduction
❑ Pandas’ main data structures, Series and DataFrame, are fast because they are built on top of NumPy.
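One way to see the NumPy foundation is to look at what a Series holds: its values come back as a NumPy array with a NumPy dtype, and arithmetic on it is vectorized in the same way. A minimal sketch with arbitrary values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])
print(type(s.to_numpy()))   # <class 'numpy.ndarray'>
print(s.dtype)              # float64, a NumPy dtype
print((s * 2).tolist())     # vectorized arithmetic, as with arrays: [3.0, 5.0, 7.0]
```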
Series
Series ( contd.)
❑ To access any value in a Series, use its index label.
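The sketch below creates a Series with an explicit index and retrieves values by label, by integer position with `.iloc`, and by a list of labels. The labels and numbers are hypothetical example data.

```python
import pandas as pd

# A Series is a 1-D labelled array; index supplies the labels (hypothetical data)
prices = pd.Series([250, 120, 80], index=["rent", "food", "travel"])

print(prices["food"])                 # 120: access by index label
print(prices.iloc[0])                 # 250: access by integer position
print(prices[["rent", "travel"]])     # several labels return a smaller Series

# Without an explicit index, pandas assigns a default 0, 1, 2, ... index
default = pd.Series([5, 6, 7])
print(default[1])                     # 6
```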
DataFrame
❑ A DataFrame is a tabular data structure composed of rows and columns, like a spreadsheet, a database table, or R’s data.frame object.
❑ It can be thought of as a collection of Series objects that share the same index.
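To make the "collection of Series" view concrete, the sketch builds a DataFrame from two Series that share one index; each Series becomes a column. The index labels and measurements are hypothetical.

```python
import pandas as pd

idx = ["a", "b", "c"]                          # hypothetical row labels
height = pd.Series([1.62, 1.75, 1.80], index=idx)
weight = pd.Series([55, 72, 80], index=idx)

# Each Series becomes a column; all columns share the same row index
df = pd.DataFrame({"height": height, "weight": weight})
print(df)
print(df.index.tolist())       # ['a', 'b', 'c']
print(type(df["weight"]))      # each column is itself a Series
```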
DataFrame (contd.) :
Note:
✔ Along with the data, you can optionally pass index (row labels) and columns (column labels)
arguments.
✔ If axis labels are not passed, pandas constructs them from the input data (for example, dictionary keys become column labels) or falls back to a default integer index 0, 1, 2, …
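The sketch below shows the constructor with and without the optional `index` and `columns` arguments, and with a dictionary whose keys become the column labels. The labels `r1`, `x` and so on are illustrative only.

```python
import numpy as np
import pandas as pd

data = np.arange(6).reshape(3, 2)   # [[0 1] [2 3] [4 5]]

# Labels passed explicitly
labelled = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["x", "y"])

# No labels passed: both axes default to 0, 1, 2, ...
unlabelled = pd.DataFrame(data)

# From a dict, the keys become the column labels
from_dict = pd.DataFrame({"x": [0, 2, 4], "y": [1, 3, 5]})

print(labelled.columns.tolist(), labelled.index.tolist())      # ['x', 'y'] ['r1', 'r2', 'r3']
print(unlabelled.columns.tolist(), unlabelled.index.tolist())  # [0, 1] [0, 1, 2]
print(from_dict.columns.tolist())                              # ['x', 'y']
```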
DataFrames : Columns and rows
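Common ways to select, add and drop columns and rows, sketched on a small hypothetical DataFrame: `df[...]` selects columns, `.loc` selects by label and `.iloc` by integer position.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cara"], "age": [28, 35, 42]},   # hypothetical data
    index=["r1", "r2", "r3"],
)

ages = df["age"]                   # one column -> Series
subset = df[["name", "age"]]       # list of columns -> DataFrame
row = df.loc["r2"]                 # one row by label -> Series
first_row = df.iloc[0]             # one row by integer position
cell = df.loc["r3", "age"]         # a single value: 42

df["age_next_year"] = df["age"] + 1        # add a new column
df = df.drop(columns=["age_next_year"])    # drop it again (drop returns a new DataFrame)
older = df[df["age"] > 30]                 # boolean row selection: Ben and Cara
print(older)
```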
Handling Missing Data
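Pandas marks missing values as `NaN`. A minimal sketch of the usual tools, on a small hypothetical DataFrame: `isnull()` to find them, `dropna()` to remove rows or columns containing them, and `fillna()` to replace them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, 6.0]})

print(df.isnull())                 # True where a value is missing
print(df.isnull().sum())           # missing values per column: A 1, B 2

dropped_rows = df.dropna()                     # keep only rows with no NaN: just the last row
dropped_cols = df.dropna(axis=1)               # keep only columns with no NaN: none here
filled = df.fillna(0)                          # replace every NaN with 0
mean_filled = df["A"].fillna(df["A"].mean())   # fill with the column mean (2.0)
```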
GroupBy
❑ The groupby() method groups the rows of a DataFrame based on the values in one or more columns.
❑ After grouping, aggregate functions such as sum(), mean() or count() can be applied to each group for analysis.
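Grouping followed by aggregation can be sketched as below. The sales table is hypothetical; `agg()` accepts a list of aggregate names to compute several at once.

```python
import pandas as pd

sales = pd.DataFrame({                       # hypothetical data
    "company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "sales": [200, 120, 340, 124, 243, 350],
})

by_company = sales.groupby("company")

totals = by_company["sales"].sum()      # FB 593, GOOG 320, MSFT 464
means = by_company["sales"].mean()      # FB 296.5, GOOG 160.0, MSFT 232.0
summary = by_company["sales"].agg(["min", "max", "count"])   # several aggregates at once
print(summary)
```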
Concatenation
❑ Concatenation glues DataFrames together along an axis (rows by default); their labels along the other axis should match, otherwise the gaps are filled with NaN.
❑ Pandas provides the pd.concat() function for this.
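A minimal `pd.concat()` sketch with two small hypothetical DataFrames, stacked along rows (the default) and placed side by side with `axis=1`.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

stacked = pd.concat([df1, df2])                 # axis=0 (default): rows appended, shape (4, 2)
side_by_side = pd.concat([df1, df2], axis=1)    # columns placed side by side

# The row labels do not overlap, so the side-by-side result is padded with NaN: shape (4, 4)
print(stacked)
print(side_by_side)
```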
Merging
❑ The merge() function combines DataFrames on common key columns, using logic similar to joining SQL tables.
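The SQL-style join types map onto the `how` parameter of `pd.merge()`. A sketch with hypothetical keys, where `K2` exists only on the left and `K3` only on the right:

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

inner = pd.merge(left, right, on="key", how="inner")      # only keys in both: K0, K1
outer = pd.merge(left, right, on="key", how="outer")      # all keys; the missing side is NaN
left_join = pd.merge(left, right, on="key", how="left")   # all keys from left

print(inner)
print(outer)
print(left_join)
```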
Data Input and Output
❑ Using pandas, we can read and write files in various formats, such as:
CSV
JSON
XML
HTML
And many more…
❑ Functions to read a file:
pd.read_csv('file_name')
pd.read_json('file_name')
pd.read_excel('file_name')
❑ Methods to write a file (called on a DataFrame, here df):
df.to_csv('file_name')
df.to_excel('file_name')
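A CSV round trip shows the read and write calls working together. To keep the sketch self-contained, an in-memory `io.StringIO` buffer stands in for a real file; with an actual file you would pass a path string such as `'file_name.csv'` instead. The data is hypothetical.

```python
import io
import pandas as pd

df = pd.DataFrame({"city": ["Pune", "Delhi"], "temp_c": [31.5, 35.0]})   # hypothetical data

# Write to CSV; index=False keeps the row index from being written as an extra column
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer, then read it back
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
print(restored.equals(df))   # True: same values, dtypes and labels
```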
Thanks!
Does anyone have any questions?