Pandas Basics
Introduction
Python pandas is an open-source Python library for data manipulation and analysis.
Applications of Pandas
● Natural Language Processing
● Statistics
● Analytics
● Big Data
● Recommendation Engine
● Stock Prediction
● Data Science
Pandas vs. NumPy
NumPy:
● Low-level data structure (np.array)
● Support for large multidimensional arrays and matrices
Pandas:
● High-level data structures (DataFrame)
● Provides an in-memory 2D table object called a DataFrame
● More streamlined handling of tabular data, and rich time series functionality
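As a rough illustration of the difference (the array values and the labels `r1`, `r2`, `a`, `b` are made up for this sketch), a NumPy array is accessed by integer position only, while wrapping the same data in a DataFrame adds row and column labels:

```python
import numpy as np
import pandas as pd

# NumPy: a plain 2-D array, accessed by integer position
arr = np.array([[1, 2], [3, 4]])
col_sums = arr.sum(axis=0)  # sum of each column

# pandas: the same data as a labeled 2-D table
df = pd.DataFrame(arr, index=["r1", "r2"], columns=["a", "b"])
value = df.loc["r2", "b"]   # access by row label and column label

print(col_sums)
print(df)
print(value)
```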
Open a terminal (on macOS) or the command prompt (on Windows) and install pandas using the following command:
Or
● Alternatively, you can install pandas from within a Jupyter notebook using the code below:
import pandas as pd
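Once pandas is installed, a quick sanity check (a small sketch, not part of the original slide) is to import it under the conventional alias `pd` and print the installed version:

```python
import pandas as pd

# If this import succeeds, pandas is available in the current environment
print(pd.__version__)
```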
Components of Pandas
● DataFrame
● Panel (multi-dimensional panel data; deprecated in pandas 0.20 and removed in pandas 0.25)
● Series
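Pandas Series
● A Series is a one-dimensional labeled array that can hold data of any type.
● When a Series is created from a dictionary, each key becomes a row index label and the corresponding value becomes the value at that index.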
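A minimal sketch of creating a Series from a dictionary (the subject names and marks are hypothetical sample data):

```python
import pandas as pd

marks = {"math": 90, "physics": 75, "chemistry": 82}
s = pd.Series(marks)  # dict keys -> index labels, dict values -> data
print(s)
```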
To display the index labels and the values of a Series, use the “.index” and “.values” attributes, respectively.
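A short sketch of both attributes on the same kind of hypothetical data:

```python
import pandas as pd

s = pd.Series({"math": 90, "physics": 75, "chemistry": 82})

labels = s.index   # the index labels of the Series
values = s.values  # the underlying data as a NumPy array

print(labels)
print(values)
```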
Accessing a Series
Accessing elements of a Series
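Elements of a Series can be accessed by label, by integer position, or with a condition. A short sketch with hypothetical data; `.iloc` is used for positions because treating integer keys as positions in plain `s[...]` on a labeled index is deprecated in recent pandas versions:

```python
import pandas as pd

s = pd.Series({"math": 90, "physics": 75, "chemistry": 82})

by_label = s["physics"]             # single element by label
by_position = s.iloc[0]             # single element by integer position
several = s[["math", "chemistry"]]  # several elements by label
sliced = s.iloc[1:]                 # positional slice (end excluded)
high = s[s > 80]                    # elements that satisfy a condition

print(by_label, by_position)
print(several)
print(sliced)
print(high)
```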
Pandas DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
Features of DataFrame:
● Size-mutable (rows and columns can be added or removed)
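To illustrate what size-mutable means, the sketch below (hypothetical columns and values) adds a column, adds a row, and then drops a row:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anna", "Ben"], "Score": [12.5, 9.0]})

df["Attempts"] = [1, 3]        # grow: add a new column
df.loc[2] = ["Cara", 16.5, 2]  # grow: add a row with the new label 2
df = df.drop(index=0)          # shrink: remove the row labelled 0

print(df)
```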
Reading data from csv file
Use the ‘read_csv()’ function from pandas to read data from a CSV file.
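A self-contained sketch: instead of a real file path, an in-memory `StringIO` buffer stands in for the CSV file (`read_csv()` accepts both paths and file-like objects). The column names and values are hypothetical:

```python
from io import StringIO

import pandas as pd

csv_text = "Name,Score\nAnna,12.5\nBen,9.0\n"

# In practice this would be a path, e.g. pd.read_csv("your_file.csv")
df = pd.read_csv(StringIO(csv_text))
print(df)
```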
Reading data from xlsx file
Use the ‘read_excel()’ function from pandas to read data from an xlsx file (this requires the openpyxl package to be installed).
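Reading data from zip file
Use the ‘read_csv()’ function from pandas to read a zipped text file; with the default compression='infer', a path ending in .zip is decompressed automatically (the archive must contain a single data file).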
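A sketch that first builds a small zip archive in a temporary directory (the file names `scores.zip` and `scores.csv` are hypothetical) and then reads it directly with `read_csv()`:

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "scores.zip")
    # Build an archive containing exactly one CSV file
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("scores.csv", "Name,Score\nAnna,12.5\nBen,9.0\n")
    # compression='infer' (the default) detects the .zip extension;
    # compression="zip" could also be passed explicitly
    df = pd.read_csv(zip_path)

print(df)
```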
Reading data from json file
Use the ‘read_json()’ function from pandas to read data from a JSON file.
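A minimal sketch with a hypothetical list of JSON records. The JSON text is wrapped in `StringIO` because passing a literal JSON string to `read_json()` is deprecated in recent pandas versions; a file path works the same way:

```python
from io import StringIO

import pandas as pd

json_text = '[{"Name": "Anna", "Score": 12.5}, {"Name": "Ben", "Score": 9.0}]'

# Each JSON object becomes one row; its keys become the columns
df = pd.read_json(StringIO(json_text))
print(df)
```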
Reading data from xml file
Import a package to parse the XML file (since pandas 1.3, the ‘read_xml()’ function can also read XML directly).
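One possible approach, assuming the standard-library `xml.etree.ElementTree` module is the package used (the slide does not say which one) and a hypothetical `<students>` layout: parse the XML, turn each record into a dictionary, and build a DataFrame from the list of dictionaries.

```python
import xml.etree.ElementTree as ET

import pandas as pd

xml_text = """<students>
  <student><name>Anna</name><score>85</score></student>
  <student><name>Ben</name><score>92</score></student>
</students>"""

root = ET.fromstring(xml_text)

# One dict per <student>: child tag -> child text
rows = [{child.tag: child.text for child in student} for student in root]
df = pd.DataFrame(rows)

# XML text is parsed as strings, so convert numeric columns explicitly
df["score"] = df["score"].astype(int)
print(df)
```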
Reading data from html file
Use the ‘read_html()’ function from pandas to read tables from an HTML file; it returns a list of DataFrames, one per table found.
Pandas DataFrame
● Data imported with the methods shown above is loaded into a pandas DataFrame (‘read_html()’ returns a list of DataFrames).
● To access data in a DataFrame, we can use the .iloc[] indexer (selection by integer position), the .loc[] indexer (selection by label), or boolean conditions.
Example: Create a new DataFrame as shown and access the value at index 0 in column ‘Name’.
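A sketch of such a DataFrame (the students, scores, attempts and the `Qualify` column are hypothetical sample data) and positional access with `.iloc[]`; since `.iloc[]` works on positions, the position of the ‘Name’ column is looked up first:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

# Row position 0, column position of 'Name'
name_col = df.columns.get_loc("Name")
first_name = df.iloc[0, name_col]
print(first_name)  # Anna
```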
Dealing with rows and columns
● The loc[] indexer selects data by the labels of the rows and columns.
● Access the value at index 1 in column ‘Score’ using loc.
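A sketch with the same hypothetical student data. With the default integer index, the row labels happen to equal the positions, so label 1 is the second row:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

score_1 = df.loc[1, "Score"]  # row label 1, column label 'Score'
print(score_1)  # 9.0
```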
Select multiple values by row labels and column labels using loc.
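A sketch with the same hypothetical data, selecting with lists of labels and with label slices. Unlike positional slicing, a `.loc[]` label slice includes its end label:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

# Lists of row labels and column labels
picked = df.loc[[0, 2], ["Name", "Score"]]

# Label slices: rows 0 through 2 and columns 'Name' through 'Score', both inclusive
block = df.loc[0:2, "Name":"Score"]

print(picked)
print(block)
```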
Subset the students who either have more than two attempts or have qualified in the exam.
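A sketch using a boolean condition on the same hypothetical data, where the `Qualify` column is assumed to hold "yes"/"no". Conditions are combined with `|` (or) and each one is wrapped in parentheses, because `|` binds more tightly than the comparison operators:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

mask = (df["Attempts"] > 2) | (df["Qualify"] == "yes")
subset = df[mask]
print(subset)
```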
Sorting DataFrames
Sort data frame
Sort the DataFrame by the values of a column in descending order.
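A sketch with the same hypothetical data: `sort_values()` returns a new, sorted DataFrame, and `ascending=False` gives descending order:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

by_score = df.sort_values(by="Score", ascending=False)
print(by_score)
```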
● Note that, when sorting a DataFrame by multiple columns, pandas ‘sort_values()’ sorts by the first column and then uses the next column to order rows that tie on the first, and so on.
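A sketch with the same hypothetical data: Ben and Dev tie on `Score`, so the second sort key, `Name` (ascending), decides their order. `ascending` can take one flag per column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Ben", "Cara", "Dev"],
    "Score": [12.5, 9.0, 16.5, 9.0],
    "Attempts": [1, 3, 2, 2],
    "Qualify": ["yes", "no", "yes", "no"],
})

# Score descending first, then Name ascending to break ties
multi = df.sort_values(by=["Score", "Name"], ascending=[False, True])
print(multi)
```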