Pandas Basics

Python Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools. It supports data manipulation and analysis tasks such as cleaning, grouping, merging, and joining. Pandas works with one-dimensional Series and two-dimensional DataFrame objects, imports data from sources such as CSV, Excel, and JSON, and performs operations on rows and columns using indexing.

Python Pandas

Introduction
Python pandas

● Open-source Python library
● Simple yet powerful and expressive tool
● Data manipulation and analysis
Application of Pandas

● Natural Language Processing
● Statistics
● Analytics
● Big Data
● Recommendation Engine
● Stock Prediction
● Data Science
Pandas Vs. Numpy

Numpy:
● Low-level data structure (np.array).
● Support for large multidimensional arrays and matrices.
● A wide range of mathematical array operations.

Pandas:
● High-level data structures (data frame). It provides an in-memory 2D table object called a data frame.
● More streamlined handling of tabular data, and rich time series functionality.
● Data alignment, handling missing data, groupby, merge, and join methods.
Installation

Open a terminal (on Mac) or the command prompt (on Windows) and install pandas
using one of the following commands:

conda install pandas

Or

pip install pandas


● Alternatively, you can install pandas from within a Jupyter notebook using the code below:

!pip install pandas

● To use pandas, import it under the conventional shorter name:

import pandas as pd
Components of pandas

● Series and DataFrame are the two primary components of pandas

● A Series is typically a single column, and a DataFrame is a two-dimensional table
made up of a group of Series

[Diagram: components of pandas: Series, DataFrame, Panel Data; labels: one-dimensional, multi-dimensional]
Pandas Series
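Pandas series

A Series can be created using the pandas.Series constructor, which takes the following parameters:

● data: takes various forms such as ndarray or list
● index: values must be hashable and the same length as data (they need not be unique)
● dtype: the data type
● copy: whether to copy the data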
Creating a Series
Pandas series

A pandas Series can be created from a Python list or a NumPy array:

● Using a list
● Using a NumPy array
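The code from the original slide is not preserved, so the following is a minimal sketch of both approaches. The values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Series from a plain Python list (gets a default integer index 0..n-1)
s_list = pd.Series([10, 20, 30, 40])

# Series from a NumPy array
s_array = pd.Series(np.array([1.5, 2.5, 3.5]))

print(s_list)
print(s_array)
```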


Pandas series

We can supply our own index values when creating a Series by passing the index parameter.

Strings can also be used as the row index.
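A short sketch of passing the index parameter, first with integer labels and then with string labels. The labels and values here are illustrative assumptions, not the slide's data.

```python
import pandas as pd

# Integer labels supplied through the index parameter
s_int = pd.Series([10, 20, 30], index=[101, 102, 103])

# String labels as the row index
s_str = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s_int)
print(s_str["b"])  # look up a value by its string label
```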


● A Series can also be created from a Python dictionary

● Each key becomes a row index label, and the corresponding value becomes the value at that
label
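As a rough illustration, the dictionary below (hypothetical subject names and marks) is converted into a Series whose index is taken from the keys.

```python
import pandas as pd

# Hypothetical data: keys become the index, values become the data
marks = {"Math": 85, "Science": 90, "English": 78}
s = pd.Series(marks)

print(s)
```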
Here, the list items remain part of a single row index.

To display the index labels and the values of a Series, use “.index” and “.values”
respectively.
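A minimal sketch of the two attributes, using made-up labels and values:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)   # the index labels
print(s.values)  # the underlying values as a NumPy array
```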
Accessing a Series
Accessing elements of series

Use the index operator [] to access elements in a Series.

● Retrieve the first five elements
● Retrieve the last five elements
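The slide's own code is not preserved. The sketch below uses head(), tail(), and the explicit positional indexer iloc[], which is an assumption about how to reproduce the result rather than the slide's exact syntax.

```python
import pandas as pd

s = pd.Series(range(10, 20))  # ten illustrative values: 10..19

first_five = s.head(5)        # same rows as s.iloc[:5]
last_five = s.tail(5)         # same rows as s.iloc[-5:]

print(first_five)
print(last_five)
```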


Access element using index

● Use an index label to access a single element
● Retrieve multiple elements using a list of index labels
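A small sketch of both lookups, assuming a Series with string labels (the data is illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

one = s["b"]                  # single element by label
several = s[["a", "c", "d"]]  # multiple elements via a list of labels

print(one)
print(several)
```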
Filtering a Series
Filter the values

Filter all the values that are greater than 15
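One way to express this filter is boolean masking. The values below are made up for illustration.

```python
import pandas as pd

s = pd.Series([5, 12, 18, 25, 15])

mask = s > 15       # boolean Series: True where the value exceeds 15
filtered = s[mask]  # keep only the rows where the mask is True

print(filtered)
```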
Arithmetic Operations
Multiply each element in the series by 2

Use the ‘*’ operator to perform multiplication.

Add corresponding elements of two series

Use the ‘+’ operator to perform addition.
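A brief sketch of both operations with illustrative values. Note that addition pairs elements by index label, so the two Series here share the same default index.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([10, 20, 30])

doubled = s1 * 2  # multiplies every element by 2
total = s1 + s2   # adds elements that share the same index label

print(doubled)
print(total)
```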
Ranking and Sorting
Ranking in the series

.rank() returns the rank of the underlying data.
Sort series in ascending order
Sort series in descending order
Sort series based on index
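The ranking and sorting operations listed above might look like the following sketch (illustrative data, deliberately unordered index):

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=["c", "a", "b"])

ranks = s.rank()                             # rank of each value, smallest = 1
ascending = s.sort_values()                  # sort by value, ascending
descending = s.sort_values(ascending=False)  # sort by value, descending
by_index = s.sort_index()                    # sort by index label

print(ranks)
print(by_index)
```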
Check for Null Values
Check null values using .isnull()

True indicates that the value is null


Check null values using .notnull()

False indicates that the value is null
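A minimal sketch showing both checks on a Series with one missing value (the data is an assumption for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

nulls = s.isnull()       # True where the value is missing
not_nulls = s.notnull()  # False where the value is missing

print(nulls)
print(not_nulls)
```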


Pandas DataFrame
Pandas dataframe

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular
manner (rows and columns).

Features of DataFrame:

● Columns can potentially be of different types
● Size is mutable
● Labeled axes (rows and columns)
● Arithmetic operations can be performed on rows and columns


Reading data from different sources
Reading data from csv file

Use the ‘read_csv()’ function from pandas to read data from a csv file.
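The sketch below reads CSV text from an in-memory buffer so that it runs without a file on disk. With a real file you would pass its path to read_csv() instead. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "name,score\nAsha,14\nRavi,11\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```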
Reading data from xlsx file

Use the ‘read_excel()’ function from pandas to read data from an xlsx file.
Reading data from zip file

1. Read the zip file.

2. Open the csv file inside it.

3. Read the csv file.
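The three steps above can be sketched with the standard-library zipfile module. The archive is built in memory first so the example is self-contained. The file name data.csv and its contents are assumptions.

```python
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory (stands in for a zip file on disk)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "name,score\nAsha,14\nRavi,11\n")
buffer.seek(0)

with zipfile.ZipFile(buffer) as archive:        # 1. read the zip file
    with archive.open("data.csv") as csv_file:  # 2. open the csv file
        df = pd.read_csv(csv_file)              # 3. read the csv file

print(df)
```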


Reading data from text file

Use the ‘read_csv()’ function from pandas to read data from a text file.
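For delimited text, read_csv() accepts a sep argument. The sketch assumes tab-separated content, which may differ from the slide's file.

```python
import io

import pandas as pd

# Hypothetical tab-separated text standing in for a .txt file
text = "name\tscore\nAsha\t14\nRavi\t11\n"

df = pd.read_csv(io.StringIO(text), sep="\t")
print(df)
```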
Reading data from json file

Use the ‘read_json()’ function from pandas to read data from a json file.
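A minimal sketch that reads a JSON array of records from an in-memory buffer. A file path can be passed instead. The records are made up.

```python
import io

import pandas as pd

# Hypothetical JSON content: a list of records
json_text = '[{"name": "Asha", "score": 14}, {"name": "Ravi", "score": 11}]'

df = pd.read_json(io.StringIO(json_text))
print(df)
```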
Reading data from xml file
1. Import a package to read the xml file.

2. Parse or extract the xml file.

3. Assign the column names of the output dataframe.

4. Use a for loop to extract all the data.

5. Append each observation in the data to ‘rows’.

6. Create a dataframe ‘xml_df’.
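The slide's code is not preserved. The steps could be sketched with the standard-library xml.etree.ElementTree module, which is an assumption about the package used. The XML content and tag names are hypothetical, and the names rows and xml_df follow the slide.

```python
import xml.etree.ElementTree as ET  # package used to read XML

import pandas as pd

# Hypothetical XML content standing in for a file on disk
xml_text = """<students>
  <student><name>Asha</name><score>14</score></student>
  <student><name>Ravi</name><score>11</score></student>
</students>"""

root = ET.fromstring(xml_text)  # parse the XML
columns = ["name", "score"]     # column names of the output dataframe

rows = []
for student in root.findall("student"):  # loop over every record
    rows.append({
        "name": student.find("name").text,
        "score": int(student.find("score").text),
    })

xml_df = pd.DataFrame(rows, columns=columns)
print(xml_df)
```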


Reading data from html file

Use the ‘read_html()’ function from pandas to read tables from an html file.
Pandas DataFrame
● With any of the previously mentioned ways of importing data into Python, the
result is a pandas DataFrame (read_html() returns a list of DataFrames)

● Let us now see some operations and manipulations on DataFrames


Creating DataFrames
Creating data frame using single list

1. First create a list.

2. Convert it into a DataFrame.

Creating dataframe using list of list
Creating dataframe from dictionary of ndarrays
Creating data frame using arrays

Create a list of index labels.

Creating data frame using list of dictionaries
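The construction routes named above might look like the following sketch. All column names, index labels, and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# From a single list
df_list = pd.DataFrame([10, 20, 30], columns=["value"])

# From a list of lists (one inner list per row)
df_rows = pd.DataFrame([["Asha", 14], ["Ravi", 11]], columns=["Name", "Score"])

# From a dictionary of ndarrays (one array per column)
data = {"Name": np.array(["Asha", "Ravi"]), "Score": np.array([14, 11])}
df_dict = pd.DataFrame(data)

# From arrays, with an explicit list of index labels
df_indexed = pd.DataFrame(data, index=["s1", "s2"])

# From a list of dictionaries; a missing key becomes NaN
df_records = pd.DataFrame([{"Name": "Asha", "Score": 14}, {"Name": "Ravi"}])

print(df_indexed)
print(df_records)
```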
Read first five rows of the data

DataFrame.head() displays the first five rows of the data by default.

Read last five rows of the data

DataFrame.tail() displays the last five rows of the data by default.
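A quick sketch on a ten-row DataFrame (made-up data):

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 11)})  # ten rows: 1..10

top = df.head()     # first five rows by default
bottom = df.tail()  # last five rows by default

print(top)
print(bottom)
```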


Shape of DataFrame
Know more about data

● Check the dimensions of the data

● Check the data types

● Use “DataFrame.info()” to get information on the shape of the data, the data
type, and the non-null count of each variable

● Here we see ‘df_market’ has 3 variables with 25 observations in each

● These are non-null observations

● There are 2 categorical variables and one numeric variable.
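The ‘df_market’ data from the slide is not available, so this sketch uses a small hypothetical stand-in with two text columns and one numeric column.

```python
import pandas as pd

# Hypothetical stand-in for the slide's df_market
df = pd.DataFrame({
    "store": ["A", "B", "C"],
    "city": ["X", "Y", "Z"],
    "sales": [100.0, 250.5, 80.0],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column
df.info()         # shape, dtypes, and non-null counts in one summary
```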
Indexing DataFrames
Dealing with rows and column

● Indexing is frequently required with DataFrames. It may serve the purpose
of cross tables or pivot tables

● We can use the .iloc[] indexer, the .loc[] indexer, or boolean conditions

● “.iloc[]” retrieves rows and columns by integer position, and
“.loc[]” retrieves them by row label and column name
Example: Create a new DataFrame as shown and access the value that is at index 0 in column
‘Name’
● Select a row with the iloc[] indexer
● Select the 4th and 6th rows
● Select the first three columns by column number
● Select the first and third columns
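The DataFrame from the slide is not preserved. The sketch below builds a hypothetical one (the names, scores, and the ‘Attempts’ and ‘Qualify’ columns are assumptions) and applies the positional selections listed above.

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Kiran"],
    "Score": [14, 11, 16, 9, 13, 15],
    "Attempts": [1, 3, 2, 3, 1, 2],
    "Qualify": ["yes", "no", "yes", "no", "yes", "yes"],
})

first_name = df["Name"][0]          # value at index 0 in column 'Name'
first_row = df.iloc[0]              # one row by position
rows_4_and_6 = df.iloc[[3, 5]]      # 4th and 6th rows (positions 3 and 5)
first_three_cols = df.iloc[:, 0:3]  # first three columns
cols_1_and_3 = df.iloc[:, [0, 2]]   # first and third columns

print(rows_4_and_6)
```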


● The loc[] indexer selects data by the labels of the rows and columns
● Access the value that is at index 1 in column ‘Score’ using loc
● Select multiple values by row label and column label using loc
● Select two columns from the data frame
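A sketch of the label-based selections, again on hypothetical student data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Score": [14, 11, 16],
    "Attempts": [1, 3, 2],
})

value = df.loc[1, "Score"]                  # row label 1, column 'Score'
subset = df.loc[[0, 2], ["Name", "Score"]]  # several row and column labels
two_cols = df[["Name", "Score"]]            # two columns, all rows

print(value)
print(subset)
```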


Conditional Subsetting

● Subset students who have marks more than 12.
● Subset students who either have more than two attempts or qualify the exam.
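Both conditions can be written as boolean masks. The column names ‘Score’, ‘Attempts’, and ‘Qualify’ and the data are assumptions, since the slide's DataFrame is not preserved.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara"],
    "Score": [14, 11, 16, 9, 12],
    "Attempts": [1, 3, 2, 3, 1],
    "Qualify": ["yes", "no", "yes", "no", "no"],
})

# Students with marks greater than 12
high_scores = df[df["Score"] > 12]

# Students with more than two attempts OR who qualified;
# each condition needs its own parentheses when combined with |
either = df[(df["Attempts"] > 2) | (df["Qualify"] == "yes")]

print(high_scores)
print(either)
```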
Sorting DataFrames
Sort data frame

● Sort the data frame based on the values of a column
● Sort the data frame based on the values of a column in descending order
● Sort the data frame based on the values of multiple columns

● Note that, when sorting a dataframe by multiple columns, pandas sort_values() sorts
by the first variable and then, within ties, by the next variable

● In this case, the function first sorted by the variable ‘percentage’
and then by the variable ‘store’
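A sketch of the three sorts using the column names ‘percentage’ and ‘store’ mentioned on the slide. The values themselves are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["B", "A", "C", "A"],
    "percentage": [70, 85, 70, 60],
})

ascending = df.sort_values(by="percentage")
descending = df.sort_values(by="percentage", ascending=False)

# Sort by 'percentage' first, then break ties using 'store'
multi = df.sort_values(by=["percentage", "store"])

print(multi)
```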
To sort by the index of the DataFrame, use “.sort_index()”.
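A minimal sketch with a deliberately unordered index (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"value": [3, 1, 2]}, index=["c", "a", "b"])

by_index = df.sort_index()  # rows reordered by index label
print(by_index)
```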
Ranking DataFrames
Rank the data frame

● Rank the dataframe in pandas in ascending order

● Rank the dataframe in pandas in descending order

● Rank the dataframe in pandas by minimum value of the rank: rank in descending order of
percentage, and if two percentages are the same, assign the minimum rank to both

● Rank the dataframe in pandas by maximum value of the rank: rank in descending order of
percentage, and if two percentages are the same, assign the maximum rank to both

● Rank the dataframe in pandas by dense rank: rank in descending order of score, and if two
scores are the same, assign them the same rank. Dense rank does not skip any rank (with min
and max, ranks are skipped)
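The ranking variants can be compared side by side with the method argument of rank(). The ‘percentage’ values below are illustrative and contain one tie.

```python
import pandas as pd

df = pd.DataFrame({"percentage": [70, 85, 70, 60]})
col = df["percentage"]

asc_rank = col.rank()                  # ascending; ties share the average rank
desc_rank = col.rank(ascending=False)  # descending; ties share the average rank
min_rank = col.rank(ascending=False, method="min")      # ties get the lowest rank
max_rank = col.rank(ascending=False, method="max")      # ties get the highest rank
dense_rank = col.rank(ascending=False, method="dense")  # no ranks are skipped

df["min_rank"] = min_rank
df["max_rank"] = max_rank
df["dense_rank"] = dense_rank
print(df)
```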
Thank You
