
PANDAS

CHEATSHEET
A Beginner's Guide

@apexiq.ai
Introduction
What is Pandas?
Pandas is a free, open-source Python library that makes it easy to work with data. It
provides two main data structures: Series (a one-dimensional labeled array, like a
list) and DataFrame (a two-dimensional table, like a spreadsheet). With Pandas, you
can easily organize, analyze, and manipulate data.
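For example, a minimal sketch of both structures (the values are made up for illustration):

import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([10, 20, 30], name='scores')

# A DataFrame is a two-dimensional table of labeled columns
df = pd.DataFrame({'name': ['Ana', 'Ben'], 'score': [10, 20]})
print(df)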

Why use Pandas?


User-Friendly: It has a simple and clear syntax, making it easy to learn and use.
Data Handling: You can easily read and write data in many formats, such as CSV, Excel, and JSON.
Data Manipulation: It offers powerful tools to filter, group, and reshape your data quickly.
Integration: Pandas works well with other libraries, such as Matplotlib for plotting graphs and scikit-learn for machine learning.

Installation
To install Pandas, open your terminal or command prompt and type:

pip install pandas

If you're using Anaconda, you can install it by typing:

conda install pandas

(The "!" prefix, as in !pip install pandas, is only needed when running the command inside a Jupyter notebook cell.)

1. Loading Data
Loading data is the first step in any data analysis workflow. Pandas provides
several functions to read data from various file formats.

Import the Pandas library:
import pandas as pd

Load CSV File:


df = pd.read_csv('file.csv')

Load Excel File:


df = pd.read_excel('file.xlsx')
or
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')

Load JSON File:


df = pd.read_json('file.json')

2. Viewing Data
After loading the data, it’s important to inspect it to understand its structure
and content. Pandas provides several methods for this.

View First N Rows:


df.head(n=5)

View Last N Rows:


df.tail(n=5)

Random Sample of Rows:


df.sample(n=5)

Summary of DataFrame:

Display information about the DataFrame (data types, non-null counts)

df.info()

Display descriptive statistics for numerical columns (count, mean, std, min,
max)

df.describe()

3. Selecting Data
Selecting specific data from a DataFrame is crucial for analysis. Pandas allows
you to select columns and rows easily.

Select Column by Name:


Access a single column by name

df['column_name']

Access multiple columns by names (returns a DataFrame)

df[['col1', 'col2']]

Select Rows by Index:


Access the first row by integer index (position)

df.iloc[0]

Access the row whose index label is 0 (loc selects by label, not by position)

df.loc[0]

Select Rows with Conditions:

Filter rows based on condition (e.g., column_name > value)

filtered_df = df[df['column_name'] > value]
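Conditions can be combined with & (and) and | (or); each condition must be wrapped in parentheses. A sketch with hypothetical column names:

filtered_df = df[(df['age'] > 30) & (df['city'] == 'London')]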

4. Modifying Data
Modifying data in a DataFrame is essential for preparing your dataset for
analysis.

Add New Column:


Create a new column that is double the values of an existing column

df['new_column'] = df['existing_column'] * 2

Rename Columns:
Rename a specific column

df.rename(columns={'old_name': 'new_name'}, inplace=True)

Drop Columns:

Drop specified column(s)

df.drop(columns=['column_to_drop'], inplace=True)

5. Handling Missing Values
Dealing with missing values is crucial to ensure the integrity of your analysis.

Check for Missing Values:


Count of missing values in each column

df.isnull().sum()

Drop Rows with Missing Values:


Drop any row with NaN values

df.dropna(inplace=True)

Drop rows where a specific column is NaN

df.dropna(subset=['column_name'], inplace=True)

Fill Missing Values:


Fill NaN with a specified value (e.g., zero)

df.fillna(value=0, inplace=True)

Forward fill to propagate the last valid observation forward
(df.fillna(method='ffill') is deprecated in recent Pandas versions)

df.ffill(inplace=True)
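Another common approach, sketched here with a hypothetical numeric column, is to fill missing values with a column statistic such as the mean:

df['column_name'] = df['column_name'].fillna(df['column_name'].mean())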

6. Removing Duplicates
Duplicate rows can distort counts and statistics, so removing them is an important cleaning step.

Remove Duplicate Rows:

Remove duplicate rows based on all columns

df.drop_duplicates(inplace=True)

Remove duplicates based on specific column(s)

df.drop_duplicates(subset=['col1'], inplace=True)

7. Sorting Data
Sorting data is essential for analysis and presentation. You can sort your
DataFrame by one or more columns.

Sort by One Column:


Sort in ascending order

df.sort_values(by='column_name', ascending=True, inplace=True)

Sort in descending order

df.sort_values(by='column_name', ascending=False, inplace=True)

Sort by Multiple Columns:


Sort by col1 ascending and col2 descending

df.sort_values(by=['col1', 'col2'], ascending=[True, False], inplace=True)

8. Grouping and Aggregating Data
Grouping data allows you to perform operations on subsets of your data.

Group By One Column:


Group data by specified column(s)

grouped = df.groupby('column_name')

Aggregate Functions on Grouped Data:


Sum of grouped values in a specific column

grouped['value_column'].sum()

Mean of grouped values in a specific column

grouped['value_column'].mean()

Multiple aggregations

agg_df = grouped.agg({'value_column': ['sum', 'mean'], 'another_column': 'count'})
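As a minimal end-to-end sketch (the data is invented for illustration):

import pandas as pd

df = pd.DataFrame({'city': ['Paris', 'Paris', 'Rome'],
                   'sales': [100, 150, 200]})

# Total sales per city -> Paris: 250, Rome: 200
totals = df.groupby('city')['sales'].sum()
print(totals)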

9. Merging and Joining DataFrames
Combining multiple DataFrames is often necessary when working with related
datasets.

Merge Two DataFrames:


Merge on key column(s)

merged_df = pd.merge(df1, df2, on='key_column')

Outer Join Two DataFrames:


Outer join to include all records from both DataFrames

merged_outer = pd.merge(df1, df2, how='outer', on='key_column')

Concatenate Two DataFrames:


Concatenate along rows (axis=0)

concat_df = pd.concat([df1, df2], axis=0)

Concatenate along columns (axis=1)

concat_cols_df = pd.concat([df1, df2], axis=1)
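A minimal sketch of an inner merge, using made-up DataFrames:

import pandas as pd

customers = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cleo']})
orders = pd.DataFrame({'id': [1, 1, 3], 'amount': [50, 70, 20]})

# The default inner merge keeps only ids present in both frames (id 2 is dropped)
merged = pd.merge(customers, orders, on='id')
print(merged)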

10. Applying Functions
You can apply custom functions to your DataFrame or Series to manipulate or
transform data.

Using apply() on a DataFrame Column:


Apply a function to each element in a column

df['new_col'] = df['existing_col'].apply(lambda x: x + 1)
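apply() can also work row-wise across a whole DataFrame by passing axis=1; a sketch with hypothetical column names:

# Compute a value from several columns for each row
df['total'] = df.apply(lambda row: row['price'] * row['quantity'], axis=1)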

Using apply() on Series:


Square each element in the Series
s = pd.Series([1, 2, 3])
s_squared = s.apply(lambda x: x**2)

Using map() for Element-wise Operations:


Map values based on a dictionary

df['new_col'] = df['existing_col'].map({1: 'A', 2: 'B'})

11. String Methods
Pandas provides string methods that allow you to perform vectorized string
operations on Series.

Converting Strings to Lowercase:


Convert all strings in the column to lowercase

df['string_column'] = df['string_column'].str.lower()

Checking for Substrings:


Check if 'text' is in each string

df['contains_text'] = df['string_column'].str.contains('text')

Replacing Substrings:
Replace 'old' with 'new' in strings

df['string_column'] = df['string_column'].str.replace('old', 'new')

12. Advanced Data Manipulation
Advanced data manipulation techniques allow for more complex
transformations and reshaping of your DataFrame.

Melt Function:
The melt() function is used to transform wide-format data into long-format
data.

df_melted = pd.melt(df, id_vars=['id'], value_vars=['col1', 'col2'])
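To make the reshape concrete, a small made-up example:

import pandas as pd

wide = pd.DataFrame({'id': [1, 2], 'col1': [10, 30], 'col2': [20, 40]})

# 'long' gets one row per (id, column) pair,
# with columns: id, variable ('col1'/'col2'), and value
long = pd.melt(wide, id_vars=['id'], value_vars=['col1', 'col2'])
print(long)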

Pivot Function:
The pivot() function reshapes the DataFrame by specifying index, columns,
and values.

df_pivot = df.pivot(index='date', columns='category', values='value')

Stack and Unstack:


Stack: Convert columns into rows (long format).

stacked_df = df.stack()

Unstack: Convert rows back into columns (wide format).

unstacked_df = stacked_df.unstack()

13. Creating and Using Pivot Tables
Pivot tables allow you to summarize data in a flexible way.

Creating a Pivot Table:

Create a pivot table with specified values, index, columns, and aggregation function

pivot_table = df.pivot_table(values='value', index='index_col', columns='column_col', aggfunc='sum')

Pivot Table with Multiple Aggregations:


Create a pivot table with multiple aggregation functions (sum and mean)

pivot_table_multi = df.pivot_table(values='value', index='index_col', aggfunc=['sum', 'mean'])

(String names are used here instead of np.sum/np.mean, which would require importing NumPy and are deprecated as aggfunc arguments in recent Pandas versions.)
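As a small worked sketch (data invented for illustration):

import pandas as pd

df = pd.DataFrame({'date': ['2024-01', '2024-01', '2024-02'],
                   'category': ['A', 'B', 'A'],
                   'value': [10, 20, 30]})

# One row per date, one column per category, values summed
pt = df.pivot_table(values='value', index='date', columns='category', aggfunc='sum')
print(pt)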

14. Working with Categorical Data
Pandas provides support for categorical data, which can improve performance
and memory usage.

Convert Column to Categorical:


Convert a column to categorical type

df['category_column'] = df['category_column'].astype('category')
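The memory savings are easy to check; a sketch with made-up repetitive data:

import pandas as pd

s = pd.Series(['red', 'green', 'blue'] * 100_000)

# Categorical storage keeps each unique string once plus small integer codes,
# so it is typically far smaller than the plain object dtype
print(s.memory_usage(deep=True))
print(s.astype('category').memory_usage(deep=True))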

Get Categories and Their Codes:


Get unique categories

categories = df['category_column'].cat.categories

Get integer codes for categories

codes = df['category_column'].cat.codes

Using Categorical Data for Grouping:


Group by categorical column and count occurrences

grouped = df.groupby('category_column').size()

15. Handling Date and Time Data
Pandas provides powerful tools for working with date and time data, making it
easy to manipulate and analyze time series.

Convert Strings to Datetime:


Convert to datetime format

df['date_column'] = pd.to_datetime(df['date_column'])

Extracting Date Components:


Extract year

df['year'] = df['date_column'].dt.year

Extract month

df['month'] = df['date_column'].dt.month

Extract day

df['day'] = df['date_column'].dt.day

Setting a Date Column as Index:


Set date_column as the index

df.set_index('date_column', inplace=True)
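One payoff of a datetime index, sketched with made-up data, is label-based slicing by date:

import pandas as pd

df = pd.DataFrame({'date_column': ['2024-01-05', '2024-01-20', '2024-02-03'],
                   'value': [1, 2, 3]})
df['date_column'] = pd.to_datetime(df['date_column'])
df.set_index('date_column', inplace=True)

# Partial-string indexing: select all rows from January 2024
print(df.loc['2024-01'])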

LIKE FOLLOW SHARE

THANK YOU!

@apexiq.ai
