
PANDAS

CHEATSHEET
A Beginner's Guide

@apexiq.ai
Introduction
What is Pandas?
Pandas is a free, open-source Python library that makes it easy to work with data. It
provides two main data structures: Series (a one-dimensional labeled array, like a
list) and DataFrame (a two-dimensional table, like a spreadsheet). With Pandas, you
can easily organize, analyze, and manipulate data.
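For example, a minimal sketch of both structures (the values are made up for illustration):

import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([10, 20, 30], name='scores')

# A DataFrame is a two-dimensional table of labeled columns
df = pd.DataFrame({'name': ['Ana', 'Ben'], 'score': [10, 20]})
print(df)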

Why use Pandas?


User-Friendly: It has a simple and clear syntax, making it easy to learn and use.
Data Handling: You can easily read and write data in many formats, such as CSV, Excel, and JSON.
Data Manipulation: It offers powerful tools to filter, group, and reshape your data quickly.
Integration: Pandas works well with other libraries, such as Matplotlib for plotting graphs and scikit-learn for machine learning.

Installation
To install Pandas, open your terminal or command prompt and type:

pip install pandas

If you're using Anaconda, you can install it by typing:

conda install pandas

(The "!" prefix, as in !pip install pandas, is only needed when running the command inside a Jupyter notebook cell.)

1. Loading Data
Loading data is the first step in any data analysis workflow. Pandas provides
several functions to read data from various file formats.

Import the Pandas library:
import pandas as pd

Load CSV File:


df = pd.read_csv('file.csv')

Load Excel File:


df = pd.read_excel('file.xlsx')
or
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')

Load JSON File:


df = pd.read_json('file.json')

2. Viewing Data
After loading the data, it’s important to inspect it to understand its structure
and content. Pandas provides several methods for this.

View First N Rows:


df.head(n=5)

View Last N Rows:


df.tail(n=5)

Random Sample of Rows:


df.sample(n=5)

Summary of DataFrame:

Display information about the DataFrame (data types, non-null counts)

df.info()

Display descriptive statistics for numerical columns (count, mean, std, min,
max)

df.describe()

3. Selecting Data
Selecting specific data from a DataFrame is crucial for analysis. Pandas allows
you to select columns and rows easily.

Select Column by Name:


Access a single column by name

df['column_name']

Access multiple columns by names (returns a DataFrame)

df[['col1', 'col2']]

Select Rows by Index:


Access the first row by integer index (position)

df.iloc[0]

Access the row whose index label is 0 (loc selects by label, not by position)

df.loc[0]

Select Rows with Conditions:

Filter rows based on condition (e.g., column_name > value)

filtered_df = df[df['column_name'] > value]
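Conditions can be combined with & (and) and | (or); each condition must be wrapped in parentheses. A sketch with hypothetical column names:

filtered_df = df[(df['age'] > 30) & (df['city'] == 'London')]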

4. Modifying Data
Modifying data in a DataFrame is essential for preparing your dataset for
analysis.

Add New Column:


Create a new column that is double the values of an existing column

df['new_column'] = df['existing_column'] * 2

Rename Columns:
Rename a specific column

df.rename(columns={'old_name': 'new_name'}, inplace=True)

Drop Columns:

Drop specified column(s)

df.drop(columns=['column_to_drop'], inplace=True)

5. Handling Missing Values
Dealing with missing values is crucial to ensure the integrity of your analysis.

Check for Missing Values:


Count of missing values in each column

df.isnull().sum()

Drop Rows with Missing Values:


Drop any row with NaN values

df.dropna(inplace=True)

Drop rows where a specific column is NaN

df.dropna(subset=['column_name'], inplace=True)

Fill Missing Values:


Fill NaN with a specified value (e.g., zero)

df.fillna(value=0, inplace=True)

Forward fill to propagate the last valid observation forward
(df.fillna(method='ffill') is deprecated in recent Pandas versions)

df.ffill(inplace=True)
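Another common approach, sketched here with a hypothetical numeric column, is to fill missing values with a column statistic such as the mean:

df['column_name'] = df['column_name'].fillna(df['column_name'].mean())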

6. Removing Duplicates
Duplicate rows can distort counts and statistics, so removing them is an important cleaning step.

Remove Duplicate Rows:

Remove duplicate rows based on all columns

df.drop_duplicates(inplace=True)

Remove duplicates based on specific column(s)

df.drop_duplicates(subset=['col1'], inplace=True)

7. Sorting Data
Sorting data is essential for analysis and presentation. You can sort your
DataFrame by one or more columns.

Sort by One Column:


Sort in ascending order

df.sort_values(by='column_name', ascending=True, inplace=True)

Sort in descending order

df.sort_values(by='column_name', ascending=False, inplace=True)

Sort by Multiple Columns:


Sort by col1 ascending and col2 descending

df.sort_values(by=['col1', 'col2'], ascending=[True, False], inplace=True)

8. Grouping and Aggregating Data
Grouping data allows you to perform operations on subsets of your data.

Group By One Column:


Group data by specified column(s)

grouped = df.groupby('column_name')

Aggregate Functions on Grouped Data:


Sum of grouped values in a specific column

grouped['value_column'].sum()

Mean of grouped values in a specific column

grouped['value_column'].mean()

Multiple aggregations

agg_df = grouped.agg({'value_column': ['sum', 'mean'], 'another_column': 'count'})
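As a minimal end-to-end sketch (the data is invented for illustration):

import pandas as pd

df = pd.DataFrame({'city': ['Paris', 'Paris', 'Rome'],
                   'sales': [100, 150, 200]})

# Total sales per city -> Paris: 250, Rome: 200
totals = df.groupby('city')['sales'].sum()
print(totals)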

9. Merging and Joining DataFrames
Combining multiple DataFrames is often necessary when working with related
datasets.

Merge Two DataFrames:


Merge on key column(s)

merged_df = pd.merge(df1, df2, on='key_column')

Outer Join Two DataFrames:


Outer join to include all records from both DataFrames

merged_outer = pd.merge(df1, df2, how='outer', on='key_column')

Concatenate Two DataFrames:


Concatenate along rows (axis=0)

concat_df = pd.concat([df1, df2], axis=0)

Concatenate along columns (axis=1)

concat_cols_df = pd.concat([df1, df2], axis=1)
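A minimal sketch of an inner merge, using made-up DataFrames:

import pandas as pd

customers = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cleo']})
orders = pd.DataFrame({'id': [1, 1, 3], 'amount': [50, 70, 20]})

# The default inner merge keeps only ids present in both frames (id 2 is dropped)
merged = pd.merge(customers, orders, on='id')
print(merged)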

10. Applying Functions
You can apply custom functions to your DataFrame or Series to manipulate or
transform data.

Using apply() on a DataFrame Column:


Apply a function to each element in a column

df['new_col'] = df['existing_col'].apply(lambda x: x + 1)
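apply() can also work row-wise across a whole DataFrame by passing axis=1; a sketch with hypothetical column names:

# Compute a value from several columns for each row
df['total'] = df.apply(lambda row: row['price'] * row['quantity'], axis=1)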

Using apply() on Series:


Square each element in the Series
s = pd.Series([1, 2, 3])
s_squared = s.apply(lambda x: x**2)

Using map() for Element-wise Operations:


Map values based on a dictionary

df['new_col'] = df['existing_col'].map({1: 'A', 2: 'B'})

11. String Methods
Pandas provides string methods that allow you to perform vectorized string
operations on Series.

Converting Strings to Lowercase:


Convert all strings in the column to lowercase

df['string_column'] = df['string_column'].str.lower()

Checking for Substrings:


Check if 'text' is in each string

df['contains_text'] = df['string_column'].str.contains('text')

Replacing Substrings:
Replace 'old' with 'new' in strings

df['string_column'] = df['string_column'].str.replace('old', 'new')

12. Advanced Data Manipulation
Advanced data manipulation techniques allow for more complex
transformations and reshaping of your DataFrame.

Melt Function:
The melt() function is used to transform wide-format data into long-format
data.

df_melted = pd.melt(df, id_vars=['id'], value_vars=['col1', 'col2'])
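To make the reshape concrete, a small made-up example:

import pandas as pd

wide = pd.DataFrame({'id': [1, 2], 'col1': [10, 30], 'col2': [20, 40]})

# 'long' gets one row per (id, column) pair,
# with columns: id, variable ('col1'/'col2'), and value
long = pd.melt(wide, id_vars=['id'], value_vars=['col1', 'col2'])
print(long)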

Pivot Function:
The pivot() function reshapes the DataFrame by specifying index, columns,
and values.

df_pivot = df.pivot(index='date', columns='category', values='value')

Stack and Unstack:


Stack: Convert columns into rows (long format).

stacked_df = df.stack()

Unstack: Convert rows back into columns (wide format).

unstacked_df = stacked_df.unstack()

13. Creating and Using Pivot Tables
Pivot tables allow you to summarize data in a flexible way.

Creating a Pivot Table:

Create a pivot table with specified values, index, columns, and aggregation function

pivot_table = df.pivot_table(values='value', index='index_col', columns='column_col', aggfunc='sum')

Pivot Table with Multiple Aggregations:


Create a pivot table with multiple aggregation functions (sum and mean)

pivot_table_multi = df.pivot_table(values='value', index='index_col', aggfunc=['sum', 'mean'])

(String names are used here instead of np.sum/np.mean, which would require importing NumPy and are deprecated as aggfunc arguments in recent Pandas versions.)
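As a small worked sketch (data invented for illustration):

import pandas as pd

df = pd.DataFrame({'date': ['2024-01', '2024-01', '2024-02'],
                   'category': ['A', 'B', 'A'],
                   'value': [10, 20, 30]})

# One row per date, one column per category, values summed
pt = df.pivot_table(values='value', index='date', columns='category', aggfunc='sum')
print(pt)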

14. Working with Categorical Data
Pandas provides support for categorical data, which can improve performance
and memory usage.

Convert Column to Categorical:


Convert a column to categorical type

df['category_column'] = df['category_column'].astype('category')
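The memory savings are easy to check; a sketch with made-up repetitive data:

import pandas as pd

s = pd.Series(['red', 'green', 'blue'] * 100_000)

# Categorical storage keeps each unique string once plus small integer codes,
# so it is typically far smaller than the plain object dtype
print(s.memory_usage(deep=True))
print(s.astype('category').memory_usage(deep=True))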

Get Categories and Their Codes:


Get unique categories

categories = df['category_column'].cat.categories

Get integer codes for categories

codes = df['category_column'].cat.codes

Using Categorical Data for Grouping:


Group by categorical column and count occurrences

grouped = df.groupby('category_column').size()

15. Handling Date and Time Data
Pandas provides powerful tools for working with date and time data, making it
easy to manipulate and analyze time series.

Convert Strings to Datetime:


Convert to datetime format

df['date_column'] = pd.to_datetime(df['date_column'])

Extracting Date Components:


Extract year

df['year'] = df['date_column'].dt.year

Extract month

df['month'] = df['date_column'].dt.month

Extract day

df['day'] = df['date_column'].dt.day

Setting a Date Column as Index:


Set date_column as the index

df.set_index('date_column', inplace=True)
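One payoff of a datetime index, sketched with made-up data, is label-based slicing by date:

import pandas as pd

df = pd.DataFrame({'date_column': ['2024-01-05', '2024-01-20', '2024-02-03'],
                   'value': [1, 2, 3]})
df['date_column'] = pd.to_datetime(df['date_column'])
df.set_index('date_column', inplace=True)

# Partial-string indexing: select all rows from January 2024
print(df.loc['2024-01'])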

LIKE FOLLOW SHARE

THANK YOU!

@apexiq.ai
