Content Pandas Cheat Sheet
4. Info extraction
df[['col1', 'col2']]
# for multiple columns
b. Slicing
Row
df.loc[1:3]
# 1 and 3 are explicit index labels; both endpoints are included
df.iloc[2:4]
# 2 and 4 are implicit (positional) indices; position 4 is excluded
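A minimal sketch of the difference between the two slicers. The frame and its column names are hypothetical and use a default integer index.

```python
import pandas as pd

# Hypothetical frame with a default integer index 0..4
df = pd.DataFrame({"name": ["a", "b", "c", "d", "e"],
                   "age": [25, 32, 41, 19, 56]})

by_label = df.loc[1:3]      # label-based: rows labelled 1, 2 and 3 (end included)
by_position = df.iloc[2:4]  # position-based: rows at positions 2 and 3 (end excluded)

print(by_label.index.tolist())
print(by_position.index.tolist())
```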
Masking
Creates a mask based on our required condition
df['col'] > value
# E.g. df['age'] > 30
Filtering
Filters data based on conditions
df.loc[(df['col1'] == val1) & (df['col2'] == val2)]
# E.g.
df.loc[(df['month'] == 'January') & (df['year'] == '2022')]
# keeps only the rows for January 2022 (here 'year' is stored as a string)
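A small sketch of masking followed by filtering. The data is made up for illustration, and 'year' is assumed to be stored as strings, as in the example above.

```python
import pandas as pd

# Hypothetical sales data; 'year' is kept as a string on purpose
df = pd.DataFrame({
    "month": ["January", "January", "March"],
    "year": ["2022", "2021", "2022"],
    "sales": [10, 20, 30],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (df["month"] == "January") & (df["year"] == "2022")

jan_2022 = df.loc[mask]  # keeps only the rows where the mask is True
print(jan_2022)
```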
6. Dataframe Manipulation
a. Adding a new row/column
Row
# E.g.
df.loc[len(df.index)] = ['a', 1]
# appends a row at the next integer label (assumes a default 0..n-1 index and two columns)
Column
df['new_col'] = data
b. Deleting a row/column
# E.g.
df.drop(3, axis=0)
# Here 3 is the explicit index label and axis=0 selects rows (axis=1 drops a column);
# drop returns a new DataFrame and leaves df unchanged
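The add and delete operations can be sketched together as follows. The frame and the column names `letter`, `number` and `doubled` are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame with a default integer index
df = pd.DataFrame({"letter": ["x", "y"], "number": [10, 20]})

# Add a row at the next integer label (2 here)
df.loc[len(df.index)] = ["a", 1]

# Add a column computed from an existing one
df["doubled"] = df["number"] * 2

# drop returns a new DataFrame; df itself is not modified
without_row = df.drop(2, axis=0)          # remove the row labelled 2
without_col = df.drop("doubled", axis=1)  # remove the 'doubled' column

print(without_row)
print(without_col)
```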
c. Renaming a column
Column
df.rename({'old_name': 'new_name'}, axis=1)
Row
df.index = new_indices
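A short sketch of both renaming styles on a hypothetical one-column frame. Note that `rename` returns a new DataFrame, while assigning to `.index` changes the frame in place.

```python
import pandas as pd

# Hypothetical frame used only for illustration
df = pd.DataFrame({"old_name": [1, 2]})

# The mapping is the first argument; axis=1 is a separate keyword argument
renamed = df.rename({"old_name": "new_name"}, axis=1)

# Replacing the whole index requires one label per row
renamed.index = ["r1", "r2"]

print(renamed)
```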
E.g.
data[['revenue', 'budget']].apply(np.sum, axis=1)
# sums the values of revenue and budget across each row (needs import numpy as np)
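A runnable sketch of the row-wise apply with made-up numbers. The built-in `sum(axis=1)` is shown alongside as an equivalent shortcut for this particular case.

```python
import numpy as np
import pandas as pd

# Hypothetical figures for illustration
data = pd.DataFrame({"revenue": [100, 250], "budget": [40, 300]})

# axis=1 passes each row to the function
row_totals = data[["revenue", "budget"]].apply(np.sum, axis=1)

# For a plain sum, the vectorised method gives the same values
same_totals = data[["revenue", "budget"]].sum(axis=1)

print(row_totals.tolist())
```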
8. Joins
a. Concat
pd.concat([df1, df2], axis=0)
# axis=0 stacks the frames vertically (row-wise); for concatenating horizontally, change to axis=1
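A minimal sketch of both concatenation directions on two hypothetical frames with the same columns. `ignore_index=True` is an optional extra, used here to renumber the stacked rows.

```python
import pandas as pd

# Two hypothetical frames with identical columns
df1 = pd.DataFrame({"id": [1, 2], "x": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "x": ["c", "d"]})

# axis=0: rows of df2 are placed below the rows of df1
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1: columns of df2 are placed to the right of df1, aligned on the index
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape, side_by_side.shape)
```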
b. Merge
df1.merge(df2, on='foreign_key', how='type_of_join')
● Optional -> left_on and right_on (use these instead of on when the key columns have different names)
● E.g. df1.merge(df2, on='id', how='inner')
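A sketch of merging when the key columns are named differently, comparing an inner and a left join. The frames `movies` and `ratings` and their columns are hypothetical.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
movies = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"movie_id": [2, 3, 4], "score": [7.5, 8.0, 6.1]})

# Inner join: only keys present in both frames survive
inner = movies.merge(ratings, left_on="id", right_on="movie_id", how="inner")

# Left join: every row of movies is kept; missing scores become NaN
left = movies.merge(ratings, left_on="id", right_on="movie_id", how="left")

print(inner)
print(left)
```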
10. Groupby
E.g.
df.groupby('director_name')['title'].count()
# Finds number of titles per director
E.g.
df.groupby(['director_name'])["year"].aggregate(['min', 'max'])
# Finds the first and most recent year of movies made by each director
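The two group-wise summaries above can be sketched on a tiny made-up dataset. The director names and titles are placeholders.

```python
import pandas as pd

# Hypothetical movie data
df = pd.DataFrame({
    "director_name": ["dir_a", "dir_a", "dir_b"],
    "title": ["t1", "t2", "t3"],
    "year": [2010, 2017, 2023],
})

# One count per director (a Series indexed by director_name)
titles_per_director = df.groupby("director_name")["title"].count()

# One row per director with 'min' and 'max' columns
year_range = df.groupby("director_name")["year"].aggregate(["min", "max"])

print(titles_per_director)
print(year_range)
```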
E.g.
data.groupby('director_name').filter(lambda x: x["budget"].max() >= 100)
# Keeps all rows of the directors whose largest budget is at least 100
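A small sketch of group filtering with made-up budgets. Whole groups are kept or dropped, never individual rows.

```python
import pandas as pd

# Hypothetical budgets; dir_a has one film at or above the threshold
data = pd.DataFrame({
    "director_name": ["dir_a", "dir_a", "dir_b"],
    "budget": [50, 120, 30],
})

# The lambda receives each group's sub-frame and must return a single bool
big = data.groupby("director_name").filter(lambda x: x["budget"].max() >= 100)

print(big)
```

Both `dir_a` rows survive, including the one with a budget of 50, because the condition is evaluated once per group.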
E.g.
def func(x):
    # flags a row as risky when its budget is at least the director's mean revenue
    x["risky"] = x["budget"] - x["revenue"].mean() >= 0
    return x
data_risky = data.groupby("director_name").apply(func)
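The shape of what `groupby(...).apply` returns, and how it treats the grouping column, has changed between pandas versions. As an alternative sketch (not the cheat sheet's own method), the same "risky" flag can be computed with `transform`, which returns a result aligned to the original rows. The data below is made up.

```python
import pandas as pd

# Hypothetical budgets and revenues
data = pd.DataFrame({
    "director_name": ["dir_a", "dir_a", "dir_b"],
    "budget": [50, 120, 30],
    "revenue": [100, 60, 90],
})

# Mean revenue of each director, broadcast back to every row of that director
mean_revenue = data.groupby("director_name")["revenue"].transform("mean")

# A row is risky when its budget is at least the director's mean revenue
data["risky"] = (data["budget"] - mean_revenue) >= 0

print(data)
```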
Cut
Bins continuous data into categorical groups
df['new_cat_column'] = pd.cut(df['continuous_col'], bins=bin_values, labels=label_values)
E.g.
data_tidy['temp_cat'] = pd.cut(data_tidy['Temperature'], bins=temp_points, labels=temp_labels)
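A runnable sketch of the binning example. The values chosen for `temp_points` and `temp_labels` are assumptions for illustration; the source does not define them. There must be one label fewer than there are bin edges.

```python
import pandas as pd

# Hypothetical temperatures
data_tidy = pd.DataFrame({"Temperature": [5, 18, 33]})

# Assumed bin edges and labels: 4 edges define 3 bins
temp_points = [0, 10, 25, 40]
temp_labels = ["cold", "mild", "hot"]

# By default each bin includes its right edge: (0, 10], (10, 25], (25, 40]
data_tidy["temp_cat"] = pd.cut(data_tidy["Temperature"],
                               bins=temp_points,
                               labels=temp_labels)

print(data_tidy)
```

Values outside the outermost edges would come back as NaN rather than raise an error.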
b. String functions
We can use .str to apply string functions to any column
df['col'].str.function()
E.g.
i. data_tidy['Date'].str.split('-')
# This splits each value in the "Date" column into a list of parts separated by "-"
ii. data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')]
# This keeps only the rows whose "Drug_Name" contains the string "hydrochloride"
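Both string examples can be sketched on a tiny made-up frame. The dates and drug names below are placeholders.

```python
import pandas as pd

# Hypothetical data; 'Date' is stored as text, not as a datetime
data_tidy = pd.DataFrame({
    "Date": ["2022-01-15", "2021-12-01"],
    "Drug_Name": ["drug_a hydrochloride", "drug_b"],
})

# Each cell becomes a list of the pieces between the separators
parts = data_tidy["Date"].str.split("-")

# str.contains returns a boolean mask, which .loc uses to keep matching rows
hcl = data_tidy.loc[data_tidy["Drug_Name"].str.contains("hydrochloride")]

print(parts.iloc[0])
print(hcl)
```

If the column may contain missing values, passing `na=False` to `str.contains` avoids NaN entries in the mask.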