Reference Guide - Pandas Tools For Structuring A Dataset
Reference Guide - Pandas Tools For Structuring A Dataset
dataset
As you’ve learned, there are far too many Python functions to memorize all of them. That’s why,
as every data professional will tell you, you’ll be using reference sheets and coding libraries
nearly every day in your data analysis work.
The following reference guide will help you identify the most common Pandas tools used for
structuring data. Note that this is just for reference. For detailed information on how each
method works, including explanations of every parameter and examples, refer to the linked
documentation.
Combine data
Note that for many situations that require combining data, you can use a number of different
functions, methods, or approaches. Usually you’re not limited to a single “correct” function. So if
these functions and methods seem very similar, don’t worry! It’s because they are! The best way
to learn them, determine what works best for you, and understand them is to use them!
df.merge()
● A method available to the DataFrame class.
● Use df.merge() to take columns or indices from other dataframes and combine them
with the one to which you’re applying the method.
● Example:
pd.concat()
● A pandas function to combine series and/or dataframes
● Use pd.concat() to join columns, rows, or dataframes along a particular axis
● Example:
df3 = pd.concat([df1.drop(['column_1','column_2'], axis=1), df2])
df.join()
● A method available to the DataFrame class.
● Use df.join() to combine columns with another dataframe either on an index or on a key
column. Efficiently join multiple DataFrame objects by index at once by passing a list.
● Example:
df1.set_index('key').join(df2.set_index('key'))
df[['animal', 'legs']]
df.select_dtypes()
● A method available to the DataFrame class.
● Use df.select_dtypes() to return a subset of the dataframe’s columns based on the
column dtypes (e.g., float64, int64, bool, object, etc.).
● Example:
df2 = df.select_dtypes(include=['int64'])
Filter data
Recall from Course 2 that Boolean masks are used to filter dataframes.
df[condition]
● Use df[condition] to create a Boolean mask, then apply the mask to the dataframe to
filter according to selected condition.
● Example:
df[df['class']=='Aves']
Sort data
df.sort_values()
● A method available to the DataFrame class.
● Use df.sort_values() to sort data according to selected parameters.
● Example:
df.sort_values(by=['legs'], ascending=False)
Slice data
df.iloc[]
● Use df.iloc[] to slice a dataframe based on an integer index location.
● Examples:
df.loc[]
● Use df.loc[] to slice a dataframe based on a label or Boolean array.
● Example: