Python Cheat Sheet Code Academy

This document provides a cheat sheet for the Pandas library in Python. It summarizes key functions for importing and exporting data, selecting and filtering data, cleaning and transforming data, joining/combining data, and descriptive statistics. Some important functions covered include reading/writing CSV/Excel files, selecting columns/rows, dropping null values, grouping/pivoting data, concatenating DataFrames, and calculating means, medians, and standard deviations. The cheat sheet is intended to be a handy reference for common Pandas tasks.

LEARN DATA SCIENCE ONLINE

Start Learning For Free - www.dataquest.io

Data Science Cheat Sheet


Pandas

KEY
We'll use this shorthand in the cheat sheet:
df - A pandas DataFrame object
s - A pandas Series object

IMPORTS
Import these to start:
import pandas as pd
import numpy as np
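
As a running example for the sections below, here is a small, made-up DataFrame; the names df, s and the columns col1/col2/col3 are placeholders used only for illustration, not part of the cheat sheet itself.

import pandas as pd
import numpy as np

# Hypothetical sample data used by the examples that follow
df = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b'],
    'col2': [1.0, 2.0, np.nan, 4.0],
    'col3': [10, 20, 30, 40],
})
s = df['col2']  # a pandas Series with one missing value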

IMPORTING DATA
pd.read_csv(filename) - From a CSV file
pd.read_table(filename) - From a delimited text file (like TSV)
pd.read_excel(filename) - From an Excel file
pd.read_sql(query, connection_object) - Reads from a SQL table/database
pd.read_json(json_string) - Reads from a JSON-formatted string, URL or file
pd.read_html(url) - Parses an HTML URL, string or file and extracts tables to a list of DataFrames
pd.read_clipboard() - Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) - From a dict: keys for column names, values for data as lists
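
A minimal read sketch; the file name and URL here are hypothetical, used only to show the calls.

# Read a CSV file into a DataFrame (file name is made up)
sales = pd.read_csv('sales.csv')

# read_html returns a *list* of DataFrames, one per <table> on the page
# tables = pd.read_html('https://example.com/tables.html')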

EXPORTING DATA
df.to_csv(filename) - Writes to a CSV file
df.to_excel(filename) - Writes to an Excel file
df.to_sql(table_name, connection_object) - Writes to a SQL table
df.to_json(filename) - Writes to a file in JSON format
df.to_html(filename) - Saves as an HTML table
df.to_clipboard() - Writes to the clipboard

CREATE TEST OBJECTS
Useful for testing
pd.DataFrame(np.random.rand(20,5)) - 5 columns and 20 rows of random floats
pd.Series(my_list) - Creates a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]) - Adds a date index
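
A quick throwaway object built from these entries (the start date is arbitrary):

# 20 rows x 5 columns of random floats, indexed by consecutive dates
test_df = pd.DataFrame(np.random.rand(20, 5))
test_df.index = pd.date_range('1900/1/30', periods=test_df.shape[0])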

VIEWING/INSPECTING DATA
df.head(n) - First n rows of the DataFrame
df.tail(n) - Last n rows of the DataFrame
df.shape - Number of rows and columns (shape is an attribute, not a method)
df.info() - Index, datatype and memory information
df.describe() - Summary statistics for numerical columns
s.value_counts(dropna=False) - Views unique values and counts
df.apply(pd.Series.value_counts) - Unique values and counts for all columns
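
A quick inspection pass over the sample df from the setup sketch:

print(df.head(2))    # first two rows
print(df.shape)      # (rows, columns), here (4, 3)
df.info()            # index, dtypes, non-null counts, memory usage
print(s.value_counts(dropna=False))  # unique values and counts, including NaN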

SELECTION
df[col] - Returns column with label col as a Series
df[[col1, col2]] - Returns columns as a new DataFrame
s.iloc[0] - Selection by position
s.loc[0] - Selection by index label
df.iloc[0,:] - First row
df.iloc[0,0] - First element of first column
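
For example, against the sample df (column names are placeholders):

col2_series = df['col2']        # one column as a Series
subset = df[['col1', 'col3']]   # two columns as a new DataFrame
first_row = df.iloc[0, :]       # first row by position
first_cell = df.iloc[0, 0]      # first element of the first column
by_label = df.loc[0, 'col1']    # row with index label 0, column 'col1'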

DATA CLEANING
df.columns = ['a','b','c'] - Renames columns
pd.isnull() - Checks for null values, returns a Boolean array
pd.notnull() - Opposite of pd.isnull()
df.dropna() - Drops all rows that contain null values
df.dropna(axis=1) - Drops all columns that contain null values
df.dropna(axis=1,thresh=n) - Drops all columns that have fewer than n non-null values
df.fillna(x) - Replaces all null values with x
s.fillna(s.mean()) - Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) - Converts the datatype of the Series to float
s.replace(1,'one') - Replaces all values equal to 1 with 'one'
s.replace([1,3],['one','three']) - Replaces all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) - Mass renaming of columns
df.rename(columns={'old_name': 'new_name'}) - Selective renaming
df.set_index('column_one') - Changes the index
df.rename(index=lambda x: x + 1) - Mass renaming of index
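
Continuing with the sample df, which has one missing value in col2:

no_missing_rows = df.dropna()                   # drop rows containing any null
filled = df.fillna(0)                           # replace nulls with a constant
col2_filled = s.fillna(s.mean())                # replace nulls with the column mean
renamed = df.rename(columns={'col1': 'group'})  # selective column rename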

FILTER, SORT, & GROUP BY
df[df[col] > 0.5] - Rows where the values in col are greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] - Rows where 0.7 > col > 0.5
df.sort_values(col1) - Sorts values by col1 in ascending order
df.sort_values(col2,ascending=False) - Sorts values by col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]) - Sorts values by col1 in ascending order, then col2 in descending order
df.groupby(col) - Returns a groupby object for values from one column
df.groupby([col1,col2]) - Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() - Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)
df.pivot_table(index=col1, values=[col2,col3], aggfunc='mean') - Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) - Finds the average across all columns for every unique col1 group
df.apply(np.mean) - Applies a function across each column
df.apply(np.max, axis=1) - Applies a function across each row
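
Putting a few of these together on the sample df:

large = df[df['col2'] > 1.5]                      # boolean filtering on one column
ordered = df.sort_values(['col1', 'col2'], ascending=[True, False])
group_means = df.groupby('col1')['col2'].mean()   # mean of col2 per col1 group
pivot = df.pivot_table(index='col1', values=['col2', 'col3'], aggfunc='mean')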

JOIN/COMBINE
df1.append(df2) - Adds the rows of df2 to the end of df1 (columns should be identical); append has been removed in recent pandas versions, so prefer pd.concat([df1, df2])
pd.concat([df1, df2],axis=1) - Adds the columns of df2 to the end of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') - SQL-style join of the columns in df1 with the columns of df2 where the rows for col1 have identical values; how can be one of 'left', 'right', 'outer', 'inner'
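
A small combine sketch with two made-up DataFrames (df1, df2 and the key column are hypothetical):

df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['a', 'b'], 'y': [3, 4]})

stacked = pd.concat([df1, df2])               # rows of df2 below the rows of df1
side_by_side = pd.concat([df1, df2], axis=1)  # columns of df2 after the columns of df1
joined = df1.join(df2.set_index('key'), on='key', how='inner')  # SQL-style join on 'key'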

STATISTICS
These can all be applied to a Series as well.
df.describe() - Summary statistics for numerical columns
df.mean() - Returns the mean of all columns
df.corr() - Returns the correlation between columns in a DataFrame
df.count() - Returns the number of non-null values in each DataFrame column
df.max() - Returns the highest value in each column
df.min() - Returns the lowest value in each column
df.median() - Returns the median of each column
df.std() - Returns the standard deviation of each column
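
For example, on the numeric columns of the sample df (selecting the numeric columns first avoids errors on the text column in newer pandas):

numeric = df[['col2', 'col3']]   # restrict to numeric columns
print(numeric.describe())        # count, mean, std, min, quartiles, max
print(numeric.mean())            # mean of each column
print(numeric.corr())            # pairwise correlations between columns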

