
Pandas Function Notes

The document provides a comprehensive guide on Exploratory Data Analysis (EDA) using Pandas, covering data loading, inspection, cleaning, transformation, visualization, and statistical analysis. It includes techniques for handling time series data, merging datasets, managing duplicates, and optimizing memory usage. Additionally, it addresses advanced operations such as multi-indexing, categorical data handling, and working with JSON and XML files.

Uploaded by jasskarans078
Copyright © All Rights Reserved

Exploratory Data Analysis (EDA) with Pandas

1. Data Loading
• Read CSV File: df = pd.read_csv('filename.csv')
• Read Excel File: df = pd.read_excel('filename.xlsx')
• Read from SQL Database: df = pd.read_sql(query, connection)
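A minimal sketch of the loaders above, assuming in-memory sources instead of real files: `io.StringIO` stands in for 'filename.csv', and an in-memory SQLite database (standard-library `sqlite3`) stands in for the SQL connection. The table name `sales` and its columns are hypothetical.

```python
import io
import sqlite3

import pandas as pd

# StringIO behaves like an open text file, so read_csv parses it the same way.
csv_text = "id,region,amount\n1,north,10.5\n2,south,7.25\n3,north,3.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Build a throwaway SQLite table (hypothetical name 'sales'), then query it back.
connection = sqlite3.connect(":memory:")
df.to_sql("sales", connection, index=False)
query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
totals = pd.read_sql(query, connection)
connection.close()

print(df.shape)  # (3, 3)
print(totals)
```

`read_excel` follows the same pattern but needs an Excel engine such as openpyxl installed, so it is left out here.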
2. Basic Data Inspection
• Display Top Rows: df.head()
• Display Bottom Rows: df.tail()
• Display Data Types: df.dtypes
• Summary Statistics: df.describe()
• Display Index, Columns, Non-Null Counts, and Data Types: df.info()
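The inspection calls can be tried on a small hypothetical DataFrame; the column names and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one text column.
df = pd.DataFrame({"age": [23, 35, 31, 52, 46], "city": ["A", "B", "A", "C", "B"]})

print(df.head(3))        # first 3 rows (default is 5)
print(df.tail(2))        # last 2 rows
print(df.dtypes)         # one dtype per column
summary = df.describe()  # by default only numeric columns are summarised
print(summary)
df.info()                # prints index range, dtypes, non-null counts and memory use
```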
3. Data Cleaning
• Check for Missing Values: df.isnull().sum()
• Fill Missing Values: df.fillna(value)
• Drop Missing Values: df.dropna()
• Rename Columns: df.rename(columns={'old_name': 'new_name'})
• Drop Columns: df.drop(columns=['column_name'])
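A short sketch of the cleaning steps on hypothetical data. Each call returns a new DataFrame and leaves `df` unchanged, so results are assigned to new names; the scratch column `tmp` is invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "score": [88.0, np.nan, 75.0, np.nan],
    "tmp": [1, 2, 3, 4],  # hypothetical scratch column to drop
})

print(df.isnull().sum())  # missing values per column: name 1, score 2, tmp 0

filled = df.fillna({"score": df["score"].mean()})  # per-column fill value (mean = 81.5)
no_missing = df.dropna()                            # drops every row containing a missing value
renamed = df.rename(columns={"score": "exam_score"})
trimmed = df.drop(columns=["tmp"])
```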
4. Data Transformation
• Apply Function: df['column'].apply(function) (a lambda such as lambda x: x * 2 also works for inline logic)
• Group By and Aggregate: df.groupby('col1').agg({'col2': 'sum'})
• Pivot Tables: df.pivot_table(index='column1', values='column2', aggfunc='mean')
• Merge DataFrames: pd.merge(df1, df2, on='column')
• Concatenate DataFrames: pd.concat([df1, df2])
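The transformation calls can be combined on two small hypothetical tables (`sales` and `managers` are invented names); the `columns=` argument to `pivot_table` is added here to spread quarters across columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "amount": [100, 80, 120, 90],
})
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Kim", "Lee"]})

# apply runs the function once per element of the column.
sales["amount_k"] = sales["amount"].apply(lambda x: x / 1000)

totals = sales.groupby("region").agg({"amount": "sum"})  # north 220, south 170
wide = sales.pivot_table(index="region", columns="quarter", values="amount", aggfunc="mean")
merged = pd.merge(sales, managers, on="region")            # adds a 'manager' column
stacked = pd.concat([sales, sales])                         # 8 rows; index labels repeat
```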
5. Data Visualization Integration
• Histogram: df['column'].hist()
• Boxplot: df.boxplot(column=['column1', 'column2'])
• Scatter Plot: df.plot.scatter(x='col1', y='col2')
• Line Plot: df.plot.line()
• Bar Chart: df['column'].value_counts().plot.bar()
6. Statistical Analysis
• Value Counts: df['column'].value_counts()
• Unique Values in Column: df['column'].unique()
• Number of Unique Values: df['column'].nunique()
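A quick sketch of the three counting helpers on a hypothetical `color` column:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

counts = df["color"].value_counts()  # sorted by frequency: red 3, blue 2, green 1
uniques = df["color"].unique()       # distinct values in order of first appearance
n_unique = df["color"].nunique()     # 3
```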
7. Indexing and Selection
• Select Column: df['column']
• Select Multiple Columns: df[['col1', 'col2']]
• Select Rows by Position: df.iloc[0:5]
• Select Rows by Label: df.loc[0:5] (label slices include the end label, so with a default index this returns rows 0 through 5)
• Conditional Selection: df[df['column'] > value]
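The difference between positional and label slicing is easiest to see on a default integer index; the data below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"col1": range(10), "col2": range(10, 20)})  # default index 0..9

by_position = df.iloc[0:5]     # positions 0-4: end excluded, like Python slices -> 5 rows
by_label = df.loc[0:5]         # labels 0-5: end label included -> 6 rows
subset = df[["col1", "col2"]]  # a list of names returns a DataFrame
filtered = df[df["col1"] > 6]  # boolean mask keeps rows 7, 8, 9
```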
8. Data Formatting and Conversion
• Convert Data Types: df['column'].astype('type')
• String Operations: df['column'].str.lower()
• Datetime Conversion: pd.to_datetime(df['column'])
• Setting Index: df.set_index('column')
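A sketch of the conversions applied to hypothetical columns read in as text. None of these calls modify `df` in place, so each result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["A1", "B2", "C3"],
    "qty": ["3", "5", "8"],  # numbers stored as text
    "date": ["2024-01-05", "2024-02-10", "2024-03-15"],
})

df["qty"] = df["qty"].astype("int64")    # text -> integers
df["code"] = df["code"].str.lower()      # vectorised string method
df["date"] = pd.to_datetime(df["date"])  # text -> datetime64
df = df.set_index("code")                # returns a new frame, so reassign
```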
9. Handling Time Series Data
• Set Datetime Index: df.set_index(pd.to_datetime(df['date']))
• Resampling Data: df.resample('M').mean() (needs a DatetimeIndex; pandas 2.2+ deprecates 'M' in favour of 'ME' for month end)
• Rolling Window Operations: df.rolling(window=5).mean()
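A sketch of resampling and rolling windows on 60 days of hypothetical data. It uses the 'MS' (month start) alias, an assumption chosen because it is accepted by both older and newer pandas, whereas month end is spelled 'M' or 'ME' depending on the version.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),  # Jan + all of Feb 2024
    "value": range(60),
})
df = df.set_index(pd.to_datetime(df["date"])).drop(columns="date")

monthly = df.resample("MS").mean()     # one row per calendar month: 15.0, 45.0
rolling = df.rolling(window=5).mean()  # first 4 rows are NaN until the window is full
```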
10. File Export
• Write to CSV: df.to_csv('filename.csv')
• Write to Excel: df.to_excel('filename.xlsx')
• Write to SQL Database: df.to_sql('table_name', connection)
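A sketch of the export calls using in-memory targets: a `StringIO` buffer in place of 'filename.csv' and SQLite in place of a database server. The table name `people` is hypothetical. Passing `index=False` keeps the row index out of the output; without it an extra index column is written.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # line endings follow os.linesep by default
print(buffer.getvalue())

connection = sqlite3.connect(":memory:")
df.to_sql("people", connection, index=False)  # 'people' is a hypothetical table
count = pd.read_sql("SELECT COUNT(*) AS n FROM people", connection)
connection.close()
```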
11. Advanced Data Queries
• Query Function: df.query('column > @value') (prefix Python variables with @; literals can be written inline)
• Filtering with isin: df[df['column'].isin([value1, value2])]
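Both filters applied to a hypothetical table; `threshold` is an ordinary Python variable referenced from inside the query string with `@`.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Rome", "Lima", "Oslo"], "temp": [4, 18, 22, 7]})

threshold = 5
warm = df.query("temp > @threshold")                # '@' pulls in a Python variable
nordic = df[df["city"].isin(["Oslo", "Helsinki"])]  # membership test against a list
```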
12. Memory Optimization
• Inspect Memory Usage: df.memory_usage(deep=True)
• Change Data Types to Save Memory: df['column'].astype('category')
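A sketch that measures the saving from the category conversion, assuming a hypothetical column with few distinct values repeated many times (the case where `category` helps most).

```python
import pandas as pd

df = pd.DataFrame({"status": ["active", "inactive", "pending"] * 10_000})

before = df.memory_usage(deep=True).sum()  # deep=True counts the actual string bytes
df["status"] = df["status"].astype("category")
after = df.memory_usage(deep=True).sum()   # small integer codes plus 3 categories
print(f"{before:,} bytes -> {after:,} bytes")
```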
13. Multi-Index Operations
• Creating MultiIndex: df.set_index(['col1', 'col2'])
• Slicing on MultiIndex: df.loc[(slice('index1_start', 'index1_end'), slice('index2_start', 'index2_end')), :] (the index must be sorted first, e.g. with df.sort_index())
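A sketch of MultiIndex creation and range slicing on hypothetical data. `pd.IndexSlice` is an equivalent shorthand for the tuple-of-slices form, and both assume a sorted index.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2022, 2022, 2023, 2023, 2024],
    "region": ["east", "west", "east", "west", "east"],
    "sales": [10, 20, 30, 40, 50],
})

# Range slicing on a MultiIndex requires a sorted index.
indexed = df.set_index(["year", "region"]).sort_index()

idx = pd.IndexSlice
subset = indexed.loc[idx[2022:2023, "east":"east"], :]              # years 2022-2023, 'east'
same = indexed.loc[(slice(2022, 2023), slice("east", "east")), :]   # explicit slice() form
```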
14. Data Merging Techniques
• Outer Join: pd.merge(df1, df2, on='column', how='outer')
• Inner Join: pd.merge(df1, df2, on='column', how='inner')
• Left Join: pd.merge(df1, df2, on='column', how='left')
• Right Join: pd.merge(df1, df2, on='column', how='right')
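Running the four join types on two hypothetical tables with partly overlapping keys shows which keys each one keeps.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

results = {how: pd.merge(left, right, on="key", how=how)
           for how in ["inner", "left", "right", "outer"]}
for how, result in results.items():
    print(how, sorted(result["key"]))
# inner ['b', 'c'], left ['a', 'b', 'c'], right ['b', 'c', 'd'], outer ['a', 'b', 'c', 'd']
```

Unmatched rows are filled with NaN in the other table's columns, which is why `x` or `y` may turn into a float column after a left, right or outer join.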
15. Dealing with Duplicates
• Finding Duplicates: df.duplicated()
• Removing Duplicates: df.drop_duplicates()
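A sketch on hypothetical rows; by default duplicates are judged on all columns, and `subset`/`keep` narrow or change that rule.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3, 3], "value": ["a", "b", "b", "c", "d"]})

mask = df.duplicated()          # True for a row that repeats an earlier full row
deduped = df.drop_duplicates()  # keeps the first occurrence by default
by_id = df.drop_duplicates(subset=["id"], keep="last")  # compare only 'id', keep last
```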
16. Specialized Data Types Handling
• Working with Categorical Data: df['column'].astype('category')
17. Advanced Grouping and Aggregation
• Group by Multiple Columns: df.groupby(['col1', 'col2']).mean(numeric_only=True) (since pandas 2.0, mean() raises on non-numeric columns unless numeric_only=True)
• Aggregate with Multiple Functions: df.groupby('col').agg(['mean','sum'])
• Transform Function: df.groupby('col').transform(lambda x: x - x.mean())
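A sketch of the three grouping patterns on hypothetical data. Selecting the numeric column `hours` before aggregating sidesteps non-numeric columns entirely. Note that `agg` reduces each group to one row, while `transform` returns a result aligned with the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "role": ["dev", "ops", "dev", "dev"],
    "hours": [10, 20, 30, 50],
})

by_two = df.groupby(["team", "role"])["hours"].mean()    # MultiIndex (team, role)
stats = df.groupby("team")["hours"].agg(["mean", "sum"])  # one column per function
centered = df.groupby("team")["hours"].transform(lambda x: x - x.mean())  # same length as df
```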
18. Time Series Specific Operations
• Time-Based Grouping: df.groupby(pd.Grouper(key='date_col',freq='M')).sum()
• Resample Time Series Data: df.resample('M', on='date_col').mean()
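Both forms group by calendar period without first moving the date into the index. The sketch uses hypothetical data and the 'MS' (month start) alias, an assumption made because it behaves the same across pandas versions, whereas month end is spelled 'M' or 'ME' depending on the release.

```python
import pandas as pd

df = pd.DataFrame({
    "date_col": pd.to_datetime(["2024-01-03", "2024-01-20", "2024-02-07", "2024-03-11"]),
    "units": [5, 7, 4, 6],
})

grouped = df.groupby(pd.Grouper(key="date_col", freq="MS"))["units"].sum()  # 12, 4, 6
resampled = df.resample("MS", on="date_col")["units"].mean()                 # 6.0, 4.0, 6.0
```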
19. Text Data Specific Operations
• String Contains: df[df['column'].str.contains('substring', na=False)] (na=False treats missing values as non-matches, so the mask stays boolean)
• String Split: df['column'].str.split(' ', expand=True)
• Regular Expression Extraction: df['column'].str.extract(r'(regex)')
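A sketch of the three string operations on a hypothetical column that includes a missing value:

```python
import pandas as pd

df = pd.DataFrame({"column": ["Ada Lovelace, 1815", "Alan Turing, 1912", None]})

# na=False makes the missing row a non-match instead of NaN in the mask.
has_turing = df[df["column"].str.contains("Turing", na=False)]
parts = df["column"].str.split(", ", expand=True)  # one output column per piece
years = df["column"].str.extract(r"(\d{4})")       # one output column per capture group
```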
20. Working with JSON and XML
• Reading JSON: df = pd.read_json('filename.json')
• Reading XML: df = pd.read_xml('filename.xml')
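A sketch reading both formats from in-memory text instead of files. Wrapping literal strings in `io.StringIO` reflects that recent pandas versions deprecate passing raw JSON/XML text directly. `parser="etree"` selects the standard-library parser; the default "lxml" parser needs the lxml package. The records themselves are hypothetical.

```python
import io

import pandas as pd

json_text = '[{"id": 1, "tag": "a"}, {"id": 2, "tag": "b"}]'
xml_text = "<rows><row><id>1</id><tag>a</tag></row><row><id>2</id><tag>b</tag></row></rows>"

from_json = pd.read_json(io.StringIO(json_text))  # a list of records becomes rows
from_xml = pd.read_xml(io.StringIO(xml_text), parser="etree")  # each <row> becomes a row
```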
21. Advanced File Handling
• Read CSV with Specific Delimiter: df = pd.read_csv('filename.csv', delimiter=';')
• Writing to JSON: df.to_json('filename.json')
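A round-trip sketch on hypothetical semicolon-separated data. When `to_json` is called without a path, it returns the JSON as a string, which is then read back to check that nothing was lost.

```python
import io

import pandas as pd

raw = "id;price\n1;9.5\n2;12.0\n"
df = pd.read_csv(io.StringIO(raw), delimiter=";")  # sep=";" is equivalent

json_out = df.to_json(orient="records")  # no path given, so a JSON string is returned
back = pd.read_json(io.StringIO(json_out), orient="records")
print(json_out)
```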
22. Dealing with Missing Data
• Interpolate Missing Values: df['column'].interpolate()
• Forward Fill Missing Values: df['column'].ffill()
• Backward Fill Missing Values: df['column'].bfill()
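The three fill strategies behave differently at the edges of a gap. In this hypothetical series, the trailing gap has no later value for `bfill` to copy backward.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

linear = s.interpolate()  # [1.0, 2.0, 3.0, 4.0, 4.0]: the trailing NaN takes the last value
forward = s.ffill()       # [1.0, 1.0, 1.0, 4.0, 4.0]
backward = s.bfill()      # [1.0, 4.0, 4.0, 4.0, NaN]: nothing follows the last gap
```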
23. Data Reshaping
• Wide to Long Format: pd.wide_to_long(df, ['col'], i='id_col', j='year')
• Long to Wide Format: df.pivot(index='id_col', columns='year', values='col')
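A round-trip sketch using the placeholder names above. It assumes wide columns named stub + year (`col2020`, `col2021`), which is the layout `wide_to_long` expects; numeric suffixes are converted to numbers in `year`.

```python
import pandas as pd

wide = pd.DataFrame({
    "id_col": [1, 2],
    "col2020": [10, 20],  # stub "col" + suffix 2020
    "col2021": [11, 21],
})

# Split "col2020" into stub "col" and suffix 2020 (stored in the new 'year' column).
long = pd.wide_to_long(wide, ["col"], i="id_col", j="year").reset_index()

# pivot reverses it: one row per id_col, one column per year.
back = long.pivot(index="id_col", columns="year", values="col")
```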
24. Categorical Data Operations
• Convert Column to Categorical: df['column'] = df['column'].astype('category')
• Order Categories: df['column'] = df['column'].cat.set_categories(['cat1', 'cat2'], ordered=True)
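A sketch of an ordered categorical on a hypothetical `size` column. `set_categories` returns a new Series, so the result is assigned back. Values missing from the category list (here "huge") become NaN.

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small", "huge"]})
df["size"] = df["size"].astype("category")

# Fix the order small < medium < large; "huge" is not listed, so it becomes NaN.
df["size"] = df["size"].cat.set_categories(["small", "medium", "large"], ordered=True)

big_enough = df[df["size"] >= "medium"]  # comparisons follow the category order
ordered = df.sort_values("size")         # small, medium, large, then NaN last
```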
25. Advanced Indexing
• Reset Index: df.reset_index(drop=True)
• Set Multiple Indexes: df.set_index(['col1', 'col2'])
• MultiIndex Cross-Section: df.xs(key='value', level='level_name')
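A sketch of the indexing helpers on hypothetical store data. `xs` selects one value from a named level and drops that level from the result.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["s1", "s1", "s2", "s2"],
    "month": ["jan", "feb", "jan", "feb"],
    "revenue": [100, 120, 90, 95],
})
indexed = df.set_index(["store", "month"])

jan = indexed.xs(key="jan", level="month")           # all stores for 'jan'
flat = indexed.reset_index()                         # index levels become columns again
renumbered = df.iloc[[2, 0]].reset_index(drop=True)  # drop=True discards the old labels
```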
26. Handling Large Data Efficiently
• Dask Integration for Large Data: import dask.dataframe as dd; ddf = dd.from_pandas(df, npartitions=10)
• Sampling Data for Quick Insights: df.sample(n=1000)
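A sketch of reproducible sampling on a hypothetical frame. The Dask line is not exercised here because it needs the separate dask package. `random_state` fixes the draw so repeated runs see the same rows, and `frac` is the proportional alternative to `n`.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10_000)})

sample = df.sample(n=1000, random_state=42)
again = df.sample(n=1000, random_state=42)   # same seed -> identical rows
tenth = df.sample(frac=0.1, random_state=0)  # 10% of the rows
```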
27. Advanced Data Merging
• SQL-like Joins: pd.merge(df1, df2, how='left', on='col')
• Concatenating Along a Different Axis: pd.concat([df1, df2], axis=1)
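A sketch contrasting the two concatenation axes on hypothetical frames. With `axis=1`, rows are aligned on index labels, so labels missing from one frame produce NaN.

```python
import pandas as pd

a = pd.DataFrame({"col": [1, 2, 3]}, index=["r1", "r2", "r3"])
b = pd.DataFrame({"other": [10, 30]}, index=["r1", "r3"])

side_by_side = pd.concat([a, b], axis=1)                # r2 gets NaN in 'other'
stacked = pd.concat([a, a], axis=0, ignore_index=True)  # default axis, fresh 0..n-1 index
```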
