0% found this document useful (0 votes)
8 views

Introduction To Pandas in Data Analytics

Introduction To Pandas In Data Analytics

Uploaded by

Wael Aly
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Introduction To Pandas in Data Analytics

Introduction To Pandas In Data Analytics

Uploaded by

Wael Aly
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Introduction to

Pandas in Data
Analytics
Pandas DataFrame is an essential tool for data analysis in Python, offering
a powerful and flexible tabular data structure.

1 Labeled Axes
Pandas DataFrame provides a two-dimensional, size-mutable, and
potentially heterogeneous tabular data structure with labeled rows
and columns.

2 Data Analysis
Commonly used alongside NumPy and Matplotlib for
comprehensive data manipulation and visualization.

3 Essential for Python


Pandas DataFrame is a core component of the Python data analysis
ecosystem.

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Loading Data into a
DataFrame
Methods for loading data from various sources into a DataFrame. Code
Snippets:

From CSV
df_csv = pd.read_csv('file.csv')

From Excel
df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')

From MySQL
import sqlalchemy engine =
sqlalchemy.create_engine('mysql://username:password@localhost/dbnam
e')

df_sql = pd.read_sql_table('table_name', engine)


DataFrame and Series Objects
DataFrame: A two-dimensional table with labeled axes. Series: A one-dimensional array with labels. Index objects:
Immutable array implementing an ordered, sliceable set.

Example DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

Example Series
s = pd.Series([1, 2, 3])
Working with Rows and
Columns
Content: Accessing and manipulating rows and columns. Selecting, adding, and deleting rows and columns.

Selecting a column

df['A']

Adding a new column

df['C'] = df['A'] + df['B']

Deleting a column

df.drop('C', axis=1, inplace=True)

Selecting rows

df.loc[0] # First row df.iloc[0] # First row by position

Indexing and Selecting Data


Content: Indexing with .loc, .iloc, and .ix. Vectorized arithmetic operations.

Using .loc

df.loc[0:1, ['A', 'B']]

Using .iloc

df.iloc[0:1, 0:2]

Vectorized operations

df['A'] + df['B']
Filtering and Filtering
Grouping filtered = df[df['A'] > 1]

Content:
Grouping
Filtering functions and grouping by row index.

grouped = df.groupby('A').sum()
Merging DataFrames Title: Merging
DataFrames
Merging DataFrames using pd.merge(). Types of joins: inner, outer, left, right. Code Snippets:

Creating two DataFrames


df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})

df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

Inner join
inner_merge = pd.merge(df1, df2, on='key', how='inner')

Outer join
outer_merge = pd.merge(df1, df2, on='key', how='outer')

Left join
left_merge = pd.merge(df1, df2, on='key', how='left')

Right join
right_merge = pd.merge(df1, df2, on='key', how='right')

Concatenating DataFrames Title: Concatenating DataFrames

Concatenating DataFrames using pd.concat(). Concatenating along rows and columns. Code Snippets:

Concatenating along rows


concat_rows = pd.concat([df1, df2])

Concatenating along columns


concat_cols = pd.concat([df1, df2], axis=1)
Joining DataFrames Title: Joining
DataFrames
Joining DataFrames using df.join(). Different types of joins: inner, outer, left, right.

Code Snippets:

Creating two DataFrames with different


indexes
df1 = pd.DataFrame({'value1': [1, 2, 3]}, index=['A', 'B', 'C'])

df2 = pd.DataFrame({'value2': [4, 5, 6]}, index=['B', 'C', 'D'])

Joining DataFrames
joined_df = df1.join(df2, how='inner')

Grouping and Aggregating Data Title: Grouping and


Aggregating Data
Grouping data using df.groupby(). Aggregating data using sum, mean, count, etc.

Code Snippets:

Creating a DataFrame
df = pd.DataFrame({ 'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40] })

Grouping by 'Category' and calculating


sum
grouped_sum = df.groupby('Category').sum()

Grouping by 'Category' and calculating


multiple aggregations
grouped_agg = df.groupby('Category').agg({'Value': ['sum', 'mean', 'count']})
Filtering Data Title: Filtering Data
Filtering data using conditions. Using df.query() for SQL-like queries.

Code Snippets:

Filtering with conditions


filtered_df = df[df['Value'] > 20]

Using query
filtered_query = df.query('Value > 20')

Sorting Data Title: Sorting Data


Sorting data using df.sort_values() and df.sort_index().

Code Snippets:

Sorting by values
sorted_values = df.sort_values(by='Value')

Sorting by index
sorted_index = df.sort_index()

Handling Missing Data Title: Handling Missing Data


Handling missing data with df.isna(), df.dropna(), and df.fillna().

Code Snippets:

Creating a DataFrame with missing values


df_missing = pd.DataFrame({ 'A': [1, np.nan, 3], 'B': [4, 5, np.nan] })

Checking for missing values


missing_values = df_missing.isna()

Dropping missing values


dropped_na = df_missing.dropna()

Filling missing values


filled_na = df_missing.fillna(0)
Saving and Exporting
DataFrames
Content:

Saving DataFrames to various formats. Code Snippets:

To CSV
df.to_csv('output.csv')

To Excel
df.to_excel('output.xlsx', sheet_name='Sheet1')

To Python dictionary
df_dict = df.to_dict()

To string
df_str = df.to_string()

To MySQL
df.to_sql('table_name', engine)

DataFrame Attributes and


Methods
Non-indexing attributes. Utility methods. Code Snippets:

Non-indexing attributes
df.T

df.axes

df.dtypes

df.empty

df.ndim

df.shape

df.size

df.values

Utility methods
df_copy = df.copy()

df_ranked = df.rank()

df_sorted = df.sort_values(by='A')

df = df.astype({'A': 'float64'})
Iterating Over DataFrames Title: Iterating
Over DataFrames
Methods for iterating over DataFrames.

Iterating over columns


for label, content in df.iteritems():

print(label, content)

Iterating over rows


for index, row in df.iterrows():

print(index, row)

Working with Dates and Times Title: Working with


Dates and Time
Timestamps and Periods. Handling time zones, date ranges, and period frequencies. Code Snippets:

Timestamps
ts = pd.Timestamp('2023-01-01')

Periods
period = pd.Period('2023-01')

Date range
date_range = pd.date_range('2023-01-01', periods=10)

Period range
period_range = pd.period_range('2023-01', periods=10, freq='M')

Pivot Tables and Reshaping Data Title: Pivot Tables


and Reshaping Data
Pivoting, melting, and unstacking. Code Snippets:

Pivot table
pivot = df.pivot_table(values='A', index='B', columns='C')

Melting
melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

Unstacking
unstacked = df.unstack()
Slide 13: Time Series Data Title: Time Series
Data
Handling time series data with DatetimeIndex and PeriodIndex. Upsampling, downsampling, and resampling. Code
Snippets:

DatetimeIndex
dt_index = pd.DatetimeIndex(['2023-01-01', '2023-01-02'])

PeriodIndex
period_index = pd.PeriodIndex(['2023-01', '2023-02'], freq='M')

Resampling
resampled = df.resample('M').mean()

Additional Tips and Tricks Title: Additional Tips and


Tricks
Miscellaneous tips for working with DataFrames. Code Snippets:

Value counts
value_counts = df['A'].value_counts()

Non-standard string to Timestamp


conversion
ts = pd.to_datetime('2023-01-01 12:34:56', format='%Y-%m-%d %H:%M:%S')
Conclusion

Content:

Key Points Summary:


Introduction to Pandas DataFrame: Understanding the basic structure and importance.

Loading Data: Methods to load data from various sources into DataFrames.
DataFrame and Series Objects: Differences and usage.
Working with Rows and Columns: Accessing, selecting, and modifying data.
Indexing and Selecting Data: Using .loc, .iloc, and vectorized operations.
Saving and Exporting: Exporting DataFrames to different formats.
Attributes and Methods: Key attributes and utility methods.
Iterating Over DataFrames: Methods to iterate through rows and columns.
Dates and Times: Handling date and time data.
Pivot Tables and Reshaping: Techniques for reshaping data.
Filtering and Grouping: Data filtering and aggregation.

Time Series Data: Managing and manipulating time series data.


SQL-like Operations: Merging, joining, concatenating, and advanced operations.
Handling Missing Data: Methods to detect and handle missing values.
Pandas is a powerful tool: Pandas provides versatile and efficient methods to handle, manipulate, and analyze
data, making it a cornerstone of data science and analysis in Python.

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy