Pandas_Tutorial

The document provides an introduction to Pandas, a data manipulation library in Python, detailing its primary data structures: Series and DataFrame. It covers basic operations, data manipulation, handling missing data, data aggregation, merging DataFrames, and advanced operations like pivot tables and applying functions. The conclusion emphasizes the importance of practice and using documentation for mastering Pandas.
Introduction to Pandas

1. Introduction to Pandas

Pandas is built on top of NumPy and provides two primary data structures: Series and DataFrame.

Series

A Series is a one-dimensional labeled array capable of holding any data type.

import pandas as pd

# Creating a Series

s = pd.Series([1, 3, 5, 7, 9])

print(s)
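
The "labeled" part of the definition is easier to see with an explicit index. The following is a minimal sketch; the labels 'a' to 'e' and the variable names are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical labels 'a'..'e', chosen only for illustration
s_labeled = pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])

# Access one value by its label
by_label = s_labeled['c']

# Access the same value by its integer position (third element)
by_position = s_labeled.iloc[2]

print(by_label, by_position)
```

When no index is passed, as in the example above, pandas assigns the default integer labels 0, 1, 2, and so on.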

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

# Creating a DataFrame

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

print(df)
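
To see what "columns of potentially different types" means in practice, the sketch below rebuilds the same DataFrame and inspects the type of each column. The variable name age is an assumption introduced here for illustration.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
}
df = pd.DataFrame(data)

# Each column keeps its own dtype (text columns vs. an integer column)
print(df.dtypes)

# Selecting a single column returns a Series
age = df['Age']
print(type(age).__name__)
```

Each column of a DataFrame is a Series, which is why the two structures share most of their methods.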

2. Basic Operations on DataFrames

Viewing Data

- head(): View the first few rows of the DataFrame.

- tail(): View the last few rows of the DataFrame.

- info(): Get a summary of the DataFrame.

- describe(): Get descriptive statistics.

print(df.head())

print(df.tail())

df.info()  # info() prints its summary directly and returns None, so no print() is needed

print(df.describe())
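
head() and tail() return five rows by default and accept a row count as an argument. The sketch below, which assumes the same sample data as above, also reads the shape attribute to get the number of rows and columns; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Ask for a specific number of rows instead of the default five
first_two = df.head(2)
last_two = df.tail(2)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

print(first_two)
print(last_two)
print(n_rows, n_cols)
```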

Selecting Data

- Using column names.

- Using row positions with iloc and row labels with loc.

# Select a column

print(df['Name'])

# Select multiple columns

print(df[['Name', 'City']])

# Select rows by integer position (rows 1 and 2; the end of the slice is excluded)

print(df.iloc[1:3])

# Select rows and columns by labels (unlike iloc, the slice 0:2 includes label 2)

print(df.loc[0:2, ['Name', 'City']])
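
With the default integer index, positions and labels look the same, which can hide the difference between iloc and loc. The sketch below makes it visible by using the Name column as the index; this re-indexing step and the variable names are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Use the names as row labels instead of 0, 1, 2, 3
df_named = df.set_index('Name')

# iloc slices by position and excludes the end: positions 0 and 1
by_position = df_named.iloc[0:2]

# loc slices by label and includes the end: John through Peter
by_label = df_named.loc['John':'Peter']

# loc can also pick a single cell by row label and column label
cell = df_named.loc['Linda', 'City']

print(by_position)
print(by_label)
print(cell)
```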

3. Data Manipulation

Adding and Dropping Columns

- Adding new columns.

- Dropping columns.

# Adding a new column

df['Country'] = ['USA', 'France', 'Germany', 'UK']

print(df)

# Dropping a column

df = df.drop(columns=['Country'])

print(df)
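
A new column can also be computed from an existing one. The sketch below is one possible illustration; the column name AgeInMonths is hypothetical. It also shows that drop() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
})

# Derive a column from an existing one (hypothetical column name)
df['AgeInMonths'] = df['Age'] * 12

# drop() returns a copy without the column; df itself still has it
df_without = df.drop(columns=['AgeInMonths'])

print(df.columns.tolist())
print(df_without.columns.tolist())
```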

Filtering Data

- Using conditions to filter rows.

# Filtering rows where Age > 30

filtered_df = df[df['Age'] > 30]

print(filtered_df)
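
Several conditions can be combined with & (and) and | (or), with each condition wrapped in parentheses. The sketch below assumes the same sample data; the thresholds and variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Both conditions must hold; the parentheses are required
mask = (df['Age'] > 25) & (df['City'] != 'Berlin')
selected = df[mask]

# isin() keeps rows whose value appears in a list
in_cities = df[df['City'].isin(['Paris', 'London'])]

print(selected)
print(in_cities)
```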

4. Handling Missing Data

- Checking for missing data.

- Filling missing data.

- Dropping missing data.

# Creating a DataFrame with missing values

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London']
}

df = pd.DataFrame(data)

# Checking for missing data

print(df.isnull())

# Filling missing data

df_filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(df_filled)

# Dropping missing data

df_dropped = df.dropna()

print(df_dropped)
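
On larger DataFrames, a per-column count of missing values is usually easier to read than the full True/False table from isnull(). The sketch below shows that count, plus dropna() restricted to one column through its subset parameter; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London'],
})

# Count missing values per column
missing_counts = df.isnull().sum()

# Drop only the rows where 'Age' is missing; a missing 'City' is tolerated
age_known = df.dropna(subset=['Age'])

print(missing_counts)
print(age_known)
```

Without subset, dropna() removes every row that has a missing value in any column, which would leave only John and Linda here.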

5. Data Aggregation and Grouping

- Using groupby to group data and perform aggregation.

data = {
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40]
}

df = pd.DataFrame(data)

# Grouping by 'Category' and calculating the sum of 'Value'

grouped_df = df.groupby('Category').sum()

print(grouped_df)
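
groupby is not limited to a single aggregation. As a minimal sketch on the same data, agg() can compute several statistics per group in one call; the variable name summary is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# Compute several aggregations of 'Value' for each category at once
summary = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])

print(summary)
```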

6. Merging and Joining DataFrames

- Concatenation.

- Merging based on keys.

# Concatenation

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})

df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

result = pd.concat([df1, df2])

print(result)
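
By default, concat keeps the original row labels of each input, so the combined result above has the labels 0, 1, 0, 1. If unique labels are wanted, one option is the ignore_index parameter, sketched below with variable names chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Default behavior: row labels are carried over from the inputs
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and renumbers the rows
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```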

# Merging

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})

result = pd.merge(left, right, on='key')

print(result)
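
In the example above both frames contain the same keys, so every row finds a match. The sketch below changes the right-hand keys (a hypothetical variation, with K3 in place of K2) to show how the how parameter of merge decides what happens to keys that do not match.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
# Hypothetical variation: 'K3' replaces 'K2', so the keys only partly overlap
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

# Inner join (the default): keep only keys present in both frames
inner = pd.merge(left, right, on='key')

# Left join: keep every row of left; unmatched rows get missing values in 'B'
left_join = pd.merge(left, right, on='key', how='left')

# Outer join: keep keys from either frame
outer = pd.merge(left, right, on='key', how='outer')

print(inner)
print(left_join)
print(outer)
```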

7. Advanced Data Operations

Pivot Tables

- Creating pivot tables to summarize data.

data = {
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Sales': [200, 150, 300, 250]
}

df = pd.DataFrame(data)

pivot_table = df.pivot_table(values='Sales', index='City', columns='Date')

print(pivot_table)
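
In the data above each city appears on only one date, so most cells of the pivot table are empty (NaN). The sketch below uses a small made-up dataset with repeated city and date pairs to show the summarizing step itself: pivot_table averages duplicates by default, and the aggfunc and fill_value parameters change that behavior.

```python
import pandas as pd

# Made-up sales records with a repeated (City, Date) pair
sales = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'],
    'City': ['New York', 'New York', 'New York', 'Paris'],
    'Sales': [200, 100, 150, 300],
})

# Sum the sales per city and date; fill empty combinations with 0
totals = sales.pivot_table(
    values='Sales',
    index='City',
    columns='Date',
    aggfunc='sum',
    fill_value=0,
)

print(totals)
```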

Applying Functions

- Using apply to apply functions to data.

# Applying a lambda function to a column

df['Sales'] = df['Sales'].apply(lambda x: x * 1.1)

print(df)
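
apply also works on a whole DataFrame. With axis=1 the function receives one row at a time, which allows combining several columns. The sketch below uses a reduced version of the sales data; the Label column and the doubling factor are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Paris'],
    'Sales': [200, 150],
})

# Row-wise apply: build a text label from two columns (hypothetical column)
df['Label'] = df.apply(lambda row: f"{row['City']}: {row['Sales']}", axis=1)

# For simple arithmetic, a vectorized expression gives the same result as apply
applied = df['Sales'].apply(lambda x: x * 2)
vectorized = df['Sales'] * 2

print(df)
print(applied.equals(vectorized))
```

For plain arithmetic such as scaling a column, the vectorized form is generally the faster and more idiomatic choice; apply is most useful when the logic cannot be expressed with built-in operations.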

Conclusion

This is a brief overview of some of the basic and intermediate functionality of pandas. As you work more with pandas, you'll discover many more powerful features and methods that can help you manipulate and analyze data efficiently. Practice is key, so try to work on different datasets and use the pandas documentation for further reference.
