Unit2 - Pandas - Jupyter Notebook

Uploaded by

neerajboggavarapu098

04/05/2023, 10:44 pandas - Jupyter Notebook

pandas
pandas (the name is derived from "panel data") is the core Python library for data manipulation and data analysis.

It provides one-dimensional and multidimensional data structures for manipulating data.

pandas is a Python library used for working with data sets.

high-performance data analysis tool

works with large data sets

represents data in tabular form

handles missing data

Three data structures in pandas

1. Series: one-dimensional

2. DataFrame: two-dimensional

3. Panel: multidimensional (items, major axis, minor axis). Panel was removed in pandas 0.25; current versions use a DataFrame with a MultiIndex instead.
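
The points above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the column names `reading` and `sensor`, the index labels, and all values are made up. Because Panel is no longer available in current pandas releases, the last step uses a DataFrame with a MultiIndex as a stand-in for multidimensional data.

```python
import numpy as np
import pandas as pd

# Series: one-dimensional, labelled values (labels and values are illustrative)
s = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"])

# DataFrame: two-dimensional table of labelled columns
df = pd.DataFrame({"reading": [1.5, np.nan, 3.0], "sensor": ["x", "y", "z"]})

# Missing data: count the gaps, then either fill or drop them
n_missing = df["reading"].isna().sum()
filled = df["reading"].fillna(df["reading"].mean())
dropped = df.dropna()

# A MultiIndex DataFrame stands in for the removed Panel structure
panel_like = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_product(
        [["item1", "item2"], ["r1", "r2"]], names=["item", "row"]
    ),
)

print(s.ndim, df.ndim, n_missing)
print(filled.tolist())
print(panel_like.loc["item1"])
```

`fillna` and `dropna` return new objects here, so `df` itself still contains the missing value afterwards.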

Create a pandas Series

In [1]:

import pandas as pd
import numpy as np

In [2]:

arr = np.array([1,2,3,4])
print(arr)

[1 2 3 4]

In [3]:

s = pd.Series(arr)
print(s)
print(type(s))

0 1
1 2
2 3
3 4
dtype: int64
<class 'pandas.core.series.Series'>

In [4]:

print(s[0:5])

0 1
1 2
2 3
3 4
dtype: int64

In [5]:

# `a` has to be defined before it is indexed; running a[2] first raised
# NameError: name 'a' is not defined
a = pd.Series(['a','b','c'])
a[2]

In [ ]:

a = pd.date_range(start = '2023-03-01', end = '2023-03-28')

In [ ]:

type(a)

Pandas DataFrame

In [ ]:

arr = np.array([[1,2,3],[4,5,6]])
print(arr)

In [ ]:

df = pd.DataFrame(arr)
print(df)

In [ ]:

temp = np.random.randint(low = 20, high =100, size = [20,])

name = np.random.choice(['Abhay','Teclov','Geekshub','Ankit'],20)
random = np.random.choice([10,11,13,12,14],20)

In [ ]:

df = pd.DataFrame({"Temp":temp,"Name":name,"Random":random})
df

In [ ]:

a = list(zip(temp, name, random))

print(a)

In [ ]:

df = pd.DataFrame(data = a, columns=['Temp','Name','Random'])

In [ ]:

type(df)

In [ ]:

temp = np.random.randint(low = 20, high =100, size = [20,])

name = np.random.choice(['Abhay','Teclov','Geekshub','Ankit'],20)
random = np.random.choice([10,11,13,12,14],20)

In [ ]:

df = pd.DataFrame({'temp':temp, 'name':name, 'random':random})

In [ ]:

type(df)

In [ ]:

df.head()

In [ ]:

df.tail()

In [ ]:

df.shape

In [ ]:

df.columns

In [ ]:

df.name

In [ ]:

df['name']

In [ ]:

df['temp'].describe()

In [ ]:

df.info()

In [ ]:

df.values

In [ ]:

df.set_index('temp', inplace = True)

In [ ]:

df.sort_index(axis =0, ascending=False)

In [ ]:

df.sort_values(by ='random', ascending = False)

In [ ]:

df.drop(['random'], axis =1)

In [ ]:

df.head()

In [ ]:

df.iloc[[0,1]]

In [ ]:

df.iloc[1:3,1]

In [ ]:

# a boolean mask needs one entry per row, so pad it to len(df)
df.iloc[[True, True, False] + [False] * (len(df) - 3)]

In [ ]:

df.head()

In [ ]:

df.loc[:,:]

In [ ]:

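# 39, 84 and 34 were 'temp' index labels in the original run; the data is random,
# so substitute labels from your own df.index (a missing label raises KeyError)
df.loc[[39,84,34]]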

In [ ]:

df.loc[[39,84],'name':'random']

In [ ]:

# as with iloc, a boolean mask passed to loc must have one entry per row
df.loc[[True, True, False, True] + [False] * (len(df) - 4)]

In [ ]:

df.loc[df.random > 13]

In [ ]:

df.loc[(df.random > 13) | (df.random == 10),:]

In [ ]:

# Merging & concat

d1 = pd.DataFrame([['a', 1], ['b', 2]],columns=['col1', 'number'])
d2 = pd.DataFrame([['c', 3, 'lion'], ['d', 4, 'tiger']],columns=['letter', 'numbe

In [ ]:

In [ ]:

pd.concat([d1,d2],axis =0)

In [ ]:

pd.concat([d1,d2], axis =0, ignore_index=True)

In [ ]:

pd.concat([d1,d2], axis = 1)

In [ ]:

d1 = pd.DataFrame({
"city" : ["lucknow","kanpur","agra","delhi"],
"temperature" : [32,45,30,40]
})

In [ ]:

d2 = pd.DataFrame({
"city" : ["delhi","lucknow","kanpur"],
"humidity" : [68,65,75]
})

In [ ]:

df = pd.merge(d1,d2, on='city')

In [ ]:

pd.merge(d1,d2, on=['city'], how ='outer')

In [ ]:

pd.merge(d1, d2, on =['city'], how='left')
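
The notebook export shows no output for the three merge cells, so the following sketch reruns them on the same `d1` and `d2` frames to make the difference between the `how` options visible. The `indicator=True` argument is an addition not used in the notebook; it adds a `_merge` column that records where each row came from.

```python
import pandas as pd

d1 = pd.DataFrame({
    "city": ["lucknow", "kanpur", "agra", "delhi"],
    "temperature": [32, 45, 30, 40],
})
d2 = pd.DataFrame({
    "city": ["delhi", "lucknow", "kanpur"],
    "humidity": [68, 65, 75],
})

# Default how='inner': only cities present in both frames
inner = pd.merge(d1, d2, on="city")

# how='outer': every city from either frame; gaps become NaN
outer = pd.merge(d1, d2, on="city", how="outer", indicator=True)

# how='left': every city from d1, with humidity where d2 has it
left = pd.merge(d1, d2, on="city", how="left")

print(len(inner), len(outer), len(left))
print(outer[["city", "humidity", "_merge"]])
```

Here "agra" appears only in `d1`, so it is dropped by the inner merge and kept with a missing humidity by the outer and left merges.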

In [ ]:

# dataset from https://github.com/codebasics/py/blob/master/pandas/6_handling_miss

In [ ]:

df1 = pd.read_csv("weather_data.csv")

In [ ]:

df1

In [ ]:

# pip3 install openpyxl

df1.to_excel('df_xl.xlsx', sheet_name = 'weather_data')

In [ ]:

# pip3 install openpyxl  (pandas reads .xlsx files with openpyxl; xlrd 2.x handles only legacy .xls files)

df2 = pd.read_excel('df_xl.xlsx')

In [ ]:

df2

In [ ]:

df2.to_csv('file.csv')

In [ ]:

df2.to_csv('file_noindex.csv', index = False)

In [ ]:

df_group = df2.groupby("event")

In [ ]:

df_group

In [ ]:

# iterating over a groupby object yields (group key, sub-DataFrame) pairs
for event, group in df_group:
    print(event)
    print(group)

In [ ]:

df_group.get_group('Rain')

In [ ]:

df_group.describe()

In [ ]:

def hot_temp(x):
    return x > 30

In [ ]:

df2['hot_temp'] = df2['temperature'].apply(hot_temp)

In [ ]:

df2

In [ ]:

df2['hot_temp'] = df2['temperature'].apply(lambda x: x > 30)

In [ ]:

df2

In [ ]:

#pivot table

In [ ]:

df2.pivot_table(values = 'temperature', index = 'event', aggfunc = 'mean')

In [ ]:

df2.pivot_table(columns = 'temperature')
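
The cells above depend on `weather_data.csv`, which is not included here. As a self-contained sketch, the frame below is a hypothetical stand-in: only the column names `event` and `temperature` come from the notebook, and every value is invented. It shows that a `groupby` mean and a `pivot_table` with `aggfunc='mean'` give the same numbers, and that the `hot_temp` flag can also be computed without `apply`.

```python
import pandas as pd

# Hypothetical stand-in for weather_data.csv (values are made up)
weather = pd.DataFrame({
    "event": ["Rain", "Sunny", "Rain", "Snow", "Sunny"],
    "temperature": [28, 35, 30, 24, 33],
})

# Mean temperature per event, computed two ways
by_event = weather.groupby("event")["temperature"].mean()
pivot = weather.pivot_table(values="temperature", index="event", aggfunc="mean")

# Vectorised equivalent of .apply(lambda x: x > 30)
weather["hot_temp"] = weather["temperature"] > 30

print(by_event)
print(pivot)
print(weather)
```

The comparison on the whole column is usually preferable to `apply` for a simple threshold like this, since it avoids calling a Python function once per row.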

In [ ]:

help(pd.DataFrame.pivot_table)

In [ ]:

# df3 is never defined in this notebook; assign the DataFrame to export before running this cell
df3.to_csv("/home/apiiit-rkv/Desktop/dsp unit-3")

In [ ]:

import pandas as pd

In [ ]:

# read_excel already returns a DataFrame, so no extra pd.DataFrame() call is needed
df = pd.read_excel("/home/apiiit-rkv/Desktop/marks.xlsx")
df

In [ ]:

#correlation
Correlation coefficients quantify the association between variables or features of a dataset. These statistics are of high importance for science and technology, and Python has tools that you can use to calculate them. SciPy, NumPy, and pandas correlation methods are fast, comprehensive, and well-documented.

What Pearson, Spearman, and Kendall correlation coefficients are

How to use SciPy, NumPy, and pandas correlation functions

How to visualize data, regression lines, and correlation matrices with Matplotlib

1. Negative correlation (red dots): In the plot on the left, the y values tend to decrease as the x values increase. This shows strong negative correlation, which occurs when large values of one feature correspond to small values of the other, and vice versa.

2. Weak or no correlation (green dots): The plot in the middle shows no obvious trend. This is a form of weak correlation, which occurs when an association between two features is not obvious or is hardly observable.

3. Positive correlation (blue dots): In the plot on the right, the y values tend to increase as the x values increase. This illustrates strong positive correlation, which occurs when large values of one feature correspond to large values of the other, and vice versa.
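
The three cases can be reproduced numerically. The series below are invented for illustration and are idealised: the negative and positive examples are exactly linear, so their coefficients sit at the extremes, and the alternating series is just one simple way to get a value near zero.

```python
import pandas as pd

x = pd.Series(range(10))

y_neg = 20 - 2 * x               # falls as x rises
y_pos = 3 + 2 * x                # rises as x rises
y_weak = pd.Series([1, -1] * 5)  # alternates, no clear trend

# Series.corr defaults to Pearson's r
r_neg = x.corr(y_neg)
r_pos = x.corr(y_pos)
r_weak = x.corr(y_weak)

print(r_neg, r_pos, r_weak)
```

Real data rarely reaches exactly -1 or 1; values close to those extremes indicate a strong linear relationship, and values close to 0 indicate a weak one.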

In [ ]:

import pandas as pd
x = pd.Series(range(10, 20))
x

In [ ]:

y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

In [ ]:

x.corr(y) # Pearson's r

In [ ]:

y.corr(x)

In [ ]:

x.corr(y, method='spearman') # Spearman's rho

In [ ]:

x.corr(y, method='kendall')
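
As a cross-check on the cells above, the sketch below recomputes Pearson's r with NumPy and derives Spearman's rho by hand as Pearson's r on the ranks of the same `x` and `y`. The variable names other than `x` and `y` are introduced here for illustration. The rank-based value should agree with `x.corr(y, method='spearman')`, assuming no tied values, which holds for this data.

```python
import numpy as np
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_pandas = x.corr(y)               # Pearson's r via pandas
r_numpy = np.corrcoef(x, y)[0, 1]  # same coefficient via NumPy

# Spearman's rho is Pearson's r computed on the ranks of the data
rho_from_ranks = x.rank().corr(y.rank())

# DataFrame.corr gives the full correlation matrix for all columns at once
corr_matrix = pd.DataFrame({"x": x, "y": y}).corr()

print(r_pandas, r_numpy, rho_from_ranks)
print(corr_matrix)
```

The rank-based coefficient is less affected by the outlying value 96 than Pearson's r, because only the ordering of the values enters the calculation.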

In [ ]:
