Unit2 - Pandas - Jupyter Notebook

04/05/2023, 10:44 pandas - Jupyter Notebook

pandas
The name pandas is derived from the term panel data. It is a core Python library for data manipulation and data analysis.

It provides one-dimensional and multidimensional data structures for data manipulation.

pandas is a Python library used for working with data sets. Its main features:

high-performance data analysis tool

works with large data sets

represents data in tabular form

handles missing data

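Three data structures in pandas

1. Series: one-dimensional
2. DataFrame: two-dimensional
3. Panel: multidimensional (data, major axis, minor axis). Panel was removed in pandas 0.25; a DataFrame with a MultiIndex is the usual replacement.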
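The feature list above mentions missing data, which the cells below do not show. The following is a minimal sketch of how a Series might handle gaps; the readings and day labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing value.
readings = pd.Series([21.5, np.nan, 23.0, np.nan],
                     index=["mon", "tue", "wed", "thu"])

missing_mask = readings.isna()              # True where a value is missing
n_missing = int(missing_mask.sum())         # number of missing entries
filled = readings.fillna(readings.mean())   # fill gaps with the mean of the known values
dropped = readings.dropna()                 # or discard the gaps instead

print(n_missing)
print(filled)
print(dropped)
```

Whether to fill or drop depends on the data; both calls return a new Series and leave `readings` unchanged.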

Create a pandas Series


In [1]:

import pandas as pd
import numpy as np

In [2]:

arr = np.array([1,2,3,4])
print(arr)

[1 2 3 4]

In [3]:

s = pd.Series(arr)
print(s)
print(type(s))

0 1
1 2
2 3
3 4
dtype: int64
<class 'pandas.core.series.Series'>

localhost:8888/notebooks/anaconda3/Python/pandas.ipynb 1/10

In [4]:

print(s[0:5])

0 1
1 2
2 3
3 4
dtype: int64

In [5]:

# a must be defined before it is indexed; running a[2] first raised
# NameError: name 'a' is not defined
a = pd.Series(['a','b','c'])
a[2]

In [ ]:

a = pd.date_range(start = '2023-03-01', end = '2023-03-28')

In [ ]:

type(a)
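The cell above only asks for the type of the `date_range` result. As a small sketch of what comes back, and of one way it could be used as an index (the Series `by_day` is a made-up example):

```python
import pandas as pd

# Daily frequency is the default; both endpoints are included.
days = pd.date_range(start="2023-03-01", end="2023-03-28")

print(type(days))   # a DatetimeIndex, not a Series
print(len(days))

# Hypothetical use: a Series labelled by date, looked up with a Timestamp.
by_day = pd.Series(range(len(days)), index=days)
print(by_day.loc[pd.Timestamp("2023-03-05")])
```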

pandas DataFrame

In [ ]:

arr = np.array([[1,2,3],[4,5,6]])
print(arr)


In [ ]:

df = pd.DataFrame(arr)
print(df)

In [ ]:

temp = np.random.randint(low = 20, high =100, size = [20,])


name = np.random.choice(['Abhay','Teclov','Geekshub','Ankit'],20)
random = np.random.choice([10,11,13,12,14],20)

In [ ]:

df = pd.DataFrame({"Temp":temp,"Name":name,"Random":random})
df

In [ ]:

a = list(zip(temp, name, random))


print(a)

In [ ]:

df = pd.DataFrame(data = a, columns=['Temp','Name','Random'])

In [ ]:

df

In [ ]:

type(df)
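The cells above build the same table twice: once from a dict of columns and once from a list of row tuples. A shortened sketch of that comparison follows; the fixed seed and the use of `np.random.default_rng` are assumptions made here so the result is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatable output
temp = rng.integers(low=20, high=100, size=5)
name = rng.choice(["Abhay", "Teclov", "Geekshub", "Ankit"], size=5)

# Column-wise: each dict key becomes a column.
from_dict = pd.DataFrame({"Temp": temp, "Name": name})

# Row-wise: each tuple becomes a row, so column names are given separately.
from_rows = pd.DataFrame(list(zip(temp, name)), columns=["Temp", "Name"])

print(from_dict.shape, from_rows.shape)
print(from_dict.values.tolist() == from_rows.values.tolist())
```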

In [ ]:

temp = np.random.randint(low = 20, high =100, size = [20,])


name = np.random.choice(['Abhay','Teclov','Geekshub','Ankit'],20)
random = np.random.choice([10,11,13,12,14],20)

In [ ]:

df = pd.DataFrame({'temp':temp, 'name':name, 'random':random})

In [ ]:

type(df)

In [ ]:

df.head()

In [ ]:

df.tail()


In [ ]:

df.shape

In [ ]:

df.columns

In [ ]:

df.name

In [ ]:

df['name']

In [ ]:

df['temp'].describe()

In [ ]:

df.info()

In [ ]:

df.values

In [ ]:

df

In [ ]:

df.set_index('temp', inplace = True)

In [ ]:

df

In [ ]:

df.sort_index(axis =0, ascending=False)

In [ ]:

df.sort_values(by ='random', ascending = False)

In [ ]:

df.drop(['random'], axis =1)
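Unlike `set_index(..., inplace = True)`, the `sort_index`, `sort_values` and `drop` calls above return new objects and leave `df` as it was. A small sketch with a made-up three-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["a", "b", "c"], "random": [12, 10, 14]})

sorted_frame = frame.sort_values(by="random", ascending=False)  # new, sorted copy
trimmed = frame.drop(["random"], axis=1)                        # new copy without the column

print(list(frame.columns))              # the original still has both columns
print(list(trimmed.columns))
print(sorted_frame["random"].tolist())
print(frame["random"].tolist())         # original row order is unchanged
```

To keep a result, assign it back, for example `frame = frame.drop(["random"], axis=1)`.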


In [ ]:

df.head()

In [ ]:

df.iloc[[0,1]]

In [ ]:

df.iloc[1:3,1]

In [ ]:

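# a boolean mask for iloc needs one entry per row, so pad the three flags with False
df.iloc[[True, True, False] + [False] * (len(df) - 3)]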

In [ ]:

df.head()

In [ ]:

df.loc[:,:]

In [ ]:

# 39, 84 and 34 must be labels present in the temp index; the values change with each random draw
df.loc[[39,84,34]]

In [ ]:

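# 39 and 84 must exist in the temp index; 'name':'random' is a label slice that includes both end columns
df.loc[[39,84],'name':'random']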

In [ ]:

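# a boolean mask for loc needs one entry per row, so pad the four flags with False
df.loc[[True, True, False, True] + [False] * (len(df) - 4)]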

In [ ]:

df.loc[df.random > 13]

In [ ]:

df.loc[(df.random > 13) | (df.random == 10),:]
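Because `df` above is built from random numbers, the row labels differ on every run. The sketch below repeats the `iloc` and `loc` selections on a small fixed frame; its names and index values are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {"name": ["Abhay", "Teclov", "Ankit", "Geekshub"], "random": [10, 14, 12, 13]},
    index=[39, 84, 34, 51],  # hypothetical temp values used as row labels
)

by_position = frame.iloc[[0, 1]]     # first two rows, whatever their labels are
by_label = frame.loc[[39, 84]]       # rows whose index labels are 39 and 84
mask = [True, True, False, True]     # one boolean per row (4 rows here)
by_mask = frame.loc[mask]

# Conditions are combined with | (or) and & (and), each wrapped in parentheses.
filtered = frame.loc[(frame["random"] > 13) | (frame["random"] == 10), :]

print(by_position)
print(by_mask)
print(filtered)
```

Here `iloc[[0, 1]]` and `loc[[39, 84]]` pick the same rows only because 39 and 84 happen to be the first two labels.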

In [ ]:

# Merging & concat


d1 = pd.DataFrame([['a', 1], ['b', 2]],columns=['col1', 'number'])
d2 = pd.DataFrame([['c', 3, 'lion'], ['d', 4, 'tiger']],columns=['letter', 'numbe

In [ ]:

d1


In [ ]:

d2

In [ ]:

pd.concat([d1,d2],axis =0)

In [ ]:

pd.concat([d1,d2], axis =0, ignore_index=True)

In [ ]:

pd.concat([d1,d2], axis = 1)
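The three `concat` calls above differ in how rows and columns are lined up. Here is a sketch on two small frames; the column names, including `animal`, are chosen for this example and are not taken from the cells above.

```python
import pandas as pd

left = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
right = pd.DataFrame([["c", 3, "lion"], ["d", 4, "tiger"]],
                     columns=["letter", "number", "animal"])  # "animal" is a made-up name

stacked = pd.concat([left, right], axis=0)                        # row labels repeat: 0, 1, 0, 1
renumbered = pd.concat([left, right], axis=0, ignore_index=True)  # row labels become 0..3
side_by_side = pd.concat([left, right], axis=1)                   # aligned on row labels

print(stacked)
print(renumbered)
print(side_by_side)
```

Columns that exist in only one input are filled with missing values for the rows of the other input.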

In [ ]:

d1 = pd.DataFrame({
"city" : ["lucknow","kanpur","agra","delhi"],
"temperature" : [32,45,30,40]
})

In [ ]:

d1

In [ ]:

d2 = pd.DataFrame({
"city" : ["delhi","lucknow","kanpur"],
"humidity" : [68,65,75]
})

In [ ]:

d2

In [ ]:

df = pd.merge(d1,d2, on='city')

In [ ]:

df

In [ ]:

pd.merge(d1,d2, on=['city'], how ='outer')

In [ ]:

pd.merge(d1, d2, on =['city'], how='left')
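The `how` argument decides which cities survive the merge. This sketch uses the same city data as above under different variable names; the `indicator=True` column is an addition made here to show where each row came from.

```python
import pandas as pd

temps = pd.DataFrame({"city": ["lucknow", "kanpur", "agra", "delhi"],
                      "temperature": [32, 45, 30, 40]})
humid = pd.DataFrame({"city": ["delhi", "lucknow", "kanpur"],
                      "humidity": [68, 65, 75]})

inner = pd.merge(temps, humid, on="city")                # default how="inner": cities in both
outer = pd.merge(temps, humid, on="city", how="outer",
                 indicator=True)                         # all cities; _merge shows the source
left = pd.merge(temps, humid, on="city", how="left")     # every row of temps is kept

print(inner)
print(outer)
print(left)
```

"agra" has no humidity reading, so it is dropped by the inner merge and gets a missing humidity value in the outer and left merges.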


In [ ]:

# dataset from https://github.com/codebasics/py/blob/master/pandas/6_handling_miss

In [ ]:

df1 = pd.read_csv("weather_data.csv")

In [ ]:

df1

In [ ]:

# pip3 install openpyxl


df1.to_excel('df_xl.xlsx', sheet_name = 'weather_data')

In [ ]:

# pip3 install openpyxl   (xlrd 2.0 and later reads only .xls files, not .xlsx)


df2 = pd.read_excel('df_xl.xlsx')

In [ ]:

df2

In [ ]:

df2.to_csv('file.csv')

In [ ]:

df2.to_csv('file_noindex.csv', index = False)
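The two `to_csv` calls above differ only in whether the row index is written. A sketch of the effect when the file is read back; it uses made-up rows and an in-memory buffer instead of files on disk.

```python
import io
import pandas as pd

frame = pd.DataFrame({"day": ["1/1/2017", "1/2/2017"],
                      "temperature": [32, 35]})  # made-up rows

with_index = frame.to_csv()                # no path given, so the CSV text is returned
without_index = frame.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()

print(cols_with)      # the saved index comes back as an extra unnamed column
print(cols_without)
```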

In [ ]:

df_group = df2.groupby("event")

In [ ]:

df_group

In [ ]:

# iterating over a GroupBy yields (group key, sub-DataFrame) pairs
for event, group in df_group:
    print(event)
    print(group)

In [ ]:

df_group.get_group('Rain')

In [ ]:

df_group.describe()
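The grouping cells above depend on `weather_data.csv`. The same steps can be sketched on a small stand-in frame; the rows below are made up and are not the contents of that file.

```python
import pandas as pd

weather = pd.DataFrame({
    "event": ["Rain", "Sunny", "Rain", "Snow"],
    "temperature": [32, 35, 28, 24],
})

groups = weather.groupby("event")

# Each iteration gives the group key and the rows belonging to it.
for event, group in groups:
    print(event, group["temperature"].tolist())

rain = groups.get_group("Rain")            # only the rows where event == "Rain"
mean_temp = groups["temperature"].mean()   # one aggregated value per group

print(rain)
print(mean_temp)
```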


In [ ]:

def hot_temp(x):
    return x > 30

In [ ]:

df2['hot_temp'] = df2['temperature'].apply(hot_temp)

In [ ]:

df2

In [ ]:

df2['hot_temp'] = df2['temperature'].apply(lambda x: x > 30)
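For a simple threshold like this, a comparison on the whole column should give the same result as `apply` without calling a Python function per row. A sketch with made-up temperatures:

```python
import pandas as pd

weather = pd.DataFrame({"temperature": [32, 35, 28, 24]})  # made-up temperatures

via_apply = weather["temperature"].apply(lambda x: x > 30)  # element by element
vectorized = weather["temperature"] > 30                    # whole column at once

print(via_apply.tolist())
print(via_apply.equals(vectorized))
```

`apply` remains useful when the per-row logic cannot be written as a column expression.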

In [ ]:

df2

In [ ]:

#pivot table

In [ ]:

df2.pivot_table(values = 'temperature', index = 'event', aggfunc = 'mean')

In [ ]:

df2.pivot_table(columns = 'temperature')

In [ ]:

help(pd.DataFrame.pivot_table)
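As a sketch of what `values`, `index`, `columns` and `aggfunc` do in `pivot_table`, here is a small stand-in frame. The `city` column and all rows are made up for illustration.

```python
import pandas as pd

weather = pd.DataFrame({
    "city": ["delhi", "delhi", "agra", "agra"],
    "event": ["Rain", "Sunny", "Rain", "Rain"],
    "temperature": [32, 35, 28, 24],
})

# One row per event, holding the mean temperature for that event.
by_event = weather.pivot_table(values="temperature", index="event", aggfunc="mean")

# Rows are cities, columns are events; combinations with no data are missing.
by_city_event = weather.pivot_table(values="temperature", index="city",
                                    columns="event", aggfunc="mean")

print(by_event)
print(by_city_event)
```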

In [ ]:

# df3 is not defined earlier in this notebook; assign a DataFrame to df3 before running this cell
df3.to_csv("/home/apiiit-rkv/Desktop/dsp unit-3")

In [ ]:

import pandas as pd

In [ ]:

# read_excel already returns a DataFrame, so no extra pd.DataFrame(...) call is needed
df = pd.read_excel("/home/apiiit-rkv/Desktop/marks.xlsx")
df


In [ ]:

# Correlation
Correlation coefficients quantify the association between variables or features of a dataset.
These statistics are of high importance for science and technology, and Python has
tools that you can use to calculate them. SciPy, NumPy, and pandas correlation methods are
fast, comprehensive, and well-documented.

Topics:

What Pearson, Spearman, and Kendall correlation coefficients are
How to use SciPy, NumPy, and pandas correlation functions
How to visualize data, regression lines, and correlation matrices with Matplotlib

1. Negative correlation (red dots): In the plot on the left, the y values tend to decrease
as the x values increase. This shows strong negative correlation, which occurs when
large values of one feature correspond to small values of the other, and vice versa.

2. Weak or no correlation (green dots): The plot in the middle shows no obvious
trend. This is a form of weak correlation, which occurs when an association
between two features is not obvious or is hardly observable.

3. Positive correlation (blue dots): In the plot on the right, the y values tend
to increase as the x values increase. This illustrates strong positive
correlation, which occurs when large values of one feature correspond to
large values of the other, and vice versa.
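The three cases described above can be sketched numerically. The series below are constructed for illustration; they are not the data behind the plots.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(10, dtype=float))

falling = 20 - 2 * x                  # decreases steadily as x grows
rising = 3 * x + 1                    # increases steadily as x grows
zigzag = pd.Series([1.0, -1.0] * 5)   # alternates, with no steady trend

r_neg = x.corr(falling)   # Pearson's r for an exact falling line
r_pos = x.corr(rising)    # Pearson's r for an exact rising line
r_weak = x.corr(zigzag)   # close to zero

print(r_neg, r_pos, r_weak)
```

Exact straight lines give the extreme values of r; real data usually falls somewhere in between.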

In [ ]:

import pandas as pd
x = pd.Series(range(10, 20))
x

In [ ]:

y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])


y

In [ ]:

x.corr(y) # Pearson's r

In [ ]:

y.corr(x)

In [ ]:

x.corr(y, method='spearman') # Spearman's rho

In [ ]:

x.corr(y, method='kendall') # Kendall's tau
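To see what the rank-based coefficients measure, they can be computed by hand for the same `x` and `y`. This is a sketch of the textbook definitions and assumes no tied values, which holds for this data.

```python
from itertools import combinations

import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho: Pearson's r applied to the ranks instead of the raw values.
rho = x.rank().corr(y.rank())

# Kendall's tau (no ties): concordant minus discordant pairs, over all pairs.
concordant = 0
discordant = 0
for i, j in combinations(range(len(x)), 2):
    direction = (x[j] - x[i]) * (y[j] - y[i])
    if direction > 0:
        concordant += 1      # both values move in the same direction
    elif direction < 0:
        discordant += 1      # the values move in opposite directions

tau = (concordant - discordant) / (concordant + discordant)

print(rho, tau)
```

Only two of the 45 pairs are out of order (2 before 1, and 96 before 48), which is why both coefficients are close to 1 even though `y` is far from a straight line.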
