Unit2 - Pandas - Jupyter Notebook
pandas
pandas takes its name from "panel data" and is the core Python library for data manipulation and data analysis.
Three data structures in pandas: Series (1-D), DataFrame (2-D) and Panel (3-D; removed in pandas 0.25).
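Since Panel is no longer part of current pandas, only the two remaining structures are shown in this minimal sketch. The values and column names are made up for illustration; the point is that a Series is one-dimensional and a DataFrame is two-dimensional.

```python
import pandas as pd

# A Series is a one-dimensional labelled array (hypothetical values)
s = pd.Series([10, 20, 30])

# A DataFrame is a two-dimensional table of labelled columns (hypothetical values)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(s.ndim, df.ndim)   # 1 2
print(df.shape)          # (2, 2)
```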
import pandas as pd
import numpy as np
In [2]:
arr = np.array([1,2,3,4])
print(arr)
[1 2 3 4]
In [3]:
s = pd.Series(arr)
print(s)
print(type(s))
0 1
1 2
2 3
3 4
dtype: int64
<class 'pandas.core.series.Series'>
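The Series above gets the default integer index 0..3. A sketch of the same values with explicit labels (the letters "a" to "d" are arbitrary choices) shows the difference between label-based and position-based access:

```python
import pandas as pd

# Same values as above, but with explicit string labels instead of 0..n-1
s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

print(s["b"])            # label-based lookup -> 2
print(s.iloc[0])         # position-based lookup -> 1
print(s.index.tolist())  # ['a', 'b', 'c', 'd']
```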
localhost:8888/notebooks/anaconda3/Python/pandas.ipynb 1/10
04/05/2023, 10:44 pandas - Jupyter Notebook
In [4]:
print(s[0:5])
0 1
1 2
2 3
3 4
dtype: int64
In [5]:
a[2]
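---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_8943/4164697690.py in <module>
----> 1 a[2]
The NameError occurs because a is only created in the next cell (a = pd.Series(...)); notebook cells must be run in order.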
In [ ]:
a = pd.Series(['a','b','c'])
In [ ]:
In [ ]:
In [ ]:
In [ ]:
type(a)
pandas DataFrame
In [ ]:
arr = np.array([[1,2,3],[4,5,6]])
print(arr)
In [ ]:
df = pd.DataFrame(arr)
print(df)
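Built from a bare 2-D array, the DataFrame gets numeric row and column labels. A small sketch, with hypothetical labels x/y/z and r1/r2, shows how to name them at construction time:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Hypothetical column and row labels, chosen only for illustration
df = pd.DataFrame(arr, columns=["x", "y", "z"], index=["r1", "r2"])

print(df.shape)           # (2, 3)
print(df.loc["r2", "z"])  # 6
```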
In [ ]:
In [ ]:
df = pd.DataFrame({"Temp":temp,"Name":name,"Random":random})
df
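The lists temp, name and random used in the cell above are defined in cells not shown in this printout. The sketch below assumes three short hypothetical lists to show how a dict of equal-length lists becomes a DataFrame; random_vals is used instead of random so the name does not shadow Python's random module.

```python
import pandas as pd

# Hypothetical data: the notebook's own temp/name/random lists are not shown
temp = [30, 35, 28]
name = ["A", "B", "C"]
random_vals = [0.5, 0.1, 0.9]   # avoids shadowing Python's random module

# Each dictionary key becomes a column; all lists must have the same length
df = pd.DataFrame({"Temp": temp, "Name": name, "Random": random_vals})

print(df.columns.tolist())  # ['Temp', 'Name', 'Random']
print(len(df))              # 3
```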
In [ ]:
In [ ]:
df = pd.DataFrame(data = a, columns=['Temp','Name','Random'])
In [ ]:
df
In [ ]:
type(df)
In [ ]:
In [ ]:
In [ ]:
type(df)
In [ ]:
df.head()
In [ ]:
df.tail()
In [ ]:
df.shape
In [ ]:
df.columns
In [ ]:
df.name
In [ ]:
df['name']
In [ ]:
df['temp'].describe()
In [ ]:
df.info()
In [ ]:
df.values
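The inspection calls above can be tried on a small hypothetical frame with lowercase column names. Bracket access such as df["name"] works for every column label, while attribute access such as df.name only works when the label is a valid identifier that does not clash with an existing DataFrame attribute. to_numpy() is the method the pandas documentation recommends over .values.

```python
import pandas as pd

# Hypothetical three-row table with lowercase column names
df = pd.DataFrame({
    "temp": [30, 35, 28],
    "name": ["A", "B", "C"],
    "random": [0.5, 0.1, 0.9],
})

print(df.shape)             # (rows, columns) -> (3, 3)
print(df.columns.tolist())  # ['temp', 'name', 'random']
print(df["name"].tolist())  # ['A', 'B', 'C']

stats = df["temp"].describe()          # count, mean, std, min, quartiles, max
print(stats["count"], stats["mean"])   # 3.0 31.0

print(df.to_numpy().shape)  # (3, 3)
```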
In [ ]:
df
In [ ]:
In [ ]:
df
In [ ]:
In [ ]:
In [ ]:
In [ ]:
df.head()
In [ ]:
df.iloc[[0,1]]
In [ ]:
df.iloc[1:3,1]
In [ ]:
df.iloc[[True,True,False]]
In [ ]:
df.head()
In [ ]:
df.loc[:,:]
In [ ]:
df.loc[[39,84,34]]
In [ ]:
df.loc[[39,84],'name':'random']
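The cells above mix iloc (positions) and loc (labels). The difference is easiest to see on a frame whose integer index labels are not 0, 1, 2. This sketch uses a hypothetical frame indexed 39, 84, 34, matching the labels used above; note that a label slice with loc includes its end column, while a positional slice with iloc excludes its end.

```python
import pandas as pd

# Hypothetical frame whose integer index labels are NOT 0, 1, 2
df = pd.DataFrame(
    {"name": ["A", "B", "C"], "temp": [30, 35, 28], "random": [0.5, 0.1, 0.9]},
    index=[39, 84, 34],
)

print(df.iloc[0]["name"])          # position 0 -> A
print(df.loc[84, "name"])          # label 84 -> B
print(df.iloc[1:3, 1].tolist())    # rows at positions 1-2, column 1 ('temp'), end excluded -> [35, 28]
print(df.loc[[39, 84], "name":"random"].shape)      # label slice includes 'random' -> (2, 3)
print(df.iloc[[True, False, True]].index.tolist())  # one boolean per row -> [39, 34]
```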
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
d1
In [ ]:
d2
In [ ]:
pd.concat([d1,d2],axis =0)
In [ ]:
In [ ]:
pd.concat([d1,d2], axis = 1)
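The two concat calls differ only in axis. A sketch with smaller, hypothetical versions of d1 and d2 shows the resulting shapes; ignore_index=True is an extra choice here that renumbers the stacked rows 0..n-1.

```python
import pandas as pd

# Hypothetical, trimmed-down versions of d1 and d2
d1 = pd.DataFrame({"city": ["lucknow", "kanpur"], "temperature": [32, 45]})
d2 = pd.DataFrame({"city": ["delhi"], "humidity": [68]})

# axis=0 stacks rows; a column missing from one frame is filled with NaN
rows = pd.concat([d1, d2], axis=0, ignore_index=True)
print(rows.shape)  # (3, 3)

# axis=1 places frames side by side, aligning on the row index
cols = pd.concat([d1, d2], axis=1)
print(cols.shape)  # (2, 4)
```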
In [ ]:
d1 = pd.DataFrame({
"city" : ["lucknow","kanpur","agra","delhi"],
"temperature" : [32,45,30,40]
})
In [ ]:
d1
In [ ]:
d2 = pd.DataFrame({
"city" : ["delhi","lucknow","kanpur"],
"humidity" : [68,65,75]
})
In [ ]:
d2
In [ ]:
df = pd.merge(d1,d2, on='city')
In [ ]:
df
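pd.merge defaults to an inner join, so agra (present only in d1) is dropped from the result above. Using the same d1 and d2 values, this sketch compares the default with how="outer", which keeps every city and fills the missing humidity with NaN:

```python
import pandas as pd

d1 = pd.DataFrame({
    "city": ["lucknow", "kanpur", "agra", "delhi"],
    "temperature": [32, 45, 30, 40],
})
d2 = pd.DataFrame({
    "city": ["delhi", "lucknow", "kanpur"],
    "humidity": [68, 65, 75],
})

# Default how="inner": only cities present in both frames survive
inner = pd.merge(d1, d2, on="city")
print(sorted(inner["city"]))  # ['delhi', 'kanpur', 'lucknow']

# how="outer" keeps every city; agra has no humidity value
outer = pd.merge(d1, d2, on="city", how="outer")
print(len(outer))                                                   # 4
print(outer.loc[outer["city"] == "agra", "humidity"].isna().all())  # True
```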
In [ ]:
In [ ]:
In [ ]:
In [ ]:
df1 = pd.read_csv("weather_data.csv")
In [ ]:
df1
In [ ]:
In [ ]:
In [ ]:
df2
In [ ]:
df2.to_csv('file.csv')
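By default to_csv also writes the row index, which comes back as an extra "Unnamed: 0" column when the file is read again. This sketch assumes a small hypothetical df2 and writes to an in-memory buffer (io.StringIO) instead of file.csv, so it leaves nothing on disk:

```python
import io
import pandas as pd

df2 = pd.DataFrame({"day": ["1/1/2017", "1/2/2017"], "temperature": [32, 35]})  # hypothetical data

buf = io.StringIO()
# index=False stops the row index from being written as an extra column
df2.to_csv(buf, index=False)

buf.seek(0)
back = pd.read_csv(buf)
print(back.columns.tolist())  # ['day', 'temperature']
print(back.equals(df2))       # True
```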
In [ ]:
In [ ]:
df_group = df2.groupby("event")
In [ ]:
df_group
In [ ]:
In [ ]:
df_group.get_group('Rain')
In [ ]:
df_group.describe()
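The notebook's df2 comes from a cell not shown in this printout, so the sketch below assumes a small hypothetical weather table with an event column. It repeats get_group and then computes one aggregate per group instead of the full describe() table.

```python
import pandas as pd

# Hypothetical weather table standing in for the notebook's df2
df2 = pd.DataFrame({
    "day": ["d1", "d2", "d3", "d4"],
    "temperature": [32, 28, 35, 24],
    "event": ["Rain", "Sunny", "Sunny", "Rain"],
})

df_group = df2.groupby("event")

rain = df_group.get_group("Rain")  # rows whose event is "Rain"
print(rain["day"].tolist())        # ['d1', 'd4']

# Mean temperature per event: Rain -> 28.0, Sunny -> 31.5
print(df_group["temperature"].mean().to_dict())
```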
In [ ]:
def hot_temp(x):
    return x > 30
In [ ]:
df2['hot_temp'] = df2['temperature'].apply(hot_temp)
In [ ]:
df2
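apply() calls hot_temp once per element. For a simple comparison like this, the same result can be produced by one vectorized operation on the whole column, which is usually faster on large data. The temperatures below are hypothetical.

```python
import pandas as pd

df2 = pd.DataFrame({"temperature": [32, 28, 35, 24]})  # hypothetical values

def hot_temp(x):
    return x > 30

# Element-by-element, as in the notebook
via_apply = df2["temperature"].apply(hot_temp)

# The same comparison applied to the whole column at once
vectorized = df2["temperature"] > 30

print(via_apply.tolist())            # [True, False, True, False]
print(via_apply.equals(vectorized))  # True
```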
In [ ]:
In [ ]:
df2
In [ ]:
#pivot table
In [ ]:
In [ ]:
df2.pivot_table(columns = 'temperature')
In [ ]:
help(pd.DataFrame.pivot_table)
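The main pivot_table parameters listed by help() are index, columns, values and aggfunc. A sketch on hypothetical long-format data gives one row per city and one column per event, with the mean temperature in each cell (delhi has two Rain readings, 30 and 32, which are averaged):

```python
import pandas as pd

# Hypothetical long-format weather data
df2 = pd.DataFrame({
    "city": ["delhi", "delhi", "delhi", "agra", "agra"],
    "event": ["Rain", "Rain", "Sunny", "Rain", "Sunny"],
    "temperature": [30, 32, 40, 28, 36],
})

# Rows: city, columns: event, cells: mean temperature
table = df2.pivot_table(index="city", columns="event",
                        values="temperature", aggfunc="mean")
print(table)
```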
In [ ]:
df3.to_csv("/home/apiiit-rkv/Desktop/dsp unit-3")
In [ ]:
import pandas as pd
In [ ]:
# read_excel already returns a DataFrame, so wrapping it in pd.DataFrame() is unnecessary
df = pd.read_excel("/home/apiiit-rkv/Desktop/marks.xlsx")
df
In [ ]:
#correlation
Correlation coefficients quantify the association between variables or features of a dataset. These statistics are of high importance for science and technology, and Python has tools that you can use to calculate them. SciPy, NumPy, and pandas correlation methods are fast, comprehensive, and well-documented.
1. Negative correlation (red dots): In the plot on the left, the y values tend to decrease as the x values increase. This shows strong negative correlation, which occurs when large values of one feature correspond to small values of the other, and vice versa.
2. Weak or no correlation (green dots): The plot in the middle shows no obvious trend. This is a form of weak correlation, which occurs when an association between two features is not obvious or is hardly observable.
3. Positive correlation (blue dots): In the plot on the right, the y values tend to increase as the x values increase. This illustrates strong positive correlation, which occurs when large values of one feature correspond to large values of the other, and vice versa.
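The three cases above can be reproduced numerically with Pearson's r. The three y series below are hypothetical, constructed so that one falls with x, one rises with x, and one has no linear trend (its deviations from the mean cancel out against x's).

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(10, 20))

# Hypothetical series, one per case
y_neg = -2 * x + 5                                   # falls as x rises
y_pos = 3 * x + 1                                    # rises with x
y_none = pd.Series([1, 3, 2, 3, 1, 1, 3, 2, 3, 1])   # no clear trend

print(round(x.corr(y_neg), 2))   # -1.0 (strong negative)
print(round(x.corr(y_pos), 2))   # 1.0 (strong positive)
print(abs(x.corr(y_none)) < 1e-9)  # True (no linear association)
```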
In [ ]:
import pandas as pd
x = pd.Series(range(10, 20))
x
In [ ]:
In [ ]:
x.corr(y) # Pearson's r
In [ ]:
y.corr(x)
In [ ]:
In [ ]:
x.corr(y, method='kendall')
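The notebook's y is defined in a cell not shown here, so this sketch assumes a hypothetical y that always increases with x but along a curve. Pearson's r measures linear association, while Spearman's rho is Pearson's r computed on ranks and measures monotonic association. The rank-based calculation is done by hand here because Series.corr with method="spearman" or method="kendall" needs SciPy installed.

```python
import pandas as pd

x = pd.Series(range(10, 20))
# Hypothetical y: strictly increasing in x, but not along a straight line
y = pd.Series([1, 2, 4, 8, 16, 32, 64, 128, 256, 512])

pearson = x.corr(y)  # default method="pearson": linear association

# Spearman's rho = Pearson's r on the ranks of each series
spearman = x.rank().corr(y.rank())

print(round(spearman, 6))  # 1.0: the relationship is perfectly monotonic
print(pearson < 1)         # True: but it is not linear
```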
In [ ]: