12 Pandas
#list
myList = ["The", "earth", "revolves", "around", "sun"]
print(myList) #printing list
Output:
['The', 'earth', 'revolves', 'around', 'sun']
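import pandas as pd
names = ['Bob','Jessica','Mary','John','Mel']
births = [968, 155, 77, 578, 973]
BabyDataSet = list(zip(names,births))
print(BabyDataSet)
# Build a DataFrame from the zipped list (the column names here are assumed)
df = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
# Write the DataFrame to a CSV file
df.to_csv('demo.csv')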
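By default, to_csv also writes the row index as an unnamed first column. The following is a minimal sketch of writing without the index and reading the data back. It assumes the column names Names and Births, and it uses an in-memory buffer in place of demo.csv so that nothing is written to disk.

```python
import io
import pandas as pd

names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]

# Column names are illustrative assumptions, not taken from the original.
df = pd.DataFrame(list(zip(names, births)), columns=['Names', 'Births'])

# Write to an in-memory text buffer; index=False leaves out the row index.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back)
```

Passing a file name such as 'demo.csv' instead of the buffer to both calls would give the same round trip through a file.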
Output:
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
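The code that produced this output is not shown. The following sketch reproduces it, with the input values inferred from the printed table and the variable names chosen for illustration.

```python
import pandas as pd

# Two Series with different indexes; 'one' has no value for row 'd',
# so pandas fills that cell with NaN when the DataFrame is built.
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Adding a new column by passing a Series; values are aligned on the index.
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)

# Adding a new column computed from existing columns.
# NaN + NaN stays NaN, which is why row 'd' has no value in 'four'.
df['four'] = df['one'] + df['three']
print(df)
```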
MAX
import pandas as pd
import numpy as np
#Create a DataFrame
d = { 'Name':['Alisa','Bobby','Cathrine','Madonna','Rocky',
'Sebastian','Jaqluine', 'Rahul','David','Andrew','Ajay','Teresa'],
'Score1':[62,47,55,74,31,77,85,63,42,32,71,57],
'Score2':[89,87,67,55,47,72,76,79,44,92,99,69]}
df = pd.DataFrame(d)
df
# mean of the numeric columns (numeric_only=True skips the text column 'Name')
df.mean(numeric_only=True)
Output:
Score1 58.0
Score2 73.0
dtype: float64
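The maximum can be computed the same way as the mean. The following is a sketch on the same data; the helper variable names (col_max, top_score1, top_score2) are illustrative.

```python
import pandas as pd

d = {'Name': ['Alisa', 'Bobby', 'Cathrine', 'Madonna', 'Rocky', 'Sebastian',
              'Jaqluine', 'Rahul', 'David', 'Andrew', 'Ajay', 'Teresa'],
     'Score1': [62, 47, 55, 74, 31, 77, 85, 63, 42, 32, 71, 57],
     'Score2': [89, 87, 67, 55, 47, 72, 76, 79, 44, 92, 99, 69]}
df = pd.DataFrame(d)

# Column-wise maximum of the numeric columns.
col_max = df.max(numeric_only=True)
print(col_max)

# idxmax() returns the row label of the maximum, which can be used
# to look up the name that belongs to each top score.
top_score1 = df.loc[df['Score1'].idxmax(), 'Name']
top_score2 = df.loc[df['Score2'].idxmax(), 'Name']
print(top_score1, top_score2)
```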
Sorting:
import pandas as pd
d = {'one':[2,3,1,4,5], 'two':[5,4,3,2,1], 'letter':['a','a','b','b','c']}
df = pd.DataFrame(d)
# sort the rows by column 'one' in descending order
test = df.sort_values(['one'], ascending=[False])
print(test)
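sort_values also accepts several columns, with one sort direction per column. The following sketch uses the same data; the names by_one and by_letter_then_one are illustrative.

```python
import pandas as pd

d = {'one': [2, 3, 1, 4, 5], 'two': [5, 4, 3, 2, 1],
     'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)

# Single-column sort, descending, as in the example above.
by_one = df.sort_values(['one'], ascending=[False])

# Sort by 'letter' ascending, then by 'one' descending within each letter.
by_letter_then_one = df.sort_values(['letter', 'one'], ascending=[True, False])
print(by_letter_then_one)
```

sort_values returns a new DataFrame and leaves df unchanged; the original row labels are kept, so the index shows how the rows were reordered.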
import pandas as pd
import numpy as np
df1 = pd.read_csv('datasets/stackdatasetexample.csv')
print(df1)
# number of names recorded for each state
print(df1.groupby(["state"])[['name']].count())
Output:
       name
state
hp        1
hr        2
pb        3
# number of rows for each state, most frequent first
j = df1['state'].value_counts()
print(j)
Output:
pb    3
hr    2
hp    1
Name: state, dtype: int64
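The CSV file is not included here, so the following sketch uses a small hypothetical DataFrame in its place. The names are made up; only the per-state counts match the output above.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: six rows across three states.
df1 = pd.DataFrame({
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
    'state': ['pb', 'hr', 'pb', 'hp', 'hr', 'pb'],
})

# groupby + count: one row per state, ordered by the state label.
counts_by_group = df1.groupby(['state'])[['name']].count()
print(counts_by_group)

# value_counts: the same counts as a Series, ordered by frequency.
j = df1['state'].value_counts()
print(j)
```

The two results differ mainly in ordering and shape. count() also skips missing values in the counted column, while value_counts() counts the rows of the column it is called on.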
Drop duplicates and missing values
Duplicate data Missing data
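The following is a minimal sketch of both clean-up steps on a small made-up DataFrame. The data and variable names are illustrative and not taken from the original slides.

```python
import numpy as np
import pandas as pd

# Row 1 repeats row 0 exactly; row 2 has a missing score.
df = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Ben', 'Cara'],
    'score': [90.0, 90.0, np.nan, 75.0],
})

# Duplicate data: keep the first occurrence of each identical row.
deduped = df.drop_duplicates()

# Missing data: either drop rows that contain NaN ...
cleaned = deduped.dropna()

# ... or keep the rows and fill the gaps with a chosen value.
filled = deduped.fillna({'score': 0})

print(cleaned)
print(filled)
```

Whether to drop or fill depends on the analysis. Dropping loses rows, while filling changes statistics such as the mean.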
Joins
df_a
   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches
df_b
   subject_id first_name last_name
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan
df_new = pd.concat([df_a, df_b])
df_new
   subject_id first_name last_name
0           1       Ajay  Anderson
1           2       Abhi  Ackerman
2           3       Aman       Ali
3           4        Avi      Aoni
4           5       Aksh   Atiches
0           4      Billy    Bonder
1           5       Navi     Black
2           6      Swati   Balwner
3           7    Shivali     Brice
4           8      Kamal    Btisan
pd.concat([df_a, df_b], axis=1)
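The following sketch builds the two tables and runs both concat calls. The names stacked, renumbered, side_by_side and merged are illustrative. The pd.merge call at the end is an addition that is not in the original; it shows a key-based join on subject_id for comparison.

```python
import pandas as pd

df_a = pd.DataFrame({
    'subject_id': [1, 2, 3, 4, 5],
    'first_name': ['Ajay', 'Abhi', 'Aman', 'Avi', 'Aksh'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches'],
})
df_b = pd.DataFrame({
    'subject_id': [4, 5, 6, 7, 8],
    'first_name': ['Billy', 'Navi', 'Swati', 'Shivali', 'Kamal'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan'],
})

# Stack the rows of df_b under df_a; the original row labels 0-4 repeat.
stacked = pd.concat([df_a, df_b])

# Same stacking, but with a fresh 0..9 index.
renumbered = pd.concat([df_a, df_b], ignore_index=True)

# Place the tables side by side, matching rows on the index.
side_by_side = pd.concat([df_a, df_b], axis=1)

# Key-based join: keep only the subject_id values present in both tables.
merged = pd.merge(df_a, df_b, on='subject_id', how='inner',
                  suffixes=('_a', '_b'))

print(stacked)
print(side_by_side)
print(merged)
```

concat only glues the tables together along an axis, while merge matches rows by the values in a key column, which is closer to a database join.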