Pandas Doc 1681445742
4. What is a Series?
5. Labels
7. What is a DataFrame?
localhost:8888/notebooks/Downloads/Pandas_doc.ipynb# 1/47
4/13/23, 9:16 PM Pandas_doc - Jupyter Notebook
9. Data Cleaning
15. Basic operations
1) What is Pandas (Python pandas)?
Ans: Pandas is an open-source library that provides high-performance data manipulation in Python. The name Pandas is derived from the term "Panel Data", an econometrics term for multidimensional data sets. It is used for data analysis in Python and was developed by Wes McKinney in 2008. It can perform the five significant steps required for processing and analyzing data, irrespective of where the data comes from: load, manipulate, prepare, model, and analyze.
Pandas can clean messy data sets and make them readable and relevant.
Pandas also answers questions about the data, such as: Is there a correlation between two or more columns? What is the average value? The maximum value? The minimum value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
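A minimal sketch of how the average, maximum and minimum questions map to pandas calls, using a small hypothetical DataFrame (the column names and values are illustrative, not from this notebook's datasets):

```python
import pandas as pd

# Hypothetical workout data, for illustration only
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

print(df["calories"].mean())  # average value
print(df["calories"].max())   # maximum value
print(df["calories"].min())   # minimum value
```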
Installation of Pandas
If you have Python and PIP already installed on a system, installing Pandas is easy:
pip install pandas
If this command fails, use a Python distribution that already includes Pandas, such as Anaconda (which also ships the Spyder IDE).
Import Pandas
Once Pandas is installed, import it into your applications with the import keyword, conventionally under the alias pd: import pandas as pd
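To confirm which version is installed, print the package's __version__ attribute (the value will differ from machine to machine):

```python
import pandas as pd  # "pd" is the conventional alias

print(pd.__version__)  # this notebook printed 1.5.3
```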
Pandas version used in this notebook (pd.__version__): 1.5.3
What is a Series?
A Pandas Series is like a column in a table: a one-dimensional array that can hold data of any type.
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
Labels
If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on. The label can be used to access a specific value.
Out[7]: 2
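The cells that created the Series and produced Out[7] aren't shown. A minimal sketch that reproduces both outputs, assuming the Series was built from a plain Python list:

```python
import pandas as pd

a = [1, 2, 3, 4, 5, 6]
myvar = pd.Series(a)  # no index given, so the labels default to 0..5
print(myvar)
print(myvar[1])       # label 1 is the second value: 2
```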
Create Labels
With the index argument, you can name your own labels.
Student Himanshu
Cricketer Virat
wrestler Roman
dtype: object
In [9]: # When you have created labels, you can access an item by referring to the label
In [11]: print(myvar['Cricketer'])
Virat
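The cell that built this labeled Series isn't shown; a sketch that rebuilds it from the printed values by passing the index argument:

```python
import pandas as pd

# Values and labels taken from the output above
myvar = pd.Series(["Himanshu", "Virat", "Roman"],
                  index=["Student", "Cricketer", "wrestler"])
print(myvar)
print(myvar["Cricketer"])  # access an item by its label
```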
What is a DataFrame?
A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns.
import pandas as pd
data = {"calories": [429, 454, 232], "Duration": [50, 40, 30]}
# load data into a pandas DataFrame
df = pd.DataFrame(data)
print(df)
calories Duration
0 429 50
1 454 40
2 232 30
Locate Row
As you can see from the result above, the DataFrame is like a table with rows and columns.
Pandas uses the loc attribute to return one or more specified rows.
calories 429
Duration 50
Name: 0, dtype: int64
calories Duration
0 429 50
1 454 40
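The cells that printed these two results aren't shown. A sketch that reproduces them, rebuilding the DataFrame from the values printed earlier: a single label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"calories": [429, 454, 232], "Duration": [50, 40, 30]})

# A single label returns that row as a Series
print(df.loc[0])

# A list of labels returns a DataFrame
print(df.loc[[0, 1]])
```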
Named Indexes
With the index argument, you can name your own indexes.
calories duration
day1 478 40
day2 453 50
day3 423 30
In [16]: print(df.loc["day1"])
calories 478
duration 40
Name: day1, dtype: int64
In [17]: print(df.loc["day2"])
calories 453
duration 50
Name: day2, dtype: int64
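The DataFrame with named indexes can be rebuilt from its printed values by passing index= to the constructor; a sketch (the notebook's own cell isn't shown):

```python
import pandas as pd

data = {"calories": [478, 453, 423], "duration": [40, 50, 30]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)
print(df.loc["day2"])  # select a row by its named index
```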
In [19]: df
Out[19]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
496 722565148 Sexy Low 4.3 free Summer o-neck full empire
499 919930954 Casual Low 4.4 free Summer v-neck short empire
In [21]: type(df)
Out[21]: pandas.core.frame.DataFrame
Out[22]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline M
Out[23]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline Mate
495 713391965 Casual Low 4.7 M Spring o-neck full natural poly
496 722565148 Sexy Low 4.3 free Summer o-neck full empire co
499 919930954 Casual Low 4.4 free Summer v-neck short empire co
In [25]: df1
Out[25]: 30 64 1 1.1
0 30 62 3 1
1 30 65 0 1
2 31 59 2 1
3 31 65 4 1
4 33 58 10 1
300 75 62 1 1
301 76 67 0 1
302 77 65 3 1
303 78 65 1 2
304 83 58 2 2
In [27]: df1
Out[27]: 0 1 2 3
0 30 64 1 1
1 30 62 3 1
2 30 65 0 1
3 31 59 2 1
4 31 65 4 1
301 75 62 1 1
302 76 67 0 1
303 77 65 3 1
304 78 65 1 2
305 83 58 2 2
In [29]: df1
Out[29]: Age of patient at time of operation | Patient's year of operation | Number of positive axillary nodes detected | Survival status | Age
0 30 64 1 1 NaN
1 30 62 3 1 NaN
2 30 65 0 1 NaN
3 31 59 2 1 NaN
4 31 65 4 1 NaN
301 75 62 1 1 NaN
302 76 67 0 1 NaN
303 77 65 3 1 NaN
304 78 65 1 2 NaN
305 83 58 2 2 NaN
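The file name and the exact read_csv calls aren't shown, but the three outputs above differ only in how the header row is handled. A sketch using the first rows of the data as an in-memory CSV (io.StringIO stands in for the real file):

```python
import io
import pandas as pd

# First rows of the file, as seen in the outputs above
csv_text = "30,64,1,1\n30,62,3,1\n30,65,0,1\n"

# Default: the first line becomes the header; the duplicate "1" is renamed "1.1"
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and columns are numbered 0..3
df_none = pd.read_csv(io.StringIO(csv_text), header=None)

# names=...: supply your own column names (the first line stays data)
cols = ["Age of patient at time of operation", "Patient's year of operation",
        "Number of positive axillary nodes detected", "Survival status"]
df_named = pd.read_csv(io.StringIO(csv_text), names=cols)
```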
In [31]: df2
Out[31]: Year Lag1 Lag2 Lag3 Lag4 Lag5 Volume Today Direction
... ... ... ... ... ... ... ... ... ...
1246 2005 0.043 0.422 0.252 -0.024 -0.584 1.28581 -0.955 Down
1248 2005 0.130 -0.955 0.043 0.422 0.252 1.42236 -0.298 Down
1249 2005 -0.298 0.130 -0.955 0.043 0.422 1.38254 -0.489 Down
In [33]: df3
In [35]: df4
In [36]: type(df4)
Out[36]: list
In [37]: len(df4)
Out[37]: 1
In [38]: df4[0]
Out[38]: Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST
0 1 Quincy Acy PF 24 NYK 68 22 1287 152 331 ... .784 79 222 301 68
1 2 Jordan Adams SG 20 MEM 30 0 248 35 86 ... .609 9 19 28 16
2 3 Steven Adams C 21 OKC 70 67 1771 217 399 ... .502 199 324 523 66
3 4 Jeff Adrien PF 28 MIN 17 0 215 19 44 ... .579 23 54 77 15
4 5 Arron Afflalo SG 29 TOT 78 72 2502 375 884 ... .843 27 220 247 129
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
670 490 Thaddeus Young PF 26 TOT 76 68 2434 451 968 ... .655 127 284 411 173
671 490 Thaddeus Young PF 26 MIN 48 48 1605 289 641 ... .682 75 170 245 135
672 490 Thaddeus Young PF 26 BRK 28 20 829 162 327 ... .606 52 114 166 38
673 491 Cody Zeller C 22 CHO 62 45 1487 172 373 ... .774 97 265 362 100
674 492 Tyler Zeller C 25 BOS 82 59 1731 340 619 ... .823 146 319 465 113
JSON data
In [40]: df5
30 rows × 30 columns
In [41]: js = pd.read_json('https://api.github.com/repos/pandas-dev/pandas/issues')
In [42]: js.columns
In [43]: js['user'][0]
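The GitHub issues endpoint returns nested JSON, so js['user'][0] holds a whole user object. Because the live API response changes over time, here is an offline sketch with hypothetical records shaped like issues; pd.json_normalize is one way (not necessarily the notebook's) to flatten such nested fields into columns:

```python
import io
import json
import pandas as pd

# Hypothetical issue records; "user" is a nested object, as in the GitHub API
raw = """[
  {"number": 1, "title": "Bug A", "user": {"login": "alice", "id": 10}},
  {"number": 2, "title": "Bug B", "user": {"login": "bob", "id": 20}}
]"""

js = pd.read_json(io.StringIO(raw))
print(js.columns.tolist())
print(js["user"][0])  # the nested object stays a Python dict

# Flatten nested keys into dotted column names such as "user.login"
flat = pd.json_normalize(json.loads(raw))
print(flat[["number", "user.login"]])
```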
In [45]: df
Out[45]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
496 722565148 Sexy Low 4.3 free Summer o-neck full empire
499 919930954 Casual Low 4.4 free Summer v-neck short empire
Data Cleaning
Data cleaning means fixing bad data in your data set. Bad data can be:
- Empty cells
- Wrong data
- Duplicates
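A minimal sketch of handling wrong data and empty cells on a hypothetical duration column. The 120-minute cap is an assumed rule for spotting wrong values, not something the dataset defines; filling with the mean is one option, and dropping the rows is another:

```python
import pandas as pd

# Hypothetical data: one empty cell (None) and one implausible value (450)
df = pd.DataFrame({"duration": [60, None, 45, 450]})

# Wrong data: cap values above an assumed plausible maximum of 120
df.loc[df["duration"] > 120, "duration"] = 120

# Empty cells: replace NaN with the mean of the remaining values
df["duration"] = df["duration"].fillna(df["duration"].mean())
print(df)
```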
In [48]: df
Out[48]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
496 722565148 Sexy Low 4.3 free Summer o-neck full empire
499 919930954 Casual Low 4.4 free Summer v-neck short empire
In [49]: df.head()
Out[49]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline M
In [50]: df.tail()
Out[50]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline Mate
495 713391965 Casual Low 4.7 M Spring o-neck full natural poly
496 722565148 Sexy Low 4.3 free Summer o-neck full empire co
499 919930954 Casual Low 4.4 free Summer v-neck short empire co
Out[52]: 0 1006032852
1 1212192089
2 1190380701
3 966005983
4 876339541
...
495 713391965
496 722565148
497 532874347
498 655464934
499 919930954
Name: Dress_ID, Length: 500, dtype: int64
Out[53]: pandas.core.series.Series
In [54]: df.dtypes
In [55]: df[['Dress_ID']]
Out[55]: Dress_ID
0 1006032852
1 1212192089
2 1190380701
3 966005983
4 876339541
... ...
495 713391965
496 722565148
497 532874347
498 655464934
499 919930954
In [56]: type(df[['Dress_ID']])
Out[56]: pandas.core.frame.DataFrame
In [57]: df['Dress_ID']
Out[57]: 0 1006032852
1 1212192089
2 1190380701
3 966005983
4 876339541
...
495 713391965
496 722565148
497 532874347
498 655464934
499 919930954
Name: Dress_ID, Length: 500, dtype: int64
In [58]: type(df['Dress_ID'])
Out[58]: pandas.core.series.Series
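The two selections above differ only in the brackets: a single column name gives a Series, a list of names gives a DataFrame. A sketch on a stand-in DataFrame (the Style values are hypothetical):

```python
import pandas as pd

# Dress_ID values from the output above; the Style values are hypothetical
df = pd.DataFrame({"Dress_ID": [1006032852, 1212192089], "Style": ["Sexy", "Casual"]})

one = df["Dress_ID"]     # a single column name -> Series
many = df[["Dress_ID"]]  # a list of names -> DataFrame, even with one column
print(type(one).__name__, type(many).__name__)  # Series DataFrame
```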
In [59]: df.columns
In [60]: df[['Dress_ID','Style','Price','Material']]
In [61]: df.describe().T
In [62]: df.dtypes
In [65]: # list all the columns whose dtype is object
df.dtypes[df.dtypes =='object'].index
In [66]: # now select all of these object-dtype columns as a DataFrame
df[df.dtypes[df.dtypes =='object'].index]
Out[66]:
Style Price Size Season NeckLine SleeveLength waiseline Material FabricType
... ... ... ... ... ... ... ... ... ...
496 Sexy Low free Summer o-neck full empire cotton NaN
499 Casual Low free Summer v-neck short empire cotton Corduroy
Out[67]:
Style Price Size Season NeckLine SleeveLength waiseline Material FabricType
count 500 498 500 498 497 498 413 372 234
unique 13 7 7 8 16 17 4 23 22
freq 232 252 177 159 271 223 304 152 135
Out[68]: pandas.core.series.Series
In [69]: df.dtypes
In [70]: ## list all the columns whose dtype is float
df.dtypes[df.dtypes == 'float'].index
Out[71]: Rating
count 500.000000
mean 3.528600
std 2.005364
min 0.000000
25% 3.700000
50% 4.600000
75% 4.800000
max 5.000000
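The dtype masks above work; pandas also has a built-in helper, DataFrame.select_dtypes(), that does the same filtering in one call. A sketch on hypothetical columns (the text column is given dtype=object explicitly, matching this dataset's text columns):

```python
import pandas as pd

df = pd.DataFrame({
    "Style": pd.Series(["Sexy", "Casual"], dtype=object),  # text column, object dtype
    "Rating": [4.6, 0.0],                                  # float64
    "Dress_ID": [1006032852, 1212192089],                  # integer
})

# The mask approach from the cells above
obj_cols = df.dtypes[df.dtypes == "object"].index

# The same filtering with the built-in helper
obj_df = df.select_dtypes(include="object")
float_df = df.select_dtypes(include="float64")
print(list(obj_cols), list(obj_df.columns), list(float_df.columns))
```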
In [72]: df.columns
In [73]: df['Dress_ID']
Out[73]: 0 1006032852
1 1212192089
2 1190380701
3 966005983
4 876339541
...
495 713391965
496 722565148
497 532874347
498 655464934
499 919930954
Name: Dress_ID, Length: 500, dtype: int64
Out[74]: 0 1006032852
1 1212192089
2 1190380701
3 966005983
Name: Dress_ID, dtype: int64
Out[75]: 1 1212192089
3 966005983
5 1068332458
7 1219677488
9 985292672
11 898481530
13 749031896
15 1162628131
17 830467746
19 1113221101
21 856178100
23 840516484
25 1139843344
27 1235426503
29 629131530
31 1150275464
33 978773911
35 640823350
37 1060207186
39 941190190
Name: Dress_ID, dtype: int64
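The expressions behind Out[74] and Out[75] aren't shown. Positional slices like the following reproduce the same index pattern (the first four rows, then every second row from position 1 to 39); s is a hypothetical stand-in for the Dress_ID column:

```python
import pandas as pd

# Stand-in Series with a default 0..n-1 index (the real column has 500 values)
s = pd.Series(range(100, 150), name="Dress_ID")

first_four = s.iloc[:4]    # positions 0, 1, 2, 3
odd_rows = s.iloc[1:40:2]  # positions 1, 3, ..., 39 (start:stop:step)
print(first_four)
print(odd_rows)
```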
In [76]: df
Out[76]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
496 722565148 Sexy Low 4.3 free Summer o-neck full empire
499 919930954 Casual Low 4.4 free Summer v-neck short empire
In [78]: df
Out[78]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
496 722565148 Sexy Low 4.3 free Summer o-neck full empire
499 919930954 Casual Low 4.4 free Summer v-neck short empire
In [79]: df.isnull().sum()
Out[79]: Dress_ID 0
Style 0
Price 2
Rating 0
Size 0
Season 2
NeckLine 3
SleeveLength 2
waiseline 87
Material 128
FabricType 266
Decoration 236
Pattern Type 109
Recommendation 0
Style1 0
dtype: int64
In [102]: new_df
Out[102]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
490 641665398 Casual Low 4.8 free winter bowneck full natural
493 817353671 bohemian Low 4.6 free Summer o-neck sleevless natural
499 919930954 Casual Low 4.4 free Summer v-neck short empire
99 rows × 15 columns
In [103]: new_df.isnull().sum()
Out[103]: Dress_ID 0
Style 0
Price 0
Rating 0
Size 0
Season 0
NeckLine 0
SleeveLength 0
waiseline 0
Material 0
FabricType 0
Decoration 0
Pattern Type 0
Recommendation 0
Style1 0
dtype: int64
The dropna() method removes the rows that contain NULL values.
The dropna() method returns a new DataFrame object unless the inplace parameter is set to True; in that case, dropna() removes the rows from the original DataFrame instead (and returns None).
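A small sketch of both behaviours, together with the isnull().sum() count used earlier, on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 1 lacks a Price, row 2 lacks a Material
df = pd.DataFrame({"Price": ["Low", None, "Average"], "Material": ["cotton", "silk", np.nan]})

print(df.isnull().sum())     # missing values per column: Price 1, Material 1

new_df = df.dropna()         # returns a new DataFrame; df still has 3 rows
print(len(df), len(new_df))  # 3 1

result = df.dropna(inplace=True)  # modifies df itself
print(result, len(df))            # None 1
```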
Discovering Duplicates
Duplicate rows are rows that have been registered more than one time.
In [104]: print(new_df.duplicated())
3 False
4 False
8 False
10 False
28 False
...
488 False
490 False
493 False
498 False
499 False
Length: 99, dtype: bool
In [106]: new_df.duplicated().sum()
Out[106]: 0
3 False
4 False
8 False
10 False
28 False
...
488 False
490 False
493 False
498 False
499 False
Length: 99, dtype: bool
In [108]: df.duplicated().sum()
Out[108]: 0
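Both counts above are 0, so there was nothing to remove. On hypothetical data with one repeated row, duplicated() and drop_duplicates() behave like this:

```python
import pandas as pd

# Hypothetical rows: row 2 repeats row 1 exactly
df = pd.DataFrame({"Dress_ID": [1, 2, 2, 3], "Style": ["Sexy", "Casual", "Casual", "cute"]})

print(df.duplicated())        # True only for row 2
print(df.duplicated().sum())  # number of duplicate rows: 1

df = df.drop_duplicates()     # keeps the first occurrence of each row
print(df)
```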
In [109]: new_df.columns
Out[111]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline M
326 912614690 bohemian Low 5.0 free Spring o-neck short natural
327 1072784739 Casual low 5.0 free Spring bowneck sleevless natural
358 818295153 bohemian low 5.0 free Summer o-neck sleevless empire
410 932913675 Casual Low 5.0 free Summer o-neck sleevless natural
In [112]: df['Season']
Out[112]: 3 Spring
4 Summer
8 Spring
10 Summer
28 Automn
...
488 Summer
490 winter
493 Summer
498 winter
499 Summer
Name: Season, Length: 99, dtype: object
Out[114]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiselin
127 893330898 Sexy Low 4.6 free Summer v-neck sleevless natur
179 1167448608 cute low 0.0 free Summer peterpan-collor short empi
217 756620535 Casual Low 4.6 free Summer o-neck full natur
312 1246945687 Novelty Average 0.0 free Summer o-neck full natur
356 944930838 vintage low 4.0 free Summer turndowncollor short natur
358 818295153 bohemian low 5.0 free Summer o-neck sleevless empi
370 1039384371 Casual Average 4.8 free Summer o-neck sleevless natur
375 1069577979 Casual Low 4.0 free Summer v-neck sleevless natur
410 932913675 Casual Low 5.0 free Summer o-neck sleevless natur
417 934830377 bohemian Low 4.6 free Summer v-neck sleevless natur
493 817353671 bohemian Low 4.6 free Summer o-neck sleevless natur
499 919930954 Casual Low 4.4 free Summer v-neck short empi
Out[115]: 3 print
4 dot
8 solid
10 solid
28 striped
...
488 solid
490 solid
493 solid
498 print
499 solid
Name: Pattern Type, Length: 99, dtype: object
Out[116]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
487 1223469038 Sexy Average 0.0 free winter slash-neck sleevless natural
490 641665398 Casual Low 4.8 free winter bowneck full natural
493 817353671 bohemian Low 4.6 free Summer o-neck sleevless natural
499 919930954 Casual Low 4.4 free Summer v-neck short empire
62 rows × 15 columns
Out[118]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
... ... ... ... ... ... ... ... ... ...
490 641665398 Casual Low 4.8 free winter bowneck full natural
493 817353671 bohemian Low 4.6 free Summer o-neck sleevless natural
499 919930954 Casual Low 4.4 free Summer v-neck short empire
75 rows × 15 columns
Out[121]:
Dress_ID Style Price Rating Size Season NeckLine SleeveLength waiseline
108 1002440915 Casual Medium 0.0 free Spring o-neck sleevless natural
179 1167448608 cute low 0.0 free Summer peterpan-collor short empire
312 1246945687 Novelty Average 0.0 free Summer o-neck full natural
430 907669618 Sexy Average 0.0 free Spring v-neck full natural
443 1249825438 Sexy Average 0.0 free Autumn o-neck full natural
477 1122991519 Sexy Low 0.0 free winter o-neck sleevless natural
479 974438263 cute Low 0.0 free Spring v-neck sleevless natural
481 1061890181 Casual Average 0.0 L Spring o-neck sleevless natural chi
486 1109819647 Casual Average 0.0 free winter o-neck short natural
487 1223469038 Sexy Average 0.0 free winter slash-neck sleevless natural
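The filtered outputs above (for example, the rows whose Rating is 0.0) come from boolean indexing; the exact conditions used in the notebook aren't shown. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows, for illustration only
df = pd.DataFrame({
    "Style": ["Sexy", "cute", "Casual", "bohemian"],
    "Rating": [4.6, 0.0, 5.0, 0.0],
    "Season": ["Summer", "Summer", "Spring", "winter"],
})

zero_rated = df[df["Rating"] == 0.0]

# Combine conditions with & (and) or | (or); each condition needs parentheses
summer_zero = df[(df["Season"] == "Summer") & (df["Rating"] == 0.0)]
print(zero_rated)
print(summer_zero)
```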
In [154]: df7
2/24
0 10107 30 95.70 2 2871.00
8/25
3 10145 45 83.26 6 3746.70
10/10
4 10159 49 100.00 14 5205.27
12/2
2818 10350 20 100.00 15 2244.40
1/31
2819 10373 29 100.00 1 3978.51
3/28
2821 10397 34 62.24 1 2116.16
In [160]: df7.columns
Out[173]: 0 USA
1 France
2 France
3 USA
4 USA
...
2816 Denmark
2817 USA
2819 Finland
2820 Spain
2822 USA
Name: COUNTRY, Length: 2053, dtype: object
11. Find the MONTH_ID that has the maximum sales.
Out[170]: 598 4
Name: MONTH_ID, dtype: int64
Out[172]: 2249 5
Name: MONTH_ID, dtype: int64
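The cells behind Out[170] and Out[172] aren't shown, and "maximum sales" can mean the single largest order or the largest monthly total. A sketch of both, assuming the sales amount is in a column named SALES (not listed in this section); the SALES values come from df7's output above and the MONTH_ID values are hypothetical:

```python
import pandas as pd

# SALES values from df7's output above; MONTH_ID values are hypothetical
df7 = pd.DataFrame({
    "MONTH_ID": [2, 11, 5, 2],
    "SALES": [2871.00, 3746.70, 5205.27, 3978.51],
})

# Month of the single largest sale
top_month = df7.loc[df7["SALES"].idxmax(), "MONTH_ID"]

# Month with the largest total sales
monthly_totals = df7.groupby("MONTH_ID")["SALES"].sum()
best_month = monthly_totals.idxmax()
print(top_month, best_month)  # 5 2
```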
Data Correlations
The corr() method calculates the relationship between each pair of numeric columns in your data set.
In [175]: df7.corr()
C:\Users\Balodi\AppData\Local\Temp\ipykernel_18652\2906877236.py:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
  df7.corr()
The result of the corr() method is a table of numbers between -1 and 1 (by default, Pearson correlation coefficients) that represent how strong the relationship between two columns is.
1 means that there is a 1-to-1 relationship (a perfect correlation): for this data set, each time a value went up in the first column, the other one went up as well.
0.9 is also a good relationship: if you increase one value, the other will probably increase as well.
-0.9 is just as good a relationship as 0.9, but if you increase one value, the other will probably go down.
0.2 means NOT a good relationship: if one value goes up, that does not mean the other will.
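A sketch with hypothetical columns that illustrates the extremes described above. Passing numeric_only=True (available since pandas 1.5) skips the text column and avoids the FutureWarning shown earlier:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],              # rises exactly with a -> +1
    "c": [4, 3, 2, 1],              # falls exactly as a rises -> -1
    "label": ["w", "x", "y", "z"],  # non-numeric column
})

# numeric_only=True restricts the calculation to numeric columns
corr = df.corr(numeric_only=True)
print(corr)
```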
In [84]: df = pd.DataFrame(data)
In [85]: # Note: every list used as a value in the dictionary must have the same length.
Out[87]:
name salary mail_id addr
In [89]: df.iloc[5:6]
Out[89]:
name salary mail_id addr
In [90]: # It returns no rows because iloc always selects by integer position (the default index), and there is no row at position 5.
In [91]: df.iloc[1:3]
Out[91]:
name salary mail_id addr
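A sketch of the position/label difference on a hypothetical DataFrame whose index labels are not 0, 1, 2:

```python
import pandas as pd

# Hypothetical rows with index labels 10, 20, 30 instead of 0, 1, 2
df = pd.DataFrame({"name": ["A", "B", "C"], "salary": [100, 200, 300]}, index=[10, 20, 30])

print(df.iloc[1:3])   # by position: the 2nd and 3rd rows (labels 20 and 30)
print(df.loc[20:30])  # by label: both endpoints are included
print(df.iloc[5:6])   # no row at position 5 -> empty DataFrame, no error
```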
In [94]: df1
Out[94]:
pf_nu income_tax mobile_no course
In [95]: df
Out[95]:
name salary mail_id addr
In [96]: pd.concat([df,df1])
Out[96]:
name salary mail_id addr pf_nu income_tax mobile_no course
In [98]: pd.concat([df,df1],axis=1)
Out[98]:
name salary mail_id addr pf_nu income_tax mobile_no course
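A sketch with hypothetical values (and only two of the four columns on each side) showing how the axis argument changes the result: the default stacks rows and fills non-shared columns with NaN, while axis=1 places the columns side by side, aligned on the index.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "salary": [100, 200]})
df1 = pd.DataFrame({"pf_nu": [1, 2], "course": ["x", "y"]})

stacked = pd.concat([df, df1])       # 4 rows; missing cells become NaN
side = pd.concat([df, df1], axis=1)  # 2 rows; all four columns side by side
print(stacked)
print(side)
```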
In [103]: pd.concat([df3,df4])
Out[103]:
0 1 2 3
Merge operation
In [141]: df5
Out[141]:
emp_id salary Pf
In [143]: df6
Out[143]:
emp_id mob_no house
In [145]: pd.merge(df5,df6)
Out[145]:
emp_id salary Pf mob_no house
Out[147]:
emp_id salary Pf mob_no house
Out[149]:
emp_id salary Pf mob_no house
Out[154]:
emp_id salary Pf mob_no house
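Out[147], Out[149] and Out[154] come from further merges whose calls aren't visible; they may differ in the how parameter. A sketch of its four common options, with hypothetical emp_id values:

```python
import pandas as pd

df5 = pd.DataFrame({"emp_id": [1, 2, 3], "salary": [100, 200, 300]})
df6 = pd.DataFrame({"emp_id": [2, 3, 4], "mob_no": [111, 222, 333]})

inner = pd.merge(df5, df6)               # default how="inner": ids in both (2, 3)
left = pd.merge(df5, df6, how="left")    # every id from df5 (1, 2, 3)
right = pd.merge(df5, df6, how="right")  # every id from df6 (2, 3, 4)
outer = pd.merge(df5, df6, how="outer")  # union of ids (1, 2, 3, 4)
print(outer)
```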
In [162]: df7
Out[162]:
emp_id1 salary Pf
In [164]: df8
Out[164]:
emp_id2 mob_no house
In [166]: pd.merge(df7,df8)
---------------------------------------------------------------------------
MergeError Traceback (most recent call last)
Cell In[166], line 1
----> 1 pd.merge(df7,df8)
In [167]: # So how do we merge these two DataFrames? Tell merge() which column to use as the key on each side.
Out[169]:
emp_id1 salary Pf emp_id2 mob_no house
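Out[169] contains both emp_id1 and emp_id2, which fits naming the key on each side with left_on and right_on. A sketch with hypothetical values, also catching the MergeError raised when the two DataFrames share no column name:

```python
import pandas as pd

df7 = pd.DataFrame({"emp_id1": [1, 2], "salary": [100, 200], "Pf": [10, 20]})
df8 = pd.DataFrame({"emp_id2": [1, 2], "mob_no": [111, 222], "house": ["x", "y"]})

try:
    pd.merge(df7, df8)  # no column name in common
except pd.errors.MergeError as exc:
    print("MergeError:", exc)

# Name the key column on each side explicitly
merged = pd.merge(df7, df8, left_on="emp_id1", right_on="emp_id2")
print(merged.columns.tolist())
```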
In [175]: df9
Out[175]:
emp_id salary Pf
In [177]: df10
Out[177]:
emp_id mob_no house