pandas notes

The document provides an overview of data handling using the Pandas library in Python, explaining its importance for data manipulation and analysis. It details key data structures such as Series and DataFrame, their creation methods, and operations like indexing, slicing, and mathematical operations. Additionally, it highlights the advantages of using Pandas over NumPy for handling heterogeneous data types and performing data processing tasks.

Uploaded by

bhaveshrajwaniking


Data Handling Using Pandas - I

Library:

A library is a collection of modules (files containing Python code) that provide pre-written functions
and classes to help you perform common tasks without having to write the code from scratch.

Python libraries contain a collection of built-in modules that allow us to perform many actions
without writing detailed programs for them.

NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific and analytical
use.

NumPy:

NumPy, which stands for ‘Numerical Python’, is a package that can be used for numerical data
analysis and scientific computing. NumPy uses a multidimensional array object and has functions and
tools for working with these arrays.

Pandas:

Pandas (the name is derived from PANel DAta) is a high-level data manipulation tool used for analysing data. It gives us
a single, convenient place to do most of our data analysis and visualisation work. Pandas has two
important data structures, namely Series and DataFrame, which make the process of analysing
data organised, effective and efficient. A third structure, Panel, existed in older versions but has been removed from current releases of Pandas.

The main author of Pandas is Wes McKinney.

Matplotlib:

The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib, with
just a few lines of code we can generate publication quality plots, histograms, bar charts,
scatterplots, etc.

What is the need for Pandas?

1. A NumPy array requires homogeneous data, while a Pandas DataFrame can have different data
types (float, int, string, datetime, etc.).

2. Pandas has a simpler interface for operations like file loading, plotting, selection, joining and GROUP
BY, which come in very handy in data-processing applications.

3. Pandas DataFrames (with column names) make it very easy to keep track of data.

4. Pandas is used when data is in tabular format, whereas NumPy is used for numeric array-based
data manipulation.

5. It can easily select subsets of data from bulky data sets and even combine multiple datasets
together.

6. It has functionality to find and fill missing values.
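The first point above can be checked with a small sketch. The column names and values here are made up for illustration; only the behaviour (one data type per NumPy array, one data type per DataFrame column) is the point.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type, so mixed input is converted
# to one common type (here, a string type).
mixed_array = np.array([1, 2.5, "three"])

# A DataFrame keeps a separate data type for each column.
# (Hypothetical column names, used only for this illustration.)
mixed_frame = pd.DataFrame({
    "count": [1, 2, 3],
    "price": [1.5, 2.5, 3.5],
    "label": ["a", "b", "c"],
})

print(mixed_array.dtype.kind)   # 'U' stands for a Unicode string type
print(mixed_frame.dtypes)       # one dtype per column
```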

Data Structure in Pandas:

A data structure is a collection of data values and operations that can be applied to that data. It
enables efficient storage, retrieval and modification to the data.

Two commonly used data structures in Pandas:

• Series • DataFrame

Property        Series                                   DataFrame

Dimensions      1-dimensional                            2-dimensional

Type of data    Homogeneous: all the elements of a       Heterogeneous: a DataFrame object
                Series object must be of the same        can have elements of different
                data type.                               data types.

Mutability      Value-mutable: the values of its         Value-mutable: the values of its
                elements can change.                     elements can change.

                Size-immutable: the size of a Series     Size-mutable: the size of a
                object cannot change once it is          DataFrame object can change in
                created. If you add or drop an           place once it is created, that is,
                element, a new Series object is          you can add or drop elements in an
                created internally.                      existing DataFrame object.

Series:

A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list,
string, etc) which by default have numeric data labels starting from zero.

Creation of Series:

(A) Creation of Series from Scalar Values:

(1) A Series can be created using scalar values as shown in the example below:
>>> import pandas as pd
>>> series1 = pd.Series([10,20,30])
>>> print(series1)
Output:
0 10
1 20
2 30
dtype: int64
The output is shown in two columns: the index is on the left and the data value is on the right.
If no index is specified while creating a series, then by default the indices range from 0 through N - 1, where N is the number of elements.
(2) We can also assign user-defined labels to the index
>>> import pandas as pd
>>> series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
>>> print(series2)
Output:
3 Kavi
5 Shyam
1 Ravi
dtype: object
(3) We can also use letters or strings as indices
>>> import pandas as pd
>>> series2 = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> print(series2)
Output:
Feb 2
Mar 3
Apr 4
dtype: int64
(B) Creation of Series from NumPy Arrays:
(1) We can create a series from a one-dimensional (1D) NumPy array
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series3 = pd.Series(array1)
>>> print(series3)
Output:
0 1
1 2
2 3
3 4
dtype: int32
Note: The dtype shown depends on the platform; it may be int64 instead of int32.
(2) Index labels can be passed along with the array
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series4 = pd.Series(array1, index = ["Jan", "Feb", "Mar", "Apr"])
>>> print(series4)
Output:
Jan 1
Feb 2
Mar 3
Apr 4
dtype: int32
Note: When index labels are passed with the array, the length of the index and the length of the array
must be the same, else it will result in a ValueError.
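The note on mismatched lengths can be tried out as follows. This is a minimal sketch that reuses array1 from the example; the three-label index is chosen only to trigger the error.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

try:
    # Four values but only three labels: the lengths do not match.
    pd.Series(array1, index=["Jan", "Feb", "Mar"])
    error_raised = False
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```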
(C) Creation of Series from Dictionary
A Python dictionary has key: value pairs and a value can be quickly retrieved when its key is
known. The dictionary keys are used as the index of the series.
>>> import pandas as pd
>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> series8 = pd.Series(dict1)
>>> print(series8)
Output:
India NewDelhi
UK London
Japan Tokyo
dtype: object
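Accessing Elements of a Series:
(A) Indexing
Indexes are of two types: positional index and labelled index.
A positional index takes an integer value that corresponds to the element's position in the series,
starting from 0, whereas a labelled index takes any user-defined label as index.
(1) >>> seriesNum = pd.Series([10,20,30])
>>> seriesNum[2]
Output:
30
(2) >>> seriesMnths = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> seriesMnths["Mar"]
Output:
3
(3) >>> seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry.iloc[[3,2]]
Output:
France Paris
UK London
dtype: object
Note: When the index labels are not integers, recent versions of Pandas deprecate positional access with plain [ ], so iloc[ ] is used here to select by position.
(4) >>> seriesCapCntry[['UK','USA']]
Output:
UK London
USA WashingtonDC
dtype: object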
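The two kinds of index can be kept apart explicitly with iloc[ ] (position) and loc[ ] (label). The sketch below rebuilds seriesCapCntry so that it runs on its own; the variable names on the left are illustrative.

```python
import pandas as pd

seriesCapCntry = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

by_position = seriesCapCntry.iloc[2]     # third element, counting from 0
by_label = seriesCapCntry.loc["UK"]      # element whose label is 'UK'
several = seriesCapCntry.iloc[[3, 2]]    # a list of positions gives a Series

print(by_position, by_label)
print(several)
```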
(B) Slicing:
We may need to extract a part of a series. This can be done through slicing.
We can define which part of the series is to be sliced by specifying the start and end
parameters [start:end] with the series name. When we use positional indices for slicing,
the value at the end index position is excluded.
(1) >>> import pandas as pd
>>> seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry[1:3]
Output:
USA WashingtonDC
UK London
dtype: object
(2) If labelled indexes are used for slicing, then the value at the end index label is also
included in the output, for example:
>>> seriesCapCntry['USA' : 'France']
Output:
USA WashingtonDC
UK London
France Paris
dtype: object
(3) We can also get the series in reverse order, for example:
>>> seriesCapCntry[ : : -1]
Output:
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
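The difference between the two slicing rules (end excluded for positions, end included for labels) can be seen side by side in this short sketch. It uses iloc[ ] and loc[ ] to make the kind of slice explicit; the result names are illustrative.

```python
import pandas as pd

seriesCapCntry = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

positional = seriesCapCntry.iloc[1:3]            # positions 1 and 2; 3 is excluded
labelled = seriesCapCntry.loc["USA":"France"]    # 'France' is included
reversed_series = seriesCapCntry.iloc[::-1]      # step of -1 reverses the order

print(len(positional), len(labelled))
```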
Attributes of Series:
Each attribute is listed below with its purpose, followed by an example.

name: assigns a name to the Series
>>> seriesCapCntry.name = 'Capitals'
>>> print(seriesCapCntry)
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

index.name: assigns a name to the index of the Series
>>> seriesCapCntry.index.name = 'Countries'
>>> print(seriesCapCntry)
Countries
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

values: gives an array of the values in the Series
>>> print(seriesCapCntry.values)
['NewDelhi' 'WashingtonDC' 'London' 'Paris']

size: gives the number of values in the Series object
>>> print(seriesCapCntry.size)
4

empty: is True if the Series is empty, and False otherwise
>>> seriesCapCntry.empty
False
>>> # Create an empty series
>>> seriesEmpt = pd.Series()
>>> seriesEmpt.empty
True

Methods of Series:

>>> import numpy as np
>>> import pandas as pd
>>> seriesTenTwenty=pd.Series(np.arange( 10, 20, 1 ))
>>> print(seriesTenTwenty)
Output:
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
dtype: int32

head(n) operation:

Returns the first n members of the series. If the value for n is not passed, then by default n takes
5 and the first five members are displayed.

(1) >>> seriesTenTwenty.head(2)
Output:
0 10
1 11
dtype: int32

(2) >>> seriesTenTwenty.head()
Output:
0 10
1 11
2 12
3 13
4 14
dtype: int32

tail(n) operation:

Returns the last n members of the series. If the value for n is not passed, then by default n takes
5 and the last five members are displayed.

(1) >>> import pandas as pd

>>> seriesTenTwenty.tail(2)
Output:
8 18
9 19
dtype: int32
(2) >>> import pandas as pd
>>> seriesTenTwenty.tail()
Output:
5 15
6 16
7 17
8 18
9 19
dtype: int32
count():
Returns the number of non-NaN values in the Series
>>> seriesTenTwenty.count()
10
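Since seriesTenTwenty has no missing values, count() and size give the same number there. A sketch with a hypothetical series that contains one NaN shows the difference.

```python
import numpy as np
import pandas as pd

# A hypothetical series with one missing value.
seriesWithGap = pd.Series([10, np.nan, 30, 40])

total = seriesWithGap.size        # all elements, NaN included
non_missing = seriesWithGap.count()  # only the non-NaN elements

print(total, non_missing)
```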
Mathematical Operations on Series:
Consider the following series: seriesA and seriesB for understanding mathematical operations
on series in Pandas.
>>> seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesA
a 1
b 2
c 3
d 4
e 5
dtype: int64
>>> seriesB = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
>>> seriesB
z 10
y 20
a -10
c -50
e 100
dtype: int64
(A) Addition of two Series:
The first method is to use the + operator. Values are added only where the index labels match; a label that appears in just one of the two series gives NaN in the output.
>>> seriesA + seriesB
a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64

The second method is applied when we do not want to have NaN values in the output. We can
use the series method add() and a parameter fill_value to replace a missing value with a
specified value.

>>> seriesA.add(seriesB, fill_value=0)
a -9.0
b 2.0
c -47.0
d 4.0
e 105.0
y 20.0
z 10.0
dtype: float64

(B) Subtraction of two Series:

>>> seriesA - seriesB
a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64

Now replace the missing values with 1000:

>>> seriesA.sub(seriesB, fill_value=1000)
a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64

(C) Multiplication of two Series:

>>> seriesA * seriesB
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64

>>> seriesA.mul(seriesB, fill_value=0)
a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64

(D) Division of two Series:

>>> seriesA/seriesB
a -0.10
b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64

>>> seriesA.div(seriesB, fill_value=0)
a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64
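All four operations follow the same rule: values are paired by index label, and only the labels found in one series alone produce NaN (or use fill_value). The sketch below, using the same seriesA and seriesB, lists those unmatched labels; the names plain_sum, filled_sum and unmatched are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

plain_sum = seriesA + seriesB                     # unmatched labels give NaN
filled_sum = seriesA.add(seriesB, fill_value=0)   # missing side treated as 0

# Labels present in only one of the two series.
unmatched = seriesA.index.symmetric_difference(seriesB.index)

print(sorted(unmatched))
print(plain_sum.isna().sum())   # number of NaN results
```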

DataFrame:
A DataFrame is another Pandas data structure, which stores data in a two-dimensional way. It is
a two-dimensional (tabular, spreadsheet-like) labelled array, which is an
ordered collection of columns, where the columns may store different types of data, e.g. numeric,
string or floating point.
Creation of DataFrame:
(A) Creation of an empty DataFrame
>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Output:
Empty DataFrame
Columns: []
Index: []
(B) Creation of DataFrame from NumPy ndarrays
Consider the following three NumPy ndarrays. Let us create a DataFrame from a list of these
ndarrays, with column labels. Each array becomes one row, and a shorter array is padded with NaN:
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame5 = pd.DataFrame([array1, array3, array2], columns=[ 'A', 'B', 'C', 'D'])
>>> dFrame5
Output:
A B C D
0 10 20 30 NaN
1 -10 -20 -30 -40.0
2 100 200 300 NaN
(C) Creation of DataFrame from List of Dictionaries
>>> import pandas as pd
>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
Output:
a b c
0 10 20 NaN
1 5 10 20.0
Here, the dictionary keys are taken as column labels, and each dictionary in the list
becomes one row. A key that is missing from a dictionary gives NaN in that row.
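A short sketch of the same example makes the effect of the missing key visible: the column 'c' holds NaN for the first dictionary and is stored as a float column.

```python
import pandas as pd

listDict = [{"a": 10, "b": 20}, {"a": 5, "b": 10, "c": 20}]
dFrameListDict = pd.DataFrame(listDict)

print(dFrameListDict.shape)          # (number of rows, number of columns)
print(dFrameListDict["c"].isna())    # True where the key was missing
print(dFrameListDict.dtypes)         # 'c' is float because NaN is a float
```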
(D) Creation of DataFrame from Dictionary of Lists:
>>> import pandas as pd
>>> dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852] , 'VDF' :
[2797, 6.72,1663]}
>>> dFrameForest= pd.DataFrame(dictForest)
>>> dFrameForest
Output:
State GArea VDF
0 Assam 78438 2797.00
1 Delhi 1483 6.72
2 Kerala 38852 1663.00
(E) Creation of DataFrame from Series :
>>> import pandas as pd
>>> seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesB = pd.Series([1000,2000,-1000,-5000,1000], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
Output:
a b c d e
0 1 2 3 4 5
1 1000 2000 -1000 -5000 1000
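seriesA and seriesB share the same labels, so every cell of dFrame7 is filled. As a sketch of what happens when the labels differ, the block below combines seriesA with seriesC; the name dFrame8 is made up for this illustration. The columns become the union of the labels, and NaN marks the labels a series does not have.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesC = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

# Each series becomes one row; the columns are the union of all labels.
dFrame8 = pd.DataFrame([seriesA, seriesC])

print(dFrame8)
```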
(F) Creation of DataFrame from Dictionary of Series:
>>> import pandas as pd
>>> ResultSheet={ 'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']), 'Samridhi': pd.Series([89,
91, 88], index=['Maths','Science','Hindi']), 'Riya': pd.Series([81, 71, 67],
index=['Maths','Science','Hindi']), 'Mallika': pd.Series([94, 95, 99],
index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
Operations on rows and columns in DataFrames:
(A) Adding a New Column to a DataFrame:
>>> import pandas as pd
>>> ResultSheet={ 'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']), 'Samridhi':
pd.Series([89, 91, 88], index=['Maths','Science','Hindi']), 'Riya': pd.Series([81, 71, 67],
index=['Maths','Science','Hindi']), 'Mallika': pd.Series([94, 95, 99],
index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF['Preeti']=[89,78,86]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 86
Note: Assigning values to a new column label that does not exist will create a new column
at the end. If the column already exists in the DataFrame then the assignment statement
will update the values of the already existing column.
>>> ResultDF['Ramit']=[99, 98, 78]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 91 98 91 71 95 78
Hindi 97 78 88 67 99 86
Note: We can also change data of an entire column to a particular value in a DataFrame.
>>> ResultDF['Arnab']=90
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 90 98 91 71 95 78
Hindi 90 78 88 67 99 86
(B) Adding a New Row to a DataFrame:

We can add a new row to a DataFrame using DataFrame.loc[ ].

>>> ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 90 98 91 71 95 78
Hindi 90 78 88 67 99 86
English 85 86 83 80 90 89
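The column and row assignments above can be run as one sequence. This sketch uses a smaller, two-student version of ResultDF (an assumption made to keep the example short); the steps are the same as in the text.

```python
import pandas as pd

# A reduced version of ResultDF with two students, for illustration only.
ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

ResultDF["Preeti"] = [89, 78, 86]        # new label: column is added at the end
ResultDF["Ramit"] = [99, 98, 78]         # existing label: values are replaced
ResultDF["Arnab"] = 90                   # scalar: same value in every row
ResultDF.loc["English"] = [85, 86, 89]   # new row label: row is added at the end

print(ResultDF)
```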
(C) Deleting Rows or Columns from a DataFrame:
We can use the DataFrame.drop() method to delete rows and columns from a
DataFrame.
To delete a row, the parameter axis is assigned the value 0 and for deleting a
column, the parameter axis is assigned the value 1.
Consider the following DataFrame:
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
English 85 86 83 80 90
(1) >>> ResultDF = ResultDF.drop('Science', axis=0)
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
English 85 86 83 80 90
(2) >>> ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
>>> ResultDF
Output:
Arnab Mallika
Maths 90 94
Hindi 97 99
English 85 90
Note: If the DataFrame has more than one row with the same label, the DataFrame.drop()
method will delete all the matching rows from it.
(D) Renaming Row and Column Labels of a DataFrame:
We can change the labels of rows and columns in a DataFrame using the
DataFrame.rename() method.
The parameter axis='index' is used to specify that the row labels are to be changed.
The parameter axis='columns' implies we want to change the column labels.
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
English 85 86 83 80 90
(1) >>> ResultDF=ResultDF.rename({'Maths':'Sub1', 'Science':'Sub2', 'Hindi':'Sub3',
'English':'Sub4'}, axis='index')
>>> print(ResultDF)
Output:
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub3 97 96 88 67 99
Sub4 85 86 83 80 90
Note: If no new label is passed corresponding to an existing label, the existing label
is left as it is. In the next example, which starts again from the DataFrame with the rows Maths, Science and Hindi, no new label is passed for 'Riya', so that column label is unchanged.
(2) >>> ResultDF=ResultDF.rename({'Arnab':'Student1','Ramit':'Student2',
'Samridhi':'Student3','Mallika':'Student4'},axis='columns')
>>> print(ResultDF)
Output:
Student1 Student2 Student3 Riya Student4
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
Accessing DataFrames Element through Indexing:
(A) Label Based Indexing:
There are several methods in Pandas to implement label based indexing.
DataFrame.loc[ ] is an important indexer that is used for label based indexing with
DataFrames.
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

(1) When a single row label is passed, it returns the row as a Series.
>>> ResultDF.loc['Science']
Output:
Arnab 91
Ramit 81
Samridhi 91
Riya 71
Mallika 95
Name: Science, dtype: int64
(2) When a single column label is passed, it returns the column as a Series.
>>> ResultDF.loc[:,'Arnab']
Output:
Maths 90
Science 91
Hindi 97
Name: Arnab, dtype: int64
Also, we can obtain the same result, that is the marks of 'Arnab' in all the subjects,
by using the command:
>>> print(ResultDF['Arnab'])
(3) To read more than one row from a DataFrame, a list of row labels is used as
shown below. Note that using [[ ]] returns a DataFrame.
>>> ResultDF.loc[['Science', 'Hindi']]
Output:
Arnab Ramit Samridhi Riya Mallika
Science 91 81 91 71 95
Hindi 97 96 88 67 99
(B) Boolean Indexing:
Boolean means a binary variable that can represent either of the two states: True
(indicated by 1) or False (indicated by 0).
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
>>> ResultDF.loc['Maths'] > 90
Output:
Arnab False
Ramit True
Samridhi False
Riya False
Mallika True
Name: Maths, dtype: bool
To check in which subjects 'Arnab' has scored more than 90, we can write:
>>> ResultDF.loc[:,'Arnab']>90
Output:
Maths False
Science True
Hindi True
Name: Arnab, dtype: bool
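A Boolean result like the ones above can itself be used as an index, so that only the entries marked True are kept. The sketch below rebuilds ResultDF from plain lists; the names above90 and toppers are illustrative.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {
        "Arnab": [90, 91, 97],
        "Ramit": [92, 81, 96],
        "Samridhi": [89, 91, 88],
        "Riya": [81, 71, 67],
        "Mallika": [94, 95, 99],
    },
    index=["Maths", "Science", "Hindi"],
)

mathsMarks = ResultDF.loc["Maths"]   # the Maths row as a Series
above90 = mathsMarks > 90            # Boolean Series, one value per student
toppers = mathsMarks[above90]        # keep only the entries marked True

print(toppers)
```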
Accessing DataFrames Element through Slicing:
(1) >>> ResultDF.loc['Maths': 'Science']
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Note that in DataFrames, slicing with labels is inclusive of the end values.
(2) >>> ResultDF.loc['Maths': 'Science', 'Arnab']
Output:
Maths 90
Science 91
Name: Arnab, dtype: int64
(3) >>> ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']
Output:
Arnab Ramit Samridhi
Maths 90 92 89
Science 91 81 91
We may use a slice of labels with a list of column names to access values of those rows and
columns:
(4) >>> ResultDF.loc['Maths': 'Science', ['Arnab','Samridhi']]
Output:
Arnab Samridhi
Maths 90 89
Science 91 91
Filtering Rows in DataFrames:
In order to select or omit particular row(s), we can use a Boolean list specifying ‘True’ for the
rows to be shown and ‘False’ for the ones to be omitted in the output.
>>> ResultDF.loc[[True, False, True]]
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
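Instead of typing the Boolean list by hand, it can be produced by a condition on a column. This is a sketch on a two-student version of ResultDF (an assumption made to keep it short); the condition on 'Riya' is chosen only as an example.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Riya": [81, 71, 67]},
    index=["Maths", "Science", "Hindi"],
)

fixed = ResultDF.loc[[True, False, True]]   # hand-written Boolean list

mask = ResultDF["Riya"] > 70                # True for Maths and Science
filtered = ResultDF.loc[mask]               # rows where the condition holds

print(fixed)
print(filtered)
```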
Joining, Merging and Concatenation of DataFrames:
(A) Joining:
We can use the pandas.DataFrame.append() method to merge two DataFrames. It
appends rows of the second DataFrame at the end of the first DataFrame. Columns
not present in the first DataFrame are added as new columns.
Note: DataFrame.append() was deprecated in Pandas 1.4 and removed in Pandas 2.0. In those
versions, pandas.concat() gives the same result, for example pd.concat([dFrame1, dFrame2]).
>>> dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'],
index=['R1', 'R2', 'R3'])
>>> dFrame1
C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN
>>> dFrame2=pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'],
index=['R4', 'R2', 'R5'])
>>> dFrame2
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0
>>> dFrame1=dFrame1.append(dFrame2)
>>> dFrame1
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
If we append dFrame1 to dFrame2 instead, the rows of dFrame2 precede the rows of
dFrame1. To make the column labels appear in sorted order we can set the parameter
sort=True. The column labels appear in unsorted order when the parameter sort
= False.
verify_integrity:
The parameter verify_integrity of the append() method may be set to True when we want
to raise an error if the row labels are duplicate. By default, verify_integrity = False.
That is why we could append the duplicate row with label R2 when appending the
two DataFrames, as shown above.
ignore_index:
The parameter ignore_index of the append() method may be set to True when we do not
want to use row index labels. By default, ignore_index = False.
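On Pandas versions where append() is no longer available, the same steps can be sketched with pandas.concat(), which accepts the ignore_index, verify_integrity and sort parameters as well. The result names below are illustrative.

```python
import pandas as pd

dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]],
                       columns=["C1", "C2", "C3"], index=["R1", "R2", "R3"])
dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]],
                       columns=["C2", "C5"], index=["R4", "R2", "R5"])

# Rows of dFrame2 are placed after the rows of dFrame1.
combined = pd.concat([dFrame1, dFrame2])

# ignore_index=True discards the row labels and numbers the rows from 0.
renumbered = pd.concat([dFrame1, dFrame2], ignore_index=True)

try:
    # verify_integrity=True raises an error because 'R2' is repeated.
    pd.concat([dFrame1, dFrame2], verify_integrity=True)
    duplicate_error = False
except ValueError:
    duplicate_error = True

print(combined)
```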
Importing and Exporting Data between CSV Files and DataFrames:
CSV (Comma Separated Values):
A CSV file (Comma-Separated Values file) is a plain text file that stores tabular data,
like a spreadsheet, in a simple format. Each line in the file represents a record (row),
and each row consists of one or more fields (columns) separated by commas (or sometimes
other delimiters like semicolons or tabs). CSV files can be easily handled through a
spreadsheet application.

• Simple and widely supported (used in Excel, databases, Python, etc.).
• No formatting (just raw data: no fonts, colours, or formulas).
• Used for data exchange between systems or for import/export of data.

Importing a CSV file to a DataFrame:

>>> marks = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0)
>>> marks
RollNo Name Eco Maths
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27
• The first parameter to read_csv() is the name of the comma separated data file
along with its path.
• The parameter sep specifies whether the values are separated by comma,
semicolon, tab, or any other character. The default value for sep is a comma.
• The parameter header specifies the number of the row whose values are to be used
as the column names. It also marks the start of the data to be fetched. header=0
implies that column names are inferred from the first line of the file. This is also the
default behaviour when no column names are passed explicitly.
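The sep and header parameters can be tried without a file on disk. In this sketch, io.StringIO stands in for the file (an assumption made so that the example runs anywhere), and the text uses semicolons so that sep has to be given explicitly.

```python
import io
import pandas as pd

# In-memory text standing in for a CSV file; two rows of the marks data.
csv_text = "RollNo;Name;Eco;Maths\n1;Arnab;18;57\n2;Kritika;23;45\n"

# sep=";" names the separator; header=0 takes column names from the first line.
marks = pd.read_csv(io.StringIO(csv_text), sep=";", header=0)

print(marks)
```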
Exporting a DataFrame to a CSV file:
We can use the to_csv() method to save a DataFrame to a text or CSV file. Consider the
following DataFrame:
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

In case we do not want the column names to be saved to the file we may use the
parameter header=False. Another parameter index=False is used when we do not
want the row labels to be written to the file on disk.
>>> ResultDF.to_csv('C:/NCERT/resultonly.txt', sep='@', header=False, index=False)
If we open the file resultonly.txt, we will find the following contents:
90@92@89@81@94
91@81@91@71@95
97@96@88@67@99
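To see the effect of sep, header and index without writing a file, to_csv() can be called with no file path, in which case it returns the text as a string. The sketch uses a two-student version of ResultDF for brevity.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

# With no file path, to_csv() returns the CSV text instead of saving it.
text_out = ResultDF.to_csv(sep="@", header=False, index=False)
with_labels = ResultDF.to_csv(sep="@")   # column names and row labels kept

print(text_out)
print(with_labels)
```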
Difference between Pandas Series and NumPy Arrays:

Pandas Series                                NumPy Arrays

In a Series we can define our own            NumPy arrays are accessed by their
labelled index to access elements.           integer position, using numbers only.
These labels can be numbers or letters.

The elements can be indexed in               The indexing starts with zero for the
descending order also.                       first element and the index is fixed.

If two Series are not aligned, NaN           There is no label alignment; if the
(missing values) are generated for           shapes of two arrays are not compatible,
the labels that do not match.                the operation fails.

A Series requires more memory.               A NumPy array occupies less memory.
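The alignment difference can be seen with two small series that carry the same labels in a different order. The names s1 and s2 and their values are made up for this sketch.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])

# Series: values are paired by label, whatever their position.
label_sum = s1 + s2

# NumPy arrays: values are paired by position only.
position_sum = s1.to_numpy() + s2.to_numpy()

print(label_sum)
print(position_sum)
```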
