pandas notes

The document provides an overview of data handling using the Pandas library in Python, explaining its importance for data manipulation and analysis. It details key data structures such as Series and DataFrame, their creation methods, and operations like indexing, slicing, and mathematical operations. Additionally, it highlights the advantages of using Pandas over NumPy for handling heterogeneous data types and performing data processing tasks.

Data Handling Using Pandas - I

Library:

A library is a collection of modules (files containing Python code) that provide pre-written functions
and classes to help you perform common tasks without having to write the code from scratch.

Python libraries contain built-in modules that allow us to perform many actions without writing
detailed programs for them.

NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific and analytical
use.

NumPy:

NumPy, which stands for 'Numerical Python', is a package that can be used for numerical data
analysis and scientific computing. NumPy uses a multidimensional array object and has functions and
tools for working with these arrays.

Pandas:

Pandas (the name is derived from PANel DAta) is a high-level data manipulation tool used for analysing data. It gives us
a single, convenient place to do most of our data analysis and visualisation work. Pandas was designed with three
important data structures, namely Series, DataFrame and Panel, to make the process of analysing
data organised, effective and efficient. Panel has been removed from recent versions of Pandas (0.25 onwards),
so these notes cover Series and DataFrame.

The main author of Pandas is Wes McKinney.

Matplotlib:

The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib, with
just a few lines of code we can generate publication quality plots, histograms, bar charts,
scatterplots, etc.

Why do we need Pandas?

1. A NumPy array requires homogeneous data, while a Pandas DataFrame can have different data
types (float, int, string, datetime, etc.).

2. Pandas has a simpler interface for operations like file loading, plotting, selection, joining and GROUP
BY, which come in very handy in data-processing applications.

3. Pandas DataFrames (with column names) make it very easy to keep track of data.

4. Pandas is used when data is in tabular format, whereas NumPy is used for numeric array-based
data manipulation.

5. It can easily select subsets of data from bulky data sets and even combine multiple datasets
together.

6. It has functionality to find and fill missing values.
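Point 1 can be seen in a small sketch. The sample values and the names mixed_array and df are made up for illustration: a NumPy array converts mixed inputs to one common type, while each DataFrame column keeps its own data type.

```python
import numpy as np
import pandas as pd

# Mixed inputs: NumPy converts everything to one common type (here, strings).
mixed_array = np.array([1, 2.5, "three"])
print(mixed_array.dtype.kind)   # 'U' stands for a Unicode string array

# A DataFrame keeps a separate data type per column (illustrative data).
df = pd.DataFrame({
    "name": ["Kavi", "Shyam"],
    "age": [15, 16],
    "score": [81.5, 92.0],
})
print(df.dtypes)
```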

Data Structure in Pandas:

A data structure is a collection of data values and operations that can be applied to that data. It
enables efficient storage, retrieval and modification to the data.

Two commonly used data structures in Pandas:

• Series • DataFrame

Property        Series                                 DataFrame
Dimensions      1-dimensional                          2-dimensional
Type of data    Homogeneous: all the elements of a     Heterogeneous: a DataFrame object can
                Series object must be of the same      have elements of different data
                data type.                             types.
Mutability      Value mutable: the values of its       Value mutable: the values of its
                elements can change.                   elements can change.
                Size immutable: the size of a Series   Size mutable: the size of a DataFrame
                object, once created, cannot change.   object, once created, can change in
                If you add or drop an element, a new   place. That is, you can add or drop
                Series object is created internally.   elements in an existing DataFrame
                                                       object.

Series:

A Series is a one-dimensional array containing a sequence of values of any data type (int, float, list,
string, etc) which by default have numeric data labels starting from zero.

Creation of Series:

(A) Creation of Series from Scalar Values:


(1) A Series can be created using scalar values as shown in the example below:
>>> import pandas as pd
>>> series1 = pd.Series([10,20,30])
>>> print(series1)
Output:
0 10
1 20
2 30
dtype: int64
The output is shown in two columns: the index is on the left and the data value is on the right.
When a series is created without an explicit index, the indices by default range from 0 through N - 1.
(2) We can also assign user-defined labels to the index
>>> import pandas as pd
>>> series2 = pd.Series(["Kavi","Shyam","Ravi"], index=[3,5,1])
>>> print(series2)
Output:
3 Kavi
5 Shyam
1 Ravi
dtype: object
(3) We can also use letters or strings as indices
>>> import pandas as pd
>>> series2 = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> print(series2)
Output:
Feb 2
Mar 3
Apr 4
dtype: int64
(B) Creation of Series from NumPy Arrays:
(1) We can create a series from a one-dimensional (1D) NumPy array
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series3 = pd.Series(array1)
>>> print(series3)
Output:
0 1
1 2
2 3
3 4
dtype: int32
(2) We can also pass index labels along with the array
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([1,2,3,4])
>>> series4 = pd.Series(array1, index = ["Jan", "Feb", "Mar", "Apr"])
>>> print(series4)
Output:
Jan 1
Feb 2
Mar 3
Apr 4
dtype: int32
Note: The integer dtype of a NumPy array depends on the platform, so int64 may be shown instead of int32.
Note: When index labels are passed with the array, the length of the index and the length of the array
must be the same, else it will result in a ValueError.
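The note above can be checked with a short sketch. The variable mismatch_raised is only an illustrative flag and is not part of Pandas.

```python
import numpy as np
import pandas as pd

array1 = np.array([1, 2, 3, 4])

try:
    # Four values but only three labels: the lengths do not match.
    pd.Series(array1, index=["Jan", "Feb", "Mar"])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```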
(C) Creation of Series from Dictionary:
A Python dictionary has key: value pairs, and a value can be quickly retrieved when its key is
known. When a series is created from a dictionary, the keys become the index labels.
>>> import pandas as pd
>>> dict1 = {'India': 'NewDelhi', 'UK': 'London', 'Japan': 'Tokyo'}
>>> series8 = pd.Series(dict1)
>>> print(series8)
Output:
India NewDelhi
UK London
Japan Tokyo
dtype: object
Accessing Elements of a Series:
(A) Indexing
Indexes are of two types: positional index and labelled index.
Positional index takes an integer value that corresponds to its position in the series
starting from 0, whereas labelled index takes any user-defined label as index.
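(1) >>> seriesNum = pd.Series([10,20,30])
>>> seriesNum[2]
Output:
30
(2) >>> seriesMnths = pd.Series([2,3,4], index=["Feb","Mar","Apr"])
>>> seriesMnths["Mar"]
Output:
3
(3) More than one element can be accessed using a list of positional indices:
>>> seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry[[3,2]]
Output:
France Paris
UK London
dtype: object
Note: In recent versions of Pandas, positional access on a labelled series is better written as
seriesCapCntry.iloc[[3,2]].
(4) More than one element can also be accessed using a list of index labels:
>>> seriesCapCntry[['UK','USA']]
Output:
UK London
USA WashingtonDC
dtype: object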
(B) Slicing:
We may need to extract a part of a series. This can be done through slicing.
We can define which part of the series is to be sliced by specifying the start and end
parameters [start:end] with the series name. When we use positional indices for slicing,
the value at the end index position is excluded.
(1) >>> import pandas as pd
>>> seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> seriesCapCntry[1:3]
Output:
USA WashingtonDC
UK London
dtype: object
(2) If labelled indexes are used for slicing, then value at the end index label is also
included in the output, for example:
>>> import pandas as pd
>>> seriesCapCntry['USA' : 'France']
Output:
USA WashingtonDC
UK London
France Paris
dtype: object
(3) We can also get the series in reverse order, for example:
>>> import pandas as pd
>>> seriesCapCntry[ : : -1]
Output:
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
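The difference between the two kinds of slices can be made explicit with the indexers .iloc (by position) and .loc (by label). These indexers are not used for Series in the notes above, so this is an optional sketch and not a replacement for the examples.

```python
import pandas as pd

seriesCapCntry = pd.Series(
    ["NewDelhi", "WashingtonDC", "London", "Paris"],
    index=["India", "USA", "UK", "France"],
)

by_position = seriesCapCntry.iloc[1:3]         # positions 1 and 2; position 3 is excluded
by_label = seriesCapCntry.loc["USA":"France"]  # both end labels are included

print(list(by_position.index))  # ['USA', 'UK']
print(list(by_label.index))     # ['USA', 'UK', 'France']
```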
Attributes of Series:
Attribute: name
Purpose: assigns a name to the Series
Example:
>>> seriesCapCntry.name = 'Capitals'
>>> print(seriesCapCntry)
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

Attribute: index.name
Purpose: assigns a name to the index of the series
Example:
>>> seriesCapCntry.index.name = 'Countries'
>>> print(seriesCapCntry)
Countries
India NewDelhi
USA WashingtonDC
UK London
France Paris
Name: Capitals, dtype: object

Attribute: values
Purpose: gives the values in the series as an array
Example:
>>> print(seriesCapCntry.values)
['NewDelhi' 'WashingtonDC' 'London' 'Paris']

Attribute: size
Purpose: gives the number of values in the Series object
Example:
>>> print(seriesCapCntry.size)
4

Attribute: empty
Purpose: gives True if the series is empty, and False otherwise
Example:
>>> seriesCapCntry.empty
False
>>> # Create an empty series
>>> seriesEmpt = pd.Series()
>>> seriesEmpt.empty
True

Methods of Series:

>>> import numpy as np
>>> import pandas as pd
>>> seriesTenTwenty = pd.Series(np.arange(10, 20, 1))
>>> print(seriesTenTwenty)
Output:
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
dtype: int32

head(n) operation:

Returns the first n members of the series. If the value for n is not passed, then by default n takes
5 and the first five members are displayed.

(1) >>> seriesTenTwenty.head(2)
Output:
0 10
1 11
dtype: int32
(2) >>> seriesTenTwenty.head()
Output:
0 10
1 11
2 12
3 13
4 14
dtype: int32

tail(n) operation:

Returns the last n members of the series. If the value for n is not passed, then by default n takes
5 and the last five members are displayed.

(1) >>> seriesTenTwenty.tail(2)
Output:
8 18
9 19
dtype: int32
(2) >>> seriesTenTwenty.tail()
Output:
5 15
6 16
7 17
8 18
9 19
dtype: int32
count():
Returns the number of non-NaN values in the Series
>>> seriesTenTwenty.count()
10
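For seriesTenTwenty, count() and size agree because no value is missing. A small sketch with made-up data (seriesGaps is a hypothetical name) shows where they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical series with two missing values.
seriesGaps = pd.Series([10, np.nan, 30, np.nan, 50])

print(seriesGaps.size)     # 5: every element is counted
print(seriesGaps.count())  # 3: NaN values are skipped
```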
Mathematical Operations on Series:
Consider the following series: seriesA and seriesB for understanding mathematical operations
on series in Pandas.
>>> seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesA
a 1
b 2
c 3
d 4
e 5
dtype: int64
>>> seriesB = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
>>> seriesB
z 10
y 20
a -10
c -50
e 100
dtype: int64
(A) Addition of two Series:
The first method is to use the + operator. Values are added only where the index labels match;
a label that is present in only one of the series gives NaN in the output.
>>> seriesA + seriesB
a -9.0
b NaN
c -47.0
d NaN
e 105.0
y NaN
z NaN
dtype: float64

The second method is applied when we do not want to have NaN values in the output. We can
use the series method add() and a parameter fill_value to replace a missing value with a
specified value.

>>> seriesA.add(seriesB, fill_value=0)
a -9.0
b 2.0
c -47.0
d 4.0
e 105.0
y 20.0
z 10.0
dtype: float64
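The two methods of addition can be compared side by side. This is a sketch using the same seriesA and seriesB; the names plain and filled are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

plain = seriesA + seriesB                    # unmatched labels give NaN
filled = seriesA.add(seriesB, fill_value=0)  # the missing side is treated as 0

print(plain.isna().sum())        # 4 labels (b, d, y, z) have no partner
print(filled["b"], filled["y"])  # 2.0 20.0
```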

(B) Subtraction of two Series:

>>> seriesA - seriesB
a 11.0
b NaN
c 53.0
d NaN
e -95.0
y NaN
z NaN
dtype: float64

Now replace the missing values with 1000 before subtracting:

>>> seriesA.sub(seriesB, fill_value=1000)
a 11.0
b -998.0
c 53.0
d -996.0
e -95.0
y 980.0
z 990.0
dtype: float64

(C) Multiplication of two Series:

>>> seriesA * seriesB
a -10.0
b NaN
c -150.0
d NaN
e 500.0
y NaN
z NaN
dtype: float64

>>> seriesA.mul(seriesB, fill_value=0)
a -10.0
b 0.0
c -150.0
d 0.0
e 500.0
y 0.0
z 0.0
dtype: float64

(D) Division of two Series:

>>> seriesA / seriesB
a -0.10
b NaN
c -0.06
d NaN
e 0.05
y NaN
z NaN
dtype: float64

>>> seriesA.div(seriesB, fill_value=0)
a -0.10
b inf
c -0.06
d inf
e 0.05
y 0.00
z 0.00
dtype: float64
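The inf values appear because labels b and d are missing from seriesB, so fill_value=0 makes them the divisor. A short sketch to locate such results, assuming NumPy's isinf() is an acceptable way to test for them:

```python
import numpy as np
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesB = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

quotient = seriesA.div(seriesB, fill_value=0)

# b and d: a value of seriesA divided by the fill value 0 gives inf.
# y and z: the fill value 0 divided by a value of seriesB gives 0.
infinite_labels = list(quotient[np.isinf(quotient)].index)
print(infinite_labels)
```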

DataFrame:
A DataFrame is another Pandas structure, which stores data in a two-dimensional way. It is
a two-dimensional (tabular, spreadsheet-like) labelled array: an
ordered collection of columns, where different columns may store different types of data, e.g. numeric,
string or floating point.
Creation of DataFrame:
(A) Creation of an empty DataFrame
>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Output:
Empty DataFrame
Columns: []
Index: []
(B) Creation of DataFrame from NumPy ndarrays
Consider the following three NumPy ndarrays. Let us create a DataFrame from them, using one
ndarray per row and passing column labels. array3 has four elements, so the rows built from the
shorter arrays get NaN in column D:
>>> import numpy as np
>>> import pandas as pd
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame5 = pd.DataFrame([array1, array3, array2], columns=['A', 'B', 'C', 'D'])
>>> dFrame5
Output:
A B C D
0 10 20 30 NaN
1 -10 -20 -30 -40.0
2 100 200 300 NaN
(C) Creation of DataFrame from List of Dictionaries
>>> import pandas as pd
>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
Output:
a b c
0 10 20 NaN
1 5 10 20.0
Here, the dictionary keys are taken as column labels, and the values corresponding to
each key are taken as rows.
(D) Creation of DataFrame from Dictionary of Lists:
>>> import pandas as pd
>>> dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852],
'VDF': [2797, 6.72, 1663]}
>>> dFrameForest= pd.DataFrame(dictForest)
>>> dFrameForest
Output:
State GArea VDF
0 Assam 78438 2797.00
1 Delhi 1483 6.72
2 Kerala 38852 1663.00
(E) Creation of DataFrame from Series :
>>> import pandas as pd
>>> seriesA = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesB = pd.Series([1000,2000,-1000,-5000,1000], index = ['a', 'b', 'c', 'd', 'e'])
>>> seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
Output:
a b c d e
0 1 2 3 4 5
1 1000 2000 -1000 -5000 1000
(F) Creation of DataFrame from Dictionary of Series:
>>> import pandas as pd
>>> ResultSheet={ 'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
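In ResultSheet every series has the same index. When the indices differ, the DataFrame gets the union of all labels, and NaN is placed where a series has no value for a label. A minimal sketch using seriesA and seriesC from section (E); the column labels 'A' and 'C' and the name dFrameUnion are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
seriesC = pd.Series([10, 20, -10, -50, 100], index=["z", "y", "a", "c", "e"])

# The dictionary keys become column labels; rows cover every label of both series.
dFrameUnion = pd.DataFrame({"A": seriesA, "C": seriesC})
print(dFrameUnion)
```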
Operations on rows and columns in DataFrames:
(A) Adding a New Column to a DataFrame:
>>> import pandas as pd
>>> ResultSheet={ 'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF['Preeti']=[89,78,76]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 92 89 81 94 89
Science 91 81 91 71 95 78
Hindi 97 96 88 67 99 76
Note: Assigning values to a new column label that does not exist will create a new column
at the end. If the column already exists in the DataFrame, then the assignment statement
will update the values of the already existing column.
>>> ResultDF['Ramit']=[99, 98, 78]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 91 98 91 71 95 78
Hindi 97 78 88 67 99 76
Note: We can also change data of an entire column to a particular value in a DataFrame.
>>> ResultDF['Arnab']=90
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 90 98 91 71 95 78
Hindi 90 78 88 67 99 76
(B) Adding a New Row to a DataFrame :

We can add a new row to a DataFrame using the DataFrame.loc[ ] indexer. The list must have
one value for each column.

>>> ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika Preeti
Maths 90 99 89 81 94 89
Science 90 98 91 71 95 78
Hindi 90 78 88 67 99 76
English 85 86 83 80 90 89
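The column and row operations above follow one rule: assigning to a label that does not exist adds it, and assigning to an existing label overwrites it. A compact sketch on a smaller, made-up DataFrame (demoDF is a hypothetical name):

```python
import pandas as pd

demoDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

demoDF["Riya"] = [81, 71, 67]          # new column label: a column is added at the end
demoDF["Ramit"] = [99, 98, 78]         # existing column label: values are overwritten
demoDF.loc["English"] = [85, 86, 80]   # new row label: a row is added at the end

print(demoDF)
```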
(C) Deleting Rows or Columns from a DataFrame:
We can use the DataFrame.drop() method to delete rows and columns from a
DataFrame.
To delete a row, the parameter axis is assigned the value 0, and to delete a
column, the parameter axis is assigned the value 1.
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
English 85 86 83 80 90
(1) >>> ResultDF = ResultDF.drop('Science', axis=0)
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
English 85 86 83 80 90
(2) >>> ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
>>> ResultDF
Output:
Arnab Mallika
Maths 90 94
Hindi 97 99
English 85 90
Note: If the DataFrame has more than one row with the same label, the DataFrame.drop()
method will delete all the matching rows from it.
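The note on duplicate labels can be tried out with a small made-up DataFrame (dupDF and trimmed are hypothetical names). The sketch also shows that drop() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# The label 'Maths' appears twice in the row index.
dupDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Riya": [81, 71, 67]},
    index=["Maths", "Hindi", "Maths"],
)

trimmed = dupDF.drop("Maths", axis=0)  # removes both rows labelled 'Maths'

print(trimmed)
print(len(dupDF))  # 3: dupDF itself still has all its rows
```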
(D) Renaming Row and column Labels of a DataFrame :
We can change the labels of rows and columns in a DataFrame using the
DataFrame.rename() method.
The parameter axis='index' is used to specify that the row label is to be changed
The parameter axis='columns' implies we want to change the column labels
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
English 85 86 83 80 90
(1) >>> ResultDF=ResultDF.rename({'Maths':'Sub1', 'Science':'Sub2', 'English':'Sub3',
'Hindi':'Sub4'}, axis='index')
>>> print(ResultDF)
Output:
Arnab Ramit Samridhi Riya Mallika
Sub1 90 92 89 81 94
Sub2 91 81 91 71 95
Sub4 97 96 88 67 99
Sub3 85 86 83 80 90
Note: rename() changes only the labels and not the order of the rows, so 'Hindi' (now Sub4)
still comes before 'English' (now Sub3). If no new label is passed corresponding to an existing
label, the existing row label is left as it is.
(2) Starting again from the original ResultDF with the rows Maths, Science and Hindi:
>>> ResultDF=ResultDF.rename({'Arnab':'Student1','Ramit':'Student2',
'Samridhi':'Student3','Mallika':'Student4'},axis='columns')
>>> print(ResultDF)
Output:
Student1 Student2 Student3 Riya Student4
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
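A short sketch of renaming with a partial mapping, on a made-up two-row DataFrame (smallDF and renamed are hypothetical names). Labels that are missing from the mapping stay as they are.

```python
import pandas as pd

smallDF = pd.DataFrame(
    {"Arnab": [90, 91], "Riya": [81, 71]},
    index=["Maths", "Science"],
)

# Only 'Maths' and 'Arnab' are mapped; 'Science' and 'Riya' keep their labels.
renamed = smallDF.rename({"Maths": "Sub1"}, axis="index")
renamed = renamed.rename({"Arnab": "Student1"}, axis="columns")

print(renamed)
```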
Accessing DataFrames Element through Indexing:
(A) Label Based Indexing:
There are several ways in Pandas to implement label based indexing.
DataFrame.loc[ ] is an important indexer that is used for label based indexing with
DataFrames.
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

(1) >>> ResultDF.loc['Science']
Output:
Arnab 91
Ramit 81
Samridhi 91
Riya 71
Mallika 95
Name: Science, dtype: int64
(3) When a single column label is passed, it returns the column as a Series.
>>> ResultDF.loc[:,'Arnab']
Output:
Maths 90
Science 91
Hindi 97
Name: Arnab, dtype: int64
Also, we can obtain the same result, that is the marks of 'Arnab' in all the subjects,
by using the command:
>>> print(ResultDF['Arnab'])
(4) To read more than one row from a DataFrame, a list of row labels is used as
shown below. Note that using [[ ]] returns a DataFrame.
>>> ResultDF.loc[['Science', 'Hindi']]
Output:
Arnab Ramit Samridhi Riya Mallika
Science 91 81 91 71 95
Hindi 97 96 88 67 99
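The type of the result depends on what is passed to loc[ ]. A minimal sketch on a two-column version of ResultDF (the shortened data is only for illustration):

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

one_row = ResultDF.loc["Science"]             # single label: a Series
one_row_frame = ResultDF.loc[["Science"]]     # list with one label: a DataFrame
one_value = ResultDF.loc["Science", "Arnab"]  # row label and column label: a single value

print(type(one_row).__name__)        # Series
print(type(one_row_frame).__name__)  # DataFrame
print(one_value)                     # 91
```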
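(B) Boolean Indexing:
Boolean means a binary variable that can represent either of the two states: True
(indicated by 1) or False (indicated by 0). In Boolean indexing, a condition is applied
to the data and a True or False value is obtained for each element.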
>>> ResultDF
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99
>>> ResultDF.loc['Maths'] > 90
Output:
Arnab False
Ramit True
Samridhi False
Riya False
Mallika True
Name: Maths, dtype: bool
To check in which subjects 'Arnab' has scored more than 90, we can write:
>>> ResultDF.loc[:,'Arnab'] > 90
Output:
Maths False
Science True
Hindi True
Name: Arnab, dtype: bool
Accessing DataFrames Element through Slicing:
(1) >>> ResultDF.loc['Maths': 'Science']
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Note that in DataFrames, slicing with labels is inclusive of the end values.
(2) >>> ResultDF.loc['Maths': 'Science', 'Arnab']
Output:
Maths 90
Science 91
Name: Arnab, dtype: int64
(3) >>> ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']
Output:
Arnab Ramit Samridhi
Maths 90 92 89
Science 91 81 91
We may use a slice of labels with a list of column names to access values of those rows and
columns:
(4) >>> ResultDF.loc['Maths': 'Science', ['Arnab','Samridhi']]
Output:
Arnab Samridhi
Maths 90 89
Science 91 91
Filtering Rows in DataFrames:
In order to select or omit particular row(s), we can use a Boolean list specifying ‘True’ for the
rows to be shown and ‘False’ for the ones to be omitted in the output.
>>> ResultDF.loc[[True, False, True]]
Output:
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Hindi 97 96 88 67 99
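Instead of typing the Boolean list by hand, it can be produced by a condition like the ones in the Boolean Indexing section. A sketch on a two-column version of ResultDF; the names mask and high_rows are illustrative.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Riya": [81, 71, 67]},
    index=["Maths", "Science", "Hindi"],
)

mask = ResultDF["Arnab"] > 90   # Boolean Series: False, True, True
high_rows = ResultDF.loc[mask]  # keeps only the rows where the mask is True

print(high_rows)
```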
Joining, Merging and Concatenation of DataFrames:
(A) Joining :
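We can use the pandas.DataFrame.append() method to merge two DataFrames. It
appends rows of the second DataFrame at the end of the first DataFrame. Columns
not present in the first DataFrame are added as new columns. Note that append() was
deprecated in Pandas 1.4 and removed in Pandas 2.0; on current versions,
pandas.concat() does the same job and accepts the sort, verify_integrity and
ignore_index parameters described in this section.
>>> dFrame1=pd.DataFrame([[1, 2, 3], [4, 5], [6]], columns=['C1', 'C2', 'C3'],
index=['R1', 'R2', 'R3'])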
>>> dFrame1
C1 C2 C3
R1 1 2.0 3.0
R2 4 5.0 NaN
R3 6 NaN NaN
>>> dFrame2=pd.DataFrame([[10, 20], [30], [40, 50]], columns=['C2', 'C5'],
index=['R4', 'R2', 'R5'])
>>> dFrame2
C2 C5
R4 10 20.0
R2 30 NaN
R5 40 50.0
>>> dFrame1=dFrame1.append(dFrame2)
>>> dFrame1
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
If we append dFrame1 to dFrame2 instead, the rows of dFrame2 precede the rows of
dFrame1. To make the column labels appear in sorted order, we can set the parameter
sort=True. The column labels appear in unsorted order when the parameter sort
= False.
verify_integrity:
The parameter verify_integrity of the append() method may be set to True when we want
to raise an error if the row labels are duplicated. By default, verify_integrity = False.
That is why we could append the duplicate row with label R2 when appending the
two DataFrames, as shown above.
ignore_index:
The parameter ignore_index of the append() method may be set to True when we do not
want to use the row index labels; the rows are then numbered from 0. By default, ignore_index = False.
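On Pandas 2.0 and later, where append() is no longer available, the same steps can be sketched with pandas.concat(). The names combined, renumbered and duplicate_detected are illustrative.

```python
import pandas as pd

dFrame1 = pd.DataFrame([[1, 2, 3], [4, 5], [6]],
                       columns=["C1", "C2", "C3"], index=["R1", "R2", "R3"])
dFrame2 = pd.DataFrame([[10, 20], [30], [40, 50]],
                       columns=["C2", "C5"], index=["R4", "R2", "R5"])

# Rows of dFrame2 are placed after the rows of dFrame1; columns are combined.
combined = pd.concat([dFrame1, dFrame2])

# ignore_index=True discards the row labels and numbers the rows from 0.
renumbered = pd.concat([dFrame1, dFrame2], ignore_index=True)

# verify_integrity=True raises ValueError because the label R2 occurs twice.
try:
    pd.concat([dFrame1, dFrame2], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True

print(combined)
```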
Importing and Exporting Data between CSV Files and DataFrames:
CSV (Comma Separated Values):
A CSV (Comma-Separated Values) file is a plain text file that stores tabular data,
like a spreadsheet, in a simple format. Each line in the file represents a record (row),
and each row consists of one or more fields (columns) separated by commas (or sometimes
by other delimiters like semicolons or tabs). CSV files can be easily handled through a
spreadsheet application.

• Simple and widely supported (used in Excel, databases, Python, etc.).
• No formatting (just raw data: no fonts, colors, or formulas).
• Used for data exchange between systems or for import/export of data.

Importing a CSV file to a DataFrame:

>>> marks = pd.read_csv("C:/NCERT/ResultData.csv", sep=",", header=0)


>>> marks
RollNo Name Eco Maths
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27
• The first parameter to read_csv() is the name of the comma separated data file
along with its path.
• The parameter sep specifies whether the values are separated by comma,
semicolon, tab, or any other character. The default value for sep is a comma.
• The parameter header specifies the number of the row whose values are to be used
as the column names. It also marks the start of the data to be fetched. header=0
implies that column names are inferred from the first line of the file. When no
column names are passed explicitly, this is also the default behaviour.
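The sep and header parameters can be tried without a file on disk. In this sketch, io.StringIO stands in for the file, and the semicolon-separated text is made-up sample data.

```python
import io
import pandas as pd

# Sample data that uses a semicolon instead of a comma as the separator.
csv_text = """RollNo;Name;Eco;Maths
1;Arnab;18;57
2;Kritika;23;45
"""

# sep=";" matches the separator; header=0 takes the column names from the first line.
marks = pd.read_csv(io.StringIO(csv_text), sep=";", header=0)
print(marks)
```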
Exporting a DataFrame to a CSV file:
We can use the to_csv() method to save a DataFrame to a text or CSV file.
>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

In case we do not want the column names to be saved to the file, we may use the
parameter header=False. Another parameter, index=False, is used when we do not
want the row labels to be written to the file on disk.
>>> ResultDF.to_csv('C:/NCERT/resultonly.txt', sep='@', header=False, index=False)
If we open the file resultonly.txt, we will find the following contents:
90@92@89@81@94
91@81@91@71@95
97@96@88@67@99
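The effect of header and index can be compared without writing a file: when no path is given, to_csv() returns the text as a string. A sketch on a two-column version of ResultDF; the names text_full and text_bare are illustrative.

```python
import pandas as pd

ResultDF = pd.DataFrame(
    {"Arnab": [90, 91, 97], "Ramit": [92, 81, 96]},
    index=["Maths", "Science", "Hindi"],
)

text_full = ResultDF.to_csv(sep="@")                             # with column names and row labels
text_bare = ResultDF.to_csv(sep="@", header=False, index=False)  # values only

print(text_full)
print(text_bare)
```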
Difference between Pandas Series and NumPy Arrays:
Pandas Series                                 NumPy Arrays
In a Series we can define our own labelled    NumPy arrays are accessed by their
index to access elements. The labels can      integer position, using numbers only.
be numbers or letters.
The elements can be indexed in                The indexing starts with zero for the first
descending order also.                        element and the index is fixed.
If two series are not aligned, NaN or         Arrays are combined by position and not by
missing values are generated.                 label, so no NaN is generated by alignment;
                                              arrays with incompatible shapes raise an error.
Series require more memory.                   NumPy arrays occupy less memory.
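The difference in alignment can be seen by adding the same values once as Series and once as NumPy arrays. This is a sketch with made-up values; s1, s2, label_sum and position_sum are hypothetical names.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])

# Series are matched by label: a = 1 + 30, b = 2 + 20, c = 3 + 10.
label_sum = s1 + s2

# The underlying arrays are matched by position: 1 + 10, 2 + 20, 3 + 30.
position_sum = s1.to_numpy() + s2.to_numpy()

print(label_sum)
print(position_sum)
```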