Pandas


Python - Pandas

What is Pandas?

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
The library was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze large data sets and draw conclusions based on
statistical theory.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?


Pandas can answer questions about the data, such as:

Is there a correlation between two or more columns?
What is the average value?
What is the maximum value?
What is the minimum value?
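As a minimal sketch of how such questions map to code, the example below uses a small, invented table (the column names and numbers are assumptions made for illustration only) and asks Pandas for a correlation, an average, a maximum, and a minimum.

```python
import pandas as pd

# Hypothetical workout data, invented for this illustration.
df = pd.DataFrame({
    "duration": [30, 45, 60],
    "calories": [200, 300, 400],
})

# Pearson correlation between two columns
correlation = df["duration"].corr(df["calories"])

# Average, maximum, and minimum of one column
average = df["calories"].mean()
highest = df["calories"].max()
lowest = df["calories"].min()

print(correlation, average, highest, lowest)
```

Because the invented calories grow in a straight line with duration, the correlation here comes out at (approximately) 1.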

Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
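One possible way to do this kind of cleaning is the dropna() method, which returns a new DataFrame without the rows that contain empty values. The data below is invented for illustration.

```python
import pandas as pd

# Hypothetical data with two empty cells (None is stored as a missing value).
df = pd.DataFrame({
    "duration": [60, 45, None],
    "pulse": [110, None, 120],
})

# dropna() returns a new DataFrame; the original df is left unchanged.
cleaned = df.dropna()

print(cleaned)
```

Only the first row survives, because it is the only row without an empty cell.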

Where is the Pandas Codebase?

The source code for Pandas is located in this GitHub repository:

https://github.com/pandas-dev/pandas

1. Pandas Getting Started
1.1 Installation of Pandas

If you have Python and PIP already installed on a system, then installation
of Pandas is very easy.

Install it using this command:

C:\Users\Your Name>pip install pandas

1.2 Import Pandas

Once Pandas is installed, import it in your applications by adding the
"import" keyword:

Syntax : import pandas


Now Pandas is imported and ready to use.

import pandas

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)

OUTPUT :
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2

1.3 Pandas as pd

Pandas is usually imported under the pd alias.

alias: In Python, an alias is an alternate name for referring to the same
thing. Create an alias with the "as" keyword while importing:

Syntax : import pandas as pd


Now the Pandas package can be referred to as pd instead of pandas.

import pandas as pd

mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

OUTPUT :
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2

1.4 Checking Pandas Version


The version string is stored in the __version__ attribute.

import pandas as pd

print(pd.__version__)

OUTPUT :
1.2.2

2. Pandas Series
2.1 What is a Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Example 2.1 : Create simple Pandas Series from lists of int, float, string, and mixed values

import pandas as pd
a = [1, 2, 3]
myvar = pd.Series(a)
print(myvar)

OUTPUT :
0 1
1 2
2 3
dtype: int64

The datatype of the elements in the Series is int64.

Pandas decides the datatype of a Series from the values it contains.

import pandas as pd

a = [1.1, 2.2, 3.3]

myvar = pd.Series(a)

print(myvar)

OUTPUT :
0 1.1
1 2.2
2 3.3
dtype: float64

import pandas as pd

a = ["apple", "banana", "orange"]

myvar = pd.Series(a)

print(myvar)

OUTPUT :
0 apple
1 banana
2 orange
dtype: object

import pandas as pd

a = [1, "banana", 3]

myvar = pd.Series(a)

print(myvar)

OUTPUT :
0 1
1 banana
2 3
dtype: object
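The inferred datatype can also be read directly from the dtype attribute instead of from the printed output. The following sketch builds three of the Series shown above and prints the datatype of each; the variable names are chosen here for illustration.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.1, 2.2, 3.3])
mixed = pd.Series([1, "banana", 3])

# dtype holds the datatype Pandas inferred from the values
for name, series in [("ints", ints), ("floats", floats), ("mixed", mixed)]:
    print(name, series.dtype)
```

A list that mixes numbers and strings falls back to the general object datatype, since no single numeric type fits all values.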

2.2 Labels

If nothing else is specified, the values are labeled with their index number.
The first value has index 0, the second value has index 1, and so on.

This label can be used to access a specified value.

Example 2.2 : Return the second value of the Series:

import pandas as pd

a = [1, 2, 3, 4, 5, 6, 7]

myvar = pd.Series(a)

print(myvar[1])

OUTPUT :
2

import pandas as pd

a = [1, 2, 3, 4, 5, 6, 7]

myvar = pd.Series(a)

print(myvar[1:4])

OUTPUT :
1 2
2 3
3 4
dtype: int64
2.3 Create Labels
With the index argument, you can name your own labels.

Example 2.3 : Create your own labels

import pandas as pd

a = [1, 2, 3]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

OUTPUT :
x 1
y 2
z 3
dtype: int64

When you have created labels, you can access an item by referring to the label.

Example 2.4 : Return the value of "y":

import pandas as pd

a = [1, 2, 3]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar["y"])

OUTPUT :
2
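Once a Series has custom labels, it can help to say explicitly whether a label or a position is meant. As a sketch, the .loc accessor looks values up by label and the .iloc accessor looks them up by position; the variable names below are chosen for illustration.

```python
import pandas as pd

myvar = pd.Series([1, 2, 3], index=["x", "y", "z"])

by_label = myvar.loc["y"]      # look up by label
by_position = myvar.iloc[1]    # look up by position (0-based)

# A label slice includes both end labels,
# a position slice excludes the end position.
label_slice = myvar.loc["x":"y"]
position_slice = myvar.iloc[0:2]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Both slices select the same two values here, but for different reasons: "x":"y" includes the label "y", while 0:2 stops before position 2.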

3. Pandas DataFrames
3.1 What is a DataFrame?

A Pandas DataFrame is a two-dimensional data structure, like a
two-dimensional array or a table with rows and columns.

In the Pandas module, DataFrame is a basic and important type.

To create a DataFrame from different sources of data or other Python
datatypes, we can use the "DataFrame()" constructor.

Syntax of the DataFrame() class :

DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
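As a small sketch of how these parameters fit together, the call below passes data, index, columns, and dtype by keyword. The values and labels are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 2], [3, 4]],   # two rows of values
    index=["r1", "r2"],      # row labels
    columns=["a", "b"],      # column names
    dtype=float,             # convert every value to float
)

print(df)
print(df.dtypes)
```

Passing the arguments by keyword keeps the call readable and avoids mixing up the order of index and columns.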

Example 3.1 : Create an Empty DataFrame

To create an empty DataFrame, pass no arguments to the pandas.DataFrame()
constructor.

In this example, we create an empty DataFrame and print it to the console.

import pandas as pd

df = pd.DataFrame()

print(df)

OUTPUT :
Empty DataFrame
Columns: []
Index: []

Example 3.2 : Create a simple Pandas DataFrame

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)

OUTPUT :
calories duration
0 420 50
1 380 40
2 390 45

Example 3.3 : Create a simple Pandas DataFrame with Labels - Index

import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df)

OUTPUT :
calories duration
day1 420 50
day2 380 40
day3 390 45

3.2 Create Pandas DataFrame from List of Lists

To create a Pandas DataFrame from a list of lists, pass the list of lists as
the data argument to "pandas.DataFrame()".

Each inner list of the outer list becomes a row in the resulting DataFrame.

Example 3.4: Create DataFrame from List of Lists


import pandas as pd

# list of lists
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]

df = pd.DataFrame(data)
print(df)

OUTPUT :
0 1 2
0 a1 b1 c1
1 a2 b2 c2
2 a3 b3 c3
Example 3.5: Create DataFrame from List of Lists with Column Names & Index

import pandas as pd

# list of lists
data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]

columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3']

# pass the row labels and column names by keyword
df = pd.DataFrame(data, index=index, columns=columns)
print(df)

OUTPUT :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3

Example 3.6: Create DataFrame from List of Lists with Different List Lengths

import pandas as pd

#list of lists
data = [['a1', 'b1', 'c1', 'd1'],
['a2', 'b2', 'c2'],
['a3', 'b3', 'c3']]

df = pd.DataFrame(data)
print(df)

OUTPUT :
0 1 2 3
0 a1 b1 c1 d1
1 a2 b2 c2 None
2 a3 b3 c3 None
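The cells that Pandas fills in for the shorter rows count as missing values. A minimal sketch, assuming the same list of lists as above, uses isna() to count them per column:

```python
import pandas as pd

# same ragged list of lists as in the example above
data = [['a1', 'b1', 'c1', 'd1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3']]

df = pd.DataFrame(data)

# isna() marks missing cells as True; sum() counts them per column
missing_per_column = df.isna().sum()

print(missing_per_column)
```

Only the last column reports missing cells, one for each of the two shorter rows.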

3.3 Create Pandas DataFrame from Python Dictionary

You can create a DataFrame from a dictionary by passing the dictionary as the
data argument to the DataFrame() class.

Example 3.7: Create DataFrame from Dictionary

import pandas as pd

mydictionary = {'names': ['raju', 'ramu', 'ravi', 'akash'],
                'physics': [68, 74, 77, 78],
                'chemistry': [84, 56, 73, 69],
                'algebra': [78, 88, 82, 87]}

# create dataframe using dictionary
df_marks = pd.DataFrame(mydictionary)
print(df_marks)

OUTPUT :
names physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87

Shape or Dimensions of Pandas DataFrame


To get the shape of a Pandas DataFrame, use "DataFrame.shape".
The shape property returns a tuple representing the dimensionality of the
DataFrame, in the format (rows, columns).

Example: DataFrame Shape


In the following example, we find the shape of a DataFrame.

You can also get the number of rows or the number of columns by indexing the
shape tuple.

import pandas as pd

data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3'],
        ['a4', 'b4', 'c4']]

columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3', 'R4']

df = pd.DataFrame(data, index, columns)
print('The DataFrame is :\n', df)

# get dataframe shape
shape = df.shape
print('\nDataFrame Shape :', shape)
print('\nNumber of rows :', shape[0])
print('\nNumber of columns :', shape[1])

OUTPUT :
The DataFrame is :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3
R4 a4 b4 c4

DataFrame Shape : (4, 3)

Number of rows : 4

Number of columns : 3
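Since shape is an ordinary tuple, it can also be unpacked into two variables. The sketch below, on a small invented DataFrame, shows how the two numbers relate to len() and to the size attribute.

```python
import pandas as pd

df = pd.DataFrame([['a1', 'b1', 'c1'],
                   ['a2', 'b2', 'c2']],
                  columns=['C1', 'C2', 'C3'])

# unpack the (rows, columns) tuple
rows, cols = df.shape

print(rows, cols)
print(len(df))          # number of rows
print(len(df.columns))  # number of columns
print(df.size)          # total number of cells, rows * columns
```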

Print Information of Pandas DataFrame

To print information about a Pandas DataFrame, call the DataFrame.info()
method.

The DataFrame.info() method returns None; it prints a summary of the
DataFrame instead.
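Because info() prints rather than returns its summary, the text cannot be captured from the return value. One option, sketched below with invented data, is to pass a text buffer through the buf parameter and read the summary back from it.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["abc", "xyz"], "age": [22, 25]})

# info() writes into the buffer instead of the console
buffer = io.StringIO()
result = df.info(buf=buffer)

summary = buffer.getvalue()

print(result)   # the return value of info()
print(summary)  # the captured summary text
```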

Example : Print DataFrame Information

In the following program, we create a DataFrame and print its information
using the DataFrame.info() method.

import pandas as pd

df = pd.DataFrame(
[['abc', 22],
['xyz', 25],
['pqr', 31]],
columns=['name', 'age'])

print(df)

df.info()

OUTPUT :
name age
0 abc 22
1 xyz 25
2 pqr 31
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2

Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null object
1 age 3 non-null int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes

import pandas as pd

data = [['a1', 'b1', 'c1'],
        ['a2', 'b2', 'c2'],
        ['a3', 'b3', 'c3'],
        ['a4', 'b4', 'c4']]

columns = ['C1', 'C2', 'C3']
index = ['R1', 'R2', 'R3', 'R4']

df = pd.DataFrame(data, index, columns)

print(df)

df.info()

OUTPUT :
C1 C2 C3
R1 a1 b1 c1
R2 a2 b2 c2
R3 a3 b3 c3
R4 a4 b4 c4
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, R1 to R4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 C1 4 non-null object
1 C2 4 non-null object
2 C3 4 non-null object
dtypes: object(3)
memory usage: 128.0+ bytes

import pandas as pd

mydictionary = {'names': ['raju', 'ramu', 'ravi', 'akash'],
                'physics': [68, 74, 77, 78],
                'chemistry': [84, 56, 73, 69],
                'algebra': [78, 88, 82, 87]}

# create dataframe using dictionary
df_marks = pd.DataFrame(mydictionary)

print(df_marks)

df_marks.info()

OUTPUT :
names physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 names 4 non-null object
1 physics 4 non-null int64
2 chemistry 4 non-null int64
3 algebra 4 non-null int64
dtypes: int64(3), object(1)
memory usage: 256.0+ bytes

Pandas Read CSV


A simple way to store big data sets is to use CSV files (comma-separated
values files).

CSV files contain plain text in a well-known format that can be read by
many tools, including Pandas.

import pandas as pd

#load dataframe from csv


df = pd.read_csv("pandas.csv")

#print dataframe
print(df)

OUTPUT :
Name maths physics chemisry
0 a 11 21 31
1 b 12 22 32
2 c 13 23 32
3 d 14 24 34
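To try read_csv() without a file on disk, the CSV text can be wrapped in an in-memory buffer. The sketch below uses a short, invented CSV string; the first line supplies the column names.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for this illustration.
csv_text = "Name,maths,physics\na,11,21\nb,12,22\n"

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```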

Note : If you print a large DataFrame with many rows, Pandas by default shows only the first 5 rows
and the last 5 rows:

import pandas as pd

#load dataframe from csv


df = pd.read_csv("pandas1.csv")

#print dataframe
print(df)

OUTPUT :
Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.0
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]
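The shortened display is controlled by a Pandas display option, and the first or last rows can also be requested explicitly. A minimal sketch, using an invented DataFrame with 169 rows in place of the CSV file:

```python
import pandas as pd

# Stand-in for a large data set: one column with 169 rows.
df = pd.DataFrame({"n": range(169)})

# Row limit above which printing is shortened
print(pd.get_option("display.max_rows"))

first_rows = df.head()   # first 5 rows by default
last_rows = df.tail(3)   # last 3 rows

print(first_rows)
print(last_rows)
```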

Tip: use to_string() to print the entire DataFrame.

import pandas as pd

#load dataframe from csv


df = pd.read_csv("pandas1.csv")

#print dataframe
print(df.to_string())

OUTPUT :
Duration Pulse Maxpulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
5 60 102 127 300.0
6 60 110 136 374.0
7 45 104 134 253.3
8 30 109 133 195.1
9 60 98 124 269.0
10 60 103 147 329.3
11 60 100 120 250.7

…..
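Note that to_string() returns an ordinary Python string, so the full text can be stored or processed further instead of printed. A small sketch with an invented 169-row DataFrame:

```python
import pandas as pd

# Stand-in for a large data set: one column with 169 rows.
df = pd.DataFrame({"n": range(169)})

# the complete table as one string, without shortening
text = df.to_string()
lines = text.splitlines()

print(len(lines))  # one header line plus one line per row
```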

Pandas DataFrame to Excel


Example : Write DataFrame to Excel File
You can write the DataFrame to an Excel file without specifying a sheet name.

The step-by-step process is given below:

1. Have your DataFrame ready. In this example we initialize a DataFrame
with some rows and columns.

2. Create an Excel Writer with the name of the output Excel file to which
you would like to write the DataFrame.

3. Call the to_excel() method on the DataFrame with the Excel Writer passed
as argument.

4. Save the Excel file by calling the close() method of the Excel Writer.
(Older Pandas versions also offered save(), which was removed in Pandas 2.0.)

import pandas as pd

# create dataframe
df_marks = pd.DataFrame({'name': ['raju', 'ramu', 'ravi', 'akash'],
                         'physics': [68, 74, 77, 78],
                         'chemistry': [84, 56, 73, 69],
                         'algebra': [78, 88, 82, 87]})

# create excel writer object
writer = pd.ExcelWriter('output2.xlsx')

# write dataframe to the excel writer
df_marks.to_excel(writer)

# save the excel file and release the file handle
writer.close()
print('DataFrame is written successfully to Excel File.')
print(df_marks)

OUTPUT :
DataFrame is written successfully to Excel File.
name physics chemistry algebra
0 raju 68 84 78
1 ramu 74 56 88
2 ravi 77 73 82
3 akash 78 69 87
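An Excel Writer can also be used in a with-block, so that the file is saved and closed automatically when the block ends. The sketch below makes some assumptions: it writes to a temporary folder, uses the sheet name "marks", and uses the openpyxl package as the Excel engine, skipping the write if that package is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

df_marks = pd.DataFrame({'name': ['raju', 'ramu', 'ravi', 'akash'],
                         'physics': [68, 74, 77, 78]})

# Assumption: openpyxl is the engine used to write .xlsx files here.
has_engine = importlib.util.find_spec("openpyxl") is not None

# Assumption: a temporary folder is used so no existing file is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), "output2.xlsx")

if has_engine:
    # the file is saved and closed when the with-block ends
    with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
        df_marks.to_excel(writer, sheet_name="marks")
    print("DataFrame written to", out_path)
else:
    print("openpyxl is not installed; nothing was written.")
```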

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('pandas.csv')

print(df)

df.plot()

plt.show()

OUTPUT :
Name maths physics chemisry
0 a 11 21 31
1 b 12 22 32
2 c 13 23 32
3 d 14 24 34
