Data Frame in Panda 01

Data Frame Study Notes
Part 01 in Data Analytics
Introduction to DataFrame
A DataFrame is a table, or two-dimensional structure, in which each column represents some attribute and each row holds a set of values, one for each column. Data of heterogeneous types can be accommodated. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.
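A minimal sketch of the three components, using a small hypothetical table (the column names and values are invented for illustration): the values are the data, df.index holds the row labels, and df.columns holds the column labels. The columns hold different types, which is what "heterogeneous" means here.

```python
import pandas as pd

# Hypothetical data: each key becomes a column, each list holds that column's values
data = {
    "name": ["Asha", "Ravi", "Meena"],  # strings
    "age": [21, 23, 22],                # integers
    "score": [88.5, 92.0, 79.5],        # floats
}
df = pd.DataFrame(data)

print(df)             # the whole table
print(df.index)       # the row labels
print(df.columns)     # the column labels
print(df.to_numpy())  # the underlying data as a NumPy array
```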
For Your Information: Pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools.
Creating DataFrame
In real-world scenarios, a DataFrame is usually created by loading a dataset from existing storage such as a SQL database, CSV files, or Excel files.
DataFrames can also be created from lists, lists of dictionaries, etc.
Let us create DataFrames using some of these methods.
Note: To create a DataFrame, you first need to import the pandas library.
Example 01:
import pandas as pd  # import the library
lst = ['Ixambee', 'is', 'an', 'online', 'Teaching', 'Platform']  # create a list of strings
df = pd.DataFrame(lst)  # create a DataFrame from the list
print(df)  # print the DataFrame

Output
          0
0   Ixambee
1        is
2        an
3    online
4  Teaching
5  Platform
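Example 01 used a plain list. The notes also mention lists of dictionaries; here is a minimal sketch of that case with hypothetical records. Each dictionary becomes one row and its keys become the column names; a key missing from a record is filled with NaN.

```python
import pandas as pd

# Hypothetical records: one dictionary per row
records = [
    {"course": "Python", "hours": 40},
    {"course": "SQL", "hours": 25},
    {"course": "Excel"},  # no "hours" key, so that cell becomes NaN
]
df = pd.DataFrame(records)
print(df)
```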
Example 02:
import pandas as pd  # import the library

# read all the data from this CSV file into a DataFrame
df = pd.read_csv('StudentData.csv')
print(df)
Output
All the data from StudentData.csv is shown as a table.
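StudentData.csv is not included with these notes, so this sketch feeds pd.read_csv an in-memory CSV string through io.StringIO instead of a file path. The column names and rows are hypothetical; with a real file you would pass the path as in Example 02.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for StudentData.csv
csv_text = """name,age,marks
Asha,21,88
Ravi,23,92
Meena,22,79
"""

# read_csv accepts a file path or any file-like object, such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```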
DataFrame Attributes and Methods
When we read data from a large database, our DataFrame may contain many rows and columns.
Pandas provides many attributes and methods for inspecting and operating on a DataFrame built from any such source.
1. head() Method
Imagine a table with 9 rows.
To see the whole DataFrame, we can simply write print(df), or just df in an interactive session such as a Jupyter notebook.
This shows a table like the one below, which is a DataFrame that has been created from a CSV file.
Suppose you want to see the top 5 records. Then you can write
df.head()
By default, this returns the first 5 rows.
Suppose you want to see only the top 2 rows. Then you can write
df.head(2)
This returns the first 2 rows.
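A minimal sketch of head() on a hypothetical 9-row DataFrame, matching the "table with 9 rows" above; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical 9-row DataFrame
df = pd.DataFrame({"id": range(1, 10), "value": [x * 10 for x in range(1, 10)]})

top_default = df.head()  # first 5 rows (n=5 is the default)
top_two = df.head(2)     # first 2 rows
print(top_default)
print(top_two)
```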

2. tail() Method
Suppose you want to see the last 5 records of a DataFrame that has 32560 records.
Then you write
df.tail()
By default, this returns the last 5 records.
Suppose you want the last 3 records; then you write
df.tail(3)
This returns the last 3 records of the DataFrame.
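A minimal sketch of tail() on a hypothetical 20-row DataFrame. Note that the returned rows keep their original index labels.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows (ids 0 to 19)
df = pd.DataFrame({"id": range(20)})

last_default = df.tail()  # last 5 rows (n=5 is the default)
last_three = df.tail(3)   # last 3 rows
print(last_three)
```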
3. index Attribute
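This attribute gives the row labels of the DataFrame. For the default index, it shows the range of the rows as a start, a stop (end), and a step.
Suppose you have a DataFrame which consists of 32561 rows, with the index starting at 0.
You type df.index, which returns RangeIndex(start=0, stop=32561, step=1). The stop value is exclusive, so the last row label is 32560.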
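A minimal sketch using a hypothetical 32561-row DataFrame (a single invented age column) to show what df.index returns for the default index. A RangeIndex exposes start, stop, and step attributes, and stop is exclusive.

```python
import pandas as pd

# Hypothetical DataFrame with 32561 rows and the default index
df = pd.DataFrame({"age": [30] * 32561})

idx = df.index
print(idx)                            # RangeIndex(start=0, stop=32561, step=1)
print(idx.start, idx.stop, idx.step)  # 0 32561 1
```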
4. shape Attribute
This attribute gives the number of rows and columns in the DataFrame.
You write df.shape.
Suppose you have a DataFrame that has 32561 rows and 15 columns and you want to know the shape of the table. df.shape returns the number of rows and columns as a tuple: (32561, 15).
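A minimal sketch on a small hypothetical DataFrame showing that shape is a (rows, columns) tuple, which can be unpacked directly.

```python
import pandas as pd

# Hypothetical DataFrame: 4 rows, 3 columns
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": ["w", "x", "y", "z"]})

rows, cols = df.shape  # unpack the (rows, columns) tuple
print(df.shape)        # (4, 3)
print(rows, cols)      # 4 3
```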
5. columns Attribute
This attribute gives the names of the columns. When you want to know the column names, execute
df.columns
This shows output like the one below (compare it with the column names in the table from the head() example).
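A minimal sketch with hypothetical column names: df.columns is a pandas Index object, and list() turns it into a plain Python list when that is more convenient.

```python
import pandas as pd

# Hypothetical one-row DataFrame
df = pd.DataFrame({"name": ["Asha"], "age": [21], "city": ["Pune"]})

print(df.columns)                # an Index object holding the column labels
column_names = list(df.columns)  # convert to a plain Python list
print(column_names)              # ['name', 'age', 'city']
```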
6. dtypes Attribute
This attribute gives the data type of each column, listed with the column name.
You write
df.dtypes
This displays a result like the one shown below.

Now you can see the data type of each column.
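A minimal sketch with hypothetical columns of three kinds. On a typical installation, pandas infers an integer type (usually int64) for whole numbers and float64 for decimals; the exact name shown for the text column depends on the pandas version.

```python
import pandas as pd

# Hypothetical columns of different kinds
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],  # text
    "age": [21, 23],           # whole numbers
    "score": [88.5, 92.0],     # decimals
})

print(df.dtypes)  # one data type per column, indexed by column name
```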
7. info() Method
This method prints a concise summary of the DataFrame on which it is called. It does not return a DataFrame: it prints the summary and returns None.
To use it, write df.info() (with parentheses) and you will see a result like the one below.
Here, as you can notice, we have the RangeIndex with the total number of rows and their range, followed by the number of columns.
We can also see how many values are entered in each column: for example, age has 32561 non-null entries with int64 as its data type.
Also notice the second-to-last line, which shows the column count for each data type.
For example, int64(6) in the result means that 6 columns have int64 as their data type.
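A minimal sketch on a hypothetical DataFrame with one missing value, so the non-null counts differ between columns. It also uses the buf parameter of info() to capture the printed summary as text.

```python
import io
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({
    "age": [21, 23, None],  # None becomes NaN, so this column is stored as float64
    "name": ["Asha", "Ravi", "Meena"],
})

result = df.info()  # prints index range, column names, non-null counts, dtypes, memory usage
print(result)       # None: info() prints the summary instead of returning it

# buf sends the same summary to a file-like object, here to capture it as text
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```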
8. get_dtype_counts() Method
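This method gives the output we saw in the second-to-last line of the info() result: which data types are present in the DataFrame and how many columns belong to each type.
You write df.get_dtype_counts().
Note that this method was deprecated in pandas 0.25 and removed in pandas 1.0. In current versions, df.dtypes.value_counts() gives the same information.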
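DataFrame.get_dtype_counts() no longer exists in pandas 1.0 and later, so calling it on a current installation raises an AttributeError. A minimal sketch of the usual replacement, df.dtypes.value_counts(), on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame with three integer columns, one float column, one text column
df = pd.DataFrame({
    "a": [1, 2], "b": [3, 4], "c": [5, 6],  # integers
    "d": [0.5, 1.5],                        # float
    "e": ["x", "y"],                        # text
})

counts = df.dtypes.value_counts()  # number of columns per data type
print(counts)
```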
9. axes Attribute
This attribute returns a list with two elements: the row index (for example, a RangeIndex) and the column names of the DataFrame.
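A minimal sketch on a hypothetical two-row DataFrame: axes is a plain list of the row index and the column index, so it can be unpacked into two names.

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [21, 23]})

row_axis, col_axis = df.axes  # [row index, column index]
print(row_axis)               # RangeIndex(start=0, stop=2, step=1)
print(list(col_axis))         # ['name', 'age']
```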
10. describe() Method
This is one of the most important methods, because it gives summary statistics about the numerical data in the DataFrame.
Note that, by default, this method considers only columns with numeric data types such as int and float.
When you execute df.describe(), you get a result like the one shown below, with values such as:
count: the number of non-null entries in the column
mean: the mean of the column's values
std: the standard deviation of the column
max: the maximum value in the column
min: the minimum value in the column
25%, 50%, 75%: the percentiles of the column (the 50% value is the median)
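A minimal sketch on a hypothetical DataFrame with two numeric columns and one text column; by default describe() skips the text column.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column
df = pd.DataFrame({
    "age": [20, 22, 24, 26],
    "score": [70.0, 80.0, 90.0, 100.0],
    "name": ["A", "B", "C", "D"],  # text column: skipped by describe() by default
})

stats = df.describe()
print(stats)
print(stats.index.tolist())  # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```

To include non-numeric columns as well, pandas accepts df.describe(include='all'), which adds statistics such as the number of unique values for text columns.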
