Data Frame in Panda 01


Data Frame

Study Notes

Data Frame in Data Analytics
Part : 01
Introduction to DataFrame

A DataFrame is a table, or two-dimensional structure, in which each column represents an attribute and each row holds one value for each column.

Columns can hold data of different (heterogeneous) types. A pandas DataFrame has three principal components: the data, the rows and the columns.

For Your Information : Pandas is one of the most widely used Python libraries in data science. It provides high-performance, easy-to-use data structures and data analysis tools.

Creating DataFrame

In real-world scenarios a dataframe is usually created by loading a dataset from existing storage such as a SQL database, a CSV file or an Excel file.

A dataframe can also be created from Python objects such as a list or a list of dictionaries.

Let us create a dataframe using some of these methods.

Note : To create a dataframe you first need to import the pandas library.

Example 01:

import pandas as pd  # import library

lst = ['Ixambee', 'is', 'an', 'online', 'Teaching', 'Platform']  # creating list of strings

df = pd.DataFrame(lst)  # this will create a dataframe from the list

print(df)  # printing the dataframe


Output

          0
0   Ixambee
1        is
2        an
3    online
4  Teaching
5  Platform
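A dataframe can also be built from a list of dictionaries, as mentioned earlier. The following is a minimal sketch; the names and marks are hypothetical sample data, not taken from these notes.

```python
import pandas as pd

# Hypothetical records: each dictionary becomes one row.
records = [
    {"name": "Asha", "marks": 82},
    {"name": "Ravi", "marks": 74},
    {"name": "Meena", "marks": 91},
]

# The dictionary keys become the column names.
df = pd.DataFrame(records)
print(df)
```

Unlike Example 01, where the single column is labelled 0, the columns here take their names from the dictionary keys.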

Example 02:

import pandas as pd  # import library

# read all the data from this CSV file into a dataframe
df = pd.read_csv('StudentData.csv')
print(df)

Output
All the data in the StudentData.csv file is shown as a table.
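If the StudentData.csv file is not at hand, the same call can be tried on CSV text held in memory. This sketch assumes a small, made-up two-column file; io.StringIO stands in for the file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = "name,marks\nAsha,82\nRavi,74\n"

# read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the CSV text is used as the header row, so it supplies the column names.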

DataFrame Attributes and Methods

When data is read from a large database, the resulting dataframe can contain a large number of rows and columns.

Pandas provides many attributes and methods for inspecting and operating on such a dataframe.

1.head() Method

Imagine a table with 9 rows.

To see the whole dataframe we can write print(df), or simply df in an interactive session.

This displays the dataframe, here one created from a CSV file, as a table.

Suppose you want to see the top 5 records. Then you can write

df.head()

By default this returns the first 5 rows.

Suppose you want to see only the top 2 rows. Then you can write

df.head(2)

This returns the first 2 rows.
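The two calls can be tried on a small table. This sketch assumes a hypothetical 9-row dataframe with made-up column names and values.

```python
import pandas as pd

# Hypothetical 9-row table.
df = pd.DataFrame({
    "roll_no": range(1, 10),
    "marks": [55, 67, 72, 48, 90, 81, 63, 77, 59],
})

top_five = df.head()   # no argument: first 5 rows
top_two = df.head(2)   # first 2 rows only

print(top_five)
print(top_two)
```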


2.tail() Method

Suppose you want to see the last 5 records of a dataframe that has 32561 records.

Then you have to write

df.tail()
This returns the last 5 records by default.

Suppose you want the last 3 records. Then you have to write

df.tail(3)
This returns the last 3 records of the dataframe.
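A short sketch of tail() on a hypothetical 9-row dataframe follows. Note that the rows returned keep their original index labels.

```python
import pandas as pd

# Hypothetical table with 9 rows, indexed 0 to 8.
df = pd.DataFrame({"roll_no": range(1, 10)})

last_five = df.tail()    # no argument: last 5 rows
last_three = df.tail(3)  # last 3 rows only

print(last_five)
print(last_three)
```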

3.index Attribute

This attribute returns the row labels of the dataframe. For a default index it is a RangeIndex that shows the start, the stop and the step of the row numbers.
Suppose you have a dataframe that consists of 32561 rows, numbered from 0.
Typing df.index then returns RangeIndex(start=0, stop=32561, step=1).
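As a small illustration, the sketch below builds a hypothetical 4-row dataframe with the default index and reads the start, stop and step values from df.index.

```python
import pandas as pd

# Hypothetical 4-row table; no index is given, so pandas uses a RangeIndex.
df = pd.DataFrame({"value": [10, 20, 30, 40]})

idx = df.index
print(idx)
print(idx.start, idx.stop, idx.step)
```

The stop value equals the number of rows because the labels start at 0 and the stop itself is excluded.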
4.shape Attribute

This attribute gives the number of rows and columns in the dataframe.

You have to write df.shape (no parentheses, because it is an attribute and not a method).

Suppose you have a dataframe that has 32561 rows and 15 columns and you want to know the
shape of the table. df.shape then returns the tuple (32561, 15): the number of rows followed by
the number of columns.
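Because shape is a plain tuple, it can be unpacked directly. The sketch below uses a hypothetical dataframe with 3 rows and 2 columns.

```python
import pandas as pd

# Hypothetical table: 3 rows, 2 columns.
df = pd.DataFrame({"age": [39, 50, 38], "hours": [40, 13, 40]})

rows, cols = df.shape  # (number of rows, number of columns)
print(rows, cols)
```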

5.columns Attribute

This attribute gives the names of the columns. To see the column names, execute
df.columns
This returns an Index object containing the column names, the same names that appear in the
header row of the table shown by head().
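The following sketch reads the column names from a hypothetical dataframe and converts them to an ordinary Python list, which is often easier to work with.

```python
import pandas as pd

# Hypothetical table with three named columns.
df = pd.DataFrame({"age": [39, 50], "education": ["Bachelors", "Masters"], "hours": [40, 13]})

print(df.columns)              # an Index object holding the names
column_names = list(df.columns)
print(column_names)
```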

6.dtypes Attribute
This attribute gives the data type of each column, listed against the column name.
You have to write
df.dtypes

The result is a Series with one entry per column: the column name and its data type.



From this you can tell which data type each column holds.
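A minimal sketch of dtypes on a hypothetical mixed-type dataframe is given below. The data type reported for the text column depends on the pandas version, so it is printed here without stating an expected value.

```python
import pandas as pd

# Hypothetical table mixing integers, floats and text.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "height": [1.62, 1.75, 1.80],
    "name": ["Asha", "Ravi", "Meena"],
})

types = df.dtypes          # Series: column name -> data type
print(types)

age_type = str(types["age"])        # integer column
height_type = str(types["height"])  # floating-point column
print(age_type, height_type)
```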

7.info() Method

This method prints a concise summary of the dataframe it is called on. It does not return a new
dataframe; its return value is None.

For this you have to write df.info(), with parentheses. Writing df.info without them only refers to the method and does not produce the summary.
The summary starts with the RangeIndex line, which gives the total number of rows and the range
of row labels, followed by the number of columns.
For each column it also shows how many non-null values are present and the data type. For example,
age has 32561 non-null entries with data type int64.

Also notice the second-last line, which gives the number of columns of each data type.
For example, int64(6) means that 6 columns have the data type int64.
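The sketch below calls info() on a hypothetical 3-row dataframe. It also shows that info() prints its summary rather than returning it, and that the text can be captured through the buf parameter if it is needed as a string.

```python
import io
import pandas as pd

# Hypothetical table: one integer column, one float column.
df = pd.DataFrame({"age": [39, 50, 38], "hours": [40.0, 13.0, 40.0]})

result = df.info()   # prints the summary to standard output
print(result)        # the return value itself is None

# Capture the same summary as text instead of printing it.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```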

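8.get_dtype_counts() Method

This method gives the same information as the second-last line of the info() output:

which data types are present in the dataframe and how many columns belong to each data
type.

In older versions of pandas you write df.get_dtype_counts(). The method was deprecated in pandas 0.25 and removed in pandas 1.0; in current versions write df.dtypes.value_counts() instead.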
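Since get_dtype_counts() is no longer available from pandas 1.0 onwards, the same counts can be obtained with df.dtypes.value_counts(). The sketch below assumes a hypothetical dataframe with two integer columns and one float column.

```python
import pandas as pd

# Hypothetical table: two int64 columns and one float64 column.
df = pd.DataFrame({"age": [39, 50], "hours": [40, 13], "gain": [2174.0, 0.0]})

counts = df.dtypes.value_counts()  # number of columns per data type
print(counts)

# Plain dictionary form: data type name -> number of columns.
counts_by_name = {str(dtype): int(n) for dtype, n in counts.items()}
print(counts_by_name)
```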

9.axes Attribute

This attribute returns a list with two items: the row index (here a RangeIndex) and the column
names of the dataframe. You have to write df.axes.
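A short sketch on a hypothetical dataframe shows that df.axes is a two-item list holding the same objects as df.index and df.columns.

```python
import pandas as pd

# Hypothetical table with 3 rows and 2 columns.
df = pd.DataFrame({"age": [39, 50, 38], "hours": [40, 13, 40]})

axes = df.axes                   # [row index, column index]
row_axis, column_axis = axes
print(row_axis)
print(column_axis)
```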
10.describe() Method

This is one of the most important methods, as it gives summary statistics for the data frame.

Note that by default this method considers only the columns that have a numerical data type such as int or float.

When you execute df.describe() you get, for each numerical column,

values like
count of the non-null entries in the column
The mean of the values of the column
The standard deviation of the column
The MIN value in the column
The 25%, 50% and 75% percentiles (the 50% percentile is the median)
The MAX value in the column
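The sketch below runs describe() on a hypothetical dataframe that has one numerical column and one text column, to show that only the numerical column appears in the result.

```python
import pandas as pd

# Hypothetical table: "age" is numerical, "city" is text.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "city": ["Pune", "Delhi", "Agra", "Kochi"],
})

stats = df.describe()   # summary statistics for numerical columns only
print(stats)

mean_age = stats.loc["mean", "age"]
median_age = stats.loc["50%", "age"]
print(mean_age, median_age)
```

The rows of the result are labelled count, mean, std, min, 25%, 50%, 75% and max, so individual statistics can be picked out with loc as shown.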
