Data Frame in Panda 01
Study Notes: Data Frame, Part 01 (Data Analytics)
Introduction to DataFrame
Creating DataFrame
In real-world scenarios, a dataframe is usually created by loading a dataset from existing
storage such as a SQL database, a CSV file, or an Excel file.
Note : To create a dataframe, you first need to import the pandas library (import pandas as pd).
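Example 01:
import pandas as pd
lst = ['Ixambee', 'is', 'an', 'online', 'Teaching', 'Platform']  # creating a list of strings
df = pd.DataFrame(lst)  # each list item becomes one row of a single column named 0
print(df)
Output
          0
0   Ixambee
1        is
2        an
3    online
4  Teaching
5  Platform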
Example 02:
df = pd.read_csv('StudentData.csv')  # load the CSV file into a dataframe
print(df)
Output
All the data in the StudentData.csv file is displayed as a table.
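Since StudentData.csv itself is not part of these notes, the following is only a minimal sketch of the same read_csv call. It assumes a small in-memory CSV with hypothetical columns (name, age, marks) in place of the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of StudentData.csv (not from the notes).
csv_text = """name,age,marks
Asha,20,81.5
Ravi,22,74.0
Meena,21,90.0
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, the StringIO object would simply be replaced by the file path.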
When we read data from a large database, the resulting dataframe can contain a lot of rows and columns.
pandas provides many methods and attributes for working with a dataframe loaded this way.
To view the dataframe, write print(df), or simply df in an interactive session such as a notebook.
Either one displays the dataframe created from the CSV file as a table.
1.head Method
This method returns the first rows of the dataframe. To see the top 5 records, write
df.head()
To see only the top 2 rows, write
df.head(2)
2.tail Method
This method returns the last rows of the dataframe. To see the last 5 records of a dataframe that has 32561 records, write
df.tail()
To see only the last 3 records, write
df.tail(3)
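As a quick illustration of head and tail, here is a small sketch on a hypothetical 10-row dataframe (the column names roll_no and marks are assumptions made for the example, not part of the dataset used in these notes).

```python
import pandas as pd

# Hypothetical 10-row dataframe used only for illustration.
df = pd.DataFrame({"roll_no": range(1, 11), "marks": range(50, 60)})

first_five = df.head()    # no argument: first 5 rows
first_two = df.head(2)    # first 2 rows
last_five = df.tail()     # no argument: last 5 rows
last_three = df.tail(3)   # last 3 rows

print(first_two)
print(last_three)
```

Both methods return a new dataframe and leave df unchanged.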
3.Index Method
index is an attribute (written without parentheses) that gives the row labels of the dataframe. For a default index, it shows the range of the rows as start, stop, and step.
Suppose you have a dataframe of 32561 rows whose index starts from 0.
Type df.index and it returns RangeIndex(start=0, stop=32561, step=1).
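A minimal sketch of df.index on a hypothetical 4-row dataframe with the default index (the column name value is an assumption for the example):

```python
import pandas as pd

# Hypothetical dataframe; no index is given, so pandas creates a default RangeIndex.
df = pd.DataFrame({"value": [10, 20, 30, 40]})

idx = df.index
print(idx)  # RangeIndex(start=0, stop=4, step=1)

# The three parts of the range can also be read individually.
print(idx.start, idx.stop, idx.step)
```

Note that stop is exclusive, so the last row label here is 3.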
4.Shape Method
shape is an attribute that gives the number of rows and columns in the dataframe as a tuple.
Suppose you have a dataframe that has 32561 rows and 15 columns and you want to know the
shape of the table. Write
df.shape
and it returns (32561, 15), the number of rows followed by the number of columns.
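The same idea on a small scale, as a sketch with a hypothetical 3-row, 2-column dataframe:

```python
import pandas as pd

# Hypothetical dataframe with 3 rows and 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
rows, cols = df.shape
print(df.shape)  # (3, 2)
```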
5.Columns Method
columns is an attribute that gives the names of the columns. When you want to know the names of the
columns, execute
df.columns
This returns an Index object holding the column names, the same names that appear in the
header row of the df.head() output.
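A short sketch of reading the column names, assuming a hypothetical dataframe with the columns age and city:

```python
import pandas as pd

# Hypothetical dataframe used only for illustration.
df = pd.DataFrame({"age": [39, 50], "city": ["Pune", "Delhi"]})

print(df.columns)

# tolist() converts the Index object into a plain Python list of names.
column_names = df.columns.tolist()
print(column_names)
```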
6.dtypes Method
dtypes is an attribute that gives the data type of each column, listed against the column name.
You have to write
df.dtypes
Now you can see the data type of each column.
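The following sketch shows df.dtypes on a hypothetical dataframe with one integer, one float, and one text column (all column names are assumptions for the example).

```python
import pandas as pd

# Hypothetical dataframe with mixed column types.
df = pd.DataFrame({
    "age": [39, 50],        # whole numbers -> int64
    "hours": [40.0, 13.5],  # decimals -> float64
    "name": ["A", "B"],     # text -> object or a string dtype, depending on the pandas version
})

# dtypes is a Series: column names are the index, data types are the values.
print(df.dtypes)
age_type = str(df.dtypes["age"])
hours_type = str(df.dtypes["hours"])
```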
7.info Method
This method prints a concise summary of the dataframe on which it is called. It does not
return a new dataframe.
Write df.info() (with parentheses) to see the summary.
The summary starts with the range index, which gives the total number of rows and the first and
last index values, followed by the number of columns.
It also lists how many non-null values each column holds. For example,
age has 32561 non-null entries with int64 as its data type.
The second-to-last line gives the column count for each data type.
For example, int64(6) means that 6 columns have the data type int64.
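To make the parts of the info() summary concrete, here is a sketch on a hypothetical 3-row dataframe with one missing value. Capturing the text with the buf parameter is optional and is done here only so the summary can be inspected as a string.

```python
import io
import pandas as pd

# Hypothetical dataframe; the None in "hours" becomes a missing value (NaN).
df = pd.DataFrame({"age": [39, 50, 38], "hours": [40.0, None, 45.0]})

# info() prints to the screen by default and returns None.
# Passing buf writes the same text into a buffer instead.
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

In the printed summary, the hours column reports 2 non-null values because one entry is missing.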
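8.get_dtype_counts() Method
This method gives the same output as the second-to-last line of df.info(): which data types
are present in the dataframe and how many columns belong to each data type.
Note : get_dtype_counts() was deprecated in pandas 0.25 and removed in pandas 1.0. In newer
versions, df.dtypes.value_counts() gives the same counts.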
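On pandas versions where get_dtype_counts() is not available, the same counts can be obtained from df.dtypes. A minimal sketch, assuming a hypothetical dataframe with two integer columns and one float column:

```python
import pandas as pd

# Hypothetical dataframe: two int64 columns and one float64 column.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1.5, 2.5]})

# Count how many columns share each data type.
counts = df.dtypes.value_counts()
print(counts)

# Convert to a plain dict keyed by the data type name for easy lookup.
counts_by_name = {str(dtype): int(n) for dtype, n in counts.items()}
print(counts_by_name)
```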
9.axes Method
axes is an attribute that returns a list with two items: the row index and the column names of the
dataframe.
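A short sketch of df.axes on a hypothetical dataframe, showing that the two items are the same objects that df.index and df.columns describe:

```python
import pandas as pd

# Hypothetical dataframe used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# axes is a list: first the row index, then the column names.
row_axis, column_axis = df.axes
print(row_axis)
print(column_axis)
```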
10.describe Method
This is one of the most important methods, as it gives summary statistics
for the dataframe.
Note that by default this method considers only the columns that have a numerical data type such as int or
float.
When you execute df.describe(), you get the following values for each numerical column:
count : the number of non-null entries in the column
mean : the mean of the values in the column
std : the standard deviation of the column
min : the minimum value in the column
25%, 50%, 75% : the percentiles of the column (50% is the median)
max : the maximum value in the column
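The sketch below runs describe() on a hypothetical dataframe with one numerical column (age) and one text column (name), to show that only the numerical column appears in the result.

```python
import pandas as pd

# Hypothetical dataframe: "age" is numerical, "name" is text.
df = pd.DataFrame({"age": [20, 30, 40, 50], "name": ["a", "b", "c", "d"]})

# By default, describe() summarises only the numerical columns.
stats = df.describe()
print(stats)

# Individual statistics can be read by row label and column name.
mean_age = stats.loc["mean", "age"]
median_age = stats.loc["50%", "age"]
```

The result is itself a dataframe, with one row per statistic and one column per numerical column of df.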