UNIT II 2M

The document is a question bank for the course CCS346 Exploratory Data Analysis at Ramco Institute of Technology. It includes various topics related to data manipulation using Pandas in Python, covering features, data indexing, handling missing values, combining datasets, and more. Each question is designed to test knowledge on specific aspects of using Pandas for exploratory data analysis.

Uploaded by

Hariprajaa Balakrishnan

RAMCO INSTITUTE OF TECHNOLOGY

Department of Computer Science and Technology

Academic Year: 2024 - 2025 (Even Semester)

QUESTION BANK
Semester / Class : VI / III Year B.E CSE
Name of Subject : CCS346 Exploratory Data Analysis
Name of Faculty member : Mrs.S.Vijaya Amala Devi AP/CSE

Unit II: EDA using Python

Data Manipulation using Pandas – Pandas Objects – Data Indexing and Selection – Operating
on Data – Handling Missing Data – Hierarchical Indexing – Combining Datasets – Concat,
Append, Merge and Join – Aggregation and Grouping – Pivot Tables – Vectorized String
Operations
PART A
1. What are the key features of Pandas that make it useful for data analysis?
Pandas is an open-source data analysis and data manipulation library written in
Python.
Pandas provides data structures and functions for working with structured data seamlessly.
Key Features of Pandas:
• DataFrames are multidimensional arrays with attached row and column
labels, often with heterogeneous types and/or missing data.
• Offers a convenient storage interface for labelled data.
• Datasets are mutable: pandas allows new rows and columns to be added.
• Makes missing data easy to handle.
• Merges and joins datasets.
• Supports indexing and subsetting of data.

2. What is meant by Data indexing in pandas?

• Data indexing in pandas refers to selecting specific rows and columns of data
from a Series or DataFrame.
• Indexing can select all of the rows and some of the columns, some of the
rows and all of the columns, or some of each.
• Indexing is also known as subset selection.
• .loc[] : Label based
• .iloc[] : Integer-position based
• .ix[] : Both label and integer based (deprecated in pandas 0.20 and removed in
pandas 1.0; use .loc[] or .iloc[] instead)
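The label-based and position-based indexers can be compared side by side. The following is a minimal sketch; the DataFrame contents are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"Age": [28, 24, 22], "Salary": [60000, 45000, 50000]},
    index=["John", "Anna", "Peter"],
)

# .loc[] selects by label: row 'Anna', column 'Salary'.
anna_salary = df.loc["Anna", "Salary"]

# .iloc[] selects by integer position: second row, second column.
same_value = df.iloc[1, 1]

# Label slices include both endpoints; position slices exclude the stop.
by_label = df.loc["John":"Anna", ["Age"]]
by_position = df.iloc[0:2, [0]]

# Boolean mask: keep the rows where Age is greater than 23.
older = df[df["Age"] > 23]

print(anna_salary, same_value)
print(by_label.equals(by_position))
print(list(older.index))
```

Note that a label slice such as "John":"Anna" includes its end label, whereas the positional slice 0:2 stops before position 2.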

3. What methods does Pandas provide for dealing with missing values?
• Pandas uses sentinels for missing data, relying on two already-existing Python null
values: the special floating-point NaN value and the Python None object.
• None: a Python singleton object that is often used for missing data in Python code.
• NaN: NaN (Not a Number) is a special floating-point value defined by the
IEEE floating-point standard.
• Missing values are detected with isnull() and notnull(), removed with dropna(), and
replaced with fillna().
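How the two sentinels behave can be seen in a small sketch; the Series values below are arbitrary examples, not data from the course material.

```python
import numpy as np
import pandas as pd

# None placed in numeric data is converted to NaN, so the dtype is float64.
numeric = pd.Series([1, None, 3])

# In a Series of strings, None is still reported as a missing value.
mixed = pd.Series(["a", None, "c"])

print(numeric.dtype)
print(np.isnan(np.nan + 1))     # NaN propagates through arithmetic
print(numeric.sum())            # pandas aggregations skip NaN by default
print(mixed.isnull().tolist())
```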

4. What are the two methods of combining datasets in Pandas?
There are two methods for combining datasets: concatenation and merging (or
joining).

5. List the Python operators and their equivalent pandas methods.

Python Operator    Pandas method(s)
+                  add()
-                  sub(), subtract()
*                  mul(), multiply()
/                  truediv(), div(), divide()
//                 floordiv()
%                  mod()
**                 pow()
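As a brief illustration of the table, the operator and method forms give the same result, and the method form additionally accepts a fill_value. The two Series here are made-up examples.

```python
import pandas as pd

a = pd.Series([2, 4, 6], index=["x", "y", "z"])
b = pd.Series([1, 3], index=["x", "y"])

# Operator and method are equivalent; index labels are aligned first,
# so 'z' (absent from b) produces NaN.
with_operator = a + b
with_method = a.add(b)

# The method form accepts fill_value, which stands in for missing entries.
filled = a.add(b, fill_value=0)

print(with_operator.equals(with_method))
print(filled.tolist())
```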

6. List the pandas handling of NAs by type.

Typeclass    Conversion when storing NAs    NA sentinel value
floating     No change                      np.nan
object       No change                      None or np.nan
integer      Cast to float64                np.nan
boolean      Cast to object                 None or np.nan
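The casts in the table can be observed directly. This sketch assumes the default NumPy-backed dtypes (not the newer nullable extension types) and uses reindex() merely as a convenient way to introduce a missing entry.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
bools = pd.Series([True, False, True])

# reindex() with an extra label (3) introduces one missing entry.
ints_na = ints.reindex([0, 1, 2, 3])
bools_na = bools.reindex([0, 1, 2, 3])

print(ints.dtype, "->", ints_na.dtype)     # integer data is cast to float64
print(bools.dtype, "->", bools_na.dtype)   # boolean data is cast to object
```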

7. Write a syntax for Concatenation operation in Pandas.

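Pandas has a function, pd.concat(), which has a similar syntax to np.concatenate but
contains a number of options.
pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None,
levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Note: older releases also accepted a join_axes argument, which was removed in
pandas 1.0, and the default of copy differs between versions.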
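A short sketch of the most commonly used options follows; the two small DataFrames are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Default: stack rows (axis=0) and keep each frame's original index labels.
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)

# keys= adds an outer index level recording which input each row came from.
labelled = pd.concat([df1, df2], keys=["first", "second"])

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(labelled.index.nlevels)
```

By default the original index labels are preserved, so duplicates can appear; ignore_index or keys are two ways to avoid ambiguity.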

8. Name the categories of Join.

The pd.merge() function implements a number of types of joins:
1. one-to-one,
2. many-to-one, and
3. many-to-many joins.
All three types of joins are accessed via an identical call to the pd.merge() interface;
the type of join performed depends on the form of the input data.
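The three categories can be sketched with the same pd.merge() call applied to differently shaped inputs. All table contents below are hypothetical.

```python
import pandas as pd

employees = pd.DataFrame({"employee": ["Bob", "Jake", "Lisa"],
                          "group": ["Accounting", "Engineering", "Engineering"]})
hire_dates = pd.DataFrame({"employee": ["Lisa", "Bob", "Jake"],
                           "hire_date": [2004, 2008, 2012]})
supervisors = pd.DataFrame({"group": ["Accounting", "Engineering"],
                            "supervisor": ["Carly", "Guido"]})
skills = pd.DataFrame({"group": ["Accounting", "Accounting", "Engineering"],
                       "skill": ["math", "spreadsheets", "coding"]})

# One-to-one: each employee appears exactly once in both frames.
one_to_one = pd.merge(employees, hire_dates, on="employee")

# Many-to-one: several employees share a single group row in supervisors.
many_to_one = pd.merge(employees, supervisors, on="group")

# Many-to-many: 'group' repeats in both frames, so matching rows are paired up.
many_to_many = pd.merge(employees, skills, on="group")

print(len(one_to_one), len(many_to_one), len(many_to_many))
```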

9. What is Group By?

Simple aggregations can give a flavor of a dataset, but it is often preferable to aggregate
conditionally on some label or index; this is called a group by operation. The name “group
by” comes from a command in the SQL database language, but it is perhaps more
illuminating to think of it in the terms first coined by Hadley Wickham of Rstats
fame: split, apply, combine.
A canonical example of this split-apply-combine operation, where the “apply” is a
summation aggregation, makes clear what the Group By accomplishes:
• The split step involves breaking up and grouping a DataFrame depending on the
value of the specified key.
• The apply step involves computing some function, usually an aggregate,
transformation, or filtering, within the individual groups.
• The combine step merges the results of these operations into an output array.
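The three steps can be sketched as follows. The key and data columns are illustrative, and the explicit loop is shown only to make each stage visible; in practice the one-line groupby call is used.

```python
import pandas as pd

df = pd.DataFrame({"key": ["A", "B", "C", "A", "B", "C"],
                   "data": [0, 1, 2, 3, 4, 5]})

# groupby() sets up the split; sum() is the apply step, and pandas combines
# the per-group results into one Series indexed by key.
totals = df.groupby("key")["data"].sum()

# The same computation spelled out stage by stage.
manual = {}
for key, group in df.groupby("key"):     # split
    manual[key] = group["data"].sum()    # apply
manual = pd.Series(manual)               # combine

print(totals.to_dict())
```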

10. List the pandas aggregation methods.

Aggregation         Description
count()             Total number of items
first(), last()     First and last item
mean(), median()    Mean and median
min(), max()        Minimum and maximum
std(), var()        Standard deviation and variance
mad()               Mean absolute deviation (deprecated in pandas 1.5, removed in 2.0)
prod()              Product of all items
sum()               Sum of all items
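A few of these aggregations applied to a small, made-up Series are sketched below. Because mad() is not available in recent releases, the mean absolute deviation is computed here from its definition.

```python
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 6.0])

summary = {
    "count": values.count(),
    "mean": values.mean(),
    "median": values.median(),
    "min": values.min(),
    "max": values.max(),
    "prod": values.prod(),
    "sum": values.sum(),
}

# Mean absolute deviation, computed directly: mean of |x - mean(x)|.
mean_abs_dev = (values - values.mean()).abs().mean()

print(summary)
print(mean_abs_dev)
```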

11. What is resampling in pandas?

The pandas DataFrame.resample() method is primarily used for time series data.
A time series is a series of data points indexed (or listed or graphed) in time order.
Most commonly, a time series is a sequence taken at successive equally spaced
points in time. resample() is a convenience method for frequency conversion and resampling
of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex,
or TimedeltaIndex), or datetime-like values must be passed to the on or level keyword.
Syntax (commonly used parameters): DataFrame.resample(rule, closed=None, label=None,
on=None, level=None, origin='start_day', offset=None)
Note: older releases also accepted how, fill_method, loffset and base. These have been
removed from current pandas, so the aggregation is chained explicitly, for example
df.resample(rule).mean().
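A minimal sketch of downsampling daily values to weekly totals follows. The dates and values are invented for illustration, and the aggregation is chained after resample() as current pandas expects.

```python
import pandas as pd

# Fourteen hypothetical daily readings starting on Monday 2024-01-01.
index = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=index)

# Downsample to weekly bins (weeks ending on Sunday) and aggregate each bin.
weekly_sum = daily.resample("W").sum()
weekly_mean = daily.resample("W").mean()

print(weekly_sum)
print(weekly_mean)
```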

12. Define shifting in pandas.

The pandas DataFrame.shift() method shifts the index by the desired number of periods, with an
optional time freq. It takes an integer parameter called periods, which
represents the number of shifts to be made over the desired axis. This function is
very helpful when dealing with time-series data.

Syntax: DataFrame.shift(periods=1, freq=None, axis=0)

where
periods : Number of periods to move; can be positive or negative.
freq : DateOffset, timedelta, or time rule string, optional. Increment to use from the
tseries module or time rule (e.g. 'EOM').
axis : {0 or 'index', 1 or 'columns'}
Return : shifted : DataFrame.
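The difference between shifting with and without freq can be sketched as below; the price values are hypothetical.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=4, freq="D")
prices = pd.Series([10.0, 11.0, 13.0, 12.0], index=index)

# Without freq, the values move down one row and the index stays fixed;
# the vacated first position becomes NaN.
previous = prices.shift(1)

# A common use: day-over-day change.
change = prices - previous

# With freq, the index itself is moved and no values are lost.
moved = prices.shift(1, freq="D")

print(change.tolist())
print(moved.index[0])
```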

13. Define data wrangling.

Data wrangling is the process of transforming data from its original "raw" form into a
more digestible format and organizing data sets from various sources into a single
coherent whole for further processing.
14. List the mapping between pandas string methods and functions in Python's re
module.
Method        Description
match()       Call re.match() on each element, returning a Boolean.
extract()     Extract capture groups from the first match in each element (re.search()
              semantics), returning the matched groups as strings.
findall()     Call re.findall() on each element.
replace()     Replace occurrences of pattern with some other string.
contains()    Call re.search() on each element, returning a Boolean.
count()       Count occurrences of pattern.
split()       Equivalent to str.split(), but accepts regexps.
rsplit()      Equivalent to str.rsplit(), but accepts regexps.
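Several of these methods are sketched below on a small Series of names chosen only as sample data. All of them are reached through the .str accessor.

```python
import pandas as pd

names = pd.Series(["Graham Chapman", "John Cleese", "Terry Gilliam"])

# contains(): Boolean mask, True where the pattern is found anywhere.
has_double_l = names.str.contains(r"ll")

# findall(): every match per element, returned as a list.
capitals = names.str.findall(r"[A-Z]")

# extract(): capture groups from the first match (expand=False gives a Series).
first_names = names.str.extract(r"^(\w+)", expand=False)
digits = pd.Series(["ab12"]).str.extract(r"(\d+)", expand=False)

# replace() with regex=True substitutes every match of the pattern.
initials = names.str.replace(r"[a-z ]", "", regex=True)

# count(): number of matches per element.
vowel_counts = names.str.count(r"[aeiou]")

print(first_names.tolist(), initials.tolist())
```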

15. Name the pandas string methods and their descriptions.

Method             Description
get()              Index each element
slice()            Slice each element
slice_replace()    Replace slice in each element with passed value
cat()              Concatenate strings
repeat()           Repeat values
normalize()        Return Unicode form of string
pad()              Add whitespace to left, right, or both sides of string
wrap()             Split long strings into lines with length less than a given width
join()             Join strings in each element of the Series with passed separator
get_dummies()      Extract dummy variables as a DataFrame
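A few of the methods in the table can be tried on small, made-up Series, as in this sketch.

```python
import pandas as pd

names = pd.Series(["Graham Chapman", "John Cleese"])

first_letters = names.str.get(0)        # index each element
prefixes = names.str.slice(0, 3)        # slice each element
padded = pd.Series(["a", "bb"]).str.pad(3, side="left")   # pad to width 3
repeated = pd.Series(["ab"]).str.repeat(2)
joined = pd.Series([["a", "b"], ["c", "d"]]).str.join("-")

# get_dummies(): split each string on a separator and one-hot encode the parts.
codes = pd.Series(["B|C", "A", "A|C"])
dummies = codes.str.get_dummies("|")

print(dummies)
```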

16. What is Python?

Python is a high-level scripting language which can be used for a wide variety of text
processing, system administration and internet-related tasks. Python is a true
object-oriented language and is available on a wide variety of platforms.

17. How do you import the necessary libraries to display plots in pandas?

To display plots in pandas, you typically need to import the pandas and matplotlib libraries:
import pandas as pd
import matplotlib.pyplot as plt

18. Name the two plotting interfaces that are used with pandas.

Plots of pandas data are drawn with Matplotlib, which offers two interfaces:
1. MATLAB-style interface
2. Object-oriented interface

19. What are the three fundamental data structures used in Pandas?
The three fundamental data structures used in Pandas are:
1. Series
2. DataFrame
3. Index

20. What is a Series in pandas?

In pandas, a Series is a one-dimensional labeled array capable of holding data of any
type (integer, float, string, Python objects, etc.). It is similar to a column in a
spreadsheet or a table.
Example of creating a Series:
import pandas as pd
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)

21. What is a DataFrame in pandas?

A DataFrame in pandas is a two-dimensional labeled data structure with columns of
potentially different types. It can be thought of as a spreadsheet or a relational
database table, where each column is a Series.
Example of creating a DataFrame:
import pandas as pd
# Creating a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophia'],
    'Age': [28, 24, 22, 32, 29],
    'Salary': [60000, 45000, 50000, 70000, 62000]
}
df = pd.DataFrame(data)
print(df)

22. What is an Index in pandas?

An Index in pandas is an immutable array-like structure used to label the rows and
columns of a Data Frame. It provides metadata that helps in identifying rows or
columns uniquely.
Example of using an Index in a Data Frame:
import pandas as pd
# Creating a DataFrame with custom index
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophia'],
    'Age': [28, 24, 22, 32, 29],
    'Salary': [60000, 45000, 50000, 70000, 62000]
}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D', 'E'])
print(df)

23. Explain the process of creating a Data Frame from a list of dictionaries.
Creating a DataFrame from a list of dictionaries is a common operation in pandas,
especially when the data is structured as a list of records, where each record is
represented as a dictionary with consistent keys across all dictionaries. Each
dictionary becomes one row, and the keys become the column names.
import pandas as pd
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

24. What are the methods used for operating on null values in pandas?
• isnull(): Generate a Boolean mask indicating missing values
• notnull(): Opposite of isnull()
• dropna(): Return a filtered version of the data
• fillna(): Return a copy of the data with missing values filled or imputed
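The four methods can be sketched on one small Series. The values are arbitrary, and 0 is used as the fill value purely as an example.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, np.nan, 3.0, None])

mask = data.isnull()        # True where a value is missing
present = data.notnull()    # the opposite mask
dropped = data.dropna()     # filtered copy without the missing entries
filled = data.fillna(0)     # copy with the missing entries replaced by 0

print(mask.tolist())
print(dropped.tolist())
print(filled.tolist())
```

Both dropna() and fillna() return new objects here; the original Series still contains its missing values.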
25. Name the pandas handling of NAs by type.
Type class    Conversion when storing NAs    NA sentinel value
floating      No change                      np.nan
object        No change                      None or np.nan
integer       Cast to float64                np.nan
boolean       Cast to object                 None or np.nan

26. How to use Hierarchical Indexes with Pandas?

Hierarchical indexing, also known as multi-indexing, sets more than one column
as the index. It is used to incorporate multiple index levels within a single index. In
this way, higher-dimensional data can be compactly represented within the familiar
one-dimensional Series and two-dimensional DataFrame objects. Once the data is loaded,
passing a list of two or more column names to df.set_index() creates the hierarchical index.

# import the pandas library under the alias pd
import pandas as pd
# call pandas read_csv() and store the result in the DataFrame df
df = pd.read_csv('homelessness.csv')
print(df.head())
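Since the columns of the CSV file are not shown, the following self-contained sketch uses a small invented table instead; the column names state, year and population and all figures are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical figures, invented purely for illustration.
df = pd.DataFrame({
    "state": ["California", "California", "Texas", "Texas"],
    "year": [2000, 2010, 2000, 2010],
    "population": [100, 110, 60, 75],
})

# Two columns become the two levels of a MultiIndex.
indexed = df.set_index(["state", "year"])

# Select by the outer level alone, or by a full (outer, inner) key.
california = indexed.loc["California"]
texas_2010 = indexed.loc[("Texas", 2010), "population"]

# unstack() pivots the inner level (year) into columns.
wide = indexed["population"].unstack()

print(indexed)
print(wide)
```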

27. What are the fundamentals of the pandas time series data structures?
• For time stamps, Pandas provides the Timestamp type. It is
essentially a replacement for Python’s native datetime, but is based on the more
efficient numpy.datetime64 data type. The associated index structure is
DatetimeIndex.
• For time periods, Pandas provides the Period type. This encodes a fixed-frequency
interval based on numpy.datetime64. The associated index structure is
PeriodIndex.
• For time deltas or durations, Pandas provides the Timedelta type. Timedelta is a
more efficient replacement for Python’s native datetime.timedelta type, and is
based on numpy.timedelta64. The associated index structure is TimedeltaIndex.
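Each scalar type and its associated index structure can be constructed as in this sketch; the dates are arbitrary examples.

```python
import pandas as pd

# Time stamps: Timestamp and DatetimeIndex.
stamp = pd.Timestamp("2024-07-04")
dates = pd.to_datetime(["2024-07-04", "2024-07-05", "2024-07-06"])

# Time periods: Period and PeriodIndex.
period = pd.Period("2024-07", freq="M")
periods = dates.to_period("D")

# Durations: Timedelta and TimedeltaIndex.
delta = pd.Timedelta(days=1, hours=12)
deltas = dates - dates[0]

print(type(dates).__name__, type(periods).__name__, type(deltas).__name__)
print(stamp + delta)
```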

28. What is meant by hierarchical indexing?

Hierarchical indexing is a method of creating structured group relationships in data.
These hierarchical indexes, or MultiIndexes, are highly flexible and offer a range of
options when performing complex data queries. Hierarchical indexing allows us to
use multiple index levels on an axis. Hierarchical indexing is also known as multiple
indexing.

29. What is data selection in series?

The Series object provides a mapping from a collection of keys to a collection of
values:
import pandas as pd
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
print(data)
Output:
a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
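Because a Series maps keys to values, it supports dictionary-style and array-style selection alike. A minimal sketch, reusing the same example Series:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=["a", "b", "c", "d"])

value_b = data["b"]            # dictionary-style lookup by key
has_a = "a" in data            # membership test against the index
keys = list(data.keys())       # the index labels

sliced = data["a":"c"]         # label slice, end label included
masked = data[(data > 0.3) & (data < 0.8)]   # Boolean mask

data["e"] = 1.25               # assigning to a new key extends the Series

print(value_b, has_a, keys)
print(sliced.index.tolist(), masked.index.tolist())
```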

30. List any two major advantages of data indexing and selection in EDA.
Nov/Dec2023
Efficiency: Data indexing allows for rapid access to specific subsets of data, which is
crucial when exploring large datasets. Operations like filtering rows based on
conditions or selecting specific columns can be performed efficiently using indexing
techniques, such as .loc[], .iloc[], and boolean indexing.

Versatility and Flexibility: Pandas indexing provides versatile methods for selecting
data, such as selecting by label (loc[]), by integer position (iloc[]), or using boolean
masks. This flexibility allows analysts to tailor their selections based on the
requirements of their analysis, enhancing both the depth and breadth of exploration
possible during EDA.

31. What are preprocessing and Data Engineering? Nov/Dec2024

• Preprocessing refers to the steps taken to clean, transform, and prepare raw data
for analysis. It includes handling missing values, removing duplicates, normalizing
data, and feature scaling to improve data quality.
• Data Engineering is the broader process of designing, building, and maintaining
data pipelines to collect, store, and process data efficiently. It involves ETL (Extract,
Transform, Load), database management, and integration of data from multiple
sources for analytics and machine learning.
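The preprocessing steps named above can be sketched in pandas as follows. The table is invented, and mean imputation and min-max scaling are just one possible choice for each step, not a prescribed method.

```python
import pandas as pd

# Hypothetical raw data with one duplicated row and one missing score.
raw = pd.DataFrame({"name": ["Ann", "Ann", "Raj", "Mei"],
                    "score": [80.0, 80.0, None, 60.0]})

# Remove exact duplicate rows.
clean = raw.drop_duplicates()

# Handle missing values: fill with the column mean (one simple strategy).
clean = clean.fillna({"score": clean["score"].mean()})

# Feature scaling: min-max normalization to the range [0, 1].
low, high = clean["score"].min(), clean["score"].max()
clean["score_scaled"] = (clean["score"] - low) / (high - low)

print(clean)
```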

32. How do you get the column names of your DataFrame using pandas in Python?
Nov/Dec2024
In Pandas, you can get the column names of a DataFrame using the .columns attribute.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'City': ['New York', 'London']}
df = pd.DataFrame(data)
# Getting column names
column_names = df.columns
print(column_names)
Output:
Index(['Name', 'Age', 'City'], dtype='object')
