DATA SCIENCE (Unit-1) Question Bank

The document provides answer keys for a Data Science course, covering multiple choice questions and detailed explanations regarding Big Data, data wrangling, NumPy, Pandas, web scraping, and APIs. It includes both theoretical concepts and practical programming tasks related to data manipulation and analysis. Additionally, it outlines the importance of data cleaning, handling missing values, and ethical considerations in data acquisition.


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

RAMAPURAM
SCHOOL OF COMPUTER SCIENCE ENGINEERING
FACULTY OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

21CSS303T-DATA SCIENCE

UNIT-1 ANSWER KEY

PART-A (Multiple Choice Questions)

1.Which of the following is NOT a characteristic of Big Data?


a) Veracity
b) Velocity
c) Virtualization
d) Variety

Answer: c) Virtualization

2.What is the purpose of data wrangling in the data science process?


a) To collect raw data
b) To clean and structure data for analysis
c) To visualize insights
d) To build machine learning models

Answer: b) To clean and structure data for analysis

3.Data Science is an interdisciplinary field that combines:


a) Statistics, Computer Science, and Domain Knowledge
b) Biology, Chemistry, and Mathematics
c) Marketing, Sales, and Customer Service
d) Finance, Economics, and Law

Answer: a) Statistics, Computer Science, and Domain Knowledge

4.What is the default data type of a NumPy array created with functions such as np.zeros() or np.ones() if dtype is not specified?


a) Integer
b) Float
c) String
d) Boolean

Answer: b) Float
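
The answer above applies to creation routines such as np.zeros() and np.ones(), which produce float64 arrays when no dtype is given. np.array() instead infers the type from its input values. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

# Creation routines default to float64 when dtype is omitted.
zeros = np.zeros(3)
ones = np.ones((2, 2))

# np.array() infers the dtype from the values it receives.
from_ints = np.array([1, 2, 3])
from_floats = np.array([1.0, 2.0, 3.0])

print(zeros.dtype)        # float64
print(ones.dtype)         # float64
print(from_ints.dtype)    # an integer type (exact width depends on the platform)
print(from_floats.dtype)  # float64
```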

5.Which of the following creates a NumPy array of zeros?


a) np.empty((3,3))
b) np.zeros((3,3))
c) np.ones((3,3))
d) np.full((3,3))

Answer: b) np.zeros((3,3))

6.What will be the output of np.arange(5, 15, 2)?


a) [5, 7, 9, 11, 13]
b) [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
c) [5, 10, 15]
d) [5, 15]

Answer: a) [5, 7, 9, 11, 13]
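
The result follows from np.arange(start, stop, step) including the start value and excluding the stop value. A short sketch to confirm it:

```python
import numpy as np

# start=5 is included, stop=15 is excluded, step=2
values = np.arange(5, 15, 2)

print(values.tolist())  # [5, 7, 9, 11, 13]
```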

7.How can you reshape a NumPy array?


a) array.reshape(rows, columns)
b) array.split(rows, columns)
c) array.combine(rows, columns)
d) array.transpose(rows, columns)

Answer: a) array.reshape(rows, columns)
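
As a small illustration of reshape (the array contents are arbitrary), the total number of elements must stay the same before and after reshaping:

```python
import numpy as np

flat = np.arange(6)        # six elements: 0, 1, 2, 3, 4, 5
grid = flat.reshape(2, 3)  # 2 rows x 3 columns = six elements

print(grid.shape)     # (2, 3)
print(grid.tolist())  # [[0, 1, 2], [3, 4, 5]]
```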

8.Which function sorts elements in a NumPy array?


a) np.order()
b) np.arrange()
c) np.sort()
d) np.index()

Answer: c) np.sort()
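
A brief sketch of the difference between np.sort(), which returns a sorted copy, and the array method sort(), which sorts in place. The sample values are arbitrary:

```python
import numpy as np

data = np.array([3, 1, 2])

# np.sort() returns a new sorted array and leaves the input untouched.
sorted_copy = np.sort(data)
print(sorted_copy.tolist())  # [1, 2, 3]
print(data.tolist())         # [3, 1, 2]

# The array method sorts the array itself and returns None.
data.sort()
print(data.tolist())         # [1, 2, 3]
```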

9.What will np.eye(3) return?


a) A 3x3 identity matrix
b) A 3x3 matrix of zeros
c) A 3x3 matrix with diagonal elements as 3
d) A 3x3 matrix of ones

Answer: a) A 3x3 identity matrix
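
A minimal sketch showing the matrix returned by np.eye(3) and its defining property: multiplying any 3x3 matrix by it leaves that matrix unchanged. The second matrix is an arbitrary example:

```python
import numpy as np

identity = np.eye(3)  # ones on the main diagonal, zeros elsewhere (float64)
print(identity.tolist())
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

matrix = np.arange(9).reshape(3, 3)
product = matrix @ identity  # matrix multiplication

print(np.array_equal(product, matrix))  # True
```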

10.What is the correct way to create a Pandas Series?


a) pd.Series([1, 2, 3, 4])
b) pd.DataSeries([1, 2, 3, 4])
c) pd.ListSeries([1, 2, 3, 4])
d) pd.series([1, 2, 3, 4])

Answer: a) pd.Series([1, 2, 3, 4])

11.What will df.head(3) return?


a) The first 3 rows of the DataFrame
b) The last 3 rows of the DataFrame
c) The first 3 columns of the DataFrame
d) The summary statistics of the DataFrame

Answer: a) The first 3 rows of the DataFrame


12.How do you drop a row in a Pandas DataFrame?
a) df.drop(index=[row_number])
b) df.remove([row_number])
c) df.delete([row_number])
d) df.clear([row_number])

Answer: a) df.drop(index=[row_number])
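
A short sketch of dropping a row by its index label. The column names and values are made up for illustration. Note that drop() returns a new DataFrame by default and does not modify the original:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "marks": [82, 67, 91]})

# Remove the row whose index label is 1; df itself is left unchanged.
trimmed = df.drop(index=[1])

print(trimmed.index.tolist())  # [0, 2]
print(len(df))                 # 3
```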

13.What will df.sort_values(by='column_name') do?


a) Sort the DataFrame by the values in column_name
b) Rename the column_name
c) Delete column_name
d) Convert the column_name to integers

Answer: a) Sort the DataFrame by the values in column_name
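
A minimal sketch of sort_values on a hypothetical DataFrame. By default the sort is ascending, returns a new DataFrame, and keeps each row's original index label:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Chennai", "Delhi", "Mumbai"], "temp": [30, 25, 35]})

# Rows are reordered by the values in the "temp" column.
by_temp = df.sort_values(by="temp")

print(by_temp["temp"].tolist())  # [25, 30, 35]
print(by_temp.index.tolist())    # [1, 0, 2]
```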

14.How do you check for missing values in a DataFrame?


a) df.isnull()
b) df.isna()
c) df.notnull()
d) Both a and b

Answer: d) Both a and b
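
Both methods give the same result because isnull() is an alias of isna() in Pandas. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

# Each call returns a Boolean DataFrame marking the missing entries.
mask_isna = df.isna()
mask_isnull = df.isnull()
print(mask_isna.equals(mask_isnull))  # True

# Summing the mask counts the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column.tolist())    # [1, 1]
```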

15.What will df['column_name'].rank() do?


a) Assign ranks to the values in column_name
b) Count unique values in column_name
c) Sort column_name
d) Drop duplicates from column_name

Answer: a) Assign ranks to the values in column_name
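
A brief sketch of rank() with its default settings, using invented scores. Ranks are assigned in ascending order, and tied values share the average of the positions they occupy:

```python
import pandas as pd

df = pd.DataFrame({"score": [50, 80, 80, 30]})

# 30 is rank 1, 50 is rank 2, and the two 80s share positions 3 and 4.
ranks = df["score"].rank()

print(ranks.tolist())  # [2.0, 3.5, 3.5, 1.0]
```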

16.What is the purpose of Web Scraping?


a) To extract and collect data from websites
b) To clean and structure datasets
c) To train machine learning models
d) To store data in databases

Answer: a) To extract and collect data from websites

17.Which Python library is commonly used for web scraping?


a) scrapy
b) beautifulsoup4
c) requests
d) All of the above

Answer: d) All of the above


18.In which format does an API typically return data?
a) CSV format
b) JSON or XML format
c) HTML format
d) PDF format

Answer: b) JSON or XML format

19. What function in the requests library is used to fetch data from an API?
a) requests.fetch()
b) requests.call()
c) requests.get()
d) requests.retrieve()

Answer: c) requests.get()
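
A minimal sketch of fetching JSON with requests.get(). The helper name fetch_json and the URL in the comment are hypothetical and not part of the original answer key. The call is shown only as a comment so the snippet does not depend on network access:

```python
import requests


def fetch_json(url, params=None, timeout=10):
    """Send a GET request and return the decoded JSON body."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    return response.json()


# Example call (hypothetical endpoint, not executed here):
# data = fetch_json("https://api.example.com/items", params={"limit": 5})
```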

20.What is the purpose of Open Data sources?


a) To provide free access to data for research and analysis
b) To sell data to private companies
c) To restrict data access to specific users
d) To store confidential information

Answer: a) To provide free access to data for research and analysis

PART-B (4 Marks)

1.Explain the four Vs of Big Data with examples.


2.What are the main steps in the Data Science Process? Briefly explain each step.
3.How is Data Science different from Data Analytics? Provide examples.
4.Why is data cleaning important in Data Science? Give two common data cleaning
techniques.
5.What is NumPy? Explain its advantages over Python lists.
6.Write a Python program to create a 3x3 NumPy array with random numbers and
print its shape and size.
7.Explain the difference between shallow copy and deep copy in NumPy with an
example.
8.What is an identity matrix? How can you create it using NumPy?
9.What is a Pandas Series? How is it different from a Python list?
10.Explain the difference between loc[] and iloc[] in Pandas with an example.
11.How can you sort a Pandas DataFrame based on multiple columns? Give an
example.
15.What is Web Scraping? Mention two Python libraries used for it and their
functions.
16.Explain how to extract data from an API using the Python requests library with an
example.
17.What are Open Data Sources? Give two examples of publicly available datasets.
18.What are the ethical considerations in data acquisition? Discuss any two.
19.What are missing values in a dataset? Explain two methods to handle missing
values in Pandas.

PART-C (12 Marks)

1. Explain the Data Science process in detail. Describe each step with relevant
examples and discuss its importance.

2.What are the key challenges in working with Big Data? Explain the four Vs of Big
Data and discuss how these challenges are addressed in Data Science.

3.Compare and contrast Data Science, Data Analytics, and Machine Learning.
Provide examples of their applications in real-world scenarios.

4.Explain NumPy arrays and their advantages over Python lists. Provide examples
demonstrating array creation, indexing, and basic operations.

5.Write a Python program that creates a NumPy array and performs the following
operations:

A. Reshape the array
B. Find the maximum and minimum values
C. Sort the array
D. Perform matrix multiplication with another array
Explain each step in detail.

6.Discuss various ways to manipulate the shape of a NumPy array. Explain with
examples of reshaping, flattening, and transposing an array.
7.What is a Pandas DataFrame? Explain its structure with examples. Discuss common
operations like selecting data, filtering rows, and modifying columns.
8.How can missing data be handled in Pandas? Explain different methods such as
dropping, filling, and interpolation with examples.
9.Write a Python program to load a dataset into a Pandas DataFrame and perform the
following tasks:

a) Display basic information and summary statistics
b) Sort the data based on a column
c) Filter specific rows based on conditions
d) Rename columns and reset the index
Explain the output for each operation.

10.Explain Web Scraping in detail. Discuss its applications, ethical concerns, and
demonstrate how to scrape data using BeautifulSoup with a Python example.
11.What are APIs, and how are they used in Data Science? Explain the process of
fetching data from an API with an example using the Python requests library.
12.Discuss different data sources used in Data Science. Compare Open Data, APIs,
and Web Scraping in terms of ease of access, reliability, and ethical considerations.
