Chapter 4 - Data Science
Class X
Ans: Data science is a field that uses scientific methods, processes, algorithms, and systems to
extract knowledge and insights from structured and unstructured data, which can then be applied
in AI applications.
Ans: Targeted advertising is a form of advertising, including online advertising, that is directed
towards an audience with certain traits, based on the product or person that the advertiser is
promoting. It makes use of past data about the needs and choices of the user, and selects the
products and the timing of the advertisements accordingly.
Ans: A recommendation system is a system that is capable of predicting a user's future
preference among a set of items and recommending the top items. A recommendation system
helps retailers/sellers and users by suggesting items similar to the ones a person likes, or
by suggesting items liked by people who are similar to the user.
Ans: Data science provides practical insights for crucial decision-making in healthcare.
Data-driven decision-making opens up new possibilities to boost healthcare quality.
Data science has improved healthcare in various ways, such as:
iv. Reducing hospital re-admissions by suggesting preventive care and many more.
Ans: Data science has proved to be a boon to this industry as it helps to:
Ans: Outliers are data values that differ drastically from the rest of the data. Such unusual
data needs to be removed from the data set, or replaced, to get accurate results. For example, a
value of zero entered as the marks of an absent student, instead of an exemption, will not give an
accurate class average.
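
The effect of the zero on the class average can be checked with a few lines of Python. This is only an illustrative sketch: the marks below are made-up values, and it assumes that the only zero in the list belongs to the absent student.

```python
from statistics import mean

# Hypothetical marks for a class of five students.
# The 0 was entered for a student who was absent (assumption for this example).
marks = [78, 85, 0, 92, 81]

# Class average with the outlier left in the data set.
average_with_outlier = mean(marks)

# Class average after removing the zero entered for the absent student.
present_marks = [m for m in marks if m != 0]
average_without_outlier = mean(present_marks)

print("With outlier:   ", average_with_outlier)
print("Without outlier:", average_without_outlier)
```

The average computed with the zero included is noticeably lower than the average of the students who actually took the test.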
* Tabular data with heterogeneously typed columns, as in an SQL table or Excel spreadsheet
Ans: KNN (K-Nearest Neighbours) is also called a lazy learner algorithm because it does not learn
from the training set immediately. Instead, it stores the data set and performs its computation
on it only at the time of classification.
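
The "lazy" behaviour can be seen in a small sketch written with only the Python standard library. The class name LazyKNN, the sample points, and the labels are all hypothetical and chosen just for illustration; this is not how any particular library implements KNN.

```python
from collections import Counter
from math import dist


class LazyKNN:
    """Illustrative K-nearest-neighbours classifier (hypothetical helper)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" only stores the data set; nothing is learned here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # The real work happens at classification time: measure the distance
        # from the query to every stored point, then take a majority vote
        # among the k closest points.
        distances = sorted(
            (dist(point, query), label)
            for point, label in zip(self.points, self.labels)
        )
        nearest_labels = [label for _, label in distances[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-dimensional points belonging to two groups, "A" and "B".
model = LazyKNN(k=3).fit(
    [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)],
    ["A", "A", "A", "B", "B", "B"],
)

print(model.predict((2, 2)))
print(model.predict((6, 5)))
```

Note that fit() does no calculation at all, while predict() scans the whole stored data set every time it is called.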
Q9. What are the important points to remember when data is collected?
Ans: While handling data online or offline, the following points should always be remembered:
* The source of data should be authentic and reliable, as a random data source could provide
wrong or unusable data.
* Consent of the owner of the data should be sought before using someone's personal data set.
Ans: A box plot represents the summary of a set of data values, where a box is created
for each set using properties like minimum, first quartile, median, third quartile and maximum. A
vertical line goes through the box at the median. Here, the X axis denotes the data to be plotted
while the Y axis shows the frequency distribution.
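
The five values that a box plot summarises can be computed with Python's standard statistics module. The data set below is made up for illustration, and quartile values may differ slightly between textbooks and tools depending on the method used; the "inclusive" method is assumed here.

```python
from statistics import quantiles

# Hypothetical data values, for illustration only.
data = [4, 7, 8, 10, 12, 15, 18, 21, 30]

# quantiles() with n=4 returns the three cut points:
# first quartile, median (second quartile) and third quartile.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(data),
    "first quartile": q1,
    "median": q2,
    "third quartile": q3,
    "maximum": max(data),
}

for name, value in five_number_summary.items():
    print(name, "=", value)
```

The box is drawn from the first quartile to the third quartile, the line inside it marks the median, and the whiskers extend towards the minimum and maximum.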
Q11. Differentiate between arrays and lists in Python.
Ans: The following are the differences between arrays and lists:
Array | List
An array holds data of one type and does not support data of another type. | A list can hold data of different types together.
Arrays are available only through the NumPy package and occupy less memory space. | Lists occupy more memory space and can be used directly in Python without any package support.
In arrays, the mathematical operators can be used directly on the whole array. | In lists, the mathematical operators cannot be used directly on the whole list; they need to be applied separately to the individual elements.
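
A short sketch of these differences, assuming the NumPy package is installed. The variable names and values are made up for illustration.

```python
import numpy as np

marks_list = [10, 20, 30]
marks_array = np.array([10, 20, 30])

# On an array, a mathematical operator acts on every element at once.
doubled_array = marks_array * 2

# On a list, the same operator repeats the list instead of doubling the values.
repeated_list = marks_list * 2

# To double the values in a list, each element is handled separately.
doubled_list = [m * 2 for m in marks_list]

# A list can hold values of different types together.
mixed_list = [1, "two", 3.0]

# An array keeps one data type: here the integer 1 is stored as a float.
mixed_array = np.array([1, 2.5])

print(doubled_array)
print(repeated_list)
print(doubled_list)
print(mixed_array.dtype)
```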
Ans: Pandas is an open-source Python library used for data manipulation and data analysis.
Ans: Erroneous data is test data that falls outside of what is acceptable and should be rejected by
the system.
Incorrect values: The values at random places in the dataset are not correct. Either the data is
mismatched or it is not relevant to that position.
Invalid or null values: The value is either corrupted or has no meaning. When such values
occur in a dataset, they need to be removed as they hold no value for data processing.
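
Removing null values can be sketched in plain Python as follows. The list of ages is hypothetical, and the example assumes that None and the empty string are the only markers used for missing values.

```python
# Hypothetical ages collected from a form; None and "" mark missing entries.
raw_ages = [14, None, 15, "", 16, None]

# Keep only the entries that carry an actual value.
clean_ages = [age for age in raw_ages if age not in (None, "")]

print(clean_ages)
print("Removed", len(raw_ages) - len(clean_ages), "null values")
```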
Ans: Python packages are a way to organize and structure our Python code into reusable
components. A package is like a folder that contains related Python files (modules) that work together to
provide certain functionality. Packages help keep our code organized, make it easier to manage
and maintain, and allow us to share our code with others.
Q15. Explain the different formats in which a tabular dataset can be stored.
Ans: The tabular data set can be stored in different formats. Some of the commonly used formats
are:
CSV: It stands for comma-separated values. It is a simple file format used to store tabular data.
Each line of the file is a data record, and each record consists of one or more fields which are
separated by commas.
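
As a small illustration of the CSV format, the sketch below reads comma-separated text with Python's built-in csv module. The column names and values are made up, and the text is kept in a string here instead of a file so that the example is self-contained.

```python
import csv
import io

# Hypothetical CSV content: the first line holds the field names,
# and every following line is one data record.
csv_text = "name,marks\nAsha,82\nRavi,91\n"

# DictReader splits each line at the commas and pairs the fields
# with the names from the first line.
reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)

for record in records:
    print(record["name"], record["marks"])
```

Note that every field is read as text, so the marks would need to be converted with int() before doing any arithmetic on them.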
Extra Questions