
Unit I 2 Marks

Unit 1 2 marks for fds

Uploaded by

ramyaproject

UNIT I – INTRODUCTION 2 MARKS

PART A
1 What is Data Science?
• Data Science is the area of study which involves extracting insights from vast amounts of data
using various scientific methods, algorithms, and processes.
• It helps you to discover hidden patterns from the raw data.
• Data Science is an interdisciplinary field that allows you to extract knowledge from structured or
unstructured data.
• Data science enables you to translate a business problem into a research project
and then translate it back into a practical solution.
2 Why is Data Science needed?
• It helps you to recommend the right product to the right customer to enhance your business
• It allows you to build intelligence into machines
• It enables you to take better and faster decisions
• Data Science can help you to detect fraud using advanced machine learning algorithms
• It helps you to prevent any significant financial losses
3 What are the components of data science?
• Domain expertise
• Data engineering
• Statistics
• Visualization
• Advanced computing
4 List out the data science jobs.
Most prominent Data Scientist job titles are:
• Data Scientist
• Data Engineer
• Data Analyst
• Statistician
• Data Architect
• Data Admin
• Business Analyst
• Data/Analytics Manager
5. What is Euclidean distance?
Ans. Euclidean distance is used to measure the similarity between observations. It is calculated as
the square root of the sum of the squared differences between the coordinates of two points:
d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2).
6. List the data cleaning tasks?
Ans. Data cleaning tasks are as follows:
1. Data acquisition and metadata
2. Fill in missing values
3. Unified date format
4. Converting nominal to numeric
5. Identify outliers and smooth out noisy data
7 List out the tools for Data Science.
• Data Analysis – R, Python, Spark and SAS
• Data Warehousing – Hadoop, SQL
• Data Visualization – R, Tableau
• Machine Learning – Spark, Azure ML studio
8 List out some applications of Data Science.
• Internet Search Results (Google)
• Recommendation Engine (Spotify)
• Intelligent Digital Assistants (Google Assistant)
• Autonomous Driving Vehicle (Waymo, Tesla)
• Spam Filter (Gmail)
• Abusive Content and Hate Speech Filter (Facebook)
• Robotics (Boston Dynamics)
• Automatic Piracy Detection (YouTube)
9 What are the skills required to become a data scientist?
10 What are the Challenges of Data Science Technology?
• A high variety of information & data is required for accurate analysis
• Not adequate data science talent pool available
• Management does not provide financial support for a data science team
• Unavailability of/difficult access to data
• Business decision-makers do not effectively use Data Science results
• Explaining data science to others is difficult
• Privacy issues
• Lack of significant domain expert
• If an organization is very small, it can’t have a Data Science team
11 What is a Project Charter?
Clients like to know upfront what they are paying for, so after getting a good understanding of the
business problem, try to get a formal agreement on the deliverables. All this information is collected in a
project charter. The outcome should be a clear research goal, a good understanding of the context,
well-defined deliverables, and a plan of action with a timetable. This information is then placed in a project
charter.
12 List the steps involved in the data cleansing
• Errors from data entry
• Physically impossible values
• Missing values
• Outliers
• Spaces and typos
• Errors against codebook
13 What do you mean by Outliers?
An outlier is an observation that seems to be distant from other observations or, more specifically, one
observation that follows a different logic or generative process than the other observations. The easiest
way to find outliers is to use a plot or a table with the minimum and maximum values.
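A minimal sketch of the table approach, assuming made-up sample values (the `ages` list and the helper `min_max_summary` are hypothetical, not taken from these notes). Printing the minimum and maximum next to the median makes an implausible value stand out:

```python
import statistics

# Hypothetical sample: ages of survey respondents, with one suspicious entry.
ages = [23, 31, 27, 45, 38, 29, 350, 41]

def min_max_summary(values):
    """Return the minimum, median, and maximum of a list of numbers."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }

summary = min_max_summary(ages)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

A maximum of 350 for an age is far from the median and physically impossible, so it would be flagged for checking.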
14 What are the two operations used to combine information from different datasets?
• The first operation is joining: enriching an observation from one table with information
from another table.
• The second operation is appending or stacking: adding the observations of one table to those of
another table.
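The two operations can be sketched in plain Python, assuming each table is a list of dictionaries with one dictionary per observation. The table names, the `client_id` key, and the helper functions `join` and `append` are hypothetical illustrations:

```python
# Hypothetical tables, represented as lists of dictionaries (one dict per row).
clients = [
    {"client_id": 1, "name": "Asha"},
    {"client_id": 2, "name": "Ravi"},
]
regions = [
    {"client_id": 1, "region": "North"},
    {"client_id": 2, "region": "South"},
]
new_clients = [
    {"client_id": 3, "name": "Meena"},
]

def join(left, right, key):
    """Enrich each row of `left` with the matching row of `right`.

    Assumes `key` is unique in `right`; rows of `left` without a match are dropped.
    """
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

def append(top, bottom):
    """Stack the rows of `bottom` under the rows of `top`."""
    return top + bottom

joined = join(clients, regions, "client_id")    # more columns per observation
stacked = append(clients, new_clients)          # more observations
print(joined)
print(stacked)
```

Joining adds columns to existing observations, while appending adds observations with the same columns.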

15 What do you mean by Exploratory data analysis?

▶ Exploratory Data Analysis (EDA) is an approach to analyse the data using visual techniques.
▶ Information becomes much easier to grasp when shown in a picture, therefore we mainly
use graphical techniques to gain an understanding of data and the interactions between variables.
▶ The visualization techniques used in this phase range from simple line graphs or histograms
to more complex diagrams such as Sankey and network graphs.
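As a rough illustration of the simplest of these techniques, the sketch below builds a text histogram. The `scores` values, the bin width, and the `histogram` helper are assumptions chosen for the example; in practice a plotting library would draw the chart:

```python
from collections import Counter

# Hypothetical exam scores.
scores = [52, 57, 61, 64, 65, 68, 71, 73, 74, 78, 79, 85, 88, 93]
width = 10  # assumed bin width

def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`."""
    return Counter((v // bin_width) * bin_width for v in values)

bins = histogram(scores, width)
for start in sorted(bins):
    # One '#' per observation in the bin.
    print(f"{start:>3}-{start + width - 1:<3} {'#' * bins[start]}")
```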

16 What is a Pareto diagram?

• A Pareto diagram is a combination of the values and a cumulative distribution.


• A Pareto chart is a type of chart that contains both bars and a line graph, where individual values are
represented in descending order by bars, and the cumulative total is represented by the line.
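The numbers behind such a chart can be sketched as follows, assuming made-up defect counts (the `defects` data and the `pareto_table` helper are hypothetical). The values give the bar heights and the cumulative percentages give the line:

```python
# Hypothetical defect counts per category.
defects = {"scratch": 40, "dent": 25, "misprint": 20, "crack": 10, "other": 5}

def pareto_table(counts):
    """Return (category, value, cumulative percent) rows, largest value first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, value in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((category, value, 100 * running / total))
    return rows

table = pareto_table(defects)
for category, value, cumulative in table:
    print(f"{category:<10}{value:>5}{cumulative:>8.1f}%")
```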

17 What are the steps involved in building a model?


Building a model is an iterative process. Most of the models consist of the following main steps:

• Selection of a modeling technique and variables to enter in the model

• Execution of the model

• Diagnosis and model comparison
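The three steps can be sketched with a very small example. Everything here is an assumption made for illustration: the data, the choice of a straight-line model fitted by least squares, and mean squared error as the comparison measure:

```python
# Hypothetical data: advertising spend (x) and sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Step 1: selection of technique and variables (a straight line, x as the only variable).
# Step 2: execution of the model (fit it to the data).
slope, intercept = fit_line(x, y)
line_pred = [slope * a + intercept for a in x]

# Step 3: diagnosis and comparison against a baseline that always predicts the mean of y.
baseline_pred = [sum(y) / len(y)] * len(y)
line_mse = mean_squared_error(y, line_pred)
baseline_mse = mean_squared_error(y, baseline_pred)
print(f"line MSE = {line_mse:.3f}, baseline MSE = {baseline_mse:.3f}")
```

If the fitted model did not beat the baseline, the process would loop back to step 1 with a different technique or different variables.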

18 What is data mining?


• Data mining is searching for knowledge (interesting patterns) in data.

• Data mining is an essential step in the process of knowledge discovery.

• Data mining provides tools to discover knowledge from data and it turns a large collection of
data into knowledge.
19 What is a data warehouse?
• A data warehouse is a repository of information collected from multiple sources stored under
a unified schema and usually residing at a single site.

• Data warehouses are constructed via a process of data cleaning, data integration, data
transformation, data loading, and periodic data refreshing.

20 What is a boxplot and why do we use it?

Boxplots are a popular way of visualizing a distribution.


A boxplot incorporates the five-number summary as follows:
• Typically, the ends of the box are at the quartiles so that the box length is the interquartile range.
• The median is marked by a line within the box.
• Two lines called whiskers outside the box extend to the smallest (Minimum) and largest
(Maximum) observations.
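The five-number summary behind a boxplot can be computed as in the sketch below. The `data` values and the `five_number_summary` helper are hypothetical, and the "inclusive" quartile method is an assumption; other quartile conventions can give slightly different box ends:

```python
import statistics

# Hypothetical observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1  # interquartile range, i.e. the length of the box
print(minimum, q1, median, q3, maximum, iqr)
```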

21 What do you mean by external data?

• Although certain companies consider data a highly valuable asset, more and
more governments and organizations share their data for free with the world.
• This data can be of excellent quality; it depends on the institution that creates and manages it.
• The information they share covers a broad range of topics in a certain region and its demographics.

22 What is the need for basic statistical descriptions of data?

Basic statistical descriptions can be used to identify properties of the data.
They highlight which data values should be treated as noise or outliers.
