FDA End Sem
Descriptive Statistics
Descriptive statistics involves methods for summarizing and organizing data so that it can be
easily understood. These statistics describe the main features of a dataset, providing simple
summaries and graphical representations of the data.
Inferential Statistics
Inferential statistics involves methods that take a sample from a population and make
inferences or predictions about the larger population. These methods help in making
generalizations beyond the immediate data available.
1. Mean (Average): The sum of all values divided by the number of values. It provides a
measure of the central tendency of the data.
o Example: If we have test scores of 70, 80, 90, and 100, the mean is (70 + 80 +
90 + 100) / 4 = 85.
o Use: The mean can be used to summarize the average performance of students
in a class.
2. Median: The middle value when the data is ordered from least to greatest. If there is
an even number of observations, the median is the average of the two middle
numbers.
o Example: For the test scores 70, 80, 90, and 100, the median is (80 + 90) / 2 =
85.
o Use: The median is useful in understanding the center of the data, especially
when the data is skewed.
3. Standard Deviation: A measure of the amount of variation or dispersion in a set of values.
o Example: If the test scores are closely packed around the mean, the standard
deviation will be low; if they are spread out, it will be high.
o Use: The standard deviation helps in understanding the variability of the data.
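As a quick illustration, here is a minimal Python sketch computing all three statistics for the example scores above (NumPy is an assumption; the notes do not name a library):

import numpy as np

scores = np.array([70, 80, 90, 100])  # example test scores from above
print(np.mean(scores))    # 85.0 -> mean (sum / count)
print(np.median(scores))  # 85.0 -> median (average of the two middle values)
print(np.std(scores))     # ~11.18 -> population standard deviation (spread around the mean)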
1. Readability
Readable Text: Ensure that all text, including labels, titles, and annotations, is easily readable.
2. Accuracy
Truthful Representation: Ensure the visualization reflects the data accurately, without distorting scales or proportions.
3. Consistency
Uniform Design: Use consistent colors, fonts, and shapes across similar types of data
to avoid confusion.
4. Relevance
Focus on Key Information: Highlight the most important data points and trends
relevant to the audience.
5. Efficiency
Quick Insights: Design visualizations that allow viewers to quickly grasp the main
message or insights without extensive explanation.
6. Aesthetics
Color Use: Use color effectively to highlight key data points and differentiate
categories. Be mindful of colorblindness and choose palettes that are accessible to all
viewers.
7. Simplicity
Minimal Clutter: Remove unnecessary elements so the data itself stands out.
8. Functionality
Fit for Purpose: Choose chart types that suit the data and the question being answered.
9. Hierarchy
Visual Hierarchy: Use size, color, and positioning to create a visual hierarchy that
guides viewers through the information in a logical order.
10. Storytelling
Narrative Flow: Create a logical flow that tells a story with the data, leading the
viewer from introduction to conclusion.
Q4. What are some popular Python libraries used for data visualization
during EDA?
Matplotlib:
Description: The foundational Python plotting library; it offers fine-grained control over figures, axes, labels, and styles.
Pandas:
Description: Built into the Pandas library, it allows quick and easy plotting directly from DataFrame and Series objects.
data['column'].plot(kind='hist')
GGplot:
Description: A Python take on R's ggplot2 grammar of graphics (commonly used via the plotnine package), building plots from layered components.
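A minimal runnable sketch of the Pandas plotting call above (the DataFrame and the column name 'value' are hypothetical placeholders):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical example data; 'value' is a placeholder column name
data = pd.DataFrame({'value': [70, 80, 80, 90, 100, 95, 85]})

# Pandas plotting wraps Matplotlib under the hood
data['value'].plot(kind='hist', title='Distribution of value')
plt.show()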
In a one-way Analysis of Variance (ANOVA), the null hypothesis (H₀) being tested is that
the means of the different groups are equal. This hypothesis can be formally stated as:
H₀: μ₁ = μ₂ = μ₃ = ⋯ = μₖ
Alternative Hypothesis (H₁): At least one group mean is different from the others.
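As a sketch, a one-way ANOVA can be run with SciPy's f_oneway (the three groups below are made-up illustration data):

from scipy.stats import f_oneway

# Made-up scores for three independent groups
group1 = [85, 90, 88, 92, 86]
group2 = [78, 82, 80, 79, 81]
group3 = [90, 94, 91, 93, 89]

f_stat, p_value = f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# If p < 0.05, reject H₀: at least one group mean differs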
Data visualization has a rich history that spans centuries. It evolved from
simple charts and diagrams to sophisticated digital graphics. Here's a
brief overview:
6. Big Data Era: In the 21st century, the explosion of data from
various sources necessitated more sophisticated visualization
techniques to make sense of complex datasets. Visualization tools
and techniques continue to evolve to handle the challenges posed
by big data.
Q8. Discuss the advantages and disadvantages of using data visualization for
data analysis.
Advantages:
Simplification of complex data
Pattern recognition
Faster processing
Customization
Quick response
Aesthetics
Efficiency
Disadvantages:
Over-simplification
Bias
Technical expertise
Tool limitations
Time and cost
Maintenance
Q9. What is the role of features in machine learning? Can you give an
example of a feature and its target variable?
Features play a crucial role as they are the input variables a model uses to
make predictions or decisions. Features are the measurable properties or
characteristics of the phenomenon being observed.
1. Data Representation: Features represent the data points that the model
will use to learn. They help in capturing the underlying patterns in the
data.
2. Model Training: During the training process, the machine learning
algorithm uses features to understand how different values of the features
correlate with the target variable.
3. Prediction: Once the model is trained, it uses the features from new,
unseen data to make predictions about the target variable.
4. Feature Engineering: The process of selecting, modifying, or creating
new features can significantly improve model performance. Feature
engineering is a critical step in the machine learning workflow.
Consider a scenario where we are building a machine learning model to predict house prices.
In this case:
Feature: The size of the house in square feet.
Target Variable: The price of the house.
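A minimal sketch of this feature/target setup using scikit-learn (the sizes and prices are made-up illustration values):

from sklearn.linear_model import LinearRegression

# Made-up data: feature = house size (sq ft), target = house price ($)
X = [[1000], [1500], [2000], [2500]]   # feature matrix
y = [200000, 290000, 410000, 500000]   # target variable

model = LinearRegression().fit(X, y)
print(model.predict([[1800]]))  # predicted price for an unseen 1800 sq ft house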
Q10. List two types of non-parametric tests used to compare two
independent groups.
1. Mann-Whitney U Test:
Purpose: This test is used to determine whether there is a significant difference
between the distributions of two independent groups.
Assumption: the data need not be normally distributed.
Example: Comparing the median incomes of two different cities to determine if
one city has a significantly different income distribution compared to the other.
2. Kolmogorov-Smirnov Test:
Purpose: This test compares the distributions of two independent samples,
testing the null hypothesis that both samples come from the same distribution.
It is sensitive to differences in both the shape and the location of the distributions.
Example: Comparing the distribution of daily temperatures between two
different time periods
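Both tests are available in SciPy; a minimal sketch (the two samples are made-up illustration data):

from scipy.stats import mannwhitneyu, ks_2samp

# Made-up incomes (in thousands) for two cities
city_a = [32, 45, 51, 28, 60, 39, 47]
city_b = [55, 62, 48, 70, 66, 58, 61]

u_stat, p_u = mannwhitneyu(city_a, city_b)   # Mann-Whitney U test
ks_stat, p_ks = ks_2samp(city_a, city_b)     # Kolmogorov-Smirnov test
print(f"Mann-Whitney U: p = {p_u:.4f}")
print(f"Kolmogorov-Smirnov: p = {p_ks:.4f}")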
Q11. Compare and contrast the assumptions of ANOVA with those of a non-
parametric test like the Mann-Whitney U test.
Assumptions of ANOVA:
Normality: the data within each group are normally distributed.
Equal variance: variances are equal across the groups (homogeneity of variance).
Independence: observations should be independent.
ANOVA therefore requires roughly normal, interval-scale data and is sensitive to outliers.
In contrast, the Mann-Whitney U test also assumes independent observations, but it does not assume normality or equal variances: it works on ranks, handles ordinal data, and is robust to outliers.
Q12. List common hyperparameter tuning techniques.
1. Grid Search:
Description: Exhaustively evaluates every combination of hyperparameter values in a predefined grid.
2. Random Search:
Description: Samples hyperparameter combinations at random from specified ranges, often finding good settings with far fewer trials than a full grid.
3. Bayesian Optimization:
Description: Builds a probabilistic model of the objective function and uses it to choose the most promising configurations to evaluate next.
4. Hyperband:
Description: Dynamically allocates resources to different configurations, allowing early stopping of poorly performing ones.
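A minimal sketch contrasting grid search and random search with scikit-learn (the model, parameter grid, and dataset are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {'n_estimators': [50, 100], 'max_depth': [3, 5, None]}

# Grid search: tries all 6 combinations exhaustively
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3).fit(X, y)

# Random search: samples 4 combinations at random
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)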
Q13. In simple terms, explain the concept of a decision tree. How does it
make decisions for classifying data points?
Decision tree in machine learning:
1. Prepare the data.
2. Start with a root node containing the full dataset.
3. Find the best feature/split, e.g., the one that most reduces entropy.
4. Create branches for each outcome of the split.
5. Repeat recursively to build the tree until nodes are pure or a stopping rule is met.
High entropy = more disorder (mixed classes).
Low entropy = more purity (mostly one class).
To classify a new data point, the tree follows the branches from the root according to the point's feature values until it reaches a leaf, whose class is the prediction (see the sketch below).
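A minimal sketch with scikit-learn's decision tree on the classic iris dataset (the dataset choice and max_depth are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Entropy-based splits, as described in the steps above
tree = DecisionTreeClassifier(criterion='entropy', max_depth=3).fit(X, y)

# Classify a new data point: it walks from the root down the branches
sample = [[5.1, 3.5, 1.4, 0.2]]  # sepal/petal measurements
print(tree.predict(sample))      # predicted class label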
Q14. Differentiate between descriptive, predictive, and prescriptive analytics.
1. Descriptive Analytics
Purpose: Summarizes historical data to describe what has happened, typically through reports, dashboards, and summary statistics.
2. Predictive Analytics
Purpose: Uses historical data and statistical or machine learning models to forecast what is likely to happen.
3. Prescriptive Analytics
Purpose: Prescriptive analytics goes a step further by not only predicting future
outcomes but also recommending actions to achieve desired results. It combines
predictive analytics with optimization techniques to suggest the best course of
action.
Q15. Describe the basic idea behind linear regression. How can you interpret
the coefficients in a linear regression model?
Linear regression is a statistical method used to model and analyze the
relationship between a dependent variable and one or more independent
variables. It finds the best-fitting linear relationship that predicts the
dependent variable from the independent variables:
y = β0 + β1x1 + β2x2 + ⋯ + βnxn + ϵ
y = dependent variable
β0 = intercept (the value of y when all xi = 0)
x1, …, xn = independent variables
β1, …, βn = coefficients of the independent variables
ϵ = error term (the difference between the observed and predicted values)
Interpreting the Coefficients in a Linear Regression Model:
1. Intercept (β0): value of the dependent variable when all independent
variables are equal to zero.
2. Slope Coefficients (β1,β2,…,βn): represents the expected change in the
dependent variable for a one-unit increase in the corresponding independent
variable.
For example, if β1 = 2, then for every one-unit increase in x1 the dependent
variable y is expected to increase by 2 units, holding the other variables constant.
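A minimal sketch fitting a linear regression and reading off the coefficients (the data are generated to roughly follow y = 10 + 2x; all values are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data roughly following y = 10 + 2*x + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 10 + 2 * X[:, 0] + rng.normal(0, 1, size=100)

model = LinearRegression().fit(X, y)
print(f"Intercept (β0): {model.intercept_:.2f}")  # ~10: value of y when x = 0
print(f"Coefficient (β1): {model.coef_[0]:.2f}")  # ~2: change in y per unit increase in x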
Q16. Explain the role of the p-value in hypothesis testing and how to interpret it (one-tailed vs. two-tailed tests).
The p-value is a key concept in hypothesis testing: it is the probability of
obtaining results at least as extreme as those observed, assuming the null
hypothesis is true. A small p-value (commonly below 0.05) means the observed
data are unlikely under the null hypothesis, so we reject it.
One-Tailed Test:
Purpose: Tests if the effect is in one specific direction (either greater than
or less than a certain value).
Example:
Testing if a new drug increases recovery rates more than the existing drug:
H₀: μnew ≤ μold; H₁: μnew > μold.
Two-Tailed Test:
Purpose: Tests if the effect differs in either direction (greater or less than a certain value).
Example:
Testing if a new drug has a different effect (either more or less effective) than
the existing drug:
H₀: μnew = μold; H₁: μnew ≠ μold.
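A sketch contrasting the two with SciPy's independent-samples t-test (the recovery-rate samples are made up; the alternative= keyword needs SciPy 1.6 or later):

from scipy.stats import ttest_ind

# Made-up recovery rates (%) for the new and existing drugs
new_drug = [78, 82, 85, 80, 88, 84]
old_drug = [75, 79, 77, 74, 80, 76]

# One-tailed: is the new drug's mean recovery rate GREATER?
t1, p_one = ttest_ind(new_drug, old_drug, alternative='greater')

# Two-tailed: is the new drug's mean DIFFERENT in either direction?
t2, p_two = ttest_ind(new_drug, old_drug, alternative='two-sided')

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")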
Q17. Describe the various stages of the Data Analysis Process (e.g., Data
Collection, Cleaning, Exploration, Analysis, Visualization). Briefly explain the
importance of each stage:
The data analysis process extracts meaningful insights from raw data through the following stages:
Data Collection: Sets the stage for analysis by ensuring relevant and comprehensive data.
Data Cleaning: Ensures the accuracy and reliability of the data.
Data Exploration: Provides initial insights and guides further analysis (e.g., mean, median, mode, plots, graphs).
Data Analysis: Extracts meaningful and actionable insights from the data.
Data Visualization: Communicates findings effectively and aids in decision-making.
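A toy end-to-end sketch of these stages with Pandas (the file name 'sales.csv' and the column names are hypothetical):

import pandas as pd
import matplotlib.pyplot as plt

# Collection: load raw data (hypothetical file and columns)
df = pd.read_csv('sales.csv')

# Cleaning: remove duplicates and fill missing values
df = df.drop_duplicates()
df['revenue'] = df['revenue'].fillna(df['revenue'].median())

# Exploration: summary statistics for an initial look
print(df['revenue'].describe())

# Analysis: average revenue per region
summary = df.groupby('region')['revenue'].mean()

# Visualization: communicate the finding
summary.plot(kind='bar', title='Average revenue by region')
plt.show()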
Q18. Generate simulated time series data from a specific ARIMA model and
analyze its ACF and PACF plots.
ACF (Autocorrelation Function):
Measures the correlation between the current value and past values of the series.
Considers more than one time lag.
Does not remove the effect of shorter (intermediate) lags.
Captures indirect as well as direct impact.
Its pattern helps identify the MA(q) order of an ARIMA model.
PACF (Partial Autocorrelation Function):
Measures the correlation between the current value and one specific past value.
Considers only one time lag at a time.
Removes the effect of shorter lags.
Captures only the direct impact (a partial correlation coefficient).
Its pattern helps identify the AR(p) order of an ARIMA model.
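Since the question asks for a simulation, here is a minimal sketch using statsmodels (the AR(2) coefficients 0.6 and 0.3 are arbitrary illustration values):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate an AR(2) process: y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + e_t
# ArmaProcess takes lag-polynomial form, so AR coefficients are negated
ar = np.array([1, -0.6, -0.3])
ma = np.array([1])
series = ArmaProcess(ar, ma).generate_sample(nsample=500)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=20, ax=axes[0])   # ACF: tails off gradually for an AR process
plot_pacf(series, lags=20, ax=axes[1])  # PACF: cuts off sharply after lag 2
plt.tight_layout()
plt.show()

For this AR(2) series, the PACF cutting off after lag 2 suggests p = 2, matching the identification rules described above.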