
Data Collection

Agenda

• Key factors that influence data collection
• Data source reliability
• Data relevance
• Sample size
• Case studies on how data collection strategies impact the outcomes of business decisions
• Concepts of bias in data
• Types of biases
• Hands-on exercises to identify biases in sample datasets
Key Factors That Influence Data Collection
Purpose of Research: Clearly defining the aim of your research is essential to
guide the data collection process.
Data Type: Decide whether you need quantitative data (numerical) or qualitative
data (descriptive) based on your research questions.
Collection Method: Choose the most suitable method for gathering data, such as
surveys, interviews, or observations.
Ethical Considerations: Address ethical issues like privacy, consent, and data
protection to maintain the integrity of your research.
Data Quality

Reliability
Reliability refers to the consistency and dependability of data. Reliable data
yields the same results under consistent conditions. It is about the
repeatability and stability of the measurement process.
•Consistency: If the data collection process is repeated under the same
conditions, it should produce the same results.
•Dependability: Reliable data can be trusted to be consistent over time.
•Measurement Error: Minimizing random errors increases reliability. For
instance, a reliable survey instrument will yield consistent responses if used
repeatedly with the same sample.
Example: A thermometer that gives the same reading when used to
measure the temperature of the same object multiple times is reliable.
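The thermometer example can be sketched numerically: a simple way to compare the reliability of two instruments is the spread of their repeated readings of the same object. The readings below are illustrative values, not from the slides.

```python
import statistics

def measurement_spread(readings):
    """Standard deviation of repeated readings of the same object.

    A smaller spread means a more consistent, and therefore more
    reliable, measurement process.
    """
    return statistics.stdev(readings)

# Hypothetical repeated temperature readings from two thermometers
reliable = [36.6, 36.6, 36.7, 36.6, 36.6]
unreliable = [36.1, 37.2, 35.9, 36.8, 37.5]

print(measurement_spread(reliable))    # small spread: consistent readings
print(measurement_spread(unreliable))  # large spread: inconsistent readings
```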
Relevance
Relevance refers to the degree to which data is pertinent to the research question or
the decision-making process. Relevant data directly impacts the objectives and
provides insights necessary for the intended purpose.
•Pertinence: Data must be closely related to the needs of the study or decision-making
process.
•Utility: Relevant data is useful and applicable in addressing specific questions or
problems.
•Contextual Fit: The data collected should align with the specific context and
requirements of the analysis.
Example: For a study on consumer behavior regarding a specific product, relevant data
would include customer reviews, purchase history, and demographic information,
rather than unrelated data such as weather patterns.
Accuracy
Accuracy refers to the correctness and precision of data. Accurate data correctly
reflects the real-world conditions or phenomena it is intended to measure.
•Correctness: The data accurately represents the truth or the actual situation.
•Precision: Accurate data is detailed and free from significant errors or biases.
•Measurement Validity: The data collection methods and instruments must be
well-calibrated and appropriate for the type of data being collected.
Example: An accurate scale will provide the exact weight of an object. If the object
weighs 50 kg, the scale should read 50 kg without significant deviation.
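The scale example also illustrates how accuracy differs from reliability: a sketch comparing two hypothetical scales, one consistent but systematically off, the other noisy but correct on average. The readings are made up for illustration.

```python
import statistics

true_weight = 50.0  # kg, the actual weight of the object

# Hypothetical readings from two scales
biased_consistent = [48.1, 48.0, 48.2, 48.1]  # reliable, but not accurate
unbiased_noisy    = [49.2, 50.9, 50.1, 49.8]  # accurate on average, less reliable

def mean_error(readings, truth):
    """Absolute deviation of the average reading from the true value.

    This captures accuracy (systematic error), not consistency.
    """
    return abs(statistics.mean(readings) - truth)

print(mean_error(biased_consistent, true_weight))  # large systematic error
print(mean_error(unbiased_noisy, true_weight))     # near-zero average error
```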
Summary

•Reliability focuses on the consistency of data over time and across different conditions. It is about getting the same results repeatedly.
•Relevance emphasizes the importance and applicability of data to the specific needs of the research or decision-making process. It is about whether the data matters for the purpose at hand.
•Accuracy deals with the correctness and precision of data, ensuring it truly represents what it is supposed to measure. It is about the data being right and error-free.
Sampling
Define the Target Population

•Clear Definition: Specify who or what constitutes the population of interest. This includes
defining the characteristics that qualify individuals or items for inclusion in the study.
•Scope and Boundaries: Clearly outline the geographical, temporal, and demographic
boundaries of the population.
Determine the Sample Size

•Statistical Power: Larger samples provide more accurate estimates but are costlier and more time-consuming. Use
power analysis to determine the minimum sample size needed to detect an effect or difference.
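One common shortcut for sizing a survey sample is the standard formula for estimating a proportion, n = z²·p·(1−p)/e². The sketch below uses the conventional z = 1.96 for 95% confidence and the conservative p = 0.5; these defaults are illustrative assumptions, not requirements from the slides.

```python
import math

def sample_size_for_proportion(margin_of_error, confidence_z=1.96, p=0.5):
    """Minimum sample size to estimate a proportion within a margin of error.

    Implements n = z^2 * p * (1 - p) / e^2, rounded up.
    p = 0.5 is the most conservative choice (largest n).
    """
    return math.ceil(confidence_z ** 2 * p * (1 - p) / margin_of_error ** 2)

# 95% confidence, ±5 percentage points: the classic "about 400" survey rule
print(sample_size_for_proportion(0.05))  # 385
```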
Choose a Sampling Method

•Probability Sampling: Methods where every member of the population has a known, non-zero chance of being
selected. This includes:
•Simple Random Sampling: Each member has an equal chance of selection.
•Stratified Sampling: Population is divided into strata, and random samples are drawn from each stratum.
•Cluster Sampling: Population is divided into clusters, some of which are randomly selected, and all members of
chosen clusters are sampled.
•Systematic Sampling: Every nth member of the population is selected after a random start.
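The probability sampling methods above can be sketched on a toy population of 100 members. The population, strata split, and sample sizes are arbitrary illustrative choices.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # members numbered 1..100

# Simple random sampling: every member has an equal chance
simple = random.sample(population, 10)

# Systematic sampling: every nth member after a random start
n = 10
start = random.randrange(n)
systematic = population[start::n]

# Stratified sampling: divide into strata, randomly sample each stratum
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = [x for members in strata.values() for x in random.sample(members, 5)]

print(len(simple), len(systematic), len(stratified))  # each method yields 10
```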
Choose a Sampling Method (continued)
•Non-Probability Sampling: Methods where some members of the population may have no chance of
being selected. This includes:
•Convenience Sampling: Samples are chosen based on ease of access.
•Judgmental or Purposive Sampling: Samples are selected based on the researcher’s judgment
about which members are most useful or representative.
•Quota Sampling: Samples are selected to ensure certain characteristics are represented in specific
proportions.
Bias
• Selection Bias
• Measurement Bias
• Response Bias
• Confirmation Bias
• Observer Bias
• Survivorship Bias
• Attrition Bias
• Confounding Bias
Selection Bias
•Selection bias occurs when the sample is not representative of the population due to non-random selection.
This leads to results that cannot be generalized to the entire population.
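A quick simulation shows how non-random selection distorts an estimate. The population and salary figures below are entirely made up for illustration: surveying only senior staff badly overestimates the average salary, while a random sample does not.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 employees, 20% senior, who earn more
junior = [random.gauss(40_000, 5_000) for _ in range(800)]
senior = [random.gauss(90_000, 10_000) for _ in range(200)]
population = junior + senior

true_mean = statistics.mean(population)

# Biased sample: only senior staff responded to the survey
biased_sample = random.sample(senior, 50)
# Unbiased sample: drawn at random from the whole population
random_sample = random.sample(population, 50)

print(round(true_mean))
print(round(statistics.mean(biased_sample)))  # far above the true mean
print(round(statistics.mean(random_sample)))  # close to the true mean
```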
Measurement Bias
Measurement bias occurs when the data collection instruments or procedures systematically favor certain
outcomes over others.
•Instrument Bias: Arises from faulty or biased measurement tools. For example, a poorly calibrated scale
consistently giving incorrect weight readings.
•Interviewer Bias: Occurs when the interviewer's behavior or questioning influences the responses. For example,
leading questions can sway respondents' answers.
•Recall Bias: Happens when participants do not accurately remember past events or experiences. This is common
in retrospective studies.
Response Bias
• Response bias occurs when participants do not provide truthful or accurate responses, leading to distorted data.
Confirmation Bias
Confirmation bias occurs when researchers selectively collect or interpret data in a way that confirms their
preexisting beliefs or hypotheses.

•Data Mining Bias: Occurs when researchers look for patterns in the data that support their hypothesis while
ignoring those that contradict it.
•Publication Bias: Tendency for studies with positive or significant results to be published more often than those
with null or negative results.
Observer Bias
Observer bias happens when the researcher's expectations or knowledge influence their observations and
interpretations.
Attrition Bias
Attrition bias occurs when participants drop out of a study over time, and those who remain are systematically
different from those who leave.
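Attrition bias can also be simulated. In the made-up scenario below, participants with low "motivation" scores are more likely to drop out, so the group that remains has a systematically higher average than the group that started.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical study: 200 participants, each with a motivation score 0..100
participants = [random.uniform(0, 100) for _ in range(200)]

start_mean = statistics.mean(participants)

# Probability of staying in the study rises with motivation,
# so dropouts are concentrated among low scorers
remaining = [m for m in participants if random.random() < m / 100]

end_mean = statistics.mean(remaining)

# The remaining group's mean is inflated relative to the starting group:
# those who stayed differ systematically from those who left
print(round(start_mean, 1), round(end_mean, 1))
```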
Case Study: The Impact of Biased
Data on Decision-Making
Background
A prominent technology firm, TechSolutions,
implemented a machine learning algorithm to
streamline its hiring process. The goal was to
identify and recruit top talent more efficiently by
analyzing resumes and ranking candidates based
on predicted job performance.
The Problem
After six months of using the algorithm, it was
observed that the number of women and
minority candidates being hired had significantly
decreased. This discrepancy raised concerns
about potential biases in the hiring algorithm.
Data Analysis
Historical Bias: The training data consisted primarily of resumes from the past
decade, during which the majority of employees hired were white males. This
historical bias was encoded into the algorithm, which learned to favor similar
profiles.
Feature Selection: Certain features that correlated with higher hiring success,
such as attending specific universities or having particular job titles, were
disproportionately common among white male candidates, leading the
algorithm to favor these candidates.
Lack of Diversity in Training Data: The dataset lacked sufficient representation
from women and minority groups, preventing the algorithm from accurately
assessing the potential of candidates from these backgrounds.
Consequences

•Reduced Diversity: The company missed out on diverse perspectives and talents, which are crucial for innovation and problem-solving.
•Legal and Reputational Risks: The company faced potential legal action for discriminatory hiring practices and suffered damage to its reputation as an inclusive employer.
Thank you
Jackie Abualeam
Ministry of Communication and Information Technology
