Lec 5

Dyts

Uploaded by

jamalirashidali3
Copyright
© © All Rights Reserved

Lecture 3: Data Collection Methods and Sources

1. What Is Data Collection?

Definition: Data collection is the process of gathering and measuring information from different sources to gain insights or make decisions.

In Data Science, data collection is crucial because the quality of your analysis depends on the quality of your data.

2. Types of Data Collection Methods

There are two main methods of data collection:

Primary Data Collection:
Data collected directly from the source.
Example: Surveys, interviews, observations.
Advantages: Tailored to your needs, high accuracy.
Disadvantages: Time-consuming, expensive.
Example: Surveying students to collect information about their study habits.

Secondary Data Collection:
Data that is already collected by someone else and made available for use.
Example: Data from government databases, research papers, or online sources.
Advantages: Cost-effective, readily available.
Disadvantages: May not fit your exact needs, data could be outdated.
Example: Using data from the World Bank on population statistics.

3. Key Data Sources in Data Science

Public Datasets:
Many organizations provide public datasets for free use.
Examples:
Kaggle: A popular platform for Data Science competitions with datasets.
UCI Machine Learning Repository: A collection of datasets for machine learning.
Government Databases: Websites like data.gov provide publicly accessible data.

Web Scraping:
Definition: Web scraping is the process of extracting data from websites.
Example: Extracting pricing information from an e-commerce website.
Note: Web scraping should always respect the website's terms of service and legal considerations.

APIs (Application Programming Interfaces):
APIs allow you to request data from a service or website programmatically.
Example: Getting weather data from an API provided by a weather service.

4. Importance of Data Quality

Why is Data Quality Important?
Good data quality is essential because inaccurate or incomplete data can lead to wrong conclusions.
Example: If half of a survey's responses are missing, the final analysis might be biased.

Characteristics of High-Quality Data:
Accuracy: The data must be correct and free from errors.
Completeness: All required data is available.
Consistency: Data should be consistent across datasets (e.g., "M" for Male, not "M" in one place and "Male" in another).
Timeliness: Data should be up-to-date.

Example: Discuss how poor-quality data could affect decisions, such as predicting exam scores using an incomplete dataset.

5. Task

Think of a scenario where you need to collect data (e.g., surveying classmates about their favorite Data Visualization tool).
Decide whether to use primary or secondary data collection and why.
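
As a rough illustration of requesting data from an API programmatically, the sketch below builds a request URL and reads one value out of a JSON reply. The endpoint https://api.example.com/weather, the query parameters city and units, and the temperature field are all assumptions made up for this example; a real weather service defines its own URL, parameters, and response format, and usually requires an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/weather"


def build_url(city, units="metric"):
    """Attach the query parameters to the base URL."""
    return BASE_URL + "?" + urlencode({"city": city, "units": units})


def parse_temperature(body):
    """Read one field from a JSON response body (assumed shape)."""
    data = json.loads(body)
    return data["temperature"]


def fetch_temperature(city):
    """Send the request and decode the reply; needs network access."""
    with urlopen(build_url(city), timeout=10) as response:
        return parse_temperature(response.read().decode("utf-8"))
```

Keeping URL construction and response parsing in separate functions lets both be checked without a network connection, for example by calling parse_temperature on a saved sample response.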

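The data quality characteristics named in this lecture (completeness, consistency, timeliness) can be checked with a few lines of code. The sketch below uses a tiny made-up survey; the records, field names, and cutoff date are assumptions for illustration only. Accuracy is left out because it cannot be verified without a trusted reference to compare against.

```python
from datetime import date

# Made-up survey records, for illustration only.
records = [
    {"name": "A", "gender": "M", "score": 78, "collected": date(2024, 5, 1)},
    {"name": "B", "gender": "Male", "score": None, "collected": date(2024, 5, 2)},
    {"name": "C", "gender": "F", "score": 91, "collected": date(2019, 1, 15)},
]


def missing_count(rows, field):
    """Completeness: how many rows have no value for this field."""
    return sum(1 for row in rows if row.get(field) is None)


def distinct_values(rows, field):
    """Consistency: list the codes in use so mixed labels stand out."""
    return sorted({row[field] for row in rows if row.get(field) is not None})


def stale_rows(rows, field, cutoff):
    """Timeliness: rows collected before the cutoff date."""
    return [row for row in rows if row[field] < cutoff]


print(missing_count(records, "score"))      # 1 missing score
print(distinct_values(records, "gender"))   # ['F', 'M', 'Male'] mixes "M" and "Male"
print(len(stale_rows(records, "collected", date(2023, 1, 1))))  # 1 outdated row
```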