DS100 Sp22 Lec 01 - Course Overview, Data Science Lifecycle

This document provides an overview and introduction to the Data 100/200 course at UC Berkeley. It discusses what data science is, the data science lifecycle, and what students will learn in the class. The goals of the course are to prepare students for advanced data science courses, enable careers as data scientists, and empower students to apply computational thinking to real-world problems. Key topics to be covered include pandas, NumPy, SQL, exploratory data analysis, and machine learning algorithms like linear regression.


LECTURE 1

Course Overview
An overview of data science, Data 100/200, and the data science lifecycle.

Data 100/Data 200, Spring 2022 @ UC Berkeley


Josh Hug and Lisa Yan
1
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

Roadmap
Lecture 01, Data 100 Spring 2022

2
Intro - Josh Hug

● B.S. in Electrical Engineering from UT Austin (2003).


○ Main focus: Running “The Knighthood of Buh”.
● Ph.D. in EECS from UC Berkeley (2011).
○ Research focus: Bacterial signal processing systems.
○ Last minute pivot into teaching, thanks to Dan Garcia!
● Lecturer in CS at Princeton University (2011-2014).
● Teaching Professor at UC Berkeley (2014-now).
● Third time teaching this class. Also taught:
○ An AI/ML bootcamp for the College of Engineering (with emeritus.org).
○ 数据分析 for 网易云课堂 (https://study.163.com/).
● This semester, I’m:
○ Teaching CS195.
○ Maybe teaching a freshman/sophomore seminar on “Dark Play”.
○ Raising two small kids (1 and 4).
○ Serving as vice chair for the undergraduate CS program.
○ Advising a small research group.
○ Hoping to go surfing again sometime this winter before I miss the whole darn thing.
3
Intro - Lisa Yan

● B.S. in EECS from UC Berkeley (2013).


○ UGSI-ed for EE20N (before EECS 16 series)
● Ph.D. in Electrical Engineering from Stanford University (2019).
○ Research focus: Educational tools in Computer Science
○ Previously: Software-Defined Networks
● Lecturer in CS at Stanford University (2019-2021).
● Teaching Professor at UC Berkeley (2022-now).
● First time teaching this class!
○ Previously taught equivalent of (0.7 STAT140 + 0.3 CS189)
● This semester, I’m:
○ Teaching this course
○ Designing more pathways and support for undergrad EECS/DS research
○ Feeling super happy to be back. GO BEARS!!

[Figure: My interests over time. Labels: Networks, Data Science, Teaching, Education Tools, Create technology to help people.]
4
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

What is Data Science?
Lecture 01, Data 100 Spring 2022

5
Why I Care About Data Science
(A Coronavirus Story)

6
The World is Complicated

7
Belief is Social
Map Link

8
Data is a Tool for Finding Truth

Link

9
But Data Can Be Misleading, and Analysis is Hard

10
But Data Can Be Misleading, and Analysis is Hard (Source)

11
From Tyler Vigen (http://tylervigen.com)
12
But Data is Easy to Abuse or Misinterpret

13
But Data is Easy to Abuse or Misinterpret

14
Even Important Entities Communicate Poorly!

15
Actual Data: Link
Example of a Gap in Communication: Childcare

I have a 1 year old and 4 year old.


● Perhaps as many as 1% of people in Alameda county are getting infected per day by
Omicron.

I had to decide between two choices last week:


● Keep kids home and significantly harm the quality of my courses (and sleep schedule)!
● Send my kids into daycare/preschool, giving them a much higher chance of contracting
COVID.

Try as I might, I could not find a compelling argument out there!


● I ended up having to construct my own using this paper:
https://www.medrxiv.org/content/10.1101/2021.11.30.21267048v1.full.pdf
● My guess: If they get COVID, there’s a 1/600 to 1/3000 chance they’ll end up injured
enough to be a problem. High risk, but barely within my tolerance.
16
My Primary Goal for you in This Course

Be able to take data and produce useful insights on the world’s most challenging and
ambiguous problems.

17
What is Data Science?

PRINCIPLES AND TECHNIQUES OF DATA SCIENCE

18
Data is changing the world

From Joey Gonzalez.
19


Data science is a fundamentally interdisciplinary field

Data Science is the application of data-centric, computational, and inferential thinking to:
● Understand the world (science).
● Solve problems (engineering).

Joey Gonzalez
(co-creator of this course)

20
Data Science Venn Diagram

by Drew Conway in 2010 (link)
21


Data science in industry

The tasks that data scientists say they work on regularly.


Self-reported. Based on the results of the 2016 Data Science Salary Survey.
22
Insight

Good data analysis is not:


● Simple application of a statistics recipe.
● Simple application of statistical software.

There are many tools out there for data science, but they are merely tools.
● They don’t do any of the important thinking!

“The purpose of computing is insight, not numbers.” - R. Hamming. Numerical Methods for
Scientists and Engineers (1962).

23
Example Questions in Data Science

Some (broad) questions we might try to answer with data science:


● What show should we recommend to our user to watch?
● In which markets should we focus our advertising campaign?
● Is the use of the COMPAS algorithm for prison sentencing fair?
● Should I send my kids to daycare?
● Is the world getting better or worse?
● What areas of the world are at higher risks for climate change impact in 10 years? 20?
● Where should we put docking ports for our bikes?
● What should we eat to avoid dying early of heart disease?
● Do immigrants from poor countries have a positive or negative impact on the economy?

24
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

What will you learn in this class?
Lecture 01, Data 100 Spring 2022

25
What are the Principles and Techniques that We’ll Learn?

PRINCIPLES AND TECHNIQUES OF DATA SCIENCE

26
Course goals

Prepare: Prepare students for advanced Berkeley courses in data management, machine learning, and statistics, by providing the necessary foundation and context.

Enable: Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques.

Empower: Empower students to apply computational and inferential thinking to address real-world problems.

27
Tentative List of Topics to be Covered in Data 100

● Pandas and NumPy
● Relational Databases & SQL
● Exploratory Data Analysis
● Regular Expressions
● Visualization
○ matplotlib
○ Seaborn
○ plotly
● Sampling
● Probability and random variables
● Model design and loss formulation
● Linear Regression
● Feature Engineering
● Regularization, Bias-Variance Tradeoff, Cross-Validation
● Gradient Descent
● Logistic Regression
● Decision Trees and Random Forests
● PCA

28
Prerequisites

Official prerequisites for this course:


● Completion of Data 8.
● Completion of CS 61A or CS 88.
● Co-enrollment in EE 16A or Math 54 or Stat 89A.

The prereqs are being strictly enforced! We will not be teaching:


● How to use Python.
● How to use Jupyter notebooks.
● Inference from Data 8.
● Linear algebra (though we will review this topic to a greater degree since linear algebra is
a corequisite, not prerequisite).

Homework 1 and Lab 1 will help calibrate your background.


● For Homework 1, the Data 8 textbook will be helpful.
29
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

Course Overview
Lecture 01, Data 100 Spring 2022

30
Staff

31
GSIs

Kunal Agarwal Bella Crouch Yulei Lin

Anirudhan Badrinath Jay Feng Vasanth Madhavan

Parth Baokar Kelly Han Mrunali Manjrekar

Francis Geng Neha Haq Minh Phan

Kanu Grover Aaron Huang James Susilo

Samantha Hing Priyanka Kargupta Arda Ulug

Andrew Lenz Michelle Li Zachary Wu

Dominic Liu Wallace Lim Xinqi Yu

GSIs teach discussion, hold office hours, and help create assignments and
exams. Contact info: ds100.org/sp22/staff.

32
Bold denotes a 20-hour GSI.
Tutors

Ayela Chughtai Siddhant Satapathy Emily Le

Eric Hao Yike Wang

Alina Herri Nancy Xu

Jenny Jiang William Xu

Shiangyi Lin Jacob Yim

Rachel McCarty Pragnay Nevatia

Conan Minihan Ishaan Mishra

Yiming Ni Neal Kothari

Tutors hold office hours and grade the written components of homeworks and projects. Contact info: ds100.org/sp22/staff.

33
Course Websites / Platforms

34
Online platforms

Course website (ds100.org/sp22)


● Where all lectures, assignments, and discussions are posted.
DataHub (data100.datahub.berkeley.edu)
● Where you will work on all assignments (links on the course website automatically take you here).
Ed (https://edstem.org/us/courses/15436)
● A place to ask and answer questions about assignments and concepts.
● Where all announcements are posted (exam logistics, new assignment released, etc).
Gradescope (gradescope.com, by invitation)
● Where all assignments are submitted, and where all of your grades in this course will live.
Textbook (www.textbook.ds100.org)
● Supplemental reading.

35
Programming Environment for our Course: JupyterLab

36
Learning Advanced JupyterLab

JupyterLab offers notebooks and more tools for data science.

We’ll be accessing JupyterLab using DataHub (data100.datahub.berkeley.edu).


● At the end of the semester we’ll tell you how to use JupyterLab locally on your own machine.

Resources for learning fancier JupyterLab functionality:


● The quickest intro is this great 2-minute overview by Serena Bonaretti.
○ Note: Unlike Serena’s example, in our course we’re using JupyterLab notebooks hosted on the
internet, not on your own local computer.
● The interface overview from the official docs has more details and short, embedded videos.
● A more detailed discussion from a bio/data angle: ~45 minute video.
● Full ~3h in-depth tutorial is available from the core team.

37
Course Logistics
Content and workflow

38
Weekly Flow

39
Hybrid format

This semester, Data 100 will be run in a hybrid format. There are a lot of moving parts; we want to
cover them all now so that everyone is on the same page.
● Please give us feedback throughout the semester! Based on the data, we may change
various aspects of the course throughout the semester.
● Note: In-person meetings are fully dependent on public health guidelines. We are
prepared to hold all course activities online should circumstances demand.
Useful links:
● The following information is all on the syllabus page of the website.
● The calendar page contains the scheduling for all live events.
● Ed contains the Zoom links for all live events.

40
Hybrid Format

Due to the Omicron variant, we have a new hybrid format.


● Give us feedback! We will adapt.
● Note: In-person meetings are fully dependent on public health guidelines. We are
prepared to hold all course activities online should circumstances demand.
Useful links:
● syllabus - everything presented in this video.
● calendar - all events.
● Ed contains the Zoom links for all live events.

41
Lecture format

Two lectures per week.


● Tuesday/Thursday 3:40 - 5:00. Options:
○ Attend in person (starting Thursday 1/20 in Li Ka Shing 245).
■ Must sign up in advance:
https://www.signupgenius.com/go/805094ea8aa28a3fd0-inperson
■ Max capacity is 150 (out of 300 seats).
■ Note: Currently perhaps as many as 5% of people in Alameda county have
contagious Omicron. Not unlikely that out of 150 students at least one has
asymptomatic contagious Omicron.
○ Watch live on Zoom.
○ Watch recording afterwards (posted by the following morning).
● (optional) Quick Check questions for each video segment. Spring 2022: Just Weekly Check
● Links to slides + supplementary code.
● Posted on course website.
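
The in-person attendance note above rests on a short probability calculation, sketched below. It assumes every attendee independently has the same chance of being contagious, and it plugs in the 5% figure as-is even though the slide's claim concerns asymptomatic cases only, so treat the result as a rough upper-end illustration rather than the instructors' actual computation. The function name is hypothetical.

```python
def prob_at_least_one(prevalence: float, group_size: int) -> float:
    """P(at least one contagious person) = 1 - P(nobody is contagious).

    Assumption: each person is contagious independently with the same probability.
    """
    return 1 - (1 - prevalence) ** group_size


# Figures taken from the slide: 5% prevalence, 150 in-person seats.
p = prob_at_least_one(prevalence=0.05, group_size=150)
print(f"{p:.4f}")  # prints 0.9995
```

Even with a much lower per-person rate, the probability stays high for a group of this size, which is the point the slide is making.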

42
Discussion Section

Every week there will be a discussion worksheet. Two types of topics:


● Topics recently covered in lecture (more basic)
● Topics recently covered in homework (more advanced).

Structure:
● Worksheet posted and discussions held on Fridays.
● Two section types: Online and in person.
○ In person sections start week 3, pending public health guidance.
● Worksheet may include extra problems at the end that TAs will not have time to cover!

43
Discussion Section Attendance

You must sign up for a section (online or in person) before attending!


● Sign up sheet will be sent out at 5 PM today.
● 30 slots per section will be opened at 5 PM.
● 5 additional slots per section will be opened at midnight for students in other time zones.
● You’re free to switch sections at any time during the first 3 weeks. After the 3rd week,
your section choice becomes permanent.

Francis’s second online 4:00pm - 5:00pm section will be recorded and posted. Only sign up for
this section if you are OK with being recorded.

Attendance is optional, but can boost your grade if your homework score is less than perfect.
● More shortly.

44
Lab Format

There is one lab assignment per week. Labs are shorter programming assignments designed
to give you familiarity with new concepts.
● In a typical week, lab is released on Friday and is due the following Tuesday.
● All lab autograder tests are visible.

Support:
● Spoiler walkthroughs released with each lab.
○ Don’t just go straight to the spoiler video! Try on your own first.
● In-person lab support Tuesdays 5 - 8 PM.
○ Will start week 3, pending public health guidance.
● Labs are fully autograded.

45
Homework

Week-long assignments for in-depth understanding.


● Released on Friday and due the following Thursday at 11:59PM.
● 1 or 2 on paper. Remaining in Jupyter notebooks.
● Can get homework help in office hours and Ed.
● Autograded and manually graded. Contain hidden test cases.
● Must be completed individually (for details, see the Collaboration Policy).

46
Weekly Check-ins

Weekly Check-ins
● Released on Mondays, and are due the following Monday.
● Weekly surveys may also contain logistical questions.
○ For instance, the Week 1 survey asks what timezone you think you’ll be in this
semester.
● You submit weekly surveys via Google forms
○ The links to these forms will be on the website.
● Mandatory (for undergraduates).
○ We’ll drop up to four missed or late surveys.

47
Office hours

Office hours are listed on the calendar and will be held both virtually and in person.
● These are led by GSIs, tutors, and academic interns.
● Come to get help on assignments – labs, homeworks, and projects – and concepts.
● To access virtual office hours, join the queue at oh.ds100.org.
○ When joining the queue, specify which assignment and question you need help with.
○ Once it’s your turn, you will be given a Zoom link.
● In person office hours will be held in various locations specified on the calendar
○ To adhere to public health guidelines, we ask that students leave the OH room
after their questions have been answered.

Josh and Lisa will also be hosting their own office hours.
● Primary focus will be on non-HW, non-project, non-lab questions, but these are also
welcome.
● Details TBA.

48
Exams

Two midterms and a final:
● Midterm 1: Thursday February 24th, 7-9PM Pacific.
○ Primary focus: Programming and tools.
● Midterm 2: Thursday, April 7, 7 - 8:30PM Pacific.
○ Primary focus: Math and theory. Smaller, lighter weight midterm.
● Final: Friday, May 13, 7-10pm Pacific.
○ Comprehensive.

Format:
● Current plan: Primarily in-person exams with the option for virtual exams. Online details
TBD.
● Alternate exam times will be provided for all exams for pre-approved reasons, such as a
concurrent final exam.
● If you miss an exam due to a personal emergency or illness, please contact the Head TA
Andrew Lenz immediately.
49
Grading

50
Grading Logistics

Grades will be posted on Gradescope (including discussion attendance if applicable).

Deadlines are firm at 11:59PM. Extensions are provided only to students with DSP
accommodations, or in the case of exceptional circumstances.
● No late homework or lab submissions will be accepted.
○ Gradescope may allow you to submit late, but you will be given a 0.
● You can submit projects up to 2 days late, at 10% off per day.
○ Rounded up to the next day: 2 minutes late = 1 day late.
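
The project late policy above can be sketched as a small function. This is one reading of the policy, not the course's grading code: it assumes "10% off per day" means 10% of the raw score is deducted per late day (not compounded), and that anything more than 2 days late scores 0. The function name is hypothetical.

```python
import math


def project_score_after_late_penalty(raw_score: float, minutes_late: float) -> float:
    """Apply the late policy: 10% off per late day, lateness rounded up to whole days."""
    if minutes_late <= 0:
        return raw_score  # on time, no penalty
    days_late = math.ceil(minutes_late / (24 * 60))  # 2 minutes late counts as 1 day
    if days_late > 2:
        return 0.0  # assumption: submissions more than 2 days late earn nothing
    return raw_score * (1 - 0.10 * days_late)


print(project_score_after_late_penalty(100, 2))        # 2 minutes late: 1 late day
print(project_score_after_late_penalty(100, 25 * 60))  # 25 hours late: 2 late days
```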

If you have DSP accommodations, you should receive an email from us shortly.

51
Collaboration and Academic Dishonesty

We will be following the EECS Department Policy on Academic Dishonesty, which states that using
work or resources that are not your own or permitted by the course constitutes plagiarism and may
lead to disciplinary actions.
Assignments
Data science is a collaborative activity! It is okay to discuss problems with friends.
● List their names at the top of your assignments. We provide a place to do this.
● You must write your solutions individually! Do not copy any other student’s work.
● If we suspect that you have submitted plagiarized work, we will call you in for a meeting. If we
then determine that plagiarism has occurred, we reserve the right to give you a negative full
score (-100%) or lower on the assignments in question, along with reporting your offense to the
Center for Student Conduct.
Exams
● Cheating on exams is a serious offense. We will have proctoring in place and will prosecute
those caught cheating, with serious consequences for your career – so don’t do it!

52
Weekly Announcements

Weekly announcements will appear on EdStem only!


● We will post on EdStem such that you receive emails.
● You are responsible for reading these announcements!
● We will also try to cover announcements at the beginning of lecture, but the ground truth is
what you see in the weekly EdStem post by the head TA.

53
We are Here to Help!

We want you to succeed!


● All of these policies have been created to keep you on track and learning efficiently.
● But exceptions are possible and we’ll change course if we’ve made suboptimal decisions.
○ Feel free to reach out to staff with comments or concerns!

Welcome to Data 100/Data 200!

54
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

Data Science Lifecycle
Lecture 01, Data 100 Spring 2022

55
The “data science lifecycle” you will see out in the wild may be slightly different than
the one we teach you, but the core ideas are all the same.

56
Data science lifecycle
The data science lifecycle is a high-level description of the data science workflow.

Note the two distinct entry points!

[Figure: the data science lifecycle. Stages: Ask a Question, Obtain Data, Understand the Data, Understand the World. Outputs: Reports, Decisions, and Solutions.]
57
1. Question/Problem Formulation

● What do we want to know?
● What problems are we trying to solve?
● What are the hypotheses we want to test?
● What are our metrics for success?

[Figure: data science lifecycle diagram]
58
2. Data Acquisition and Cleaning

● What data do we have and what data do we need?
● How will we sample more data?
● Is our data representative of the population we want to study?

[Figure: data science lifecycle diagram]
59
3. Exploratory Data Analysis & Visualization

● How is our data organized and what does it contain?
● Do we already have relevant data?
● What are the biases, anomalies, or other issues with the data?
● How do we transform the data to enable effective analysis?

[Figure: data science lifecycle diagram]
60
4. Prediction and Inference

● What does the data say about the world?
● Does it answer our questions or accurately solve the problem?
● How robust are our conclusions and can we trust the predictions?

[Figure: data science lifecycle diagram]
61
• Intros
• What is data science?
• What will you learn in this class?
• Course overview
• Lots of important details
• Data Science Lifecycle
• Demo

Demo: The Data Science Lifecycle
Available on the course website: https://ds100.org/sp22/lecture/lec01/
Lecture 01, Data 100 Spring 2022

62
[1] Ask a Question: Who are you?

[Figure: data science lifecycle diagram (Ask a Question, Obtain Data, Understand the Data, Understand the World; Reports, Decisions)]
Demo Slides
63
[2] Data Acquisition and Cleaning

[Figure: data science lifecycle diagram]
64
[3] Exploratory Data Analysis and Visualization

Let’s understand what our data tells us, and let’s clean the data while we’re at it.

[Figure: data science lifecycle diagram]
65
[3] Exploratory Data Analysis and Visualization

Population: Data 100 students, Spring 2022


Some sub-questions:

1. How many students are in the class?

2. What are your majors?

3. What year are you?
4. Diversity ...?
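
The first three sub-questions above are simple counting tasks. Below is a minimal sketch using only the standard library; the roster, its field names, and its values are hypothetical toy data, not the real class roster, and in the course itself pandas would be the natural tool.

```python
from collections import Counter

# Hypothetical toy roster; field names and values are made up for illustration.
roster = [
    {"name": "Alex", "major": "Data Science", "year": "Sophomore"},
    {"name": "Sam", "major": "Computer Science", "year": "Junior"},
    {"name": "Riya", "major": "Data Science", "year": "Junior"},
    {"name": "Chen", "major": "Economics", "year": "Senior"},
]

num_students = len(roster)                           # 1. How many students?
major_counts = Counter(s["major"] for s in roster)   # 2. What are your majors?
year_counts = Counter(s["year"] for s in roster)     # 3. What year are you?

print(num_students)
print(major_counts.most_common())
print(year_counts.most_common())
```

The fourth sub-question has no such one-liner, which is where the rest of the demo goes.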

[Figure: data science lifecycle diagram]
66
[3] A harder direction to explore

Diversity ...?
Unfortunately, surveys of data scientists suggest that there are far fewer women:

[Figure: chart of the survey results]

To learn more, check out the Kaggle Executive Summary or study the Raw Data.
67
[4, 1] “What fraction of the students are female?”

This is a complex question. Are we asking about sex (biological trait) or gender (individual, social, cultural identity)?
The Data Science Program wants to improve gender diversity.

[Figure: data science lifecycle diagram]
68
What is the gender diversity of this class?

We don’t currently have data to answer this question. We could either:
1. Survey the students, or…
2. …Use the data we have to estimate the sex of the students as a proxy for gender???*

*Do not attempt #2 alone; it is flawed in many ways (we’ll discuss this later).

We are only exploring #2 in this lecture to illustrate inferential modeling and combining multiple data sources to reason about something we haven’t measured.

69
[1, 2] (again, but for Baby Names Data)

1. Can we estimate a person’s sex using their name?
2. Obtain more data: SSN Baby Names

Discuss: Based on the description of the SSN data, what are some limitations of this data source? What limitations might it have with respect to our original task?

In Zoom chat: Type your thought and **wait** to send. After 30 seconds, you’ll simultaneously press Enter.

We’ll come back to this…

[Figure: data science lifecycle diagram]
70
[2, 3] (again, but for Baby Names Data)

What does each row/column represent?


What can you observe about how U.S. baby
names have changed over time?
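
One way to approach these two questions is sketched below. It assumes each row of the baby names data records a year, a name, a sex, and a count of babies; the slides do not spell out the schema, and the rows and numbers here are hypothetical toy values, not real figures.

```python
from collections import defaultdict

# Hypothetical rows in the assumed layout: (year, name, sex, count).
rows = [
    (1950, "Linda", "F", 80),
    (1950, "James", "M", 86),
    (2000, "Emily", "F", 25),
    (2000, "Jacob", "M", 34),
    (2000, "Linda", "F", 1),
]

# Total recorded babies per year, used to normalize the counts.
totals_by_year = defaultdict(int)
for year, name, sex, count in rows:
    totals_by_year[year] += count


def share_of_name(name: str, year: int) -> float:
    """Fraction of that year's recorded babies who were given this name."""
    name_count = sum(c for y, n, s, c in rows if y == year and n == name)
    return name_count / totals_by_year[year]


# Comparing a name's share across years shows how its popularity changed.
print(share_of_name("Linda", 1950), share_of_name("Linda", 2000))
```

Normalizing by the yearly total matters because the number of recorded births differs from year to year, so raw counts alone can mislead.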

[Figure: data science lifecycle diagram]
71
[4] Prediction and Inference: Simple Classifier

Let’s use this data to estimate the fraction of female students in the class.

Simple classifier:
1. SSN: Proportion of F babies per name
2. Use step 1 to classify each student name
as F, M, or Unknown
3. Average step 2 to get a class prop. F

1. How do you feel about the estimated proportion of females in this class?
2. Do you trust it?

[Figure: data science lifecycle diagram]
72
A Classifier that Captures Uncertainty

Our current model doesn’t capture the uncertainty we saw in the data. We can use simulation to provide a better distributional estimate.

Updated classifier:
1. SSN: Proportion of F babies per name
2. For each student name with step 1:
a. Pick a number in [0.0, 1.0)
b. If 2a is less than SSN prop F (or 0.5 for Unknowns), classify student as F. Otherwise, classify as M.
3. Average step 2 to get a class prop. F
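
The updated classifier can be sketched as a simulation. The per-name proportions and student names below are hypothetical toy values, and repeating the procedure 1000 times to get a spread of estimates is an illustrative choice, not a detail taken from the slides.

```python
import random

# Hypothetical proportions of F babies per name (step 1).
prop_f_by_name = {"emma": 0.99, "liam": 0.005, "jordan": 0.30}


def simulate_class_prop_f(students, rng: random.Random) -> float:
    """Run steps 2 and 3 once: randomly label each student, then average."""
    num_f = 0
    for name in students:
        p = prop_f_by_name.get(name.lower(), 0.5)  # 0.5 for Unknown names
        if rng.random() < p:  # rng.random() draws a number in [0.0, 1.0)
            num_f += 1
    return num_f / len(students)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
students = ["Emma", "Liam", "Jordan", "Xiuying"]  # hypothetical class list

# Repeating the simulation gives a distribution of estimates, not a single number.
estimates = [simulate_class_prop_f(students, rng) for _ in range(1000)]
mean_estimate = sum(estimates) / len(estimates)
print(min(estimates), mean_estimate, max(estimates))
```

The range of the simulated estimates is what conveys the uncertainty; a single run would hide it just as the simple classifier does.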

1. How do you feel about the estimated proportion of females in this class?
2. Do you trust it?
73
Recap of what we just saw:

Find Spring 2022 DS100 data


Explore interesting things about our class:
names, majors, counts
● Get stuck on a question: gender diversity
Find more data: Baby Names (U.S. SSN)
● Approximate gender with sex
Create a classifier
● Simple classifier: names are exactly F/M
● Random classifier: all names have some
probability of F
Gut check: How comfortable were you being the data subject in this study?
Reality check: What about those limitations we talked about?
74
What are some limitations of our analysis?

Possible limitations:
● U.S. name data, not global data
● Everyone born since 1937
● No “rare” names
● Sex as a proxy for gender

How might this impact our analysis?


● UC Berkeley students are from around the world
● Most of our class was born around 2000
● Gender has been proxied to a binary classification (Learn more: GenEq)
● A lot is encoded in a name. Maybe our class data was fundamentally insufficient to answer our original question on gender diversity.
75
Human Contexts in Data Science

Representation: How does data stand in for complex phenomena in the world?
Identity: What kinds of identities are involved in the data? Whose? What happens to identity in the process of data analysis?

In our (faulty) analysis:
Name → Sex → Gender

Reductions of Identity based on Name have historically reproduced existing social bias against minoritized groups:
Job seekers with White-sounding first names received 50% more callbacks from employers than job seekers with Black-sounding names. [Bertrand & Mullainathan, 2003]
76
How can we fix these flaws?

Our original question:
What is the gender diversity of our class?
We didn’t have data to answer this question. We could either:
1. Survey the students (Weekly Check 1!), or…
2. ❌ …Use the data we have to estimate the sex of the students as a proxy for gender???

What you learn in Data 100 will help you explore, challenge, and justify these beliefs in every step of the Data Science Lifecycle.
…And sometimes the takeaway is that we need to collect better data.
77
What’s the point of this demo?

There are many assumptions in data science:


● Whether the data is representative:
○ Of the question being asked
○ Of the world and its implications
● Beliefs/backgrounds of data collectors
● Beliefs/backgrounds of data analysts
● Beliefs/backgrounds of the population

Data Science does not and cannot live in a theoretical vacuum. Data Science is a human-centered technical practice.

78
See you soon!
Weekly Check 1 (due Mon 1/24)
https://forms.gle/y46QNWarM27i8BUp8
Preferences for online/in-person
Discussion Sign Up (first-come first-serve, attendance grade optional)
Some slots reserved for release at midnight for async students
In-person Lecture Sign Up (for Th 1/20, Tu 1/25, Th 1/27):
https://www.signupgenius.com/go/805094EA8AA28A3FD0-inperson
Discussion this Friday, Zoom links to be posted on Ed
79
LECTURE 1

Course Overview
Content credit: Suraj Rampure, Allen Shen, Joey Gonzalez, Josh Hug, and Sam Lau

80
