Lec01 Slides


6.100B: Introduction to Computational Thinking and Data Science

April 3, 2023 6.100B LECTURE 1 1


Introduction, Optimization Problems, and a little Python
(download slides and .py files from Stellar to follow along)
Andrew Wang
MIT Department of Electrical Engineering and Computer Science



Welcome

▪ For those of you who just completed 6.100A – welcome back!
▪ To those of you just joining us for 6.100B – welcome!
▪ After class, make sure:
◦ You have access to our course website
◦ You have signed up for our course Piazza
◦ You have installed Python via Pset 0
◦ You complete Finger Exercise 1
◦ You are able to complete Microquiz 0



Course Staff
▪ Lecturers
◦ Ana Bell (behind the scenes)
◦ Frédo Durand
◦ Stefanie Mueller
◦ Andrew Wang
▪ Great group of TAs
▪ Large group of undergraduate LAs

6.100B Prerequisites

▪ Experience writing object-oriented programs in Python 3


▪ Familiarity with concepts of computational complexity
▪ Familiarity with some basic algorithms
◦ For example, recursion, iteration, bisection search, sorting
▪ 6.100A sufficient, but not necessary



Topics
▪ 6.100A (what you saw, or should know before 6.100B)
◦ Solving problems using computation
◦ Python programming language and standard libraries
covered in 6.100A
◦ Organizing modular programs – functions, classes
◦ Some simple but important algorithms
◦ Algorithmic complexity
▪ 6.100B (what we are going to cover)
◦ Using computation to model the world
◦ Simulation models
◦ Understanding data
▪ Variant of 6.100B: "Intro to Computational Science and
Engineering" (CSE.C20/16.C20/18.C20)
Intro to Computational Science & Engineering (CSE.C20/16.C20/18.C20): Alternative version of 6.100B
[Figure: CSE at the intersection of Mathematics, Computer Science, and Science & Engineering; example applications include global warming, neuron bursting, cell tower placement, and a Martian lander]


Optimization Problems, and a little Python



Relevant Reading
▪ Today
◦ Section 14.1
▪ Next Lecture
◦ Chapter 15
◦ Section 5.3.2

All code and an errata sheet at https://github.com/guttag/Intro-to-Computation-and-Programming
Models
▪ Abstractions that help us to understand something
that has happened or to predict the future
▪ Why has burning fossil fuels led to global climate
change?
▪ Will closing schools reduce the spread of Covid-19?



Computational Models
▪ Use computation to help understand the world:
◦ Study the behavior of complex physical systems
◦ Simulate solutions to mathematical models that have no analytic solution
◦ Measure statistical parameters from noisy data
◦ Computational models can complement physical experiments and mathematical models (in silico vs. in vitro or in vivo vs. in doctrina)
▪ It is often cheaper to run a computational experiment than an actual one
▪ Often we can’t run an actual experiment
◦ It is not easy to reduce fossil fuel consumption or change the tax code as an experiment



Kinds of Models
▪ Optimization models
◦ Find mathematical models that maximize (or minimize)
some criterion, subject to constraints on solution
◦ Knapsack, graph search / shortest path
▪ Simulation models
◦ Find models that explain observations from large
numbers of random, noisy trials
◦ Monte Carlo, random walks
▪ Statistical models
◦ Find models that deduce parameters or label examples in
presence of noise
◦ Curve fitting, clustering, classifiers (machine learning)



What Is an Optimization Model?
▪ An objective function that is to be maximized or minimized
◦ E.g., maximize information collected about the seafloor by an underwater robot
◦ Just optimizing the objective function is often straightforward
▪ A set of constraints (possibly empty) that must be honored, e.g.,
◦ Maintain enough power to deploy from and return to ship
◦ Don’t collide with seafloor
◦ Avoid high temperatures
▪ Optimizing subject to the constraints is often the more interesting problem
Optimization Problems
▪ Anytime you are trying to maximize or minimize
something, you are solving an optimization problem



Imagine that You Are a Burglar
(Disclaimer: We are not endorsing this career path!)



Knapsack Problems

▪ You have limited strength, so there is a maximum weight knapsack that you can carry (also known as a rucksack, haversack, or backpack)
▪ You want to take more stuff than you can carry
▪ How do you choose what to take vs. leave behind?
◦ Want to optimize “value” of things to take
▪ Two variants
◦ Continuous or fractional knapsack problem: straightforward (quanta are infinitesimal relative to available space)
◦ 0/1 knapsack problem: much more interesting (quanta are large relative to available space)
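To see why the fractional variant is the easy one, here is a minimal sketch, assuming items are hypothetical (value, weight) tuples with positive weights. It sorts by value per unit weight and takes as much of each item as fits, splitting only the last one. For the fractional problem this greedy rule is known to be optimal; the 0/1 variant forbids the split, which is what makes it hard.

```python
def fractional_knapsack(items, capacity):
    """Greedy by density for the fractional knapsack problem.

    items: list of (value, weight) tuples with weight > 0 (illustrative format)
    capacity: maximum total weight
    Returns the total value obtained when fractions of items are allowed.
    """
    # Consider items from highest to lowest value per unit weight
    by_density = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total_value = 0.0
    remaining = capacity
    for value, weight in by_density:
        if remaining <= 0:
            break
        # Take the whole item if it fits, otherwise only the fraction that fits
        taken = min(weight, remaining)
        total_value += value * taken / weight
        remaining -= taken
    return total_value


print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```

Here the first two items fit whole and two thirds of the third item fill the remaining capacity.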



YOUR TURN
▪ Is it a knapsack problem?
◦ Ranking professors by quality
◦ Picking stocks for a portfolio
◦ Finding the shortest path between two places
◦ Debugging a Python program
◦ Deciding which subjects to take freshman year
◦ Managing work/life balance



Our Running Example

▪ You are about to sit down to a meal
▪ You know how much you
value different foods, e.g.,
you like donuts more than
Brussels sprouts
▪ But you have a calorie
budget, e.g., you don’t want
to consume more than 750
calories
▪ Choosing what to eat is a
knapsack problem



Our Running Example

[Figure: a knapsack with a 750 calorie capacity]

You have a limit on consumed calories; you would like to optimize your choice of foods.
How you measure “optimal” is your choice; we will focus on value (how much you enjoy a food).
0/1 Knapsack Problem, Formalized

▪ Each item is represented by a pair, <value, weight>
◦ Referents for value and weight will vary by problem
◦ For food, this is enjoyment and calories
▪ The knapsack can accommodate items with a total weight of no more than w
▪ A vector, I, of length n, represents the set of available items. Each element of the vector is an item.
▪ A vector, V, of length n, is used to indicate whether or not items are taken. If V[i] = 1, item I[i] is taken. If V[i] = 0, item I[i] is not taken.
Hence the name 0/1 knapsack
For Example
Here is a set of items, each with a “value” (enjoyment)
and a “weight” or “cost” (calories)
Food    wine    beer    pizza   burger  fries   cola    apple   donut   cake
Value   89      90      95      100     90      79      50      10      85
Cals    123     154     258     354     365     150     95      195     107

I = [wine, beer, pizza, burger, fries, cola, apple, donut, cake]   (one item per food)

If we select a vector V, for example:

V = [0, 1, 0, 1, 0, 0, 0, 0, 1]

that means we are selecting the subset: beer, burger, cake
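Selecting a subset with V amounts to indexing. A minimal sketch, assuming the foods are held in a plain Python list in the same order as the table (names as in this lecture’s menu code):

```python
names = ['wine', 'beer', 'pizza', 'burger', 'fries',
         'cola', 'apple', 'donut', 'cake']
V = [0, 1, 0, 1, 0, 0, 0, 0, 1]

# Keep name i exactly when V[i] == 1
selected = [name for name, taken in zip(names, V) if taken == 1]
print(selected)  # ['beer', 'burger', 'cake']
```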
0/1 Knapsack Problem, Formalized
Find a V that maximizes

$$\sum_{i=0}^{n-1} V[i] \times I[i].\mathit{value}$$

subject to the constraint that

$$\sum_{i=0}^{n-1} V[i] \times I[i].\mathit{weight} \le w$$
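Both sums are dot products of V with one field of the items. A minimal sketch, assuming each item is a hypothetical (value, weight) tuple, with numbers taken from this lecture’s menu code:

```python
def objective(V, I):
    """Sum of V[i] * I[i].value: the quantity to maximize."""
    return sum(v * item[0] for v, item in zip(V, I))


def satisfies_constraint(V, I, w):
    """True if the sum of V[i] * I[i].weight is at most w."""
    return sum(v * item[1] for v, item in zip(V, I)) <= w


# (value, weight) pairs: wine, beer, pizza, burger, fries, cola, apple, donut, cake
I = [(89, 123), (90, 154), (95, 258), (100, 354), (90, 365),
     (79, 150), (50, 95), (10, 195), (85, 107)]
V = [0, 1, 0, 1, 0, 0, 0, 0, 1]  # beer, burger, cake

print(objective(V, I))                  # 275
print(satisfies_constraint(V, I, 750))  # True
```

The selected items weigh 154 + 354 + 107 = 615 calories, so this V is feasible for w = 750 but not for w = 600.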

Going from an informal understanding of a problem to a rigorous problem statement is an important skill to develop.

Vague problem statement → Rigorous problem statement → Algorithm → Code
(Diagram labels, in order: Real World; 6.100B Pset (6.100B lectures); 6.100A Pset)
Many Closely Related Problems
▪ Bin-packing problem
◦ Items of different volumes must be packed into a finite number of
bins, each of a fixed given volume, in a way that minimizes the
number of bins used

▪ Multiple knapsack problem
◦ Like the bin-packing problem, but only a subset of the items may be selected; in the bin-packing problem, all items have to be packed into bins
▪ Integer knapsack problem
▪ Multiple constraint knapsack problem
▪…



Solving the 0/1 Knapsack Problem: BRUTE FORCE



Brute Force strategy

1. Enumerate all possible combinations of items. That is to say, generate all subsets of the set of items. This is called the power set.
2. Remove all of the combinations whose total weight exceeds the allowed weight.
3. From the remaining combinations, choose any one whose value is the largest.



Problem Input: A menu

Food    wine    beer    pizza   burger  fries   cola    apple   donut   cake
Value   89      90      95      100     90      79      50      10      85
Cals    123     154     258     354     365     150     95      195     107

▪ Let’s consider how to represent this input to the knapsack problem



Problem Input: Class Food
class Food(object):
    def __init__(self, n, v, w):
        self._name = n
        self._value = v
        self._calories = w
    def get_value(self):
        return self._value
    def get_cost(self):
        return self._calories
    def get_density(self):
        if self._calories > 0:
            return self._value / self._calories
        else:
            return float('inf')
    def __str__(self):
        return (f'{self._name}: '
                f'<{self._value}, {self._calories}>')
Problem Input: A menu of Foods

names = ['wine', 'beer', 'pizza', 'burger',
         'fries', 'cola', 'apple', 'donut', 'cake']
values = [89, 90, 95, 100, 90, 79, 50, 10, 85]
calories = [123, 154, 258, 354, 365, 150, 95, 195, 107]

def build_menu(names, values, calories):
    """names, values, calories are lists of same length
    names are strings
    values and calories are non-negative numbers
    returns a list of Foods"""
    menu = []
    # Pair up the i-th name, value, and calorie count into one Food
    for i in range(len(values)):
        menu.append(Food(names[i], values[i], calories[i]))
    return menu

calorie_limit = 750
Brute Force Step 1:
Generate power set

def create_combinations(n):
    """Create a list of strings
    each string is a sequence of 0's and 1's
    the 1's are items to include in combination"""

    if n == 0:
        return ['']

    else:
        smaller_L = create_combinations(n-1)
        full_L = []
        for s in smaller_L:
            full_L.append(s + '0')
            full_L.append(s + '1')
        return full_L
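Because create_combinations builds the whole list of 2^n strings before any searching starts, its memory use grows as fast as its running time. One alternative, sketched here with the standard library’s itertools.product (our substitution, not the lecture’s code), yields the same strings in the same order, one at a time:

```python
import itertools


def combinations_lazily(n):
    """Yield every length-n string of '0's and '1's, in the same order
    that create_combinations(n) lists them, without storing them all."""
    for bits in itertools.product('01', repeat=n):
        yield ''.join(bits)


print(list(combinations_lazily(2)))             # ['00', '01', '10', '11']
print(sum(1 for _ in combinations_lazily(10)))  # 1024
```

This saves memory only; the number of combinations, and hence the running time, is still 2^n.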



Brute Force Steps 2 and 3:
Search over Combinations
def brute_force(items, cost_limit):
    combos = create_combinations(len(items))
    best_soln, best_value = [], 0
    for combo in combos:
        result, total_value, total_cost = [], 0, 0
        for i in range(len(combo)):
            if combo[i] == '1':
                result.append(items[i])
                total_value += items[i].get_value()
                total_cost += items[i].get_cost()
        if total_cost <= cost_limit and total_value > best_value:
            best_soln = result
            best_value = total_value
    return best_soln, best_value
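The same search can be sketched without building any strings by treating each integer from 0 to 2^n − 1 as a bit mask, where bit i set means item i is taken. This minimal sketch assumes items are plain (name, value, cost) tuples rather than Food objects; brute_force_bits is a hypothetical name for illustration.

```python
def brute_force_bits(items, cost_limit):
    """Exhaustive 0/1 knapsack search over all 2**n subsets.

    items: list of (name, value, cost) tuples (illustrative format)
    Returns (names of best subset, its total value).
    """
    n = len(items)
    best_names, best_value = [], 0
    for mask in range(2 ** n):
        total_value = total_cost = 0
        chosen = []
        for i in range(n):
            if mask & (1 << i):  # bit i set: item i is in this subset
                name, value, cost = items[i]
                chosen.append(name)
                total_value += value
                total_cost += cost
        # Keep the subset only if it is feasible and strictly better
        if total_cost <= cost_limit and total_value > best_value:
            best_names, best_value = chosen, total_value
    return best_names, best_value


menu = [('wine', 89, 123), ('beer', 90, 154), ('pizza', 95, 258),
        ('burger', 100, 354), ('fries', 90, 365), ('cola', 79, 150),
        ('apple', 50, 95), ('donut', 10, 195), ('cake', 85, 107)]
print(brute_force_bits(menu, 750))
# (['wine', 'beer', 'pizza', 'apple', 'cake'], 409)
```

Like the lecture’s version, this still visits all 2^n subsets and does Θ(n) work per subset; it only avoids storing the list of strings.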



Test it out

names = ['wine', 'beer', 'pizza', 'burger', 'fries',
         'cola', 'apple', 'donut', 'cake']
values = [89, 90, 95, 100, 90, 79, 50, 10, 85]
calories = [123, 154, 258, 354, 365, 150, 95, 195, 107]

calorie_limit = 750
foods = build_menu(names, values, calories)
solution, value = brute_force(foods, calorie_limit)
print(f'Total value of items taken = {value}')
for item in solution:
    print(f'    {item}')

Output (the number at right is each item’s rank by value):

Total value of items taken = 409
    wine: <89, 123>     5
    beer: <90, 154>     3
    pizza: <95, 258>    2
    apple: <50, 95>     8
    cake: <85, 107>     6

The order of the selection is based on order in the menu, not order of preference.
But note that the best solution does not include the most valuable item (burger).



Let’s Try a Larger Menu
import random

def generate_foods(num_foods):
    # List comprehensions build each list; random.randint(a, b) returns
    # a random integer between a and b, inclusive
    names = [f'food{n}' for n in range(num_foods)]
    values = [random.randint(1, 100) for _ in range(num_foods)]
    calories = [random.randint(1, 300) for _ in range(num_foods)]
    return names, values, calories

names, values, calories = generate_foods(15)

How about 20 menu items?

How about 30 menu items? Only ≈65,000 times longer than with 15 items
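The “≈65,000 times longer” figure follows from the Θ(n·2^n) cost of brute force. A quick arithmetic check, assuming running time is proportional to n·2^n with constant factors ignored (relative_cost is a hypothetical helper):

```python
def relative_cost(n_big, n_small):
    """Ratio of n * 2**n work for two menu sizes."""
    return (n_big * 2 ** n_big) / (n_small * 2 ** n_small)


print(relative_cost(20, 15))  # about 42.7
print(relative_cost(30, 15))  # 65536.0
```

Doubling the menu from 15 to 30 items doubles n but multiplies 2^n by 2^15, which is where nearly all of the growth comes from.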



Are We Just Being Stupid?

▪ Alas, no
▪ 0/1 knapsack problem is inherently exponential
▪ Why? Because the power set grows that fast
◦ Power set of n elements has 2^n combinations
◦ Recall that we represent each combination as a string of n zeros and ones
◦ So the cost of generating the power set is Θ(n·2^n)
◦ The rest of the brute-force algorithm performs Θ(n) work for each combination in the power set, so the whole algorithm is Θ(n·2^n)
▪ But don’t despair



Five Minute Break






Solving the 0/1 Knapsack Problem: GREEDY STRATEGY



Greedy strategy in a nutshell
▪ Repeatedly:
◦ Choose the “best” remaining item
◦ Stuff it in the knapsack only if it will fit

▪ But what does “best” mean?


◦ Most valuable?
◦ Lightest weight (or lowest cost)?
◦ Highest value per unit weight?



Implementation of Greedy
def greedy(items, cost_limit, key_function):
    # Sort from "best" to "worst" according to key_function
    items_sorted = sorted(items, key=key_function,
                          reverse=True)
    result, total_value, total_cost = [], 0, 0
    for item in items_sorted:
        # Take an item only if it still fits within the budget
        if total_cost + item.get_cost() <= cost_limit:
            result.append(item)
            total_value += item.get_value()
            total_cost += item.get_cost()
    return result, total_value

How does complexity grow relative to len(items)?



YOUR TURN
▪ How does the complexity of greedy grow
relative to n = len(items)?
1. Θ(n)
2. Θ(n log n)
3. Θ(n^2)
4. Θ(2^n)
Way better than Θ(n·2^n)



Using greedy

def test_greedy(foods, calorie_limit, metric):
    # Map each metric name to a key function; greedy sorts by it, highest first
    metrics = {'value': Food.get_value,
               'cost': lambda x: (float('inf') if x.get_cost() == 0
                                  else 1 / x.get_cost()),
               'density': Food.get_density}
    solution, value = greedy(foods, calorie_limit, metrics[metric])
    print(f'Use greedy by {metric} to allocate {calorie_limit} calories')
    print(f'Total value of items taken = {value}')
    for item in solution:
        print(f'    {item}')

Remember we sort items from “best” to “worst”
• get_value describes how much we like a food
• get_cost returns calories, and less is better (so the key is 1 / calories)
• get_density measures value/calories
Running the Tests
(The number after each item is its rank under the metric being used.)

Use greedy by value to allocate 750 calories
Total value of items taken = 284
    burger: <100, 354>   1
    pizza: <95, 258>     2
    wine: <89, 123>      5
So there isn’t always room for the next-best item.

Use greedy by cost to allocate 750 calories
Total value of items taken = 393
    apple: <50, 95>      1
    cake: <85, 107>      2
    wine: <89, 123>      3
    cola: <79, 150>      4
    beer: <90, 154>      5

Use greedy by density to allocate 750 calories
Total value of items taken = 393
    cake: <85, 107>      1
    wine: <89, 123>      2
    beer: <90, 154>      3
    cola: <79, 150>      4
    apple: <50, 95>      5

Remember that the best brute-force solution has a value of 409.
Why Different Answers?
▪ Different metrics lead to different priority orders
▪ While always trying to optimize value, order in which
we consider items may be different
▪ Can’t backtrack
◦ Once we decide on including an item, can’t later decide
that some combination of lower valued items is actually
better than taking that item
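To make the differing priority orders concrete, here is a self-contained sketch (not the lecture’s greedy; items are assumed to be hypothetical (name, value, calories) tuples copied from this lecture’s menu) that prints the order each metric induces and the total value greedy reaches with a 750-calorie limit:

```python
menu = [('wine', 89, 123), ('beer', 90, 154), ('pizza', 95, 258),
        ('burger', 100, 354), ('fries', 90, 365), ('cola', 79, 150),
        ('apple', 50, 95), ('donut', 10, 195), ('cake', 85, 107)]

# Each key scores an item; higher is "better", as with sorted(..., reverse=True)
metrics = {
    'value': lambda f: f[1],
    'cost': lambda f: 1 / f[2],        # fewer calories scores higher
    'density': lambda f: f[1] / f[2],  # value per calorie
}


def greedy_total(items, limit, key):
    """Total value greedy collects when it visits items in key order."""
    total_value = total_cost = 0
    for _, value, cals in sorted(items, key=key, reverse=True):
        if total_cost + cals <= limit:  # take the item only if it still fits
            total_value += value
            total_cost += cals
    return total_value


for metric, key in metrics.items():
    order = [f[0] for f in sorted(menu, key=key, reverse=True)]
    print(metric, greedy_total(menu, 750, key), order)
```

With this menu the totals come out to 284 for value and 393 for both cost and density, matching the slide output, even though cost and density visit the items in different orders. Python’s sort stays stable with reverse=True, so beer (listed first) comes before fries when both score 90.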



A Problem with Greedy Algorithms
▪ A sequence of locally “optimal” choices doesn’t always yield a globally optimal solution
◦ Consider the space of solutions as a surface
◦ Local optimization is “hill climbing”: a greedy choice is equivalent to walking up the steepest slope along the dimension corresponding to this item until we reach a peak
◦ Finding the best solution is possible, but not guaranteed



Running the Tests using constraint of 1000
Is greedy by cost always a winner? Try test_greedy(foods, 1000, metric) for each metric.

Use greedy by value to allocate 1000 calories
Total value of items taken = 459
    burger: <100, 354>
    pizza: <95, 258>
    beer: <90, 154>
    wine: <89, 123>
    cake: <85, 107>

Use greedy by cost to allocate 1000 calories
Total value of items taken = 403
    apple: <50, 95>
    cake: <85, 107>
    wine: <89, 123>
    cola: <79, 150>
    beer: <90, 154>
    donut: <10, 195>

Use greedy by density to allocate 1000 calories
Total value of items taken = 488
    cake: <85, 107>
    wine: <89, 123>
    beer: <90, 154>
    cola: <79, 150>
    apple: <50, 95>
    pizza: <95, 258>

Brute force method
Total value of items taken = 493
    wine: <89, 123>
    beer: <90, 154>
    burger: <100, 354>
    cola: <79, 150>
    apple: <50, 95>
    cake: <85, 107>



What if we increase the problem size?
▪ Try a menu with 15 items (added juice, carrots, chocolate bar, celery, onion rings, and Brussels sprouts)
▪ We know that brute force looks at 2^15 (or 32,768) combinations, so this is probably still manageable
▪ Run with a budget of 1500 calories



Running the test
names = ['wine', 'beer', 'pizza', 'burger', 'fries',
'cola', 'apple', 'donut', 'cake', 'juice',
'carrot', 'chocolate', 'celery', 'orings', 'brussels']
values = [89,90,95,100,90,79,50,10,85,80,20,100,10,90,1]
calories = [123,154,258,354,365,150,95,195,107,39,25,406,15,190,38]

foods = build_menu(names, values, calories)
calorie_limit = 1500

Use greedy by cost to allocate 1500 calories
Total value of items taken = 699
    celery: <10, 15>
    carrot: <20, 25>
    brussels: <1, 38>
    juice: <80, 39>
    apple: <50, 95>
    cake: <85, 107>
    wine: <89, 123>
    cola: <79, 150>
    beer: <90, 154>
    orings: <90, 190>
    donut: <10, 195>
    pizza: <95, 258>

Use greedy by density to allocate 1500 calories
Total value of items taken = 699
    juice: <80, 39>
    carrot: <20, 25>
    cake: <85, 107>
    wine: <89, 123>
    celery: <10, 15>
    beer: <90, 154>
    cola: <79, 150>
    apple: <50, 95>
    orings: <90, 190>
    pizza: <95, 258>
    donut: <10, 195>
    brussels: <1, 38>
Use greedy by value to allocate 1500 calories
Total value of items taken = 574
    burger: <100, 354>
    chocolate: <100, 406>
    pizza: <95, 258>
    beer: <90, 154>
    orings: <90, 190>
    wine: <89, 123>
    celery: <10, 15>

Use brute force to allocate 1500 calories
Total value of items taken = 778
    wine: <89, 123>
    beer: <90, 154>
    pizza: <95, 258>
    burger: <100, 354>
    cola: <79, 150>
    apple: <50, 95>
    cake: <85, 107>
    juice: <80, 39>
    carrot: <20, 25>
    orings: <90, 190>

Greedy by cost and greedy by density both reached a value of 699.



And What About Large Menus?
▪ Recall that 30 items was too large for brute force
▪ Let’s try 10,000 with greedy
▪ This is not surprising:

30 × 2^30 ≈ 30,000,000,000

10,000 × log₂(10,000) ≈ 133,000, somewhere between 13 × 2^13 and 14 × 2^14

So greedy on 10,000 items does about as much work as brute force on 13 or 14 items.
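The comparison can be checked directly. A minimal sketch, taking the logarithm base 2 as the slide’s 133,000 figure implies:

```python
import math

brute_30 = 30 * 2 ** 30                  # brute force, 30 items
greedy_10k = 10_000 * math.log2(10_000)  # greedy, 10,000 items

print(f'{brute_30:,}')              # 32,212,254,720
print(round(greedy_10k))            # 132877
print(13 * 2 ** 13, 14 * 2 ** 14)   # 106496 229376
```

The greedy estimate falls between the two brute-force figures, which is the slide’s point.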



The Pros of Greedy
▪ Easy to implement
▪ Computationally efficient
◦ Log-linear versus exponential
◦ Greedy can solve some problems that are not feasible by brute force



The Con of Greedy

▪ Does not always yield the best solution


◦ Don’t even know how good the approximation is
◦ Not clear what criterion we should use for greedy – different
choices lead to different solutions
▪ Can’t backtrack
◦ Once we decide on including an item, can’t later decide that
some combination of lower value items is actually better than
taking that item
▪ Suppose we want to find a truly optimal solution, but can’t
afford to use exhaustive search?
◦ Next lecture’s topic



Take Home Message

▪ Optimization problems cover a range of interesting scenarios
◦ Often characterized by a desire to maximize (or minimize) an objective function, subject to one or more constraints on some parameters of the problem
▪ The 0/1 knapsack problem describes one broad class of optimization problems
▪ Brute-force approaches are often far too expensive
▪ Greedy algorithms can provide reasonable (but not necessarily optimal) solutions in many cases, at low computational cost
COURSE POLICIES
COMMON MISUNDERSTANDINGS



Important links

COURSE INFORMATION
https://introcomp.mit.edu/spring23/information

Piazza forum
https://piazza.com/class/lckt7mgnhk0oa

Staff email
6.100-staff@mit.edu
Microquizzes
▪ Must take them in-class
▪ Best 3 out of 4, no make-ups
▪ Bring your charged laptop and charger
▪ Click submit on every question
▪ Submit a checkout password at the end



Checkoffs and Office Hours

▪ Must complete checkoffs in office hours


◦ TA or LA will claim you from help queue
▪ Office hours end at 9 pm each weekday
◦ Except 5 pm on Fridays

▪ Office hours are for any and all questions, not just
checkoffs
▪ If LA can’t answer your question, ask the TA in charge



Collaboration Policy
▪ Your code should not share the same
syntactic structure as anyone else’s
▪ Do:
◦ Clarify the problem statement
◦ Discuss conceptual strategies
◦ Post error messages on Piazza
◦ Strip any text that could reveal your solution
▪ Don’t:
◦ Agree on what data structures and logic conditions to use
◦ Map out a solution on the board and copy it down
◦ Believe there’s only one way of implementing it
▪ Don't share details that could result in
identical line-by-line structure



Extensions and Grading
▪ Extensions
◦ None for finger exercises, checkoffs, or quizzes
◦ 3 late days allowed for psets
◦ Late days from 6.100A do not carry over
◦ We consider (but do not guarantee) pset extensions
with S^3 support

▪ Grading
◦ Thresholds on total score guarantee certain letter grades
◦ Those below each threshold are considered individually
◦ We don’t assign extra work
◦ TAs will let us know if they regularly interact with you
◦ Email us ASAP if there are exceptional circumstances

Don’t suffer alone!


Next Time

Complementary Knapsack

Decision Trees

DYNAMIC PROGRAMMING

