Udacity Cs101.Building A Search Engine
Extracting a Link
Introducing the Web Crawler (Video: Web Crawler)
Quiz (Video: First Quiz)
Programming (Video: Programming)
Quiz (Video: What is a Programming Language)
Getting Started with Python Programming
Quiz (Video: First Programming Quiz)
Programming Languages (Video: Would_You_Rather)
Grammar (Video: Grammar)
Quiz (Video: Eat Quiz)
Python Grammar for Arithmetic Expressions (Video: Python Expressions)
Quiz (Video: Python Expressions)
Quiz (Video: Speed of Light)
Admiral Grace Hopper (1906-1992) (Video: Grace Hopper)
Variables (Video: Variables)
Quiz (Video: Variables)
Variables Can Vary (Video: Variables Can Vary)
Quiz (Video: Varying Variables Quiz 1)
Quiz (Video: Varying Variables Quiz 2)
Quiz (Video: Spirit Age)
Strings (Video: Strings)
Quiz (Video: Valid Strings)
Augusta Ada King (Video: Ada)
Quiz (Video: Hello!!!)
Using Operators on Strings (Video: Strings and Numbers)
Indexing Strings (Video: Indexing Strings)
Quiz (Video: Same Value)
Selecting Sub-Sequences from String (Video: Selecting Sub-Sequences)
Quiz (Video: Capital Udacity)
Quiz (Video: Understanding Selection)
Finding Strings in Strings (Video: Finding Strings in Strings)
Quiz (Video: Testing)
Quiz (Video: Testing)
Using find with Numbers (Video: Finding with Numbers)
Quiz (Video: Finding with Numbers Quiz)
Extracting Links (Video: Extracting Links)
Quiz (Video: Extracting Links)
Quiz (Video: Final Quiz)
To try running your code, click the "Run" button above the editor.
We can see the value of something in Python by using print like this:
print 3
prints out the value 3. The expression after the print can be any valid Python expression. Here are some examples:
1 + 1 addition
2 - 1 subtraction
2 * 6 multiplication
is different from
52 * (3 + 12) * 9
For example, this code prints out the number of seconds in a year:
print 365 * 24 * 60 * 60
Basic English Grammar Rules:
Sentence → Subject Verb Object
Subject → Noun
Object → Noun
Verb → Eat
Verb → Like
Noun → I
Noun → Python
Noun → Cookies
When the grammar of a programming language is not followed, the interpreter returns a "SyntaxError" message. This means that the structure of the code is inconsistent with the rules of the programming language.
Backus-Naur Form (Video: Backus-Naur_Form)
The notation we used to describe the grammar is known as Backus-Naur Form. It was introduced in the 1950s by John Backus, the lead designer of the Fortran programming language at IBM. The purpose of Backus-Naur Form is to describe a programming language in a simple and concise manner. The structure of this form is:
<Non-Terminal> → replacement
The replacement can be any sequence of zero or more non-terminals or terminals. Terminals never appear on the left side of a rule. Once you get to a terminal, there is nothing else you can replace it with. Here is an example showing how to derive a sentence by following the replacement rules:
Sentence
→ Subject Verb Object
→ Noun Verb Object
→ I Verb Object
→ I Like Object
→ I Like Noun
→ I Like Python
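The replacement process can also be replayed in code. The following is only a sketch, not something from the course: the GRAMMAR dictionary and the derive function are hypothetical names, and the list of choices says which alternative to pick each time the leftmost non-terminal is replaced.

```python
# Hypothetical encoding of the English grammar rules: each non-terminal
# maps to the list of replacements it may be rewritten to.
GRAMMAR = {
    "Sentence": [["Subject", "Verb", "Object"]],
    "Subject": [["Noun"]],
    "Object": [["Noun"]],
    "Verb": [["Eat"], ["Like"]],
    "Noun": [["I"], ["Python"], ["Cookies"]],
}


def derive(symbols, choices):
    """Replace the leftmost non-terminal until only terminals remain.

    choices lists, in order, which alternative to use at each step.
    Returns every intermediate form of the derivation as a string.
    """
    steps = [" ".join(symbols)]
    choices = list(choices)
    while True:
        for i, symbol in enumerate(symbols):
            if symbol in GRAMMAR:  # found the leftmost non-terminal
                break
        else:
            return steps  # only terminals are left
        pick = choices.pop(0)
        symbols = symbols[:i] + GRAMMAR[symbol][pick] + symbols[i + 1:]
        steps.append(" ".join(symbols))


# print(...) with one argument runs in both Python 2.7 and Python 3.
for step in derive(["Sentence"], [0, 0, 0, 1, 0, 1]):
    print(step)
```

With these choices the last line printed is I Like Python, the same sentence derived by hand.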
The important thing about a replacement grammar is that we can describe an infinitely large language with a small set of precise rules.
Here is one of the rules of the Python grammar for making expressions:
Expression → Expression Operator Expression
The Expression non-terminal that appears on the left side can be replaced by an Expression, followed by an Operator, followed by another Expression. For example, 1 + 1 is an Expression Operator Expression. The interesting thing about this rule is that it has Expression on both the left and right sides! This looks circular, and would be, except we also have other rules for Expression that do not include Expression on the right side. This is an example of a recursive definition. To make a good recursive definition you need at least two rules:
1. A rule that defines something in terms of itself.
Expression → Expression Operator Expression
2. A rule that defines that thing in terms of something else that we already know.
Expression → Number
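The two kinds of rule map directly onto a recursive function. This is a minimal sketch with a hypothetical function name, build_expression, which always picks the number 1 and the + operator; it is meant only to show the recursive rule being applied a chosen number of times before the base rule ends the derivation.

```python
def build_expression(depth):
    """Apply the recursive rule depth times, then finish with the base rule."""
    if depth == 0:
        # Base rule: Expression -> Number
        return "1"
    # Recursive rule: Expression -> Expression Operator Expression
    return build_expression(depth - 1) + " + " + build_expression(0)


# print(...) with one argument runs in both Python 2.7 and Python 3.
print(build_expression(0))  # 1
print(build_expression(3))  # 1 + 1 + 1 + 1
```

Because depth can be any non-negative whole number, the same two rules produce as many different expressions as we like.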
Recursive definitions are a very powerful idea in computer science. They allow us to define infinitely many things using a few simple rules. We will talk about this a lot more in Unit 6. Here are some of the Python grammar rules for arithmetic expressions:
Expression → Expression Operator Expression
Expression → Number
Operator → +
Operator → *
Number → 0, 1, ...
Here is an example derivation using this grammar:
Expression
→ Expression Operator Expression
→ Expression + Expression
→ Expression + Number
→ Expression + 1
→ Expression Operator Expression + 1
→ Number Operator Expression + 1
→ 2 Operator Expression + 1
→ 2 * Expression + 1
→ 2 * Expression Operator Expression + 1
→ 2 * Number Operator Expression + 1
→ 2 * 3 Operator Expression + 1
→ 2 * 3 * Expression + 1
→ 2 * 3 * Number + 1
→ 2 * 3 * 3 + 1
(Note: the example here is slightly different than the one in the video. The 3+3 expression has been changed to 3*3, since the precedence rules in Python would have grouped 2 * 3 + 3 + 1 as (2 * 3) + 3 + 1, so it would not be interpreted as shown in the derivation.)
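To see the grouping the note describes, it can help to evaluate both versions. This is only an illustrative check; the variable names are made up for the example.

```python
# Multiplication is grouped before addition, so the derived expression
# is evaluated as (2 * 3 * 3) + 1.
derived = 2 * 3 * 3 + 1

# The version from the video: Python groups it as (2 * 3) + 3 + 1.
from_video = 2 * 3 + 3 + 1
grouped_by_hand = (2 * 3) + 3 + 1

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(derived)          # 19
print(from_video)       # 10
print(grouped_by_hand)  # 10
```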
We need to add one more rule to our expression grammar to be able to produce all of the expressions we have used so far:
Expression → (Expression)
using is a 2.7 GHz processor. The GHz means gigahertz which is a billion cycles per second. So, the computer executes 2700000000 cycles per second. You can think of each cycle as executing a very small instruction step. If you are using a Mac, you can see how fast your processor is by selecting the Apple menu and choosing About this Mac. If you are using a Windows 7 machine, open the Control Panel and select System and Security, then under System select View amount of RAM and processor speed. We can compute how far light travels in the time it takes for the computer to complete one cycle:
print 299792458 * 100 * 1.0/1000000000 *1/2.7
11.1034243704
This is approximately 3/4 of the length of a dollar bill.
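The comparison with a dollar bill can be checked with one more division. This sketch assumes a bill length of about 15.6 centimeters, a figure that does not appear in these notes, and the variable names are chosen for the example.

```python
speed_of_light_cm = 299792458 * 100   # centimeters per second
cycles_per_second = 2.7 * 1000000000  # 2.7 GHz
cycle_distance_cm = speed_of_light_cm / cycles_per_second

dollar_bill_cm = 15.6                 # assumed length of a US dollar bill
fraction = cycle_distance_cm / dollar_bill_cm

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(cycle_distance_cm)  # about 11.1 centimeters per cycle
print(fraction)           # roughly 0.71, a little under 3/4
```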
A processor is the part of the computer that carries out the steps specified in a computer program. Sometimes people call the processor the "central processing unit" or CPU. A processor has to be small to execute programs quickly. If your computer's processor were any larger than the size of a dollar bill, then you couldn't even send light from one end of the processor to the other before finishing the execution of a single step in a program.
To introduce a new variable, we use an assignment statement:
Name = Expression
After executing an assignment statement, the name refers to the value of the expression on the right side of the assignment:
speed_of_light = 299792458
We can use the variable name anywhere we want and it means the same thing as the value it refers to. Here is an expression using the name to print out the speed of light in centimeters per second:
print speed_of_light * 100
You can create new variables to keep track of values in programs. Here is an expression to find the length of the nanostick in centimeters:
speed_of_light = 299792458
billionth = 1.0 / 1000000000
nanostick = speed_of_light * billionth * 100
print nanostick
0.111034243704
cycle_distance = speed_of_light / cycles_per_second
print cycle_distance
0.107068735
Since the value that a variable refers to can change, the same exact expression can have different values at the different times it is executed.
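As a sketch of this idea, the same division can be evaluated twice with a different value of cycles_per_second each time. The 2.7 GHz value is the processor speed used earlier; the 2.8 GHz value is an assumption chosen because it is consistent with the 0.107068735 printed above. The names distance_at_2_7 and distance_at_2_8 are made up for the example.

```python
speed_of_light = 299792458.0  # meters per second

cycles_per_second = 2700000000.0  # 2.7 GHz
distance_at_2_7 = speed_of_light / cycles_per_second

cycles_per_second = 2800000000.0  # assumed 2.8 GHz processor
distance_at_2_8 = speed_of_light / cycles_per_second  # same expression, new value

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(distance_at_2_7)  # about 0.111 meters
print(distance_at_2_8)  # about 0.107 meters
```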
This gets more interesting when we use the same variable on both sides of an assignment. The right side is evaluated first, using the current value of the variable. Then the assignment is done using that value. In the following statements, the value of days changes from 49 to 48, then to 47, and finally to 46 as each assignment is executed:
days = 7 * 7 # after the assignment, days refers to 49
days = 48 # after the assignment, days refers to 48
days = days - 1 # after the assignment, days refers to 47
days = days - 1 # after the assignment, days refers to 46
It is important to remember that although we use = for assignment it does not mean equality. You should think of the = sign in Python as an arrow, ←, showing that the value the right side evaluates to is being assigned to the variable name on the left side.
a. 9
b. 10
c. 18
d. 20
e. 22
f. Error
a. 0
b. 60
c. 120
d. Error
For Python to be able to output a result, we always need to define a variable by assigning a value to it before using it.
minutes = 30
minutes = minutes + 1
seconds = minutes * 60
print seconds
1860
9490
The only requirement is that the string must start and end with the same kind of quote.
"I prefer double quotes!"
This allows you to include quotes inside of quotes as a character in the string.
"I'm happy I started with a double quote!"
Using the interpreter, notice how the color of the input changes before and after you put quotes on both sides of the string. What happens when you do not include any quotes:
print Hello
As we saw above, Python cannot print an undefined variable. Without quotes, Hello is treated as a variable name rather than a string, which is why we get a NameError.
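The difference between a quoted string and a bare name can be seen side by side. This is a small sketch: the try and except lines are only there so the example keeps running after the error, and the variable names are chosen for illustration.

```python
# Either kind of quote works, as long as both ends match.
double = "I'm happy I started with a double quote!"
single = 'A "double quote" can sit inside single quotes.'

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(double)
print(single)

# Without quotes, Hello is looked up as a variable name that does not exist yet.
try:
    print(Hello)
except NameError as error:
    message = str(error)
    print(message)

# Once the name has been assigned a value, printing it works.
Hello = 'Hello'
print(Hello)
```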
Try concatenating the string 'Hello' to name. You can create a space between the strings by adding a space to one of the strings. You can also continue to add strings as many times as you need.
name = 'Dave'
print 'Hello ' + name + '!' + '!' + '!'
Hello Dave!!!
However, you cannot use the plus operator to combine strings and integers, as in the case of:
print 'My name is ' + 9
When you run this program you should see an error message like this:
TypeError: cannot concatenate 'str' and 'int' objects.
This program multiplies the string by the integer to return 12 exclamation points!
!!!!!!!!!!!!
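The string operators can be collected in one short sketch. Converting the number with str before concatenating is one common way around the TypeError; it is offered here as an assumption about what is wanted, and the variable names are made up for the example.

```python
name = 'Dave'
age = 9

# + joins strings; * repeats a string a whole number of times.
greeting = 'Hello ' + name + '!' * 3

# A number has to be turned into a string before it can be concatenated.
sentence = 'My name is ' + str(age)

exclamations = '!' * 12

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(greeting)      # Hello Dave!!!
print(sentence)      # My name is 9
print(exclamations)  # !!!!!!!!!!!!
```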
The positions in a string are numbered starting with 0, so this evaluates to 'u'. Indexing strings is most useful when the string is given using a variable:
'udacity'[0] → 'u'
'udacity'[1 + 1] → 'a'
name = 'Dave'
name[0] → 'D'
When you use negative numbers in the index it starts counting from the back of the string:
name = 'Dave'
print name[-1]
e
or,
name = 'Dave'
print name[-2]
v
When you try to index a character in a position where there is none, Python produces an error indicating that the index is out of range:
name = 'Dave'
print name[4]
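The indexing rules can be tried together in one sketch. The try and except lines are an addition for illustration, so that the out-of-range index reports its error without stopping the program.

```python
name = 'Dave'

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(name[0])   # D, the first character
print(name[3])   # e, the last valid position (length 4, so positions 0 to 3)
print(name[-1])  # e, counting from the back
print(name[-4])  # D, the furthest back we can go

# Position 4 does not exist, so Python raises an IndexError.
try:
    print(name[4])
except IndexError as error:
    print(error)
```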
Which of these pairs are two things with the exact same value?
a. s[3], s[1+1+1]
b. s[0], (s+s)[0]
c. s[0] + s[1], s[0+1]
d. s[1], (s + 'ity')[1]
e. s[-1], (s + s)[-1]
print word[4:6]
me me as
print word[4:]
assume
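Here is a short sketch of the selection forms, assuming word refers to the string 'assume' (an assumption consistent with the outputs shown above). The start position is included and the end position is not; leaving either one out means "from the beginning" or "to the end".

```python
word = 'assume'  # assumed value of word

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(word[4:6])  # me, positions 4 and 5
print(word[4:])   # me, from position 4 to the end
print(word[:2])   # as, from the beginning up to but not including position 2
print(word[:])    # assume, the whole string
```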
Which of these is always equivalent to s:
a. s[:]
b. s + s[0:-1 + 1]
c. s[0:]
d. s[:-1]
e. s[:3] + s[3:]
40
86
print pythagoras[86:]
spheres
print pythagoras.find('algebra')
-1
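The same behavior can be seen on a shorter string. This sketch uses a hypothetical variable, course, rather than the pythagoras string: find gives the position where the first match starts, and -1 when there is no match at all.

```python
course = 'udacity'  # hypothetical example string

position = course.find('city')
missing = course.find('x')

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(position)           # 3, where 'city' starts
print(course[position:])  # city
print(missing)            # -1, because 'x' does not appear
```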
Which of the following always has the value 0?
a. s.find(s)
b. s.find('s')
c. 's'.find('s')
d. s.find('')
e. s.find(s + '!!!') + 1
5 5 5
25 25 47
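Passing a number as the second input to find lets us step from one occurrence to the next. This is a sketch on a hypothetical string, fruit: each search starts one position after the previous match.

```python
fruit = 'banana'  # hypothetical example string

first = fruit.find('an')               # search from the beginning
second = fruit.find('an', first + 1)   # search starting after the first match
third = fruit.find('an', second + 1)   # no more matches after the second

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(first)   # 1
print(second)  # 3
print(third)   # -1
```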
Which of these are equivalent to s.find(t, i):
a. s[i:].find(t)
b. s.find(t)[:i]
c. s[i:].find(t) + i
d. s[i:].find(t[i:])
The raw string for the web page pops up in a new window:
For our web crawler, the important thing is to find the links to other web pages in the page. We can find those links by looking for the anchor tags that match this structure:
<a href="<url>">
To build our crawler, for each web page we want to find all the link target URLs on the page. We want to keep track of them and follow them to find more content on the web. For this unit, we will do the first step, which is to extract the first target URL from the page. In Unit 2, we will see how to keep going to get all the link targets, and in Unit 3, we will see how to keep track of them to be able to crawl the target pages. For now, our goal is to take the text from a web request and find the first link target in that text. We can do this by finding the anchor tag, <a href=", and then extracting from that tag the URL that is found between the double quotes. We will assume that the page's contents are in a variable, page.
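One way to put these steps together is sketched below, using only find and string selection. The contents given to page are a made-up example, and the names start_link, start_quote, end_quote, and url are chosen here for illustration.

```python
# Hypothetical page contents; a real page would come from a web request.
page = 'Some text <a href="http://udacity.com">Udacity</a> and more text'

# Find where the first anchor tag starts.
start_link = page.find('<a href=')

# The URL sits between the next two double quotes after that point.
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)

# Select everything between the quotes, leaving the quotes out.
url = page[start_quote + 1:end_quote]

# print(...) with one argument runs in both Python 2.7 and Python 3.
print(url)  # http://udacity.com
```

This only handles the first link, and it assumes the page contains at least one anchor tag written with double quotes.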
Yay! You did it and are off to a great start! You've learned about programs, variables, expressions, and strings, and are well on your way to building a web crawler. Next unit, we will learn some big ideas in computer science that will make this code more useful and enable us to get all the links on the page, not just the first one.