Assignment 1: Learning Goals
Assignment 1: Learning Goals
Assignment 1
Due: Mon Feb 4th, 11:59pm
Learning Goals
By the end of this assignment you should be able to:
1. read a new relational schema, and determine whether or not a particular instance is valid with respect
to that schema,
2. apply the individual techniques for writing relational algebra queries and integrity constraints that we
learned in class,
3. combine the individual techniques to solve complex problems, and
4. identify problems that cannot be solved using relational algebra.
Domain
For this assignment, you will operate on a database for Instagram. Instagram is a social media platform on
which users share pictures and videos, and post comments about them. You need to know a few facts about
the domain:
The above may or may not be completely accurate for Instagram; regardless, our schema and your queries
will be based on it.
Schema
Rather than store pictures and videos themselves, our database will store URL references to their locations
in the cloud.
1
Relations
• User(uid, name, website, about, email, phone, photo)
A tuple in this relation represents an Instagram user. uid is the string identifier selected by the user.
name, website, about, email, and phone are information about this user. photo is the url of the profile
photo of this user.
• Follows(follower, followed, start)
A tuple in this relation represents the fact that the user with identifier follower follows the user with
the identifier followed, beginning at date-time start.
• Hashtag(pid, tag)
A tuple in this relation represents the fact that the caption of post pid includes the hashtag tag.
• Likes(liker, pid, when)
A tuple in this relation represents the fact that user liker has liked the post pid at date-time when.
Integrity constraints
• Follows[follower] ⊆ User[uid]
• Follows[followed] ⊆ User[uid]
• Post[uid] ⊆ User[uid]
2
• PIncludes[pid] ⊆ Post[pid]
• Hashtag[pid] ⊆ Post[pid]
• Likes[liker id] ⊆ User[uid]
• Likes[pid] ⊆ Post[pid]
• Comment[pid] ⊆ Post[pid]
• Comment[commenter] ⊆ User[uid]
• Story[uid] ⊆ User[uid]
• SIncludes[sid] ⊆ Story[sid]
• Saw[viewerid] ⊆ User[uid]
• Saw[sid] ⊆ Story[sid]
• σf ollower=f ollowed F ollows = ∅
• Story[current] ⊆ {“yes”, “no”}
To get familiar with the schema, ask yourself questions like these (but don’t hand in your answers):
Part 1: Queries
Write the queries below in relational algebra. There are a number of variations on relational algebra, and
different notations for the operations. You must use the same notation as we have used in class and on the
slides. You may use assignment, and the operators we have used in class: Π, σ, ./, ./condition , ×, ∩, ∪, −, ρ.
Assume that all relations are sets (not bags), as we have done in class, and do not use any of the extended
relational algebra operations from Chapter 5 of the textbook (for example, do not use the division operator).
Some additional points to keep in mind:
3
• Do not make any assumptions about the data that are not enforced by the original constraints given
above, including the ones written in English. Your queries should work for any database that satisfies
those constraints.
• Assume that every tuple has a value for every attribute. For those of you who know some SQL, in
other words, there are no null values.
• Remember that the condition on a select operation may only examine the values of the attributes in
one tuple, not whole columns. In other words, to use a value (other than a literal value such as 100 or
“Adele”), you must get that value into the tuples that your select will examine.
• The condition on a select operation can use comparison operators (such as ≤ and 6=) and boolean
operators (∨, ∧ and ¬). Simple arithmetic is also okay, e.g., attribute1 ≤ attribute2 + 5000.
• Some relations in our schema have a date-time attribute. You may use comparison operators on such
values. You may refer to the year component of a date-time attribute d using the notation d.year.
• You are encouraged to use assignment to define intermediate results.
• It’s a good idea to add commentary explaining what you’re doing. This way, even if your final answer
is not completely correct, you may receive partial marks.
• The order of the columns in the result doesn’t matter.
• When asked for a maximum or minimum, if there are ties, report all of them.
At least one of the queries cannot be expressed in the language that you are using. In those cases, simply
write “cannot be expressed”. Note: The queries are not in order according to difficulty.
1. Find all the users who have never liked or viewed a post or story of a user that they do not follow.
Report their user id and “about” information. Put the information into a relation with attributes
“username” and “description”.
2. Find every hashtag that has been mentioned in at least two post captions on every day of 2018. You
may assume that there is at least one post on each day of a year.
3. Let’s say that a pair of users are “reciprocal followers” if they follow each other. For each pair of
reciprocal followers, find all of their “uncommon followers”: users who follow one of them but not the
other. Report one row for each of the pair’s uncommon follower. In it, include the identifiers of the
reciprocal followers, and the identifier, name and email of the uncommon follower.
4. Find the user who has liked the least posts. Report the user’s id, name and email, and the id of the
posts they have liked. If there is a tie, report them all.
5. Let’s say a pair of users are “backscratchers” if they follow each other and like all of each others’ posts.
Report the user id of all users who follow some pair of backscratcher users.
6. The “most recent activity” of a user is his or her latest story or post. The “most recently active user”
is the user whose most recent activity occurred most recently.
Report the name of every user, and for the most recently active user they follow, report their name
and email, and the date of their most-recent activity. If there is a tie for the most recently active user
that a user follows, report a row for each of them.
7. Report the name and email of the user who has gained the minimum number of new followers in 2018.
If there is a tie, report them all.
8. For each user who has ever put any comments, report their id and the id of the first and of the last
post they commented on.
4
Part 2: Additional Integrity Constraints
Express the following integrity constraints with the notation R = ∅, where R is an expression of relational
algebra. You are welcome to define intermediate results with assignment and then use them in an integrity
constraint.
1. A view on a story must occur after the date-time of the story itself. (Remember that you can compare
two date-time attributes with simple <, >= etc.)
2. Each user can have at most one current story.
When writing your queries for Part 1, don’t assume that these additional integrity constraints hold, except
for the second one — it was described above as a constraint that holds.
• In your assignment statements, you must include names for all attributes in the intermediate relation
you are defining. For example, write
HighestGrade(sID, oID, grade) := ...
• Use meaningful names for intermediate relations and attributes, just as you would in a program.
• If you want to include comments, put them before the algebra that they pertain to, not after. Make
them stand out from the algebra, for example by using a different font. For example, this looks
reasonable:
– Students who had very high grades in any offering of a csc course.
High(sID) := ΠsID σdept=0 csc0 ∧grade>95 (T ook ./ Of f ering)
A modest portion of your mark will be for good style and formatting.
Submission instructions
Your assignment must be typed; handwritten assignments will not be marked. You may use any word-
processing software you like. Many academics use LaTeX. It produces beautifully typeset text and handles
mathematical notation well. If you would like to learn LaTeX, there are helpful resources online. Whatever
you choose to use, you need to produce a final document in pdf format.
You must declare your team (whether it is a team of one, two or three students) and hand in your work
electronically using the MarkUs online system. Instructions for doing so are posted on the Assignments
page of the course website. Well before the due date, you should declare your team and try submitting with
MarkUs. You can submit an empty file as a placeholder, and then submit a new version of the file later
(before the deadline, of course); look in the “Replace” column.
For this assignment, hand in just one file: A1.pdf. If you are working in a pair, only one of you should
hand it in.
Check that you have submitted the correct version of your file by downloading it from MarkUs; new files
will not be accepted after the due date.