Project Task
System
Task
Develop a recommender system for a movie streaming service.
Using Knime, create and test a recommendation system that is as accurate as possible. Document
your progress from the beginning, showing your approach, when and where you encountered
problems, the improvements you made, which algorithms you used, etc.
Dashboard
Once you have completed your recommendation system, use its results to present the data in a
dashboard built with the Tableau software. Your dashboard should enable the user to filter
according to various data elements. The purpose of the dashboard is to give the user insight
into their viewing preferences, highlighting the aspects that you feel best represent an
interesting overview of their viewing history. The granularity of the data provided is limited,
so a thorough deep dive into the data will not be possible.
Material
Several articles are provided on Moodle to give an insight into recommendation systems. However,
the task is for you to research various approaches and come up with your own solution. Draw on
the knowledge you gained from the previous tasks to evaluate approaches such as association
rules within collaborative filtering (phase 2), decision tree based recommender systems (phase 2) or
support vector machines for collaborative filtering (phase 3).
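To make the idea of user-based collaborative filtering concrete, here is a minimal sketch in plain Python. It is an illustration only and not a substitute for the Knime workflow: the ratings, user names and function names are all hypothetical, and cosine similarity is just one of several similarity measures you could choose.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie title: rating on a 1-5 scale}
ratings = {
    "ann": {"Alien": 5, "Brazil": 3, "Casablanca": 4},
    "bob": {"Alien": 4, "Brazil": 2, "Casablanca": 5, "Dune": 4},
    "cat": {"Alien": 1, "Brazil": 5, "Dune": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)

def predict(user, movie, data):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in data.items():
        if other == user or movie not in their:
            continue  # skip the user themselves and users who have not rated the movie
        sim = cosine(data[user], their)
        num += sim * their[movie]
        den += sim
    return num / den if den else None  # None when no comparable user exists

# ann has not rated "Dune"; estimate her rating from bob and cat
print(predict("ann", "Dune", ratings))
```

Because ann's ratings resemble bob's more than cat's, the estimate lands closer to bob's rating of 4 than to cat's rating of 2.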
The data-set that you shall use is provided on Moodle as several .csv files within the folder labelled
‘Project Data-set’.
Documentation Requirements
Each student needs to submit 10 pages documenting their progress and results on Moodle under the
link ‘Project Documentation’. It is recommended to document everything you do from the very
beginning, so that if you encounter serious problems you can still submit a description of the
problems you encountered, how you went about trying to solve them and whether you succeeded.
Dedicate a section (2 pages) to explaining how you solved the Python task. Even if your
solution is not perfect, you will get grades by showing what sources you accessed, what statements
you tested and what the output was.
Knime submission
Each student needs to submit their Knime source file on Moodle under the link ‘Knime files’.
Grading
Due to the nature of the project, numerous solutions are possible, some more successful
than others. You will therefore be graded not only on your final solution and the accuracy of your
recommendation system, but also on the approach you took, your understanding of the problems
encountered and how you went about trying to solve them.
Keywords
Below is a list of keywords that may help your search for relevant material:
Collaborative filtering, user-based filtering, hybrid filtering, support vector machines, information
gain splitting, Netflix, AWS, association rule, decision tree.
Python Task
You are required to create a file within pythonanywhere called ISProject_yoursurname.py that
reads the movies.csv file as well as a file from an online source and completes the following tasks:
1. Read the first ten movies from the movies.csv file and write them to a new csv file called
yoursurname_output.csv. Your output should include only the MovieID and Title, not the genre.
Advanced task - remove the year from the title column and place it in a new column called year.
2. Read a file from the following online source -
http://pythonscraping.com/files/MontyPythonAlbums.csv and append it to the end of your
output file. Your output must include the actual date and time at which your program
ran to retrieve the data. Advanced task - assign and include a movieid attribute in the output
which continues from where the previous data ended (i.e. after movies.csv movieid 1-10).
3. Prompt the user to enter a title for the output file and write this as the first line in your
output file, followed by a blank line. Advanced task - save and convert your Python file
to an executable that can run independently of a Python environment. This file should be uploaded
to the additional task submission link on Moodle.
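As a starting point for task 1, the following sketch shows one possible way to keep the first ten rows, drop the genre and split the year out of the title. It rests on assumptions you should check against the real movies.csv: that the file has a header row, that the columns appear in the order MovieID, Title, Genre, and that the year sits at the end of the title in parentheses, as in "Toy Story (1995)". The function names and the sample rows are hypothetical.

```python
import csv
import io
import re

# Assumed title layout: "Some Title (1995)"; titles without a trailing year are kept as-is.
YEAR_AT_END = re.compile(r"^(.*)\s\((\d{4})\)\s*$")

def split_year(title):
    """Return (title_without_year, year); year is '' when none is found."""
    match = YEAR_AT_END.match(title.strip())
    if match:
        return match.group(1), match.group(2)
    return title.strip(), ""

def first_ten(source, limit=10):
    """Read up to `limit` data rows and keep ID, title and year, dropping the genre."""
    reader = csv.reader(source)
    next(reader, None)  # skip the header row (assumed to exist)
    rows = []
    for row in reader:
        if len(rows) == limit:
            break
        title, year = split_year(row[1])
        rows.append([row[0], title, year])
    return rows

# Hypothetical sample standing in for open("movies.csv", newline="")
sample = io.StringIO(
    "MovieID,Title,Genre\n"
    "1,Toy Story (1995),Animation\n"
    '2,"American President, The (1995)",Comedy\n'
)
rows = first_ten(sample)

# In the real script, replace StringIO with open("yoursurname_output.csv", "w", newline="")
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["MovieID", "Title", "Year"])
writer.writerows(rows)
print(out.getvalue())
```

Using the csv module rather than splitting lines on commas matters here, because titles such as "American President, The (1995)" contain a comma inside a quoted field.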
Important: you will be required to work within the pythonanywhere workspace so that I can
regularly monitor your work. However, you will also be required to submit your final version by the
project deadline on Moodle under the link ‘Python task’. You must code your work within a single
file, leaving space between the tasks to clearly identify which code belongs to which task, although
the tasks do not have to be completed in sequence. Use a comment to start each section, and add a
comment after each line giving a short description of what that line of code is supposed to achieve.
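The single-file layout described above, with a comment opening each section and a short comment on each line, could look roughly like the sketch below, which also outlines tasks 2 and 3. Treat it as a sketch under stated assumptions: the helper names are hypothetical, the online file is assumed to contain a header row followed by two columns per album, the timestamp format is an arbitrary choice, and the demo uses made-up rows instead of calling the network or prompting for input.

```python
import csv  # CSV reading and writing
import io  # in-memory text buffers for the demo
from datetime import datetime  # timestamp for the retrieval time
from urllib.request import urlopen  # standard-library HTTP client

URL = "http://pythonscraping.com/files/MontyPythonAlbums.csv"  # source given in task 2

# ---- Task 2: read the online file and append it with a timestamp ----

def fetch_rows(url):
    """Download a CSV file and return its rows as lists of strings."""
    with urlopen(url) as response:  # open the remote file
        text = response.read().decode("utf-8", errors="replace")  # bytes to text
    return list(csv.reader(io.StringIO(text)))  # parse the text as CSV

def append_albums(existing_rows, album_rows, retrieved_at):
    """Return the existing rows plus album rows numbered from the last ID onward."""
    next_id = max(int(row[0]) for row in existing_rows) + 1  # continue after the last MovieID
    stamp = retrieved_at.strftime("%Y-%m-%d %H:%M:%S")  # date and time of retrieval
    combined = list(existing_rows)  # copy so the input list is not modified
    for offset, album in enumerate(album_rows):  # number each album in turn
        combined.append([str(next_id + offset), album[0], album[1], stamp])
    return combined  # movie rows first, album rows after

# ---- Task 3: title line followed by a blank line ----

def write_output(target, heading, rows):
    """Write the heading, a blank line, then the data rows as CSV."""
    target.write(heading + "\n\n")  # first line is the title, second line is blank
    csv.writer(target, lineterminator="\n").writerows(rows)  # data rows follow

# ---- Demo with hypothetical data ----

movies = [["1", "Toy Story", "1995"], ["2", "Jumanji", "1995"]]  # stand-in for task 1 output
albums = [["Example Album", "1970"]]  # real script: fetch_rows(URL)[1:] to skip the header
rows = append_albums(movies, albums, datetime.now())  # merge and stamp
out = io.StringIO()  # real script: open("yoursurname_output.csv", "w", newline="")
write_output(out, "My output", rows)  # real script: heading = input("Title for the output file: ")
print(out.getvalue())  # show the resulting file content
```

Note that the album rows carry one more column (the timestamp) than the movie rows; whether to pad the movie rows or record the timestamp elsewhere is a design decision left to you.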
Advanced tasks will contribute extra grades above and beyond the 10 allocated to this task, so
only attempt them if you have additional time and/or intermediate to advanced knowledge of Python.